Implementation of Tests? (about examples, 22 comments, CLOSED)

mlpack commented on July 17, 2024

Implementation of Tests?

Comments (22)

zoq commented on July 17, 2024

I completely agree, I wanted to implement the following changes (for at least all the models that I add).
If it's okay, it might be a nice idea to do it for all the code in the models repo.
Here are the changes that I think might be useful:

1. Models should be included as libraries / classes (a sample can be seen in PR #50), so a user can include models/alexnet.hpp and use AlexNet as a base network for something else.

A unified interface would be useful. That said, I also don't want to introduce too much boilerplate code; I see the code as a neat way to show some real-world examples. So I'm not sure CompileModel is needed; we could just construct the model in the constructor. And I guess SaveModel is another function that is probably just a wrapper around data::Save. We don't necessarily have to replicate the interface from other libs, but I think those are just details we can discuss on the PR itself.

Also, I think it would be useful if we used the CLI implementation from mlpack; this would bring us one step closer to using the bindings framework, and with it support for Python, Julia, etc.
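
For illustration, here is a minimal sketch of what point 1, combined with the construct-in-the-constructor idea and a SaveModel wrapper around data::Save, could look like. The LeNet class, its parameters, and the layer choices are hypothetical placeholders, not the actual code from PR #50:

// models/lenet.hpp -- hypothetical sketch, not the actual PR #50 code.
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>

class LeNet
{
 public:
  // The network is built directly in the constructor, so no separate
  // CompileModel() step is needed.
  LeNet(const size_t inputWidth = 28,
        const size_t inputHeight = 28,
        const size_t numClasses = 10)
  {
    // Placeholder layers; a real LeNet would have more.
    model.Add<mlpack::ann::Convolution<>>(1, 6, 5, 5, 1, 1, 0, 0,
        inputWidth, inputHeight);
    model.Add<mlpack::ann::ReLULayer<>>();
    model.Add<mlpack::ann::MaxPooling<>>(2, 2, 2, 2);
    model.Add<mlpack::ann::Linear<>>(6 * 12 * 12, numClasses);
    model.Add<mlpack::ann::LogSoftMax<>>();
  }

  // Thin wrapper around data::Save, as discussed above.
  void SaveModel(const std::string& path)
  {
    mlpack::data::Save(path, "model", model, false);
  }

  // Expose the underlying FFN so it can be used as a base network
  // for something else, as point 1 suggests.
  mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<>>& Model()
  {
    return model;
  }

 private:
  mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<>> model;
};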

2. A DataLoaders folder (will be pushing changes for it this week in PR #50). Here we should provide a wrapper around data::Load, the scalers, and data::Split. Especially useful for known datasets such as MNIST. This will also include support for resizing, etc.

Sounds good, so I guess I could just provide a dataset name and it would return the necessary data structures?
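
A rough sketch of what such a wrapper could look like; the DataLoader name, its interface, and the name-to-path mapping are assumptions for illustration only:

// dataloader/dataloader.hpp -- hypothetical sketch of point 2.
#include <mlpack/core.hpp>
#include <mlpack/core/data/split_data.hpp>
#include <mlpack/core/data/scaler_methods/min_max_scaler.hpp>

class DataLoader
{
 public:
  // Wraps data::Load, a scaler, and data::Split. A known dataset name
  // maps to a bundled file; anything else is treated as a path to the
  // user's own CSV. Label handling, resizing, etc. are elided here.
  DataLoader(const std::string& dataset, const double testRatio = 0.2)
  {
    const std::string path =
        (dataset == "mnist") ? "data/mnist.csv" : dataset;

    arma::mat data;
    mlpack::data::Load(path, data, true);

    // Scale features to [0, 1] before splitting.
    arma::mat scaled;
    mlpack::data::MinMaxScaler scaler;
    scaler.Fit(data);
    scaler.Transform(data, scaled);

    mlpack::data::Split(scaled, trainData, testData, testRatio);
  }

  arma::mat trainData, testData;
};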

3. A unified samples folder or, as @favre49 suggested, an examples folder (refer to #50), where a call to each class is explained. E.g., an object detection sample that has sample code for training and validation on a dataset, a similar file for object localization, and so on.

Not sure what that means, I think each model could serve as an example?

4. A testing folder (this will be for contributors) where they will add tests that load weights, train for 5 epochs, and set a reasonable accuracy as a baseline. This will ensure the addition of valid models as well as streamline the addition of new models.

Sounds good.

5. All data in one folder. I would, however, prefer that we make changes in CMake to download the data rather than storing it in the repo, to keep the models repo lighter.

Yes, we can do that.

6. A weights folder in the models folder where the weights of each model are stored.

Right, same as for the data, this can be stored at some external location.

zoq commented on July 17, 2024

There are a lot of things I want to do; I fell a little behind due to exams, but it's okay.

I am in a similar situation right now.

No worries, best of luck with your exams.

zoq commented on July 17, 2024

Good idea. I guess an easy way to test the models is to provide predefined weights for each model, let them train for a number of iterations, and see if they return the correct results.
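
A minimal sketch of such a test, using Boost.Test (mlpack's test framework at the time) together with the hypothetical LeNet and DataLoader classes sketched above; the weights path, the label layout, and the 85% baseline are made-up values:

// tests/lenet_test.cpp -- hypothetical sketch of a model test.
#define BOOST_TEST_MODULE ModelTests
#include <boost/test/unit_test.hpp>
#include <ensmallen.hpp>
#include "models/lenet.hpp"          // Hypothetical, from the sketch above.
#include "dataloader/dataloader.hpp" // Hypothetical, from the sketch above.

BOOST_AUTO_TEST_CASE(LeNetBaselineTest)
{
  // Assumes the first row of the CSV holds the labels.
  DataLoader loader("mnist");
  arma::mat trainX = loader.trainData.rows(1, loader.trainData.n_rows - 1);
  arma::mat trainY = loader.trainData.row(0) + 1;  // mlpack labels are 1-based.

  LeNet lenet;
  // Optionally start from predefined weights so few epochs suffice:
  // mlpack::data::Load("weights/lenet.bin", "model", lenet.Model());

  // Train for 5 epochs with plain SGD.
  ens::StandardSGD optimizer(0.01, 32, 5 * trainX.n_cols);
  lenet.Model().Train(trainX, trainY, optimizer);

  // Require a reasonable accuracy baseline.
  arma::mat probs;
  lenet.Model().Predict(trainX, probs);
  arma::urowvec predicted = arma::index_max(probs, 0) + 1;
  const double accuracy = 100.0 *
      arma::accu(predicted == arma::conv_to<arma::urowvec>::from(trainY)) /
      (double) trainY.n_elem;
  BOOST_REQUIRE_GE(accuracy, 85.0);
}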

favre49 commented on July 17, 2024

You've pretty much read my mind as to the structure of the codebase. This would mean some restructuring of the code already in the repo, but I don't see any problem with this.

prince776 commented on July 17, 2024

I was also imagining it like this, @kartikdutt18; I would love to help with this. I'll also add some new models like LeNet and VGG.

prince776 commented on July 17, 2024

@prince776, that would be great. There is a PR open for VGG already. I think there should be one PR to restructure (for now); otherwise we might end up having different styles of implemented models and data loaders. I think once I open the PR and the basic structure is approved, you can modify the code into that structure and we could also add them to the repo. Till then, a good way to start might be implementing and testing the model locally. For restructuring, there is a lot that needs to be done, so I'll tag you in the PR, and things that would run better in parallel could be done by both of us or some more contributors. I think we will figure it out.

Ok sure. There are a lot of things I want to do; I fell a little behind due to exams, but it's okay.

zoq commented on July 17, 2024

Sounds good, I think it's easier to discuss changes if we have a minimal example.

prince776 commented on July 17, 2024

I agree; if we have a models library (like keras.models) with and without pretrained weights, then transfer learning will also become easier to implement. If you need any help, I'll be happy to help.
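
As an illustration of why pretrained weights would help here, a hypothetical transfer-learning flow built on the sketches above; the weights file and the dataset name are made up:

// Hypothetical transfer-learning usage of the proposed models library.
#include <mlpack/core.hpp>
#include "models/lenet.hpp"          // Hypothetical, from the sketch above.
#include "dataloader/dataloader.hpp" // Hypothetical, from the sketch above.

int main()
{
  // Load pretrained weights into the base network, mirroring the
  // with/without-pretrained-weights option of keras.models.
  LeNet base;
  mlpack::data::Load("weights/lenet_mnist.bin", "model", base.Model(), true);

  // Fine-tune on a new, structurally compatible dataset (first CSV
  // row assumed to hold the labels).
  DataLoader loader("data/new_task.csv");
  arma::mat trainX = loader.trainData.rows(1, loader.trainData.n_rows - 1);
  arma::mat trainY = loader.trainData.row(0) + 1;
  base.Model().Train(trainX, trainY);

  return 0;
}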

kartikdutt18 commented on July 17, 2024

Great. If this is something that is worth implementing, we can split up the task. I think I have a clear idea of the implementation; it would end up requiring 3 changes, and we could split them up so that we can get more done.

favre49 commented on July 17, 2024

I think this ties into the more general question of how to structure this repository, because as it stands it isn't as nice and clean as our other repositories. In my opinion, we should decide on a clear file hierarchy before we continue.

Doesn't this mean we should also change VAE? Also, for the Kaggle folder, we have "examples" instead of these extensible "models", so maybe we should be making an examples folder for cases like that? Maybe we could have the .cpp files for the models in "examples"?

kartikdutt18 commented on July 17, 2024

I completely agree, I wanted to implement the following changes (for at least all the models that I add).
If it's okay, it might be a nice idea to do it for all the code in the models repo.
Here are the changes that I think might be useful:

  1. Models should be included as libraries / classes (a sample can be seen in PR #50), so a user can include models/alexnet.hpp and use AlexNet as a base network for something else.

  2. A DataLoaders folder (will be pushing changes for it this week in PR #50). Here we should provide a wrapper around data::Load, the scalers, and data::Split. Especially useful for known datasets such as MNIST. This will also include support for resizing, etc.

  3. A unified samples folder or, as @favre49 suggested, an examples folder (refer to #50), where a call to each class is explained. E.g., an object detection sample that has sample code for training and validation on a dataset, a similar file for object localization, and so on.

  4. A testing folder (this will be for contributors) where they will add tests that load weights, train for 5 epochs, and set a reasonable accuracy as a baseline. This will ensure the addition of valid models as well as streamline the addition of new models.

  5. All data in one folder. I would, however, prefer that we make changes in CMake to download the data rather than storing it in the repo, to keep the models repo lighter.

  6. A weights folder in the models folder where the weights of each model are stored.

Why it might be useful:

  1. Shorter and cleaner code for the user (load data using the data loaders, import a model, and train).

  2. Easier addition of new models.

  3. Refined UI.

  4. Helpful for new users to easily train models.

This is also the (basic) API that I was talking about in my GSoC proposal for ANN algorithms on the mailing list.
Does this seem okay?

kartikdutt18 commented on July 17, 2024

Great, this should be fun to work on. @zoq, what do you think?

favre49 commented on July 17, 2024

Not sure what that means, I think each model could serve as an example?

What I was thinking of for the examples folder is a set of programs that a user could run directly, using the models defined in the header files. This is also better in cases like the digit recognizer, where I don't see the point of making a model; it just serves as a popular basic example.

kartikdutt18 commented on July 17, 2024

I also don't want to introduce too much boilerplate code; I see the code as a neat way to show some real-world examples

Agreed, it would be much cleaner and simpler that way.

Sounds good, so I guess I could just provide a dataset name and it would return the necessary data structures?

Yes, the user could provide the name of a supported dataset (like MNIST) or the path to their own dataset (the README would describe the structure the CSV should be in).

Not sure what that means, I think each model could serve as an example?

What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (object_detection.cpp).
We could also add a file, such as mnist_cnn_tutorial.cpp, to demonstrate how to add a model and train it on a popular dataset such as MNIST.
What do you think: should I do this, or one file for each model, or something different?
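
A rough sketch of such a driver is below. Plain argv parsing is used for brevity (the real version would presumably use mlpack's bindings framework, as discussed earlier), and the flag names, model names, and classes are assumptions:

// examples/object_detection.cpp -- hypothetical sketch of an examples driver.
#include <iostream>
#include <string>
#include "models/lenet.hpp"          // Hypothetical, from the sketch above.
#include "dataloader/dataloader.hpp" // Hypothetical, from the sketch above.

int main(int argc, char** argv)
{
  // E.g.: ./object_detection --model lenet --dataset mnist
  std::string model = "lenet", dataset = "mnist";
  for (int i = 1; i + 1 < argc; i += 2)
  {
    const std::string flag(argv[i]);
    if (flag == "--model")
      model = argv[i + 1];
    else if (flag == "--dataset")
      dataset = argv[i + 1];
  }

  // The driver includes all supported models internally and
  // dispatches on the requested name.
  DataLoader loader(dataset);
  if (model == "lenet")
  {
    LeNet net;
    arma::mat trainX = loader.trainData.rows(1, loader.trainData.n_rows - 1);
    arma::mat trainY = loader.trainData.row(0) + 1;
    net.Model().Train(trainX, trainY);  // Validation elided for brevity.
  }
  else
  {
    std::cerr << "Unknown model: " << model << std::endl;
    return 1;
  }
  return 0;
}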

kartikdutt18 commented on July 17, 2024

@prince776, that would be great. There is a PR open for VGG already. I think there should be one PR to restructure (for now); otherwise we might end up having different styles of implemented models and data loaders. I think once I open the PR and the basic structure is approved, you can modify the code into that structure and we could also add them to the repo. Till then, a good way to start might be implementing and testing the model locally. For restructuring, there is a lot that needs to be done, so I'll tag you in the PR, and things that would run better in parallel could be done by both of us or some more contributors. I think we will figure it out.

kartikdutt18 commented on July 17, 2024

There are a lot of things I want to do; I fell a little behind due to exams, but it's okay.

I am in a similar situation right now.

zoq commented on July 17, 2024

What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (object_detection.cpp).
We could also add a file, such as mnist_cnn_tutorial.cpp, to demonstrate how to add a model and train it on a popular dataset such as MNIST.
What do you think: should I do this, or one file for each model, or something different?

I think the advantage of the current approach is that we can build each model independently, so if a user is only interested in a specific model there is no need to build everything else, which might come with some extra dependencies like OpenCV that are just not needed. That said, we could do both, but start with one?

kartikdutt18 commented on July 17, 2024

What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (object_detection.cpp).
We could also add a file, such as mnist_cnn_tutorial.cpp, to demonstrate how to add a model and train it on a popular dataset such as MNIST.
What do you think: should I do this, or one file for each model, or something different?

I think the advantage of the current approach is that we can build each model independently, so if a user is only interested in a specific model there is no need to build everything else, which might come with some extra dependencies like OpenCV that are just not needed. That said, we could do both, but start with one?

Agreed. This was also discussed in the video meetup yesterday; some other points that were mentioned are:

  1. Ensure optional download of the weights (testing is optional for the user).
  2. Same for the datasets.

I'll start with one thing and take it from there. I think this would be great.
Thanks.

kartikdutt18 commented on July 17, 2024

Hi everyone, I changed the DigitRecognizerCNN a bit to support LeNet1, LeNet4, and LeNet5.

Here are some initial results for LeNet4 (ran this on my local machine; will run more epochs on my college workstation):

Epoch 1/150 loss: 10.2334
Epoch 2/150 loss: 1.66515
Epoch 3/150 loss: 0.953763
Validation Accuracy : 89.5476

I haven't added comments and there are some style issues. Once I fix them, I hope to open a PR for this issue tomorrow.

Also, in the new script (without comments), it takes only 10 lines to train and test a model.
Thanks.
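
Purely as a hypothetical illustration (not the actual script), such a train-and-test flow could look roughly like this, reusing the sketches from earlier in the thread:

// Hypothetical ~10-line training and testing flow.
#include <mlpack/core.hpp>
#include "models/lenet.hpp"          // Hypothetical, from the sketch above.
#include "dataloader/dataloader.hpp" // Hypothetical, from the sketch above.

int main()
{
  DataLoader loader("mnist");
  arma::mat trainX = loader.trainData.rows(1, loader.trainData.n_rows - 1);
  arma::mat trainY = loader.trainData.row(0) + 1;
  LeNet net;
  net.Model().Train(trainX, trainY);
  arma::mat testX = loader.testData.rows(1, loader.testData.n_rows - 1);
  arma::mat predictions;
  net.Model().Predict(testX, predictions);
  net.SaveModel("lenet.bin");
  return 0;
}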

mlpack-bot commented on July 17, 2024

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

kartikdutt18 commented on July 17, 2024

Keep open (at least till mlpack/models#3 is merged).

kartikdutt18 commented on July 17, 2024

Tests have been implemented in the models repo. Thanks, everyone, for all the help!
