Implementation of Tests? (about examples, 22 comments, CLOSED)

mlpack commented on July 17, 2024

Implementation of Tests?

Comments (22)

zoq commented on July 17, 2024

I completely agree, I wanted to implement the following changes (for at least all the models that I add).
If it's okay, it might be a nice idea to do it for all the code in the models repo.
Here are the changes that I think might be useful:

1. Models should be included as libraries / classes (a sample can be seen in PR #50), so a user can include models/alexnet.hpp and use AlexNet as a base network for something else.

A unified interface would be useful. That said, I also don't want to introduce too much boilerplate code; I see the code as a neat way to show some real-world examples. So I'm not sure CompileModel is needed; we could just construct the model in the constructor. And I guess SaveModel is another function that is probably just a wrapper around data::Save. We don't necessarily have to replicate the interface from other libs, but I think those are just details we can discuss on the PR itself.

Also, I think it would be useful if we used the CLI implementation from mlpack; this would bring us one step closer to using the bindings framework, and with it support for Python, Julia, etc.
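
For illustration, here is a minimal sketch of what point 1, combined with the construct-in-the-constructor idea and a SaveModel wrapper around data::Save, could look like. The LeNet class, its parameters, and the layer choices are hypothetical placeholders, not the actual code from PR #50:

// models/lenet.hpp -- hypothetical sketch, not the actual PR #50 code.
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>

class LeNet
{
 public:
  // The network is built directly in the constructor, so no separate
  // CompileModel() step is needed.
  LeNet(const size_t inputWidth = 28,
        const size_t inputHeight = 28,
        const size_t numClasses = 10)
  {
    // Placeholder layers; a real LeNet would have more.
    model.Add<mlpack::ann::Convolution<>>(1, 6, 5, 5, 1, 1, 0, 0,
        inputWidth, inputHeight);
    model.Add<mlpack::ann::ReLULayer<>>();
    model.Add<mlpack::ann::MaxPooling<>>(2, 2, 2, 2);
    model.Add<mlpack::ann::Linear<>>(6 * 12 * 12, numClasses);
    model.Add<mlpack::ann::LogSoftMax<>>();
  }

  // Thin wrapper around data::Save, as discussed above.
  void SaveModel(const std::string& path)
  {
    mlpack::data::Save(path, "model", model, false);
  }

  // Expose the underlying FFN so it can be used as a base network
  // for something else, as point 1 suggests.
  mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<>>& Model()
  {
    return model;
  }

 private:
  mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<>> model;
};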

2. A DataLoaders folder (will be pushing changes for it this week in PR #50). Here we should provide a wrapper around data::Load, the scalers, and data::Split. Especially useful for known datasets such as MNIST. This will also include support for resizing, etc.

Sounds good, so I guess I could just provide a dataset name and it would return the necessary data structures?
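
A rough sketch of what such a wrapper could look like; the DataLoader name, its interface, and the name-to-path mapping are assumptions for illustration only:

// dataloader/dataloader.hpp -- hypothetical sketch of point 2.
#include <mlpack/core.hpp>
#include <mlpack/core/data/split_data.hpp>
#include <mlpack/core/data/scaler_methods/min_max_scaler.hpp>

class DataLoader
{
 public:
  // Wraps data::Load, a scaler, and data::Split. A known dataset name
  // maps to a bundled file; anything else is treated as a path to the
  // user's own CSV. Label handling, resizing, etc. are elided here.
  DataLoader(const std::string& dataset, const double testRatio = 0.2)
  {
    const std::string path =
        (dataset == "mnist") ? "data/mnist.csv" : dataset;

    arma::mat data;
    mlpack::data::Load(path, data, true);

    // Scale features to [0, 1] before splitting.
    arma::mat scaled;
    mlpack::data::MinMaxScaler scaler;
    scaler.Fit(data);
    scaler.Transform(data, scaled);

    mlpack::data::Split(scaled, trainData, testData, testRatio);
  }

  arma::mat trainData, testData;
};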

3. A unified samples folder or, as @favre49 suggested, an examples folder (refer to #50), where a call to each class is explained. E.g., an object detection sample that has sample code for training and validation on a dataset, a similar file for object localization, and so on.

Not sure what that means, I think each model could serve as an example?

4. A testing folder (this will be for contributors) where they will add tests that load weights, train for 5 epochs, and set a reasonable accuracy as a baseline. This will ensure the addition of valid models as well as streamline the addition of new models.

Sounds good.

5. All data in one folder. I would, however, prefer that we make changes in CMake to download the data rather than storing it in the repo, to keep the models repo lighter.

Yes, we can do that.

6. A weights folder in the models folder where the weights of each model are stored.

Right, same as for the data, this can be stored at some external location.

zoq commented on July 17, 2024

There are a lot of things I want to do; I fell a little behind due to exams, but it's okay.

I am in a similar situation right now.

No worries, best of luck with your exams.

zoq commented on July 17, 2024

Good idea. I guess an easy way to test the models is to provide predefined weights for each model, let them train for a number of iterations, and see if they return the correct results.
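
A minimal sketch of such a test, using Boost.Test (mlpack's test framework at the time) together with the hypothetical LeNet and DataLoader classes sketched above; the weights path, the label layout, and the 85% baseline are made-up values:

// tests/lenet_test.cpp -- hypothetical sketch of a model test.
#define BOOST_TEST_MODULE ModelTests
#include <boost/test/unit_test.hpp>
#include <ensmallen.hpp>
#include "models/lenet.hpp"          // Hypothetical, from the sketch above.
#include "dataloader/dataloader.hpp" // Hypothetical, from the sketch above.

BOOST_AUTO_TEST_CASE(LeNetBaselineTest)
{
  // Assumes the first row of the CSV holds the labels.
  DataLoader loader("mnist");
  arma::mat trainX = loader.trainData.rows(1, loader.trainData.n_rows - 1);
  arma::mat trainY = loader.trainData.row(0) + 1;  // mlpack labels are 1-based.

  LeNet lenet;
  // Optionally start from predefined weights so few epochs suffice:
  // mlpack::data::Load("weights/lenet.bin", "model", lenet.Model());

  // Train for 5 epochs with plain SGD.
  ens::StandardSGD optimizer(0.01, 32, 5 * trainX.n_cols);
  lenet.Model().Train(trainX, trainY, optimizer);

  // Require a reasonable accuracy baseline.
  arma::mat probs;
  lenet.Model().Predict(trainX, probs);
  arma::urowvec predicted = arma::index_max(probs, 0) + 1;
  const double accuracy = 100.0 *
      arma::accu(predicted == arma::conv_to<arma::urowvec>::from(trainY)) /
      (double) trainY.n_elem;
  BOOST_REQUIRE_GE(accuracy, 85.0);
}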

favre49 commented on July 17, 2024

You've pretty much read my mind as to the structure of the codebase. This would mean some restructuring of the code already in the repo, but I don't see any problem with this.

prince776 commented on July 17, 2024

I was also imagining it like this, @kartikdutt18; I would love to help with this. I'll also add some new models like LeNet and VGG.

prince776 commented on July 17, 2024

@prince776, that would be great. There is a PR open for VGG already. I think there should be one PR to restructure (for now); otherwise we might end up having different styles of implemented models and data loaders. I think once I open the PR and the basic structure is approved, you can modify the code into that structure and we could also add them to the repo. Till then, a good way to start might be implementing and testing the model locally. For restructuring, there is a lot that needs to be done, so I'll tag you in the PR, and things that would run better in parallel could be done by both of us or some more contributors. I think we will figure it out.

Ok sure. There are a lot of things I want to do; I fell a little behind due to exams, but it's okay.

zoq commented on July 17, 2024

Sounds good, I think it's easier to discuss changes if we have a minimal example.

prince776 commented on July 17, 2024

I agree; if we have a models library (like keras.models) with and without pretrained weights, then transfer learning will also become easier to implement. If you need any help, I'll be happy to help.
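
As an illustration of why pretrained weights would help here, a hypothetical transfer-learning flow built on the sketches above; the weights file and the dataset name are made up:

// Hypothetical transfer-learning usage of the proposed models library.
#include <mlpack/core.hpp>
#include "models/lenet.hpp"          // Hypothetical, from the sketch above.
#include "dataloader/dataloader.hpp" // Hypothetical, from the sketch above.

int main()
{
  // Load pretrained weights into the base network, mirroring the
  // with/without-pretrained-weights option of keras.models.
  LeNet base;
  mlpack::data::Load("weights/lenet_mnist.bin", "model", base.Model(), true);

  // Fine-tune on a new, structurally compatible dataset (first CSV
  // row assumed to hold the labels).
  DataLoader loader("data/new_task.csv");
  arma::mat trainX = loader.trainData.rows(1, loader.trainData.n_rows - 1);
  arma::mat trainY = loader.trainData.row(0) + 1;
  base.Model().Train(trainX, trainY);

  return 0;
}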

kartikdutt18 commented on July 17, 2024

Great. If this is something that is worth implementing, we can split up the task. I think I have a clear idea of the implementation; it would end up requiring 3 changes, and we could split them up so that we can get more done.

favre49 commented on July 17, 2024

I think this ties into the more general question of how to structure this repository, because as it stands it isn't as nice and clean as our other repositories. In my opinion, we should decide on a clear file hierarchy before we continue.

Doesn't this mean we should also change VAE? Also, for the Kaggle folder, we have "examples" instead of these extensible "models", so maybe we should be making an examples folder for cases like that? Maybe we could have the .cpp files for the models in "examples"?

kartikdutt18 commented on July 17, 2024

I completely agree, I wanted to implement the following changes (for at least all the models that I add).
If it's okay, it might be a nice idea to do it for all the code in the models repo.
Here are the changes that I think might be useful:

  1. Models should be included as libraries / classes (a sample can be seen in PR #50), so a user can include models/alexnet.hpp and use AlexNet as a base network for something else.

  2. A DataLoaders folder (will be pushing changes for it this week in PR #50). Here we should provide a wrapper around data::Load, the scalers, and data::Split. Especially useful for known datasets such as MNIST. This will also include support for resizing, etc.

  3. A unified samples folder or, as @favre49 suggested, an examples folder (refer to #50), where a call to each class is explained. E.g., an object detection sample that has sample code for training and validation on a dataset, a similar file for object localization, and so on.

  4. A testing folder (this will be for contributors) where they will add tests that load weights, train for 5 epochs, and set a reasonable accuracy as a baseline. This will ensure the addition of valid models as well as streamline the addition of new models.

  5. All data in one folder. I would, however, prefer that we make changes in CMake to download the data rather than storing it in the repo, to keep the models repo lighter.

  6. A weights folder in the models folder where the weights of each model are stored.

Why it might be useful:

  1. Shorter and cleaner code for the user (load data using the data loaders, import a model, and train).

  2. Easier addition of new models.

  3. Refined UI.

  4. Helpful for new users to easily train models.

This is also the (basic) API that I was talking about in my GSoC proposal for ANN algorithms on the mailing list.
Does this seem okay?

kartikdutt18 commented on July 17, 2024

Great, this should be fun to work on. @zoq, what do you think?

favre49 commented on July 17, 2024

Not sure what that means, I think each model could serve as an example?

What I was thinking of for the examples folder is a set of programs that a user could run directly, using the models defined in the header files. This is also better in cases like the digit recognizer, where I don't see the point of making a model; it just serves as a popular basic example.

kartikdutt18 commented on July 17, 2024

I also don't want to introduce too much boilerplate code; I see the code as a neat way to show some real-world examples

Agreed, it would be much cleaner and simpler that way.

Sounds good, so I guess I could just provide a dataset name and it would return the necessary data structures?

Yes, the user could provide the name of a supported dataset (like MNIST) or the path to their own dataset (the README would describe the structure the CSV should be in).

Not sure what that means, I think each model could serve as an example?

What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (object_detection.cpp).
We could also add a file, such as mnist_cnn_tutorial.cpp, to demonstrate how to add a model and train it on a popular dataset such as MNIST.
What do you think: should I do this, or one file for each model, or something different?
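
A rough sketch of such a driver is below. Plain argv parsing is used for brevity (the real version would presumably use mlpack's bindings framework, as discussed earlier), and the flag names, model names, and classes are assumptions:

// examples/object_detection.cpp -- hypothetical sketch of an examples driver.
#include <iostream>
#include <string>
#include "models/lenet.hpp"          // Hypothetical, from the sketch above.
#include "dataloader/dataloader.hpp" // Hypothetical, from the sketch above.

int main(int argc, char** argv)
{
  // E.g.: ./object_detection --model lenet --dataset mnist
  std::string model = "lenet", dataset = "mnist";
  for (int i = 1; i + 1 < argc; i += 2)
  {
    const std::string flag(argv[i]);
    if (flag == "--model")
      model = argv[i + 1];
    else if (flag == "--dataset")
      dataset = argv[i + 1];
  }

  // The driver includes all supported models internally and
  // dispatches on the requested name.
  DataLoader loader(dataset);
  if (model == "lenet")
  {
    LeNet net;
    arma::mat trainX = loader.trainData.rows(1, loader.trainData.n_rows - 1);
    arma::mat trainY = loader.trainData.row(0) + 1;
    net.Model().Train(trainX, trainY);  // Validation elided for brevity.
  }
  else
  {
    std::cerr << "Unknown model: " << model << std::endl;
    return 1;
  }
  return 0;
}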

kartikdutt18 commented on July 17, 2024

@prince776, that would be great. There is a PR open for VGG already. I think there should be one PR to restructure (for now); otherwise we might end up having different styles of implemented models and data loaders. I think once I open the PR and the basic structure is approved, you can modify the code into that structure and we could also add them to the repo. Till then, a good way to start might be implementing and testing the model locally. For restructuring, there is a lot that needs to be done, so I'll tag you in the PR, and things that would run better in parallel could be done by both of us or some more contributors. I think we will figure it out.

kartikdutt18 commented on July 17, 2024

There are a lot of things I want to do; I fell a little behind due to exams, but it's okay.

I am in a similar situation right now.

zoq commented on July 17, 2024

What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (object_detection.cpp).
We could also add a file, such as mnist_cnn_tutorial.cpp, to demonstrate how to add a model and train it on a popular dataset such as MNIST.
What do you think: should I do this, or one file for each model, or something different?

I think the advantage of the current approach is that we can build each model independently, so if a user is only interested in a specific model there is no need to build everything else, which might come with some extra dependencies like OpenCV that are just not needed. That said, we could do both, but start with one?

kartikdutt18 commented on July 17, 2024

What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (object_detection.cpp).
We could also add a file, such as mnist_cnn_tutorial.cpp, to demonstrate how to add a model and train it on a popular dataset such as MNIST.
What do you think: should I do this, or one file for each model, or something different?

I think the advantage of the current approach is that we can build each model independently, so if a user is only interested in a specific model there is no need to build everything else, which might come with some extra dependencies like OpenCV that are just not needed. That said, we could do both, but start with one?

Agreed. This was also discussed in the video meetup yesterday; some other points that were mentioned are:

  1. Ensure optional download of the weights (testing is optional for the user).
  2. Same for the datasets.

I'll start with one thing and take it from there. I think this would be great.
Thanks.

kartikdutt18 commented on July 17, 2024

Hi everyone, I changed the DigitRecognizerCNN a bit to support LeNet1, LeNet4, and LeNet5.

Here are some initial results for LeNet4 (ran this on my local machine; will run more epochs on my college workstation):

Epoch 1/150 loss: 10.2334
Epoch 2/150 loss: 1.66515
Epoch 3/150 loss: 0.953763
Validation Accuracy : 89.5476

I haven't added comments and there are some style issues. Once I fix them, I hope to open a PR for this issue tomorrow.

Also, in the new script (without comments), it takes only 10 lines to train and test a model.
Thanks.
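
Purely as a hypothetical illustration (not the actual script), such a train-and-test flow could look roughly like this, reusing the sketches from earlier in the thread:

// Hypothetical ~10-line training and testing flow.
#include <mlpack/core.hpp>
#include "models/lenet.hpp"          // Hypothetical, from the sketch above.
#include "dataloader/dataloader.hpp" // Hypothetical, from the sketch above.

int main()
{
  DataLoader loader("mnist");
  arma::mat trainX = loader.trainData.rows(1, loader.trainData.n_rows - 1);
  arma::mat trainY = loader.trainData.row(0) + 1;
  LeNet net;
  net.Model().Train(trainX, trainY);
  arma::mat testX = loader.testData.rows(1, loader.testData.n_rows - 1);
  arma::mat predictions;
  net.Model().Predict(testX, predictions);
  net.SaveModel("lenet.bin");
  return 0;
}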

mlpack-bot commented on July 17, 2024

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

kartikdutt18 commented on July 17, 2024

Keep open (at least till mlpack/models#3 is merged).

kartikdutt18 commented on July 17, 2024

Tests have been implemented in the models repo. Thanks, everyone, for all the help!
