Comments (22)
> I completely agree, I wanted to implement the following changes (for at least all the models that I add). If it's okay, it might be a nice idea to do it for all code in the models repo. Here are the changes that I think might be useful:
> 1. Models should be included as libraries/classes (a sample can be seen in PR #50), so a user can include models/alexnet.hpp and use AlexNet as the base network for something else.

A unified interface would be useful. That said, I also don't want to introduce too much boilerplate code; I see the code as a neat way to show some real-world examples. So I'm not sure if CompileModel is needed; we could just construct the model in the constructor. I guess SaveModel is another function that is probably just a wrapper around data::Save. We don't necessarily have to replicate the interface from other libraries, but those are details we can discuss on the PR itself.
Also, I think it would be useful if we used the CLI implementation from mlpack; this would bring us one step further towards using the bindings framework, and so support for Python, Julia, etc.

> 2. DataLoaders folder (will be pushing changes for it this week in PR #50). Here we should provide a wrapper around data::Load, scalers, and data::Split. Especially useful for known datasets such as MNIST. This will also include support for resizing, etc.

Sounds good, so I guess I could just provide a dataset name and it would return the necessary data structures?

> 3. A unified samples folder or, as @favre49 suggested, an examples folder (refer to #50), where a call to each class should be explained. E.g. an object detection sample that has sample code for training and validation on a dataset; a similar file for object localization, and so on.

Not sure what that means; I think each model could serve as an example?

> 4. A testing folder (this will be for contributors) where they will add tests that load weights, train for 5 epochs, and set a reasonable accuracy as a baseline. This will ensure the addition of valid models as well as streamline the addition of models.

Sounds good.

> 5. All data in one folder. I would, however, prefer if we made changes in CMake to download data rather than storing it in the repo, to make the models repo lighter.

Yes, we can do that.

> 6. A weights folder in the models folder where the weights of each model are stored.

Right, same as for the data, this can be stored at some location.
There are a lot of things I want to do; I fell a little behind due to exams, but it's ok.
I am in a similar situation right now.
No worries, best of luck with your exams.
Good idea. I guess an easy way to test the models is to provide predefined weights for each model, let them train for a number of iterations, and see if they return the correct results.
You've pretty much read my mind as to the structure of the codebase. This would mean some restructuring of the code already on the repo, but I don't see any problem with this.
I was also imagining it like this, @kartikdutt18. I would love to help with this. I'll also add some new models like LeNet and VGG.
@prince776, that would be great. There is a PR open for VGG already. I think there should be one PR to restructure (for now); otherwise we might end up with different styles of implemented models and data loaders. Once I open the PR and the basic structure is approved, you can modify the code into that structure and we could also add the models to the repo. Until then, a good way to start might be implementing and testing the model locally. For restructuring, there is a lot that needs to be done, so I'll tag you in the PR, and things that can run in parallel could be done by both of us or some more contributors. I think we will figure it out.
Ok sure. There are a lot of things I want to do; I fell a little behind due to exams, but it's ok.
Sounds good, I think it's easier to discuss changes if we have a minimal example.
I agree, if we have a models library (like keras.models) with and without pretrained weights, then transfer learning will also become easier to implement. If you need any help, I'll be happy to pitch in.
Great, if this is something that is worth implementing we can split up the task. I think I have a clear idea of the implementation; it would end up requiring three changes, and we could split those up so that we can get more done.
I think this ties into the more general question of how to structure this repository, because as it stands it isn't as nice and clean as our other repositories. In my opinion, we should decide on a clear file hierarchy before we continue.
Doesn't this also mean we should change VAE as well? Also, for the Kaggle folder, we have "examples" instead of these extensible "models", so maybe we should make an examples folder for cases like that? Maybe we could have the .cpp files for the models in "examples"?
I completely agree, I wanted to implement the following changes (for at least all the models that I add). If it's okay, it might be a nice idea to do it for all code in the models repo.
Here are the changes that I think might be useful:
- Models should be included as libraries/classes (a sample can be seen in PR #50), so a user can include models/alexnet.hpp and use AlexNet as the base network for something else.
- DataLoaders folder (will be pushing changes for it this week in PR #50). Here we should provide a wrapper around data::Load, scalers, and data::Split. Especially useful for known datasets such as MNIST. This will also include support for resizing, etc.
- A unified samples folder or, as @favre49 suggested, an examples folder (refer to #50), where a call to each class should be explained. E.g. an object detection sample that has sample code for training and validation on a dataset; a similar file for object localization, and so on.
- A testing folder (this will be for contributors) where they will add tests that load weights, train for 5 epochs, and set a reasonable accuracy as a baseline. This will ensure the addition of valid models as well as streamline the addition of models.
- All data in one folder. I would, however, prefer if we made changes in CMake to download data rather than storing it in the repo, to make the models repo lighter.
- A weights folder in the models folder where the weights of each model are stored.

Why it might be useful:
- Shorter and cleaner code for the user (load data using the data loaders, import a model, and train).
- Easier addition of models.
- A refined UI.
- Helpful for new users to easily train models.

This is also the (basic) API that I was talking about in my GSoC proposal for ANN algorithms on the mailing list.
Does this seem okay?
Great, this should be fun to work on. @zoq, what do you think?
> Not sure what that means, I think each model could serve as an example?

What I was thinking of for the examples folder is a set of programs that a user could run directly, using the models defined in the header files. This is also better in cases like the digit recognizer, where I don't see the point of making a model; it just serves as a popular basic example.
> I also don't want to introduce too much boilerplate code, I see the code as a neat way to show some real-world examples

Agreed, it would be much cleaner and simpler that way.
> Sounds good, so I guess I could just provide a dataset name and it would return the necessary data structures?

Yes, the user could provide the name of a supported dataset (like MNIST) or a path to their own dataset (the README would describe the structure the CSV should be in).
> Not sure what that means, I think each model could serve as an example?

What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, a path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (`object_detection.cpp`).
Also, we could add a file such as `mnist_cnn_tutorial.cpp` to demonstrate how to add a model and train it on some popular dataset such as MNIST.
What do you think, should I do this, or one file for each model, or something different?
> What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, a path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (`object_detection.cpp`).
> Also, we could add a file such as `mnist_cnn_tutorial.cpp` to demonstrate how to add a model and train it on some popular dataset.
> What do you think, should I do this, or one file for each model, or something different?
I think the advantage of the current approach is that we can build each model independently, so if a user is only interested in a specific model there is no need to build everything else, which might come with some extra dependencies like OpenCV that are just not needed. That said, we could do both, but start with one?
> What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, a path to a dataset or the name of a dataset (that the models repo supports), and other parameters, and it would run training and validation. Internally it would have all models included (`object_detection.cpp`).
> Also, we could add a file such as `mnist_cnn_tutorial.cpp` to demonstrate how to add a model and train it on some popular dataset.
> What do you think, should I do this, or one file for each model, or something different?

> I think the advantage of the current approach is that we can build each model independently, so if a user is only interested in a specific model there is no need to build everything else, which might come with some extra dependencies like OpenCV that are just not needed. That said, we could do both, but start with one?
Agreed, this was also discussed in the video meetup yesterday; some other points that were mentioned are:
- Ensure optional download of weights (testing is optional for the user).
- Same for datasets.

I'll start with one thing and take it from there. I think this would be great.
Thanks.
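The optional-download points could be sketched in CMake along these lines; the option name and URL are placeholders, not the repo's actual configuration:

```cmake
# Hypothetical sketch: make dataset/weight downloads opt-in so plain
# builds stay light. Option name and URL are placeholders.
option(DOWNLOAD_DATASETS "Download example datasets at configure time" OFF)

if (DOWNLOAD_DATASETS)
  file(DOWNLOAD
       "https://example.com/mnist.csv"   # placeholder URL
       "${CMAKE_BINARY_DIR}/data/mnist.csv"
       SHOW_PROGRESS)
endif()
```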
Hi everyone, I changed the DigitRecognizerCNN a bit to support LeNet-1, LeNet-4, and LeNet-5.
Here are some initial results for LeNet-4 (I ran this on my local machine and will run more epochs on my college workstation):

```
Epoch 1/150 loss: 10.2334
Epoch 2/150 loss: 1.66515
Epoch 3/150 loss: 0.953763
Validation Accuracy : 89.5476
```

I haven't added comments and there are some style issues. Once I fix them, I hope to open a PR for this issue tomorrow.
Also, in the new script (without comments), it takes only 10 lines to train and test a model.
Thanks.
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions!
Keep Open (At least till mlpack/models#3 is merged).
Tests have been implemented in models repo. Thanks everyone for all the help!