
examples' Introduction

mlpack: a fast, header-only machine learning library


Download: current stable version (4.4.0)

mlpack is an intuitive, fast, and flexible header-only C++ machine learning library with bindings to other languages. It is meant to be a machine learning analog to LAPACK, and aims to implement a wide array of machine learning methods and functions as a "swiss army knife" for machine learning researchers.

mlpack's lightweight C++ implementation makes it ideal for deployment, and it can also be used for interactive prototyping via C++ notebooks (these can be seen in action on mlpack's homepage).

In addition to its powerful C++ interface, mlpack also provides command-line programs, Python bindings, Julia bindings, Go bindings, and R bindings.


mlpack uses an open governance model and is fiscally sponsored by NumFOCUS. Consider making a tax-deductible donation to help the project pay for developer time, professional services, travel, workshops, and a variety of other needs.


0. Contents

  1. Citation details
  2. Dependencies
  3. Installing and using mlpack in C++
  4. Building mlpack bindings to other languages
    1. Command-line programs
    2. Python bindings
    3. R bindings
    4. Julia bindings
    5. Go bindings
  5. Building mlpack's test suite
  6. Further resources

1. Citation details

If you use mlpack in your research or software, please cite mlpack using the citation below (given in BibTeX format):

@article{mlpack2023,
    title     = {mlpack 4: a fast, header-only C++ machine learning library},
    author    = {Ryan R. Curtin and Marcus Edel and Omar Shrit and 
                 Shubham Agrawal and Suryoday Basak and James J. Balamuta and 
                 Ryan Birmingham and Kartik Dutt and Dirk Eddelbuettel and 
                 Rishabh Garg and Shikhar Jaiswal and Aakash Kaushik and 
                 Sangyeon Kim and Anjishnu Mukherjee and Nanubala Gnana Sai and 
                 Nippun Sharma and Yashwant Singh Parihar and Roshan Swain and 
                 Conrad Sanderson},
    journal   = {Journal of Open Source Software},
    volume    = {8},
    number    = {82},
    pages     = {5026},
    year      = {2023},
    doi       = {10.21105/joss.05026},
    url       = {https://doi.org/10.21105/joss.05026}
}

Citations are beneficial for the growth and improvement of mlpack.

2. Dependencies

mlpack requires the following additional dependencies:

  • Armadillo
  • ensmallen
  • cereal

If the STB library headers are available, image loading support will be available.

If you are compiling Armadillo by hand, ensure that LAPACK and BLAS are enabled.

3. Installing and using mlpack in C++

See also the C++ quickstart.

Since mlpack is a header-only library, installing just the headers for use in a C++ application is trivial.

From the root of the sources, configure and install in the standard CMake way:

mkdir build && cd build/
cmake ..
sudo make install

If the cmake .. command fails due to unavailable dependencies, consider either using the -DDOWNLOAD_DEPENDENCIES=ON option as detailed in the following subsection, or ensure that mlpack's dependencies are installed, e.g. using the system package manager. For example, on Debian and Ubuntu, all relevant dependencies can be installed with sudo apt-get install libarmadillo-dev libensmallen-dev libcereal-dev libstb-dev g++ cmake.

Alternatively, since CMake v3.14.0 the cmake command can create the build folder itself, and so the above commands can be rewritten as follows:

cmake -S . -B build
sudo cmake --build build --target install

During configuration, CMake adjusts the file mlpack/config.hpp using the details of the local system. This file can be modified by hand as necessary before or after installation.

3.1. Additional build options

You can add a few arguments to the cmake command to control the behavior of the configuration and build process. Some options are given below:

  • -DDOWNLOAD_DEPENDENCIES=ON will automatically download mlpack's dependencies (ensmallen, Armadillo, and cereal). Installing Armadillo this way is not recommended and it is better to use your system package manager when possible (see below).
  • -DCMAKE_INSTALL_PREFIX=/install/root/ will set the root of the install directory to /install/root when make install is run.
  • -DDEBUG=ON will enable debugging symbols in any compiled bindings or tests.

There are also options to enable building bindings to each language that mlpack supports; those are detailed in the following sections.

Once the headers are installed with make install, using mlpack in an application consists only of including it:

#include <mlpack.hpp>

and when you link, be sure to link against Armadillo. If your example program is my_program.cpp, your compiler is GCC, and you would like to compile with OpenMP support (recommended) and optimizations, compile like this:

g++ -O3 -std=c++17 -o my_program my_program.cpp -larmadillo -fopenmp
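
For instance, a minimal my_program.cpp might look like the following sketch (the k-means clustering and random data here are purely illustrative, not taken from any particular mlpack example):

#include <mlpack.hpp>
#include <iostream>

int main()
{
  // mlpack uses column-major (Armadillo) data: each column is one point.
  // Here: 100 random points in 5 dimensions.
  arma::mat dataset(5, 100, arma::fill::randu);

  // Cluster the points into 3 groups with k-means.
  arma::Row<size_t> assignments;
  mlpack::KMeans<> kmeans;
  kmeans.Cluster(dataset, 3, assignments);

  std::cout << "Point 0 was assigned to cluster " << assignments[0] << "."
            << std::endl;
}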

Note that if you want to serialize (save or load) neural networks, you should add #define MLPACK_ENABLE_ANN_SERIALIZATION before including <mlpack.hpp>. If you don't define MLPACK_ENABLE_ANN_SERIALIZATION and your code serializes a neural network, a compilation error will occur.
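
For example (a sketch; the one-layer network below is purely illustrative):

#define MLPACK_ENABLE_ANN_SERIALIZATION
#include <mlpack.hpp>

int main()
{
  mlpack::FFN<> network;
  network.Add<mlpack::Linear>(10);

  // Saving (or loading) the network requires the definition above;
  // without it, this line will cause a compilation error.
  mlpack::data::Save("network.bin", "network", network, true);
}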

See the C++ quickstart and the examples repository for some examples of mlpack applications in C++, with corresponding Makefiles.

3.1.a. Linking with autodownloaded Armadillo

When the autodownloader is used to download Armadillo (-DDOWNLOAD_DEPENDENCIES=ON), the Armadillo runtime library is not built and Armadillo must be used in header-only mode. The autodownloader also does not download dependencies of Armadillo such as OpenBLAS. For this reason, it is recommended to instead install Armadillo using your system package manager, which will also install the dependencies of Armadillo. For example, on Ubuntu and Debian systems, Armadillo can be installed with

sudo apt-get install libarmadillo-dev

and other package managers, such as dnf, brew, and pacman, also have Armadillo packages available.

If the autodownloader is used to provide Armadillo, mlpack programs cannot be linked with -larmadillo. Instead, you must link directly with the dependencies of Armadillo. For example, on a system that has OpenBLAS available, compilation can be done like this:

g++ -O3 -std=c++17 -o my_program my_program.cpp -lopenblas -fopenmp

See the Armadillo documentation for more information on linking Armadillo programs.

3.2. Reducing compile time

mlpack is a template-heavy library, and if care is not used, compilation time of a project can be increased greatly. Fortunately, there are a number of ways to reduce compilation time:

  • Include individual headers, like <mlpack/methods/decision_tree.hpp>, if you are only using one component, instead of <mlpack.hpp>. This reduces the amount of work the compiler has to do.

  • Only use the MLPACK_ENABLE_ANN_SERIALIZATION definition if you are serializing neural networks in your code. When this define is enabled, compilation time will increase significantly, as the compiler must generate code for every possible type of layer. (The large amount of extra compilation overhead is why this is not enabled by default.)

  • If you are using mlpack in multiple .cpp files, consider using extern templates so that the compiler only instantiates each template once; add an explicit template instantiation for each mlpack template type you want to use in a .cpp file, and then use extern definitions elsewhere to let the compiler know it exists in a different file (see the sketch after this list).
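
As a sketch of that last strategy, assuming the default-argument mlpack::DecisionTree<> is the type in use:

// tree_instantiations.cpp: compile the instantiation exactly once.
#include <mlpack/methods/decision_tree.hpp>
template class mlpack::DecisionTree<>;

// elsewhere.cpp: suppress implicit instantiation; the linker will find
// the definition from tree_instantiations.cpp.
#include <mlpack/methods/decision_tree.hpp>
extern template class mlpack::DecisionTree<>;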

Other strategies exist too, such as precompiled headers, compiler options, and ccache.

4. Building mlpack bindings to other languages

mlpack is not just a header-only library: it also comes with bindings to a number of other languages, which allow flexible use of mlpack's efficient implementations from languages that aren't C++.

In general, you should not need to build these by hand---they should be provided by either your system package manager or your language's package manager.

Building the bindings for a particular language is done by calling cmake with different options; each example below shows how to configure an individual set of bindings, but it is of course possible to combine the options and build bindings for many languages at once.

4.i. Command-line programs

See also the command-line quickstart.

The command-line programs have no extra dependencies. The set of programs that will be compiled is detailed and documented on the command-line program documentation page.

From the root of the mlpack sources, run the following commands to build and install the command-line bindings:

mkdir build && cd build/
cmake -DBUILD_CLI_PROGRAMS=ON ../
make
sudo make install

You can use make -j<N>, where N is the number of cores on your machine, to build in parallel; e.g., make -j4 will use 4 cores to build.

4.ii. Python bindings

See also the Python quickstart.

mlpack's Python bindings are available on PyPI and conda-forge, and can be installed with either pip install mlpack or conda install -c conda-forge mlpack. These sources are recommended, as building the Python bindings by hand can be complex.

With that in mind, if you would still like to manually build the mlpack Python bindings, first make sure that the following Python packages are installed:

  • setuptools
  • wheel
  • cython >= 0.24
  • numpy
  • pandas >= 0.15.0

Now, from the root of the mlpack sources, run the following commands to build and install the Python bindings:

mkdir build && cd build/
cmake -DBUILD_PYTHON_BINDINGS=ON ../
make
sudo make install

You can use make -j<N>, where N is the number of cores on your machine, to build in parallel; e.g., make -j4 will use 4 cores to build. You can also specify a custom Python interpreter with the CMake option -DPYTHON_EXECUTABLE=/path/to/python.

4.iii. R bindings

See also the R quickstart.

mlpack's R bindings are available as the R package mlpack on CRAN. You can install the package by running install.packages('mlpack'), and this is the recommended way of getting mlpack in R.

If you still wish to build the R bindings by hand, first make sure the following dependencies are installed:

  • R >= 4.0
  • Rcpp >= 0.12.12
  • RcppArmadillo >= 0.9.800.0
  • RcppEnsmallen >= 0.2.10.0
  • roxygen2
  • testthat
  • pkgbuild

These can be installed with install.packages() inside of your R environment. Once the dependencies are available, you can configure mlpack and build the R bindings by running the following commands from the root of the mlpack sources:

mkdir build && cd build/
cmake -DBUILD_R_BINDINGS=ON ../
make
sudo make install

You may need to specify the location of the R program in the cmake command with the option -DR_EXECUTABLE=/path/to/R.

Once the build is complete, a tarball can be found under the build directory in src/mlpack/bindings/R/, and that can then be installed into your R environment with a command like install.packages('mlpack_3.4.3.tar.gz', repos=NULL, type='source').

4.iv. Julia bindings

See also the Julia quickstart.

mlpack's Julia bindings are available by installing the mlpack.jl package using Pkg.add("mlpack.jl"). The process of building, packaging, and distributing mlpack's Julia bindings is very nontrivial, so it is recommended to simply use the version available in Pkg, but if you want to build the bindings by hand anyway, you can configure and build them by running the following commands from the root of the mlpack sources:

mkdir build && cd build/
cmake -DBUILD_JULIA_BINDINGS=ON ../
make

If CMake cannot find your Julia installation, you can add -DJULIA_EXECUTABLE=/path/to/julia to the CMake configuration step.

Note that the make install step is not done above, since the Julia binding build system was not meant to be installed directly. Instead, to use handbuilt bindings (for instance, to test them), one option is to start Julia with JULIA_PROJECT set as an environment variable:

cd build/src/mlpack/bindings/julia/mlpack/
JULIA_PROJECT=$PWD julia

and then using mlpack should work.

4.v. Go bindings

See also the Go quickstart.

To build mlpack's Go bindings, ensure that Go >= 1.11.0 is installed, and that the Gonum package is available. You can use go get to install mlpack for Go:

go get -u -d mlpack.org/v1/mlpack
cd ${GOPATH}/src/mlpack.org/v1/mlpack
make install

The process of building the Go bindings by hand is a little tedious, so following the steps above is recommended. However, if you wish to build the Go bindings by hand anyway, you can do this by running the following commands from the root of the mlpack sources:

mkdir build && cd build/
cmake -DBUILD_GO_BINDINGS=ON ../
make
sudo make install

5. Building mlpack's test suite

mlpack contains an extensive test suite that exercises every part of the codebase. It is easy to build and run the tests with CMake and CTest, as below:

mkdir build && cd build/
cmake -DBUILD_TESTS=ON ../
make
ctest .

If you want to test the bindings, too, you will have to adapt the CMake configuration command to turn on the language bindings that you want to test---see the previous sections for details.

6. Further resources

More documentation is available for both users and developers.


To learn about the development goals of mlpack in the short- and medium-term future, see the vision document.

If you have problems, find a bug, or need help, you can try visiting the mlpack help page, or mlpack on GitHub. Alternatively, mlpack help can be found on Matrix at #mlpack; see also the community page.


examples' Issues

Adding New Models

Hey, I am thinking that we can add more models to the mlpack model zoo. In GSoC 2019 it was stated that ResNet would be added, but I can see that it has not been implemented yet. I was thinking of implementing SegNet.
We can begin by adding basic models. I want to help the mlpack model zoo grow under GSoC 2020, and I want to start working on it. I think we can start by adding one model from each category:

  • Image Classification

  • Object Detection & Image Segmentation

  • Body, Face & Gesture Analysis

  • Image Manipulation

SegNet would come under Image Segmentation. I would like some help with the model selection (which model should be implemented) and discussion regarding that model.

Guidelines for adding new examples

So this issue lays out a template, or guide, that can be followed or kept in mind when writing new examples:

  • The examples should be well documented, in terms that can easily be understood both by people entering the field of machine learning and by those who have been in it for a long time.

  • Make examples simple enough that users can grasp what they need to build their own full-fledged models; don't over-complicate the examples, but still try to include all library functionality related to each example.

  • A little more can be written about parameters or functionality that is implemented differently in mlpack than the common notion, or that might otherwise confuse the user.

  • When adding comments and writing tutorials, try to mention why you followed a particular strategy in the example, what the implications of that strategy are, and what other ways a user could proceed with the example.

  • Some examples that show things that are particularly interesting or practical can diverge from these guidelines and become a bit more complicated; GANs could be one such example, but they are not the only one.

  • And please take a look at the examples that already exist, and try to avoid redundant examples.

And lastly, this is just a guide, not strict rules, so have fun writing examples and show your creativity. :)

Unknown CMake command "find_package_handle_standard_args"

Getting an error

Unknown CMake command "find_package_handle_standard_args"

when trying to use these models to find mlpack with the following command: find_package(MLPACK REQUIRED). I tried the latest CMake, 3.6.

It seems that adding include(FindPackageHandleStandardArgs) at the beginning of the FindMLPACK.cmake file solves this issue. I could create a PR for this fix, but I don't have enough expertise in CMake to claim this is a proper solution. So please confirm that this is OK, and I'll create a PR.

Proposal for Adding New Examples to mlpack's Example Zoo

Objective

The objective of this proposal is to enhance mlpack's example repository by adding new examples that demonstrate the application of various machine learning algorithms across diverse real-world domains, as explained here.

Proposed Deliverables

This aims to add new examples to mlpack's repository, provide detailed documentation, integrate visualizations, and provide regular updates and maintenance to keep compatibility with the latest versions of mlpack and related dependencies.

Example Ideas

These examples demonstrate the potential of mlpack on various tasks, such as image recognition, among others.

cc: @zoq

"expected ‘)’ before ‘>’ token" make error with ubuntu trusty (14)

This was encountered when trying to fix the travis Ci build in #45. It was resolved by updating ubuntu to xenial (16).
I've put the relevant travis config (and thus steps for replication) below, and the relevant make errors below that, just in case the direct link to the travis build doesn't work.


dist: trusty
language: cpp

before_install:
  - sudo apt-get update -qq
  - sudo apt-get install -qq --no-install-recommends cmake binutils-dev libopenblas-dev liblapack-dev build-essential libboost-all-dev
  - curl -O http://masterblaster.mlpack.org:5005/armadillo-8.400.0.tar.gz -o armadillo-8.400.0.tar.gz && tar xvzf armadillo-8.400.0.tar.gz && cd armadillo-8.400.0
  - cmake . && make && ARMADILLO_INCLUDE_DIR=$(pwd)/include
  - cd $TRAVIS_BUILD_DIR && git clone https://github.com/mlpack/mlpack.git --depth 1
  - cd mlpack && mkdir mlpack_build && cd mlpack_build && cmake -DUSE_OPENMP=OFF -DARMADILLO_INCLUDE_DIR=$ARMADILLO_INCLUDE_DIR -DBUILD_PYTHON_BINDINGS=OFF -DBUILD_TESTS=OFF .. && make -j2 && sudo make install

install:
  - cd $TRAVIS_BUILD_DIR && mkdir build && cd build && cmake -DUSE_OPENMP=OFF -DARMADILLO_INCLUDE_DIR=$ARMADILLO_INCLUDE_DIR ..
script:
  - make -j2

notifications:
  email:
    - [email protected]
  irc:
    channels:
      - "chat.freenode.net#mlpack"
    on_success: change
    on_failure: always



[  7%] Building CXX object Kaggle/DigitRecognizer/CMakeFiles/DigitRecognizer.dir/src/DigitRecognizer.cpp.o

[ 14%] Building CXX object Kaggle/DigitRecognizerCNN/CMakeFiles/DigitRecognizerCNN.dir/src/DigitRecognizerCNN.cpp.o

In file included from /usr/local/include/mlpack/methods/ann/layer/layer.hpp:41:0,

                 from /home/travis/build/mlpack/models/Kaggle/DigitRecognizerCNN/src/DigitRecognizerCNN.cpp:19:

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:76: error: expected ‘)’ before ‘>’ token

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

                                                                            ^

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:73: error: expected ‘;’ at end of member declaration

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

                                                                         ^

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:76: error: expected unqualified-id before ‘>’ token

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

                                                                            ^

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:61: error: template argument 1 is invalid

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

                                                             ^

In file included from /usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:202:0,

                 from /usr/local/include/mlpack/methods/ann/layer/layer.hpp:41,

                 from /home/travis/build/mlpack/models/Kaggle/DigitRecognizerCNN/src/DigitRecognizerCNN.cpp:19:

/usr/local/include/mlpack/methods/ann/layer/weight_norm_impl.hpp:29:1: error: prototype for ‘mlpack::ann::WeightNorm<InputDataType, OutputDataType, CustomLayers>::WeightNorm(mlpack::ann::LayerTypes<CustomLayers ...>)’ does not match any in class ‘mlpack::ann::WeightNorm<InputDataType, OutputDataType, CustomLayers>’

 WeightNorm<InputDataType, OutputDataType, CustomLayers...>::WeightNorm(

 ^

In file included from /usr/local/include/mlpack/methods/ann/layer/layer.hpp:41:0,

                 from /home/travis/build/mlpack/models/Kaggle/DigitRecognizerCNN/src/DigitRecognizerCNN.cpp:19:

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:3: error: candidate is: mlpack::ann::WeightNorm<InputDataType, OutputDataType, CustomLayers>::WeightNorm(mlpack::ann::LayerTypes<CustomLayers ...>, ...)

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

   ^

In file included from /usr/local/include/mlpack/methods/ann/layer/layer.hpp:41:0,

                 from /home/travis/build/mlpack/models/Kaggle/DigitRecognizer/src/DigitRecognizer.cpp:19:

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:76: error: expected ‘)’ before ‘>’ token

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

                                                                            ^

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:73: error: expected ‘;’ at end of member declaration

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

                                                                         ^

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:76: error: expected unqualified-id before ‘>’ token

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

                                                                            ^

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:61: error: template argument 1 is invalid

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

                                                             ^

In file included from /usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:202:0,

                 from /usr/local/include/mlpack/methods/ann/layer/layer.hpp:41,

                 from /home/travis/build/mlpack/models/Kaggle/DigitRecognizer/src/DigitRecognizer.cpp:19:

/usr/local/include/mlpack/methods/ann/layer/weight_norm_impl.hpp:29:1: error: prototype for ‘mlpack::ann::WeightNorm<InputDataType, OutputDataType, CustomLayers>::WeightNorm(mlpack::ann::LayerTypes<CustomLayers ...>)’ does not match any in class ‘mlpack::ann::WeightNorm<InputDataType, OutputDataType, CustomLayers>’

 WeightNorm<InputDataType, OutputDataType, CustomLayers...>::WeightNorm(

 ^

In file included from /usr/local/include/mlpack/methods/ann/layer/layer.hpp:41:0,

                 from /home/travis/build/mlpack/models/Kaggle/DigitRecognizer/src/DigitRecognizer.cpp:19:

/usr/local/include/mlpack/methods/ann/layer/weight_norm.hpp:68:3: error: candidate is: mlpack::ann::WeightNorm<InputDataType, OutputDataType, CustomLayers>::WeightNorm(mlpack::ann::LayerTypes<CustomLayers ...>, ...)

   WeightNorm(LayerTypes<CustomLayers...> layer = LayerTypes<CustomLayers...>());

   ^

make[2]: *** [Kaggle/DigitRecognizer/CMakeFiles/DigitRecognizer.dir/src/DigitRecognizer.cpp.o] Error 1

make[1]: *** [Kaggle/DigitRecognizer/CMakeFiles/DigitRecognizer.dir/all] Error 2

make[1]: *** Waiting for unfinished jobs....

make[2]: *** [Kaggle/DigitRecognizerCNN/CMakeFiles/DigitRecognizerCNN.dir/src/DigitRecognizerCNN.cpp.o] Error 1

make[1]: *** [Kaggle/DigitRecognizerCNN/CMakeFiles/DigitRecognizerCNN.dir/all] Error 2

make: *** [all] Error 2

Interesting examples to add to this repo.

Hey everyone,

I am suggesting a list of examples that might be fun to add to this repo:

1. SRGAN and maybe ESRGAN.
2. DCGAN on CelebA(This is currently implemented as a test in mlpack).
3. Examples on text classification (I am currently working on this in my spare time (time aside from my WIP PRs)).

I hope that they will make nice additions to this repo. Let me know what you think.
Thanks a ton!

Implementation of Tests?

Hi everyone, this is in reference to PR #50. If models are added as libraries rather than .cpp files, so that they can be included in other .cpp files, does it make sense to add a test that runs a single pass on a given dataset, rather than testing only on the system of the person who created the PR?
Another test would be loading the weights for a given model.
I am not sure if this is needed, but I think that if changes are made in mlpack that might affect the models here, a testing unit can help in making those changes.
This will also prevent the addition of models that have bugs. In the long run, it might also streamline the addition of models.
I would love to hear your opinion.
Thanks.

Fix lstm_stock_prediction dataset location

If you try to make and run the lstm_stock_prediction example after downloading the dataset as per the README, it doesn't work:

$ ./lstm_stock_prediction 
Reading data ...
[FATAL] Cannot open file 'Google2016-2019.csv'. 

terminate called after throwing an instance of 'std::runtime_error'
  what():  fatal error; see Log::Fatal output
Aborted

It should be a pretty easy fix; I think the filename just needs to be corrected.

exception: std::logic_error

Hi,
I'm trying to run the mnist_cnn example but I keep getting the following error:

Reading data ...
Start training ...

error: Mat::operator(): index out of bounds
terminate called after throwing an instance of 'std::logic_error'
what(): Mat::operator(): index out of bounds

Has this ever happened to any of you?

Thank you so much in advance for your help,
Fabio

Adding Better Documentation and Modifying the README

Since this repository currently focuses more on tutorials and example-like code to help beginners get started, I think some changes need to be made:

  1. A generalized tutorial in each folder.
    These need to be in a step-by-step format, so that anyone who reads the code can apply it to their own dataset. The tutorial should also briefly explain why some lines were added, especially the shape change in the dataset required for time series forecasting.

  2. Modifying the README to match the current aim of this repo. For more details, refer to #57.

I think it would make sense to add this documentation for the person who also takes up the task of simplifying the model.

Suggestion: rather than one person adding all the tutorials, maybe each person could add documentation for one folder, namely vae, mnist, and LSTMs (taking this up).

README and usage details

Can more detail be provided on how to make use of the models in this library? I see it's still under development, but a quick description of the basic setup of everything would help beginners a lot.

Set up CI workflow on Jenkins

Since Travis no longer works for us, we should just set up a Jenkins job to run the script that Travis currently does.

Simplifying the Examples repository.

As per the discussion in #61, the aim of this repo is to be as simple as possible. The main aims might include:

  1. Simplifying the code.
  2. Maybe removing CMake. I think this would simplify the repo to the extent that a user could copy the code, change the dataset path, adjust the preprocessing a bit for their dataset, and be good to go.

Looking forward to your response.

To Do:

  • Set up mlpackbot and labels.
  • Update the README for examples.
  • Update the README for models.
  • Close #66.
  • Simplify code in examples (partially solved in #55 and #56).
  • Simplify the repo by removing CMake completely.
  • Remove any examples that are too complex for this repository (I think VAE falls into this category). Or maybe we could simplify it too?

Just a suggestion: when creating documentation for datasets / models, it would be a good idea to add images and tables.

Thanks a lot!!!

`mnist_batch_norm` fails with incorrect matrix multiply size

When you build the mnist_batch_norm example and run it after checking out the datasets as per the README, this is the output given:

$ ./mnist_batch_norm
Training ...
Epoch 1
5.12862 [====================================================================================================] 100% - ETA: 0s - loss: 5.12103
844/844 [====================================================================================================] 100% - 15s 18ms/step - loss: 5.12862
Validation loss: 27337.6.
...
Epoch 19
0.445508[====================================================================================================] 100% - ETA: 0s - loss: 0.444849
844/844 [====================================================================================================] 100% - 17s 20ms/step - loss: 0.445508
Validation loss: 11522.2.
Accuracy: train = 63.2407%, valid = 63.6106%
Predicting ...

error: matrix multiplication: incompatible matrix dimensions: 100x784 and 785x1
terminate called after throwing an instance of 'std::logic_error'
  what():  matrix multiplication: incompatible matrix dimensions: 100x784 and 785x1
Aborted

When this is fixed, .travis.yml can be updated to stop skipping this example.

MNIST CNN example not running properly

https://github.com/mlpack/examples/blob/master/mnist_cnn/mnist_cnn.cpp

The above code gives an error while executing it! I figured out that the problem was due to mlpack::ann::NegativeLogLikelihood<>, so I replaced it with mlpack::ann::MeanSquaredError. The code runs fine now, but the model isn't learning anything! I tried changing the model, changing the learning rate, and normalizing the data. Nothing works... Sometimes the error reduces to a very small number like 0.00415, but the accuracy (both test and valid) is always low (around 9-11%). I have been trying for about a week to get the model working properly! Please help!

Use ensmallen callbacks for LSTM examples

Right now, the optimizations in the LSTM examples manually run a single epoch at a time, in order to print the loss at the end of each epoch.

But now that we have ensmallen callbacks, this isn't necessary anymore, and we can significantly simplify our code.

The task is to go through the LSTM examples and make this change.
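
As a sketch, the manual per-epoch loop collapses into a single Train() call with ensmallen callbacks; this mirrors the pattern the RNN example further down this page already uses:

#include <mlpack/core.hpp>
#include <ensmallen.hpp>

// Given an already-built model and optimizer, callbacks handle the
// per-epoch loss printing that the manual loop used to do.
template<typename ModelType, typename OptimizerType>
void TrainWithCallbacks(ModelType& model, OptimizerType& optimizer,
                        arma::cube& trainX, arma::cube& trainY)
{
  model.Train(trainX, trainY, optimizer,
              ens::PrintLoss(),    // Prints the loss after each epoch.
              ens::ProgressBar(),  // Shows a per-epoch progress bar.
              ens::EarlyStopAtMinLoss()); // Stops when loss stops improving.
}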

This comes from the discussion here: mlpack/models#36 (comment)

Adding Clang format support for this repository

Hello,

I have noticed that an issue and a pull request about using clang for code formatting were opened last year in the mlpack main repository: mlpack/mlpack#1203, mlpack/mlpack#1205.
Following the discussion, it seems that clang-format was not a good option, especially due to the integration issue with Jenkins (if I understood correctly).
However, it would be more comfortable for contributors and for reviewers if clang-format were in place, even if it is not integrated automatically with continuous integration systems. In fact, in most cases, there will be modifications requested from the contributors by the reviewers.
For example, if a contributor has missed some styles, a reviewer can just ask them to apply the clang-format configuration that exists in this repository, rather than both sides losing energy verifying that all styles are in place. I would like some feedback on this idea; to me, this repository and the models repository are well situated for a clang-format experiment, especially since the code base is still small compared to mlpack or ensmallen.

Binding Examples [Discussion]

I'm not confident that this is a good idea, but it would be interesting to have a simple example for each of the bindings. This could double as a more robust binding test, as well as a demonstration. It may, however, add unneeded complexity.
I'm not sure. Thoughts?

Addition of MobileNet and architecture to load certain datasets.

Hi, I want to contribute to this organization by:

  1. Adding MobileNet (an implementation similar to PyTorch/TensorFlow, with attributes such as include top).
  2. Adding a models folder where models can be stored as classes, so that a user can directly import them and train on their own dataset.
  3. Adding a samples folder where every model would have a test script with which it can be trained directly.
  4. Adding a data loader for standard datasets such as CIFAR-10 and ImageNet (currently only MNIST [from Kaggle] is available).

The implementation would give the models project a directory structure similar to other Python ML/DL libraries, where a user could directly import a model and specify the input size, number of classes, include top, etc., making it very easy to use mlpack models on any dataset.
To start off, I want to implement MobileNet, as it has a small architecture (fewer parameters) compared to other architectures such as ResNets, and hence might be easier to train on CPU only.
Would it be okay if I pursue this issue?

MNIST simple example fails training against mlpack 3.4.2 on Windows/MSVC Visual Studio 17 latest

I installed mlpack 3.4.2 (also tried 3.4.1, same result) using the vcpkg port.

The mnist_simple example is the one from before the API change mentioned in the git log, obviously.

Here is the trace:

 	KernelBase.dll!00007ff9874a474c()	Unknown
 	vcruntime140d.dll!00007ff8e1f3b650()	Unknown
>	mlpt.exe!arma::arma_stop_bounds_error<char const *>(const char * const & x) Line 174	C++
 	mlpt.exe!arma::arma_check_bounds<char [37]>(const bool state, const char[37] & x) Line 503	C++
 	mlpt.exe!arma::Mat<double>::operator()(const unsigned __int64 in_row, const unsigned __int64 in_col) Line 6062	C++
 	mlpt.exe!mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>::Forward<arma::Mat<double>,arma::subview<double>>(const arma::Mat<double> & input, const arma::subview<double> & target) Line 42	C++
 	mlpt.exe!mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>>::EvaluateWithGradient<arma::Mat<double>>(const arma::Mat<double> & __formal, const unsigned __int64 begin, arma::Mat<double> & gradient, const unsigned __int64 batchSize) Line 368	C++
 	mlpt.exe!ens::AddSeparableEvaluateWithGradient<mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>>,arma::Mat<double>,arma::Mat<double>,1,1>::EvaluateWithGradient(const arma::Mat<double> & coordinates, const unsigned __int64 begin, arma::Mat<double> & gradient, const unsigned __int64 batchSize) Line 82	C++
 	mlpt.exe!ens::SGD<ens::AdamUpdate,ens::NoDecay>::Optimize<mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>>,arma::Mat<double>,arma::Mat<double>,ens::PrintLoss &,ens::ProgressBar &,ens::EarlyStopAtMinLossType<arma::Mat<double>> &,ens::StoreBestCoordinates<arma::Mat<double>> &>(mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>> & function, arma::Mat<double> & iterateIn, ens::PrintLoss & <callbacks_0>, ens::ProgressBar & <callbacks_1>, ens::EarlyStopAtMinLossType<arma::Mat<double>> & <callbacks_2>, ens::StoreBestCoordinates<arma::Mat<double>> & <callbacks_3>) Line 145	C++
 	mlpt.exe!ens::AdamType<ens::AdamUpdate>::Optimize<mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>>,arma::Mat<double>,arma::Mat<double>,ens::PrintLoss &,ens::ProgressBar &,ens::EarlyStopAtMinLossType<arma::Mat<double>> &,ens::StoreBestCoordinates<arma::Mat<double>> &>(mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>> & function, arma::Mat<double> & iterate, ens::PrintLoss & <callbacks_0>, ens::ProgressBar & <callbacks_1>, ens::EarlyStopAtMinLossType<arma::Mat<double>> & <callbacks_2>, ens::StoreBestCoordinates<arma::Mat<double>> & <callbacks_3>) Line 132	C++
 	mlpt.exe!ens::AdamType<ens::AdamUpdate>::Optimize<mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>>,arma::Mat<double>,ens::PrintLoss &,ens::ProgressBar &,ens::EarlyStopAtMinLossType<arma::Mat<double>> &,ens::StoreBestCoordinates<arma::Mat<double>> &>(mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>> & function, arma::Mat<double> & iterate, ens::PrintLoss & <callbacks_0>, ens::ProgressBar & <callbacks_1>, ens::EarlyStopAtMinLossType<arma::Mat<double>> & <callbacks_2>, ens::StoreBestCoordinates<arma::Mat<double>> & <callbacks_3>) Line 145	C++
 	mlpt.exe!mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<arma::Mat<double>,arma::Mat<double>>,mlpack::ann::GlorotInitializationType<0>>::Train<ens::AdamType<ens::AdamUpdate>,ens::PrintLoss,ens::ProgressBar,ens::EarlyStopAtMinLossType<arma::Mat<double>>,ens::StoreBestCoordinates<arma::Mat<double>> &>(arma::Mat<double> predictors, arma::Mat<double> responses, ens::AdamType<ens::AdamUpdate> & optimizer, ens::PrintLoss && <callbacks_0>, ens::ProgressBar && <callbacks_1>, ens::EarlyStopAtMinLossType<arma::Mat<double>> && <callbacks_2>, ens::StoreBestCoordinates<arma::Mat<double>> & <callbacks_3>) Line 120	C++
 	mlpt.exe!mlpt::run() Line 114	C++
 	mlpt.exe!main(int argc, char * * argv) Line 179	C++
 	[External Code]	

In Mat_meat.hpp there is an obviously wrong variable value, in_row = 18446744073709551615:

//! element accessor; bounds checking not done when ARMA_NO_DEBUG is defined
template<typename eT>
arma_inline
arma_warn_unused
const eT&
Mat<eT>::operator() (const uword in_row, const uword in_col) const
  {
  arma_debug_check_bounds( ((in_row >= n_rows) || (in_col >= n_cols)), "Mat::operator(): index out of bounds" );
  
  return mem[in_row + in_col*n_rows];
  }

It would be great to have some Windows CI.

When I execute the DigitRecognizerCNN project I get an error

Windows 8.1 x64
Qt Creator 5.12.0, MinGW 7.3.0

I copied the mlpack/models folder and set the paths to the models/Kaggle/kaggle_utils.hpp file.
The program breaks inside the loop for (int i = 0; i <= CYCLES; i++).

The error text (translated from Russian):
Problem signature:
Problem Event Name: APPCRASH
Application Name: mlpack.exe
Application Version: 0.0.0.0
Application Timestamp: 5dee23f7
Fault Module Name: msvcrt.dll
Fault Module Version: 7.0.9600.17415
Fault Module Timestamp: 545055fe
Exception Code: c0000005
Exception Offset: 000000000000188f
OS Version: 6.3.9600.2.0.0.768.101
Locale ID: 1049
Additional Information 1: 200a
Additional Information 2: 200a18060e7de70916b37b7f9ae679ec
Additional Information 3: 645c
Additional Information 4: 645c95ed87e559442dba34b2b8b13721

Please tell me how to fix it.

Bug in RNN

First, thank you to everyone for all of the great work!

In my testing of the RNN model, I seem to have found a bug. The issue is that when you call the Predict function more than once, the results of the subsequent calls differ from the first one. This only happens after the first call: every subsequent call results in a cube of predictions that is the same as the previous one. A simple example that demonstrates this is below (taken from the predict electricity usage example):

#include <mlpack/core.hpp>
#include <mlpack/prereqs.hpp>
#include <mlpack/methods/ann/rnn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/core/data/scaler_methods/min_max_scaler.hpp>
#include <mlpack/methods/ann/init_rules/he_init.hpp>
#include <mlpack/methods/ann/loss_functions/mean_squared_error.hpp>
#include <mlpack/core/data/split_data.hpp>
#include <ensmallen.hpp>

using namespace std;
using namespace mlpack;
using namespace mlpack::ann;
using namespace ens;

double MSE(arma::cube& pred, arma::cube& Y)
{
    return metric::SquaredEuclideanDistance::Evaluate(pred, Y) / (Y.n_elem);
}

template<typename InputDataType = arma::mat,
         typename DataType = arma::cube,
         typename LabelType = arma::cube>
void CreateTimeSeriesData(InputDataType dataset,
                          DataType& X,
                          LabelType& y,
                          const size_t rho)
{
    for (size_t i = 0; i < dataset.n_cols - rho; i++)
    {
        X.subcube(arma::span(), arma::span(i), arma::span()) =
                dataset.submat(arma::span(), arma::span(i, i + rho - 1));
        y.subcube(arma::span(), arma::span(i), arma::span()) =
                dataset.submat(arma::span(), arma::span(i + 1, i + rho));
    }
}

int main()
{
    // Change the names of these files and working directory as necessary
    const string dataFile = "/Users/steve/Desktop/Test/electricity-usage.csv";

    // Training data is randomly taken from the dataset in this ratio.
    const double RATIO = 0.1;

    // Step size of an optimizer.
    const double STEP_SIZE = 5e-5;

    // Number of data points in each iteration of SGD.
    const size_t BATCH_SIZE = 10;

    // Data has only one dimension.
    const size_t inputSize = 1;

    // We are predicting the next value, hence, the output is one dimensional.
    const size_t outputSize = 1;

    // Number of timesteps to look backwards in RNN.
    const size_t rho = 10;

    // Number of cells in the LSTM (hidden layers in standard terms)
    // NOTE: you may play with this variable in order to further optimize the
    // model.  (as more cells are added, accuracy is likely to go up, but training
    // time may take longer)
    const int H1 = 10;

    // Max rho for LSTM.
    const size_t maxRho = rho;

    arma::mat dataset;

    // In Armadillo rows represent features, columns represent data points.
    cout << "Reading data ..." << endl;
    data::Load(dataFile, dataset, true);

    // The CSV file has a header, so it is necessary to remove it. In Armadillo's
    // representation it is the first column.
    // The first column in the CSV is the date which is not required, therefore
    // removing it also (first row in in arma::mat).
    dataset = dataset.submat(1, 1, 1, dataset.n_cols - 1);

    // Split the dataset into training and validation sets.
    arma::mat trainData =
            dataset.submat(arma::span(), arma::span(0, (1 - RATIO) * dataset.n_cols));
    arma::mat testData = dataset.submat(
                arma::span(),
                arma::span((1 - RATIO) * dataset.n_cols, dataset.n_cols - 1));

    // Number of iterations per cycle.
    const int EPOCHS = 15;

    // Scale all data into the range (0, 1) for increased numerical stability.
    data::MinMaxScaler scale;
    // Fit scaler only on training data.
    scale.Fit(trainData);
    scale.Transform(trainData, trainData);
    scale.Transform(testData, testData);

    // We need to represent the input data for RNN in an arma::cube (3D matrix).
    // The 3rd dimension is the rho number of past data records the RNN uses for learning.
    arma::cube trainX, trainY, testX, testY;
    trainX.set_size(inputSize, trainData.n_cols - rho + 1, rho);
    trainY.set_size(outputSize, trainData.n_cols - rho + 1, rho);
    testX.set_size(inputSize, testData.n_cols - rho + 1, rho);
    testY.set_size(outputSize, testData.n_cols - rho + 1, rho);

    // Create training sets for one-step-ahead regression.
    CreateTimeSeriesData(trainData, trainX, trainY, rho);
    // Create test sets for one-step-ahead regression.
    CreateTimeSeriesData(testData, testX, testY, rho);

    // RNN regression model.
    RNN<MeanSquaredError<>, HeInitialization> model(rho);

    // Model building.
    model.Add<IdentityLayer<>>();
    model.Add<LSTM<>>(inputSize, H1, maxRho);
    model.Add<LeakyReLU<>>();
    model.Add<LSTM<>>(H1, H1, maxRho);
    model.Add<LeakyReLU<>>();
    model.Add<Linear<>>(H1, outputSize);

    // Set parameters for the Adam optimizer.
    ens::Adam optimizer(
                STEP_SIZE,  // Step size of the optimizer.
                BATCH_SIZE, // Batch size. Number of data points that are used in each iteration.
                0.9,        // Exponential decay rate for the first moment estimates.
                0.999,      // Exponential decay rate for the weighted infinity norm estimates.
                1e-8,       // Value used to initialise the mean squared gradient parameter.
                trainData.n_cols * EPOCHS, // Max number of iterations.
                1e-8,                      // Tolerance.
                true);

    // Instead of terminating based on the tolerance of the objective function,
    // we'll depend on the maximum number of iterations, and terminate early using the EarlyStopAtMinLoss callback.
    optimizer.Tolerance() = -1;

    cout << "Training ..." << endl;

    model.Train(trainX,
                trainY,
                optimizer,
                // PrintLoss Callback prints loss for each epoch.
                ens::PrintLoss(),
                // Progressbar Callback prints progress bar for each epoch.
                ens::ProgressBar(),
                // Stops the optimization process if the loss stops decreasing
                // or no improvement has been made. This will terminate the
                // optimization once we obtain a minima on training set.
                ens::EarlyStopAtMinLoss());

    cout << "Finished training." << endl;


    // NOTE: the code below is added in order to show how in a real application
    // the model would be saved, loaded and then used for prediction. Please note
    // that we do not have the last data point in testX because we did not use it
    // for the training, therefore the prediction result will be for the hour
    // before.  In your own application you may of course load any dataset.

    arma::cube predictions1;
    arma::cube predictions2;
    arma::cube predictions3;
    arma::cube predictions4;

    // Get predictions on the test data points.
    model.Predict(testX, predictions1);
    model.Predict(testX, predictions2);
    model.Predict(testX, predictions3);
    model.Predict(testX, predictions4);

    // Compare the predictions, they should be the same.
    auto res1 = approx_equal(predictions1, predictions2, "absdiff",1.0e-5);

    cout << "Are predictions 1 and 2 the same?"<< endl;

    if(res1)
        cout << "True"<< endl;
    else
        cout << "False"<< endl;

    auto res2 = approx_equal(predictions2, predictions3, "absdiff",1.0e-5);

    cout << "Are predictions 2 and 3 the same?"<< endl;

    if(res2)
        cout << "True"<< endl;
    else
        cout << "False"<< endl;

    auto res3 = approx_equal(predictions3, predictions4, "absdiff",1.0e-5);

    cout << "Are predictions 3 and 4 the same?"<< endl;

    if(res3)
        cout << "True"<< endl;
    else
        cout << "False"<< endl;

    // Calculate the MSE on the predictions.
    double testMSEP1 = MSE(predictions1, testY);
    cout << "Mean Squared Error on Prediction1 data points: " << testMSEP1 << endl;

    double testMSEP2 = MSE(predictions2, testY);
    cout << "Mean Squared Error on Prediction2 data points: " << testMSEP2 << endl;

    double testMSEP3 = MSE(predictions3, testY);
    cout << "Mean Squared Error on Prediction3 data points: " << testMSEP3 << endl;

    double testMSEP4 = MSE(predictions4, testY);
    cout << "Mean Squared Error on Prediction4 data points: " << testMSEP4 << endl;

    return 0;
}

Where the output is:

...
Finished training.
Are predictions 1 and 2 the same?
False
Are predictions 2 and 3 the same?
True
Are predictions 3 and 4 the same?
True
Mean Squared Error on Prediction1 data points: 0.010349
Mean Squared Error on Prediction2 data points: 0.0103513
Mean Squared Error on Prediction3 data points: 0.0103513
Mean Squared Error on Prediction4 data points: 0.0103513

The expected result is that all of the prediction cubes would be the same. However, the first call to Predict() seems to be modifying the model in some way. Perhaps this has something to do with issue #2713. Thanks again!

Put examples in folder by language

The current README indicates that examples should be put in folders by programming language and/or environment. Do this, and ensure that any paths are adjusted appropriately.

Exception in mnist_cnn.cpp

I built mlpack with VS2019 x64 for Windows. I also created the needed CSV file with the data, with the help of the Python script from the tools directory.

I tried to run the "mnist_cnn.cpp" from this repo. Compiling works fine, but during the training an exception is thrown: "error: Mat::operator(): index out of bounds".

The call stack up to the point where the exception occurs was attached as a screenshot.

Best regards

Changing CNN weights

Hi,
I would like to change the weights of the FFN model used for the MNIST example. Practically, I would like to manually set all the weights of the network (I know that it sounds strange...).

Can you guys please help me do that?

Thank you very much in advance for your help,
Fabio
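
One possible approach, as a sketch (assuming the mlpack 3.x-era API, where FFN exposes its flattened weight matrix through Parameters(); the helper below is hypothetical):

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>

// Hypothetical helper: overwrite all weights of an already-initialized FFN.
void SetAllWeights(mlpack::ann::FFN<>& model, const arma::mat& newWeights)
{
  // Parameters() is only correctly sized once the network has been
  // initialized (e.g. after Train() or ResetParameters()).
  model.Parameters() = newWeights;
}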

VAE Model doesn't work.

While working on #83, I had to test each file to see if the changes made were working fine. This is where I ran into an error:

Run the Makefile to generate the executable for mnist_vae_cnn.cpp. When you run the executable, you will receive the following error on the master branch:

[FATAL] The output width / output height is not possible given the other parameters of the layer.

I can try opening a PR that fixes this; however, I am trying to close some of my open PRs, so if anyone gets to it before me, that's great. Thanks.

Only fit `MinMaxScaler` on training data in LSTM examples

This comes from mlpack/models#36 (comment).

Right now, the LSTM examples use MinMaxScaler to scale the data. But they use all of the data to fit the MinMaxScaler, which is actually a leakage of test data into the training process, and is not good data science practice. :)

So, it should be a simple change to ensure that MinMaxScaler's Fit() method is only called with the training data.
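
A minimal sketch of the corrected pattern (the same one the RNN example elsewhere on this page already follows):

#include <mlpack/core.hpp>
#include <mlpack/core/data/scaler_methods/min_max_scaler.hpp>

void ScaleWithoutLeakage(arma::mat& trainData, arma::mat& testData)
{
  mlpack::data::MinMaxScaler scale;
  // Fit on the training data only...
  scale.Fit(trainData);
  // ...then transform both splits with the training-set parameters.
  scale.Transform(trainData, trainData);
  scale.Transform(testData, testData);
}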

New Example Using YOLO

Hey, as YOLO has already been added to the models repo, I was thinking I could make a face mask detector using YOLO. Shall I start working on it? It would be a good example for the year 2020.

Cannot open file in Release mode

Hi!

I'm trying the DigitRecognizer example code on MSVS 2017. I fed the program the training dataset path as "C:\... ...\train.csv", and I could run it in Debug mode, x64.
Unfortunately, when I switch to Release mode I get the following run-time error:
Unhandled exception at 0x00007FFA6920A388 in for_mlpack.exe: Microsoft C++ exception: std::runtime_error at memory location 0x00000071375FE088.

and from the windows terminal I get

Reading data ...
[FATAL] Cannot open file '        �      '.

the call stack is

>	mlpack.dll!mlpack::util::PrefixedOutStream::BaseLogic<std::basic_ostream<char,std::char_traits<char> > & (__cdecl*)(std::basic_ostream<char,std::char_traits<char> > &)>(std::basic_ostream<char,std::char_traits<char> > &(*)(std::basic_ostream<char,std::char_traits<char> > &) & val) Line 140	C++
 	mlpack.dll!mlpack::util::PrefixedOutStream::operator<<(std::basic_ostream<char,std::char_traits<char> > &(*)(std::basic_ostream<char,std::char_traits<char> > &) pf) Line 113	C++
 	mlpack.dll!mlpack::data::Load<double>(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & filename, arma::Mat<double> & matrix, const bool fatal, const bool transpose) Line 103	C++

I used the code file DigitRecognizer.cpp as I found it here, with no modifications.
Could anyone help me?

Revamp CMake configuration to "modern" CMake

This issue came about as a result of #2107, which disables Boost's CMake configuration scripts because mlpack's CMake configuration is written in "old-style" CMake.


Target class out of range when compiling and running mnist examples

I downloaded mlpack 3.4.2, copied the mnist_simple example, and downloaded the mnist_training and mnist_test CSV files from Kaggle as the example illustrates. The data reads fine, but when the training starts I get a std::runtime_error stating [DEBUG] Target class out of range.

Is there something new in mlpack that is causing this? Am I doing something wrong even though I just copied and pasted the example?

Switch to Internal split for LSTM examples.

The LSTM examples require data that is not shuffled. Currently, mlpack's internal data split class doesn't yet allow us to split data without shuffling. After mlpack/mlpack#2293 is merged, we should be able to do that. This issue aims to make the change from the current implementation in the LSTM examples to the internal split; a sketch of the intended usage is below.
Let me know if I need to clarify anything.
Thanks a lot.
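
A sketch of the intended post-merge usage (hypothetical until mlpack/mlpack#2293 lands; the final argument is assumed to disable shuffling):

#include <mlpack/core.hpp>
#include <mlpack/core/data/split_data.hpp>

void SplitPreservingOrder(const arma::mat& dataset,
                          arma::mat& trainData,
                          arma::mat& testData)
{
  // testRatio = 0.1; the trailing `false` would disable shuffling,
  // preserving the temporal order that the LSTM examples need.
  mlpack::data::Split(dataset, trainData, testData, 0.1, false);
}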

Transition to ensmallen instead of `mlpack/core/optimizers/`

Some of the models still use mlpack optimizers when they should instead be using ensmallen optimizers. To fix this, we should:

  1. Make the CMake configuration for this project search for ensmallen (and download if needed). We can just use the existing mlpack CMake/FindEnsmallen.cmake for this.
  2. Instead of including things from mlpack/core/optimizers/... we should just include ensmallen.hpp.
  3. Adapt the code to use ens::<OptimizerName> not mlpack::optimization::<OptimizerName>.
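
A sketch of the change in steps 2 and 3, using Adam as an example (the constructor arguments here are illustrative):

// Before: mlpack's bundled optimizer.
//   #include <mlpack/core/optimizers/adam/adam.hpp>
//   mlpack::optimization::Adam optimizer(0.001, 32);

// After: the ensmallen equivalent, with the same constructor arguments.
#include <ensmallen.hpp>

int main()
{
  ens::Adam optimizer(0.001, 32); // Step size 0.001, batch size 32.
}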

What is this repository for?

I've opened this issue to follow up on the discussion that we had in the video chat last week (CC: @shrit, @kartikdutt18). This kind of comes out of some comments I made on #55 and in some other places, where I thought this repository was a place to collect example implementations that people could base their own applications off of. However, I don't think that's necessarily what we have to make this repository focused on, and it seemed like there was a diversity of opinions on how to structure this repository.

In essence, @shrit and @kartikdutt18 pointed out that there are people who might like to directly use the models in this repository off-the-shelf on their data. This would be why the use of the CLI framework makes sense here; however, a drawback is that it makes the actual code a little less easy to understand for users who just want a minimum working example they can adapt.

Thus we have (at least) two kinds of users:

  • Folks who want to read the code here, understand it, and copy-paste it into their own applications. For them, this repository is kind of the equivalent of a collection of examples in the genre of the Hands-On Machine Learning notebooks (i.e. https://github.com/ageron/handson-ml/blob/master/01_the_machine_learning_landscape.ipynb) and the Keras examples directory (https://github.com/keras-team/keras/tree/master/examples). (There are lots of other repositories in this type of vein.)

  • Folks who don't want to read the code but directly use the model types that are available. Actually I think that this set of users might be closer to the original intention of the repository. For these users it would be awesome to have command-line programs that, e.g., train a model with a specific architecture, or download pretrained weights to make predictions, etc.

So, I opened this issue so that we can (a) work out how to best serve these types of users and (b) list any more types of users that we reasonably need to consider. :)

I'll also throw a proposal out there, and we can refine it and modify it.

  1. We can handle the first class of users either by creating a separate "examples" repository that has extremely simple examples, or by adding an examples/ directory to this repository. (Or even the main mlpack repository?) This could contain some simple workflow examples like the LSTM examples, and even examples for non-neural-network models, like the ones that are currently contained in the various tutorials that we have. Examples could be .cpp files, but also .sh/.py/.jl files that demonstrate usage of the mlpack bindings, for instance.

  2. We can handle the second class of users by turning this repository into a collection of specific bindings, in the same style as src/mlpack/methods/. Each directory can contain a model type. We might need some additional code to support downloading models, or something else. Models could easily be hosted on mlpack.org, as we currently aren't anywhere close to our maximum bandwidth costs. (That may eventually become more of a problem, but we can handle that when we get there.) Each binding can use the CLI system to handle input and output parameters, and we can use the CMake configuration ideas from the main mlpack repository to allow building Python and Julia bindings, not just command-line programs. That way we could, e.g., provide turnkey models to other languages. (How we deploy those models and make them available is a separate issue, but it shouldn't be too hard.)

That's just an idea---I'm not necessarily married to it. If others have other ideas, please feel free to speak up! Honestly speaking, I don't really have the time to structure this repository in the way that we decide or maintain it thoroughly, so I don't want anyone to feel like I'm forcing an idea that I won't be around to see through. :)

CMakeLists.txt removed but no Makefile added?

Hi Team,
I'm trying to add a few more examples, but I'm still having problems compiling the current examples on my machine; I'm blocked by:

 undefined reference to `typeinfo for boost::archive::archive_exception'

in the linking phase, when compiling via:

g++ -std=c++11 -I/home/pmixer/Downloads/gsoc/mlpack/build/include -I/home/pmixer/Downloads/gsoc/ensmallen/include -L/home/pmixer/Downloads/gsoc/mlpack/build/lib -L/usr/lib/x86_64-linux-gnu -o simple -lmlpack -llapack -lboost_system -lboost_filesystem -lboost_context -lboost_serialization -fopenmp ./mnist_simple.cpp

Is it a Boost version related issue? (I used the default 1.65 on Ubuntu 18.04.4.)

Regards,
Zan
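
One likely cause, assuming Ubuntu's default --as-needed linker behavior (my guess, not something confirmed in this thread): the -l libraries are listed before the source file, so the linker resolves them left to right and has already discarded libboost_serialization by the time mnist_simple.cpp asks for its symbols. Moving the source file ahead of the libraries should resolve the undefined reference:

    g++ -std=c++11 -I/home/pmixer/Downloads/gsoc/mlpack/build/include -I/home/pmixer/Downloads/gsoc/ensmallen/include ./mnist_simple.cpp -o simple -L/home/pmixer/Downloads/gsoc/mlpack/build/lib -L/usr/lib/x86_64-linux-gnu -lmlpack -llapack -lboost_system -lboost_filesystem -lboost_context -lboost_serialization -fopenmp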

Error building the mnist_cnn and mnist_simple example

When I try to build mnist_cnn.cpp and mnist_simple.cpp, I get the following error, which is the same for both files.
The version of mlpack I have is 3.2.2-3.

mnist_cnn.cpp: In function ‘int main()’:
mnist_cnn.cpp:189:20: error: no matching function for call to ‘ens::EarlyStopAtMinLoss::EarlyStopAtMinLoss(main()::<lambda(const mat&)>)’
  189 |                   }));
      |                    ^
In file included from /usr/include/ensmallen.hpp:67,
                 from /usr/include/mlpack/methods/ann/ffn.hpp:33,
                 from mnist_cnn.cpp:20:
/usr/include/ensmallen_bits/callbacks/early_stop_at_min_loss.hpp:31:3: note: candidate: ‘ens::EarlyStopAtMinLoss::EarlyStopAtMinLoss(size_t)’
   31 |   EarlyStopAtMinLoss(const size_t patienceIn = 10) :
      |   ^~~~~~~~~~~~~~~~~~
/usr/include/ensmallen_bits/callbacks/early_stop_at_min_loss.hpp:31:35: note:   no known conversion for argument 1 from ‘main()::<lambda(const mat&)>’ to ‘size_t’ {aka ‘long unsigned int’}
   31 |   EarlyStopAtMinLoss(const size_t patienceIn = 10) :
      |                      ~~~~~~~~~~~~~^~~~~~~~~~~~~~~
/usr/include/ensmallen_bits/callbacks/early_stop_at_min_loss.hpp:21:7: note: candidate: ‘constexpr ens::EarlyStopAtMinLoss::EarlyStopAtMinLoss(const ens::EarlyStopAtMinLoss&)’
   21 | class EarlyStopAtMinLoss
      |       ^~~~~~~~~~~~~~~~~~
/usr/include/ensmallen_bits/callbacks/early_stop_at_min_loss.hpp:21:7: note:   no known conversion for argument 1 from ‘main()::<lambda(const mat&)>’ to ‘const ens::EarlyStopAtMinLoss&’
/usr/include/ensmallen_bits/callbacks/early_stop_at_min_loss.hpp:21:7: note: candidate: ‘constexpr ens::EarlyStopAtMinLoss::EarlyStopAtMinLoss(ens::EarlyStopAtMinLoss&&)’
/usr/include/ensmallen_bits/callbacks/early_stop_at_min_loss.hpp:21:7: note:   no known conversion for argument 1 from ‘main()::<lambda(const mat&)>’ to ‘ens::EarlyStopAtMinLoss&&’
make: *** [<builtin>: mnist_cnn.o] Error 1
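
Judging by the candidate list in the diagnostics, the installed ensmallen only provides the patience-based EarlyStopAtMinLoss constructor; the overload taking a lambda exists only in newer ensmallen releases (my reading of the error, not confirmed in the thread). Until ensmallen is upgraded, a minimal sketch that compiles against the older API, assuming model, optimizer, trainData, and trainLabels are defined as in the example, is to drop the lambda and rely on the patience parameter alone:

    #include <ensmallen.hpp>

    // Stop training when the loss has not improved for 20 epochs; this
    // size_t constructor is the one the error shows as available.
    ens::EarlyStopAtMinLoss earlyStop(20);
    model.Train(trainData, trainLabels, optimizer, earlyStop);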

Problem in shebang line in "download_data_set.py"

When I tried to download the datasets by running ./download_data_set.py on my Ubuntu 20.04 machine, I got this error:
Traceback (most recent call last):
  File "./download_data_set.py", line 11, in <module>
    from tqdm import tqdm
ImportError: No module named tqdm

But when I tried python3 download_data_set.py, it worked.
So the problem is with selecting the correct Python interpreter in the shebang line. By default it selects Python 2, whose support has been dropped in Ubuntu 20.04, which is why it cannot find the tqdm package.

An easy fix would be to replace "#!/usr/bin/python" with "#!/usr/bin/python3", but I am not sure if this would break something on any other OS. A better fix would be a wrapper script that selects an available interpreter itself (see also the note below).
Please let me know any thoughts.
Also, if there is a need to make any changes, I will be happy to work on it.
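
For what it's worth, a conventional portable form, assuming /usr/bin/env exists on the target systems (which holds on common Linux and BSD variants), is to let env locate the interpreter rather than hard-coding its path:

    #!/usr/bin/env python3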
