
novak-99 / mlpp


A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.

License: MIT License

C++ 99.71% Shell 0.29%
cpp data-science deep-learning machine-learning

mlpp's Issues

knn implementation problem

Your implementation of

    std::vector<double> kNN::nearestNeighbors(std::vector<double> x){
        LinAlg alg;
        // The nearest neighbors
        std::vector<double> knn;
        
        std::vector<std::vector<double>> inputUseSet = inputSet;
        //Perfom this loop unless and until all k nearest neighbors are found, appended, and returned
        for(int i = 0; i < k; i++){
            int neighbor = 0;
            for(int j = 0; j < inputUseSet.size(); j++){
                bool isNeighborNearer = alg.euclideanDistance(x, inputUseSet[j]) < alg.euclideanDistance(x, inputUseSet[neighbor]);
                if(isNeighborNearer){
                    neighbor = j;
                }
            }
            knn.push_back(neighbor);
            inputUseSet.erase(inputUseSet.begin() + neighbor); // This is why we maintain an extra input"Use"Set
        }
        return knn;
    }

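is wrong. Take an inputSet and an x, and assume the inputSet is sorted in ascending order of distance to x. Because the selected row is erased from inputUseSet on every pass, the next nearest row always shifts to index zero, so your implementation outputs a list containing only index zero (repeated k times) instead of the indices of the k nearest rows in inputSet.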
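
One way to avoid the shifting-index problem is to rank indices into the original set rather than erase rows from a copy. The following is a minimal sketch, not the library's code: the free functions euclideanDistance and nearestNeighborIndices are hypothetical names introduced here for illustration, and the sketch assumes every row has the same length as x.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for LinAlg::euclideanDistance (assumes equal lengths).
double euclideanDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Returns indices into inputSet of the k rows nearest to x, nearest first.
// Nothing is erased, so every index refers to the original inputSet.
std::vector<std::size_t> nearestNeighborIndices(
        const std::vector<std::vector<double>>& inputSet,
        const std::vector<double>& x,
        std::size_t k) {
    // Compute each distance exactly once.
    std::vector<double> dist(inputSet.size());
    for (std::size_t i = 0; i < inputSet.size(); ++i) {
        dist[i] = euclideanDistance(x, inputSet[i]);
    }

    // idx = 0, 1, 2, ... so that sorting idx keeps the original positions.
    std::vector<std::size_t> idx(inputSet.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // Clamp k so that asking for more neighbors than rows is safe.
    k = std::min(k, idx.size());
    std::partial_sort(idx.begin(),
                      idx.begin() + static_cast<std::ptrdiff_t>(k),
                      idx.end(),
                      [&dist](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });
    idx.resize(k);
    return idx;
}
```

With rows {0}, {1}, {2}, {3} and x = {0.1}, asking for k = 3 yields the indices 0, 1, 2 rather than three zeros.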

introduce CMAKE build

Hi Marc,

I pulled your project into my CLion IDE (IntelliJ) and it looks very interesting! Great work there. Are you really 16 years old?!

I have introduced a CMake file that allows a more portable and modern build approach and easy, IDE-agnostic integration. I am attaching it here; just drop it next to buildSo.sh. The CMake file configures two targets: a shared library "mlpp", and "mlpp_runner" for your main.cpp, which links against the shared library. I also raised the C++ standard to C++20.
CMakeLists.txt

Documentation

What is the status of the documentation you mentioned? Thx!

Softmax Optimization

    std::vector<double> Activation::softmax(std::vector<double> z){
        std::vector<double> a;
        a.resize(z.size());
        for(int i = 0; i < z.size(); i++){
            double sum = 0;
            for(int j = 0; j < z.size(); j++){
                sum += exp(z[j]);
            }
            a[i] = exp(z[i]) / sum;
        }
        return a;
    }

Here is one specific example of code optimization: the softmax function here gives the correct answer, but as currently written it recalculates the same sum z.size() times. It would be much better to calculate the sum once, outside the loop, and then reuse that value inside the loop.

If we wanted to optimize even more, we could look at the use of the exp() function. Even with the above fix, the exponential of each element in z is calculated twice (once for the sum and once for the final output element). Assuming memory allocations and accesses are faster than exp(), it would be better to store the exponential values in an intermediate array and read from that array both to calculate the sum and to calculate the values of a.

The first optimization makes a massive difference: for a vector of length n it cuts the number of exp() calls from about n^2 to 2n, so I think it is definitely worth keeping things like this in mind. The second one has much less of an impact (2n calls down to n) and goes a bit more into the weeds, so I would not worry too much about optimizations like that during the initial coding. I mention it here just to give a more complete idea of what kind of optimizations are possible even in a very simple function.
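
Both suggestions can be sketched together as follows. This is an illustrative free function, not the library's Activation::softmax; it reuses the output vector as the intermediate array of exponentials, which is one possible way to avoid a second allocation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of softmax with each exponential computed once and the sum computed once.
std::vector<double> softmax(const std::vector<double>& z) {
    std::vector<double> a(z.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        a[i] = std::exp(z[i]);  // store exp(z[i]) so it is never recomputed
        sum += a[i];            // accumulate the shared denominator
    }
    for (double& v : a) {
        v /= sum;               // normalize in place
    }
    return a;
}
```

For an input of length n this makes n calls to exp(), compared with roughly n^2 in the version quoted in the issue.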

Is MLPP reinventing the wheel? What would it be used for?

Great work! Very interesting!

In the README, you say that MLPP serves to revitalize C++ as a machine learning front-end. How does MLPP separate itself from the Pytorch C++ API? If you don't mind me asking, why not build wrappers around the already open-source and highly optimized Pytorch C++ code?

Thanks!

performance function error?

    double Utilities::performance(std::vector<double> y_hat, std::vector<double> outputSet){
        double correct = 0;
        for(int i = 0; i < y_hat.size(); i++){
            if(std::round(y_hat[i]) == outputSet[i]){
                correct++;
            }
        }
        return correct/y_hat.size();
    }

Problem: std::round(y_hat[i]) == outputSet[i]???
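
The issue does not say what is wrong with the comparison, so the following is only one possible reading, offered as a sketch. It assumes integer-coded class labels, rounds both sides to integers before comparing (so the test never relies on exact equality of two doubles), and rejects empty or mismatched inputs instead of indexing out of range or dividing by zero. The name accuracy is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fraction of predictions whose rounded value equals the rounded label.
// Assumption: labels are integer class codes stored as doubles.
double accuracy(const std::vector<double>& y_hat, const std::vector<double>& labels) {
    if (y_hat.empty() || y_hat.size() != labels.size()) {
        throw std::invalid_argument("y_hat and labels must be non-empty and the same size");
    }
    std::size_t correct = 0;
    for (std::size_t i = 0; i < y_hat.size(); ++i) {
        // Compare integers, not floating-point values.
        if (std::lround(y_hat[i]) == std::lround(labels[i])) {
            ++correct;
        }
    }
    return static_cast<double>(correct) / static_cast<double>(y_hat.size());
}
```

Rounding only makes sense when the predictions are meant to land near integer labels; for regression outputs or one-hot targets a different metric would be needed.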

Possible mistakes in cost functions

Cost::MAEDeriv is wrong: y_hat must be compared with y, not with zero.
Cost::WassersteinLoss is implemented the same as Cost::HingeLoss, but the two losses are not the same.

Development status?

I am building a similar project from scratch, and this project gave me a lot of pointers and inspiration. While sifting through the codebase, I noticed several ways things could be optimized. Is the project still under active development?

Optimizing matrix multiplication

Impressive work!

You should swap the two inner loops here:

    for(int i = 0; i < A.size(); i++){
        for(int j = 0; j < B[0].size(); j++){
            for(int k = 0; k < B.size(); k++){
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

That is:

    for(int i = 0; i < A.size(); i++){
        for(int k = 0; k < B.size(); k++){
            for(int j = 0; j < B[0].size(); j++){
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

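It won't change the result, but it should speed up the multiplication: with j as the innermost loop, the code walks along row C[i] and row B[k] contiguously in memory instead of striding down a column of B. Explanations: https://viralinstruction.com/posts/hardware/#15f5c31a-8aef-11eb-3f19-cf0a4e456e7a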

Also, std::vector<std::vector<>> is not the best way to store a matrix: https://stackoverflow.com/a/55478808
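
The two suggestions (the i-k-j loop order and contiguous storage) can be combined in a small sketch. The Matrix struct and matmul function below are hypothetical, introduced only for illustration; they assume row-major storage in a single std::vector<double> and that A.cols equals B.rows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix stored in one contiguous buffer.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C = A * B using the i-k-j loop order. Assumes A.cols == B.rows.
Matrix matmul(const Matrix& A, const Matrix& B) {
    Matrix C(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t k = 0; k < A.cols; ++k) {
            const double a = A(i, k);  // constant across the inner loop
            for (std::size_t j = 0; j < B.cols; ++j) {
                C(i, j) += a * B(k, j);  // row i of C and row k of B are both contiguous
            }
        }
    }
    return C;
}
```

A single buffer keeps the whole matrix in one allocation, so moving from one row to the next does not chase a pointer to a separate heap block as it does with a vector of vectors.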

Hyperbolic Activations

In the README, it looks like you've implemented (or plan to implement) a lot of hyperbolic functions as activation functions.

If you don't mind me asking, are there specific cases where the functions you've implemented that have a diverging gradient (such as Cosh and Sinh) would be helpful? I am very curious, as I haven't seen these used before. Thanks!
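
To make "diverging gradient" concrete: the derivative of sinh is cosh and the derivative of cosh is sinh, and both grow like e^|x| / 2 in magnitude. The helper names below are hypothetical and only illustrate that growth.

```cpp
#include <cmath>

// d/dx sinh(x) = cosh(x); unbounded as |x| grows.
double sinhDeriv(double x) { return std::cosh(x); }

// d/dx cosh(x) = sinh(x); unbounded in magnitude as |x| grows.
double coshDeriv(double x) { return std::sinh(x); }
```

At x = 0 the two derivatives are 1 and 0, while at x = 10 both already exceed 10^4, which is why large pre-activations would produce very large gradient terms with these functions.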
