MLPP issue discussions from novak-99

knn implementation problem

Your implementation of

    std::vector<double> kNN::nearestNeighbors(std::vector<double> x){
        LinAlg alg;
        // The nearest neighbors
        std::vector<double> knn;
        
        std::vector<std::vector<double>> inputUseSet = inputSet;
        //Perfom this loop unless and until all k nearest neighbors are found, appended, and returned
        for(int i = 0; i < k; i++){
            int neighbor = 0;
            for(int j = 0; j < inputUseSet.size(); j++){
                bool isNeighborNearer = alg.euclideanDistance(x, inputUseSet[j]) < alg.euclideanDistance(x, inputUseSet[neighbor]);
                if(isNeighborNearer){
                    neighbor = j;
                }
            }
            knn.push_back(neighbor);
            inputUseSet.erase(inputUseSet.begin() + neighbor); // This is why we maintain an extra input"Use"Set
        }
        return knn;
    }

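is wrong. Given inputSet and x, assume inputSet is sorted in ascending order of distance to x. Your implementation will then output a list containing only index zero: each erase shifts the remaining points down by one, so the nearest remaining point is always at index 0 of inputUseSet, and the returned indices no longer refer to positions in the original inputSet.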
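One way to avoid the shifting-index problem is to rank the original indices by distance instead of erasing points. The following is only a sketch under stated assumptions: euclideanDistance and nearestNeighborIndices are hypothetical free functions written for this example, not MLPP's API, and the return type is changed to indices rather than std::vector<double>.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for LinAlg::euclideanDistance.
double euclideanDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return std::sqrt(sum);
}

// Returns the indices (into the ORIGINAL inputSet) of the k points
// nearest to x, nearest first. Nothing is erased, so indices stay valid.
std::vector<std::size_t> nearestNeighborIndices(
        const std::vector<std::vector<double>>& inputSet,
        const std::vector<double>& x,
        std::size_t k) {
    // Compute every distance exactly once.
    std::vector<double> dist(inputSet.size());
    for (std::size_t i = 0; i < inputSet.size(); ++i) {
        dist[i] = euclideanDistance(x, inputSet[i]);
    }

    // idx = 0, 1, 2, ... then order the first k entries by distance.
    std::vector<std::size_t> idx(inputSet.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    k = std::min(k, idx.size());
    std::partial_sort(idx.begin(),
                      idx.begin() + static_cast<std::ptrdiff_t>(k),
                      idx.end(),
                      [&dist](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });
    idx.resize(k);
    return idx;
}
```

With the sorted input described above, this returns 0, 1, 2, ... rather than a list of zeros, and the indices can be used directly to look up the matching labels in the output set.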

logit function error

The logit function should be

    return std::log(z / (1 - z));

This is fixed in my PR #10. The current code is:

MLPP/MLPP/Activation/Activation.cpp

Line 224 in aac9bd6

    return std::log(z / (1 + z));
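A quick way to see which form is right: logit is the inverse of the sigmoid. If p = 1 / (1 + exp(-x)), then p / (1 - p) = exp(x), so log(p / (1 - p)) = x. The sketch below uses hypothetical free functions named sigmoid and logit (not MLPP's Activation class) to illustrate that round trip.

```cpp
#include <cassert>
#include <cmath>

// Standard logistic sigmoid.
double sigmoid(double x) {
    return 1.0 / (1.0 + std::exp(-x));
}

// Logit, defined for z strictly between 0 and 1.
// Note the minus sign in the denominator.
double logit(double z) {
    assert(z > 0.0 && z < 1.0);
    return std::log(z / (1.0 - z));
}
```

With this definition logit(sigmoid(x)) recovers x up to rounding, which is not the case when the denominator is 1 + z.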

introduce CMAKE build

Hi Marc,

I pulled your project into my CLion IDE (IntelliJ) and it looks very interesting. Great work! Are you really 16 years old?!

I have introduced a CMake file that allows a more portable and modern build approach and easy, IDE-agnostic integration. I am attaching it here; just drop it next to buildSo.sh. The CMake file configures two targets: a shared library, "mlpp", and "mlpp_runner" for your main.cpp, which links against the shared library. I also raised the C++ standard to C++20.
CMakeLists.txt

Documentation

What is the status of the documentation you mentioned? Thx!

Softmax Optimization

MLPP/MLPP/Activation/Activation.cpp

Lines 52 to 64 in 4ebcc0a

    
    std::vector<double> Activation::softmax(std::vector<double> z){
        std::vector<double> a;
        a.resize(z.size());
        for(int i = 0; i < z.size(); i++){
            double sum = 0;
            for(int j = 0; j < z.size(); j++){
                sum += exp(z[j]);
            }
            a[i] = exp(z[i]) / sum;
        }
        return a;
    }

Here is one specific example of code optimization: The softmax function here will give the correct answer but as it is currently written it is recalculating the same sum z.size() times. It would be much better to calculate the sum outside of the loop once and then reuse that value inside the loop without recalculating it.

If we wanted to optimize even more we could look at the use of the exp() function. Even with the above fix the exponential of each element in z is being calculated twice (once for the sum and once for the final output element calculation). Assuming memory allocations and accesses are faster than the exp() function it would be better to make an intermediary array of the exponential values and then access that array to calculate the sum and then also to calculate values of a.

The first optimization here will make a massive difference, so I think it is definitely worth keeping things like this in mind. The second one will have much less of an impact and goes a bit more into the weeds, so I would not worry too much about optimizations like that during the initial coding - I mention it here just to give a more complete idea of what kind of optimizations are possible even in a very simple function.
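Both optimizations described above can be sketched together. This is an illustrative free function, not the actual MLPP code: it stores each exponential in the output vector as it is computed, so no separate intermediary array is needed, and the sum is accumulated once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax with n calls to exp() instead of n * (n + 1).
std::vector<double> softmax(const std::vector<double>& z) {
    assert(!z.empty());
    std::vector<double> a(z.size());

    // Pass 1: compute each exponential once, keep it in a, and sum as we go.
    double sum = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) {
        a[i] = std::exp(z[i]);
        sum += a[i];
    }

    // Pass 2: normalize in place using the stored exponentials.
    for (double& v : a) {
        v /= sum;
    }
    return a;
}
```

Taking z by const reference also avoids copying the input vector on every call.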

Is MLPP reinventing the wheel? What would it be used for?

Great work! Very interesting!

In the README, you say that MLPP serves to revitalize C++ as a machine learning front-end. How does MLPP separate itself from the PyTorch C++ API? If you don't mind me asking, why not build wrappers around the already open-source and highly optimized PyTorch C++ code?

Thanks!

performance function error?

    double Utilities::performance(std::vector<double> y_hat, std::vector<double> outputSet){
        double correct = 0;
        for(int i = 0; i < y_hat.size(); i++){
            if(std::round(y_hat[i]) == outputSet[i]){
                correct++;
            }
        }
        return correct/y_hat.size();
    }

Problem: is the comparison std::round(y_hat[i]) == outputSet[i] correct?

Possible mistakes in cost functions

Cost::MAEDeriv is wrong. y_hat must be compared with y, not with zero.
Cost::WassersteinLoss is implemented the same as Cost::HingeLoss, but the two losses are not the same.
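To make both points concrete, here is a rough sketch using hypothetical free functions (maeDeriv, hingeLoss, wassersteinLoss), not MLPP's Cost class. It assumes MAE is the mean of |y_hat - y|, so its derivative is sign(y_hat - y) / n, and labels of -1 or +1 for the other two losses. The Wasserstein sign convention varies between sources; the one used here is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Derivative of MAE = (1/n) * sum |y_hat_i - y_i| with respect to y_hat_i:
// sign(y_hat_i - y_i) / n. The sign depends on y, not on zero.
std::vector<double> maeDeriv(const std::vector<double>& y_hat, const std::vector<double>& y) {
    assert(!y_hat.empty() && y_hat.size() == y.size());
    const double n = static_cast<double>(y_hat.size());
    std::vector<double> deriv(y_hat.size());
    for (std::size_t i = 0; i < y_hat.size(); ++i) {
        const double diff = y_hat[i] - y[i];
        deriv[i] = (diff > 0.0) ? 1.0 / n : (diff < 0.0 ? -1.0 / n : 0.0);
    }
    return deriv;
}

// Hinge loss: mean of max(0, 1 - y * y_hat). Clamped at zero.
double hingeLoss(const std::vector<double>& y_hat, const std::vector<double>& y) {
    assert(!y_hat.empty() && y_hat.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < y_hat.size(); ++i) {
        sum += std::max(0.0, 1.0 - y[i] * y_hat[i]);
    }
    return sum / static_cast<double>(y_hat.size());
}

// Wasserstein (critic) loss in one common convention: -mean(y * y_hat).
// No margin and no clamping, so it is unbounded below.
double wassersteinLoss(const std::vector<double>& y_hat, const std::vector<double>& y) {
    assert(!y_hat.empty() && y_hat.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < y_hat.size(); ++i) {
        sum += y[i] * y_hat[i];
    }
    return -sum / static_cast<double>(y_hat.size());
}
```

For a single sample with y = 1 and y_hat = 2, the hinge loss is 0 while this Wasserstein loss is -2, so the two cannot share an implementation.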

Development status?

I was sifting through the codebase, as I am building a similar project from scratch and this project gave me a lot of pointers and inspiration, and noticed several ways things could be optimized. Is the project still under active development?

Optimizing matrix multiplication

Impressive work!

You should swap the two inner loops here:

MLPP/MLPP/LinAlg/LinAlg.cpp

Lines 80 to 86 in 2a21d25

    
    for(int i = 0; i < A.size(); i++){
        for(int j = 0; j < B[0].size(); j++){
            for(int k = 0; k < B.size(); k++){
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

That is:

    for(int i = 0; i < A.size(); i++){
        for(int k = 0; k < B.size(); k++){
            for(int j = 0; j < B[0].size(); j++){
                C[i][j] += A[i][k] * B[k][j];
            }
        }
    }

It won't change the result, but it should speed up the multiplication. Explanations: https://viralinstruction.com/posts/hardware/#15f5c31a-8aef-11eb-3f19-cf0a4e456e7a

Also, std::vector<std::vector<>> is not the best way to store a matrix: https://stackoverflow.com/a/55478808
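The two suggestions can be combined. Below is a minimal sketch, assuming a hypothetical Matrix struct and matmul function (neither is an MLPP type) that keep all elements in one contiguous row-major buffer. With the i-k-j order, the innermost loop walks along a row of B and a row of C, so memory is read and written sequentially instead of striding down a column of B.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix type: a single contiguous row-major buffer.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C = A * B using the i-k-j loop order.
Matrix matmul(const Matrix& A, const Matrix& B) {
    assert(A.cols == B.rows);
    Matrix C(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i) {
        for (std::size_t k = 0; k < A.cols; ++k) {
            const double a = A.at(i, k);  // constant across the inner loop
            for (std::size_t j = 0; j < B.cols; ++j) {
                C.at(i, j) += a * B.at(k, j);
            }
        }
    }
    return C;
}
```

How much this helps depends on matrix size, compiler, and hardware, so it is worth benchmarking against the original loop order before adopting it.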

Hyperbolic Activations

In the README, it looks like you've implemented (or plan to implement) a lot of hyperbolic functions as activation functions.

If you don't mind me asking, are there specific cases where activation functions with a diverging gradient, such as the Cosh and Sinh you've implemented, would be helpful? I am very curious, as I haven't seen these used before. Thanks!
