novak-99 / mlpp
A library created to revitalize C++ as a machine learning front end. Per aspera ad astra.
License: MIT License
your implementation of

std::vector<double> kNN::nearestNeighbors(std::vector<double> x){
    LinAlg alg;
    // The nearest neighbors
    std::vector<double> knn;
    std::vector<std::vector<double>> inputUseSet = inputSet;
    // Perform this loop until all k nearest neighbors are found, appended, and returned
    for(int i = 0; i < k; i++){
        int neighbor = 0;
        for(int j = 0; j < inputUseSet.size(); j++){
            bool isNeighborNearer = alg.euclideanDistance(x, inputUseSet[j]) < alg.euclideanDistance(x, inputUseSet[neighbor]);
            if(isNeighborNearer){
                neighbor = j;
            }
        }
        knn.push_back(neighbor);
        inputUseSet.erase(inputUseSet.begin() + neighbor); // This is why we maintain an extra input"Use"Set
    }
    return knn;
}
is wrong. Given an inputSet and a query x, and assuming inputSet is sorted in ascending order of distance to x, your implementation will output a list of k zeros: each erase shifts the remaining elements down, so the index pushed into knn refers to the shrinking inputUseSet, not to the original inputSet.
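A hedged sketch of one possible fix: sort a copy of the original indices by distance, so that no erasure can shift them. This is not the library's actual code; euclideanDistance here is a free-function stand-in for the LinAlg helper, and the signature takes inputSet and k as parameters for self-containment.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for LinAlg::euclideanDistance.
double euclideanDistance(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0;
    for (std::size_t i = 0; i < a.size(); i++) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

std::vector<int> nearestNeighbors(const std::vector<std::vector<double>>& inputSet,
                                  const std::vector<double>& x, int k) {
    // Indices into the ORIGINAL inputSet; these never shift.
    std::vector<int> indices(inputSet.size());
    for (std::size_t i = 0; i < indices.size(); i++) indices[i] = static_cast<int>(i);

    // Partially sort the indices by distance to x; the first k entries
    // are the positions of the k nearest neighbors in inputSet.
    std::partial_sort(indices.begin(), indices.begin() + k, indices.end(),
                      [&](int a, int b) {
                          return euclideanDistance(x, inputSet[a]) <
                                 euclideanDistance(x, inputSet[b]);
                      });
    return std::vector<int>(indices.begin(), indices.begin() + k);
}
```

Because the returned values index the original inputSet, a sorted-ascending input now yields 0, 1, ..., k-1 rather than all zeros.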
return std::log(z / (1 - z));
this is my PR #10
MLPP/MLPP/Activation/Activation.cpp
Line 224 in aac9bd6
Hi Marc,
I pulled your project into my CLion IDE (IntelliJ); it looks very interesting, great work! Are you really 16 years old?!
I have introduced a CMake file that allows a more portable, modern build and easy IDE-agnostic integration. I am attaching it here; just drop it next to buildSo.sh. The CMake file configures two targets: a shared library "mlpp", and mlpp_runner for your main.cpp, which links against the shared library. I also bumped the C++ standard to 20.
CMakeLists.txt
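A minimal sketch of what such a CMakeLists.txt might look like, based on the description above (the target names come from the comment; the source glob and include path are assumptions about the repository layout, not the attached file's actual contents):

```cmake
cmake_minimum_required(VERSION 3.16)
project(mlpp LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Shared library target; the source glob is an assumed layout.
file(GLOB_RECURSE MLPP_SOURCES "MLPP/*.cpp")
add_library(mlpp SHARED ${MLPP_SOURCES})
target_include_directories(mlpp PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})

# Runner executable for main.cpp, linking against the shared library.
add_executable(mlpp_runner main.cpp)
target_link_libraries(mlpp_runner PRIVATE mlpp)
```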
What is the status of the documentation you mentioned? Thx!
MLPP/MLPP/Activation/Activation.cpp
Lines 52 to 64 in 4ebcc0a
Here is one specific example of code optimization: the softmax function here gives the correct answer, but as currently written it recalculates the same sum z.size() times. It would be much better to calculate the sum once, outside the loop, and then reuse that value inside the loop.
If we wanted to optimize even more, we could look at the use of the exp() function. Even with the above fix, the exponential of each element of z is calculated twice (once for the sum and once for the final output element). Assuming memory allocations and accesses are faster than exp(), it would be better to build an intermediate array of the exponential values, then use that array both to calculate the sum and to calculate the values of a.
The first optimization here will make a massive difference, so I think it is definitely worth keeping things like this in mind. The second one will have much less of an impact and goes a bit more into the weeds, so I would not worry too much about optimizations like that during the initial coding - I mention it here just to give a more complete idea of what kind of optimizations are possible even in a very simple function.
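As a hedged sketch of both optimizations combined (the name and signature mirror the snippet under discussion; this is not the library's actual code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<double> softmax(const std::vector<double>& z) {
    // Cache each exp(z[i]) so it is computed exactly once,
    // and accumulate the normalizing sum in the same pass.
    std::vector<double> expZ(z.size());
    double sum = 0;
    for (std::size_t i = 0; i < z.size(); i++) {
        expZ[i] = std::exp(z[i]);
        sum += expZ[i];
    }
    // Reuse the cached exponentials and the single precomputed sum.
    std::vector<double> a(z.size());
    for (std::size_t i = 0; i < z.size(); i++) {
        a[i] = expZ[i] / sum;
    }
    return a;
}
```

This makes one exp() call per element and one sum in total, instead of recomputing the sum (and its exponentials) for every output element.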
Great work! Very interesting!
In the README, you say that MLPP serves to revitalize C++ as a machine learning front-end. How does MLPP distinguish itself from the PyTorch C++ API? If you don't mind me asking, why not build wrappers around the already open-source and highly optimized PyTorch C++ code?
Thanks!
double Utilities::performance(std::vector<double> y_hat, std::vector<double> outputSet){
    double correct = 0;
    for(int i = 0; i < y_hat.size(); i++){
        if(std::round(y_hat[i]) == outputSet[i]){
            correct++;
        }
    }
    return correct / y_hat.size();
}
Problem: std::round(y_hat[i]) == outputSet[i] compares two doubles for exact equality. Is that safe?
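A hedged sketch of a tolerance-based comparison (a free-function rewrite of the snippet above, not the library's actual code): exact == on doubles is fragile when the labels come from arithmetic, so compare against the label within a half-unit tolerance instead.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

double performance(const std::vector<double>& y_hat,
                   const std::vector<double>& outputSet) {
    double correct = 0;
    for (std::size_t i = 0; i < y_hat.size(); i++) {
        // Count a hit when the rounded prediction lands within 0.5
        // of the label, instead of requiring exact double equality.
        if (std::fabs(std::round(y_hat[i]) - outputSet[i]) < 0.5) {
            correct++;
        }
    }
    return correct / y_hat.size();
}
```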
Cost::MAEDeriv is wrong: y_hat must be compared with y, not with zero.
Cost::WassersteinLoss is implemented the same as Cost::HingeLoss, but they are not the same loss.
I was sifting through the codebase (I am building a similar project from scratch, and this project gave me a lot of pointers and inspiration) and noticed several ways things could be optimized. Is the project still under active development?
Impressive work!
You should swap the two inner loops here:
Lines 80 to 86 in 2a21d25
That is:
for(int i = 0; i < A.size(); i++){
    for(int k = 0; k < B.size(); k++){
        for(int j = 0; j < B[0].size(); j++){
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}
It won't change the result, but it should speed up the multiplication. Explanations: https://viralinstruction.com/posts/hardware/#15f5c31a-8aef-11eb-3f19-cf0a4e456e7a
Also, std::vector<std::vector<double>> is not the best way to store a matrix: https://stackoverflow.com/a/55478808
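A hedged sketch of the flat-storage alternative the linked answer describes (the Matrix type here is hypothetical, not part of MLPP): one contiguous buffer with row-major indexing keeps all elements in a single allocation, which is friendlier to the cache than a vector of separately allocated row vectors.

```cpp
#include <cstddef>
#include <vector>

struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;  // rows * cols elements, row-major

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    // Element (i, j) lives at offset i * cols + j in the flat buffer.
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};
```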
In the README, it looks like you've implemented (or plan to implement) a lot of hyperbolic functions as activation functions.
If you don't mind me asking, are there specific cases where functions with a diverging gradient, such as Cosh and Sinh, would be helpful? I'm very curious; I haven't seen these used before. Thanks!