hpicgs / bhtsne
This project forked from lvdmaaten/bhtsne
Barnes-Hut t-SNE
License: MIT License
use intrinsics for more efficient processing of vectorizable for loops
[--algorithmParameter value]* [-outputOption]* [--outputFile value] fileName
The order of parameters and options is not relevant.
refactor C-style C++ to modern C++ using features from newer standards
write content or remove cmake target
<param> and set<param> getter and setter for every algorithm parameter
load<fileExtension> methods for each supported extension (e.g. legacyDat, csv, our binary)
run method (or a better name)
save<fileExtension> methods for each supported extension (e.g. svg, csv, [legacy]binary)
TODO:
add the data sets again but ignore them for the launchpad package
Many phases of the algorithm perform an operation on all dimensions of a data point.
With a library such as Vc, vectorization can be enforced rather than left to the compiler.
enable package managers like apt/vcpkg/homebrew to simply install this project
The goal is to store at least n * (n-1)/2 values and to compute at most that many.
Example: TSNE::computeSquaredEuclideanDistance
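The n * (n-1)/2 figure above corresponds to the strict upper triangle of the symmetric distance matrix. A minimal sketch of such a condensed layout follows; the names condensedIndex and condensedSquaredDistances and the row-major input layout are assumptions for illustration, not the project's API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: position of the unordered pair (i, j), i != j, in a
// condensed array holding the strict upper triangle row by row.
inline std::size_t condensedIndex(std::size_t i, std::size_t j, std::size_t n)
{
    assert(i != j && i < n && j < n);
    if (i > j) std::swap(i, j);
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

// Hypothetical helper: computes each pairwise squared Euclidean distance
// exactly once. X is assumed to hold n points of dimension d, row-major.
std::vector<double> condensedSquaredDistances(const std::vector<double>& X,
                                              std::size_t n, std::size_t d)
{
    assert(X.size() == n * d);
    std::vector<double> dist(n * (n - 1) / 2);
    std::size_t k = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t c = 0; c < d; ++c) {
                const double diff = X[i * d + c] - X[j * d + c];
                sum += diff * diff;
            }
            dist[k++] = sum;  // same slot as condensedIndex(i, j, n)
        }
    }
    return dist;
}
```

The diagonal is never stored because it is always zero, and a lookup for (j, i) maps to the same slot as (i, j).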
implement time measurement, so performance before and after refactoring/optimization can be compared
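One possible shape for such a measurement, using std::chrono::steady_clock from the standard library. The helper name measureSeconds is hypothetical; this is a sketch, not an existing function of the project.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs the callable once and returns the elapsed
// wall-clock time in seconds. steady_clock is monotonic, so the result is
// not affected by system clock adjustments.
template <typename Callable>
double measureSeconds(Callable&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Callable>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wrapping each phase of the algorithm in such a call before and after a refactoring gives comparable numbers, provided the same data set and build settings are used.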
enable simple cross-platform setup for Windows, macOS and Ubuntu by creating a CMake project
figure out whether using the GPU can improve performance and, if it does, implement that
E.g., by bypassing C++ std::getline and using the more C-like interface std::strtod (example: https://github.com/cginternals/unifiedmemory-demo/blob/master/source/unifiedmemory/Computation.cpp#L216).
The trade-off is the missing error handling, which the current code lacks anyway.
Further, the parsed data point can be moved to the target row.
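A minimal sketch of this idea, assuming the whole file has already been read into one buffer and that values are separated by commas, semicolons, spaces, or tabs. The function name parseBuffer is hypothetical, and, as noted above, unparsable characters are skipped silently rather than reported.

```cpp
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: parses a text buffer into rows of doubles with
// std::strtod, one row per line, without using std::getline.
std::vector<std::vector<double>> parseBuffer(const std::string& buffer)
{
    std::vector<std::vector<double>> rows;
    std::vector<double> row;
    const char* cursor = buffer.c_str();
    while (*cursor != '\0') {
        const char c = *cursor;
        if (c == '\n') {
            // End of line: move the parsed data point into the result.
            if (!row.empty()) rows.push_back(std::move(row));
            row.clear();  // put the moved-from vector into a known empty state
            ++cursor;
            continue;
        }
        if (c == ' ' || c == '\t' || c == '\r' || c == ',' || c == ';') {
            ++cursor;  // separators are skipped here so strtod never crosses a newline
            continue;
        }
        char* end = nullptr;
        const double value = std::strtod(cursor, &end);
        if (end == cursor) {
            ++cursor;  // not a number: skipped without error handling
            continue;
        }
        row.push_back(value);
        cursor = end;
    }
    if (!row.empty()) rows.push_back(std::move(row));
    return rows;
}
```

Note that std::strtod depends on the current C locale and also accepts inputs such as "inf" and hexadecimal floats, so a production loader may want stricter checks.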
instead of passing an implicit sparse matrix around in 3 separate vectors
Example: in TSNE::computeGaussianPerplexity, the three inner loops computing cur_P, sum_P, and H may get consolidated and vectorized.
All vectors should use aligned allocators, and manual SSE / AVX code should use aligned memory load instructions.
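The consolidation could look roughly like the sketch below. It assumes the three loops have the usual t-SNE form (a Gaussian kernel value per neighbor, its sum, and the entropy term); the loop bodies are not copied from the project, and RowStats and gaussianRow are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical result type for one row of the perplexity search.
struct RowStats
{
    double sumP;     // sum of the unnormalized kernel values
    double entropy;  // H for the current beta
};

// Hypothetical helper: one pass that fills curP and accumulates both
// reductions, instead of three separate loops over the same data.
RowStats gaussianRow(const std::vector<double>& squaredDistances, double beta,
                     std::vector<double>& curP)
{
    curP.resize(squaredDistances.size());
    // Assumption: start from the smallest positive double so the division
    // below is safe even if every kernel value underflows to zero.
    double sumP = std::numeric_limits<double>::min();
    double weighted = 0.0;
    for (std::size_t m = 0; m < squaredDistances.size(); ++m) {
        const double p = std::exp(-beta * squaredDistances[m]);
        curP[m] = p;
        sumP += p;
        weighted += beta * squaredDistances[m] * p;
    }
    return {sumP, weighted / sumP + std::log(sumP)};
}
```

Compilers usually do not vectorize floating-point reductions like sumP on their own, because that reorders the additions; relaxed floating-point settings or explicit SIMD code would be needed for that part.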
add command-line option to write outData to disk on every n-th iteration, so "data development" can be seen
To compute the new m_centerOfMass, a number of potentially precision-losing operations are performed.
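One commonly used alternative is the running-mean update, which adjusts the current mean by a single correction term per dimension. The sketch below assumes m_centerOfMass is an incrementally updated mean of the inserted points; the CenterOfMass type is hypothetical, and whether this form measurably helps here would need to be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the node state that holds the center of mass.
struct CenterOfMass
{
    std::vector<double> mean;
    std::size_t count = 0;

    explicit CenterOfMass(std::size_t dimensions) : mean(dimensions, 0.0) {}

    // Running-mean update: mean += (x - mean) / count.
    // Only one rounded correction is added per dimension and point.
    void add(const std::vector<double>& point)
    {
        assert(point.size() == mean.size());
        ++count;
        const double n = static_cast<double>(count);
        for (std::size_t d = 0; d < mean.size(); ++d) {
            mean[d] += (point[d] - mean[d]) / n;
        }
    }
};
```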
figure out how the algorithm can be parallelized and implement that
Example: TSNE::computeGaussianPerplexity while initializing obj_X
write tests, so mistakes are easily caught when refactoring and optimizing