Comments (8)
What is the size of the dataset you feed the algorithm? And what value of theta do you use when you run Barnes-Hut t-SNE?
from bhtsne.
I used sample sizes of 2500 and 7000. I use theta = 0 for the exact algorithm, and tried theta = 1e-3 and 1e-7 for Barnes-Hut t-SNE.
Try larger values for theta. Theta determines how many branches of the quadtree are being pruned; larger values of theta imply more pruning, which, in turn, implies faster speed and slightly less accurate results.
In most of my experiments, I found theta = 0.5 to be a good trade-off between speed and quality.
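To make the role of theta concrete, here is a minimal sketch of a typical Barnes-Hut pruning test. It assumes the common criterion "cell size divided by distance to the cell's center of mass is below theta"; the function name `can_summarize` and its arguments are hypothetical, and the exact test inside bhtsne may differ in detail.

```python
import math


def can_summarize(cell_width, point, cell_center_of_mass, theta):
    """Return True if a tree cell may be treated as a single point.

    Assumed criterion (a common Barnes-Hut form, not necessarily the
    exact one used in bhtsne): cell_width / distance < theta.
    """
    distance = math.dist(point, cell_center_of_mass)
    if distance == 0.0:
        # The point sits on the center of mass: never summarize.
        return False
    return cell_width / distance < theta


# A cell of width 1.0 whose center of mass is 10 units away.
for theta in (0.0, 1e-7, 1e-3, 0.5, 1.0):
    pruned = can_summarize(1.0, (0.0, 0.0), (10.0, 0.0), theta)
    print(f"theta={theta}: summarize cell = {pruned}")
```

Under this assumed criterion, theta = 0 never prunes, and very small values such as 1e-3 or 1e-7 prune only cells that are extremely small or extremely far away, so almost the whole tree is still traversed.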
Thanks for the suggestion! I just tried it out and found that with 2500 samples of dimension 102, the exact algorithm (theta = 0) took 9.44 sec to run 50 iterations.
For BH t-SNE, I tried theta = 0.5 and theta = 1, and they took ~65 sec and ~70 sec for 50 iterations, respectively. Is this unusual?
If it is, there might be something wrong with my experiment.
Yeah, that is quite unexpected. Does your data have duplicates (or near-duplicates) by any chance?
I am not sure whether my data has near-duplicates, but I tried MNIST and got similar results.
With N = 2500, theta = 0 takes ~9 sec per 50 iterations, and theta = 0.5 takes 68 sec per 50 iterations.
With N = 10000, theta = 0 takes 150 sec per 50 iterations, and theta = 0.5 takes 281 sec per 50 iterations.
I cannot try N = 20000+ right now since my PC does not have enough computing power.
Is there a chance that the computing power of the PC affects the result?
Thank you so very much for answering all my questions. It's very helpful to me!
The exact numbers may vary per CPU, but I see no reason why they would vary as much as you describe in your post. Are you sure you compiled the bhtsne binary in optimized mode (-O2 / -O3) rather than in debug mode?
Running the MNIST example with N=2500 from the README using the Python wrapper, I see the following timings on my Macbook Pro (3.5 GHz Intel Core i7):
>>> import time
>>> import numpy as np
>>> import bhtsne
>>> data = np.loadtxt("mnist2500_X.txt", skiprows=1)
>>> start_time = time.time(); embedding_array = bhtsne.run_bh_tsne(data, initial_dims=data.shape[1], theta=0.5); end_time = time.time();
>>> print(end_time - start_time)
24.6498529911
>>> start_time = time.time(); embedding_array = bhtsne.run_bh_tsne(data, initial_dims=data.shape[1], theta=0.0); end_time = time.time();
>>> print(end_time - start_time)
68.8492469788
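For repeated comparisons like the one above, a small timing helper can cut down on copy-paste. This is only a sketch: `time_call` is a hypothetical name, and the commented usage assumes the `bhtsne.run_bh_tsne` call exactly as it appears in this thread.

```python
import time


def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic clock, suited to timing
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical usage with the wrapper call shown in this thread:
#
#   import numpy as np
#   import bhtsne
#   data = np.loadtxt("mnist2500_X.txt", skiprows=1)
#   for theta in (0.5, 0.0):
#       _, seconds = time_call(bhtsne.run_bh_tsne, data,
#                              initial_dims=data.shape[1], theta=theta)
#       print(theta, seconds)
```

If theta = 0.5 comes out slower than theta = 0 on the same data, that is a hint to check the build flags, as discussed in this thread.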
Sorry for the late reply. I finally found out that I had indeed compiled the code in debug mode instead of optimized mode, and that is exactly why theta = 0.5 was much slower than theta = 0. After switching to optimized mode, both settings run at the expected speeds. I cannot thank you enough for your help!