lvdmaaten commented on June 9, 2024

What is the size of the dataset you feed the algorithm? And what value of theta do you use when you run Barnes-Hut t-SNE?

from bhtsne.

k8280627 commented on June 9, 2024

I fed N = 2500 and N = 7000 samples. I used theta = 0 for exact t-SNE and tried theta = 1e-3 and theta = 1e-7 for Barnes-Hut t-SNE.

lvdmaaten commented on June 9, 2024

Try larger values for theta. Theta determines how many branches of the quadtree are being pruned; larger values of theta imply more pruning, which, in turn, implies faster speed and slightly less accurate results.

In most of my experiments, I found theta = 0.5 to be a good trade-off between speed and quality.
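To make the pruning concrete, here is a minimal Python sketch of the Barnes-Hut opening criterion on a 2-D quadtree. It is not bhtsne's C++ code: `QuadTree`, `build_tree` and `count_interactions` are hypothetical names, and the rule used here (summarize a cell by its center of mass when cell width / distance < theta) is the standard Barnes-Hut criterion, assumed rather than copied from the library. It only counts how many force terms one point needs, as a rough proxy for the cost of an iteration.

```python
import numpy as np


class QuadTree:
    """Minimal 2-D quadtree; every cell stores its point count and center of mass."""

    def __init__(self, points, center, half_width, depth=0, max_depth=32):
        self.half_width = half_width
        self.n = len(points)
        self.com = points.mean(axis=0)
        self.children = []
        # Split until each leaf holds a single point; max_depth guards against duplicates.
        if self.n > 1 and depth < max_depth:
            right = points[:, 0] >= center[0]
            top = points[:, 1] >= center[1]
            for sx, in_x in ((1.0, right), (-1.0, ~right)):
                for sy, in_y in ((1.0, top), (-1.0, ~top)):
                    mask = in_x & in_y
                    if mask.any():
                        child_center = center + 0.5 * half_width * np.array([sx, sy])
                        self.children.append(QuadTree(points[mask], child_center,
                                                      0.5 * half_width, depth + 1, max_depth))


def build_tree(points):
    """Root cell: a square that encloses all 2-D points."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    half_width = 0.5 * float((hi - lo).max()) + 1e-9
    return QuadTree(points, 0.5 * (lo + hi), half_width)


def count_interactions(node, point, theta):
    """Number of (cell or point) force terms evaluated for `point` at a given theta.

    A non-leaf cell is summarized by its center of mass when width / distance < theta.
    With theta = 0 no cell is ever summarized, so every leaf is visited (exact t-SNE).
    """
    dist = float(np.linalg.norm(point - node.com))
    if not node.children or (dist > 0 and 2.0 * node.half_width / dist < theta):
        return 1
    return sum(count_interactions(child, point, theta) for child in node.children)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(2000, 2))  # stand-in for a 2-D embedding
    tree = build_tree(Y)
    for theta in (0.0, 0.5, 1.0):
        avg = np.mean([count_interactions(tree, y, theta) for y in Y[:10]])
        print(f"theta={theta}: {avg:.0f} force terms per point")
```

With theta = 0 every point is visited, which is the cost of the exact algorithm plus the overhead of walking the tree; very small values such as 1e-3 or 1e-7 should therefore give little or no speed-up over the exact code.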

k8280627 commented on June 9, 2024

Thanks for the suggestion! I just tried it out and found that with 2500 samples of dimension 102, the exact algorithm (theta = 0) took 9.44 sec to run 50 iterations.

For BH t-SNE, I tried theta = 0.5 and theta = 1, which took ~65 sec and ~70 sec, respectively, for 50 iterations. Is this unusual?

If so, there might be something wrong with my experiment.

lvdmaaten commented on June 9, 2024

Yeah, that is quite unexpected. Does your data have duplicates (or near-duplicates) by any chance?
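One quick way to answer the duplicate question is sketched below in Python with NumPy. It assumes the data fits in memory as an N x D array and that N is a few thousand at most, since it builds a dense N x N distance matrix. `duplicate_report` and its `near_tol` threshold are hypothetical; `near_tol` should be chosen well above floating-point noise for the scale of your data.

```python
import numpy as np


def duplicate_report(X, near_tol):
    """Return (exact duplicate rows, rows with a neighbour within near_tol).

    Uses a dense N x N distance matrix, so it is only meant for a few thousand rows.
    """
    X = np.asarray(X, dtype=float)
    # Rows beyond the first copy of each distinct row.
    n_exact = X.shape[0] - np.unique(X, axis=0).shape[0]
    # Squared Euclidean distances via |a|^2 + |b|^2 - 2 a.b, clipped at 0 for round-off.
    sq = np.einsum("ij,ij->i", X, X)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, np.inf)  # ignore each row's distance to itself
    nn_dist = np.sqrt(d2.min(axis=1))
    n_near = int(np.count_nonzero(nn_dist <= near_tol))
    return int(n_exact), n_near
```

A nonzero second count means some points have a (near-)identical neighbour; inspecting the smallest values of `nn_dist` is a reasonable next step before picking a threshold.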

k8280627 commented on June 9, 2024

I am not sure whether my data has near-duplicates, but I tried MNIST and got similar results.

With N = 2500, theta = 0 takes ~9 sec per 50 iterations and theta = 0.5 takes 68 sec per 50 iterations.

With N = 10000, theta = 0 takes 150 sec per 50 iterations and theta = 0.5 takes 281 sec per 50 iterations.
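The scaling of these numbers can be checked with a little arithmetic, assuming the textbook per-iteration costs of O(N^2) for exact t-SNE and O(N log N) for Barnes-Hut t-SNE (constant factors and the input-similarity stage are ignored). The timings are the ones reported just above.

```python
import math

# Seconds per 50 iterations, as reported above.
exact = {2500: 9.0, 10000: 150.0}  # theta = 0
bh = {2500: 68.0, 10000: 281.0}    # theta = 0.5

n1, n2 = 2500, 10000
pred_exact = (n2 / n1) ** 2                          # O(N^2) growth factor
pred_bh = (n2 * math.log(n2)) / (n1 * math.log(n1))  # O(N log N) growth factor
obs_exact = exact[n2] / exact[n1]
obs_bh = bh[n2] / bh[n1]

print(f"exact: observed x{obs_exact:.1f}, O(N^2) predicts x{pred_exact:.1f}")
print(f"BH:    observed x{obs_bh:.1f}, O(N log N) predicts x{pred_bh:.1f}")
```

Both runs grow roughly as their asymptotic costs predict (about 16x for the exact code, about 4 to 5x for Barnes-Hut), so the Barnes-Hut algorithm scales as expected; what looks off is its constant factor, which points at how the code was built or run rather than at the algorithm.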

I could not try N = 20000+ right now since my PC does not have enough computing power.

Is there any chance that my PC's computing power could affect the results?

Thank you so much for answering all my questions. It's very helpful to me!

lvdmaaten commented on June 9, 2024

The exact numbers may vary per CPU, but I see no reason why they would vary as much as you describe in your post. Are you sure you compiled the bhtsne binary in optimized mode (-O2 / -O3) rather than in debug mode?

Running the MNIST example with N=2500 from the README using the Python wrapper, I see the following timings on my Macbook Pro (3.5 GHz Intel Core i7):

>>> import time
>>> import numpy as np
>>> import bhtsne
>>> data = np.loadtxt("mnist2500_X.txt", skiprows=1)
>>> start_time = time.time(); embedding_array = bhtsne.run_bh_tsne(data, initial_dims=data.shape[1], theta=0.5); end_time = time.time();
>>> print(end_time - start_time)
24.6498529911
>>> start_time = time.time(); embedding_array = bhtsne.run_bh_tsne(data, initial_dims=data.shape[1], theta=0.0); end_time = time.time();
>>> print(end_time - start_time)
68.8492469788
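A slightly more reusable version of this timing is sketched below. `time_call` is a hypothetical helper; it uses `time.perf_counter`, which is intended for measuring intervals, and keeps the best of several repeats to reduce noise. The commented usage assumes the repository's `bhtsne.py` wrapper is importable and a compiled `bh_tsne` binary is available, exactly as in the session above.

```python
import time


def time_call(fn, *args, repeats=1, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return (last result, best wall time in s)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock for intervals
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


# Usage with the wrapper from the session above (requires the compiled binary):
# import numpy as np, bhtsne
# data = np.loadtxt("mnist2500_X.txt", skiprows=1)
# for theta in (0.5, 0.0):
#     _, secs = time_call(bhtsne.run_bh_tsne, data, initial_dims=data.shape[1], theta=theta)
#     print(f"theta={theta}: {secs:.1f} s")
```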

k8280627 commented on June 9, 2024

Sorry for the late reply. I finally found that I had indeed compiled the code in debug mode instead of optimized mode, which is exactly why theta = 0.5 was much slower than theta = 0. After recompiling in optimized mode, both settings run at the expected speeds. I cannot thank you enough for your help!
