Comments (7)
I think it's a great idea... maybe we can add it as an option?
Also, in addition to the adaptive gradient descent, another reason why calling it with max_iter=1 in a loop gives different results is the early exaggeration.
I was careful to specify the exaggeration correctly, so I think the only difference must have been due to the adaptive gradient descent...
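Why a loop of single-iteration calls need not reproduce one long run can be seen in a toy sketch. This is not FIt-SNE's optimizer, only plain gradient descent with momentum on f(y) = y^2 / 2. The hypothetical `run` function starts every call with zero velocity, assuming (as with a fresh launch of the binary) that optimizer state is re-initialized on each call. Adaptive gains would be reset the same way, and an iteration counter restarting at 0 would re-enter the early-exaggeration phase unless the schedule is passed explicitly.

```python
def run(y0: float, n_iter: int, lr: float = 0.1, momentum: float = 0.8) -> float:
    """Toy momentum gradient descent on f(y) = y**2 / 2, whose gradient is y.

    The velocity always starts at zero, mimicking optimizer state that is
    re-initialized every time the optimizer is launched.
    """
    y, velocity = y0, 0.0
    for _ in range(n_iter):
        velocity = momentum * velocity - lr * y
        y += velocity
    return y


# One continuous run of 10 iterations.
continuous = run(1.0, 10)

# Ten separate calls with n_iter=1, each resuming from the previous output.
looped = 1.0
for _ in range(10):
    looped = run(looped, 1)

# With a fresh (zero) velocity, each single-step call is plain gradient descent,
# so the loop yields 0.9**10, while the continuous run carries momentum
# from step to step and ends up somewhere else.
print(continuous, looped)
```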
Anyway, an optional output sounds right. I noticed that you recently added another optional output, so we should make sure that the two optional outputs work correctly together. Or maybe combine them? If some input flag is on, then all optional outputs are returned, and if not, none are? Or would you prefer a separate flag for each optional output?
I think you're referring to the R wrapper, correct? If so, then yes, it now optionally returns the KL divergence computed every 50 iterations. I would have preferred that the output always be a list (i.e. so that the cost could also be returned), but since people have already started using the old interface, I was hesitant to change the default and break people's code.
Anyway, I think it would be most intuitive to have a separate flag for each: a 'get_costs' flag, as we currently have, and (for example) an 'intermediate_iterations' flag. If both are false, then the embedding matrix Y is returned (the default); if either of them is true, then a list is returned.
Do you think that is a good solution?
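The separate-flag proposal amounts to a small dispatch on the two flags. Below is a Python sketch of that logic (the thread is about the R wrapper, which would return a list where this returns a dict). Only `get_costs` and `intermediate_iterations` come from the discussion; `package_output` and the keys `Y`, `costs`, `iterations` are hypothetical names for illustration.

```python
from typing import Any, Dict, List, Optional, Union

Matrix = List[List[float]]


def package_output(
    Y: Matrix,
    costs: Optional[List[float]] = None,
    iterations: Optional[List[Matrix]] = None,
    get_costs: bool = False,
    intermediate_iterations: bool = False,
) -> Union[Matrix, Dict[str, Any]]:
    """Return Y alone by default, or a dict once any optional output is requested."""
    if not (get_costs or intermediate_iterations):
        return Y  # backward-compatible default: just the embedding
    result: Dict[str, Any] = {"Y": Y}
    if get_costs:
        result["costs"] = costs
    if intermediate_iterations:
        result["iterations"] = iterations
    return result


Y = [[0.0, 1.0], [2.0, 3.0]]
assert package_output(Y) is Y  # existing callers keep getting the bare matrix
```

Keeping the bare matrix as the default is what avoids breaking code written against the old interface; only callers who opt in see the new return type.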
OK.
Currently the C++ code always saves the costs, but I'm not sure that's a good idea for the gradient descent history: e.g. for 1 million points, the output would be 1 million x 2 x 1000 values, which is pretty large...
I was originally thinking of animating smaller datasets :)
@dkobak If this is still relevant, I have released a Python-only version of FIt-SNE, which was built with interactivity in mind (since we are integrating it into Orange). I've included a callback system that lets you look at the embedding at each step of the optimization and animate it. I played around with this in the Orange widgets and the animations look really neat, but that hasn't been merged into master yet. If you need a quick fix, you can use it here.
I definitely think this would be a great addition to have here, but I am not that familiar with C++ and I don't know how a callback system would work.
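For readers unfamiliar with the idea: a callback system means the optimizer calls a user-supplied function every few iterations and hands it the current embedding, which can then be stored as an animation frame. A toy Python sketch, using a stand-in optimizer rather than FIt-SNE's or fastTSNE's actual code; `optimize`, `callback`, and `callback_every` are hypothetical names.

```python
from typing import Callable, List, Optional

# Hypothetical callback signature: (iteration, embedding) -> True to stop early.
Callback = Callable[[int, List[float]], Optional[bool]]


def optimize(y: List[float], n_iter: int, lr: float = 0.1,
             callback: Optional[Callback] = None,
             callback_every: int = 1) -> List[float]:
    """Toy gradient descent on sum(y_i**2) / 2 that reports progress via a callback."""
    y = list(y)
    for it in range(1, n_iter + 1):
        y = [yi - lr * yi for yi in y]  # the gradient of y_i**2 / 2 is y_i
        if callback is not None and it % callback_every == 0:
            # Pass a copy so the callback cannot mutate the optimizer's state.
            if callback(it, list(y)):
                break
    return y


# Collect a snapshot every 5 iterations, e.g. to render animation frames later.
frames = []
optimize([1.0, -2.0], n_iter=20,
         callback=lambda it, emb: frames.append((it, emb)), callback_every=5)
print([it for it, _ in frames])  # [5, 10, 15, 20]
```

This pattern works naturally when the optimizer runs in the same process as the caller; it is much harder when a wrapper only launches a separate binary.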
@pavlin-policar I don't think there is a way to set up a callback system. My current thinking is that we should pass a boolean flag save_intermediate_iterations into the C++ code. If it's set to True, then the code should write each iteration into a special file, e.g. intermediate_iterations.txt (without storing all of them in RAM). Then one can read this file from Python/R/Matlab and make an animation.
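The core of the proposal is to stream each iteration to disk as soon as it is computed, so only the current one lives in memory. A minimal Python sketch of that pattern, assuming a raw binary layout of float64 values (the 8 bytes per coordinate used in the size estimate that follows) rather than the text file named above. `append_iteration`, `read_iterations`, and the `.bin` file name are hypothetical; in the actual tool the writing side would live in the C++ code and only the reading side in the wrappers.

```python
import array
import os
import tempfile
from typing import BinaryIO, List, Sequence


def append_iteration(f: BinaryIO, Y: Sequence[Sequence[float]]) -> None:
    """Append one N x d embedding to an open binary file as float64, row-major.

    Only the current iteration is held in memory; earlier ones are already on disk.
    """
    array.array("d", (v for row in Y for v in row)).tofile(f)


def read_iterations(path: str, n_points: int, n_dims: int = 2) -> List[array.array]:
    """Read the file back as a list of flat arrays, one per saved iteration."""
    per_iter = n_points * n_dims
    data = array.array("d")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    if len(data) % per_iter:
        raise ValueError("file size is not a whole number of iterations")
    return [data[i:i + per_iter] for i in range(0, len(data), per_iter)]


# Usage: write three fake iterations of a 4-point embedding, then read them back.
path = os.path.join(tempfile.mkdtemp(), "intermediate_iterations.bin")
with open(path, "wb") as f:
    for it in range(3):
        append_iteration(f, [[it, 0.5 * p] for p in range(4)])

frames = read_iterations(path, n_points=4)
print(len(frames), list(frames[2][:2]))  # 3 [2.0, 0.0]
```

A raw binary layout keeps the file at exactly points x dimensions x 8 bytes per iteration; a text file would typically be larger and slower to parse.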
For 25k points, the final output is 25k*2*8/1e+6 = 0.4 MB, so 1000 iterations would be 400 MB, which is very manageable. For 1 million points, the final output is 16 MB, so 1000 iterations would be 16 GB. That's a large file, but it can still be processed if needed.
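These estimates are just points x dimensions x bytes per value x iterations, assuming 2D output stored as 8-byte doubles. A quick check in Python, with decimal units (1 MB = 10^6 bytes) as in the estimate; `history_size_bytes` is a hypothetical helper.

```python
def history_size_bytes(n_points: int, n_iter: int, n_dims: int = 2,
                       bytes_per_value: int = 8) -> int:
    """Bytes needed to store every iteration of an n_points x n_dims embedding."""
    return n_points * n_dims * bytes_per_value * n_iter


for n in (25_000, 1_000_000):
    one = history_size_bytes(n, 1)
    full = history_size_bytes(n, 1000)
    print(f"{n:>9,} points: {one / 1e6:g} MB per iteration, "
          f"{full / 1e9:g} GB for 1000 iterations")
```

Passing `bytes_per_value=4` gives the corresponding figures for single-precision floats.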
@linqiaozhi Thoughts?
I could try to implement it some time in the next few weeks.
@pavlin-policar Wow, thanks a lot for the link to your fastTSNE package. Great work! I might leave some comments over there.
@pavlin-policar A callback system for visualizing FIt-SNE in real time is super cool... I wish that were possible with our C++ code, but I can't imagine it working, since the wrappers are just calling a binary right now.
@dkobak I think your approach is very reasonable. I would only suggest that we output floats instead of doubles, so that each element would typically be 4 bytes instead of 8, which would halve the file size (we don't need that much precision for the visualization anyway). There are more sophisticated things that could be done (e.g. only outputting a random subset of the points, or only specific iterations), but I don't think they're necessary at this point; this is a function that people will use only in very specific situations (e.g. diagnostics), so using some disk space and spending some time on I/O is probably okay, at least for a first implementation. Thanks for being willing to implement it!
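The float suggestion and the optional refinements mentioned here (a random subset of points, only some iterations) combine naturally. A Python sketch under illustrative assumptions: `snapshot_float32`, `should_save`, and the choice of 100 points and every 10th iteration are hypothetical, not anything agreed in the thread, and the embedding is held fixed because only the storage cost is being shown.

```python
import array
import random
from typing import Sequence


def snapshot_float32(Y: Sequence[Sequence[float]], keep: Sequence[int]) -> bytes:
    """Serialize the rows listed in `keep` as float32 (4 bytes per value instead of 8)."""
    return array.array("f", (v for i in keep for v in Y[i])).tobytes()


def should_save(iteration: int, every: int) -> bool:
    """Save only every `every`-th iteration."""
    return iteration % every == 0


rng = random.Random(0)
n_points, n_iter, every = 1000, 1000, 10
Y = [[rng.random(), rng.random()] for _ in range(n_points)]

# Fix one random subset of points up front so the same points appear in every frame.
keep = sorted(rng.sample(range(n_points), 100))

saved = [snapshot_float32(Y, keep) for it in range(n_iter) if should_save(it, every)]
print(len(saved), len(saved[0]))  # 100 800
```

Sampling the subset once, rather than per frame, keeps the animation coherent, since each point can be tracked from frame to frame.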
Related Issues (20)
- Passing perplexity_list in R?
- dyld: Library not loaded: @rpath/libfftw3.3.dylib with conda FFTW
- Confusing instructions for Windows installation
- Always get an error about version when using this in SEURAT 3.1.5
- Using no AVX instructions
- Installation error in OSX
- Error in pip install fitsne
- Could not run in Google Colab
- need to require rsvd package in R wrapper
- Is the PyPI version up to date?
- Memory Allocation Failed - Large Datasets
- FFT supports only 2 components
- results file (results_date_seed-.dat) is not created
- compile fitsne error
- Segmentation fault sometime
- Windows installation -
- Windows executable for Version 1.2.0 not compiling
- How to install KlugerLab / FIt-SNE for Seurat usage in Windows 10 system
- Recommended Parameters for > 3 million cells
- Error while compiling FIt-SNE v1.1.0 or 1.0.0 solved Fedora 31 5.6.8-200 while using with Seurat 3.1