tsnejs's Introduction

tSNEJS

tSNEJS is an implementation of the t-SNE visualization algorithm in JavaScript.

t-SNE is a visualization algorithm that embeds things in 2 or 3 dimensions. If you have some data and you can measure their pairwise differences, t-SNE visualization can help you identify clusters in your data. See example below.

Online demo

The main project website has a live example and more description.

There is also the t-SNE CSV demo that allows you to simply paste CSV data into a textbox and tSNEJS computes and visualizes the embedding on the fly (no coding needed).

Research Paper

The algorithm was originally described in this paper:

L.J.P. van der Maaten and G.E. Hinton.
Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research
9(Nov):2579-2605, 2008.

You can find the PDF here.

Example code

Import tsne.js into your document with <script src="tsne.js"></script>. Then use it as in this example:

var opt = {};
opt.epsilon = 10; // epsilon is learning rate (10 = default)
opt.perplexity = 30; // roughly how many neighbors each point influences (30 = default)
opt.dim = 2; // dimensionality of the embedding (2 = default)

var tsne = new tsnejs.tSNE(opt); // create a tSNE instance

// initialize data. Here we have 3 points and some example pairwise dissimilarities
var dists = [[1.0, 0.1, 0.2], [0.1, 1.0, 0.3], [0.2, 0.1, 1.0]];
tsne.initDataDist(dists);

for(var k = 0; k < 500; k++) {
  tsne.step(); // every time you call this, solution gets better
}

var Y = tsne.getSolution(); // Y is an array of 2-D points that you can plot

The data can be passed to tSNEJS as a set of high-dimensional points using the tsne.initDataRaw(X) function, where X is an array of arrays (high-dimensional points that need to be embedded). The algorithm computes the Gaussian kernel over these points and then finds the appropriate embedding.
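
For the initDataDist route, the issue report further down this page suggests the input is an ordinary pairwise distance matrix: symmetric, with zeros on the diagonal. The following is a minimal sketch of building such a matrix from raw points. pairwiseDistances is a hypothetical helper (not part of tSNEJS), Euclidean distance is an assumed choice, and the tsnejs calls only run if tsne.js has been loaded.

```javascript
// Hypothetical helper, not part of tSNEJS: builds a symmetric Euclidean
// distance matrix with zeros on the diagonal.
function pairwiseDistances(X) {
  var n = X.length;
  var D = [];
  for (var i = 0; i < n; i++) {
    D.push([]);
    for (var j = 0; j < n; j++) D[i].push(0);
  }
  for (var a = 0; a < n; a++) {
    for (var b = a + 1; b < n; b++) {
      var sum = 0;
      for (var d = 0; d < X[a].length; d++) {
        var diff = X[a][d] - X[b][d];
        sum += diff * diff;
      }
      D[a][b] = D[b][a] = Math.sqrt(sum); // fill both halves at once
    }
  }
  return D;
}

var X = [[0, 0], [3, 4], [6, 8]];
var D = pairwiseDistances(X); // [[0, 5, 10], [5, 0, 5], [10, 5, 0]]

// Only the calls shown in the example above are used here.
if (typeof tsnejs !== 'undefined') {
  var tsne = new tsnejs.tSNE({ dim: 2 });
  tsne.initDataDist(D);
  for (var k = 0; k < 500; k++) tsne.step();
  var Y = tsne.getSolution();
}
```

Filling D[a][b] and D[b][a] in the same assignment guarantees symmetry, which a hand-typed matrix does not.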

Web Demos

There are two web interfaces to this library that we are aware of:

  • By Andrej, here.
  • By Laurens, here, which takes data in different format and can also use Google Spreadsheet input.

About

Send questions to @karpathy.

License

MIT

tsnejs's People

Contributors

domluna, karpathy, piotrgrudzien


tsnejs's Issues

Remove max cells condition

Hi,

How can I remove the max cell condition to run the t-SNE code on a larger data set?

Thanks in advance for your help.

Node Package

Hello,
thank you for creating this library and maintaining it!
I have some questions for you, if possible.
Is there a node package available for installing the library?
If not:
Will there ever be a node package to install this library?

Thank you for your time

Early exaggeration

Super late but I think there is a small bug with how early exaggeration is implemented. The code has:

var premult = 4 * (pmul * P[i*N+j] - Q[i*N+j]) * Qu[i*N+j]; 

whereas I think it should be

var premult = 4 * pmul * (P[i*N+j] - Q[i*N+j]) * Qu[i*N+j];

I.e. the early exaggeration factor pmul should multiply the overall gradient rather than P alone.

The t-SNE authors describe early exaggeration as scaling P for the first few iterations. Since P is a constant, you'd think you could achieve the effect just by scaling P in the gradient. This is what the code does.

But at the same time, since the loss (-(P * Q.log()).sum()) is multiplied by P, scaling P should also scale the overall gradient. Now I'm confused.

In Appendix A of the original paper, the authors assume the sum of P is 1. But under early exaggeration the sum is pmul. In this case, the q_ij term in the gradient becomes pmul * q_ij. So really the overall gradient should be scaled by pmul.

Not sure if this "fix" should be implemented, since the code has worked reliably for years. But thought it was worth noting.
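
The question can be checked numerically. The sketch below compares both expressions against a finite-difference gradient of the loss with P scaled by pmul, on a tiny 1-D embedding. It rests on several assumptions: Qu is taken to be the unnormalized Student-t weight 1 / (1 + d^2), the loss is the KL form sum(P * log(P / Q)), and all function names and the toy data are made up for illustration. It only shows which expression matches the exact derivative of that loss. It says nothing about which variant gives better embeddings.

```javascript
// Student-t weights w_ij = 1 / (1 + (y_i - y_j)^2) and their sum Z.
function weights(y) {
  var W = [], Z = 0;
  for (var i = 0; i < y.length; i++) {
    W.push([]);
    for (var j = 0; j < y.length; j++) {
      var d = y[i] - y[j];
      var w = i === j ? 0 : 1 / (1 + d * d);
      W[i].push(w);
      Z += w;
    }
  }
  return { W: W, Z: Z };
}

// KL-style loss with every P_ij multiplied by pmul (early exaggeration).
function loss(P, y, pmul) {
  var t = weights(y), c = 0;
  for (var i = 0; i < y.length; i++) {
    for (var j = 0; j < y.length; j++) {
      if (i === j) continue;
      var p = pmul * P[i][j];
      c += p * Math.log(p / (t.W[i][j] / t.Z));
    }
  }
  return c;
}

// Gradient w.r.t. y[i]. exaggerateAll = false mirrors the quoted code,
// exaggerateAll = true mirrors the proposed change.
function grad(P, y, pmul, i, exaggerateAll) {
  var t = weights(y), g = 0;
  for (var j = 0; j < y.length; j++) {
    if (j === i) continue;
    var q = t.W[i][j] / t.Z;
    var premult = exaggerateAll
      ? 4 * pmul * (P[i][j] - q) * t.W[i][j]
      : 4 * (pmul * P[i][j] - q) * t.W[i][j];
    g += premult * (y[i] - y[j]);
  }
  return g;
}

// Central finite difference of the loss w.r.t. y[i].
function numericGrad(P, y, pmul, i) {
  var h = 1e-5, yp = y.slice(), ym = y.slice();
  yp[i] += h;
  ym[i] -= h;
  return (loss(P, yp, pmul) - loss(P, ym, pmul)) / (2 * h);
}

var P = [[0, 0.2, 0.1], [0.2, 0, 0.2], [0.1, 0.2, 0]]; // symmetric, sums to 1
var y = [0, 1, 3];
var pmul = 4;
var numeric = numericGrad(P, y, pmul, 0);
var asQuoted = grad(P, y, pmul, 0, false);
var asProposed = grad(P, y, pmul, 0, true);
console.log(numeric, asQuoted, asProposed);
```

On this toy input the proposed form agrees with the finite-difference value and the quoted form does not, which is consistent with the Appendix A argument above: once the entries of P sum to pmul, the q_ij term picks up the same factor.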

Trouble with NaNs after step function updates

Some background: I'm trying to visualize my Spotify playlists but I'm having some trouble getting going here.

My data consists of 267 songs, each song has 10 features. Here's a sample (ignore the artist and title fields).

{
"artist":"Drake",
"audio_summary.acousticness":0.016128527,
"audio_summary.danceability":0.3236382,
"audio_summary.energy":0.8417243,
"audio_summary.key":7,
"audio_summary.liveness":0.13018084,
"audio_summary.loudness":-5.548,
"audio_summary.mode":1,
"audio_summary.speechiness":0.0,
"audio_summary.tempo":98.39,
"audio_summary.time_signature":5,
"title":"Over"
}

I'm passing the data as a 267 element array, each element is a 10 element array. I'm using initDataRaw to initialize but I've tried both init methods.

The problem is even after just one call to the step function, getSolution returns [NaN, NaN].

I had this problem originally, but switching from initDataDist to initDataRaw seemed to avoid the NaN. The visualization I got was off, though. I wish I had taken a picture, because I'm having trouble reproducing it due to the NaN issues, but essentially the songs spread out along a diagonal line, as if the data was being compressed to 1D.

I thought maybe the issue was that some fields have values much larger than others, tempo for example. So I normalized all the features, and then came the NaN problem. The weird thing is that even my old non-normalized data is giving me NaNs now!

Any ideas of what I'm doing wrong? Tips for getting the data setup in general (avoiding NaNs)?

Thanks!
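
The thread does not record what caused the NaNs. One common way normalization introduces them is dividing by a zero standard deviation when a feature is constant (speechiness is 0.0 in the sample, although whether it is constant across all 267 songs is not stated). The sketch below is a hedged pre-flight check, not part of tSNEJS: findNonFinite and standardize are hypothetical helpers, and apart from the tempo value 98.39 the toy rows are invented.

```javascript
// Return [row, column] of the first entry that is not a finite number,
// or null if every entry is fine.
function findNonFinite(X) {
  for (var i = 0; i < X.length; i++) {
    for (var j = 0; j < X[i].length; j++) {
      if (typeof X[i][j] !== 'number' || !isFinite(X[i][j])) return [i, j];
    }
  }
  return null;
}

// Z-score each column. A zero-variance column is mapped to all zeros
// instead of dividing by zero, which would produce NaN.
function standardize(X) {
  var n = X.length, dims = X[0].length;
  var out = X.map(function (row) { return row.slice(); });
  for (var j = 0; j < dims; j++) {
    var mean = 0, variance = 0, i;
    for (i = 0; i < n; i++) mean += X[i][j];
    mean /= n;
    for (i = 0; i < n; i++) variance += (X[i][j] - mean) * (X[i][j] - mean);
    var std = Math.sqrt(variance / n);
    for (i = 0; i < n; i++) out[i][j] = std > 0 ? (X[i][j] - mean) / std : 0;
  }
  return out;
}

// Column 0 is constant (like a feature that is 0.0 everywhere), column 1 varies.
var songs = [[0.0, 98.39], [0.0, 120.0], [0.0, 141.61]];
var scaled = standardize(songs);
var bad = findNonFinite(scaled); // null: safe to hand to the library
```

Running findNonFinite on the array just before it is passed to initDataRaw or initDataDist at least separates bad input from problems inside the optimization itself.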

"if(gainid < 0.01) gainid = 0.01; // clamp" has no effect (line 257)

As of today, on line 257 of tsne.js, the line "if(gainid < 0.01) gainid = 0.01; // clamp" has no effect, and the intention of this code is unknown, insofar as I don't know what the algorithm should do here.

Should newgain be clamped? or should gainid be clamped prior to the multiply by 0.8, in line 256?
var gainid goes unused for the rest of the loop.
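
To make the two readings concrete, here is a minimal sketch outside the library. The function names and the sameSign parameter are hypothetical. The gainid + 0.2 branch is an assumption based on the usual t-SNE gain rule, since the report only mentions the multiply by 0.8. It is also assumed that newgain is the value stored afterwards.

```javascript
// As reported: the clamp tests the old gain, so the value that is
// returned (newgain) is never clamped.
function gainAsReported(gainid, sameSign) {
  var newgain = sameSign ? gainid * 0.8 : gainid + 0.2;
  if (gainid < 0.01) gainid = 0.01; // has no effect on newgain
  return newgain;
}

// One possible reading of the intent: clamp the updated gain instead.
function gainClamped(gainid, sameSign) {
  var newgain = sameSign ? gainid * 0.8 : gainid + 0.2;
  if (newgain < 0.01) newgain = 0.01; // lower bound on the stored gain
  return newgain;
}

console.log(gainAsReported(0.011, true)); // falls below 0.01
console.log(gainClamped(0.011, true));    // held at 0.01
```

Clamping newgain keeps the stored gain from decaying toward zero after many consecutive multiplications by 0.8, which is presumably what a lower bound is for.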

documentation & questions

Thanks a lot for the very nice library, which I discovered with distill.pub/2016/misread-tsne/

Here's a list of questions and suggestions:

  1. README: I struggled to make the library work because I wrongly inferred from the example in README.md that the "distance" or "dissimilarity" matrix had to have 1.0 on the diagonal, and not 0.0 as it would with distances. (Also, the "example" matrix isn't symmetric.)
    I finally understood that it was really a pairwise distance matrix, and fed it geodesic distances to make http://bl.ocks.org/Fil/b07d09162377827f1b3e266c43de6d2a

  2. Web Worker. tsne tries to attach itself to window, which prevents using it in a web worker (as in http://bl.ocks.org/Fil/e402e9c51ce77c21baedc2d1af933bc3 , which I made with https://github.com/scienceai/tsne-js ). This is probably a simple fix.

  3. Learning: Is it possible to expose the model and use the generated mapping to project points that were not given initially?

  4. Online: is it possible to augment a trained model with new data?

  5. Seeding: can we seed the model with initial positions?
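
For item 2, one possible shape of the fix is to resolve the global object at load time instead of assuming window exists. This is a sketch only: getGlobalScope and exposeLibrary are hypothetical names, and the object attached below is a stand-in rather than the real library.

```javascript
// Pick whichever global object exists: globalThis (modern engines),
// self (web workers), or window (browser pages).
function getGlobalScope() {
  if (typeof globalThis !== 'undefined') return globalThis;
  if (typeof self !== 'undefined') return self;
  if (typeof window !== 'undefined') return window;
  throw new Error('no global object found');
}

// Attach a library object under `name`. `scope` can be passed explicitly,
// which keeps the function testable without touching real globals.
function exposeLibrary(name, lib, scope) {
  var target = scope || getGlobalScope();
  target[name] = lib;
  return target;
}

// Stand-in object so the sketch runs on its own.
var fakeScope = {};
exposeLibrary('tsnejs', { tSNE: function () {} }, fakeScope);
console.log(typeof fakeScope.tsnejs.tSNE); // "function"
```

The same entry point could also assign to module.exports when it is defined, which would bear on the Node package question raised above, but that is a design choice for the maintainers.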
