Comments (13)
FFI = Foreign function interface
I think there exist several FFIs for Python; I've been using ctypes in the past.
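For readers new to ctypes, the sketch below shows what an FFI call looks like. Compiled C code receives pointers to in-memory buffers directly, so no data.dat/result.dat files are needed. It assumes a POSIX system where ctypes.CDLL(None) exposes the C library. It uses libc's memcpy purely as a stand-in for a compiled routine. bhtsne does not expose such a function here, and copy_doubles is a hypothetical helper name.

```python
import ctypes

# Assumption: POSIX system; CDLL(None) loads the symbols already linked into
# the running Python process, which include the C library's memcpy.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes converts the arguments correctly:
# void *memcpy(void *dst, const void *src, size_t n)
DoublePtr = ctypes.POINTER(ctypes.c_double)
libc.memcpy.argtypes = [DoublePtr, DoublePtr, ctypes.c_size_t]
libc.memcpy.restype = ctypes.c_void_p


def copy_doubles(values):
    """Hand a buffer of doubles to C code and read the result back, no files."""
    n = len(values)
    src = (ctypes.c_double * n)(*values)  # C array filled from Python floats
    dst = (ctypes.c_double * n)()         # zero-initialised C array
    libc.memcpy(dst, src, n * ctypes.sizeof(ctypes.c_double))
    return list(dst)


print(copy_doubles([1.0, 2.5, -3.0]))  # [1.0, 2.5, -3.0]
```

A real FFI-based wrapper would declare the t-SNE entry point's argtypes the same way and pass the data matrix as a double pointer instead of writing it to disk.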
from bhtsne.
result.dat is created in a temporary folder. In your case it tries to create it in /var.
Check if your user has permission to create folders in /var
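The same check can be done from Python. The minimal sketch below reports which directory Python's tempfile module would use and whether the current user can create entries in it. temp_dir_status is a hypothetical helper, not part of bhtsne.

```python
import os
import tempfile


def temp_dir_status(path=None):
    """Return (directory, writable) for a directory, defaulting to tempfile's.

    With path=None this checks tempfile.gettempdir(), which honours the
    TMPDIR, TEMP and TMP environment variables (on macOS, TMPDIR usually
    points somewhere under /var/folders).
    """
    directory = path if path is not None else tempfile.gettempdir()
    # os.access checks the current user's permissions: W_OK to create
    # entries, X_OK to be allowed to enter the directory at all.
    writable = os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
    return directory, writable


if __name__ == "__main__":
    print(temp_dir_status())
```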
from bhtsne.
@gpapadop79 thanks for the info. The input matrix is large (95 rows and 745544 columns). Could running out of memory cause this problem? And since the folder is a temporary one, how can I check whether I have permission to create folders there?
Thanks in advance
from bhtsne.
Is the t-SNE algorithm itself actually being run? Like do you see a loss being printed every 50 iterations or something like that? The result file should actually be really small (95x2 matrix), so I would be surprised if this were an OOM problem.
To check permissions, you can do something like ls -la /var/folders/92/8ty0c6392m773r5tbp4s9gy80000gp/T/tmpakdFT0 and confirm that there is a w for the three sets of users. Alternatively, try running this with sudo to see if that helps.
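The permission bits that ls -la shows can also be read from Python with the standard library. This is a sketch in which write_bits is a hypothetical helper name. It returns the ls-style mode string and whether the owner, group, and other classes have the w bit.

```python
import os
import stat
import tempfile


def write_bits(path):
    """Return the ls -l style mode string and which user classes may write."""
    mode = os.stat(path).st_mode
    return {
        "mode": stat.filemode(mode),        # same format as ls -l, e.g. 'drwx------'
        "owner": bool(mode & stat.S_IWUSR),  # w in the user triplet
        "group": bool(mode & stat.S_IWGRP),  # w in the group triplet
        "other": bool(mode & stat.S_IWOTH),  # w in the other triplet
    }


if __name__ == "__main__":
    print(write_bits(tempfile.gettempdir()))
```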
@rohit-gupta Perhaps it makes sense to have an input option to specify the folder for intermediate results? I am not a Python user, so I am not familiar with the exact behavior of mkdtemp().
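On the mkdtemp() question: tempfile.mkdtemp(suffix=None, prefix=None, dir=None) creates a new directory that is readable, writable and searchable only by the creating user. It uses the system temp location unless dir is given. The following is a sketch of how such an input option could look. The --tmpdir flag and the make_workdir/parse_args helpers are hypothetical and are not options that bhtsne.py is known to provide.

```python
import argparse
import tempfile


def make_workdir(base_dir=None):
    """Create a private folder for data.dat/result.dat.

    base_dir=None falls back to tempfile's default location (TMPDIR etc.);
    passing a directory puts the intermediate files there instead.
    """
    return tempfile.mkdtemp(prefix="bhtsne_", dir=base_dir)


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag, shown only to illustrate the suggested option.
    parser.add_argument("--tmpdir", default=None,
                        help="folder for intermediate files (default: system temp)")
    return parser.parse_args(argv)
```

For example, make_workdir(parse_args(["--tmpdir", "/scratch"]).tmpdir) would create the working folder under /scratch instead of under /var.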
from bhtsne.
The t-SNE algorithm doesn't run; it directly shows the error: IOError: [Errno 2] No such file or directory: '/var/folders/92/8ty0c6392m773r5tbp4s9gy80000gp/T/tmpllxm8j/result.dat'. I think the problem is that the input matrix is so large. I will try it on another machine.
from bhtsne.
Can you please copy-paste the full output? Does the data.dat file get written by the Python wrapper?
from bhtsne.
Here is the full output
~/Projects/bhtsne (master*) $ python bhtsne.py -i ~/Projects/tsne_python/lan_uid_matrix_tsne1.txt -o ~/Dropbox/github/data/lan_uid_coordinate.txt -p 5 -d 2 -t 1 -v
Error: could not open data file.
Traceback (most recent call last):
  File "bhtsne.py", line 233, in <module>
    exit(main(argv))
  File "bhtsne.py", line 224, in main
    verbose=argp.verbose, initial_dims=argp.initial_dims, use_pca=argp.use_pca, max_iter=argp.max_iter):
  File "bhtsne.py", line 211, in run_bh_tsne
    for result in bh_tsne(tmp_dir_path, verbose):
  File "bhtsne.py", line 164, in bh_tsne
    with open(path_join(workdir, 'result.dat'), 'rb') as output_file:
IOError: [Errno 2] No such file or directory: '/var/folders/92/8ty0c6392m773r5tbp4s9gy80000gp/T/tmpllxm8j/result.dat'
The problem is that it cannot find the intermediate file result.dat.
from bhtsne.
The way the code works is: (1) Python wrapper writes data.dat, (2) binary runs t-SNE on data.dat, (3) binary writes results into result.dat, and (4) Python wrapper reads result.dat. Therefore, we first need to determine in which step things go wrong. The output suggests the problem is actually in step 1. Can you confirm by checking whether or not data.dat gets written?
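The check described above can be scripted. This is a minimal sketch in which diagnose is a hypothetical helper. It looks at the wrapper's working folder and reports which of the four steps most likely failed. It has to run before the wrapper cleans up the temporary folder, otherwise both files are gone anyway.

```python
import os


def diagnose(workdir):
    """Guess which step of the data.dat -> binary -> result.dat chain failed."""
    has_data = os.path.isfile(os.path.join(workdir, "data.dat"))
    has_result = os.path.isfile(os.path.join(workdir, "result.dat"))
    if not has_data:
        return "step 1: the wrapper never wrote data.dat"
    if not has_result:
        return "step 2 or 3: the binary did not produce result.dat"
    return "steps 1-3 look fine; check how the wrapper reads result.dat (step 4)"
```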
from bhtsne.
The file data.dat has been written, but there is no result.dat. I think the problem happens in step 3.
from bhtsne.
I ran the data on another computer and hit the same problem. Here is the output:
Traceback (most recent call last):
  File "bhtsne.py", line 233, in <module>
    exit(main(argv))
  File "bhtsne.py", line 224, in main
    verbose=argp.verbose, initial_dims=argp.initial_dims, use_pca=argp.use_pca, max_iter=argp.max_iter):
  File "bhtsne.py", line 206, in run_bh_tsne
    init_bh_tsne(input_file, tmp_dir_path, no_dims=no_dims, perplexity=perplexity, theta=theta, randseed=randseed,verbose=verbose, initial_dims=initial_dims, use_pca=use_pca, max_iter=max_iter)
  File "bhtsne.py", line 118, in init_bh_tsne
    cov_x = np.dot(np.transpose(samples), samples)
MemoryError
Error: could not open data file.
Traceback (most recent call last):
  File "bhtsne.py", line 233, in <module>
    exit(main(argv))
  File "bhtsne.py", line 224, in main
    verbose=argp.verbose, initial_dims=argp.initial_dims, use_pca=argp.use_pca, max_iter=argp.max_iter):
  File "bhtsne.py", line 211, in run_bh_tsne
    for result in bh_tsne(tmp_dir_path, verbose):
  File "bhtsne.py", line 164, in bh_tsne
    with open(path_join(workdir, 'result.dat'), 'rb') as output_file:
IOError: [Errno 2] No such file or directory: '/tmp/tmpiXA4u_/result.dat'
It still cannot find result.dat, and it also reports a MemoryError.
This time, the data.dat file was not written.
I think the problem is that the Python wrapper does not write data.dat.
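The MemoryError in the second traceback comes from line 118 of init_bh_tsne, cov_x = np.dot(np.transpose(samples), samples). That call forms a D x D matrix. With D = 745544 columns stored as float64, that matrix alone needs roughly 4.4 TB regardless of the 95 rows, so it is no surprise that data.dat never gets written. When N is much smaller than D, the same projection onto the top principal components can be computed from the N x N Gram matrix instead (sometimes called the snapshot method). The sketch below swaps in that technique. It is not what bhtsne.py does, and covariance_bytes and pca_via_gram are hypothetical helper names. It assumes the data fits in memory as a dense float64 NumPy array.

```python
import numpy as np


def covariance_bytes(n_cols, itemsize=8):
    """Bytes needed for the D x D matrix X^T X (float64 by default)."""
    return n_cols * n_cols * itemsize


def pca_via_gram(X, k):
    """Project X (N x D) onto its top-k principal components.

    Uses the N x N Gram matrix Xc Xc^T instead of the D x D matrix Xc^T Xc,
    which is much cheaper when N << D.
    """
    Xc = X - X.mean(axis=0)                   # centre each column
    gram = Xc @ Xc.T                          # N x N, never D x D
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest
    lam = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative round-off
    # If Xc = U S V^T, then Xc V_k = U_k S_k and the Gram eigenvalues are S^2.
    return eigvecs[:, order] * np.sqrt(lam)


print(covariance_bytes(745544) / 1e12)  # about 4.45 (terabytes)
```

For the 95 x 745544 input, the centred copy takes about 0.57 GB and the Gram matrix is only 95 x 95. The resulting components match the covariance-based projection up to the sign of each column.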
from bhtsne.
@lvdmaaten, just wanted to ask: for those of us using the wrapper from within another Python program, we do not expect to see a data.dat file, since the data is fed in via the call to run_bh_tsne, correct?
from bhtsne.
The Python wrapper writes a data.dat file and then calls the bh_tsne binary. The binary writes the results to a result.dat file, which the wrapper reads in. Afterwards, the wrapper deletes both .dat files. So you would expect to see a data.dat file whilst the binary is running.
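The file handoff described above can be sketched in pure Python. Everything here is a stand-in. The file layout (an int count followed by float64 values) is hypothetical and is not bhtsne's real data.dat format. fake_binary only doubles the numbers instead of running t-SNE. All helper names are hypothetical. The sketch still shows why a failure in step 2 or 3 surfaces as a missing result.dat in step 4.

```python
import os
import shutil
import struct
import tempfile


def write_doubles(path, values):
    # Hypothetical layout: little-endian int32 count, then float64 values.
    # This is NOT bhtsne's actual data.dat format.
    with open(path, "wb") as f:
        f.write(struct.pack("<i", len(values)))
        f.write(struct.pack("<%dd" % len(values), *values))


def read_doubles(path):
    with open(path, "rb") as f:
        (n,) = struct.unpack("<i", f.read(4))
        return list(struct.unpack("<%dd" % n, f.read(8 * n)))


def fake_binary(workdir):
    """Stand-in for the compiled program: reads data.dat, writes result.dat."""
    values = read_doubles(os.path.join(workdir, "data.dat"))
    write_doubles(os.path.join(workdir, "result.dat"), [2.0 * v for v in values])


def run_pipeline(values, binary=fake_binary):
    workdir = tempfile.mkdtemp()
    try:
        write_doubles(os.path.join(workdir, "data.dat"), values)  # step 1
        binary(workdir)                                           # steps 2-3
        return read_doubles(os.path.join(workdir, "result.dat"))  # step 4
    finally:
        shutil.rmtree(workdir)  # both .dat files disappear with the folder
```

If the binary exits without writing its output, the read in step 4 raises FileNotFoundError (IOError with Errno 2 on Python 2), which is the same symptom as in the tracebacks in this thread.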
The whole thing is pretty clunky... I've been meaning to change this to an FFI call, but I haven't got around to doing that yet.
from bhtsne.
Understood. I guess I'm not seeing a data.dat then, but it's likely related to the numpy crash I'm experiencing on OS X...
Edit: After installing OpenBLAS and compiling numpy, the wrapper works (except inside a Jupyter notebook, but that's an understood limitation, I think).
One last thing, @lvdmaaten: by FFI, do you mean https://cffi.readthedocs.io/en/latest/overview.html ?
from bhtsne.
Related Issues (20)
- Document the "gains"
- There is no module called bhtsne.run_bh_tsne ???
- Help ME! thanks!
- Usage of random generator(s) in the source
- How can i visualize the image data like this?
- bhtsne.py:135: ComplexWarning: Casting complex values to real discards the imaginary part
- Butterfly effect
- Can not use the python wrapper in Windows
- transposition based on input method
- Why is the exact algorithm 10 times faster?
- Dimension problem
- Can't compile the .exe with visual studio 9.0
- Pytorch version?
- python wrapper - Cost for each sample
- Performance difference to the old version
- C API
- Bhtsne for large datasets
- Performance difference Windows/Ubuntu
- t-SNE for Java/Scala/Kotlin/Clojure
- Is there a rule of thumb for the lower bound on the perplexity?