Comments (9)
Merge 6f0ca80 introduces the first version of the Python wrapper. Currently it supports dense data with double and single precision.
The performance on the Python random data benchmark is within roughly 10% of the C++ random data benchmark. The main slow-down seems to be the candidate comparison, where the wrapped code takes roughly 10–20% more time, at least on my laptop. It is not clear why this is the case, but it might be because we are not using "standard" Eigen vectors but Eigen's Map functionality.
The performance issue was fixed in 5a69217. It was related to passing Eigen vectors / memory maps correctly: http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes.html
Do we check the dtype of the numpy array we pass? It would be nice to do so, if that's not the case yet.
We are not doing it on the C++ side (the C++ side doesn't really see the numpy object). The SWIG-generated numpy wrapper might do it automatically (SWIG does it for "normal" objects, at least). Definitely something we should test, though.
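Until we know what the SWIG typemaps check, a Python-side guard is cheap. A minimal sketch, assuming the C++ side accepts only 2-D, C-contiguous (row-major) float32/float64 arrays; `check_dataset` and `SUPPORTED_DTYPES` are hypothetical names, not part of the wrapper:

```python
import numpy as np

# Element types the wrapped C++ code is assumed to accept
# (double and single precision, as in the first version of the wrapper).
SUPPORTED_DTYPES = (np.float32, np.float64)


def check_dataset(data):
    """Validate a dataset before its buffer is handed to the C++ side.

    Raises an explicit Python exception instead of letting the C++ code
    reinterpret memory with the wrong element type or layout.
    """
    if not isinstance(data, np.ndarray):
        raise TypeError("dataset must be a numpy.ndarray, got %s" % type(data).__name__)
    if data.dtype.type not in SUPPORTED_DTYPES:
        raise TypeError("dataset dtype must be float32 or float64, got %s" % data.dtype)
    if data.ndim != 2:
        raise ValueError("dataset must be 2-D (points x dimensions), got %d-D" % data.ndim)
    # Assumption: the C++ side maps the buffer as row-major without copying.
    if not data.flags["C_CONTIGUOUS"]:
        raise ValueError("dataset must be C-contiguous; try np.ascontiguousarray(data)")
    return data
```

Whatever the SWIG wrapper turns out to do, a check like this fails early with a readable message rather than inside the C++ code.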
Another thing: we must not forget to write IN CAPS: DON'T LET YOUR NUMPY ARRAY GET GC'ED! Otherwise, the C++ internals crash silently, which could be very confusing.
Would it be possible to enforce this, or to throw a more explicit exception if the memory is corrupt?
Agreed! (It's a bold item in the to-do list above.)
A check on the C++ side would be great, but I don't know how to check whether a given float or double pointer is valid. I did a quick search, and it's not clear that this is possible.
Another workaround would be to write a thin wrapper on the Python side that stores a reference to the numpy array together with the LSH table. Then the array won't get garbage-collected accidentally.
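A minimal sketch of that workaround. `TableWithData` and its `build_table` argument are hypothetical; `build_table` stands in for the SWIG-generated constructor, which is assumed to keep only a raw pointer into the array's buffer:

```python
import numpy as np


class TableWithData:
    """Hypothetical thin wrapper pairing an LSH table with its dataset.

    `build_table` stands in for the SWIG-generated constructor, which is
    assumed to keep only a raw pointer into the array's memory.
    """

    def __init__(self, data, build_table):
        # Strong Python reference: the array (and its buffer) cannot be
        # garbage-collected while this wrapper is alive.
        self._data = data
        self._table = build_table(data)

    @property
    def data(self):
        return self._data

    @property
    def table(self):
        return self._table


# Usage: the caller does not need to keep its own variable for the array.
def fake_build(data):
    return ("table over", data.shape)  # stand-in for the real C++ table


wrapper = TableWithData(np.random.rand(1000, 16), fake_build)
print(wrapper.table)
```

Because the wrapper holds the reference, the buffer is released only when the wrapper itself goes away, which is exactly when the C++ table stops needing it.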
I just added the helper functions in 60cf7e4.
So for the first version of the Python wrapper, only the documentation (the GitHub wiki and the GloVe example) should be left now.
Should we have a thin wrapper around whatever SWIG produces? It would serve two purposes:
- detect the dtype automatically and call the appropriate version of the construction function;
- hold a reference to the dataset so that it does not get garbage-collected;
- (optionally) provide a flag for re-centering.
I think it would substantially improve the user experience.
I can draft the first version, which we can review together.
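A rough sketch of the dtype dispatch, assuming there is one SWIG-generated constructor per precision. The real constructor names are not known here, so the sketch takes them as a `builders` mapping; `construct_table` and `builders` are hypothetical names:

```python
import types

import numpy as np


def construct_table(data, builders):
    """Draft of a thin wrapper: pick the constructor by dtype, keep the data alive.

    `builders` maps numpy scalar types (np.float32, np.float64) to the
    SWIG-generated constructor for that precision; those constructors are
    placeholders here, since the real wrapped names may differ.
    """
    data = np.asarray(data)  # no copy if `data` is already an ndarray
    try:
        build = builders[data.dtype.type]
    except KeyError:
        supported = ", ".join(sorted(t.__name__ for t in builders))
        raise TypeError("unsupported dtype %s (supported: %s)" % (data.dtype, supported)) from None
    # Return table and dataset together so one object owns both references.
    return types.SimpleNamespace(table=build(data), data=data)
```

Keying on `data.dtype.type` (the scalar class) rather than the `dtype` object keeps the dictionary lookup exact, and returning both handles in one object covers the garbage-collection point without any extra machinery.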
Yes, I still think that a thin wrapper would be a good idea, especially for the dtype issue. It might be good to keep it as "thin" as possible (small additional functionality) so that the C++ and Python versions don't diverge too much.
Maybe we should only handle re-centering as part of issue #7. That issue is more about "implicit" re-centering (no additional copy of the data), but it would be good to clearly separate the two (e.g., by stating that construct_table never copies the data set and offering a separate pre-processing function that does copy the data?).
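One way to keep that separation, sketched under the assumption that re-centering means subtracting the per-coordinate mean of the data set. `center_dataset` is a hypothetical name; it always returns a new array, so a `construct_table`-style function can keep its no-copy guarantee:

```python
import numpy as np


def center_dataset(data):
    """Return a re-centered copy of `data` and the center that was subtracted.

    Always allocates a new array, so the input is never modified; this keeps
    copying out of table construction. For float32/float64 input the dtype
    is preserved.
    """
    data = np.asarray(data)
    center = data.mean(axis=0)   # per-coordinate mean of the data set
    centered = data - center     # new array; `data` is left untouched
    return centered, center


points = np.random.rand(1000, 16).astype(np.float32)
centered, center = center_dataset(points)
query = np.random.rand(16).astype(np.float32)
centered_query = query - center  # queries must be shifted by the same center
```

The caller then passes `centered` to table construction and remembers `center`, since every query has to be shifted the same way before lookup.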
Related Issues (20)
- How to return more points?
- windows install falconn failure
- Comment. Application to associative memory
- How to save the hash table?
- Calculating similarity
- website certificate expired
- The similarity Values
- Not returning specified number of neighbors
- Better explanation regarding windows
- something error in params.feature_hashing_dimension parameters
- Unable to install python through repository
- The header-only library seems broken
- Distance returned is more than 1 and it's also negated when using EuclideanSquared on normalized vectors
- Package is not be able to install inside python docker image
- Detailed steps for running code
- Why can get two vectors' similarity after random projection?
- Compile error: error: unknown register name ‘%ymm14’ in ‘asm’
- How to get similar candidate?
- FALCONN KNN alternative to save as pickle?
- The website does not load