
Write a Python wrapper (falconn, CLOSED)

falconn-lib avatar falconn-lib commented on May 17, 2024
Write a Python wrapper

from falconn.

Comments (9)

ludwigschmidt avatar ludwigschmidt commented on May 17, 2024

Merge 6f0ca80 introduces the first version of the Python wrapper. Currently it supports dense data with double and single precision.

The performance on the Python random data benchmark is within roughly 10% of the C++ random data benchmark. The main slow-down seems to be the candidate comparison, where the wrapped code takes roughly 10-20% more time, at least on my laptop. It is not clear why this is the case, but it might be because we are using Eigen's Map functionality rather than "standard" Eigen vectors.


ludwigschmidt avatar ludwigschmidt commented on May 17, 2024

The performance issue was fixed in 5a69217. The issue was related to passing Eigen vectors / memory maps correctly: http://eigen.tuxfamily.org/dox/TopicFunctionTakingEigenTypes.html


ilyaraz avatar ilyaraz commented on May 17, 2024

Do we check the dtype of the numpy array we pass? It would be nice to do so, if we don't already.


ludwigschmidt avatar ludwigschmidt commented on May 17, 2024

We are not doing it on the C++ side (the C++ side doesn't really see the numpy object). The swig-generated numpy wrapper might do it automatically (swig does it for "normal" objects at least). Definitely something we should test though.
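A Python-side check could look roughly like the sketch below. It uses only numpy; the function name `check_dataset` and the exact set of conditions are assumptions made for illustration, not part of the FALCONN wrapper.

```python
import numpy as np

# dtypes the thread says the wrapper supports (single and double precision).
_SUPPORTED_DTYPES = (np.dtype(np.float32), np.dtype(np.float64))


def check_dataset(dataset):
    """Hypothetical helper: validate an array before native code reads its buffer."""
    if not isinstance(dataset, np.ndarray):
        raise TypeError("dataset must be a numpy.ndarray")
    if dataset.ndim != 2:
        raise ValueError("dataset must be two-dimensional (points x dimensions)")
    if dataset.dtype not in _SUPPORTED_DTYPES:
        raise TypeError(
            "dataset dtype must be float32 or float64, got %s" % dataset.dtype)
    # Assumption: the native side expects a dense row-major buffer.
    if not dataset.flags["C_CONTIGUOUS"]:
        raise ValueError("dataset must be C-contiguous")
    return dataset
```

Raising early in Python gives the user a readable error instead of whatever the generated binding does with an unexpected buffer.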


ilyaraz avatar ilyaraz commented on May 17, 2024

Another thing: let's not forget to write IN CAPS: DON'T LET YOUR NUMPY ARRAY GET GC'ED! Otherwise, the C++ internals crash silently and that could be very confusing.

Would it be possible to enforce this or throw a more explicit exception if the memory is corrupt?


ludwigschmidt avatar ludwigschmidt commented on May 17, 2024

Agreed! (it's a bold item in the to-do list above)

A check on the C++ side would be great, but I don't know how to check if a given float or double pointer is valid. I did a quick search and it's not clear if this is possible.

Another workaround would be to write a thin wrapper on the Python side that stores a reference to the numpy array with the LSH table. Then things won't get garbage collected accidentally.
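The reference-holding workaround might be sketched as follows. `TableHandle` and its attributes are hypothetical names, and `native_table` stands in for whatever object the swig layer returns; nothing here is the actual FALCONN API.

```python
import numpy as np


class TableHandle:
    """Hypothetical wrapper pairing a native table with the array it reads from."""

    def __init__(self, native_table, dataset):
        self._native_table = native_table
        # The strong reference keeps the array's buffer alive for as long
        # as this handle (and therefore the native table) is alive.
        self._dataset = dataset

    def __getattr__(self, name):
        # Only called when normal lookup fails; forward public names to the
        # native object and refuse private ones to avoid recursive lookups.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._native_table, name)
```

As long as the user keeps the handle, dropping their own variable for the dataset no longer frees the memory the C++ side points to.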


ludwigschmidt avatar ludwigschmidt commented on May 17, 2024

I just added the helper functions in 60cf7e4.

So for the first version of the Python wrapper, only documentation (github wiki and the glove example) should be left now.


ilyaraz avatar ilyaraz commented on May 17, 2024

Should we have a thin wrapper around whatever swig produces? It would serve a few purposes:

  • detect dtype automatically and call the appropriate version of the construction function
  • hold a reference to the dataset so that it does not get garbage-collected
  • (optionally) we can provide a flag for re-centering

I think it would substantially improve the user experience.

I can draft the first version, which we can review together.
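For the dtype bullet, one possible shape is a small dispatch table keyed on the array's dtype. The two `_construct_*` functions below are placeholders that return a tag instead of calling native code; the real swig-generated function names are not given in this thread.

```python
import numpy as np


def _construct_float(dataset):
    # Placeholder for the single-precision native constructor (assumed name).
    return ("float", dataset)


def _construct_double(dataset):
    # Placeholder for the double-precision native constructor (assumed name).
    return ("double", dataset)


_CONSTRUCTORS = {
    np.dtype(np.float32): _construct_float,
    np.dtype(np.float64): _construct_double,
}


def construct_for_dtype(dataset):
    """Pick the constructor matching the dataset's dtype, or fail loudly."""
    try:
        constructor = _CONSTRUCTORS[dataset.dtype]
    except KeyError:
        raise TypeError("unsupported dtype: %s" % dataset.dtype)
    return constructor(dataset)
```

The dispatch deliberately does not cast unsupported dtypes, since a silent conversion would create a copy that the caller does not hold a reference to.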


ludwigschmidt avatar ludwigschmidt commented on May 17, 2024

Yes, I still think that a thin wrapper would be a good idea, especially for the dtype issue. It might be good to keep it as "thin" as possible (small additional functionality) so that the C++ and Python versions don't diverge too much.

Maybe we should only do re-centering with issue #7. That issue is more for "implicit" re-centering (no additional copy of the data), but it would be good to clearly separate the two (e.g., by saying that construct_table never copies the data set and offering a separate function for pre-processing that copies the data?)
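The separation suggested here could be sketched like this: an explicit pre-processing function that returns a centered copy, leaving table construction free to never copy. `center_dataset` is a hypothetical name, not an existing FALCONN function.

```python
import numpy as np


def center_dataset(dataset):
    """Hypothetical pre-processing step: return a centered copy and the center.

    The input array is left untouched; the subtraction allocates a new array.
    """
    # Accumulate the mean in double precision, then match the input dtype
    # so that float32 data stays float32.
    center = dataset.mean(axis=0, dtype=np.float64).astype(dataset.dtype)
    centered = dataset - center
    return centered, center
```

With explicit centering of this kind, queries need the same `center` subtracted before lookup, which is why the function returns it.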

