
Comments (6)

LukeMathWalker commented on July 30, 2024

I would definitely be interested in adding it, first of all as a variant of the nearest neighbors algorithm (similar to LSHForest in scikit-learn).

A couple of questions concerning SIMD, though: how much of it leaks into the public interface? In other words, do I, as a user of the algorithm, have to be aware of SIMD-specific data types when using it? Can that implementation complexity be hidden from me?

from linfa.

vadixidav commented on July 30, 2024

@LukeMathWalker SIMD types are used in the public API as the vectors for nearest neighbor mostly because of alignment. By using the SIMD types I can ensure that Rust always places them at the correct alignment in vectors, in structs, and on the stack. This could be hidden from the user, but I would then need to handle alignment manually in the code. I could create a wrapper type that takes an array and builds the SIMD representation internally as needed.
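To illustrate the wrapper idea, here is a minimal sketch using only the standard library. `AlignedBits` and its `hamming` method are hypothetical names made up for this example and are not part of the hnsw or packed_simd APIs; the 64-byte width is also an assumption. It shows how `#[repr(align(...))]` lets a plain array be accepted at the boundary while the compiler guarantees the alignment internally.

```rust
/// Hypothetical wrapper (not the hnsw API): 64 bytes of bit features
/// that the compiler always stores at a 64-byte boundary.
#[repr(C, align(64))]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AlignedBits(pub [u8; 64]);

impl From<[u8; 64]> for AlignedBits {
    /// Accept a plain array so callers never name an aligned or SIMD type.
    fn from(bytes: [u8; 64]) -> Self {
        AlignedBits(bytes)
    }
}

impl AlignedBits {
    /// Hamming distance: the number of bit positions in which the two values differ.
    /// A real implementation could use SIMD here; this scalar loop is only a stand-in.
    pub fn hamming(&self, other: &Self) -> u32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}
```

With a type like this, a caller would write `let v: AlignedBits = raw_array.into();` and the alignment requirement holds whether the value lives on the stack, in a struct, or in a `Vec<AlignedBits>`.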


LukeMathWalker commented on July 30, 2024

That would be ideal @vadixidav - I'd like to avoid each model forcing the user to learn a different input type. It would significantly complicate the usage of the overall crate.


vadixidav commented on July 30, 2024

Alright, I am focused on some other open source work at the moment, but I will come back to this and simplify that API. Technically it could be used easily right now at the cost of a speed penalty for not using SIMD, but I can write wrappers so that we get the speed without exposing the SIMD types.


vadixidav commented on July 30, 2024

Alright, I have done my best to make it as easy as possible to use arrays with HNSW. It now has extensive impls of From and Into, including From impls that convert from slices of u8 and f32. You still have to keep track of how many bytes or floats you have, but you no longer need to interact with SIMD types beyond adding the packed_simd crate so that you can write HNSW<Euclidean<[u8x64; 2]>>. Calling array.into() converts your array/slice types of f32 or u8 into Euclidean or Hamming types respectively.

https://docs.rs/hnsw/0.4.0/hnsw/struct.Hamming.html

If desired, I can even make type aliases to avoid needing the packed_simd crate for specifying which HNSW you want. That would not be a breaking change.
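As a rough illustration of the two ideas above (converting from a slice, and hiding the concrete type behind an alias), here is a standard-library-only sketch. `EuclideanSketch`, `squared_distance`, and `Euclidean4` are hypothetical names invented for this example; they are not the types from the hnsw crate, and the sketch uses `TryFrom` as an assumption to make the "how many floats do you have" length check explicit rather than mirroring the crate's actual `From` impls.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a SIMD-backed Euclidean feature type (not the hnsw API).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct EuclideanSketch<const N: usize>([f32; N]);

impl<const N: usize> TryFrom<&[f32]> for EuclideanSketch<N> {
    type Error = std::array::TryFromSliceError;

    /// Succeeds only when the slice holds exactly N floats.
    fn try_from(slice: &[f32]) -> Result<Self, Self::Error> {
        <[f32; N]>::try_from(slice).map(EuclideanSketch)
    }
}

impl<const N: usize> EuclideanSketch<N> {
    /// Squared Euclidean distance, computed with a plain scalar loop.
    pub fn squared_distance(&self, other: &Self) -> f32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (a - b) * (a - b))
            .sum()
    }
}

/// Hypothetical alias so callers never spell out the underlying storage type.
pub type Euclidean4 = EuclideanSketch<4>;
```

A caller would then write `Euclidean4::try_from(&values[..])` and handle the error when the slice length does not match, without importing any SIMD crate.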

Let me know if you need anything else to be done.


YuhanLiin commented on July 30, 2024

Pretty sure this issue can be closed, since we already have #119 for nearest-neighbours.

