Comments (6)
I would definitely be interested in adding it, first of all, as a variant of the nearest neighbors algorithm (similarly to LSHForest in scikit-learn).
A couple of questions though concerning SIMD: how much of it leaks into the public interface? In other words, do I, as a user of the algorithm, have to be aware of SIMD-specific data types when using it? Can that implementation complexity be hidden away from me?
from linfa.
@LukeMathWalker The reason that SIMD types are used in the public API as vectors for nearest neighbor is mostly due to alignment issues. By using the SIMD types I can ensure that Rust will always place them at the correct alignment, whether they live in vectors, in structs, or on the stack. This could be hidden from the user, but I would then need to handle alignment manually in the code. I could create a wrapper type that takes in a plain array and creates a SIMD wrapper internally as necessary.
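To make the wrapper idea above concrete, here is a minimal sketch using only the standard library. It is not the hnsw crate's code: the name `AlignedBits`, the 64-byte width, and the helper `wrapper_alignment` are assumptions chosen for illustration. It relies on `#[repr(align(64))]` to get the same placement guarantee that a SIMD type would give, while the user only ever hands over a plain `[u8; 64]`.

```rust
use std::mem::align_of;

/// Hypothetical wrapper (not part of hnsw): 64 bytes forced to 64-byte
/// alignment, so the compiler places it correctly in Vecs, structs,
/// and on the stack without exposing a SIMD type to the caller.
#[repr(C, align(64))]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AlignedBits(pub [u8; 64]);

impl From<[u8; 64]> for AlignedBits {
    /// Accept a plain array; the alignment requirement is applied here.
    fn from(bytes: [u8; 64]) -> Self {
        AlignedBits(bytes)
    }
}

impl AlignedBits {
    /// Hamming distance: the number of bits that differ between the two values.
    pub fn hamming(&self, other: &Self) -> u32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (a ^ b).count_ones())
            .sum()
    }
}

/// Alignment of the wrapper as the compiler sees it (assumed helper for the demo).
pub fn wrapper_alignment() -> usize {
    align_of::<AlignedBits>()
}
```

With something like this, a caller could write `let v: AlignedBits = my_array.into();` and never name a SIMD type. Whether the compiler auto-vectorizes the distance loop as well as explicit SIMD would is a separate question that this sketch does not answer.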
from linfa.
That would be ideal @vadixidav - I'd like to avoid each model forcing the user to learn a different input type. It would significantly complicate the usage of the overall crate.
from linfa.
Alright, I am a bit focused on some other open source stuff currently, but I will come back to this and simplify that API. Technically it could easily be used now, at the cost of a speed penalty for not using SIMD, but I can make wrappers so we get the speed without exposing the SIMD types.
from linfa.
Alright, I have done my best to make it as easy as possible to interact with arrays from HNSW. It now has extensive impls of From and Into, including From to convert from slices of u8 and f32. While you do have to worry about how many bytes or floats you have, you now no longer need to interact with SIMD types beyond adding the packed_simd crate so you can write HNSW<Euclidean<[u8x64; 2]>> and typing out array.into() to convert your array/slice types of f32 or u8 into Euclidean or Hamming types respectively.
https://docs.rs/hnsw/0.4.0/hnsw/struct.Hamming.html
If desired, I can even make type aliases to avoid needing the packed_simd crate for specifying which HNSW you want. That would not be a breaking change.
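As an illustration of the conversion pattern described above, here is a rough sketch of how `From`/`TryFrom` impls and a type alias can keep the element count visible while hiding the storage type. Everything here is hypothetical: this `Euclidean<N>` is a stand-in defined for the example, not the type from the hnsw crate, and `Euclidean4` and `squared_distance` are names made up for the demo.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a fixed-size feature vector; NOT the hnsw type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Euclidean<const N: usize>(pub [f32; N]);

impl<const N: usize> From<[f32; N]> for Euclidean<N> {
    /// Arrays of exactly N floats convert infallibly via `.into()`.
    fn from(values: [f32; N]) -> Self {
        Euclidean(values)
    }
}

impl<const N: usize> TryFrom<&[f32]> for Euclidean<N> {
    type Error = std::array::TryFromSliceError;

    /// Slices carry no length in their type, so the caller still has to
    /// supply the right number of floats; a mismatch is reported as an error.
    fn try_from(values: &[f32]) -> Result<Self, Self::Error> {
        <[f32; N]>::try_from(values).map(Euclidean)
    }
}

impl<const N: usize> Euclidean<N> {
    /// Squared Euclidean distance between two vectors of the same size.
    pub fn squared_distance(&self, other: &Self) -> f32 {
        self.0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| (a - b) * (a - b))
            .sum()
    }
}

/// Hypothetical alias, so callers do not have to spell out the parameter.
pub type Euclidean4 = Euclidean<4>;
```

A caller would then write `let q: Euclidean4 = [0.0, 0.0, 0.0, 0.0].into();` for arrays, or `Euclidean4::try_from(slice)` when the data arrives as a slice of unknown length.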
Let me know if you need anything else to be done.
from linfa.
Pretty sure this issue can be closed, since we already have #119 for nearest-neighbours.
from linfa.