Comments (15)
no plans but would love it if you want to show me how to do (or even better – send a pull request!)
from annoy.
@erikbern
http://eigen.tuxfamily.org
Eigen is one of promise linear algebra library. There is no dependencies and well tuned(SIMD, AVX, LAPACK..)
from annoy.
Doesn't seem like eigen has built-in functions for Euclidean distance and cosine distance though?
I guess I could compute three dot products using Eigen at
https://github.com/spotify/annoy/blob/master/src/annoylib.h#L119 and
https://github.com/spotify/annoy/blob/master/src/annoylib.h#L180
It seems like a bit of overkill to use Eigen only for dot products though?
from annoy.
Yes, we can get cosine similarity using Eigen's dot products and math library, it would be much faster.
I think container for feature vector could be changed by Eigen's DenceVector something similar..
https://github.com/spotify/annoy/blob/master/src/annoylib.h#L111
but, I agreed it seems like little a bit over-spec ;)
from annoy.
Eigen or other SIMD libs might be suitable for spotify but I had never used any before. I only code with specific intrinsics. Here is a code snippet to calculate l2 norm distance with avx2, hope this helps
// note that the input vectors are quantized and with fixed size
int l2_norm_avx(unsigned char * f1, unsigned char * f2) {
__m256i t = _mm256_setzero_si256();
int i;
for(i = 0; i < 128;i += 32) {
__m256i a = _mm256_loadu_si256((__m256i *) (f1 + i));
__m256i b = _mm256_loadu_si256((__m256i *) (f2 + i));
__m256i sublo = _mm256_sub_epi16(
_mm256_unpacklo_epi8(a, _mm256_setzero_si256()),
_mm256_unpacklo_epi8(b, _mm256_setzero_si256())
);
t = _mm256_add_epi32(
t,
_mm256_madd_epi16(sublo, sublo)
);
__m256i subhi = _mm256_sub_epi16(
_mm256_unpackhi_epi8(a, _mm256_setzero_si256()),
_mm256_unpackhi_epi8(b, _mm256_setzero_si256())
);
t = _mm256_add_epi32(
t,
_mm256_madd_epi16(subhi, subhi)
);
}
int temp[8];
_mm256_storeu_si256((__m256i*)temp, t);
return temp[0] + temp[1] + temp[2] + temp[3] + temp[4] + temp[5] + temp[6] + temp[7];
}
from annoy.
cool – seems very hard to read though.
i'm using -march=native for gcc which should enable it in the compiler right? i'm not sure if i need to tell the compiler to align right though
from annoy.
Compiler can optimize but it is hard to tell if it does the job right. You can try to write code like:
for(int i = 0; i < LEN; i += 4) {
data[i] = ...
data[i + 1] = ...
data[i + 2] = ...
data[i + 3] = ...
}
In my mind this might help the compiler to vectorize data(process 4 or 8 items in a loop), but I am not sure. And for checking it out, this might help: http://superuser.com/questions/726395/how-to-check-if-a-binary-requires-sse4-or-avx-on-linux
Still, a simd library would be a better choice:)
from annoy.
Saw this from kgraph btw: https://github.com/aaalgo/kgraph/blob/master/metric.cpp
from annoy.
In Eigen, just put "-lgomp -fopenmp -march=native -fPIC" will be enough for most cases, and actually, it's not that easy to optimize SIMD codes and make works for various cpu architectures and gccs. (Eigen does)
You don't need to digging SIMD instructions, Eigen is very easy to use. At the same time it's very fast.
- Cheat sheet: http://eigen.tuxfamily.org/dox/AsciiQuickReference.txt
- Benchmark: http://eigen.tuxfamily.org/index.php?title=Benchmark
If you want minimize refactoring cost, considering Map. It can map raw array data into Eigen container(Matrix, Array, Vectors..), so you can plugged-in Eigen codes into the place where you want without changing whole structures.
float raw_array[128];
Map<Matrix<float,16,8> > mm(raw_array);
from annoy.
Thanks a lot – i'll give it a shot!
What happens if the vector is not aligned?
from annoy.
I'm not sure but maybe.. it will not works, by the way, is there not aligned vector in annoy? If so, you will face to the problems..
This is documentation for alignment issues
It seems you need to replace allocator for aligned vector... and there is a way turn off checking alignment, though in that case, you don't take advantage of vectorization, it's big deal for performance.
Anyway, I look around annoylib.h a little bit, and now I'm sure Eigen would be helpful. Many functions written through for-loop, that means there are plenty of room for improvement through vectorization 😄 for example, distance function(https://github.com/spotify/annoy/blob/master/src/annoylib.h#L114) could be re-written one or two lines of code with Eigen like comment you written: a^2 / a^2 + b^2 / b^2 - 2ab/|a||b|
from annoy.
Also somewhat related: this saga of improving Annoy baseline benchmark using BLAS (vectorization + memory caches + parallelization).
Such optimizations had greater effect there than here (as brute force has to do more low-level work; memory access patterns matter more) but it will be exciting to see this stuff in Annoy! Erik is pushing Annoy to ever more greatness :)
from annoy.
I also wonder if just changing the definition of the Node class so that it's aligned against 128 bits (8 bytes) would be enough.
http://locklessinc.com/articles/vectorize/
The definition of Node is currently pretty dumb – it's 12 bytes and then the float* array
from annoy.
See #135
I ran some basic tests on my laptop and it actually seemed marginally slower (2.2s instead of 2.0s). Was only with f=100.
from annoy.
closing this as i'm pretty sure that gcc enables SIMD by default
from annoy.
Related Issues (20)
- How many trees should I use? HOT 2
- Memory Leak in Annoy (get_nns_by_vector)? HOT 8
- Annoy Object Not Pickle'able HOT 1
- Add sample weights to distance metric? HOT 3
- Source distribution not availabe for 1.17.2 version HOT 2
- Unable to inherit the AnnoyIndex class HOT 2
- doesn't work correctly if torch tensor is input. But also doesn't throw error. Pls add an assertion that this only takes np arrays not torch tensors HOT 2
- _Vector should use position-only parameter for the index HOT 3
- How do you reduce a vector to 2 coordinates HOT 1
- [Distance] What did I do wrong?
- [MSVC] Annoy failed to run test on Windows HOT 1
- Some segment faults HOT 1
- Regarding updating an existing ANNOY model HOT 2
- Anyone tried storing trees and nodes in DynamoDB? HOT 1
- Is there any workaround to be able to use the Chebyshev distance with this library? HOT 1
- from annoy import AnnoyIndex
- Annoy build failed in MSVC x86 mode
- Using a built Annoy tree in a different device HOT 1
- ? HOT 1
- problem building with python3.12 HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from annoy.