Comments (3)
I see. Thanks!
It seems there are only two options to solve this:
- Use single-threaded construction.
- Set high ef/efConstruction values, so that the search is almost exact.
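The two options above could look roughly like this with the hnswlib Python bindings. This is only a sketch: the helper name `build_reproducible_index`, the "l2" space, and the values of 400 for ef/efConstruction are assumptions chosen for illustration, not settings taken from this thread.

```python
def build_reproducible_index(data, ef_construction=400, ef=400, M=16, seed=100):
    """Build an hnswlib index on one thread so the insertion order is fixed."""
    # Imported lazily so this sketch can be loaded without hnswlib installed.
    import hnswlib
    import numpy as np

    data = np.asarray(data, dtype=np.float32)
    num_elements, dim = data.shape

    # "l2" is an assumed metric; use whichever space the real index needs.
    index = hnswlib.Index(space="l2", dim=dim)
    index.init_index(max_elements=num_elements,
                     ef_construction=ef_construction,  # higher = more exact build
                     M=M,
                     random_seed=seed)

    # Option 1: a single thread inserts the elements in a fixed order.
    index.add_items(data, np.arange(num_elements), num_threads=1)

    # Option 2: a large ef makes the search closer to exhaustive (ef must be >= k).
    index.set_ef(ef)
    return index
```

Building twice from the same array with this helper and comparing the `knn_query` results is one way to check whether the run-to-run variation goes away, at the cost of slower construction and search.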
A possible partial fix is to assign the element levels before the actual insertion (this would require updating the bindings). It would stabilize the randomness to some extent, but it would not solve the problem completely.
I think that HNSW in faiss (e.g. https://github.com/facebookresearch/faiss/blob/master/benchs/bench_hnsw.py) works that way. You can try it, although it is generally slower than hnswlib at fixed accuracy.
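The idea of fixing element levels before insertion can be sketched in plain Python. The function `assign_levels` is hypothetical and is not part of hnswlib; it only applies the usual HNSW level rule, floor(-ln(U) * 1/ln(M)), with a seeded generator and in id order, so every element gets the same level no matter which thread later inserts it.

```python
import math
import random


def assign_levels(num_elements, M=16, seed=100):
    """Draw one HNSW level per element id, in id order, from a seeded RNG."""
    rng = random.Random(seed)
    mult = 1.0 / math.log(M)  # level normalization factor, 1 / ln(M)
    levels = []
    for _ in range(num_elements):
        # 1.0 - random() lies in (0, 1], so log() never receives zero.
        u = 1.0 - rng.random()
        levels.append(int(-math.log(u) * mult))
    return levels


levels_a = assign_levels(1000)
levels_b = assign_levels(1000)
print(levels_a == levels_b)  # True: levels no longer depend on thread timing
```

This only removes one source of nondeterminism. With several threads the order in which neighbors get linked still varies, which matches the comment that the fix would not be complete.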
from hnswlib.
Hi @sumsuddin,
Can you please provide a demo script to understand what is going on?
I can't share the private data I was working on, but here is a randomly generated numpy
array that I saved to a file. I have attached the file so that you can investigate.
import numpy as np

# Generating sample data (commented out; the attached file is loaded instead)
#data = np.float32(np.random.random((num_elements, dim)))
#np.savetxt('data.txt', data)
data = np.loadtxt('data.txt')
With this specific random data (attached file), I get one of the following two recall values, varying at random between runs:
Recall for two batches: 0.99990000000000001 (this happens rarely)
Recall for two batches: 1.0 (I mostly get this one)
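To put the two values in perspective, here is a small sketch of how a recall figure like this is typically computed at k=1. The helper `recall_at_1` and the count of 10,000 queries are assumptions for illustration (the post does not state the query count); under that assumption, 0.9999 corresponds to a single wrong neighbor.

```python
def recall_at_1(returned_labels, true_labels):
    """Fraction of queries whose returned label equals the expected label."""
    hits = sum(1 for got, want in zip(returned_labels, true_labels) if got == want)
    return hits / float(len(true_labels))


n = 10000                      # assumed number of queries
truth = list(range(n))         # each element queried against itself

all_correct = list(truth)
one_miss = list(truth)
one_miss[0] = 1                # a single query returns the wrong neighbor

print(recall_at_1(all_correct, truth))  # 1.0
print(recall_at_1(one_miss, truth))     # 0.9999
```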
Increasing the number of items makes the issue more pronounced in my experiments, e.g.:
num_elements = 100000
I guess you can find easier ways to reproduce the issue.
Thanks for your time.
Python version : Python 2.7.6
OS: Ubuntu 14.04.5 LTS