Comments (9)
Interesting. Looks like something in the custom query is going wrong. There were some settings in there I wasn't 100% sure about, and I don't have a test case for this scenario, so I'll go ahead and add one to see if I can reproduce it. Feel free to post your Python script as well if that's not too difficult.
from elastiknn.
I'm able to reproduce. Working on a fix.
from elastiknn.
@flo-dhalluin I was able to fix my own repro of the issue in #159. So far it seems like I just wasn't accounting for Lucene's behavior when searching on a segment with deleted docs.
The fix is in this snapshot release: https://github.com/alexklibisz/elastiknn/releases/download/0.1.0-PRE35-PR159-SNAPSHOT/elastiknn-0.1.0-PRE35-PR159-SNAPSHOT_es7.6.2.zip
Can you try that on your end and let me know if it's working?
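For readers unfamiliar with the Lucene behavior mentioned above: within a segment, deleted docs keep their doc IDs and remain in the postings until the segment is merged, so code that walks postings directly has to check a live-docs bitset itself. The following is only a toy Python sketch of that idea, not the plugin's code. `postings`, `live_docs`, and `count_hits` are hypothetical names made up for illustration.

```python
from collections import Counter

# Toy model of one Lucene-style segment (all names here are hypothetical).
# Doc IDs run from 0 to max_doc - 1; deleted docs keep their IDs and stay in
# the postings until the segment is merged away.
max_doc = 6
live_docs = [True, True, False, True, False, True]  # docs 2 and 4 are deleted
postings = {
    "hash_a": [0, 2, 3],
    "hash_b": [2, 4, 5],
    "hash_c": [0, 3, 5],
}


def count_hits(query_hashes, postings, live_docs):
    """Count how many query hashes each live doc matches, skipping deleted docs."""
    counts = Counter()
    for h in query_hashes:
        for doc_id in postings.get(h, []):
            if live_docs[doc_id]:  # deleted docs still appear in the postings
                counts[doc_id] += 1
    return counts


hits = count_hits(["hash_a", "hash_b", "hash_c"], postings, live_docs)
num_docs = sum(live_docs)  # live doc count, smaller than max_doc after deletes
print(f"max_doc={max_doc}, num_docs={num_docs}, hits={dict(hits)}")
```

Note that the live doc count and the doc ID range diverge as soon as anything is deleted, which is easy to miss when tests only cover freshly indexed segments.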
from elastiknn.
@flo-dhalluin I went ahead and merged the changes since it was definitely a bug that needed fixing anyway. Let me know when you've had a chance to try it on your end. No rush. Thanks!
from elastiknn.
Hi, I tested with PRE36 (which I believe included the MR), and I see a different exception this time:
"Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 21665 out of bounds for length 21664",
"at com.klibisz.elastiknn.lucene.ArrayHitCounter.increment(ArrayHitCounter.java:22) ~[?:?]",
"at org.apache.lucene.search.MatchHashesAndScoreQuery$1.countHits(MatchHashesAndScoreQuery.java:67) ~[?:?]",
"at org.apache.lucene.search.MatchHashesAndScoreQuery$1.scorer(MatchHashesAndScoreQuery.java:148) ~[?:?]",
"at org.apache.lucene.search.Weight.bulkScorer(Weight.java:181) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]",
"at org.elasticsearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:244) ~[elasticsearch-7.6.2.jar:7.6.2]",
"at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:195) ~[elasticsearch-7.6.2.jar:7.6.2]",
"at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:171) ~[elasticsearch-7.6.2.jar:7.6.2]",
"at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]",
"at org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:333) ~[elasticsearch-7.6.2.jar:7.6.2]",
I am attaching the script in case you want to reproduce it (the zip file is large because it contains the sample data):
elastiknn.zip
from elastiknn.
Ah, an off-by-one. Thanks for trying it again. I'll take another look at it with your specific example tonight.
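The exception above reports index 21665 against an array of length 21664, meaning the counter array was smaller than the largest index it was asked to record. The thread does not spell out the root cause, so the following is only a guess at the general failure mode, assuming the array is indexed by doc ID: an array sized from a live doc count will overflow once deletes leave gaps in the ID range. The class below is a hypothetical Python analogue, not the actual `ArrayHitCounter`.

```python
class ArrayHitCounter:
    """Hypothetical Python analogue of an array-backed hit counter, indexed by doc ID."""

    def __init__(self, capacity):
        self.counts = [0] * capacity

    def increment(self, doc_id):
        self.counts[doc_id] += 1


max_doc = 10       # assumed: doc IDs in the segment range from 0 to 9
num_live_docs = 8  # assumed: two docs were deleted, the rest keep their IDs

# Sized by the live doc count: a perfectly valid doc ID can fall outside the array.
too_small = ArrayHitCounter(num_live_docs)
try:
    too_small.increment(9)
    overflowed = False
except IndexError:
    overflowed = True

# Sized by the full doc ID range: every doc ID has a slot.
big_enough = ArrayHitCounter(max_doc)
big_enough.increment(9)
print(overflowed, big_enough.counts[9])
```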
from elastiknn.
I'm able to repro locally. I think I know what's going on. Will hopefully have a PR in the next half hour or so.
from elastiknn.
@flo-dhalluin I made a PR. There will be a snapshot release from that PR once the CI build is done. Look for it over here: https://github.com/alexklibisz/elastiknn/releases (should have PR163 in the name). Now the script is running into another issue:
Traceback (most recent call last):
  File "index_elastiknn.py", line 87, in <module>
    id_r = exact_ids.pop()
KeyError: 'pop from an empty set'
I'm only indexing the first 5000 docs, so maybe after deleting some of the vectors it fails to find anything with score > 0.6?
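The `KeyError` itself is just Python's behavior for `set.pop()` on an empty set. If an empty result is a legitimate outcome (as guessed above), the script could guard the call. A minimal sketch follows, where `pop_or_none` is a hypothetical helper and `exact_ids` mirrors the name in the traceback:

```python
def pop_or_none(ids):
    """Pop an arbitrary element from the set, or return None if the set is empty."""
    return ids.pop() if ids else None


# Empty result set, e.g. nothing scored above the threshold.
exact_ids = set()
id_r = pop_or_none(exact_ids)
if id_r is None:
    print("no exact matches; skipping this query")

# Non-empty result set behaves like a plain pop().
other_ids = {"doc-1"}
popped = pop_or_none(other_ids)
print(popped, other_ids)
```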
from elastiknn.
@flo-dhalluin Closing this, but please feel free to re-open if this is still an issue.
from elastiknn.