
Comments (9)

alexklibisz commented on May 22, 2024

Interesting. It looks like the problem is somewhere in the custom query. There were some settings in there I wasn't 100% sure about. I don't have a test case for this scenario, so I'll add one to see if I can reproduce it. Feel free to post your Python script as well if that's not too difficult.

from elastiknn.

alexklibisz commented on May 22, 2024

I'm able to reproduce. Working on a fix.


alexklibisz commented on May 22, 2024

@flo-dhalluin I was able to fix my own repro of the issue in #159. So far it seems like I just wasn't accounting for Lucene's behavior when searching on a segment with deleted docs.

The fix is in this snapshot release: https://github.com/alexklibisz/elastiknn/releases/download/0.1.0-PRE35-PR159-SNAPSHOT/elastiknn-0.1.0-PRE35-PR159-SNAPSHOT_es7.6.2.zip

Can you try that on your end and let me know if it's working?
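For readers unfamiliar with the deleted-docs behavior mentioned above: within a Lucene segment, a deleted document keeps its doc ID until the segment is merged, and its postings remain readable, so code that walks postings directly has to consult the segment's live-docs bitset itself. The following is a toy Python model of that idea, not elastiknn's or Lucene's code; `count_hits`, `postings`, and `live_docs` are hypothetical names chosen for illustration.

```python
from collections import Counter

def count_hits(postings, query_terms, live_docs=None):
    """Count how many query terms match each doc ID in one toy segment.

    postings:  dict mapping term -> list of doc IDs (may include deleted docs)
    live_docs: list of bools indexed by doc ID, or None if nothing is deleted
    """
    counts = Counter()
    for term in query_terms:
        for doc_id in postings.get(term, []):
            # Skip documents that are marked as deleted in this segment.
            if live_docs is not None and not live_docs[doc_id]:
                continue
            counts[doc_id] += 1
    return counts

# Hypothetical segment: four docs, doc 2 has been deleted but not merged away.
postings = {"h1": [0, 1, 2], "h2": [1, 2, 3]}
live_docs = [True, True, False, True]

naive = count_hits(postings, ["h1", "h2"])               # still counts doc 2
filtered = count_hits(postings, ["h1", "h2"], live_docs)  # doc 2 is skipped
```

In this sketch the unfiltered count still reports the deleted doc 2 as a candidate, while the filtered count does not.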


alexklibisz commented on May 22, 2024

@flo-dhalluin I went ahead and merged the changes, since it was definitely a bug that needed fixing anyway. Let me know when you've had a chance to try it on your end. No rush. Thanks!


flo-dhalluin commented on May 22, 2024

Hi, I tested with PRE36 (which I believe included the MR), and I see a different exception this time:

"Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 21665 out of bounds for length 21664",
"at com.klibisz.elastiknn.lucene.ArrayHitCounter.increment(ArrayHitCounter.java:22) ~[?:?]",
"at org.apache.lucene.search.MatchHashesAndScoreQuery$1.countHits(MatchHashesAndScoreQuery.java:67) ~[?:?]",
"at org.apache.lucene.search.MatchHashesAndScoreQuery$1.scorer(MatchHashesAndScoreQuery.java:148) ~[?:?]",
"at org.apache.lucene.search.Weight.bulkScorer(Weight.java:181) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]",
"at org.elasticsearch.search.internal.ContextIndexSearcher$1.bulkScorer(ContextIndexSearcher.java:244) ~[elasticsearch-7.6.2.jar:7.6.2]",
"at org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:195) ~[elasticsearch-7.6.2.jar:7.6.2]",
"at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:171) ~[elasticsearch-7.6.2.jar:7.6.2]",
"at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445) ~[lucene-core-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:14]",
"at org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:333) ~[elasticsearch-7.6.2.jar:7.6.2]",

I am attaching the script in case you want to reproduce it (the zip file is large because it contains the sample data):
elastiknn.zip


alexklibisz commented on May 22, 2024

Ah, an off-by-one. Thanks for trying it again. I'll take another look at it with your specific example tonight.
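As a generic illustration of this class of error (not the actual elastiknn code, and not a claim about the real cause): an array-backed counter indexed by doc ID needs one slot per possible ID, so if the highest ID is n, the array needs length n + 1. The Python sketch below uses a hypothetical `ArrayHitCounter` class that only borrows its name from the stack trace; the sizes are made up.

```python
class ArrayHitCounter:
    """Toy array-backed hit counter indexed by doc ID."""

    def __init__(self, capacity: int):
        self.counts = [0] * capacity

    def increment(self, doc_id: int) -> None:
        # Explicit check, because Python lists would silently accept
        # negative indices, unlike a Java array.
        if not 0 <= doc_id < len(self.counts):
            raise IndexError(
                f"Index {doc_id} out of bounds for length {len(self.counts)}"
            )
        self.counts[doc_id] += 1

highest_doc_id = 9  # hypothetical: doc IDs run from 0 to 9 inclusive

# Sized by the highest ID: one slot short, so the last doc overflows.
too_small = ArrayHitCounter(highest_doc_id)
try:
    too_small.increment(highest_doc_id)
    overflowed = False
except IndexError:
    overflowed = True

# Sized by highest ID + 1: every doc ID has a slot.
right_size = ArrayHitCounter(highest_doc_id + 1)
right_size.increment(highest_doc_id)
```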


alexklibisz commented on May 22, 2024

I'm able to repro locally. I think I know what's going on. Will hopefully have a PR in the next half hour or so.


alexklibisz commented on May 22, 2024

@flo-dhalluin I made a PR. There will be a snapshot release from that PR once the CI build is done. Look for it over here: https://github.com/alexklibisz/elastiknn/releases (should have PR163 in the name). Now the script is running into another issue:

Traceback (most recent call last):
  File "index_elastiknn.py", line 87, in <module>
    id_r = exact_ids.pop()
KeyError: 'pop from an empty set'

I'm only indexing the first 5000 docs, so maybe after deleting some of the vectors it fails to find anything with score > 0.6?
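For reference, Python's `set.pop()` raises `KeyError` when the set is empty, which is what the traceback shows. A minimal sketch of guarding against that follows; `pop_or_none` is a hypothetical helper, and only the name `exact_ids` comes from the traceback (the rest of the script is not shown here).

```python
def pop_or_none(ids):
    """Remove and return an arbitrary element, or None if the set is empty."""
    return ids.pop() if ids else None

# Popping from an empty set raises KeyError, as in the traceback.
exact_ids = set()
try:
    exact_ids.pop()
    raised = False
except KeyError:
    raised = True

# The guarded version returns None instead of raising.
empty_result = pop_or_none(set())

# With a non-empty set it behaves like a normal pop.
single = {"doc-1"}
popped = pop_or_none(single)
```

Whether returning `None` is the right behavior depends on what the script is checking; if an empty result set means the comparison is meaningless, failing with a clearer error message may be preferable.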


alexklibisz commented on May 22, 2024

@flo-dhalluin Closing this, but please feel free to re-open if this is still an issue.

