Duplicate vector handling (n2 issue, closed)

kakao commented on May 17, 2024
Duplicate vector handling

from n2.

Comments (3)

gony-noreply commented on May 17, 2024

Does n2 handle duplicate vectors without performance degradation or recall issues?

Yes, since version 0.1.7

The HNSW algorithm doesn't work efficiently on duplicate vectors. We believe this is because the heuristic neighbor selection algorithm focuses only on navigation. With heuristic neighbor selection, duplicate or near-duplicate vectors end up hidden in the graph and become hard to reach during search, resulting in low recall.
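
To make the effect concrete, here is a minimal sketch of the standard heuristic selection rule from the HNSW paper, in plain Python. It is a simplified illustration, not n2's actual code; the function name `select_neighbors_heuristic` and the toy 2D points are assumptions made for this example. A candidate is kept only if it is strictly closer to the query point than to every neighbor already kept, so an exact duplicate of the query blocks every other candidate.

```python
import math

def select_neighbors_heuristic(q, candidates, m):
    """Simplified HNSW heuristic: keep a candidate only if it is strictly
    closer to q than to every neighbor that has already been kept."""
    selected = []
    # Visit candidates from nearest to farthest from q.
    for c in sorted(candidates, key=lambda c: math.dist(q, c)):
        if len(selected) >= m:
            break
        if all(math.dist(q, c) < math.dist(c, s) for s in selected):
            selected.append(c)
    return selected

q = (0.0, 0.0)
# The first candidate is an exact duplicate of q (hypothetical toy data).
candidates = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]

with_duplicate = select_neighbors_heuristic(q, candidates, m=3)
without_duplicate = select_neighbors_heuristic(q, candidates[1:], m=3)

print(with_duplicate)     # only the duplicate survives
print(without_duplicate)  # all three distinct points are kept
```

With the duplicate present, every other candidate is exactly as far from the duplicate as from `q`, so the strict comparison rejects it and `q` ends up linked only to its twin.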

To solve this, we modified the heuristic neighbor selection algorithm so that each node keeps some of its nearest neighbors, without degrading navigation performance.
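
One way such a modification could look is sketched below. This is an assumption for illustration only and not necessarily what n2 0.1.7 does: the function `select_neighbors_mixed` and its `n_nearest` parameter are hypothetical. The idea is to let the heuristic fill most of the slots for navigation and reserve a few slots for the nearest candidates that the heuristic pruned.

```python
import math

def select_neighbors_mixed(q, candidates, m, n_nearest):
    """Hypothetical variant: up to (m - n_nearest) slots are filled by the
    diversity heuristic, and up to n_nearest slots are reserved for the
    nearest candidates that the heuristic rejected."""
    n_nearest = min(n_nearest, m)
    ordered = sorted(candidates, key=lambda c: math.dist(q, c))
    diverse, pruned = [], []
    for c in ordered:
        passes = all(math.dist(q, c) < math.dist(c, s) for s in diverse)
        if len(diverse) < m - n_nearest and passes:
            diverse.append(c)   # useful for navigation
        else:
            pruned.append(c)    # rejected, but still sorted by distance
    # Top up with the closest rejected candidates.
    return diverse + pruned[:n_nearest]

q = (0.0, 0.0)
candidates = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]

mixed = select_neighbors_mixed(q, candidates, m=3, n_nearest=2)
plain = select_neighbors_mixed(q, candidates, m=3, n_nearest=0)
print(mixed)
print(plain)
```

With `n_nearest=0` the sketch behaves like the plain heuristic and keeps only the duplicate. Reserving slots gives the duplicate's neighborhood additional links, at the cost of fewer purely navigational edges.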

Below is one of the benchmarks measured for the 0.1.7 release. The GIST dataset contains duplicate vectors (about 2% of the train vectors are duplicated).

[image: benchmark plot]
You can see higher recall compared to N2 version 0.1.6.
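
If you want to check how many duplicates your own dataset contains before benchmarking, a rough sketch follows. It assumes that "duplicated" means a vector with at least one exact copy elsewhere in the set, which may differ from how the GIST figure above was measured; `duplicate_fraction` is a hypothetical helper name.

```python
from collections import Counter

def duplicate_fraction(vectors):
    """Fraction of vectors that have at least one exact copy in the set."""
    counts = Counter(tuple(v) for v in vectors)
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(vectors) if vectors else 0.0

# Toy data: the first and third vectors are identical.
sample = [(1.0, 2.0), (3.0, 4.0), (1.0, 2.0), (5.0, 6.0)]
fraction = duplicate_fraction(sample)
print(fraction)
```

This only catches exact matches. Near-duplicates would need a distance threshold, which is not covered here.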

Handling duplicate vectors involves a tradeoff with navigation performance, and the way we handled it may not be optimal. We are continuing to look for a better way.

fjsj commented on May 17, 2024

It's awesome that you're tackling this problem, thank you very much for the detailed response. Please feel free to close the issue if you wish.

gony-noreply commented on May 17, 2024

If we make further progress on this problem, I'll comment here.
