Comments (2)
Fair enough. Was able to work around this. Thanks!
from meilisearch.
hello @runabol 👋
The effect you are seeing is caused by the algorithm behind semantic search. Semantic search uses an Approximate Nearest-Neighbor (ANN) algorithm to find the `limit+offset` vectors that are "closest" to the query vector.
Because the algorithm is approximate, increasing `limit+offset` might surface "closer" vectors that a smaller search missed, so the order of results is not guaranteed to stay the same for two different values of `limit+offset`.
Since this is inherent to ANN algorithms, and exact (kNN) algorithms don't scale with the number and size of vectors, this behavior is unlikely to be fixed.
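To see how the result order can depend on `limit+offset`, here is a toy sketch, assuming an HNSW-style best-first graph search whose candidate pool (`ef`) is set to `limit+offset`. It illustrates ANN behavior in general and is not Meilisearch's actual vector store; the graph, coordinates, and function names are all hypothetical.

```python
import heapq
from typing import Dict, List

# Hypothetical toy data: 1-D "vectors" and a hand-built proximity graph.
COORDS: Dict[int, float] = {0: 10.0, 1: 5.0, 2: 12.0, 3: 1.0}
GRAPH: Dict[int, List[int]] = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
ENTRY, QUERY = 0, 0.0


def beam_search(ef: int) -> List[int]:
    """HNSW-style best-first search that keeps at most `ef` results."""
    def dist(n: int) -> float:
        return abs(COORDS[n] - QUERY)

    visited = {ENTRY}
    candidates = [(dist(ENTRY), ENTRY)]  # min-heap: closest unexpanded node first
    results = [(-dist(ENTRY), ENTRY)]    # max-heap via negation: farthest kept node on top
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node is farther than every kept result.
        if d > -results[0][0]:
            break
        for nb in GRAPH[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            # Only explore neighbors that fit in the pool or beat its worst member.
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the farthest kept node
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)]


def paged_search(offset: int, limit: int) -> List[int]:
    # The candidate pool size is tied to limit+offset.
    return beam_search(ef=limit + offset)[offset:offset + limit]
```

Paging with `limit=1` returns node 1, then node 0 twice, and never returns node 3, even though node 3 is the true nearest neighbor and comes first once `limit+offset` reaches 3. An exact kNN search (sorting every distance) would stay consistent, but only by scoring every vector.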
As a workaround, you can set `offset` to 0, set a large value for `limit`, and then paginate manually in the frontend.
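A minimal sketch of that workaround, shown in Python for brevity (the same slicing ports directly to a JavaScript frontend). `build_search_body`, `paginate`, and `FETCH_LIMIT` are hypothetical names; only `q`, `offset`, and `limit` are standard search parameters, and any semantic or hybrid parameters your query already uses would be added unchanged.

```python
from typing import Any, Dict, List, Sequence

FETCH_LIMIT = 1000  # hypothetical "large value for limit"; tune to your data


def build_search_body(query: str, fetch_limit: int = FETCH_LIMIT) -> Dict[str, Any]:
    # Every page reuses this exact body: offset is pinned to 0 and limit is
    # fixed, so the ANN search always runs with the same limit+offset.
    return {"q": query, "offset": 0, "limit": fetch_limit}


def paginate(hits: Sequence[Any], page: int, page_size: int) -> List[Any]:
    # Cut one 0-based page out of the single, stable result list.
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    return list(hits[start:start + page_size])
```

The body is sent once (for example to `POST /indexes/{index_uid}/search`), the returned `hits` array is kept client-side, and `paginate` is called for each page. The trade-off is that users can only page through the first `FETCH_LIMIT` results.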
In the future, we might consider a mitigation for this behavior where we would always consider, for example, the first few hundred results (as if `limit+offset` were equal to, say, 1000), regardless of the actual value of `limit+offset`. There's a correctness/performance tradeoff to measure here, though.
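The proposed mitigation amounts to flooring the candidate pool size. Below is a hypothetical sketch: `POOL_FLOOR`, `candidate_pool_size`, and `search_page` are illustrative names, and `toy_ann` is a crude stand-in for an ANN index (it only inspects the first `2*k` stored vectors, so a larger `k` can surface closer ones), not Meilisearch's algorithm.

```python
from typing import List, Sequence

POOL_FLOOR = 1000  # hypothetical, taken from the "say, 1000" above


def candidate_pool_size(limit: int, offset: int, floor: int = POOL_FLOOR) -> int:
    # Always ask the ANN index for at least `floor` candidates, so every page
    # with offset+limit <= floor is cut from the same ranked list.
    return max(limit + offset, floor)


def toy_ann(distances: Sequence[float], k: int) -> List[int]:
    # Stand-in for approximation: only the first 2*k stored vectors are scanned.
    scanned = range(min(len(distances), 2 * k))
    return sorted(scanned, key=lambda i: (distances[i], i))[:k]


def search_page(distances: Sequence[float], limit: int, offset: int,
                floor: int = POOL_FLOOR) -> List[int]:
    pool = candidate_pool_size(limit, offset, floor)
    return toy_ann(distances, pool)[offset:offset + limit]
```

With `distances = [5.0, 4.0, 3.0, 2.0, 1.0, 0.5]` and no floor (`floor=0`), one-result pages return 1, 2, 3, 2, repeating a result and skipping the two nearest. With `floor=6` every page comes from the same full ranking (5, 4, 3, 2, 1, 0), at the cost of always searching a pool of at least `floor` candidates, which is the performance side of the tradeoff.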
Thank you for the report ❤️
Related Issues (20)
- Internal document id don't do the difference between string and integer
- Improve the tenant token error message
- Error while generating embeddings: user error: attempt to embed the following text in a configuration where embeddings must be user provided
- Confusing matches in `_matchesPosition` when using phrase search
- Tracking issue: Road to Vector store stabilization
- Allow custom headers in REST embedder configuration
- REST API parameter names and behavior are unclear
- Prepare `/similar` for stabilization, or create a new experimental feature for the `/similar` routes
- Review all error codes before stabilization
- Consider a more explicit behavior to designate the default embedder
- Reconsider the failure of the indexing process when a document doesn't explicitly declare its embeddings for a `userProvided` embedder.
- Make the `embeddings` parameter optional when specifying your embedder
- Tasks stuck in `processing` state, lots of tasks
- Incorrect `_matchesPosition` returned, cuts into a UTF-8 character
- Hybrid search don't skip offset results when the keyword results are returned early
- Federated search
- Tasks processing seems to block at some point
- Language settings
- Document DB compression
- Exp - Update documents with a function