Comments (5)
Other language bindings (such as tantivy-py) can hardly inherit from tantivy's Query struct because of the different language implementations. For example, in tantivy-py, although PyO3 can turn Python objects into Rust structs, we first have to define all the utility classes before tantivy-py can consume them.
Out of curiosity, do you plan to pass a Python callable to this eventually? If so, I fear from personal experience this might be prohibitively slow due to the overhead of acquiring the GIL and crossing the Python-Rust boundary.
from tantivy.
Out of curiosity, do you plan to pass a Python callable to this eventually? If so, I fear from personal experience this might be prohibitively slow due to the overhead of acquiring the GIL and crossing the Python-Rust boundary.
Yes, that's my ultimate goal. Actually, I have tried to compile the source in tantivy-py as follows:
import numba

# JIT-compile the scoring function so the arithmetic itself runs as native code
@numba.njit
def score_add_10(score: float) -> float:
    return score + 10

function_score_query = Query.function_score_query(
    const_score_query, lambda _, score, __: score_add_10(score)
)
and the numba trick works. I understand that currently pyO3 needs you to acquire a GIL in Python, but I'm not sure if this is still the case in the future. If this PR passes, more investigation should be done into the performance of calling Python from Rust with JIT-compiled or otherwise compiled code. Other than that, I think this feature should exist, while providing templates for other languages is just an extra benefit.
Also, as Elasticsearch's documentation says (and the same applies to Lucene), the function score query should be applied only after the majority of documents have been filtered out. I also expect users to map scores only after the retrieval stage, since iterating through all the documents is slow.
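As a rough illustration of mapping scores only after retrieval, here is a minimal pure-Python sketch. It assumes the hits are already available as `(score, doc_id)` tuples; that shape and the name `rescore_top_hits` are hypothetical and not part of tantivy or tantivy-py.

```python
from typing import Callable, List, Tuple

Hit = Tuple[float, int]  # (score, doc_id); a hypothetical shape for search hits


def rescore_top_hits(hits: List[Hit], fn: Callable[[float], float]) -> List[Hit]:
    """Apply fn to each retrieved hit's score, then re-sort by the new score."""
    rescored = [(fn(score), doc_id) for score, doc_id in hits]
    # Highest score first; the sort is stable, so ties keep retrieval order.
    rescored.sort(key=lambda hit: hit[0], reverse=True)
    return rescored


# Hits as they might come back from a top-k search (values are made up).
top_hits = [(2.0, 7), (1.5, 3), (0.5, 9)]
print(rescore_top_hits(top_hits, lambda s: s + 10))
```

The scoring function here only runs over the handful of retrieved hits, not over every document in the index.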
I understand that currently pyO3 needs you to acquire a GIL in Python, but I'm not sure if this is still the case in the future.
It certainly still does, and even though we are aware of nogil CPython builds, a lot of issues around them are still unresolved, so I wouldn't hold my breath. This is particularly problematic because tantivy-py explicitly releases the GIL during search (to allow multi-threaded servers in a Python application), which means that invoking a callback does not just mean checking that the GIL is held, but actually acquiring the lock and, in the worst case, bouncing it around all search threads.
Other than that, I think this feature should exist, while providing templates for other languages is just an extra benefit.
I did not add this to argue against the feature itself, just wanted to share some unhappy experiences trying to inject behaviour as Python code into Rust code.
I think there could be some alternatives for integrating Python code. The first thing that comes to mind is to provide pre-built 'function factories' that perform function currying, so users just plug in their parameters and the function is executed in Rust. Say users want a y = m * pow(score, n) + C function.
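To make the 'function factory' idea concrete, here is a minimal Python sketch of the y = m * pow(score, n) + C example. The names `PowerScore` and `power_score` are hypothetical and not an existing tantivy-py API. The `__call__` method is only a pure-Python reference; in the proposal above, a binding would pass just the three parameters across the boundary and evaluate the formula in Rust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerScore:
    """Parameters for y = m * pow(score, n) + c (hypothetical name)."""

    m: float
    n: float
    c: float

    def __call__(self, score: float) -> float:
        # Pure-Python reference only; a real binding would evaluate this in Rust.
        return self.m * pow(score, self.n) + self.c


def power_score(m: float, n: float, c: float = 0.0) -> PowerScore:
    """Factory: fix the parameters now, apply them to scores later."""
    return PowerScore(m, n, c)


boost = power_score(m=2.0, n=2.0, c=1.0)
print(boost(3.0))  # 2 * 3**2 + 1
```

Because the object carries only plain numbers, nothing on the search path would need to call back into Python or touch the GIL.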
For more complicated use cases, they might just build their own PyO3 distribution with an additional function signature that suits their case. Although this seems quite similar to implementing their own Query struct, it is still much less work, since they don't need to figure out the whole querying logic such as Weight and Scorer.
Closing
The business need is not strong enough, and the implementation details are not clear.
Please comment here if you need something like this (scaling the score in a non-linear manner).