Comments (5)

adamreichold commented on July 28, 2024

Other language bindings (like tantivy-py) can hardly inherit tantivy's Query struct because of the different language implementations. For example, in tantivy-py, although PyO3 can turn Python objects into Rust structs, we have to define all the utility classes first before tantivy-py can consume them.

Out of curiosity, do you plan to pass a Python callable to this eventually? If so, I fear from personal experience that this might be prohibitively slow due to the overhead of acquiring the GIL and crossing the Python-Rust boundary.

from tantivy.

alex-au-922 commented on July 28, 2024

Out of curiosity, do you plan to pass a Python callable to this eventually? If so, I fear from personal experience that this might be prohibitively slow due to the overhead of acquiring the GIL and crossing the Python-Rust boundary.

Yes, that's my ultimate goal. Actually, I have tried compiling the source in tantivy-py and using it as follows:

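import numba
from tantivy import Query  # locally built tantivy-py that includes this PR

@numba.njit
def score_add_10(score: float) -> float:
    # Compiled by numba, so the arithmetic itself runs as native code.
    return score + 10

# const_score_query is an existing Query built earlier (not shown).
function_score_query = Query.function_score_query(
    const_score_query, lambda _, score, __: score_add_10(score)
)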

and the numba trick works. I understand that PyO3 currently requires you to acquire the GIL in Python, but I'm not sure whether this will still be the case in the future. If this PR is accepted, the performance of calling Python from Rust with JIT or other compiled code should be investigated further. Other than that, I think this feature should exist, while providing templates for other languages is just an extra benefit.

Also, as Elasticsearch's documentation says (and the same holds for Lucene), the function score query should be applied only after the majority of documents have been filtered out. I also expect users to map the scores only after the retrieval stage, as iterating through all the documents is slow.
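To illustrate the point about mapping scores only after retrieval, here is a minimal sketch in plain Python. It assumes the hits are already available as (score, document) tuples; `rescore_hits` and the `Hit` alias are hypothetical names for illustration and are not part of tantivy or tantivy-py.

```python
from typing import Callable, Hashable, List, Tuple

# Assumed shape of a retrieved hit: (score, document identifier).
Hit = Tuple[float, Hashable]


def rescore_hits(
    hits: List[Hit], score_fn: Callable[[float], float], limit: int
) -> List[Hit]:
    """Apply score_fn to already-retrieved hits and keep the best `limit`."""
    # Only the retrieved hits are touched, never the whole index.
    rescored = [(score_fn(score), doc) for score, doc in hits]
    # Highest rescored value first.
    rescored.sort(key=lambda hit: hit[0], reverse=True)
    return rescored[:limit]


# Hypothetical hits, as if returned by an earlier filtering query.
hits = [(1.5, "doc-a"), (0.5, "doc-b"), (3.0, "doc-c")]
top = rescore_hits(hits, lambda s: s + 10, limit=2)
```

The scoring function runs once per retrieved hit, so its cost scales with the size of the filtered result set rather than with the number of documents in the index.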


adamreichold commented on July 28, 2024

I understand that PyO3 currently requires you to acquire the GIL in Python, but I'm not sure whether this will still be the case in the future.

It certainly still does, and even though we are aware of nogil CPython builds, a lot of issues around them are still unresolved, so I wouldn't hold my breath. This is particularly problematic because tantivy-py explicitly releases the GIL during search (to allow multi-threaded servers in a Python application), which means that invoking a callback does not just mean checking that the GIL is held, but actually acquiring the lock and, in the worst case, bouncing it around all search threads.

Other than that, I think this feature should exist, while providing templates for other languages is just an extra benefit.

I did not add this to argue against the feature itself, just wanted to share some unhappy experiences trying to inject behaviour as Python code into Rust code.


alex-au-922 commented on July 28, 2024

I think there could be some alternatives for integrating Python code. The first thing that comes to mind is that we could provide pre-built 'function factories' that perform function currying, so users just plug in their parameters and the function is executed in Rust. Say users want a y = m * pow(score, n) + C function.
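A rough sketch of what such a factory could look like from the Python side, assuming the binding only needs the three numbers and evaluates the formula natively. `PowerScore` and `power_score` are hypothetical names, not an existing tantivy-py API; the `__call__` method is only a pure-Python reference for the intended semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerScore:
    """Parameters for y = m * pow(score, n) + c (hypothetical spec object)."""

    m: float
    n: float
    c: float

    def __call__(self, score: float) -> float:
        # Reference implementation; a real binding would evaluate this in Rust.
        return self.m * pow(score, self.n) + self.c


def power_score(m: float, n: float, c: float = 0.0) -> PowerScore:
    """Factory: fix the parameters now, apply to scores later (currying)."""
    return PowerScore(m, n, c)


# Plug in the parameters once; the resulting object carries only numbers.
boost = power_score(m=2.0, n=2.0, c=1.0)
```

Because the object holds plain floats rather than an arbitrary callable, a binding could read the parameters once before the search starts and would not need to call back into Python for each document.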

For more complicated use cases, they might just build their own PyO3 distribution with an additional function signature that suits their case. Although this seems quite similar to implementing their own Query struct, it is still much less work, since they don't need to figure out the whole querying logic such as Weight and Scorer.


fulmicoton commented on July 28, 2024

Closing

The business need is not strong enough, and the implementation details are not clear.
Please comment here if you need something like this (scaling the score in a non-linear manner).
