
Recommending 'consumed' items (recpack issue, open)

lienm commented on July 18, 2024
Recommending 'consumed' items

from recpack.

Comments (3)

LienM commented on July 18, 2024

Hi @paraschakis,

You make a good point: we should at least make sure we provide the same functionality to people who use the pipeline and people who do not. I'll add a more permanent solution to the issue tracker for our next release.

For now, you can use the predict_and_remove_history snippet below to get behavior consistent with the pipeline's:

from recpack.algorithms import ItemKNN, Algorithm
from recpack.datasets import DummyDataset
from recpack.matrix import InteractionMatrix
from recpack.metrics import NDCGK
import recpack.pipelines
from recpack.scenarios import StrongGeneralization

from scipy.sparse import csr_matrix

d = DummyDataset()
im = d.load()
# Scenario without validation data, as we won't perform hyperparameter optimization
scenario = StrongGeneralization(frac_users_train=0.7, frac_interactions_in=0.8, validation=False)
scenario.split(im)

# Use RecPack without pipeline
algorithm = ItemKNN(K=10)
algorithm.fit(scenario.full_training_data)
X_test_in = scenario.test_data_in

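def predict_and_remove_history(algorithm: Algorithm, X_in: InteractionMatrix) -> csr_matrix:
    # Make predictions, then filter out the items in each user's history
    X_pred = algorithm.predict(X_in)
    # multiply() keeps only the scores at history positions;
    # subtracting them zeroes out exactly those positions
    X_pred = X_pred - X_pred.multiply(X_in.binary_values)
    return X_pred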


X_pred = predict_and_remove_history(algorithm, X_test_in)
ndcg = NDCGK(K=10)
ndcg.calculate(scenario.test_data_out.binary_values, X_pred)


# Use RecPack with pipeline
pipeline_builder = recpack.pipelines.PipelineBuilder('exp1')
pipeline_builder.add_algorithm('ItemKNN', params={'K': 10})
pipeline_builder.add_metric('NDCGK', 10)
pipeline_builder.set_data_from_scenario(scenario)
pipeline = pipeline_builder.build()
pipeline.run()
metrics = pipeline.get_metrics()


assert metrics.iloc[0,0] == ndcg.value
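
To make the masking step in predict_and_remove_history easier to follow, here is a minimal sketch on toy data. It uses only NumPy and SciPy, not RecPack; the matrices and their values are made up for illustration, and X_hist stands in for what X_in.binary_values is assumed to provide (a binary user-by-item matrix).

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy data (made up for illustration): 2 users x 4 items.
# History: user 0 consumed items 0 and 2; user 1 consumed item 1.
X_hist = csr_matrix(np.array([[1, 0, 1, 0],
                              [0, 1, 0, 0]]))

# Hypothetical prediction scores, shaped like the history matrix.
X_pred = csr_matrix(np.array([[0.9, 0.2, 0.8, 0.1],
                              [0.3, 0.7, 0.0, 0.5]]))

# The element-wise product keeps only the scores at history positions ...
history_scores = X_pred.multiply(X_hist)
# ... and subtracting them zeroes out exactly those positions.
X_filtered = X_pred - history_scores

filtered_dense = X_filtered.toarray()
print(filtered_dense)
```

Scores for items outside the history are left untouched, so the relative order of the remaining candidates does not change.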

Hope this helps!
Lien


LienM commented on July 18, 2024

Hi @paraschakis,

You're absolutely right: no RecPack algorithm filters out items the user has previously interacted with. The reason is that filtering them out afterwards is easy, whereas adding them back when you need them is not.
On top of that, we've found that there are many real-world scenarios in which you might want to recommend items a user has previously interacted with.

However, in most offline experiments they are indeed filtered out. If you use the Pipeline, by default it filters the items in the user's history (the history passed to the predict method) out of the predictions as a post-processing step.
You can turn this history filtering on or off by passing remove_history=True or remove_history=False to the __init__, see: https://recpack.froomle.ai/generated/recpack.pipelines.Pipeline.html#recpack.pipelines.Pipeline.
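
The effect of such a switch can be pictured with a small, dependency-free sketch. This is not RecPack code: the top_k function, its remove_history argument, and the toy scores are hypothetical and only mimic the idea of filtering the history after scoring.

```python
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], history: Set[str], k: int,
          remove_history: bool = True) -> List[str]:
    """Return the k highest-scoring items, optionally skipping consumed ones."""
    candidates = {
        item: score for item, score in scores.items()
        if not (remove_history and item in history)
    }
    # Sort by descending score; ties are broken by item id for determinism.
    ranked = sorted(candidates, key=lambda item: (-candidates[item], item))
    return ranked[:k]


# Hypothetical scores for one user who already consumed items "a" and "c".
scores = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.5}
history = {"a", "c"}

with_filter = top_k(scores, history, k=2, remove_history=True)
without_filter = top_k(scores, history, k=2, remove_history=False)
print(with_filter, without_filter)
```

With filtering on, the two consumed items drop out and the list is filled with unseen items; with filtering off, the consumed items take the top positions.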

Hope this answers your question!
Lien


paraschakis commented on July 18, 2024

Thanks for the explanation. Now I think I understand why I was getting different accuracy scores for the same configurations of algorithms and metrics when testing them inside and outside the pipeline. Frankly, this isn't very intuitive. I would expect history filtering to be the default behavior everywhere. Perhaps providing an out-of-the-box post-filter would help with this issue?
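
A rough, self-contained sketch of why the scores can differ: the same predictions give a different NDCG depending on whether consumed items are still allowed in the ranking. The helper names and toy data are hypothetical, and ndcg_at_k is the standard binary-relevance formula for a single user, which is not guaranteed to match RecPack's NDCGK in every detail.

```python
import math
from typing import Dict, List, Set


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k for a single user."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rank_items(scores: Dict[str, float], history: Set[str],
               remove_history: bool) -> List[str]:
    """Rank items by descending score, optionally dropping consumed items."""
    items = [i for i in scores if not (remove_history and i in history)]
    return sorted(items, key=lambda i: (-scores[i], i))


# Toy user: consumed "a" and "c"; the held-out test item is "d".
scores = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.5}
history = {"a", "c"}
held_out = {"d"}

ndcg_filtered = ndcg_at_k(rank_items(scores, history, True), held_out, k=3)
ndcg_unfiltered = ndcg_at_k(rank_items(scores, history, False), held_out, k=3)
print(ndcg_filtered, ndcg_unfiltered)
```

Without filtering, the consumed items occupy the first two ranks and push the held-out item down to rank 3, which lowers the score even though the underlying predictions are identical.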

