Comments (3)
Hi @paraschakis,
You make a good point: We should at least make sure we provide the same functionality to people who do and do not use the pipeline. I'll add a more permanent solution to the issue tracker for our next release.
For now, you can use the predict_and_remove_history snippet below to obtain behavior consistent with that of the pipeline:
from recpack.algorithms import ItemKNN, Algorithm
from recpack.datasets import DummyDataset
from recpack.matrix import InteractionMatrix
from recpack.metrics import NDCGK
import recpack.pipelines
from recpack.scenarios import StrongGeneralization
from scipy.sparse import csr_matrix
d = DummyDataset()
im = d.load()
# Scenario without validation data, as we won't perform hyperparameter optimization
scenario = StrongGeneralization(frac_users_train=0.7, frac_interactions_in=0.8, validation=False)
scenario.split(im)
# Use RecPack without pipeline
algorithm = ItemKNN(K=10)
algorithm.fit(scenario.full_training_data)
X_test_in = scenario.test_data_in
def predict_and_remove_history(algorithm: Algorithm, X_in: InteractionMatrix) -> csr_matrix:
    # Makes predictions and then filters the user history
    X_pred = algorithm.predict(X_in)
    X_pred = X_pred - X_pred.multiply(X_in.binary_values)
    return X_pred
X_pred = predict_and_remove_history(algorithm, X_test_in)
ndcg = NDCGK(K=10)
ndcg.calculate(scenario.test_data_out.binary_values, X_pred)
# Use RecPack with pipeline
pipeline_builder = recpack.pipelines.PipelineBuilder('exp1')
pipeline_builder.add_algorithm('ItemKNN', params={'K': 10})
pipeline_builder.add_metric('NDCGK', 10)
pipeline_builder.set_data_from_scenario(scenario)
pipeline = pipeline_builder.build()
pipeline.run()
metrics = pipeline.get_metrics()
assert metrics.iloc[0,0] == ndcg.value
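To see why the subtraction inside predict_and_remove_history removes the history, here is a minimal sketch on a toy matrix. It only assumes NumPy and SciPy; the scores and the names X_hist and X_filtered are made up for illustration and are not part of RecPack.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy prediction scores for 2 users x 4 items (illustrative values only).
X_pred = csr_matrix(np.array([[0.9, 0.0, 0.4, 0.7],
                              [0.2, 0.8, 0.0, 0.5]]))

# Binary history matrix: user 0 interacted with items 0 and 2,
# user 1 interacted with item 1.
X_hist = csr_matrix(np.array([[1, 0, 1, 0],
                              [0, 1, 0, 0]]))

# The element-wise product keeps only the scores of history items ...
history_scores = X_pred.multiply(X_hist)

# ... so subtracting it sets exactly those entries to zero
# and leaves every other score untouched.
X_filtered = X_pred - history_scores

filtered_dense = X_filtered.toarray()
print(filtered_dense)
```

Items with a score of zero are then no longer preferred over items that do have a positive score when the top K is selected.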
Hope this helps!
Lien
from recpack.
Hi @paraschakis,
You're absolutely right: no RecPack algorithm filters out items a user has previously interacted with. The reason is that filtering them out afterwards is easy, whereas adding them back when you need them is not.
On top of that, we've found that there are a lot of real-world scenarios in which you might want to recommend things a user has previously interacted with.
However, in most offline experiments they are indeed filtered out. If you use the Pipeline, it will by default filter out the items in the user's history passed to the predict method, as a sort of post-processing step.
You can toggle this history filtering on and off by passing remove_history=True/False in the __init__, see: https://recpack.froomle.ai/generated/recpack.pipelines.Pipeline.html#recpack.pipelines.Pipeline.
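As a rough illustration of why the same algorithm and metric can score differently with and without this filtering, consider the toy sketch below. It is plain Python, not RecPack code: top_k and hit_at_k are hypothetical helpers, and the scores are invented.

```python
def top_k(scores, k, exclude=frozenset()):
    """Return the k highest-scoring item ids, skipping ids in `exclude`."""
    candidates = [(item, s) for item, s in scores.items() if item not in exclude]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:k]]


def hit_at_k(recommended, held_out):
    """1 if any recommended item is among the held-out items, else 0."""
    return int(any(item in held_out for item in recommended))


# One user's scores; items the user already interacted with tend to
# score highest for a neighborhood-based model (made-up numbers).
scores = {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.3}
history = {"a", "b"}   # interactions passed to predict
held_out = {"c"}       # interactions the metric is evaluated against

unfiltered = top_k(scores, k=2)                   # history items fill the top 2
filtered = top_k(scores, k=2, exclude=history)    # history removed first

print(unfiltered, hit_at_k(unfiltered, held_out))
print(filtered, hit_at_k(filtered, held_out))
```

Without filtering, the history items occupy the top-K slots and push the held-out item out of the list, which is one way the metric values can diverge between the two setups.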
Hope this answers your question!
Lien
Thanks for the explanation. Now I think I understand why I was getting different accuracy scores for the same configurations of algorithms and metrics when testing them inside and outside the pipeline. Frankly, this isn't very intuitive. I would expect history filtering to be the default behavior everywhere. Perhaps providing an out-of-the-box post-filter would help with this issue?