Comments (4)
Hi! I want to compare my model's results with your benchmark on the Yelp dataset, but the dataset I found has a different number of interactions, users, and items from the numbers you report in the paper.
I downloaded the dataset from the official site and from Kaggle, and both sources have 160,585 items, 2,189,457 users, and 8,635,403 interactions (as stated on the site). But you report 174,567 items, 1,326,101 users, and 5,261,669 interactions for the original Yelp dataset in the paper (Table 1). The paper says you considered all interactions with rating >= 1 for the Yelp dataset, and your code matches this (no filtering by rating).
Could you please tell me where to get the data, or how to prepare it, so that I get an identical dataset and can compare my model with your baselines?
Best regards!
This is the preprocessed data I used for the paper. The link is below:
https://drive.google.com/file/d/17BeoiY0aEZqwqsMpa_SNxBkRWK1EFPRm/view?usp=sharing
from daisyrec.
I think you can use the code in the master branch; that is the original code we ran for our paper. data_generator.py is the code I wrote to convert the raw data into the format my model needs. The load_rate module is quite different between master and dev. I hope this answer is helpful.
Thank you, it is very helpful!
I was confused about the number of items, because you report more items than the original dataset contains now. But I realised that the Yelp dataset keeps being updated and may have changed since you used it. They do not keep the old versions.
Yes. My code is just meant to let everyone evaluate the generated results on the same footing, no matter what the raw dataset is.
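For anyone who wants to check which version of the data they have, the user, item, and interaction counts discussed in this thread can be recomputed with a short script. This is only a minimal sketch, not the DaisyRec preprocessing code: it assumes a JSON-lines review file whose records have `user_id`, `business_id`, and `stars` fields, and the names `interaction_stats` and `DatasetStats` are made up for illustration. It counts every review row as one interaction and does not deduplicate repeated user-item pairs.

```python
import json
from typing import Iterable, NamedTuple


class DatasetStats(NamedTuple):
    users: int
    items: int
    interactions: int


def interaction_stats(lines: Iterable[str], min_rating: float = 1.0) -> DatasetStats:
    """Count distinct users, distinct items, and review rows in JSON-lines input.

    Assumed record fields (hypothetical for your copy of the data):
    "user_id", "business_id", "stars".
    """
    users, items = set(), set()
    interactions = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if float(record["stars"]) < min_rating:
            continue  # drop interactions below the rating threshold
        users.add(record["user_id"])
        items.add(record["business_id"])
        interactions += 1
    return DatasetStats(len(users), len(items), interactions)


# Tiny in-memory sample standing in for the real review file.
sample = [
    '{"user_id": "u1", "business_id": "b1", "stars": 5.0}',
    '{"user_id": "u1", "business_id": "b2", "stars": 1.0}',
    '{"user_id": "u2", "business_id": "b1", "stars": 3.0}',
]
stats = interaction_stats(sample)
print(stats)
```

To run it on a downloaded file, pass an open file handle instead of the sample list, for example `interaction_stats(open(path, encoding="utf-8"))`, where `path` points at your local review file. If the counts differ from Table 1 of the paper, that is consistent with the dataset having been updated since the paper's snapshot.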