Yelp dataset statistics about daisyrec HOT 4 CLOSED

amazingdd commented on May 26, 2024

Yelp dataset statistics

from daisyrec.

Comments (4)

AmazingDD commented on May 26, 2024

Hi! I want to compare my model's results with your benchmark on the Yelp dataset, but the dataset I found has a different number of interactions, users, and items from the numbers you report in the paper.

I downloaded the dataset from the site and from Kaggle, and both sources have 160,585 items, 2,189,457 users, and 8,635,403 interactions (as stated on the site). But you report 174,567 items, 1,326,101 users, and 5,261,669 interactions for the original Yelp dataset in the paper (Table 1). The paper says you considered all interactions with rating >= 1 for the Yelp dataset, and your code corresponds to this (no filtering by rating).

Could you please tell me where to get the data, or how to prepare it, so that I end up with an identical dataset and can compare my model with your baselines?
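For anyone repeating this check, here is a minimal sketch of how the user, item, and interaction counts could be computed from a JSON-lines review file. It is not the daisyrec loading code. The field names `user_id`, `business_id`, and `stars` are assumed to match the Yelp review file, and `interaction_stats` and `min_rating` are hypothetical names used only for illustration. Every review is counted as one interaction, which may differ from how the paper's pipeline treats repeated user-item pairs.

```python
import json
from typing import Iterable, Tuple


def interaction_stats(lines: Iterable[str], min_rating: float = 1.0) -> Tuple[int, int, int]:
    """Return (n_users, n_items, n_interactions) for JSON-lines review records.

    Assumes each record has "user_id", "business_id" and "stars" fields.
    Records with stars below min_rating are skipped.
    """
    users, items = set(), set()
    n_interactions = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        if record["stars"] < min_rating:
            continue
        users.add(record["user_id"])
        items.add(record["business_id"])
        n_interactions += 1  # every kept review counts as one interaction
    return len(users), len(items), n_interactions


# Tiny in-memory sample standing in for the real review file.
sample = [
    '{"user_id": "u1", "business_id": "b1", "stars": 5.0}',
    '{"user_id": "u1", "business_id": "b2", "stars": 3.0}',
    '{"user_id": "u2", "business_id": "b1", "stars": 1.0}',
]
print(interaction_stats(sample))
```

To run it on a real file, pass an open file handle instead of the list, since iterating over a file yields one line at a time.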

Best regards!

And this is my preprocessed data used for this paper.
The link is below:
https://drive.google.com/file/d/17BeoiY0aEZqwqsMpa_SNxBkRWK1EFPRm/view?usp=sharing


AmazingDD commented on May 26, 2024

I think you can use the code in the master branch. The code there is the original code we ran for our paper. data_generator.py is the code I wrote to convert the raw data into the form my model needs. The load_rate module is quite different between master and dev. I hope this answer is helpful for you.


monkey0head commented on May 26, 2024

Thank you, it is very helpful!

I was confused about the number of items, because you report more items than the original dataset contains now. But I realised that the Yelp dataset is updated over time and may have changed since you used it. They do not store the old versions.
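The gap between the two snapshots can be made explicit with a little arithmetic. The sketch below only uses the counts quoted in this thread. The names `reported`, `current`, and `count_deltas` are illustrative and do not come from daisyrec.

```python
# Counts reported in the paper (Table 1) for the original Yelp dataset.
reported = {"items": 174_567, "users": 1_326_101, "interactions": 5_261_669}
# Counts for the dataset currently available from the site and Kaggle.
current = {"items": 160_585, "users": 2_189_457, "interactions": 8_635_403}


def count_deltas(old: dict, new: dict) -> dict:
    """Return new minus old for every key in old."""
    return {key: new[key] - old[key] for key in old}


deltas = count_deltas(reported, current)
for key, delta in deltas.items():
    print(f"{key}: {delta:+,}")
# items: -13,982
# users: +863,356
# interactions: +3,373,734
```

The current download has fewer items but more users and interactions than the paper reports, which fits the explanation that the two are different snapshots rather than the result of a missing filter.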


AmazingDD commented on May 26, 2024

Yes, and my code is just meant to let everyone judge the generated results on the same level, no matter what the raw dataset is.
