Yelp dataset statistics (daisyrec issue, closed)

amazingdd commented on May 26, 2024
Yelp dataset statistics

from daisyrec.

Comments (4)

AmazingDD commented on May 26, 2024

Hi! I want to compare my model's results with your benchmark on the Yelp dataset, but the dataset I found has different numbers of interactions, users, and items from those you report in the paper.

I downloaded the dataset from the site and from Kaggle, and both sources have 160,585 items, 2,189,457 users, and 8,635,403 interactions (as stated on the site). But you report 174,567 items, 1,326,101 users, and 5,261,669 interactions for the original Yelp dataset in the paper (Table 1). The paper says you considered all interactions with rating >= 1 for the Yelp dataset, and your code matches this (no filtering by rating).
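
For anyone who wants to check such counts on their own copy, here is a minimal sketch of how they could be computed. It is not the daisyrec code. It assumes the reviews are stored as one JSON object per line with `user_id` and `business_id` fields; the helper names `read_json_lines` and `dataset_stats` and the sample records are made up for illustration. Unique user-item pairs are reported separately from raw rows, since counting conventions may differ between sources.

```python
import json


def read_json_lines(path):
    """Yield one dict per non-empty line of a JSON-lines file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def dataset_stats(records, user_key="user_id", item_key="business_id"):
    """Count unique users, unique items, raw rows, and unique user-item pairs.

    The key names are assumptions; adjust them to the file being checked.
    """
    users, items, pairs = set(), set(), set()
    n_rows = 0
    for rec in records:
        user, item = rec[user_key], rec[item_key]
        users.add(user)
        items.add(item)
        pairs.add((user, item))
        n_rows += 1
    return {
        "users": len(users),
        "items": len(items),
        "interactions": n_rows,
        "unique_pairs": len(pairs),
    }


# Tiny made-up sample; u1 reviewed b1 twice.
sample = [
    {"user_id": "u1", "business_id": "b1", "stars": 5},
    {"user_id": "u1", "business_id": "b2", "stars": 1},
    {"user_id": "u2", "business_id": "b1", "stars": 3},
    {"user_id": "u3", "business_id": "b3", "stars": 4},
    {"user_id": "u1", "business_id": "b1", "stars": 2},
]
stats = dataset_stats(sample)
print(stats)
```

On a real file, the same function could be called as `dataset_stats(read_json_lines(path))`, where `path` points to the downloaded review file.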

Could you please tell me where to get the data, or how to prepare it, so that I end up with an identical dataset and can compare my model with your baselines?

Best regards!

And this is the preprocessed data I used for the paper. The link is below:
https://drive.google.com/file/d/17BeoiY0aEZqwqsMpa_SNxBkRWK1EFPRm/view?usp=sharing

AmazingDD commented on May 26, 2024

I think you can use the code in the master branch; that is the original code we ran for our paper. data_generator.py is the script I wrote to convert the raw data into the format my model expects. Note that the load_rate module differs considerably between master and dev. I hope this answer helps.

monkey0head commented on May 26, 2024

And this is the preprocessed data I used for the paper.

Thank you, it is very helpful!

I was confused about the number of items, because you report more items than the original dataset contains now. But I realised that the Yelp dataset is updated over time and may have changed since you used it. They do not keep the old versions.

AmazingDD commented on May 26, 2024

Yes, and my code is just meant to let everyone judge the generated results on the same footing, no matter what the raw dataset is.
