edwardrha / yelpreviewprediction

Project abandoned. Topic modeling and extraction of categorical scores for the restaurants based on the extracted topics.

Python 1.60% Jupyter Notebook 33.83% HTML 64.56%

yelpreviewprediction's Introduction

YelpReviewPrediction

By Terry Shih and InHo Rha

Data science project for CSCI 183. The original goal was to use Yelp review data to predict the star rating a review would give. The project was later changed to topic modeling and the extraction of categorical scores for restaurants based on the extracted topics.

Steps to run our code:

Using: Python 2 and Jupyter Notebook

Required packages: numpy, pandas, sklearn, matplotlib, gensim, nltk, pytagcloud (optional tool for creating word clouds, from https://github.com/atizo/PyTagCloud)

Setup:

  1. Clone https://github.com/edwardrha/YelpReviewPrediction

  2. Download Yelp data from https://www.yelp.com/dataset/challenge and place the JSON files into the /dataset directory.

  3. Run preprocessing.py from inside the /src directory. This creates the processed restaurant review files and saves them into the /dataset directory. WARNING: This step requires a very large amount of RAM (64 GB or more). We do not recommend running it; instead, use the processed JSON uploaded to Google Drive: https://drive.google.com/drive/folders/1gYgWxNDK_78notWqKFuRiujawdkB640r?usp=sharing

  4. Run ClusterModel.py from inside the /src directory. This creates a CountVectorizer object and a feature names object, trains 6 LDA models for k=[10, 15, 20, 25, 30, 40], and pickles them into the /models directory. It also saves the label predictions for the reviews into the /dataset directory as txt files. NOTE: This step is time-consuming. Training each LDA model takes around 30 minutes per core.

  5. Main.ipynb in the main directory is now ready to run.
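The training in step 4 can be sketched roughly as follows. This is a minimal illustration, not the contents of ClusterModel.py: it assumes scikit-learn's LatentDirichletAllocation (the package list also includes gensim, so the actual script may differ), and the toy reviews, the small topic counts, and the function name train_lda_models are all hypothetical.

```python
import pickle

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the processed restaurant reviews (hypothetical data).
reviews = [
    "The pizza was delicious and the pasta was fresh",
    "Our waiter was rude and the service was slow",
    "Great burger, tasty fries, delicious milkshake",
    "Friendly staff and quick service at the counter",
    "Prices are high and the portions are small",
    "Cheap lunch specials and a good value menu",
]


def train_lda_models(texts, topic_counts):
    """Fit one bag-of-words vectorizer, then one LDA model per topic count."""
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(texts)
    # Order the vocabulary by column index to get the feature names.
    feature_names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    models = {}
    labels = {}
    for k in topic_counts:
        lda = LatentDirichletAllocation(n_components=k, random_state=0)
        doc_topics = lda.fit_transform(counts)  # shape: (n_reviews, k)
        models[k] = lda
        # Label each review with its most probable topic.
        labels[k] = doc_topics.argmax(axis=1)
    return vectorizer, feature_names, models, labels


# The README uses k=[10, 15, 20, 25, 30, 40]; tiny values keep this sketch fast.
vectorizer, feature_names, models, labels = train_lda_models(reviews, [2, 3])

# Pickle round trip, standing in for saving models to the /models directory.
restored = pickle.loads(pickle.dumps(models[2]))
```

Fitting the vectorizer once and reusing its counts for every k keeps the topic models comparable, since they all see the same vocabulary.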

Main.ipynb: Using the models and data created during Setup, demonstrates how we can predict the category of a review and use it to give a categorical rating for a chosen restaurant.
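One way to turn per-review category predictions into a categorical rating is sketched below. The README does not say how Main.ipynb computes the score, so averaging star ratings within each category is an assumption, as are the topic-to-category mapping, the record fields, and the function name categorical_ratings.

```python
from collections import defaultdict

# Hypothetical mapping from LDA topic index to a manually assigned category.
TOPIC_TO_CATEGORY = {0: "food", 1: "service", 2: "price", 3: "food"}


def categorical_ratings(reviews, topic_to_category):
    """Average a restaurant's star ratings within each review category."""
    stars_by_category = defaultdict(list)
    for review in reviews:
        category = topic_to_category.get(review["topic"])
        if category is None:
            continue  # skip topics that have no manual label
        stars_by_category[category].append(review["stars"])
    return {
        category: sum(stars) / float(len(stars))
        for category, stars in stars_by_category.items()
    }


# Hypothetical reviews of one restaurant, each with a predicted topic.
restaurant_reviews = [
    {"topic": 0, "stars": 5},
    {"topic": 3, "stars": 4},
    {"topic": 1, "stars": 2},
    {"topic": 2, "stars": 3},
    {"topic": 7, "stars": 1},  # unlabeled topic, ignored
]

ratings = categorical_ratings(restaurant_reviews, TOPIC_TO_CATEGORY)
```

Several topics can map to the same category, which is why the mapping is kept separate from the model: relabeling a topic does not require retraining.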

Labeling.ipynb: Contains the code we used to examine the reviews from a cluster so that we could label them manually.
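Examining a cluster by hand might look like the sketch below, which groups reviews by their predicted label and draws a few from each group for reading. It is an illustration under assumptions, not the notebook's code: the sample texts, the labels, and the function name sample_reviews_by_cluster are hypothetical.

```python
import random
from collections import defaultdict


def sample_reviews_by_cluster(texts, labels, per_cluster=2, seed=0):
    """Group reviews by cluster label and draw a few from each for inspection."""
    rng = random.Random(seed)  # fixed seed so the same reviews come back each run
    by_cluster = defaultdict(list)
    for text, label in zip(texts, labels):
        by_cluster[label].append(text)
    return {
        label: rng.sample(members, min(per_cluster, len(members)))
        for label, members in sorted(by_cluster.items())
    }


# Hypothetical reviews and their predicted cluster labels.
texts = [
    "Delicious pasta",
    "Slow service",
    "Tasty burger",
    "Rude waiter",
    "Fresh sushi",
]
labels = [0, 1, 0, 1, 0]

samples = sample_reviews_by_cluster(texts, labels, per_cluster=2)
for label, members in samples.items():
    print("cluster %d: %s" % (label, "; ".join(members)))
```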

Slides.ipynb: Contains the code and slides used to create our presentation. To open the presentation slides, run in a terminal: jupyter nbconvert Slides.ipynb --to slides --post serve
