Coder Social home page Coder Social logo

yelp_nlp's Introduction

NLP Analysis of Yelp Restaurant Reviews

Ankur Vishwakarma

Metis SF Winter 2018


Project Abstract:

Opening and managing a restaurant in the United States is a difficult business. According to Business Insider, as many as 60% of restaurants close within the 1st year of opening. Yelp is a great source of positive and negative feedback from customers. However, it can be time consuming to read all reviews and that still would not allow a restaurant owner to compare feedback against other restaurants in the area.

This project aims to analyze restaurant reviews on Yelp via topic modeling to determine main topics for positive and negative reviews. Once each review can be assigned its topic weights, that information was averaged to assign topic weights for each restaurant.

With all restaurants and their reviews translated into a topic space, direct comparison of positive and negative review topics was made possible.


Data Source

  1. Data in JSON format was downloaded from Yelp and set up in MongoDB on an AWS instance.

Exploratory Data Analysis

  1. The 1st notebook (1_extracting_reviews.ipynb) reads the data from MongoDB.

  2. Of the 5.2 million reviews it contains, 130,000 restaurant reviews for the state of Ohio were kept for analysis.

  3. Lengths of the reviews were calculated and as the histogram below shows, all star ratings consisted of the same distribution of short, long, and medium-length reviews.

Tokenization

  • Code in the 2nd notebook (2_text_processing.ipynb).

Documents (yelp reviews) were tokenized using TFIDF and 1-grams, which gave the most consistent results and distinct topics. For topic modeling, the following approaches were used:

Notes
LSI Negative word weights in resulting topics were not easily interpretable for this application. Topic strengths from LSI were used to determine # of topics.
LDA Topics had lots of overlap in meaning among each other, which would be the result of relatively short document size.
NMF Gave most interpretable results and separable topics. Rest of the analysis was done using NMF.

There were 1 set of topics for positive reviews and 1 set of topics for negative reviews to see if different topics would show up in these spaces.

The elbow plot below shows the relative strengths of the topics and that 6 topics were chosen for the analysis.

Mapping All Reviews

  1. Map all reviews to topics using Non-negative Matrix Factorization (NMF).

  2. Normalize topic weights to sum to 1.

  3. Average topic distributions for all reviews of a restaurant to map that restaurant to the topic space (shown below).

  4. Save information to CSV files.

  5. Visualize data using Tableau.

Visualizing The Results

The data can be used to show topic strengths for positive & negative reviews and how they compare against other restaurants in Ohio.

The interactive visualization on Tableau public can be reached at this link. The screenshot below shows what the dashboard looks like for a random restaurant from this analysis.

yelp_nlp's People

Contributors

vishwacorp avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.