course-dprep / imdb-genres-ratings-investigation

This repository was created to analyze the connection between movie genres and their ratings on IMDb. In this project we investigate whether the genre of a movie influences its rating and which genres tend to be rated highest.

R 78.24% Makefile 21.76%
api-integration data-analysis-in-r data-cleaning github-repository movie-genres movie-ratings-data r

imdb-genres-ratings-investigation's Introduction

The influence of the genre on the average rating of a movie

"Does the genre of a movie influence the rating of this movie on IMDb?"

Motivation of research question

We would like to know whether the genre of a movie affects its rating. To answer this question, we compare the average ratings of genres with each other. The result can be interesting for filmmakers: knowing which genres tend to receive the highest ratings can inform how big a financial success a movie might be. For example, a higher rating may reflect higher viewer satisfaction, leading people to recommend the movie to their friends. This can lead to higher ticket sales and, thus, to a higher box office.

To answer our research question, we want to know whether there is a significant difference between the ratings of genres. Genre is the independent variable and is categorical, and ANOVA is commonly used to compare group means across the levels of a categorical variable. Furthermore, an ANOVA gives a single overall test of whether any differences exist between groups, which the individual coefficient tests of a linear regression do not directly provide. If there are significant differences between groups, post hoc tests will be used to examine these differences further.

Method and results

Datasets

We used the following two datasets from IMDb:

  • title.basics.tsv.gz

  • title.ratings.tsv.gz

The two datasets were merged into a single dataset to answer the research question.
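
The merge step can be sketched as follows. This is a minimal illustration, not the repository's actual code: it uses tiny made-up data frames in place of the downloaded files, relies on the `tconst` identifier that both IMDb files share, and assumes (hypothetically) that only the first listed genre is kept per title. Base R is used here only to keep the sketch free of package dependencies.

```r
# Toy stand-ins for title.basics.tsv.gz and title.ratings.tsv.gz.
# All values below are made up for illustration.
basics <- data.frame(
  tconst = c("tt0000001", "tt0000002", "tt0000003"),
  titleType = c("movie", "movie", "short"),
  genres = c("Documentary,History", "Comedy", "Drama"),
  stringsAsFactors = FALSE
)
ratings <- data.frame(
  tconst = c("tt0000001", "tt0000002"),
  averageRating = c(7.9, 6.1),
  numVotes = c(120L, 45L),
  stringsAsFactors = FALSE
)

# Inner join on the shared title identifier: titles without a rating drop out.
merged <- merge(basics, ratings, by = "tconst")

# Assumption: keep only the first genre listed for each title.
merged$genre <- sub(",.*$", "", merged$genres)

print(merged[, c("tconst", "genre", "averageRating")])
```

With the real files, the two data frames would instead be read from the tab-separated downloads before merging.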

Variables

The merged dataset consists of four variables, two of which were relevant for answering the research question:

Variable | Description
Genre | The genre of the movie (27 different levels)
Rating | The rating that users of IMDb gave to the movie

Research Method

A one-way ANOVA is performed on the dataset to reach a conclusion about the influence of genres on ratings. We chose this form of analysis because we are looking for differences in the means of groups (genres), which is exactly what ANOVA is designed for. Genre is used as the independent variable and rating as the dependent variable. To further investigate the differences between the genres, the ANOVA was followed up with the emmeans function.
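
A minimal sketch of this method, assuming a data frame with a `genre` factor and a numeric `rating` column. The data here is simulated, the object names are hypothetical, and the repository's own scripts may differ. The post hoc step only runs if the emmeans package is installed.

```r
# Simulated ratings for three genres (made-up means, for illustration only).
set.seed(1)
toy <- data.frame(
  genre = factor(rep(c("Comedy", "Documentary", "Horror"), each = 20)),
  rating = c(rnorm(20, mean = 6.0, sd = 1),
             rnorm(20, mean = 7.2, sd = 1),
             rnorm(20, mean = 5.0, sd = 1))
)

# One-way ANOVA: rating is the dependent variable, genre the independent one.
model <- aov(rating ~ genre, data = toy)
anova_table <- summary(model)[[1]]
p_value <- anova_table[["Pr(>F)"]][1]
print(anova_table)

# Post hoc pairwise comparisons of the estimated genre means.
if (requireNamespace("emmeans", quietly = TRUE)) {
  posthoc <- emmeans::emmeans(model, pairwise ~ genre)
  print(posthoc$contrasts)
}
```

The overall F-test answers whether any genre differs from the others; the pairwise contrasts then show which specific pairs differ.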

Results and interpretation

The ANOVA showed a significant difference in ratings between genres (p < 0.001). This result already answered our research question, but we also wanted to know which genres had the most positive effect on ratings. The emmeans function estimates the mean rating per genre and compares the mean of each genre with the mean of every other genre, so for every pair it becomes clear which genre is associated with the higher rating. The results show that Game-Show has the highest average rating, followed by Documentary and History. The pairwise comparisons also show that Game-Show has the biggest influence on ratings for movies where other genres exist next to Game-Show.
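
In a one-way model, the estimated marginal mean of each genre equals its plain group mean, so the ranking described above can be illustrated with base R alone. The ratings below are made up and are not the project's results.

```r
# Made-up ratings for three genres (illustration only, not the real data).
toy <- data.frame(
  genre = c("Comedy", "Comedy", "Documentary", "Documentary",
            "Game-Show", "Game-Show"),
  rating = c(6.0, 6.4, 7.1, 7.3, 7.8, 8.0)
)

# Mean rating per genre, sorted from highest to lowest.
genre_means <- sort(tapply(toy$rating, toy$genre, mean), decreasing = TRUE)
print(genre_means)

# Difference between the top-ranked genre and each of the others.
print(genre_means[1] - genre_means[-1])
```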

Repository Structure


├── src
│   ├── data-analysis
│   └── data-preparation
├── README.md
└── makefile

Running instructions

Dependencies

install.packages("data.table")
install.packages("tidyverse")
install.packages("car")
install.packages("emmeans")
install.packages("xtable")
install.packages("readr")
  • For the makefile to work, R and make need to be made available in the system path

Running the code

To run the code provided in this repository, the following instructions can be followed:

  1. Open the terminal (macOS) or Git Bash (Windows).

  2. Clone this repository by typing the following command in the terminal or command line:

    git clone https://github.com/course-dprep/IMDb-genres-ratings-investigation

  3. Set the working directory to the repository by typing the following command in the terminal or command line:

    cd IMDb-genres-ratings-investigation

  4. To run the code, type the following command:

    make

  5. While the code is running, multiple files will appear in the working directory. One of these is a .pdf file that contains the final analysis of the data, knitted from an R Markdown file.

  6. To remove the raw data files, type the following command in the terminal or the command line:

    make clean

Resources

IMDb Non-commercial Datasets

Authors

This repository was created by Team 5 for the course Data Preparation and Workflow Management, taught by Hannes Datta at Tilburg University, as part of the Master's program Marketing Analytics. The repository is maintained by the members of Team 5.

Notes

  • make clean removes all unnecessary raw data files.
  • Tested under macOS
  • IMPORTANT: In the makefile, when using \ to split code into multiple lines, no space should follow the \. Otherwise GNU make aborts with error 193.
  • Many possible improvements remain. Comments and contributions are welcome!

imdb-genres-ratings-investigation's People

Contributors

chrisschellekens, github-classroom[bot], irisberkvens, mauritsvanelteren, snijdershugo

imdb-genres-ratings-investigation's Issues

points to address before submission

  • need a makefile at the root directory
  • integrate research motivation in readme, wipe pdf file
  • remove remaining template directories
  • cleaning datasets: be consistent in use of dplyr & tidyverse. Would drop any base R and data.table operation if you can use tidyverse instead.
  • makefile in each submodule
