data-movies's Introduction

The Internet Movie Database, imdb.com, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon. More information about imdb.com can be found online, including information about the data collection process.

IMDB makes its raw data available. Unfortunately, the data is divided into many text files, and the format of each file differs slightly. To create one data file containing all the desired information, these Ruby scripts extract the relevant information and store it in a database. Finally, the data is exported to CSV to make it easier to import into data analysis packages.
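The combine-then-export step described above can be sketched roughly as follows. This is only an illustration in Python, not the repository's Ruby scripts: the table names, column names, and sample records are hypothetical, and the per-file parsing of the `.list` files is assumed to have happened already.

```python
import csv
import io
import sqlite3

# Hypothetical, already-parsed records keyed by (title, year). The real
# .list files each need their own parser, which is not shown here.
movies = [("Example Movie", 2001), ("Another Film", 1999)]
running_times = [("Example Movie", 2001, 95)]
ratings = [("Example Movie", 2001, 6.4, 120), ("Another Film", 1999, 7.1, 15)]

# Store each source in its own table (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
conn.execute("CREATE TABLE running_times (title TEXT, year INTEGER, length INTEGER)")
conn.execute("CREATE TABLE ratings (title TEXT, year INTEGER, rating REAL, votes INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?)", movies)
conn.executemany("INSERT INTO running_times VALUES (?, ?, ?)", running_times)
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", ratings)

# Join the per-file tables into one row per movie.
query = """
SELECT m.title, m.year, t.length, r.rating, r.votes
FROM movies AS m
LEFT JOIN running_times AS t ON t.title = m.title AND t.year = m.year
LEFT JOIN ratings AS r ON r.title = m.title AND r.year = m.year
ORDER BY m.title
"""
cursor = conn.execute(query)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Export to CSV; missing values (None) are written as empty fields.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping each source file in its own table and joining at the end means a quirk in one file's format only affects that file's parser.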

The following text files were downloaded and used:

  • business.list. Total budget.
  • genres.list. Genres that a movie belongs to (e.g. comedy and action).
  • movies.list. Master list of all movie titles with year of production.
  • mpaa-ratings-reasons.list. MPAA ratings.
  • ratings.list. IMDB fan ratings.
  • running-times.list. Movie length in minutes.

Movies were selected for inclusion if they had a known length and had been rated by at least one IMDB user. The final output contains the following fields:

  • title. Title of the movie.
  • year. Year of release.
  • budget. Total budget (if known) in US dollars.
  • length. Length in minutes.
  • rating. Average IMDB user rating.
  • votes. Number of IMDB users who rated this movie.
  • r1-10. Distribution of votes for each rating, to the midpoint of the nearest decile: 0 = no votes, 4.5 = 1-9% of votes, 14.5 = 11-19% of votes, etc. Due to rounding errors these may not sum to 100.
  • mpaa. MPAA rating.
  • action, animation, comedy, drama, documentary, romance, short. Binary variables representing if movie was classified as belonging to that genre.
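The selection rule and the derived fields above can be illustrated with a short Python sketch. The function names are hypothetical and this is not the repository's Ruby code. Placing a 100% share in the top decile, and treating a share of exactly 10% as belonging to the 14.5 bucket, are assumptions, since the description does not cover those cases.

```python
# Genre columns listed in the output description.
GENRES = ["action", "animation", "comedy", "drama", "documentary", "romance", "short"]


def keep_movie(length, votes):
    """Selection rule: known length and rated by at least one user."""
    return length is not None and votes is not None and votes >= 1


def decile_midpoint(percent):
    """Map a share of votes (0-100) to the midpoint of its decile.

    0 means no votes. A share of 100 is placed in the top decile, which is
    an assumption rather than something the description states.
    """
    if percent <= 0:
        return 0.0
    decile = min(int(percent // 10), 9)
    return decile * 10 + 4.5


def genre_flags(genres):
    """Return a 0/1 flag per genre column for the given genre names."""
    present = {g.lower() for g in genres}
    return {name: int(name in present) for name in GENRES}


# Midpoints replace the true shares, so the encoded values need not sum to 100.
shares = [45, 35, 20]
encoded = [decile_midpoint(p) for p in shares]
print(encoded, sum(encoded))
print(genre_flags(["Comedy", "Action"]))
```

Because each share is replaced by a bucket midpoint, the r1-10 columns are best read as approximate distributions rather than exact percentages.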

data-movies's People

Contributors

hadley

data-movies's Issues

No Genre or Actors/Actresses

Following the answer in #3, I was able to get this to run all the way through. The only issue is, as mentioned in #3, the 'Genres' and 'Ratings' tables are not working, and the Actors/Actresses are not actually added to the 'Movies' table. Here are screenshots of all three tables (in DB Browser for SQLite). I'm aware that the creator of this abandoned this project a long time ago, but I'm wondering if there is anybody out there who knows how to fix this? I'm a beginner to SQL and Ruby, but maybe it has something to do with the syntax of the 'INSERT INTO's in the Ruby script? Any help is greatly appreciated. Thank you very much.
