ansuman13 / bollywood-revenue-prediction

This project forked from yashraj9892/bollywood-revenue-prediction

Predicting upcoming Bollywood movie Revenue and Verdict

Python 4.21% CSS 1.01% HTML 0.19% Jupyter Notebook 94.59%

Bollywood-Revenue-Prediction

Predicting upcoming Bollywood movie revenue and verdict

INTRODUCTION

The production of a movie begins with the construction of a script and screenplay. Then the film-making team is assembled, filming locations are decided, and investors are attracted. Filming follows, then after effects and editing, and finally distribution. To support investment decisions, profitability has to be predicted before actual production begins. Predictions made right after release have more data to draw on and may be more accurate, but they come too late for investors to act on.

In this project, we apply machine learning algorithms such as linear regression, logistic regression, multiclass Naive Bayes, SVM and k-means to predict box office collection for movies. We classify movies as ‘Flop’, ‘Hit’ or ‘Super hit’ on the basis of the return on investment the movie made at the domestic box office. For example, if a movie was made on a budget of 40 crores and grossed only 36 crores at the Indian box office, it is classified as a ‘Flop’, even if it earned more worldwide.
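The labelling rule described above can be sketched as follows. This is only an illustration: the project does not state its actual cut-offs, so `HIT_THRESHOLD` and `SUPER_HIT_THRESHOLD` are hypothetical values, and return on investment is assumed here to mean domestic gross divided by budget.

```python
# Hypothetical cut-offs on the gross/budget ratio; not taken from the project.
HIT_THRESHOLD = 1.0        # below this the movie did not recover its budget
SUPER_HIT_THRESHOLD = 2.0  # assumed: grossing at least twice the budget


def roi(budget_crore: float, domestic_gross_crore: float) -> float:
    """Domestic gross divided by budget (both in crores)."""
    if budget_crore <= 0:
        raise ValueError("budget must be positive")
    return domestic_gross_crore / budget_crore


def verdict(budget_crore: float, domestic_gross_crore: float) -> str:
    """Map a movie to 'Flop', 'Hit' or 'Super hit' using the assumed cut-offs."""
    ratio = roi(budget_crore, domestic_gross_crore)
    if ratio < HIT_THRESHOLD:
        return "Flop"
    if ratio < SUPER_HIT_THRESHOLD:
        return "Hit"
    return "Super hit"


# The example from the text: 40 crore budget, 36 crore domestic gross.
print(verdict(40, 36))
```

Only the domestic figure enters the ratio, so worldwide earnings do not change the label.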

Process

In this project, we define revenue prediction as a discrete prediction task using supervised learning. We have a set of training examples (X(1), Y(1)), (X(2), Y(2)), …, (X(n), Y(n)), where each X(i) = [x1, x2, …, xd]T is a vector of d input features for a particular movie and each Y(i) ∈ R^6 is a categorical dependent variable corresponding to one of six possible revenue categories. We can model the relationship between X(i) and Y(i) in a variety of ways. During the experimentation phase of this project, we used five different types of models to assign each input to a revenue category, and then used the results of those categorizations to approximate the domestic box office collection of the movie represented by the input vector.

Our dataset was collected from multiple sources and cross-checked, because the figures for budget and box office collection varied across the websites we relied on. In our final dataset, over 65% of the movies generated less domestic collection than their budget.

The feature space for this learning problem has more dimensions than we have samples available, so we are victims of the curse of dimensionality. To deal with this issue we used an SVM with a linear kernel, because the last thing we want is to project the data into an even higher-dimensional space. A basic rule of thumb is given in NTU’s ‘A Practical Guide to Support Vector Classification’, which says: If the number of features is large, one may not need to map data to a higher dimensional space. That is, the nonlinear mapping does not improve the performance. Using the linear kernel is good enough, and one only searches for the parameter C.
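The rule of thumb quoted above (keep the kernel linear and search only over C) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the project's code: the data is a synthetic stand-in with more features than samples, and the `C_GRID` values, the three-class setup and the 5-fold cross-validation are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the movie data: fewer samples than features.
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Linear kernel only: no mapping into a higher-dimensional space.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="linear"))

# C is the only hyperparameter searched (illustrative grid).
C_GRID = [0.01, 0.1, 1, 10]
search = GridSearchCV(pipeline, {"svc__C": C_GRID}, cv=5)
search.fit(X_train, y_train)

best_c = search.best_params_["svc__C"]
test_accuracy = search.score(X_test, y_test)
print("best C:", best_c, "test accuracy:", round(test_accuracy, 2))
```

Scaling the features inside the pipeline keeps the cross-validation folds from leaking test statistics into training.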

Accuracy

Model | Accuracy | Recommendation
Linear Model | 0.82 | Definitely use
SVM (linear kernel) | 0.53 | Usable
Naive Bayes | 0.44 | Preferred for Hit / Super hit
KNN | 0.52 | Predicts ‘Flop’ for every movie; do not use
Logistic Regression | 0.56 | Slightly better than KNN; do not use
Random Forest | 0.56 (Gini) / 0.57 (Entropy) | Same problem as KNN; do not use

Confusion Matrix

[Figures: confusion matrix / graph for KNN, Logistic Regression, Naive Bayes, SVM and Linear Regression]
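A confusion matrix makes the "all flop" problem noted in the accuracy table visible in a way a single accuracy number does not. The sketch below uses made-up labels (13 flops, 4 hits, 3 super hits), not the project's data, and the helper names are illustrative.

```python
from collections import Counter

LABELS = ["Flop", "Hit", "Super hit"]


def confusion_matrix(actual, predicted, labels=LABELS):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall_per_class(matrix, labels=LABELS):
    """Share of each actual class that was predicted correctly."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, matrix))
    }


# Made-up, imbalanced test labels and a model that always answers 'Flop'.
actual = ["Flop"] * 13 + ["Hit"] * 4 + ["Super hit"] * 3
all_flop = ["Flop"] * len(actual)

matrix = confusion_matrix(actual, all_flop)
print(matrix)
print(accuracy(actual, all_flop))
print(recall_per_class(matrix))
```

In this toy case the always-flop model still scores well on accuracy because flops dominate, while its recall for ‘Hit’ and ‘Super hit’ is zero, which is why such a model is of no use to an investor.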

RESULT

Some test results are as follows:

[Figures: prediction output for Padmaavat, Piku, Raazi, Thugs of Hindostan, Badhaai Ho, Pink, Zero, Stree and Rock On 2]

FLOW DIAGRAM

[Figure: flow diagram]

OVERALL ACCURACY

Overall accuracy = accuracy(Naive Bayes) × accuracy(Linear Regression) = 36.82%
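The product rule above can be checked with a few lines of arithmetic. The sketch below uses the rounded accuracies from the accuracy table, and multiplying them treats the errors of the two stages as independent, which is an assumption rather than something the project states.

```python
# Rounded per-stage accuracies as listed in the accuracy table.
nb_accuracy = 0.44      # Naive Bayes (classification stage)
linear_accuracy = 0.82  # Linear model (regression stage)

# Chance that both stages are right, if their errors are independent.
combined = nb_accuracy * linear_accuracy
print(f"{combined:.2%}")
```

With the rounded inputs the product is about 36.08%, slightly below the reported 36.82%; the reported figure presumably comes from unrounded accuracies.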

Contributors

yashraj9892
