
tigeryi1998 / collaborative-filtering-using-als-for-movie-recommendation


This project forked from abhilashhn1993/collaborative-filtering-using-als-for-movie-recommendation


A movie recommendation system built with the ALS algorithm (matrix factorization) on user ratings data from the MovieLens dataset


collaborative-filtering-using-als-for-movie-recommendation's Introduction

movie-recommendation-engine

A content-based recommendation system for a big-data setting that recommends movies and TV shows to users.

Movie recommendations for users


OBJECTIVE

The main objective of the project is to design a full-fledged custom movie-recommendation engine for users. The other key objectives are:

  1. Design a content-based recommendation system that provides movie recommendations to users based on movie genres
  2. Implement a collaborative-filtering approach to recommend movies to users

DATA

To meet the core objectives with multiple approaches, two different data sources were used.

  1. MovieLens data: 27 million movie ratings provided by users

  2. Movies metadata: movie metadata with 24 features capturing various details about each film

TECHNOLOGIES

Python - Spark (PySpark), scikit-learn, NLTK, pandas, matplotlib, seaborn

ALGORITHMS

  • Collaborative filtering using the ALS algorithm
  • Content-based filtering using k-means clustering

IMPLEMENTATION

Collaborative filtering using ALS algorithm:

Collaborative filtering identifies items that a user might like by leveraging the ratings of similar users. The underlying assumption is that if user A and user B responded similarly to one movie (in our case, gave it similar ratings), then they are more likely to respond similarly to another movie X than a randomly chosen pair of users would be.
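As a small illustration of the "similar users" assumption, the sketch below computes a mean-centered cosine similarity between two users over the movies both have rated. The `ratings` dictionary, the user and movie names, and the `similarity` helper are all hypothetical and are not taken from the project notebook, which uses a model-based approach (ALS) rather than explicit user-to-user similarities.

```python
import math

# Hypothetical toy ratings: user -> {movie: rating}. Not project data.
ratings = {
    "A": {"m1": 5, "m2": 4, "m3": 1},
    "B": {"m1": 4, "m2": 5, "m3": 2},
    "C": {"m1": 1, "m2": 2, "m3": 5},
}

def similarity(a, b):
    """Mean-centered cosine similarity over the movies both users rated."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0  # not enough overlap to compare tastes
    mean_a = sum(a[m] for m in shared) / len(shared)
    mean_b = sum(b[m] for m in shared) / len(shared)
    da = [a[m] - mean_a for m in shared]
    db = [b[m] - mean_b for m in shared]
    norm = math.sqrt(sum(x * x for x in da)) * math.sqrt(sum(y * y for y in db))
    if norm == 0:
        return 0.0  # one user gave identical ratings to every shared movie
    return sum(x * y for x, y in zip(da, db)) / norm

sim_ab = similarity(ratings["A"], ratings["B"])  # users with similar taste
sim_ac = similarity(ratings["A"], ratings["C"])  # users with opposite taste
```

In this toy data, A and B agree on which movies are good, so their similarity is positive, while A and C disagree, so theirs is negative.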

  • Employed a model-based approach to collaborative filtering on the MovieLens dataset.
  • Implemented Alternating Least Squares (ALS) with Spark. ALS is a matrix factorization technique for collaborative filtering. Its objective function uses L2 regularization, and it minimizes the loss by alternating between the two factor matrices: one is held fixed while a least-squares problem is solved for the other, rather than by gradient descent.
  • The dataset contained movie_id and user_ratings, which form a user-rating matrix that is approximated by a product of factor matrices, as shown below:

[Figure: user-rating matrix expressed as a product of factor matrices]

Here, d is the number of latent features learned for each user and each movie. With ALS, we aim to minimize the error in the matrix calculation shown below:

[Figure: matrix calculation whose error ALS minimizes]

The error is given by the equation below:

[Figure: error equation]
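The original figures are not reproduced here. For reference, a common textbook formulation of regularized matrix factorization is sketched below. The notation is an assumption and may differ from the figures: R is the m-by-n user-rating matrix, U and V are the user and movie factor matrices with d columns each, Ω is the set of observed (user, movie) pairs, T is the set of held-out ratings used for evaluation, and λ plays the role of the regularization parameter.

```latex
% Factorization: each rating is approximated by a dot product of latent vectors.
\[
R \approx U V^{\top}, \qquad
U \in \mathbb{R}^{m \times d}, \quad V \in \mathbb{R}^{n \times d}, \qquad
\hat{r}_{ui} = \mathbf{u}_u^{\top} \mathbf{v}_i
\]

% Regularized squared-error objective over the observed ratings.
\[
\min_{U, V} \;
\sum_{(u,i) \in \Omega} \left( r_{ui} - \mathbf{u}_u^{\top} \mathbf{v}_i \right)^2
+ \lambda \left( \sum_{u} \lVert \mathbf{u}_u \rVert_2^2
               + \sum_{i} \lVert \mathbf{v}_i \rVert_2^2 \right)
\]

% Root-mean-square error on a held-out set T.
\[
\mathrm{RMSE} = \sqrt{ \frac{1}{|T|} \sum_{(u,i) \in T}
\left( r_{ui} - \hat{r}_{ui} \right)^2 }
\]
```

With V held fixed, the objective is an ordinary ridge regression in each user vector, and vice versa, which is what makes the alternating scheme tractable.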

We train the ALS model by tuning the following hyper-parameters:

  • rank: the number of latent factors used in the matrix factorization
  • regParam: the L2-regularization parameter used in the ALS algorithm
  • maxIter: the maximum number of iterations the algorithm is run
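To make the roles of these three hyper-parameters concrete, here is a minimal NumPy sketch of alternating least squares on a tiny dense matrix. It is not Spark's implementation and not the project's code: the function names `als` and `rmse`, the toy matrix `R`, and the convention that 0 marks a missing rating are all assumptions made for illustration.

```python
import numpy as np

def als(R, mask, rank=2, reg_param=0.05, max_iter=10, seed=0):
    """Alternate ridge-regression solves for user factors U and item factors V."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    reg = reg_param * np.eye(rank)  # L2 penalty term added to each solve
    for _ in range(max_iter):
        # Hold V fixed and solve a small least-squares problem per user.
        for u in range(n_users):
            Vu = V[mask[u]]
            U[u] = np.linalg.solve(Vu.T @ Vu + reg, Vu.T @ R[u, mask[u]])
        # Hold U fixed and solve a small least-squares problem per item.
        for i in range(n_items):
            Ui = U[mask[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + reg, Ui.T @ R[mask[:, i], i])
    return U, V

def rmse(R, mask, U, V):
    """Root-mean-square error over the observed entries only."""
    err = (R - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

# Toy 4x4 rating matrix; 0 marks a missing rating (an assumption of this sketch).
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
mask = R > 0

U0, V0 = als(R, mask, max_iter=0)        # random initial factors only
U, V = als(R, mask, rank=2, reg_param=0.05, max_iter=10)
rmse_before = rmse(R, mask, U0, V0)
rmse_after = rmse(R, mask, U, V)
```

Each inner solve is a closed-form least-squares step, so no learning rate is needed. `rank` sets the width of `U` and `V`, `reg_param` scales the identity added to each normal-equation matrix, and `max_iter` bounds the number of alternations.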

After tuning the parameters and running ALS with cross-validation, the best RMSE obtained was 0.8037, with 30 latent factors, a regParam value of 0.05, and 10 iterations.
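The tuning procedure can be pictured as an exhaustive search over a parameter grid that keeps the combination with the lowest validation RMSE. The sketch below shows only that selection logic. `grid_search` and `evaluate` are hypothetical names, and `toy_score` is a made-up stand-in for "train ALS with these settings and return the validation RMSE", so its numbers have no relation to the project's results.

```python
from itertools import product

def grid_search(evaluate, ranks, reg_params, max_iters):
    """Return (best_score, best_params) over all parameter combinations."""
    best = None
    for rank, reg_param, max_iter in product(ranks, reg_params, max_iters):
        score = evaluate(rank=rank, reg_param=reg_param, max_iter=max_iter)
        # Lower RMSE is better; on ties, keep the first combination seen.
        if best is None or score < best[0]:
            best = (score, {"rank": rank, "regParam": reg_param, "maxIter": max_iter})
    return best

def toy_score(rank, reg_param, max_iter):
    """Made-up scoring function used only to exercise the search loop."""
    return (rank - 20) ** 2 + reg_param

best_score, best_params = grid_search(
    toy_score, ranks=(10, 20, 30), reg_params=(0.01, 0.05, 0.1), max_iters=(5, 10)
)
```

In a real run, `evaluate` would fit the model on the training folds and score it on the held-out fold, averaging over folds when cross-validation is used.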

Below are the resulting movie predictions made by the tuned ALS model on the test data:

[Figure: movie predictions from the tuned ALS model on the test data]

Refer to this link for code - Collaborative filtering using ALS

Content-based filtering using k-means clustering:

  • Used the movies-metadata file with 45k instances and 24 features. To capture content-based information for a given movie, the 'Overview' feature was used, which describes the genre as well as the plot of the film.
  • The descriptions, each a paragraph averaging 50-70 words, were cleaned by removing extra whitespace and stopwords.
  • The cleaned text is used to compute TF-IDF scores, and the corresponding TF-IDF matrix is generated.
  • K-means clustering on these scores groups similar movies (content with similar scores) into clusters.
  • A movie's cluster provides the recommendations for users.
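The steps above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project notebook: the titles and overviews are invented, scikit-learn's built-in English stopword list is used where the project may have used NLTK, and the `recommend` helper and `n_clusters=2` are hypothetical choices.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented overviews for illustration; not rows from the metadata file.
overviews = {
    "Space Run": "A crew of astronauts travels through space to save a distant planet.",
    "Star Drift": "Astronauts stranded in space fight to return to their home planet.",
    "Paris Letters": "Two strangers fall in love through letters and a romance blooms in Paris.",
    "Summer Hearts": "A summer romance in Paris tests the love of two young strangers.",
}
titles = list(overviews)

# Collapse repeated whitespace, then build the TF-IDF matrix without stopwords.
cleaned = [" ".join(text.split()) for text in overviews.values()]
vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(cleaned)

# Group movies whose overviews have similar TF-IDF profiles.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(tfidf)

def recommend(title):
    """Return the other titles assigned to the same cluster as `title`."""
    cluster = labels[titles.index(title)]
    return [t for t, c in zip(titles, labels) if c == cluster and t != title]

suggestions = recommend("Space Run")
```

On a corpus this small the cluster assignments are sensitive to wording, so the sketch is meant to show the pipeline shape (clean, vectorize, cluster, look up cluster mates) rather than clustering quality.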

Below is a sample output of movie recommendations provided by the k-means clustering

[Figure: sample movie recommendations from k-means clustering]

Refer to this link for code: Content-based filtering using k-means clustering

RESULTS

The collaborative-filtering model reached a cross-validated RMSE of 0.8037, and its movie recommendations were accurate for specific users. Movie titles were also successfully segmented into clusters based on their overview content. As future work, I plan to extend the project to build recommender systems for TV shows.

