This project implements a movie recommendation system using content-based filtering. It leverages natural language processing (NLP) techniques and cosine similarity to recommend movies based on their textual content, specifically the combination of movie overviews and genres.
- Introduction
- Data
- Exploratory Data Analysis
- Data Preprocessing
- Text Vectorization
- Similarity Calculation
- Movie Recommendations Function
- Example Recommendations
- Model Saving and Loading
- Dependencies
- Usage
- Contributing
- License
The Movie Recommendation System uses content-based filtering to recommend movies based on their textual information, combining movie overviews and genres. It employs natural language processing (NLP) techniques and cosine similarity to determine movie similarity.
The dataset (top10K-TMDB-movies.csv) contains information about movies, including id, title, overview, and genre.
The initial exploration involves displaying the first 10 rows, generating summary statistics, and checking for null values.
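The exploration steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the three rows are a made-up stand-in for the CSV so the snippet runs on its own, and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in rows; with the real file you would instead use:
# movies = pd.read_csv("top10K-TMDB-movies.csv")
movies = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "title": ["Movie A", "Movie B", "Movie C"],
        "overview": ["A hero saves the city.", None, "Two friends travel abroad."],
        "genre": ["Action", "Drama", "Comedy,Adventure"],
    }
)

first_rows = movies.head(10)         # first 10 rows (fewer if the frame is smaller)
summary = movies.describe()          # summary statistics for the numeric columns
null_counts = movies.isnull().sum()  # number of missing values per column

print(first_rows)
print(summary)
print(null_counts)
```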
Data preprocessing includes selecting the relevant columns (id, title, overview, genre), creating a new column tags by concatenating overview and genre, and then dropping the columns that tags makes redundant (overview, genre).
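A sketch of these preprocessing steps with pandas is shown below. The sample rows, the extra popularity column, and the choice to fill missing text with an empty string are assumptions made for illustration; the original code may handle missing values differently.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset, with one extra column to discard.
movies = pd.DataFrame(
    {
        "id": [1, 2],
        "title": ["Movie A", "Movie B"],
        "overview": ["A hero saves the city.", None],
        "genre": ["Action", "Drama"],
        "popularity": [10.5, 3.2],
    }
)

# Keep only the columns the recommender needs.
movies = movies[["id", "title", "overview", "genre"]].copy()

# Assumption: treat missing text as empty so the concatenation never yields NaN.
movies["overview"] = movies["overview"].fillna("")
movies["genre"] = movies["genre"].fillna("")

# Combine overview and genre into a single text field.
movies["tags"] = movies["overview"] + " " + movies["genre"]

# Drop the source columns now that their text lives in tags.
new_data = movies.drop(columns=["overview", "genre"])
print(new_data)
```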
Text vectorization is performed using CountVectorizer from sklearn.feature_extraction.text to convert text data into numerical vectors.
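As a small illustration of what CountVectorizer produces, the sketch below vectorizes three made-up tag strings with the default settings. The project may pass additional options (for example a vocabulary size limit or stop-word removal); none are assumed here.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical tag strings standing in for the tags column.
tags = [
    "hero saves city action",
    "hero fights villain action",
    "friends travel abroad comedy",
]

cv = CountVectorizer()                      # default tokenizer, lowercasing, word counts
vectors = cv.fit_transform(tags).toarray()  # dense array: one row per movie, one column per word

print(vectors.shape)
print(sorted(cv.vocabulary_))               # the learned vocabulary
```

Each row counts how often every vocabulary word occurs in that movie's tags, so movies that share words end up with overlapping non-zero columns.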
Cosine similarity is computed between every pair of tag vectors, producing a square similarity matrix with one row and one column per movie.
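The following sketch shows the similarity step on three made-up tag strings, using cosine_similarity from sklearn.metrics.pairwise. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so two movies with identical word proportions score 1 and two movies with no words in common score 0.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tag strings standing in for the tags column.
tags = [
    "hero saves city action",
    "hero fights villain action",
    "friends travel abroad comedy",
]

vectors = CountVectorizer().fit_transform(tags)

# Entry [i, j] is the cosine similarity between movie i and movie j.
similarity = cosine_similarity(vectors)

print(similarity.shape)
print(similarity[0])
```

Here the first two movies share two of their four words, giving a similarity of 0.5, while the third shares none with the first and scores 0.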
A function named recommends is defined to recommend movies similar to a title supplied by the user.
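One possible shape for such a function is sketched below. Only the name recommends comes from the project; the signature, the top_n parameter, the empty-list behavior for unknown titles, and the sample data are all assumptions for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical processed data standing in for new_data.
new_data = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
        "tags": [
            "hero saves city action",
            "hero fights villain action",
            "friends travel abroad comedy",
            "hero saves world action adventure",
        ],
    }
)
similarity = cosine_similarity(CountVectorizer().fit_transform(new_data["tags"]))


def recommends(title, top_n=5):
    """Return up to top_n titles most similar to the given title (sketch)."""
    titles = new_data["title"].tolist()
    if title not in titles:
        return []  # assumption: unknown titles yield no recommendations
    pos = titles.index(title)
    # Rank every movie by its similarity to the chosen one, highest first.
    ranked = sorted(enumerate(similarity[pos]), key=lambda pair: pair[1], reverse=True)
    # Skip the query movie itself and keep the best top_n matches.
    picks = [i for i, _ in ranked if i != pos][:top_n]
    return [titles[i] for i in picks]


print(recommends("Movie A", top_n=2))
```

With the real dataset loaded, a call such as recommends("Iron Man") would follow the same path: look up the title's row, rank the other movies by that row of the similarity matrix, and return the top matches.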
An example demonstrates recommending movies for a specific input movie title, such as "Iron Man."
The processed data (new_data) and the similarity matrix are saved using the pickle module. Loading the model back into the program is also demonstrated.
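Saving and reloading both objects with pickle might look like the sketch below. The file names, the temporary directory, and the toy data are hypothetical; the project's own file names are not stated here.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the processed data and the similarity matrix.
new_data = pd.DataFrame(
    {"id": [1, 2], "title": ["Movie A", "Movie B"], "tags": ["hero action", "hero drama"]}
)
similarity = np.array([[1.0, 0.5], [0.5, 1.0]])

# Assumed file names, written to a temporary directory for this example.
out_dir = tempfile.mkdtemp()
data_path = os.path.join(out_dir, "movies_list.pkl")
sim_path = os.path.join(out_dir, "similarity.pkl")

# Save both objects in binary mode.
with open(data_path, "wb") as f:
    pickle.dump(new_data, f)
with open(sim_path, "wb") as f:
    pickle.dump(similarity, f)

# Load them back, as a separate script or app would.
with open(data_path, "rb") as f:
    loaded_data = pickle.load(f)
with open(sim_path, "rb") as f:
    loaded_similarity = pickle.load(f)

print(loaded_data.equals(new_data))
```

Because unpickling can execute arbitrary code, pickle files should only be loaded from sources you trust.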
- Python 3.x
- scikit-learn
- pandas
- numpy
- matplotlib
- seaborn
- Install the required dependencies:
pip install -r requirements.txt
- Run the Python script or Jupyter notebook.
Contributions are welcome! Feel free to open issues or submit pull requests.