Natural Language Processing Spring 2022 Final Project
Prediction of genres based on the plot of the movies
Tasks:
- Data crawling from IMDB
- Data pre-processing
- Data analysis
During this phase, we crawled around 31K unique movies across different genres from IMDB, then pre-processed and cleaned the data for the later phases of the project.
The data has the following format:
Movie Title | Genres | Plot Summary |
---|---|---|
An example of the data we've crawled:
Movie Title | Genres | Plot Summary |
---|---|---|
Doctor Strange in the Multiverse of Madness | ['Action', ' Adventure', ' Fantasy'] | Doctor Strange teams up with a mysterious teenage girl from his dreams who can travel across multiverses, to battle multiple threats, including other-universe versions of himself, which threaten to wipe out millions across the multiverse. They seek help from Wanda the Scarlet Witch, Wong and others. |
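
The example row shows stray leading spaces inside the genre labels (`' Adventure'`, `' Fantasy'`), which is the kind of thing the pre-processing step has to clean up. Below is a minimal sketch of such a cleaning step. It assumes the genres column is stored as the string form of a Python list. The helper names `clean_genres` and `clean_plot` are hypothetical and are not taken from the project code.

```python
import ast


def clean_genres(raw):
    """Return a list of genre labels with surrounding whitespace removed.

    Assumption: `raw` is either a list of strings or the string
    representation of such a list, e.g. "['Action', ' Adventure']".
    """
    genres = ast.literal_eval(raw) if isinstance(raw, str) else raw
    # Strip each label and drop any that end up empty.
    return [g.strip() for g in genres if g.strip()]


def clean_plot(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


row = {
    "title": "Doctor Strange in the Multiverse of Madness",
    "genres": "['Action', ' Adventure', ' Fantasy']",
    "plot": "Doctor Strange teams up with a  mysterious teenage girl\nfrom his dreams.",
}
cleaned = {
    "title": row["title"].strip(),
    "genres": clean_genres(row["genres"]),
    "plot": clean_plot(row["plot"]),
}
```

Normalizing the labels this way keeps `' Adventure'` and `'Adventure'` from being counted as two different genres in the later analysis.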
Libraries used for data scraping:
- BeautifulSoup
- Requests
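
As a rough illustration of how these two libraries fit together, the sketch below downloads a page with Requests and extracts title, genres, and plot with BeautifulSoup. The HTML structure, the class names (`movie`, `title`, `genre`, `plot`), and the function names `fetch_html` and `parse_movies` are all placeholders chosen for the example. They do not reflect IMDB's actual markup or the project's real crawler.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical markup used only for this example; real pages will differ.
SAMPLE_HTML = """
<div class="movie">
  <h3 class="title">Doctor Strange in the Multiverse of Madness</h3>
  <span class="genre">Action, Adventure, Fantasy</span>
  <p class="plot">Doctor Strange teams up with a mysterious teenage girl.</p>
</div>
"""


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_movies(html):
    """Extract one dict per movie block from the (assumed) markup above."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for card in soup.select("div.movie"):
        title = card.select_one(".title")
        genre = card.select_one(".genre")
        plot = card.select_one(".plot")
        # Skip incomplete entries instead of storing partial rows.
        if title is None or genre is None or plot is None:
            continue
        movies.append({
            "title": title.get_text(strip=True),
            "genres": [g.strip() for g in genre.get_text().split(",")],
            "plot": plot.get_text(strip=True),
        })
    return movies


movies = parse_movies(SAMPLE_HTML)
```

Keeping the download (`fetch_html`) separate from the parsing (`parse_movies`) means the parser can be checked against saved HTML without hitting the network.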