Team Members: Rui CHEN, Shichao JIA, Jimin WANG, Zhuoyue WANG
- Python 3
- pandas
- numpy
- networkx 2.3
- pyecharts
- json
- datetime
- math
- seaborn
- matplotlib
- snapshot_selenium
- wordcloud
- nltk
- io
- PIL
- itertools
- scipy
- collections
tmdb_5000_movies.csv
- General data of 4803 movies with 20 feature columns
- budget
- genres
- homepage
- id
- keywords
- original_language
- original_title
- overview
- popularity
- production_companies
- production_countries
- release_date
- revenue
- runtime
- spoken_languages
- status
- tagline
- title
- vote_average
- vote_count
tmdb_5000_credits.csv
- Credits data of 4803 movies with 4 feature columns
- movie_id
- title
- cast
- crew
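
Several columns in the two CSV files (for example `genres` and `keywords` in the movies file, `cast` and `crew` in the credits file) store JSON-encoded lists as strings, so they usually need decoding after loading. The following is a minimal sketch of that step, not the notebooks' actual code. The helper names `load_movies` and `names` and the in-memory sample row are hypothetical and only serve the illustration.

```python
import io
import json

import pandas as pd

# Assumption: these columns hold JSON-encoded lists of {"id": ..., "name": ...} objects.
JSON_COLUMNS = ["genres", "keywords"]


def load_movies(source, json_columns=JSON_COLUMNS):
    """Read a movies CSV and decode its JSON-string columns into Python lists."""
    df = pd.read_csv(source)
    for col in json_columns:
        df[col] = df[col].apply(json.loads)
    # Unparseable dates become NaT instead of raising.
    df["release_date"] = pd.to_datetime(df["release_date"], errors="coerce")
    return df


def names(entries):
    """Extract the 'name' field from a decoded list of dicts."""
    return [entry["name"] for entry in entries]


# Made-up sample row in the same shape as tmdb_5000_movies.csv (not real data).
sample_csv = io.StringIO(
    'id,title,genres,keywords,release_date\n'
    '1,Example Movie,"[{""id"": 28, ""name"": ""Action""}]","[]",2009-12-10\n'
)

movies = load_movies(sample_csv)
movies["genre_names"] = movies["genres"].apply(names)
print(movies[["title", "genre_names"]])
```

With the real file, the call would be `load_movies("tmdb_5000_movies.csv")`, assuming the CSV sits in the working directory.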
ntds_tmdb_data_explore.ipynb
- Preprocessing of the original data, mainly data cleaning and feature augmentation.
- Preliminary data exploration.
ntds_tmdb_movie_graph.ipynb
- Pipeline for the movie graph, built from the similarity of leading actors: data preprocessing, network construction, data analysis, and visualization.
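
A movie graph of this kind can be sketched as follows. This README does not say which similarity measure the notebook uses, so Jaccard similarity over leading-cast sets is an assumption here, and `jaccard`, `build_movie_graph`, `threshold`, and the toy input are hypothetical names for illustration only.

```python
from itertools import combinations

import networkx as nx


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_movie_graph(leading_cast, threshold=0.0):
    """Connect two movies when their leading-cast similarity exceeds `threshold`."""
    graph = nx.Graph()
    graph.add_nodes_from(leading_cast)  # keep movies with no neighbours as isolated nodes
    for (m1, c1), (m2, c2) in combinations(leading_cast.items(), 2):
        weight = jaccard(set(c1), set(c2))
        if weight > threshold:
            graph.add_edge(m1, m2, weight=weight)
    return graph


# Hypothetical toy input: movie title -> leading actors.
leading_cast = {
    "Movie A": ["Actor 1", "Actor 2", "Actor 3"],
    "Movie B": ["Actor 2", "Actor 3", "Actor 4"],
    "Movie C": ["Actor 5"],
}

movie_graph = build_movie_graph(leading_cast)
print(movie_graph.edges(data=True))
```

The pairwise loop is quadratic in the number of movies, which is probably acceptable for a few thousand movies but would need an inverted index (actor to movies) for much larger sets.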
ntds_tmdb_actor_graph.ipynb
- Pipeline for the actor graph, built from co-appearance in the same movie: data preprocessing, network construction, data analysis, and visualization.
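
A co-appearance graph can be sketched as below: two actors are linked when they share a movie, and the edge weight counts the movies they share. The weighting scheme, the function name `build_actor_graph`, and the toy cast lists are assumptions for illustration; the notebook may construct the graph differently.

```python
from itertools import combinations

import networkx as nx


def build_actor_graph(casts):
    """Link actors who appear in the same movie; edge weight counts shared movies."""
    graph = nx.Graph()
    for cast in casts:
        actors = sorted(set(cast))  # drop duplicate credits within one movie
        graph.add_nodes_from(actors)
        for a, b in combinations(actors, 2):
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph


# Hypothetical toy input: one cast list per movie.
casts = [
    ["Actor 1", "Actor 2", "Actor 3"],
    ["Actor 2", "Actor 3"],
    ["Actor 4"],
]

actor_graph = build_actor_graph(casts)
degree_centrality = nx.degree_centrality(actor_graph)
print(sorted(degree_centrality.items()))
```

Standard networkx measures such as `nx.degree_centrality` or `nx.connected_components` can then be applied to the resulting graph.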
ntds_tmdb_crew_graph.ipynb
- Pipeline for the movie graph, built from the similarity of main crew: data preprocessing, network construction, data analysis, and visualization.
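
Before crew similarity can be computed, the main crew of each movie has to be pulled out of the JSON-encoded `crew` column. The sketch below shows one way to do that by filtering on each entry's `job` field. Which jobs count as "main crew" is not stated in this README, so `MAIN_JOBS` is an assumption, and `main_crew` and the sample entries are hypothetical.

```python
import json

# Assumption: the set of jobs treated as "main crew" is illustrative only.
MAIN_JOBS = {"Director", "Producer", "Screenplay"}


def main_crew(crew_json, jobs=MAIN_JOBS):
    """Return the names of crew members whose job is in `jobs`."""
    return {
        member["name"]
        for member in json.loads(crew_json)
        if member.get("job") in jobs
    }


# Made-up crew string in the same shape as one cell of the `crew` column.
crew_json = json.dumps([
    {"name": "Person A", "job": "Director", "department": "Directing"},
    {"name": "Person B", "job": "Editor", "department": "Editing"},
    {"name": "Person C", "job": "Producer", "department": "Production"},
])

crew = main_crew(crew_json)
print(sorted(crew))
```

The resulting name sets can be compared pairwise with any set similarity measure to weight the edges between movies.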
chromedriver.exe
- Chrome driver binary required for pyecharts to work correctly in JupyterLab with Chrome.