- Demo Link (Heroku)
- Demo Link (AWS EC2)
- Nelson 2021.04.26
- Kugou
- NetEase
- Web
- category
  - Kugou_category.json
  - kuwo_category.json
  - NetEase_category.json
  - QQ_category.json
- config
  - setting.json
- clean_data_ver1.py
- clear_data_ver2.py
- config.py
- crawler.py
- json_to_xlsx.py
- mysql.py
- Step 1. Create each platform's category JSON file
- Step 2. Read the category JSON file and, using each origin toplist ID, fetch the current week's chart music information
- Step 3. Fetch the details of each song (lyrics)
- Step 4. Data cleaning
- Step 5. Build Song Table
- Step 6. Build Intermediate_Table
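
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code in `crawler.py`: the JSON shape (a list of objects with an `origin_toplist_id` key), the function names, and the `fetch_toplist` callback are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def load_toplist_ids(category_path: str) -> List[str]:
    """Read a category JSON file and return its toplist IDs as strings.

    Assumes (hypothetically) that the file holds a list of objects,
    each carrying an "origin_toplist_id" key.
    """
    text = Path(category_path).read_text(encoding="utf-8")
    categories = json.loads(text)
    return [str(entry["origin_toplist_id"]) for entry in categories]


def crawl_toplists(
    category_path: str,
    fetch_toplist: Callable[[str], List[dict]],
) -> Dict[str, List[dict]]:
    """Fetch this week's songs for every toplist listed in the category file.

    fetch_toplist is a placeholder for the platform-specific request code.
    """
    return {tid: fetch_toplist(tid) for tid in load_toplist_ids(category_path)}
```

Passing the fetch function in as an argument keeps the file handling the same for every platform; only the request code would differ.
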
$ python crawler.py --action all --xlsx False
--action platformname Platform to crawl
--xlsx boolean Whether to output an xlsx file; currently supported only for QQ and NetEase
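
For reference, the two options could be parsed with `argparse` along these lines. This is a sketch under assumptions, not the parser in `crawler.py`: the `str_to_bool` helper and the defaults are hypothetical. The helper is there because `type=bool` would turn the string `"False"` into `True` (any non-empty string is truthy).

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert a command-line string such as 'False' into a real boolean."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="crawler.py")
    # Assumed default: crawl every platform when --action is omitted.
    parser.add_argument("--action", default="all",
                        help="platform to crawl, or 'all'")
    # Assumed default: no xlsx output unless explicitly requested.
    parser.add_argument("--xlsx", type=str_to_bool, default=False,
                        help="output an xlsx file (QQ and NetEase only)")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)
```

With this sketch, `parse_args(["--action", "all", "--xlsx", "False"])` yields `action == "all"` and `xlsx` set to the boolean `False`.
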
Take the fields common to all four platforms:
- songname
- album_name
- duration
- singer name
- lyrics
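
One way to reduce each platform's raw record to the common fields listed above is a per-platform key mapping. The sketch below is illustrative only: the raw key names in `FIELD_MAP` are made up, not the platforms' real response keys, and `singer_name` is an assumed spelling of the "singer name" field.

```python
from typing import Any, Dict, Optional

COMMON_FIELDS = ("songname", "album_name", "duration", "singer_name", "lyrics")

# Hypothetical raw key names per platform; replace with the real ones.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "QQ": {"songname": "title", "album_name": "album", "duration": "interval",
           "singer_name": "singer", "lyrics": "lyric"},
    "NetEase": {"songname": "name", "album_name": "al", "duration": "dt",
                "singer_name": "ar", "lyrics": "lrc"},
    "Kugou": {"songname": "songname", "album_name": "album_name",
              "duration": "duration", "singer_name": "singername",
              "lyrics": "lyrics"},
    "kuwo": {"songname": "name", "album_name": "album", "duration": "duration",
             "singer_name": "artist", "lyrics": "lrclist"},
}


def to_common(platform: str, raw: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Keep only the common fields; missing values become None."""
    mapping = FIELD_MAP[platform]
    return {field: raw.get(mapping[field]) for field in COMMON_FIELDS}
```

Filling missing values with `None` rather than dropping the key keeps every row the same shape, which simplifies the later table-building steps.
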
- Increasing songID
- Building Song Table
- Building Intermediate_Table
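
The three items above can be sketched with an auto-incrementing key and a link table. This is a minimal illustration using the standard-library `sqlite3` module as a stand-in for the MySQL code in `mysql.py`. The column names, the uniqueness rule (song name plus singer), and the idea that `Intermediate_Table` links toplists to songs are all assumptions.

```python
import sqlite3
from typing import Any, Dict


def create_tables(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS Song (
            songID      INTEGER PRIMARY KEY AUTOINCREMENT,  -- increasing ID
            songname    TEXT NOT NULL,
            singer_name TEXT NOT NULL,
            album_name  TEXT,
            duration    INTEGER,
            lyrics      TEXT,
            UNIQUE (songname, singer_name)
        );
        CREATE TABLE IF NOT EXISTS Intermediate_Table (
            platform   TEXT NOT NULL,
            toplist_id TEXT NOT NULL,
            songID     INTEGER NOT NULL REFERENCES Song(songID),
            PRIMARY KEY (platform, toplist_id, songID)
        );
    """)


def upsert_song(conn: sqlite3.Connection, song: Dict[str, Any]) -> int:
    """Insert the song if it is new and return its songID either way."""
    conn.execute(
        "INSERT OR IGNORE INTO Song "
        "(songname, singer_name, album_name, duration, lyrics) "
        "VALUES (?, ?, ?, ?, ?)",
        (song["songname"], song["singer_name"], song.get("album_name"),
         song.get("duration"), song.get("lyrics")),
    )
    row = conn.execute(
        "SELECT songID FROM Song WHERE songname = ? AND singer_name = ?",
        (song["songname"], song["singer_name"]),
    ).fetchone()
    return row[0]


def link_song(conn: sqlite3.Connection, platform: str,
              toplist_id: str, song_id: int) -> None:
    """Record that a song appears on a given platform's toplist."""
    conn.execute(
        "INSERT OR IGNORE INTO Intermediate_Table VALUES (?, ?, ?)",
        (platform, toplist_id, song_id),
    )
```

Because the same song can appear on several platforms' charts, looking up the existing `songID` before linking avoids duplicate rows in the Song table.
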