maomao622 / 163musicspider

This project forked from nacedwang/163musicspider

A Python crawler that collects artist, album, song, comment, and lyric data from NetEase Cloud Music

License: GNU General Public License v3.0

Python 100.00%

163musicspider's Introduction

163MusicSpider

A Python crawler that collects artist, album, song, comment, and lyric data from NetEase Cloud Music

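The collected data is stored in MySQL, and each crawled URL is saved to Redis so that the same page is not crawled twice. (NetEase Cloud Music, thinking to itself: "Stay away from me!".jpg)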
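The deduplication idea can be sketched as follows. This is a minimal illustration, not the project's actual code: the README does not say which Redis data structure is used, so a Redis set is assumed here, and the key name `crawled_urls`, the helper `mark_if_new`, and the `InMemorySet` stand-in are all hypothetical. The stand-in implements only `sadd`, so the sketch runs without a Redis server.

```python
class InMemorySet:
    """Hypothetical stand-in that mimics the single Redis call used below."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, *values):
        # Like Redis SADD: return how many of the values were newly added.
        members = self._sets.setdefault(key, set())
        added = 0
        for value in values:
            if value not in members:
                members.add(value)
                added += 1
        return added


def mark_if_new(client, url, key="crawled_urls"):
    """Record the URL and return True if it was not seen before, else False.

    Assumption: URLs are kept in one Redis set under a hypothetical key name.
    """
    return client.sadd(key, url) == 1


client = InMemorySet()
for url in ["https://example.com/a", "https://example.com/a", "https://example.com/b"]:
    if mark_if_new(client, url):
        print("crawl", url)
    else:
        print("skip ", url)
```

With the redis-py package, a `redis.Redis()` client could be passed in place of `InMemorySet()`, since its `sadd` method also returns the number of newly added members. Checking and recording in one call avoids a gap between "is it known?" and "mark it known" when several workers run at once.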


This crawler does not use the Scrapy framework, which makes it easy for beginners to read and use.

The author is also a Python beginner.

Files

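  1. Crawl all artist information: artists.py

  2. Crawl album information: album_by_artist.py

  3. Crawl song information: music_by_album.py

    Crawling too much may get you banned.

  4. Crawl lyrics: lyric_by_music.py

  5. Crawl comments (hot comments plus the first 1000 comments): comments_by_music.py

  6. Table creation SQL: db.sql

  7. Comment word cloud analysis: word_cloud_by_comment.py

  8. Result of the comment word cloud analysis: commentCloud.png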
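Fetching "the first 1000 comments" of a song usually means requesting them page by page. The sketch below only shows the paging arithmetic. The page size of 100 and the offset/limit style are assumptions for illustration, and `comment_pages` is a hypothetical helper, not a function from comments_by_music.py.

```python
def comment_pages(total=1000, page_size=100):
    """Yield (offset, limit) pairs that together cover the first `total` comments.

    Both defaults are assumptions; the real page size depends on the API used.
    """
    for offset in range(0, total, page_size):
        # The last page may be shorter when total is not a multiple of page_size.
        yield offset, min(page_size, total - offset)


for offset, limit in comment_pages(total=250, page_size=100):
    print(f"request comments {offset}..{offset + limit - 1}")
```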

Usage

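  • Crawl all data

    Run main.py
  • Crawl one kind of data only

    In the src directory, find the corresponding file listed under "Files" above, uncomment its last two lines, and run it.
  • Setting the number of workers

    Every file contains a line pool = ProcessPoolExecutor(5). The number is the concurrency level (ProcessPoolExecutor starts worker processes rather than threads). Setting it too high can trigger NetEase Cloud Music's anti-crawler mechanism and make the crawl fail.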

Keep the concurrency low when crawling to reduce the load on NetEase Cloud Music's servers.
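To make the role of that number concrete, here is a minimal sketch of a bounded worker pool. Only `ProcessPoolExecutor(5)` comes from the README. The functions `crawl_one` and `crawl_many`, and the option to swap in a thread pool, are hypothetical and only illustrate how the worker count caps the number of tasks running at once.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

MAX_WORKERS = 5  # same value as the README's ProcessPoolExecutor(5)


def crawl_one(item_id):
    """Hypothetical stand-in for one crawl task; a real one would fetch and store a page."""
    return f"crawled {item_id}"


def crawl_many(item_ids, max_workers=MAX_WORKERS, executor_cls=ProcessPoolExecutor):
    """Run crawl_one over item_ids with at most max_workers tasks in flight.

    executor_cls is an assumption added for illustration: ThreadPoolExecutor
    offers the same interface and is a common choice for I/O-bound work.
    """
    with executor_cls(max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(crawl_one, item_ids))
```

A call such as `crawl_many([1, 2, 3])` should sit under an `if __name__ == "__main__":` guard, which process pools require on platforms that start workers by spawning a fresh interpreter. Lowering `max_workers` is the simplest way to reduce the request rate seen by the server.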

Roadmap

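  1. Crawl the related comments attached to each comment, so that comments can be linked into connected stories
  2. Crawl playlist information and the songs each playlist contains
  3. Produce data analysis reports, such as artist and song rankings, hot comment rankings, user comment rankings, popular words, and comment rankings by song and by artist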

References

RitterHou/music-163

Python tutorial

Archiewyq/music_163

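Thanks to the above for their help and support for this project.

Statistics

As of 2019-09-11

Crawled so far: 33,439 artists, 290,772 albums, 1,802,483 songs, 622,398 comments, and 20,000+ lyrics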

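Changelog

Date | Change | Notes
2019-08-28 | Added Redis to prevent duplicate crawling; added NetEase Cloud Music API documentation |
2019-09-11 | Added comment word cloud analysis, plus the analysis results for 600,000+ comments |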

163musicspider's People

Contributors

nacedwang
