Comments (17)
https://www.bilibili.com/video/BV1Bf4y1T7jH/?spm_id_from=333.999.0.0&vd_source=eed5f0b8d964ee5e9f5bb8134e2cdc11
This video has roughly 120 comments, but only 75 of them can be crawled.
from bilibili_comment_crawl.
Bump.
from bilibili_comment_crawl.
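You could try increasing the number of loop iterations and changing the stop strategy: compare the length of the comment list before and after each request. If the length changed, new data came in; if it did not, you have reached the end. As for why not all of the data gets crawled: in the original design I hard-coded 100 iterations, and the stop strategy, which requires every request to return exactly 20 comments, is probably wrong. So try modifying the code along the lines I described.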
from bilibili_comment_crawl.
The main problem is the stop strategy that requires every request to return exactly 20 comments.
from bilibili_comment_crawl.
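To make the difference between the two stop rules concrete, here is a minimal sketch that runs both on simulated pages. The page sizes, the `crawl` helper, and the reading of the old rule as "stop as soon as a page has fewer than 20 comments" are all assumptions made for illustration; no real request is sent.

```python
def crawl(pages, stop_on_short_page):
    """Collect comments from a list of simulated pages.

    pages: list of lists of comment strings (hypothetical stand-in for API responses).
    stop_on_short_page: if True, stop as soon as a page has fewer than 20 comments
    (one plausible reading of the old rule); otherwise stop only when a page
    adds nothing to the list.
    """
    comment = []
    for page in pages:
        pre_comment_length = len(comment)
        comment.extend(page)
        if stop_on_short_page:
            if len(page) < 20:
                break
        elif len(comment) == pre_comment_length:
            break
    return comment


# Hypothetical page sizes: some pages return fewer than 20 items before the end.
sizes = [20, 18, 20, 12, 0]
pages = [["c%d-%d" % (p, k) for k in range(n)] for p, n in enumerate(sizes)]

short_page_result = crawl(pages, stop_on_short_page=True)
length_check_result = crawl(pages, stop_on_short_page=False)
print(len(short_page_result), len(length_check_result))  # prints: 38 70
```

With these made-up sizes, the "short page" rule gives up after the second page, while the length comparison keeps going until a page contributes nothing.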
for i in range(1, page_num):
    pre_comment_length = len(comment)
    responses = requests.get(url=url.format(i), headers=header, proxies=proxy).json()
    # "replies" may be null on an empty page, so fall back to an empty list
    for content in responses["data"]["replies"] or []:
        comment.append(content["content"]["message"])
    print("Collected %d comments" % len(comment))
    # Exit strategy: compare the comment list length before and after this page; stop if they are equal
    if len(comment) == pre_comment_length:
        print("Finished crawling comments!")
        break
from bilibili_comment_crawl.
This should work somewhat better. Please test it, and point out anything else that looks wrong.
from bilibili_comment_crawl.
I have updated the code and the documentation, so you can refer to those. If you find any other bugs, feel free to report them.
from bilibili_comment_crawl.
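First, thank you for the patient answers. After asking my question I also looked into the comment JSON, and I think the code needs to crawl one level deeper. responses['data']['replies'][0-19]["content"]["message"] holds the top-level comments, but under each of those comments there is a lot more content, which needs another nested loop to crawl. That data is at responses['data']['replies'][0-19]['replies'][0-19]["content"]["message"], and there is no way to know in advance whether the second level has any comments, or how many. Is there a better way to handle this?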
from bilibili_comment_crawl.
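One way to handle a second level of unknown size is to walk whatever nested `replies` list is present and treat a missing or null list as empty. The sketch below is a hedged illustration, not the repository's code: `collect_messages` is a hypothetical helper, and `sample` is hand-written data that only mimics the structure described above. Whether the embedded list contains every reply is not confirmed in this thread.

```python
def collect_messages(replies):
    """Return the messages of each comment followed by any replies nested under it."""
    messages = []
    for item in replies or []:  # a missing or null "replies" is treated as empty
        messages.append(item["content"]["message"])
        # Recurse into the embedded replies, if there are any
        messages.extend(collect_messages(item.get("replies")))
    return messages


# Hand-written sample mimicking the described structure (not real API output)
sample = {"data": {"replies": [
    {"content": {"message": "top 1"}, "replies": [
        {"content": {"message": "reply 1a"}, "replies": None},
        {"content": {"message": "reply 1b"}},
    ]},
    {"content": {"message": "top 2"}, "replies": None},
]}}

messages = collect_messages(sample["data"]["replies"])
print(messages)  # prints: ['top 1', 'reply 1a', 'reply 1b', 'top 2']
```

Because empty levels are handled by `replies or []`, the caller does not need to know in advance whether a comment has replies.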
Could I take a look at the format of the JSON file? I don't seem to have looked into that part.
from bilibili_comment_crawl.
You can try the data for this comment: there is another replies under replies, and some of them are not empty.
from bilibili_comment_crawl.
I tried adding that too, but it did not work well: an exception is raised and the crawl exits partway through. I can't figure it out.
from bilibili_comment_crawl.
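One more thought: could the 200-iteration limit be removed? For a video with a very large number of comments, say 50,000, looping only 200 times at 20 comments per request yields at most 4,000 comments.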
from bilibili_comment_crawl.
Yours doesn't seem to be the kind of video that uploaders (UP主) post, does it?
from bilibili_comment_crawl.
This tool can only fetch the comments themselves; it has no feature for fetching replies to comments.
from bilibili_comment_crawl.
Got it.
from bilibili_comment_crawl.
Then could the number 200 be changed so that the crawl stops only after everything has been fetched?
from bilibili_comment_crawl.
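A possible way to drop the fixed iteration count is to loop over an unbounded page counter and rely on the length comparison to stop. The following is a sketch under assumptions: `crawl_all`, `fetch_page`, `max_pages`, and `fake_fetch` are hypothetical names, and the endpoint is simulated rather than real.

```python
import itertools


def crawl_all(fetch_page, max_pages=None):
    """Fetch pages 1, 2, 3, ... until a page adds no comments.

    fetch_page: hypothetical callable taking a page number and returning parsed JSON.
    max_pages: optional safety cap; None means no fixed limit.
    """
    comment = []
    page_numbers = itertools.count(1) if max_pages is None else range(1, max_pages + 1)
    for i in page_numbers:
        pre_comment_length = len(comment)
        responses = fetch_page(i)
        for content in responses["data"]["replies"] or []:
            comment.append(content["content"]["message"])
        # Stop once a page contributes nothing new
        if len(comment) == pre_comment_length:
            break
    return comment


def fake_fetch(i):
    """Simulated endpoint: 45 comments served 20 per page, null once exhausted."""
    all_comments = [{"content": {"message": "m%d" % k}} for k in range(45)]
    chunk = all_comments[(i - 1) * 20 : i * 20]
    return {"data": {"replies": chunk or None}}


result = crawl_all(fake_fetch)
capped = crawl_all(fake_fetch, max_pages=2)
print(len(result), len(capped))  # prints: 45 40
```

With a real request, `fetch_page` could wrap the `requests.get(...).json()` call from the snippet earlier in the thread. Keeping an optional `max_pages` cap is a design choice so that an endpoint that never returns an empty page cannot cause an endless loop.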