
Comments (17)

Ciment0 commented on June 12, 2024

Yes, but it is about the same.
https://api.bilibili.com/x/v2/reply/main?csrf=b32c8adce0accfb6817b845e74674b08&mode=3&next=0&oid=379150378&plat=1&seek_rpid=&type=1

https://www.bilibili.com/video/BV1Bf4y1T7jH/?spm_id_from=333.999.0.0&vd_source=eed5f0b8d964ee5e9f5bb8134e2cdc11
This video has roughly 120 comments, but only 75 of them can be crawled.

from bilibili_comment_crawl.

Ciment0 commented on June 12, 2024

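1837669410 commented on June 12, 2024

Try increasing the loop count and changing the stopping strategy: compare the length of the comment list before and after each fetch. If the length changed, new data came in; if it did not, you have reached the end. As for why the crawl does not finish: in the original design I hard-coded 100 loop iterations, and the stopping strategy that requires every request to return exactly 20 comments is probably wrong. So try modifying the code the way I described.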

1837669410 commented on June 12, 2024

The main problem is the stopping strategy that requires every request to return exactly 20 comments.

1837669410 commented on June 12, 2024

import requests

# url, header, proxy, page_num and the comment list are defined earlier in the script.
for i in range(1, page_num):
    pre_comment_length = len(comment)
    responses = requests.get(url=url.format(i), headers=header, proxies=proxy).json()
    # "replies" may be null on a page with no comments, so fall back to an empty list.
    for content in responses["data"]["replies"] or []:
        comment.append(content["content"]["message"])
    print("Collected %d comments" % len(comment))
    # Exit strategy: stop when this page added nothing to the comment list.
    if len(comment) == pre_comment_length:
        print("Finished crawling comments!")
        break

1837669410 commented on June 12, 2024

That should work somewhat better. Please test it, and point out anything else that looks wrong.

1837669410 commented on June 12, 2024

I have updated the code and the documentation, so you can refer to those. If you find any more bugs, feel free to report them.

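Ciment0 commented on June 12, 2024

First, thank you for the patient answers. After asking my question I also looked into the comment JSON, and I think the code needs to crawl one level deeper. responses['data']['replies'][0-19]["content"]["message"] is the first-level comment text, but a large share of the content sits under those comments, so a nested loop is needed to crawl it. That data is at responses['data']['replies'][0-19]['replies'][0-19]["content"]["message"], and there is no way to know in advance whether a second level exists or how many replies it holds. Is there a better way to handle this?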
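
One way to handle the nested level, as a minimal sketch: treat a missing or null inner "replies" as an empty list, so the number of second-level replies never has to be known up front. The function name collect_messages and the sample payload are hypothetical; only the key layout (data, replies, content, message, with a nested replies) is taken from the paths quoted above.

```python
def collect_messages(payload):
    """Return first-level comment texts, each followed by its nested reply texts."""
    messages = []
    data = payload.get("data") or {}
    for top in data.get("replies") or []:
        messages.append(top["content"]["message"])
        # The nested "replies" may be absent, null, or empty; "or []" covers all three.
        for sub in top.get("replies") or []:
            messages.append(sub["content"]["message"])
    return messages


# Hypothetical payload that mirrors the key layout described above.
sample = {
    "data": {
        "replies": [
            {"content": {"message": "top 1"},
             "replies": [{"content": {"message": "sub 1a"}}]},
            {"content": {"message": "top 2"}, "replies": None},
        ]
    }
}
print(collect_messages(sample))
```

This only collects whatever nested replies the response itself carries, and it does not raise when a comment has none.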

1837669410 commented on June 12, 2024

Could I see the format of the JSON? I don't think I looked into that part.

Ciment0 commented on June 12, 2024

https://api.bilibili.com/x/v2/reply/main?csrf=40a227fcf12c380d7d3c81af2cd8c5e8&mode=3&next=0&oid=1082731&plat=1&type=1

You can try the data from this one: there are replies nested under replies, and some of them are not empty.

Ciment0 commented on June 12, 2024

I also tried adding it, but it did not work well, and an exception makes the script exit partway through. I can't figure it out @~@!

Ciment0 commented on June 12, 2024

One more idea: can the 200-loop limit be removed? When crawling a very large comment section, say 50,000 comments, the script loops only 200 times at 20 comments each, so it can collect at most 4,000 comments.
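
The arithmetic above is 200 requests × 20 comments per request = 4,000 comments at most. Below is a sketch of a crawl with no fixed loop count, assuming that a page which adds no comments marks the end. Here fetch_page is a hypothetical stand-in for the requests.get(...).json() call, fake_fetch is made-up test data, and max_pages is an optional safety cap added for illustration; none of these names come from the original script.

```python
from itertools import count


def crawl_all(fetch_page, max_pages=None):
    """Fetch pages 1, 2, 3, ... until a page adds no comments."""
    comments = []
    for page in count(1):
        if max_pages is not None and page > max_pages:
            break
        before = len(comments)
        data = fetch_page(page).get("data") or {}
        for item in data.get("replies") or []:
            comments.append(item["content"]["message"])
        # Same exit strategy as in the thread: stop when the list stops growing.
        if len(comments) == before:
            break
    return comments


def fake_fetch(page):
    """Hypothetical fetcher: 45 comments served 20 per page, then null."""
    total, size = 45, 20
    start = (page - 1) * size
    items = [{"content": {"message": "c%d" % n}}
             for n in range(start, min(start + size, total))]
    return {"data": {"replies": items or None}}


print(len(crawl_all(fake_fetch)))
```

Passing fetch_page in as a parameter keeps the stopping logic testable without network access; a real crawl would wrap the HTTP request in that function.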

1837669410 commented on June 12, 2024

This one doesn't seem to be the kind of video that uploaders (UP主) post, does it?

1837669410 commented on June 12, 2024

This tool only scrapes the comments themselves; it has no feature for scraping replies to comments.

Ciment0 commented on June 12, 2024

Got it.

Ciment0 commented on June 12, 2024

Then can the number 200 be changed so that the crawl stops only once everything has been fetched? -0-

1837669410 commented on June 12, 2024
