Comments (5)
If you're using a cookie, you should be able to get more than 50 pages.
from weibospider.
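Thanks for the reply. I do have a cookie set, and I've also lowered the crawl speed, but the crawl still stops on its own at roughly 50 pages.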
Could you provide an example user ID?
Sure!
User ID: 6244553417
Time range: 2022-1-1 to 2023-1-1. Result: about 1000 records (the stats below report 1065 items scraped), after which the run finished.
{'downloader/request_bytes': 709037,
'downloader/request_count': 645,
'downloader/request_method_count/GET': 645,
'downloader/response_bytes': 2531993,
'downloader/response_count': 645,
'downloader/response_status_count/200': 645,
'dupefilter/filtered': 3,
'elapsed_time_seconds': 786.770646,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2024, 2, 14, 3, 36, 23, 765561),
'httpcompression/response_bytes': 17931461,
'httpcompression/response_count': 641,
'item_scraped_count': 1065,
'log_count/DEBUG': 1712,
'log_count/INFO': 23,
'log_count/WARNING': 1,
'memusage/max': 74203136,
'memusage/startup': 66420736,
'request_depth_max': 54,
'response_received_count': 645,
'scheduler/dequeued': 645,
'scheduler/dequeued/memory': 645,
'scheduler/enqueued': 645,
'scheduler/enqueued/memory': 645,
'start_time': datetime.datetime(2024, 2, 14, 3, 23, 16, 994915)}
2024-02-14 04:36:23 [scrapy.core.engine] INFO: Spider closed (finished)
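Weibo itself limits the per-user, time-range search endpoint to roughly 500 posts, so if a user has more than 500 posts in the requested time range, the rest cannot be collected. Splitting the large time range into smaller ones (10 days each) and collecting them one after another solves this.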
With the latest code:
User ID: 6244553417
Time range: 2022-1-1 to 2023-1-1
9431 posts can be collected.
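A minimal sketch of the time-range splitting idea, in Python: it cuts one date range into consecutive 10-day windows, each of which would then be queried separately so that no single query hits the ~500-post cap. The function name `split_date_range` and the half-open [start, end) convention are assumptions for illustration, not weibospider's actual implementation; whether Weibo's endpoint treats the end date as inclusive is not confirmed here.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def split_date_range(start: date, end: date, days: int = 10) -> Iterator[Tuple[date, date]]:
    """Yield consecutive half-open windows [window_start, window_end) covering [start, end).

    Hypothetical helper: each window is meant to be sent as its own
    time-range query, keeping per-query results under the platform cap.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    for window_start, window_end in split_date_range(date(2022, 1, 1), date(2023, 1, 1)):
        print(window_start, window_end)
```

For 2022-1-1 to 2023-1-1 this yields 37 windows: 36 of 10 days plus a final 5-day window ending at 2023-1-1. Smaller windows mean more requests but less risk of any single window exceeding the cap.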