Comments (2)
Overriding the async def process_start_urls(self) method also gives the same result.
Output:
[2021:05:30 13:51:58] INFO Ruia Spider started!
[2021:05:30 13:51:58] INFO Ruia Worker started: 140647323043984
[2021:05:30 13:51:58] INFO Ruia Worker started: 140647323044160
[2021:05:30 13:51:58] INFO Request <GET: http://www.httpbin.org/>
[2021:05:30 13:51:58] INFO Request <POST: http://www.httpbin.org/post>
{
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "0",
    "Content-Type": "application/octet-stream",
    "Host": "www.httpbin.org",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36 Edg/91.0.864.37",
    "X-Amzn-Trace-Id": "Root=1-60b327ff-c1662cca36b7"
  },
  "json": null,
  "origin": "229.165.62.121",
  "url": "http://www.httpbin.org/post"
}
[2021:05:30 13:51:59] INFO Ruia Stopping spider: Ruia
[2021:05:30 13:51:59] INFO Ruia Total requests: 2
[2021:05:30 13:51:59] INFO Ruia Time usage: 0:00:00.897849
[2021:05:30 13:51:59] INFO Ruia Spider finished!
Process finished with exit code 0
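The body parameter values are missing from the request: httpbin echoes an empty "data" and "form", and Content-Length is "0".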
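You can confirm that no body reached the server by inspecting httpbin's echo. The helper below is a hypothetical sketch and is not part of Ruia. It assumes the response text is the JSON returned by http://www.httpbin.org/post. It treats the request as having carried a body only if at least one of the "data", "form", "json" or "files" fields is non-empty.

```python
import json


def body_was_sent(echo_text: str) -> bool:
    """Hypothetical helper: report whether httpbin's /post echo contains any request body."""
    echo = json.loads(echo_text)
    # httpbin reports the body in these fields; all are empty (or null) when nothing was sent
    return any(echo.get(field) for field in ("data", "form", "json", "files"))


# Trimmed copy of the echo printed in the log above
logged_echo = '{"args": {}, "data": "", "files": {}, "form": {}, "json": null}'
print(body_was_sent(logged_echo))  # False

# Expected shape of the echo once a form-encoded body of {"name": "jack"} is sent
fixed_echo = '{"args": {}, "data": "", "files": {}, "form": {"name": "jack"}, "json": null}'
print(body_was_sent(fixed_echo))  # True
```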
from ruia.
The usage is wrong. Change the code as follows:
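from ruia import Request, Spider

class ScrapedSpider(Spider):
    start_urls = ["http://www.httpbin.org/"]
    concurrency = 5

    async def parse(self, response):
        url = "http://www.httpbin.org/post"
        # Unpack with ** so "data" is passed on to aiohttp as the form-encoded POST body
        aiohttp_kwargs = {"data": {"name": "jack"}}
        yield Request(url=url, method="POST", callback=self.parse1, **aiohttp_kwargs)

    async def parse1(self, response):
        print(await response.text())  # httpbin should now echo "form": {"name": "jack"}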
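The fix depends on ordinary Python keyword-argument unpacking. The sketch assumes a hypothetical stand-in function whose signature ends in a **aiohttp_kwargs catch-all, similar to the extra keyword arguments Ruia's Request accepts. It is not Ruia's code and only returns what it would forward to the HTTP client. Unpacking the dict with ** turns "data" into its own keyword argument. Passing the dict under a single name instead leaves "data" nested one level down, where the HTTP client would not treat it as the body.

```python
def forwarded_kwargs(url, method="GET", *, callback=None, **aiohttp_kwargs):
    """Hypothetical stand-in: return the extra keyword arguments that would be forwarded."""
    return aiohttp_kwargs


body = {"data": {"name": "jack"}}

# Unpacked: "data" arrives as its own keyword argument
unpacked = forwarded_kwargs("http://www.httpbin.org/post", "POST", **body)

# Not unpacked: the whole dict is collected under a single key named "aiohttp_kwargs"
nested = forwarded_kwargs("http://www.httpbin.org/post", "POST", aiohttp_kwargs=body)

print(unpacked)  # {'data': {'name': 'jack'}}
print(nested)    # {'aiohttp_kwargs': {'data': {'name': 'jack'}}}
```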
As the figure below shows, the parameters are now passed successfully: