Comments (3)
Could you describe your problem? You only posted a title, so I really can't tell what you are trying to do.
from jd_spider.
Install scrapy-splash:
$ pip install scrapy-splash
Start Splash before crawling:
$ docker run -p 8050:8050 scrapinghub/splash
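Before running the spider it can help to confirm that the container is actually answering. The following is only a minimal sketch using the Python standard library; it assumes Splash is listening on http://127.0.0.1:8050 as in the command above, and the helper names `splash_ping_url` and `splash_is_up` are made up for illustration. It relies on the `_ping` endpoint of the Splash HTTP API.

```python
import urllib.request


def splash_ping_url(splash_url: str) -> str:
    """Return the URL of Splash's _ping health-check endpoint."""
    return splash_url.rstrip("/") + "/_ping"


def splash_is_up(splash_url: str = "http://127.0.0.1:8050",
                 timeout: float = 3.0) -> bool:
    """Return True if something answers HTTP 200 on the _ping endpoint."""
    try:
        with urllib.request.urlopen(splash_ping_url(splash_url),
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, timeouts and refused connections are all OSError subclasses.
        return False
```

Calling `splash_is_up()` from a Python shell should return True once the Docker container has finished starting; if it returns False, check the port mapping in the `docker run` command.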
In settings.py:
Add the Splash server address:
SPLASH_URL = 'http://127.0.0.1:8050'
Then enable the Splash middlewares, that is, change
DOWNLOADER_MIDDLEWARES = {
    ...
}
to the following:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
Also add SplashDeduplicateArgsMiddleware to SPIDER_MIDDLEWARES:
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
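Put together, the settings.py changes might look like the fragment below. This is a sketch, not the project's actual settings file. The last two settings are not mentioned in this comment; they are the Splash-aware duplicate filter and HTTP cache storage that the scrapy-splash README also recommends, included here as an assumption about what a complete setup needs.

```python
# settings.py (fragment)

# Address of the running Splash instance (the Docker container started earlier).
SPLASH_URL = "http://127.0.0.1:8050"

# Splash middlewares must run before HttpCompressionMiddleware,
# hence the lower priority numbers.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Assumption: not part of the comment above, taken from the scrapy-splash README.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

If the project already defines DOWNLOADER_MIDDLEWARES or SPIDER_MIDDLEWARES, merge these entries into the existing dictionaries rather than replacing them, otherwise the project's own middlewares are dropped.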
In spider.py:
Add the imports
import scrapy_splash
from scrapy_splash import SplashRequest
and change
yield Request(url, self.parse_result)
to
yield SplashRequest(url, self.parse_result,
    args={
        # optional; parameters passed to Splash HTTP API
        'wait': 0.5,
        # 'url' is prefilled from request url
        # 'http_method' is set to 'POST' for POST requests
        # 'body' is set to request body for POST requests
    },
    endpoint='render.json',  # optional; default is render.html
    splash_url='<url>',  # optional; overrides SPLASH_URL
    slot_policy=scrapy_splash.SlotPolicy.PER_DOMAIN,  # optional
)
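If the above is unclear, use this instead:
yield SplashRequest(url, self.parse_result,
    args={
        'wait': 0.5,  # increase this value if the result is incomplete or empty
    }
)
You can then see preliminary results at http://<your server's public IP>:8050 (use 127.0.0.1 if Splash is deployed on the local machine).
In simple cases the rest of the spider code can stay unchanged.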
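To see what Splash returns for a page without going through Scrapy at all, you can call its render.html endpoint directly. The sketch below only uses the standard library; `render_html_url` and `fetch_rendered` are illustrative names, not part of scrapy-splash, and the default address assumes a local Splash instance on port 8050.

```python
import urllib.request
from urllib.parse import urlencode


def render_html_url(page_url: str, wait: float = 0.5,
                    splash_url: str = "http://127.0.0.1:8050") -> str:
    """Build a render.html request URL; 'url' and 'wait' are Splash HTTP API arguments."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_url.rstrip('/')}/render.html?{query}"


def fetch_rendered(page_url: str, wait: float = 0.5,
                   splash_url: str = "http://127.0.0.1:8050") -> str:
    """Return the JavaScript-rendered HTML of page_url (needs a running Splash)."""
    with urllib.request.urlopen(render_html_url(page_url, wait, splash_url),
                                timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the HTML returned by `fetch_rendered` is missing the data you expect, try a larger `wait` here first; the value that works can then be used in the `args` of `SplashRequest`.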
from jd_spider.
Could you describe your problem? What are you using scrapy-splash for, what problem does it solve, and how does it relate to this project?
from jd_spider.