awolfly9 / ipproxytool
python ip proxy tool scrapy crawl. Crawls large numbers of free proxy IPs and extracts the valid ones for use.
License: MIT License
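I found a problem: once ipproxypool.py is running, it just keeps running, which is fine, but over time a large number of duplicate IPs show up in the table. Is this intentional for some reason, or is it a logic bug?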
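Whether the duplicates are intended cannot be told from this report, but one common way to keep a proxy table duplicate-free is to let the database enforce uniqueness and skip conflicting inserts. The following is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the project's MySQL storage; the `proxies` table, its columns and the `save_proxies` helper are hypothetical. In MySQL the rough equivalent is a `UNIQUE` key combined with `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database with a hypothetical proxies table.

    Uniqueness is enforced on the (ip, port) pair, so the same proxy
    can never be stored twice no matter how often a crawl repeats.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxies ("
        " ip TEXT NOT NULL,"
        " port INTEGER NOT NULL,"
        " source TEXT,"
        " UNIQUE (ip, port))"
    )
    return conn


def save_proxies(conn, proxies):
    """Insert (ip, port, source) tuples, silently skipping duplicates.

    Returns the number of rows that were actually inserted.
    """
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO proxies (ip, port, source) VALUES (?, ?, ?)",
            proxies,
        )
    return conn.total_changes - before
```

Saving the same batch twice inserts the new rows the first time and nothing the second time. Keying on `(ip, port)` rather than on `ip` alone keeps two proxies on the same host but different ports as separate rows; whether that is wanted is a design choice.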
Running the startup script ipproxytool.py throws an error.
[root@localhost IPProxyTool]# python3 ipproxytool.py
Traceback (most recent call last):
File "ipproxytool.py", line 8, in <module>
import run_validator_async
File "/media/sf_LinuxShare/IPProxyTool/run_validator_async.py", line 8, in <module>
import aiohttp
ModuleNotFoundError: No module named 'aiohttp'
Then, after installing aiohttp with pip, running it again gives another error:
[root@localhost IPProxyTool]# python3 ipproxytool.py
Traceback (most recent call last):
File "ipproxytool.py", line 30, in <module>
run_validator.validator()
File "/media/sf_LinuxShare/IPProxyTool/run_validator.py", line 33, in validator
module = import_module(path)
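File "/usr/local/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 978, in _gcd_import
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 936, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 978, in _gcd_import
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 936, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 978, in _gcd_import
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 945, in _find_and_load_unlocked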
ModuleNotFoundError: No module named 'ipproxytool.spiders'; 'ipproxytool' is not a package
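The last line says `ipproxytool` was found but is not a package, which is what Python reports when that name resolves to a plain module instead of a package directory. One likely cause: when a directory and a same-named `.py` file sit side by side, the directory only wins if it contains an `__init__.py`; otherwise the `ipproxytool.py` startup script itself is imported as `ipproxytool`. The following is a minimal diagnostic sketch built on the standard `importlib.util.find_spec`. Saved in the project root and run from there, it shows which file a name resolves to without importing it. The helper name `describe_module` is hypothetical.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where a top-level import name resolves on sys.path.

    Uses find_spec, so the module itself is not executed.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    # Packages carry a list of directories to search for submodules;
    # a plain single-file module has None here.
    kind = "package" if spec.submodule_search_locations is not None else "module"
    return f"{name}: {kind} at {spec.origin}"


if __name__ == "__main__":
    # Example: python3 check_import.py ipproxytool
    for arg in sys.argv[1:] or ["ipproxytool"]:
        print(describe_module(arg))
```

If the output says `ipproxytool: module at .../ipproxytool.py`, check whether `ipproxytool/__init__.py` (and the `__init__.py` files of its subpackages) made it into the checkout.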
Traceback (most recent call last):
File "run_crawl_proxy.py", line 6, in <module>
import scrapydo
ImportError: No module named scrapydo
[root@localhost IPProxyTool]# Traceback (most recent call last):
File "run_server.py", line 8, in <module>
from server import dataserver
File "/media/sf_LinuxShare/IPProxyTool/server/dataserver.py", line 9, in <module>
from sql import SqlManager
File "/media/sf_LinuxShare/IPProxyTool/sql/__init__.py", line 4, in <module>
from sql.mysql import MySql
File "/media/sf_LinuxShare/IPProxyTool/sql/mysql.py", line 6, in <module>
import pymysql
ImportError: No module named pymysql
I need to extract IPs soon, so I'd like to ask whether something is wrong with my installation steps.
I'm using CentOS 6.5, Python 3.6.1.
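Two of the tracebacks above end in `ImportError: No module named scrapydo` and `ImportError: No module named pymysql`, with no quotes around the name. That is the Python 2 wording; Python 3.6 prints `ModuleNotFoundError: No module named '...'`, as in the aiohttp error. This hints that some scripts may be started by a Python 2 `python` even though Python 3.6.1 is installed. The following is a minimal sketch that reports which interpreter is running and which of the modules named in this report's tracebacks it cannot find. The list is not the project's full requirements, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
import sys

# Third-party modules named in this report's tracebacks (not the full requirements list).
MODULES = ["scrapydo", "aiohttp", "pymysql", "pymongo"]


def missing_modules(names):
    """Return the names the *current* interpreter cannot find on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    missing = missing_modules(MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

Comparing `python --version` with `python3 --version` shows whether the two commands are different interpreters. Installing with `python3 -m pip install -r requirements.txt` keeps pip tied to the interpreter that actually runs the scripts.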
sudo apt-get install docker.io
...
sudo docker run -it --name=proxy proxy
Traceback (most recent call last):
File "ipproxytool.py", line 7, in <module>
import run_validator
File "/home/run_validator.py", line 11, in <module>
from ipproxytool.spiders.validator.douban import DoubanSpider
File "/home/ipproxytool/spiders/validator/douban.py", line 3, in <module>
from validator import Validator
File "/home/ipproxytool/spiders/validator/validator.py", line 10, in <module>
from sql import SqlManager
File "/home/sql/__init__.py", line 5, in <module>
from mongodb import Mongodb
File "/home/sql/mongodb.py", line 4, in <module>
import pymongo
ImportError: No module named pymongo
Has anyone tested this? What is the availability rate like?
Environment: macOS 10.12.4, Python 3.5. Running python ipproxytool.py throws an error.
The title is an ad for a "Kaiyuan Qipai" (online card game) cheat, and who knows what the content is.
Traceback (most recent call last):
File "ipproxytool.py", line 7, in <module>
import run_validator
File "/opt/software/IPProxyTool/run_validator.py", line 8, in <module>
import scrapydo
ImportError: No module named scrapydo
Where can I download this module?
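I see in the source that the server is started directly with Flask's run(), without any container (WSGI server) in front of it. Could this cause performance problems?
So far I haven't run into any problems. My program currently sends about 40 concurrent requests per second, but I'm worried about a performance bottleneck if that grows several-fold.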
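The most reliable way to answer this is to measure. The following is a minimal load-check sketch that uses only the standard library: it sends a number of GET requests to a URL from a thread pool and reports how many succeeded and the throughput. The URL, request count and concurrency are placeholders, and `fetch` and `load_test` are hypothetical helpers. Independently of the numbers, Flask's own documentation advises against using the built-in development server (`app.run()`) in production.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=5.0):
    """Return True if the URL answered with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # HTTPException covers malformed or truncated responses.
        return False


def load_test(url, total=200, concurrency=40):
    """Send `total` GET requests to `url` using `concurrency` worker threads.

    Returns (successful_requests, elapsed_seconds, requests_per_second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: fetch(url), range(total)))
    elapsed = time.perf_counter() - start
    return sum(results), elapsed, total / elapsed
```

For example, `load_test("http://127.0.0.1:8000/", total=400, concurrency=40)` targets the running server, where the address is a placeholder for wherever the project's Flask server listens. Repeating the run with larger `concurrency` values shows where throughput stops scaling.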
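First of all, thanks for the project!
When I open Lagou's test URL in a browser, it says I'm making requests too frequently. Is that because I tested too many non-elite (not high-anonymity) IPs, so Lagou flagged my own IP as suspicious?
Thanks!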
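Whether those tests exposed your own IP depends on each proxy's anonymity level. A transparent proxy passes your real address on to the target site, typically in a header such as `X-Forwarded-For`, so requests through it can still be tied to your IP. The following is a minimal sketch of a common grading heuristic. It works on what an echo endpoint such as httpbin's `/get` returns, whose JSON includes `origin` and `headers`. The header list, the three level names and `classify_anonymity` are assumptions for illustration, not necessarily how this project grades proxies.

```python
import re

# Headers that commonly reveal a proxy in the request chain (assumed list, lowercase).
PROXY_HEADERS = ("via", "x-forwarded-for", "x-real-ip", "forwarded", "proxy-connection")

IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def classify_anonymity(echo, real_ip):
    """Grade a proxy from an httpbin-style echo: {"origin": ..., "headers": {...}}.

    'transparent': your real IP appears in the origin or in any header.
    'anonymous'  : no real IP, but proxy-revealing headers are present.
    'elite'      : neither.
    """
    headers = {k.lower(): str(v) for k, v in echo.get("headers", {}).items()}
    # Compare whole IPv4 addresses, so 1.2.3.4 does not match 11.2.3.45.
    seen_ips = set(IPV4.findall(str(echo.get("origin", ""))))
    for value in headers.values():
        seen_ips.update(IPV4.findall(value))
    if real_ip in seen_ips:
        return "transparent"
    if any(name in headers for name in PROXY_HEADERS):
        return "anonymous"
    return "elite"
```

Only proxies graded `elite` by a check like this would keep the real address out of what the target site sees. Whether that explains Lagou's response cannot be confirmed from the report alone.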
TypeError: type object argument after ** must be a mapping, not NoneType
macOS
Python 3.6
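This error means some call unpacks a value with `**` while that value is `None`, for example an optional settings dict that was never filled in. The report does not show which line fails, so the following is only a minimal sketch that reproduces the error in isolation and shows a common guard. `Client` and `make_client` are hypothetical names.

```python
class Client:
    """Hypothetical class whose constructor takes keyword options."""

    def __init__(self, host="localhost", port=3306):
        self.host = host
        self.port = port


def make_client(options=None):
    # Client(**None) raises TypeError: "... argument after ** must be a
    # mapping, not NoneType" (exact wording varies by Python version).
    # Falling back to an empty dict avoids it and keeps the defaults.
    return Client(**(options or {}))
```

The same `**(value or {})` pattern works at whichever call site in the project turns out to receive `None`.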
Hi, I see this code in the spider, but why don't the log files end up in this directory? They all go under log instead.
def __init__(self):
    self.meta = {
        'download_timeout': self.timeout,
    }
    self.dir_log = 'log/proxy/%s' % self.name
    utils.make_dir(self.dir_log)
    self.sql.init_proxy_table(config.free_ipproxy_table)
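Two things are worth checking here. First, `'log/proxy/%s'` is a relative path, so it is resolved against the directory the crawler is started from, not against the spider's source file. Second, creating a directory does not by itself route any log output into it. In Scrapy, where log output goes is normally set by settings such as `LOG_FILE`, unless other code explicitly writes into `dir_log`. The following is a minimal sketch that anchors the directory to a fixed root. `make_log_dir` is a hypothetical stand-in for the project's `utils.make_dir`.

```python
import os


def make_log_dir(spider_name, root):
    """Create <root>/log/proxy/<spider_name> if needed and return its absolute path.

    `root` should be the project directory, for example (hypothetically)
    os.path.dirname(os.path.abspath(__file__)) in a top-level module, so
    the result no longer depends on the current working directory.
    """
    path = os.path.join(os.path.abspath(root), "log", "proxy", spider_name)
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path
```

Pointing the spider's log file at a path inside the returned directory would then make the log land where `dir_log` says.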
r = requests.get(url=self.urls[0], timeout=20)
data = json.loads(r.text)
During httpbin validation, the code above raises the error "the JSON object must be str, not 'bytes'".
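On Python 3.5 and earlier, `json.loads` accepts only `str`; support for `bytes` input was added in Python 3.6. Passing raw bytes, such as Scrapy's `response.body` or `requests`' `r.content`, produces exactly this message on those versions. The following is a minimal sketch of a helper that decodes before parsing, with the hypothetical name `load_json`. With Scrapy, `response.text` already returns a decoded string.

```python
import json


def load_json(payload, encoding="utf-8"):
    """Parse JSON from str or bytes on any Python 3 version."""
    if isinstance(payload, (bytes, bytearray)):
        payload = payload.decode(encoding)
    return json.loads(payload)
```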
As the title says.
After installing all the dependencies, running it throws an error. I've never used Python before; help wanted.
➜ IPProxyTool git:(master) ✗ python runspider.py
Traceback (most recent call last):
File "runspider.py", line 12, in <module>
from ipproxytool.spiders.proxy.xicidaili import XiCiDaiLiSpider
ImportError: No module named ipproxytool.spiders.proxy.xicidaili
pip3 install -r requirements.txt
ERROR: No matching distribution found for Twisted==20.3.0
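Some packages also fail to install with pip2. Could you take a look when you have time?
I set up a BandwagonHost VPS and just can't get it installed!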
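amazon.cn is often reachable through a proxy, but what comes back is an "enter the CAPTCHA" page. In cases like this, filtering proxies requires validating the response content as well.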
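The following is a minimal sketch of content-based validation. A proxy counts as usable only if the response is HTTP 200, contains the markers expected on the real page, and contains none of the markers of a block or CAPTCHA page. The marker strings and the helper `is_usable_response` are hypothetical placeholders; the real markers would have to be read off the actual amazon.cn pages.

```python
def is_usable_response(status, body, expected_markers, block_markers):
    """Return True only for a 200 response that looks like the real page."""
    if status != 200:
        return False
    text = body.lower()
    if any(marker.lower() in text for marker in block_markers):
        return False  # e.g. a CAPTCHA page that is still served with status 200
    return all(marker.lower() in text for marker in expected_markers)


# Hypothetical markers for illustration only.
BLOCK_MARKERS = ["captcha", "验证码"]
EXPECTED_MARKERS = ["<title>"]
```

Status-code checks alone accept such CAPTCHA pages, because they usually come back with 200. That is why the body has to be inspected too.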
Traceback (most recent call last):
File "ipproxytool.py", line 7, in <module>
import run_validator
File "/usr/local/IPProxyTool/IPProxyTool-master/run_validator.py", line 8, in <module>
import scrapydo
ImportError: No module named scrapydo
Originally posted by @projectmanagerment in #28 (comment)
Traceback (most recent call last):
File "//python/IPProxyTool/ipproxytool.py", line 8, in <module>
import run_validator_async
File "//python/IPProxyTool/run_validator_async.py", line 8, in <module>
import aiohttp
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/aiohttp/__init__.py", line 6, in <module>
from .client import * # noqa
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/aiohttp/client.py", line 18, in <module>
from . import client_exceptions, client_reqrep
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 17, in <module>
from . import hdrs, helpers, http, multipart, payload
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/aiohttp/helpers.py", line 166, in <module>
@attr.s(frozen=True, slots=True)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/aiohttp/helpers.py", line 168, in ProxyInfo
proxy = attr.ib(type=str)
TypeError: attr() got an unexpected keyword argument 'type'
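The failing frame is `attr.ib(type=str)` inside aiohttp. The `type` argument of `attr.ib()` first appeared in attrs 17.3.0, so this error usually means an older attrs is installed alongside a newer aiohttp. Upgrading attrs, for example with `python3 -m pip install -U attrs`, is the usual fix. The following is a minimal version-check sketch. `version_tuple` and `at_least` are hypothetical helpers, and the parsing deliberately ignores pre-release suffixes.

```python
import re


def version_tuple(version):
    """'17.2.0' -> (17, 2, 0); pads to three parts and ignores suffixes like 'rc1'."""
    parts = [int(p) for p in re.findall(r"\d+", version.split("+")[0])[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def at_least(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import attr
    except ImportError:
        print("attrs is not installed")
    else:
        installed = getattr(attr, "__version__", "0")
        verdict = "ok" if at_least(installed, "17.3.0") else "too old: attr.ib(type=...) unsupported"
        print("attrs", installed, verdict)
```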
2017-02-14 11:29:18 [10], msg:sql helper execute command:CREATE TABLE IF NOT EXISTS free_ipproxy (id INT(8) NOT NULL AUTO_INCREMENT,ip CHAR(25) NOT NULL UNIQUE,port INT(4) NOT NULL,country TEXT DEFAULT NULL,anonymity INT(2) DEFAULT NULL,https CHAR(4) DEFAULT NULL ,speed FLOAT DEFAULT NULL,source CHAR(20) DEFAULT NULL,save_time TIMESTAMP NOT NULL,PRIMARY KEY(id)) ENGINE=InnoDB
2017-02-14 11:29:19 [10], msg:*********run spider waiting...
When running the validator, the log shows that the spiders used to validate proxies (Baidu, Lagou, httpbin, etc.) all run into import problems. The log is as follows:
File "run_spider.py", line 33, in <module>
runspider(name)
File "run_spider.py", line 19, in runspider
process = CrawlerProcess(get_project_settings())
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 243, in __init__
super(CrawlerProcess, self).__init__(settings)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 134, in __init__
self.spider_loader = _get_spider_loader(settings)
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 330, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "/usr/local/lib/python2.7/dist-packages/scrapy/spiderloader.py", line 61, in from_settings
return cls(settings)
File "/usr/local/lib/python2.7/dist-packages/scrapy/spiderloader.py", line 25, in __init__
self._load_all_spiders()
File "/usr/local/lib/python2.7/dist-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
for module in walk_modules(name):
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 69, in walk_modules
mods += walk_modules(fullpath)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 71, in walk_modules
submod = import_module(fullpath)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/opt/IPProxyTool/ipproxytool/spiders/proxy/basespider.py", line 10, in <module>
from sql import SqlManager
File "/opt/IPProxyTool/sql/__init__.py", line 3, in <module>
from sql.sql import Sql
Runtime environment:
Ubuntu 16
python3
Installed everything in requirement.txt (and confirmed it was installed with pip3)
$ python ipproxytool.py
Traceback (most recent call last):
File "ipproxytool.py", line 7, in <module>
import run_validator
File "/Users/hucw/DevTools/IPProxyTool/run_validator.py", line 8, in <module>
import scrapydo
ImportError: No module named scrapydo
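I'm following this open-source project (https://github.com/jhao104/proxy_pool) and have already installed python3 and the SSDB database. Could SSDB be supported? I see this project uses MySQL, which is somewhat different.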