nsdown / ok_ip_proxy_pool

This project forked from cwjokaka/ok_ip_proxy_pool

🍿 Crawler proxy IP pool (proxy pool), Python 🍟 A reasonably OK IP proxy pool, built for my own use for now~

License: MIT License

Python 98.89% Dockerfile 1.11%

ok_ip_proxy_pool's Introduction

ok_ip_proxy_pool😁

A reasonably OK IP proxy pool, built for my own use for now~

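Runtime environment

  • python 3.7

Features

  • Asynchronous proxy crawling & validation 🚀
  • Proxy availability is tracked with a weight: +1 each time a proxy passes validation, -1 otherwise 🎭
  • Uses SQLite, so no database environment needs to be installed 🛴
  • Free proxy sources currently supported: 免费代理 (MianFeiDaiLi) / 全网 (QuanWang) / 66 / 西刺 (Xici) / 快代理 (KuaiDaiLi) / 云代理 (YunDaiLi) / IP海 (IpHai)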
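
The weighting idea in the feature list can be sketched with the standard sqlite3 module. This is only an illustration under assumptions: the table layout (an addr column and a weight column) and the helper names open_pool, record_result, weight_of and best_proxy are hypothetical and are not taken from the project's source.

```python
import sqlite3


def open_pool(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical proxy table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxy ("
        "addr TEXT PRIMARY KEY, weight INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def record_result(conn, addr, passed):
    """Add 1 to the proxy's weight if validation passed, subtract 1 otherwise."""
    delta = 1 if passed else -1
    # Make sure the row exists, then adjust its weight.
    conn.execute("INSERT OR IGNORE INTO proxy (addr, weight) VALUES (?, 0)", (addr,))
    conn.execute("UPDATE proxy SET weight = weight + ? WHERE addr = ?", (delta, addr))
    conn.commit()


def weight_of(conn, addr):
    """Return the current weight of a proxy, or None if it is unknown."""
    row = conn.execute("SELECT weight FROM proxy WHERE addr = ?", (addr,)).fetchone()
    return row[0] if row else None


def best_proxy(conn):
    """Return the address with the highest weight, or None if the pool is empty."""
    row = conn.execute(
        "SELECT addr FROM proxy ORDER BY weight DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None
```

With this layout, picking a proxy reduces to a single ORDER BY query, which is one plausible reason to keep the score in the database rather than in memory.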

Download & installation

  • Download the source:

    git clone git@github.com:cwjokaka/ok_ip_proxy_pool.git
    
  • Install the dependencies:

    pip install -r requirements.txt
    

Configuration file

# Proxy spider settings
SPIDER = {
    'crawl_interval': 60,       # interval between proxy crawls (seconds)
    'list': [                   # proxy spiders to use (class names)
        'Spider66Ip',
        'SpiderQuanWangIp',
        'SpiderXiciIp',
        'SpiderKuaiDaiLiIp',
        'SpiderYunDaiLiIp',
        'SpiderIpHaiIp',
        'SpiderMianFeiDaiLiIp'
    ]
}

# Validator settings
VALIDATOR = {
    'test_url': 'http://www.baidu.com',  # URL used for validation
    'request_timeout': 4,                # validation timeout
    'validate_interval': 30              # interval between validations (seconds)
}

# Database settings
DB = {
    'db_name': 'test.db',
    'table_name': 'proxy'
}

# Web server settings (Flask)
WEB_SERVER = {
    'host': 'localhost',
    'port': '8080'
}

# Request headers used by the spiders
HEADERS = {
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36",
}
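
As a rough sketch of how the request_timeout setting might be applied during asynchronous validation, the snippet below checks several proxies concurrently with asyncio and treats a timeout or error as a failed check. The function validate_all and the check coroutine it receives are hypothetical; the project's real validator is not shown in this README and may work differently.

```python
import asyncio

# Values copied from the VALIDATOR settings above.
VALIDATOR = {
    'test_url': 'http://www.baidu.com',
    'request_timeout': 4,
    'validate_interval': 30,
}


async def validate_all(proxies, check, timeout=VALIDATOR['request_timeout']):
    """Run check(proxy) for every proxy concurrently.

    check is a caller-supplied coroutine function (hypothetical here) that
    returns a truthy value when the proxy works. A timeout or any other
    exception counts as a failed validation.
    """
    async def validate_one(proxy):
        try:
            return bool(await asyncio.wait_for(check(proxy), timeout))
        except Exception:
            return False

    results = await asyncio.gather(*(validate_one(p) for p in proxies))
    return dict(zip(proxies, results))
```

A real check coroutine would request VALIDATOR['test_url'] through the proxy using an asynchronous HTTP client; that part is left out because the README does not say which client the project uses.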

Run

python main.py

API usage

API        method   description
/          GET      Home page introduction
/get       GET      Get one proxy
/get_all   GET      Get all proxies
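
A minimal client sketch for the endpoints above, using only the standard library. It assumes the server is running with the WEB_SERVER values from the configuration file (localhost, port 8080). The helper names endpoint_url and fetch are hypothetical, and because the README does not document the response format, the body is returned as raw text.

```python
from urllib.request import urlopen

# Values copied from the WEB_SERVER settings in the configuration file.
WEB_SERVER = {'host': 'localhost', 'port': '8080'}


def endpoint_url(path, host=WEB_SERVER['host'], port=WEB_SERVER['port']):
    """Build the full URL for an API path such as '/get' or '/get_all'."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def fetch(path, timeout=5):
    """Send a GET request to the given API path and return the body as text."""
    with urlopen(endpoint_url(path), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the pool running, fetch('/get') would return whatever the server sends for a single proxy, and fetch('/get_all') the full list.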

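Extending the proxy spiders

To add a custom proxy spider, follow these steps:

  1. Open src/spider/spiders.py
  2. Add your own spider class that inherits from AbsSpider and implements its do_crawl method. Note: requests must be made asynchronously
  3. Decorate the class with @spider_register
  4. Add the class name to SPIDER['list'] in the configuration file setting.py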

LAST

Forks, Stars, and Issues are all welcome 😘

