Coder Social home page Coder Social logo

gagan1510 / greendeck-proxygrabber Goto Github PK

View Code? Open in Web Editor NEW
20.0 3.0 5.0 81 KB

A python library for scraping/checking/fetching/storing proxies. ๐ŸŽญ

License: MIT License

Python 100.00%
proxies-http proxychecker https-proxies proxyservice mongodb proxy greendeck proxies proxies-generator proxies-scraper

greendeck-proxygrabber's Introduction

greendeck-proxygrabber ๐ŸŽญ

Gd Logo

This package is developed by Greendeck

Install from pip

https://pypi.org/project/greendeck-proxygrabber/

pip install greendeck-proxygrabber


WHATS NEW?

Added proxy grabbing support of 4 new regions to proxy service, proxy grabber and proxy scraper.


๐Ÿ‘‰ What is proxy service?

Proxy service is a service that keeps and updates a Mongo Database with latest up and running proxies.

๐Ÿ‘‰ How to use?

import the service class
from greendeck_proxygrabber import ProxyService
service = ProxyService(MONGO_URI = 'mongodb://127.0.0.1:27017',
                       update_time = 300,
                       pool_limit = 1000,
                       update_count = 200,
                       database_name = 'proxy_pool',
                       collection_name_http = 'http',
                       collection_name_https = 'https',
                       country_code = 'ALL'
                       )

This creates a service object.

Args
  • update_time = Time after which proxies will be updated (in seconds)
  • pool_limit = Limit after which insertion will change to updating
  • update_count = Number of proxies to request grabber at a time
  • database_name = Mongo Database name to store proxies in
  • collection_name_http = Collection name to store http proxies in
  • collection_name_https = Collection name to store https proxies in
  • country_code = ISO code of one of regions supported

List of supported regions is:

  • Combined Regions: ALL
  • United States: US
  • Germany: DE
  • Great Britain: GB
  • France: FR
  • Czech Republic: CZ
  • Netherlands: NL
  • India: IN

Starting the service

service.start()

Starting service gives the following output:

MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Service...

This will run forever and will push/update proxies in mongodb after every {update_time} seconds.

๐Ÿ‘‰ What is proxy to mongo?

Proxy to mongo is a functionality that lets you grab a set of valid proxies from the Internet and store it to the desired MongoDB database. You can schedule this to update or insert a given set of proxies to your database of pool, i.e. put it on airflow or any task scheduler.

๐Ÿ‘‰ How to use?

import the ProxyToMongo class
from greendeck_proxygrabber import ProxyService
service = ProxyToMongo( MONGO_URI = MONGO_URI,
                        pool_limit = 1000,
                        length_proxy = 200,
                        database_name='proxy_pool',
                        collection_name_http='http',
                        collection_name_https='https',
                        country_code='DE'
                        )

This creates a service object.

Args
  • pool_limit = Total number of proxies to keep in mongo/pass None if you don't want to update
  • length_proxy = Number of proxies to fetch at once
  • database_name = Mongo Database name to store proxies in
  • collection_name_http = Collection name to store http proxies in
  • collection_name_https = Collection name to store https proxies in
  • country_code = ISO code of one of regions supported

List of supported regions is:

  • Combined Regions: ALL
  • United States: US
  • Germany: DE
  • Great Britain: GB
  • France: FR
  • Czech Republic: CZ
  • Netherlands: NL
  • India: IN

Calling the ProxyToMongo grabber

service.get_quick_proxy()

Starting Grabber gives the following output:

MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Grabber...

This will run forever and will push/update proxies in mongodb after every {update_time} seconds.

๐Ÿ‘‰ How to use Proxy Grabber Class?

import ProxyGrabber class
from greendeck_proxygrabber import ProxyGrabber
initialize ProxyGrabber object
grabber = ProxyGrabber(len_proxy_list, country_code, timeout)

Here default values of some arguments are,

len_proxy_list = 10
country_code = 'ALL'
timeout = 2

Currently the program only supports proxies of combined regions

Getting checked, running proxies

The grab_proxy grab_proxy() function helps to fetch the proxies.

grabber.grab_proxy()

This returns a dictionary of the following structure:

{
    'https': [< list of https proxies >],
    'http': [< list of http proxies >],
    'region': 'ALL' # default for now
}
Getting an unchecked list of proxies

The grab_proxy proxy_scraper() method of ScrapeProxy helps to fetch the proxies. This returns a list of 200 proxies of both type http and https.

from greendeck_proxygrabber import ScrapeProxy
proxies_http, proxies_https = ScrapeProxy.proxy_scraper()

This returns list of proxies of type http proxies followed by https proxies.

http_proxies = [< list of http proxies >]
https_proxies = [< list of https proxies >]
Filtering invalid proxies from a list of proxies

The proxy_checker_https and proxy_checker_http methods from ProxyChecker class helps to validate the proxies.

Given a list of proxies, it checks each of them to be valid or not, and returns a list of valid proxies from the proxies feeded to it.

from greendeck_proxygrabber import ProxyChecker
valid_proxies_http = ProxyChecker.proxy_checker_http(proxy_list = proxy_list_http, timeout = 2)
valid_proxies_https = ProxyChecker.proxy_checker_https(proxy_list = proxy_list_https, timeout = 2)

๐Ÿ‘‰ How to build your own pip package

In the parent directory

  • python setup.py sdist bdist_wheel
  • twine upload dist/*

references

MADE WITH ๐Ÿ BY Gagan

greendeck-proxygrabber's People

Contributors

chandanmishra-03 avatar gagan1510 avatar yash-greendeck avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.