

PROXYUP


ProxyUP is a package for retrieving proxies from a public API. It transparently returns only valid proxies: it checks them at a fixed rate and verifies that each proxy answers before delivering it.

Installation

It can be installed through pip:

pip install proxyup

Basic Usage

A simple example that retrieves a single HTTP proxy:

from proxyup import ProxyupRetriever

proxies = ProxyupRetriever()
proxies.start()

try:
    proxy = proxies.get_once()
finally:
    proxies.close()

print(proxy)

'http://X.X.X.X:XXXX'

As soon as the retriever is instantiated, it begins to scrape proxies in the background at a fixed rate. Internally, it holds a list of valid proxies that are periodically checked and updated.

The get_once() method retrieves N proxies in a single call. By default, only one proxy is retrieved.

Using the ProxyupRetriever as a context manager is encouraged:

from proxyup import ProxyupRetriever

with ProxyupRetriever(proxy_type="http") as proxies:   # Valid proxy types=["http", "socks4", "socks5"]
    proxies_list = proxies.get_once(4) 

print(proxies_list)

['http://X.X.X.X:XXXX', 'http://X.X.X.X:XXXX', 'http://X.X.X.X:X', 'http://X.X.X.X:X']

All returned proxies have passed the control measures, which consist of the following rules:

  • They all had a server listening on the specified port.
  • They all answered with a 200 status code when https://www.google.com was requested through them.
  • They all were responsive in the last 60 seconds. This value can be modified through the check_interval_seconds parameter when instantiating the class.
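
The three rules above can be approximated with the standard library alone. The following is a minimal sketch, not ProxyUP's actual implementation: the function names port_is_open, answers_through_proxy and is_fresh are hypothetical, the timeouts are assumed values, and the urllib-based check only covers HTTP proxies (SOCKS proxies would need a third-party library).

```python
import http.client
import socket
import time
import urllib.request


def port_is_open(host, port, timeout=3.0):
    """Rule 1: a server is listening on the given host and port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def answers_through_proxy(proxy_url, target="https://www.google.com", timeout=5.0):
    """Rule 2: the target answers with a 200 status code through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(target, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection errors, timeouts and malformed replies all count as failures.
        return False


def is_fresh(last_ok_timestamp, check_interval_seconds=60, now=None):
    """Rule 3: the proxy answered within the last check_interval_seconds."""
    now = time.time() if now is None else now
    return (now - last_ok_timestamp) <= check_interval_seconds
```

A proxy would be considered valid only while all three checks hold, which is why a proxy that stops answering drops out of the results.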

Advanced usage

If desired, the retriever can be used as an infinite iterator that yields a fixed number of proxies on each iteration, as follows:

from proxyup import ProxyupRetriever

with ProxyupRetriever(proxy_type="http") as proxies:   # Valid proxy types=["http", "socks4", "socks5"]

    for proxies_list in proxies[4]:   # The index is the size of the list to retrieve in a single shot
        print(proxies_list)

['http://X.X.X.X:XXXX', 'http://X.X.X.X:XXXX', 'http://X.X.X.X:X', 'http://X.X.X.X:X']
['http://X.X.X.X:XXXX', 'http://X.X.X.X:XXXX', 'http://X.X.X.X:X', 'http://X.X.X.X:X']
['http://X.X.X.X:XXXX', 'http://X.X.X.X:XXXX', 'http://X.X.X.X:X', 'http://X.X.X.X:X']
...

This iterator runs forever, yielding valid proxies on each iteration. The proxies may be the same as, or different from, those of the previous iteration.
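
To illustrate how an index can turn into an endless stream of batches, here is a minimal sketch. FakeRetriever is a hypothetical stand-in with a fixed proxy list, not the real ProxyupRetriever, and the addresses are made up. It also shows how itertools.islice can bound an otherwise infinite loop.

```python
import itertools


class FakeRetriever:
    """Hypothetical stand-in; not the real ProxyupRetriever."""

    def __init__(self, valid_proxies):
        self._valid = list(valid_proxies)

    def get_once(self, count=1):
        # Return up to `count` proxies from the current valid list.
        return self._valid[:count]

    def __getitem__(self, count):
        # Indexing returns a generator that yields one batch per iteration, forever.
        def batches():
            while True:
                yield self.get_once(count)

        return batches()


proxies = FakeRetriever(
    ["http://10.0.0.1:8080", "http://10.0.0.2:3128", "http://10.0.0.3:80"]
)

# Take only the first three batches instead of looping forever.
first_three = list(itertools.islice(proxies[2], 3))
```

With the real retriever, a break statement inside the for loop achieves the same effect as islice.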

If a previously valid proxy is detected to be no longer valid, it will never be yielded again. The internal proxy list is updated every 120 seconds, a value that can be modified through the update_interval_seconds parameter.

A single update scrapes around 100-200 new proxies to include in the proxy list. Previous proxies are not removed unless they are detected to be no longer valid.

To keep the internal list from growing without bound, there is a limit on the number of proxies kept for checks. The default limit is 1000 proxies, and it can be modified through the proxy_cache_size parameter.
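
A bounded cache of this kind could look like the sketch below. ProxyCache and its methods are hypothetical names, not ProxyUP's internals. The behaviour when the cache is full is also an assumption: here newly scraped proxies are simply ignored until invalid ones are discarded.

```python
class ProxyCache:
    """Hypothetical bounded cache; not ProxyUP's internal data structure."""

    def __init__(self, proxy_cache_size=1000):
        self.proxy_cache_size = proxy_cache_size
        self._proxies = {}  # A dict keeps insertion order and ignores duplicates.

    def update(self, scraped):
        # Add newly scraped proxies until the cache is full (assumed policy).
        for proxy in scraped:
            if len(self._proxies) >= self.proxy_cache_size:
                break
            self._proxies.setdefault(proxy, True)

    def discard(self, proxy):
        # Called when a periodic check finds the proxy is no longer valid.
        self._proxies.pop(proxy, None)

    def __contains__(self, proxy):
        return proxy in self._proxies

    def __len__(self):
        return len(self._proxies)


cache = ProxyCache(proxy_cache_size=3)
cache.update(["a", "b", "b", "c", "d"])  # "d" does not fit
cache.discard("a")                       # frees one slot
cache.update(["d"])                      # now "d" is accepted
```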

Note that it is important to close the proxies object. Otherwise, its internal threads will not know when to finish and will keep running in the background forever, preventing the process from terminating.
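
The reason an unclosed object can keep a process alive is shown in the following minimal sketch. BackgroundWorker is a hypothetical class, not part of ProxyUP, and the use of threading.Event as a stop signal is an assumption about how such a worker could be built. Its non-daemon thread only exits once close() is called, which the context manager does automatically.

```python
import threading


class BackgroundWorker:
    """Hypothetical stand-in for an object that owns a background thread."""

    def __init__(self, interval_seconds=0.05):
        self._interval = interval_seconds
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self.ticks = 0

    def _run(self):
        # Event.wait returns True once close() sets the event, ending the loop.
        while not self._stop_event.wait(self._interval):
            self.ticks += 1  # Placeholder for periodic work such as checking proxies.

    def start(self):
        self._thread.start()

    def close(self):
        # Signal the thread to stop and wait until it has finished.
        self._stop_event.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
```

Leaving the with block calls close() even if an exception is raised, which is the same guarantee the try/finally pattern in Basic Usage provides.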

References

This package, as of version 0.0.1, uses the API from https://proxyscrape.com/ to scrape new proxies. Note that this backend might change in future releases of the package.
