
stefangabos / zebra_curl


A high-performance solution for making HTTP requests from your PHP projects. It allows running multiple requests concurrently and asynchronously, supports GET, POST, HEADER, PUT, PATCH, and DELETE requests, and offers support for caching, FTP downloads, HTTP authentication, and proxy requests.

Home Page: https://stefangabos.github.io/Zebra_cURL/Zebra_cURL/Zebra_cURL.html

License: Other

PHP 99.94% Batchfile 0.06%
php php-curl php-curl-library php-curl-async curl curl-functions curl-library

zebra_curl's Introduction


Zebra cURL



Zebra cURL is a high-performance cURL PHP library which not only allows running multiple asynchronous requests at once, but also lets finished threads be processed right away, without having to wait for the other threads in the queue to finish.

Also, each time a request is completed, another one is added to the queue, thus keeping a constant number of threads running at all times and eliminating the wasted CPU cycles of busy waiting. The result is a faster and more efficient way of processing large quantities of cURL requests (like fetching thousands of RSS feeds at once), drastically reducing processing time.

This script supports GET (with caching), POST, HEADER, PUT, PATCH and DELETE requests, basic downloads as well as downloads from FTP servers, HTTP Authentication, and requests made through proxy servers.

For maximum efficiency, downloads are streamed (bytes downloaded are written directly to disk), removing the unnecessary strain on the server of having to read files into memory first and then write them to disk.
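The underlying technique can be sketched with plain PHP cURL: with `CURLOPT_FILE` set, cURL writes the response body directly to an open file handle instead of buffering it in memory (the URL and file name below are placeholders):

```php
<?php

// placeholders: replace with a real URL and target path
$url  = 'https://example.com/large-file.bin';
$path = 'large-file.bin';

// open the target file for writing and hand it to cURL
$fp = fopen($path, 'w');
$ch = curl_init($url);

// write the response body straight to disk as bytes arrive,
// instead of returning it as a string
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);

curl_close($ch);
fclose($fp);
```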

The code is heavily commented and generates no warnings/errors/notices when PHP's error reporting level is set to E_ALL.

Features

  • supports GET (with caching), POST, HEADER, PUT, PATCH and DELETE requests, basic downloads as well as downloads from FTP servers, HTTP authentication, and requests through proxy servers
  • allows running multiple requests at once, asynchronously; as soon as one thread finishes, it can be processed right away without having to wait for the other threads in the queue to finish
  • downloads are streamed (bytes downloaded are written directly to disk), removing the unnecessary strain on the server of having to read files into memory first and then write them to disk
  • provides detailed information about the requests that were made
  • has awesome documentation
  • code is heavily commented and generates no warnings/errors/notices when PHP's error reporting level is set to E_ALL
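To make the feature list concrete, here is a minimal usage sketch assembled from the examples elsewhere on this page (the URLs are placeholders; the `threads` property is the one that appears in examples on this page to control concurrency):

```php
<?php

// sketch assembled from this page's examples; URLs are placeholders
require 'path/to/Zebra_cURL.php';

$curl = new Zebra_cURL();

// number of requests to process at once
$curl->threads = 10;

// each finished request is passed to the callback as soon as it completes,
// without waiting for the rest of the queue
$curl->get(array(
    'https://example.com/page-1',
    'https://example.com/page-2',
    'https://example.com/page-3',
), function ($result) {
    if ($result->response[1] == CURLE_OK && $result->info['http_code'] == 200) {
        echo $result->info['url'] . ' fetched (' . strlen($result->body) . " bytes)\n";
    }
});
```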

Documentation

Check out the awesome documentation!

Support the development of this project

Your support is greatly appreciated and keeps me motivated to continue working on open source projects. If you enjoy this project, please star it by clicking the star button at the top of the page. If you're feeling generous, you can also buy me a coffee through PayPal or become a sponsor. Thank you for your support!

Star it on GitHub Donate

Requirements

PHP 5.3.0+ with the cURL extension enabled.

Installation

You can install it via Composer

# get the latest stable release
composer require stefangabos/zebra_curl

# get the latest commit
composer require stefangabos/zebra_curl:dev-master

Or you can install it manually by downloading the latest version, unpacking it, and then including it in your project

require_once 'path/to/Zebra_cURL.php';

How to use

Scrape a page

<?php

// include the library
// (you don't need this if you installed the library via Composer)
require 'path/to/Zebra_cURL.php';

// instantiate the Zebra cURL class
$curl = new Zebra_cURL();

// cache results 3600 seconds
$curl->cache('path/to/cache', 3600);

// since we are communicating over HTTPS, we load the CA bundle from the examples folder,
// so we don't get CURLE_SSL_CACERT response from cURL
// you can always update this bundle from https://curl.se/docs/caextract.html
$curl->ssl(true, 2, __DIR__ . '/cacert.pem');

// a simple way of scraping a page
// (you can do more with the "get" method and callback functions)
echo $curl->scrap('https://github.com/', true);

Fetch RSS feeds

<?php

// include the library
// (you don't need this if you installed the library via Composer)
require 'path/to/Zebra_cURL.php';

// instantiate the Zebra cURL class
$curl = new Zebra_cURL();

// cache results 3600 seconds
$curl->cache('path/to/cache', 3600);

// since we are communicating over HTTPS, we load the CA bundle from the examples folder,
// so we don't get CURLE_SSL_CACERT response from cURL
// you can always update this bundle from https://curl.se/docs/caextract.html
$curl->ssl(true, 2, __DIR__ . '/cacert.pem');

$feeds = array(
    'https://rss1.smashingmagazine.com/feed/'       =>  'Smashing Magazine',
    'https://feeds.feedburner.com/nettuts'          =>  'TutsPlus',
    'https://feeds.feedburner.com/alistapart/main'   =>  'A List Apart',
);

// get RSS feeds of some popular tech websites
$curl->get(array_keys($feeds), function($result) use ($feeds) {

    // everything went well at cURL level
    if ($result->response[1] == CURLE_OK) {

        // if server responded with code 200 (meaning that everything went well)
        // see https://httpstatus.es/ for a list of possible response codes
        if ($result->info['http_code'] == 200) {

            // the content is an XML, process it
            $xml = simplexml_load_string($result->body);

            // different types of RSS feeds...
            if (isset($xml->channel->item))

                // show title and date for each entry
                foreach ($xml->channel->item as $entry) {
                    echo '<h6>' . $feeds[$result->info['original_url']] . '</h6>';
                    echo '<h2><a href="' . $entry->link . '">' . $entry->title . '</a></h2>';
                    echo '<p><small>' . $entry->pubDate . '</small></p>';
                    echo '<p>' . substr(strip_tags($entry->description), 0, 500) . '</p><hr>';
                }

            // different types of RSS feeds...
            else

                // show title and date for each entry
                foreach ($xml->entry as $entry) {
                    echo '<h6>' . $feeds[$result->info['original_url']] . '</h6>';
                    echo '<h2><a href="' . $entry->link['href'] . '">' . $entry->title . '</a></h2>';
                    echo '<p><small>' . $entry->updated . '</small></p>';
                    echo '<p>' . substr(strip_tags($entry->content), 0, 500) . '</p><hr>';
                }

        // show the server's response code
        } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

    // something went wrong
    // ($result still contains all data that could be gathered)
    } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

});

Use custom HTTP headers

// include the library
// (you don't need this if you installed the library via Composer)
require 'path/to/Zebra_cURL.php';

// instantiate the Zebra cURL class
$curl = new Zebra_cURL;

// since we are communicating over HTTPS, we load the CA bundle from the examples folder,
// so we don't get CURLE_SSL_CACERT response from cURL
// you can always update this bundle from https://curl.se/docs/caextract.html
$curl->ssl(true, 2, __DIR__ . '/cacert.pem');

// set custom HTTP headers
$curl->option(CURLOPT_HTTPHEADER, [
    'accept: application/json',
    'X-Token-Foo-Bar: ABC123'   // Pass keys to APIs, for example
]);

echo $curl->scrap('https://httpbin.org/get') . PHP_EOL;

Download an image

<?php

// include the library
// (you don't need this if you installed the library via Composer)
require 'path/to/Zebra_cURL.php';

// instantiate the Zebra cURL class
$curl = new Zebra_cURL();

// since we are communicating over HTTPS, we load the CA bundle from the examples folder,
// so we don't get CURLE_SSL_CACERT response from cURL
// you can always update this bundle from https://curl.se/docs/caextract.html
$curl->ssl(true, 2, __DIR__ . '/cacert.pem');

// download one of the official Twitter images
$curl->download('https://abs.twimg.com/a/1362101114/images/resources/twitter-bird-callout.png', 'cache');
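download() is also used elsewhere on this page with an array of URLs and a callback, so a sketch of a batch download with per-file error checking might look like this (the URLs and target folder are placeholders):

```php
<?php

require 'path/to/Zebra_cURL.php';

$curl = new Zebra_cURL();
$curl->ssl(true, 2, __DIR__ . '/cacert.pem');

// placeholders: a list of files to fetch and the folder to stream them into
$curl->download(array(
    'https://example.com/image-1.png',
    'https://example.com/image-2.png',
), 'path/to/downloads', function ($result) {
    // report any download that failed at the cURL level
    if ($result->response[1] != CURLE_OK) {
        echo 'could not download ' . $result->info['url'] . "\n";
    }
});
```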

zebra_curl's People

Contributors

akeinhell, apmolsa, codelingobot, heimpogo, irazasyed, proclnas, stefangabos


zebra_curl's Issues

header Problem

I have an array in the header like this:
Array(
    "arrayA" => "value1",
    "arrayA" => "value2",
    "arrayB" => "value"
)
But it only shows like this:
Array(
    "ArrayA" => "value2",
    "ArrayB" => "value"
)
I want to show all values in arrayA.

[Feature request] Fallback to sequential downloading if cURL is not available ?

Hi,

What do you think about falling back to sequential downloading (using file_get_contents, for example) if the cURL PHP extension is not available?

I don't think this would be a big overhead or require many lines of code, and it would provide a library that works on every system, without failing.

Thanks

Non-blocking asynchronous requests?

Can I use Zebra requests in a non-blocking way such that I can run other code while the requests are processing?
I believe it's possible to do with PHP but does Zebra allow this? The callback function option will probably have to be discarded in this case.

How to use custom HTTP headers?

I am trying to use custom HTTP headers. I am using the same data from a plain cURL connection that works with the application token, but here in Zebra it returns 'Token in White'. I tried to find something about it in the documentation but didn't find anything. Could you help me with that?

Custom header for each request

Hi,

I am doing multiple requests using Zebra Curl (using 10 threads)

Is it possible to set a different header for each thread? Or is it only possible to set the same header for all 10 threads?

I would need to change the cookie in each request.

Thank you

Array key checking

In current stable version

Warning: Undefined array key 92 in vendor\stefangabos\zebra_curl\Zebra_cURL.php on line 2735
---
// get CURLs response code and associated message
$result->response = array($this->_response_messages[$info['result']], $info['result']);

HTTP Retry?

I've been having issues with downstream vendors failing intermittently (500 server error) or I'll exceed API rate limits. How do I send my existing request (get/post) back to the queue to be processed? For something like exceeding API rate limits, I'll want to review the response headers as many vendors let you know when you can make your next request, sleep until then, and process the request again. I would like to have a max of 3 retries before calling it quits, so I need a way to keep track of how many attempts a given request has performed.

Here's an example of a call I'm making. Please check for the comment //Try to process request again here!!

function createSMSCampaign($obj,$resp){
  $obj->authorization_key = getToken("comm_engine_prod");
  $obj->url = "https://xxxx/campaigns";
  $resp->errors = array();
  $resp->campaigns_created = 0;
  $url_data_requests = array();
  $resp->completed_requests = 0;
  $resp->request_count = 0;
  //See if there are any English campaigns
  $url_data_requests = array_merge(getSMSCampaignData($obj,"English"),$url_data_requests);
  //See if there are any Spanish campaigns
  $url_data_requests = array_merge(getSMSCampaignData($obj,"Spanish"),$url_data_requests);
  
  $curl = new Zebra_cURL();
  $curl->ssl(true, 2, $GLOBALS['ca_cert']);
  $curl->option(array());
  $curl->option(CURLOPT_HTTPHEADER, ['Authorization: '. $obj->authorization_key,'content-type: application/json']);
  $curl->threads = 5;
  $resp->url_data_requests = $url_data_requests;
  $resp->responses = array();
  $curl->post($url_data_requests, function($result) use ($resp){
    $resp->completed_requests++;
    // everything went well at cURL level
    if ($result->response[1] == CURLE_OK) {
      array_push($resp->responses,json_decode($result->body));
      if ($result->info['http_code'] == 200) {
        // see all the returned data
        $response_body = json_decode($result->body);
        if(property_exists($response_body,'status') && $response_body->status == "SUCCESS"){
          $resp->campaigns_created++;
        }else{
          $postData = json_decode($result->post);
          array_push($resp->errors,$postData->language . ' campaign "' .(property_exists($postData, "campaign_name") ? $postData->campaign_name : ""). '" returned the following error: ' . $result->body);
          $resp->success = 0;
        }
  
      // show the server's response code
      } else {
        
        //Try to process request again here!!

        array_push($resp->errors,'"'.$result->info['url'] .'" responded with code ' . $result->info['http_code']);
        array_push($resp->errors,$result->body);
        $resp->success = 0;
      }
  
    // something went wrong
    } else {  
      array_push($resp->errors,'cURL responded with: ' . $result->response[0]);
      $resp->success = 0;
    }
  
  });
  
  return $resp;
}

URLs in POST are the key ($url), not the value ($value)

Line 1683, in the POST foreach where you are building the _requests property, you reference 'url' => $url which is the key of the foreach... a quick change to $values grabs the right thing. I'm using version 1.3.4. Thanks for the awesome work on zc... I use it extensively and love it!

File name too long or not writable

So I'm having an issue saving an image with a long name. If I do:

$curl->download('<redacted>/MlOAemd1rvXUnph3GjWS8ArwfqWzdkaRRYYcAigXz8fGKyd54v4zzvv3mXG1GlmZhV8x11YjsoRnHJVi6-QoBdGio5Ji-ulo4ozZ6UL9FPCU30FFvDFtFEiJEoeYxoMiE5XlIO1OSvTMpVlKiX6XK4feSFQ58hpf91h1RXXONk-KxUQzuLsyjR5pPJzOoLURB_nCSzgxjRTQEz6JJoJ6lMssWTKG6BqqCLZWtSmN-xsmfGBti2-awaFfv4kdUqHxx4jmbJ05z5hiQ1hm8084lSG6yQ4cBUruigdS9QSjJZjhP-56XeWO19dBZmVmRMchwO0VujgtCido7GPsJI2D1Q.jpg', 'data');

I get this error:

PHP Warning:  fopen(data/MlOAemd1rvXUnph3GjWS8ArwfqWzdkaRRYYcAigXz8fGKyd54v4zzvv3mXG1GlmZhV8x11YjsoRnHJVi6-QoBdGio5Ji-ulo4ozZ6UL9FPCU30FFvDFtFEiJEoeYxoMiE5XlIO1OSvTMpVlKiX6XK4feSFQ58hpf91h1RXXONk-KxUQzuLsyjR5pPJzOoLURB_nCSzgxjRTQEz6JJoJ6lMssWTKG6BqqCLZWtSmN-xsmfGBti2-awaFfv4kdUqHxx4jmbJ05z5hiQ1hm8084lSG6yQ4cBUruigdS9QSjJZjhP-56XeWO19dBZmVmRMchwO0VujgtCido7GPsJI2D1Q.jpg): failed to open stream: File name too long in /vendor/stefangabos/zebra_curl/Zebra_cURL.php on line 2763
PHP Warning:  curl_setopt_array(): supplied argument is not a valid File-Handle resource in /vendor/stefangabos/zebra_curl/Zebra_cURL.php on line 2791

but if I do:

$curl->download('<redacted>/MlOAemd1rvXUnph3GjWS8ArwfqWzdkaRRYYcAigXz8fGKyd54v4zzvv3mXG1GlmZhV8x11YjsoRnHJVi6-QoBdGio5Ji-ulo4ozZ6UL9FPCU30FFvDFtFEiJEoeYxoMiE5XlIO1OSvTMpVlKiX6XK4feSFQ58hpf91h1RXXONk-KxUQzuLsyjR5pPJzOoLURB_nCSzgxjRTQEz6JJoJ6lMssWTKG6BqqCLZWtSmN-xsmfGBti2-awaFfv4kdUqHxx4jmbJ05z5hiQ1hm8084lSG6yQ4cBUruigdS9QSjJZjhP-56XeWO19dBZmVmRMchwO0VujgtCido7GPsJI2D1Q.jpg', __DIR__ . '/data/temp.jpg');

I get this error

PHP Fatal error:  "/data/temp.jpg" is not a valid path or is not writable in /vendor/stefangabos/zebra_curl/Zebra_cURL.php on line 760

Is there any known fix for this?

sure async?

CURLM_CALL_MULTI_PERFORM is deprecated,

are you sure that your library is really async?

multiple callback handles not working

//have a second callback - zebra never sends the request from the queue
function mycallback_final($result) {

    // everything went well at cURL level
    if ($result->response[1] == CURLE_OK) {

        // if server responded with code 200 (meaning that everything went well)
        // see http://httpstatus.es/ for a list of possible response codes
        if ($result->info['http_code'] == 200) {

        // see all the returned data
        print_r('<pre>');
        print_r($result);

        // show the server's response code
        } else die('Server responded with code ' . $result->info['http_code']);

    // something went wrong
    // ($result still contains all data that could be gathered)
    } else die('cURL responded with: ' . $result->response[0]);

}

// the callback function to be executed for each and every
// request, as soon as a request finishes
// the callback function receives as argument an object with 4 properties
// (info, header, body and response)
function mycallback($result) {

    // everything went well at cURL level
    if ($result->response[1] == CURLE_OK && $result->info['http_code'] == 200) {

        // see all the returned data
        print_r('<pre>');
        print_r($result);

        //good results
        //get some data from results
        //go to a different page with different callback
        //trying to add to the queue in a callback fails (zebra never sends the request)
        $zebra->queue();
        $zebra->get($options, 'mycallback_final');

        //$zebra->start(); //<~~~ trying to do this locks execution

    // something went wrong
    // ($result still contains all data that could be gathered)
    } else {

        //trying to add to the queue in a callback fails (zebra never sends the request)
        $zebra->queue();
        $zebra->post($options , 'mycallback');
        //$zebra->start(); //<~~~ trying to do this locks execution

    }

}

//the initial loop works great (until we try to resend the request - see mycallback)
$zebra->queue();

//loop to get a bunch of urls
foreach ($entry as $item) {
    $zebra->post($options, 'mycallback');
} //end loop

$zebra->start();

Setting max load while executing a lot of get requests

Hi

I have a question. I want to use this library to send a lot of GET requests to a website, for cache warming. But I want to monitor the system load via sys_getloadavg, so it will not go higher than 10%.

Where can I add this in the code, _process()?

Best regards
Dennis

include zebra_cURL in class

I need to use Zebra_cURL in another class

    public function __construct($url, $username, $password)
    {
        $this->url = $url;
        $this->cookie = __DIR__ . DIRECTORY_SEPARATOR . "cookie.txt";
        require_once('../includes/Zebra_cURL.php');
	$this->curl = new Zebra_cURL();
    }

But I have this error:

Fatal error: Uncaught Error: Class '....\Zebra_cURL' not found in
How can I use that?

Callback function on CodeIgniter

Hello,
I am working with Zebra and CodeIgniter but i have a trouble when defining the callback function.
I've added the function within the controller and also in the file Zebra_cURL.php...
Any ideas how to define the callback function for zebra on CodeIgniter?
Thank you

different proxy for different URLs

Hello,

is it possible to use different proxies for different URLs? Set them up while queuing them?

I tried the options() method but I get cURL error 35.

'options' => array(
    CURLOPT_USERAGENT      => 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36',
    CURLOPT_SSL_VERIFYPEER => 0,
    CURLOPT_SSL_VERIFYHOST => 0,
    CURLOPT_PROXYTYPE      => $c['PROXY_TYPE'],
    CURLOPT_PROXY          => $c['PROXY_HOST'],
    CURLOPT_PROXYPORT      => $c['PROXY_PORT'],
    CURLOPT_PROXYUSERPWD   => $c['PROXY_USER'] . ':' . $c['PROXY_PASS'],
),

I also tried setting the proxy to false and re-setting it again.

$curl->proxy(false);
$curl->proxy($c["PROXY_HOST"], $c["PROXY_PORT"], $c["PROXY_USER"], $c["PROXY_PASS"]);

in this case, requests are always made from the same proxy.

Callback extension

Is there any way to supply a variable to each request so that it can be easily retrieved in the callback function? This would allow differentiation between requests.

For example: fetching data from a database to customise each request, and then adding the ID of each fetched DB row to the request. Retrieving that ID in the callback would allow me to delete the corresponding row in the database if the request returned a successful response, and keep the rows whose requests failed for a later attempt.

I think this would be a great feature and would make this the best cURL library, on top of the qualities it already has.

Thanks in advance.

Append requests to a running process

Hello,

Using this class in a process where URLs are created every second and their responses can take more than 1 second... what do you think would be the 'good way' to append more URLs to a running process and still guarantee that the threads won't exceed the configured number?

Thanks in advance!

Feature request: user own parameter

You have made a great library, but I need an improvement.

Feature request: Add user own parameter for each call get(), download(), or post().

So, the code maybe will look like:

$curl->get('the_url', 'the_callback', array('user_parameter'=>'used for callback'));

More notes:
I need a different additional parameter for each of the URLs in the array.

If it's ready by the next day, I will use this library.

This might be more of a Potential Feature Request than an Issue.

Got this code from StackOverflow https://stackoverflow.com/questions/58570054/what-is-correct-example-to-use-multi-curl and the poster said it's more performant.
I don't know if yours already has similar features / points, or if I am just mistaken.
Kindly help review the poster's points (it is the accepted answer) so we know if we could borrow some code to make ours better.

Thanks

<?php
/**
 * fetch all urls in parallel,
 * warning: all urls must be unique..
 *
 * @param array $urls_unique urls to fetch
 * @param int $max_connections (optional, default 100) max simultaneous connections
 *        (some websites will auto-ban you for "ddosing" if you send too many requests simultaneously,
 *        and some wifi routers will get unstable on too many connections..)
 * @param array $additional_curlopts (optional) set additional curl options here, each curl handle will get these options
 * @throws RuntimeException on curl_multi errors
 * @throws RuntimeException on curl_init() / curl_setopt() errors
 * @return array(url=>response,url2=>response2,...)
 */
function curl_fetch_multi_2(array $urls_unique, int $max_connections = 100, array $additional_curlopts = null)
{
    // $urls_unique = array_unique($urls_unique);
    $ret = array();
    $mh = curl_multi_init();
    // $workers format: [(int)$ch]=url
    $workers = array();
    $max_connections = min($max_connections, count($urls_unique));
    $unemployed_workers = array();
    for ($i = 0; $i < $max_connections; ++ $i) {
    $unemployed_worker = curl_init();
    if (! $unemployed_worker) {
    throw new \RuntimeException("failed creating unemployed worker #" . $i);
    }
    $unemployed_workers[] = $unemployed_worker;
    }
    unset($i, $unemployed_worker);

    $work = function () use (&$workers, &$unemployed_workers, &$mh, &$ret): void {
    assert(count($workers) > 0, "work() called with 0 workers!!");
    $still_running = null;
    for (;;) {
    do {
    $err = curl_multi_exec($mh, $still_running);
    } while ($err === CURLM_CALL_MULTI_PERFORM);
    if ($err !== CURLM_OK) {
    $errinfo = [
    "multi_exec_return" => $err,
    "curl_multi_errno" => curl_multi_errno($mh),
    "curl_multi_strerror" => curl_multi_strerror($err)
    ];
    $errstr = "curl_multi_exec error: " . str_replace([
    "\r",
    "\n"
    ], "", var_export($errinfo, true));
    throw new \RuntimeException($errstr);
    }
    if ($still_running < count($workers)) {
    // some workers has finished downloading, process them
    // echo "processing!";
    break;
    } else {
    // no workers finished yet, sleep-wait for workers to finish downloading.
    // echo "select()ing!";
    curl_multi_select($mh, 1);
    // sleep(1);
    }
    }
    while (false !== ($info = curl_multi_info_read($mh))) {
    if ($info['msg'] !== CURLMSG_DONE) {
    // no idea what this is, it's not the message we're looking for though, ignore it.
    continue;
    }
    if ($info['result'] !== CURLM_OK) {
    $errinfo = [
    "effective_url" => curl_getinfo($info['handle'], CURLINFO_EFFECTIVE_URL),
    "curl_errno" => curl_errno($info['handle']),
    "curl_error" => curl_error($info['handle']),
    "curl_multi_errno" => curl_multi_errno($mh),
    "curl_multi_strerror" => curl_multi_strerror(curl_multi_errno($mh))
    ];
    $errstr = "curl_multi worker error: " . str_replace([
    "\r",
    "\n"
    ], "", var_export($errinfo, true));
    throw new \RuntimeException($errstr);
    }
    $ch = $info['handle'];
    $ch_index = (int) $ch;
    $url = $workers[$ch_index];
    $ret[$url] = curl_multi_getcontent($ch);
    unset($workers[$ch_index]);
    curl_multi_remove_handle($mh, $ch);
    $unemployed_workers[] = $ch;
    }
    };
    $opts = array(
    CURLOPT_URL => '',
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_ENCODING => ''
    );
    if (! empty($additional_curlopts)) {
    // i would have used array_merge(), but it does scary stuff with integer keys.. foreach() is easier to reason about
    foreach ($additional_curlopts as $key => $val) {
    $opts[$key] = $val;
    }
    }
    foreach ($urls_unique as $url) {
    while (empty($unemployed_workers)) {
    $work();
    }
    $new_worker = array_pop($unemployed_workers);
    $opts[CURLOPT_URL] = $url;
    if (! curl_setopt_array($new_worker, $opts)) {
    $errstr = "curl_setopt_array failed: " . curl_errno($new_worker) . ": " . curl_error($new_worker) . " " . var_export($opts, true);
    throw new RuntimeException($errstr);
    }
    $workers[(int) $new_worker] = $url;
    curl_multi_add_handle($mh, $new_worker);
    }
    while (count($workers) > 0) {
    $work();
    }
    foreach ($unemployed_workers as $unemployed_worker) {
    curl_close($unemployed_worker);
    }
    curl_multi_close($mh);
    return $ret;
}

Broken images

I used your library to download images from internet and have two problems:

1. I parsed an HTML string to extract image URLs; I save those extracted URLs in an array and then pass it to Zebra_cURL like:

$curl = new Zebra_cURL();
$curl->download( $urls,$save_to, 'callback');

Sometimes the image content is only partially downloaded.

2. I am not able to download images if the image URLs come with ?itok=FpNAAWYx, like:

http://www.ansar-allah.net/sites/default/files/public/styles/node_image_full/public/11224-1443794907.jpg?itok=FpNAAWYx

Do I need to set some header option to solve these problems?

Thanks

all requests not sent

I currently use this library to submit requests to a server. I currently submit about 200 requests at once, but only around 140 are really submitted; I do not know where the rest of the requests go.

PS: I submit an array of about 200 URLs. Could this create the issue, or is there any limit on the number of requests that can be put in the queue? I own the server and there is no memory limit problem (1024 MB) that could truncate my array.

I plan to submit thousands of URLs soon, but it seems this class won't allow me to achieve this.

Best regards
Williem

Retries Before Failed Using Different Proxies

I have a list of proxies retrieved from anonymous proxy providers.
There will almost always be bad proxies in it.
I wonder if I can retry a GET request using different proxies before considering it failed.

Reusing a Zebra cURL object in the callback triggers timeout

Reproduction steps

Run the following minimal example:

<?php
require_once 'third_party/Zebra_cURL.php';

$curl = new Zebra_cURL();
$curl->get('http://www.example.com', function () use ($curl) {
    echo "Call 1 finished.\n";
    $curl->get('http://www.example.net', function () {
        echo "Call 2 finished.\n";
    });
});

?>

Expected Result

No error should be triggered.

Actual Result

Fatal error: Maximum execution time of 30 seconds exceeded in /usr/share/nginx/html/maps/api/third_party/Zebra_cURL.php on line 2439

Authorization is broken

Hi

because in the _requests options, CURLOPT_USERPWD is set to NULL, so any user password set in the options will be wiped... therefore authorization is broken

bests
wirtsi

Attempt to run with PHP 8.0.0 reports errors

Attempting to run $curl->get with PHP 8.0.0 reports the error listed below.

Fatal error: Uncaught TypeError: preg_replace(): Argument #3 ($subject) must be of type array|string, CurlHandle given in .../stefangabos/zebra_curl/Zebra_cURL.php:2733 Stack trace: #0

        // initialize individual cURL handle with the URL
        $handle = curl_init($request['url']);

        // get the handle's ID
        $resource_number = preg_replace('/Resource id #/', '', $handle);

The curl_init, curl_multi_init, and curl_share_init functions returned PHP resources prior to PHP 8. From PHP 8 onward, curl_init returns an instance of the \CurlHandle class.
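A hypothetical, version-safe way to derive a numeric ID from a handle would be to branch on the handle type: cast the resource to int before PHP 8, and use spl_object_id() for the \CurlHandle object from PHP 8 onward (the helper name below is an assumption, not part of the library):

```php
<?php

// hypothetical helper: works both before PHP 8 (resource)
// and from PHP 8 onward (\CurlHandle object)
function curl_handle_id($handle) {
    // \CurlHandle instances have a stable object ID;
    // resources can simply be cast to their resource number
    return is_object($handle) ? spl_object_id($handle) : (int) $handle;
}

$ch = curl_init('https://example.com/');
echo curl_handle_id($ch);
```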

Different options per request

I'm using Zebra for async multi-cURL requests,
but I need to attach different header data for the same links.
How can I do that?

Thanks...

get() method _key

hi Mr. Gabos

the get() method, as you can see here, can request multiple resources:

// get RSS feeds of some popular tech websites
$curl->get(array(
    'http://rss1.smashingmagazine.com/feed/',
    'http://allthingsd.com/feed/',
    'http://feeds.feedburner.com/nettuts',
    'http://www.webmonkey.com/feed/',
    'http://feeds.feedburner.com/alistapart/main',
), 'callback');

in the callback method, how do I know which response belongs to which resource? Maybe add some _key to the callback response?

Some Questions

  1. Any plans to add more Transport handlers other than cURL ?

  2. Is it possible to use this library with roadRunner / Swoole ?

How to change user_agent

I see that the Zebra_cURL class has a private function, _user_agent. I think it would be best to make it public, because we want to change the user agent with options:

private function _user_agent() {

        // browser version: 9 or 10
        $version = rand(9, 10);

        // windows version; here are the meanings:
        // Windows NT 6.2   ->  Windows 8                                       //  can have IE10
        // Windows NT 6.1   ->  Windows 7                                       //  can have IE9 or IE10
        // Windows NT 6.0   ->  Windows Vista                                   //  can have IE9
        $major_version = 6;

        $minor_version =

            // for IE9 Windows can have "0", "1" or "2" as minor version number
            $version == 8 || $version == 9 ? rand(0, 2) :

            // for IE10 Windows will have "2" as major version number
            2;

        // add some extra information
        $extras = rand(0, 3);

        // return the random user agent string
        return 'Mozilla/5.0 (compatible; MSIE ' . $version . '.0; Windows NT ' . $major_version . '.' . $minor_version . ($extras == 1 ? '; WOW64' : ($extras == 2 ? '; Win64; IA64' : ($extras == 3 ? '; Win64; x64' : ''))) . ')';

    }

Thanks a lot!

huge array of URLs leads to many timeout errors

I'm not sure if there is some bug, but for me only a few requests go through and many fail at the end.

<?php
require_once 'zebra-curl.php';
ini_set('memory_limit', '320M');

// instantiate the class
$curl = new Zebra_cURL();

$urls = file('urls.txt', FILE_IGNORE_NEW_LINES);
check_url($urls, $curl);

function check_url($urls, $curl)
{

    // process given URLs
    // and execute a callback function for each request, as soon as it finishes
    // the callback function receives as argument an object with 4 properties
    // (info, header, body and response)
    $curl->header($urls, function ($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                //print_r('<pre>');
                //print_r($result);
                echo $result->info['url'] . "\n";

                // show the server's response code
            }

            // something went wrong
            // ($result still contains all data that could be gathered)
        } else {
            echo "cURL responded with: " . $result->info['url'] . $result->response[0] . "\n";
        }

    });
}
