Comments (12)

polyfractal commented on September 24, 2024

There are a couple things at play here, and I'll say upfront that I don't have a good answer. First is a simple rounding problem. If you change the code to something like this:

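$start = microtime(true);
$test = $this->params['transport'];
$elapsed = microtime(true) - $start;   // elapsed wall-clock time, in seconds

print_r("Raw Timing: $elapsed sec\n");
// Rounding to 2 decimals of a second means rounding to the nearest 10 ms.
print_r('Rounded: '.round($elapsed, 2)."\n");
print_r('Rounded then multiplied: '.(round($elapsed, 2) * 1000)." ms\n");
// Converting to ms without rounding keeps the full precision.
print_r('True timing: '.($elapsed * 1000)." ms\n");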

You'll see there is a discrepancy between the reported and actual time (Edit: c/p the wrong example, fixed):

Raw Timing: 0.01535597038269 sec
Rounded: 0.02
Rounded then multiplied: 20 ms
True timing: 15.35597038269 ms

So that is at least part of the problem, although admittedly at most 5ms in either direction :)
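A minimal sketch of one way to sidestep the rounding artifact: scale to milliseconds first and round only for display. The helper name `formatElapsedMs` is hypothetical and not part of the client.

```php
<?php
// Hypothetical helper (not part of elasticsearch-php): convert seconds to
// milliseconds first, then round only when formatting for display.
function formatElapsedMs($seconds, $precision = 2)
{
    return number_format($seconds * 1000, $precision, '.', '') . ' ms';
}

$elapsed = 0.01535597038269;           // raw timing from the output above
echo formatElapsedMs($elapsed), "\n";  // prints "15.36 ms"
```

With this ordering the display error is bounded by the chosen precision in ms, rather than by the 10 ms granularity of rounding seconds to two decimals.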

Next, profiling shows that a good portion of the time is spent loading classes through the autoloader. If you dump a classmap with Composer, you can shave a few more ms off:

$ composer dump-autoload --optimize
Generating autoload files

$ php main.php
Raw Timing: 0.012829065322876 sec
Rounded: 0.01
Rounded then multiplied: 10 ms
True timing: 12.829065322876 ms
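To illustrate why a dumped classmap helps, here is a rough sketch of a classmap-style autoloader. It is not Composer's generated code; `makeClassMapLoader` and the `Acme\Demo\Greeter` entry are hypothetical names used only for illustration.

```php
<?php
// Illustrative sketch of a classmap autoloader. Composer's generated loader
// is more elaborate; makeClassMapLoader is a hypothetical name.
function makeClassMapLoader(array $classMap)
{
    return function ($class) use ($classMap) {
        // A single array lookup tells us the file directly, instead of
        // deriving candidate paths from the class name and probing for them.
        if (isset($classMap[$class]) && is_file($classMap[$class])) {
            require $classMap[$class];
        }
    };
}

// Hypothetical entry; a real map is generated from your vendor directory.
spl_autoload_register(makeClassMapLoader([
    'Acme\\Demo\\Greeter' => __DIR__ . '/src/Acme/Demo/Greeter.php',
]));
```

The map only removes lookup work; each file still has to be read and compiled on first use, which is why it saves a few ms rather than all of the overhead.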

Lastly, a good 10ms is spent due to the initial host ping process in StaticConnectionPool. The key here is that the time is not spent on network transfer/curl as one might expect, but mostly time in the autoloader pulling PHP files off disk:

[Screenshot: selection_056]

If you comment out the scheduleCheck() line, the numbers improve significantly:

Raw Timing: 0.0039072036743164 sec
Rounded: 0
Rounded then multiplied: 0 ms
True timing: 3.9072036743164 ms

But this number is deceptive, since it is basically avoiding the overhead associated with loading Guzzle and all the associated classes. If you measure the speed of these two snippets:

$start = microtime(true);
$client = new Client();     // Includes scheduleCheck()
$end = microtime(true);
print_r($end - $start);

And:

$start = microtime(true);
$client = new Client();     // Does not include scheduleCheck()
$client->ping();            // But includes a network call after instantiation
$end = microtime(true);
print_r($end - $start);

You get identical times:

0.020578145980835
0.020431041717529

Which basically shows that the cost is simply deferred until the first network operation. So, what is to be done? We could rip Guzzle out and replace it with a "lighter" component like my CurlMultiConnection class...but in my benchmarks it performs identically to Guzzle for steady-state indexing/search, and it lacks all the knowledge/bugfixes that Guzzle has baked in.

We can remove the scheduleCheck() pinging process that happens when the client is instantiated, but this ultimately just delays the cost until your first network operation. I'm unsure if that is useful.

In practice, an opcode cache like APC or Zend should mitigate most of this performance problem. The vast majority of the overhead is loading the PHP files themselves, so once the opcodes are cached the autoloader no longer plays a big part. I don't, however, have numbers on this...I can start working on benchmarks to see if my theory is correct.

I'm definitely open to opinions/suggestions/thoughts.

What kind of hardware are you running? Xdebug or APC/Zend enabled? PHP Version?
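For anyone answering those questions, a small sketch that reports the relevant runtime details. It uses only core PHP functions; `describeRuntime` is a hypothetical name, and the extensions checked are just the common ones mentioned in this thread plus Zend OPcache.

```php
<?php
// Sketch: collect the runtime details asked about above.
// describeRuntime is a hypothetical helper, not part of any library.
function describeRuntime()
{
    return [
        'php_version' => PHP_VERSION,
        'sapi'        => PHP_SAPI,   // e.g. "cli", where opcode caches are often disabled
        'xdebug'      => extension_loaded('xdebug'),
        'apc'         => extension_loaded('apc'),
        'opcache'     => extension_loaded('Zend OPcache'),
    ];
}

print_r(describeRuntime());
```

Note that an extension being loaded does not guarantee it is active for the current SAPI, so treat the output as a starting point.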

from elasticsearch-php.

andy3rdworld commented on September 24, 2024

Thanks for the detailed response - much appreciated.

I cited the 40ms tested on a Windows box without APC enabled but with composer autoload already dumped (and xdebug on), and running php 5.4.3.

My question was prompted mostly by the difference in time between running something like the following:

$search_host = 'localhost';
$search_port = '9200';
$baseUri = $search_host.':'.$search_port.'/_stats';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $baseUri);
curl_setopt($ch, CURLOPT_PORT, $search_port);
$response = curl_exec($ch);

vs. simply instantiating the client. This minimal curl example (which fetches stats as well) took 10ms, whereas instantiating the client API alone took 40ms. On my Linux box with APC enabled (sorry about jumping around OS/settings) and xdebug disabled, the curl example ran in 1-2ms whereas the API client instantiation took 7-8ms. Admittedly these are now both very small times, but my use case is auto-complete (using a completion suggestion) and I've been trying to shave off as many milliseconds as possible.

polyfractal commented on September 24, 2024

Ah, I see. I agree with you...auto-complete needs to be as fast as possible, so even a few ms make a big difference.

I'll run some benchmarks on my CurlMultiConnection class and see how much faster startup is compared to Guzzle. I think I also have a CurlSingleConnection class lying around uncommitted too. They both need some fixup since I stopped maintaining them a long time ago, but otherwise should work fine.

The client will still be slower than a simple curl, simply because there is some overhead associated with autoloading various client classes, etc. But with a lighter HTTP connection class I'm hoping we can shave a few more ms off and get it closer to a single curl instantiation.

I think it makes sense to create a StaticConnectionPool derivative that doesn't ping the initial host list too, for cases where you need absolute speed. It won't save on autoloading time, but it will save an extra network roundtrip.

bennimmo commented on September 24, 2024

Just following on from this comment by @polyfractal :

"I think it makes sense to create a StaticConnectionPool derivative that doesn't ping the initial host list too, for cases where you need absolute speed. It won't save on autoloading time, but it will save an extra network roundtrip"

I think this is a great idea... Also, currently in testing, if I take a node down the script pauses for a second (the time for the timeout to occur), which is caused by the scheduleCheck() function. Should the check not be run only when a node is chosen by the selector, and if that node is down, then move on to another?

An example where this improves speed: if there were 3 nodes (and one was down), then only 1 in 3 connections would hit the timeout and stall for 1 second. Would this not be much better?

polyfractal commented on September 24, 2024

@bennimmo Fixed the StaticConnectionPool logic. The previous logic was very poor indeed...it now pings only on the first time a node is used (and some other logic depending on if it has failed previous pings or not)
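The ping-on-first-use idea can be sketched roughly as follows. This is not the library's StaticConnectionPool: `LazyPingPool`, its round-robin selection, and the injected `$ping` callable are all hypothetical, and the sketch leaves out any re-pinging of nodes that previously failed.

```php
<?php
// Hypothetical sketch, not the library's StaticConnectionPool: ping a node
// only when it is first selected, and move on to the next node if it is down.
class LazyPingPool
{
    private $hosts;
    private $ping;        // callable(string $host): bool, injected so it can be faked
    private $alive = [];  // host => bool, filled in lazily on first use
    private $next = 0;

    public function __construct(array $hosts, callable $ping)
    {
        $this->hosts = array_values($hosts);
        $this->ping = $ping;
    }

    public function nextHost()
    {
        $count = count($this->hosts);
        for ($i = 0; $i < $count; $i++) {
            $host = $this->hosts[$this->next];
            $this->next = ($this->next + 1) % $count;   // simple round-robin

            if (!isset($this->alive[$host])) {
                // First use of this node: pay the ping cost now, exactly once.
                $this->alive[$host] = (bool) call_user_func($this->ping, $host);
            }
            if ($this->alive[$host]) {
                return $host;
            }
        }
        throw new RuntimeException('No alive nodes found');
    }
}
```

With three hosts and one down, only the request that first lands on the dead node pays for the failed ping; later selections skip it without touching the network.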

Also opened a new ticket to implement a "no ping" connection pool, for cases where you don't want pinging at all. I'll implement and add tests for that class soon.

bennimmo commented on September 24, 2024

@polyfractal this looks really good... I will run through it in a little more detail this evening and do some tests.

Great turnaround time though. Loving ElasticSearch right now!!

polyfractal commented on September 24, 2024

Great! Let me know if you run into any strange problems, or some edge case the tests don't cover!

BTW, the timeout values (60s, max of 1hr) match the other clients...but it may make sense to reduce them for PHP. I doubt there are many PHP scripts running for 1+ hrs, so the ping timeouts may need to be set a bit more aggressively to compensate. In practice I suspect that the timeouts will not get used often, since a script will either succeed and exit, or fail all nodes and die. Curious to hear your thoughts.
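To make the timeout discussion concrete, here is a sketch of a capped retry delay. Only the 60s and 1hr figures come from the comment above; the doubling-per-failure rule and the name `pingRetryDelay` are assumptions made for illustration.

```php
<?php
// Assumption: the delay before re-pinging a dead node doubles with each
// consecutive failure, starting at 60s and capped at 1 hour. Only the two
// bounds come from the thread; the doubling rule is illustrative.
function pingRetryDelay($failures, $base = 60, $max = 3600)
{
    return (int) min($base * pow(2, $failures), $max);
}

foreach ([0, 1, 2, 6] as $failures) {
    echo $failures, ' failures -> ', pingRetryDelay($failures), "s\n";
}
```

Under these assumptions a script that lives for a few seconds never waits out even the first 60s delay, which is why lower values may suit PHP better.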

@andy3rdworld Still working on your original problem. Will update this ticket when I have an update.

polyfractal commented on September 24, 2024

Just making a note: in addition to implementing a "lightweight" connection class, I'm going to investigate if we can use some of the tricks that Symfony uses: http://symfony.com/doc/current/book/performance.html

In particular, the dependency bootstrap file and caching the autoloader itself.

polyfractal commented on September 24, 2024

@andy3rdworld So I updated the self-contained connection class and have some promising results. These timings include the recent changes to the StaticConnectionPool (not pinging hosts, etc). All done on my laptop with xdebug turned off, no APC because it is command line:

Guzzle

No Network
> 7.095 ms

Network Ping
> 15.856 ms

CurlMultiConnection

No Network
> 4.345 ms

Network Ping
> 7.122 ms

So the self-contained curl class basically cuts initialization of both the client and the networking components in half. Note: the steady-state query speed is unaffected (they both perform identically) since this latency is almost entirely caused by autoloader speed.

I would still recommend using the default Guzzle class for everything that you can, since the Guzzle author has had much more time finding strange edge-cases and bugs. It is going to be more robust. But if you absolutely need blistering speed for autocomplete, you may consider this class. To enable it:

$params['connectionClass'] = '\Elasticsearch\Connections\CurlMultiConnection';
$client = new Client($params);

polyfractal commented on September 24, 2024

As a final note, if you are really looking for bleeding-edge performance, consider using HHVM. I just ran the test suite on HHVM and everything passes. Once the JIT is warmed, performance is awesome:

Guzzle

No Network
  > APC:  0.989 ms
  > HHVM: 0.551 ms

Network Ping
  > APC:  3.782 ms
  > HHVM: 2.559 ms

CurlMultiConnection

No Network
  > APC:  1.713 ms
  > HHVM: 1.008 ms

Network Ping
  > APC:  1.986 ms
  > HHVM: 1.461 ms

Edit: Added APC timing for a truly fair comparison

andy3rdworld commented on September 24, 2024

@polyfractal Your code changes are great, very nice improvements. I'll give HHVM a test run as well.

polyfractal commented on September 24, 2024

Going to close this...feel free to comment if you have issues related to the ticket and I'll reopen, or make a new ticket for anything new/related!
