Comments (12)
There are a couple of things at play here, and I'll say upfront that I don't have a good answer. The first is a simple rounding problem. If you change the code to something like this:
$start = microtime(true);
$test = $this->params['transport'];
$end = microtime(true) - $start;
print_r("Raw Timing: $end sec\n");
print_r('Rounded: '.round($end, 2)."\n");
print_r('Rounded then multiplied: '.(round($end, 2) * 1000)." ms\n");
print_r('True timing: '.($end * 1000)." ms\n");
You'll see there is a discrepancy between the reported and actual time (Edit: c/p the wrong example, fixed):
Raw Timing: 0.01535597038269 sec
Rounded: 0.02
Rounded then multiplied: 20 ms
True timing: 15.35597038269 ms
So that is at least part of the problem, although admittedly at most 5ms in either direction :)
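A minimal sketch of one way to avoid the rounding artifact: convert to milliseconds first and only then format. The helper name formatElapsedMs is hypothetical and not part of the client.

```php
<?php
// Hypothetical helper: convert seconds to milliseconds BEFORE rounding,
// so the rounding error is a fraction of a millisecond instead of up to 5 ms.
function formatElapsedMs(float $seconds, int $precision = 3): string
{
    return number_format($seconds * 1000, $precision, '.', '') . ' ms';
}

$start = microtime(true);
// ... code under test goes here ...
$elapsed = microtime(true) - $start;
echo 'Measured: ', formatElapsedMs($elapsed), "\n";

// Using the raw timing from the output above:
echo formatElapsedMs(0.01535597038269), "\n"; // prints "15.356 ms"
```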
Next, profiling shows that a good portion of the time is spent loading classes through the autoloader. If you dump a classmap with Composer, you can shave a few more ms off:
$ composer dump-autoload --optimize
Generating autoload files
$ php main.php
Raw Timing: 0.012829065322876 sec
Rounded: 0.01
Rounded then multiplied: 10 ms
True timing: 12.829065322876 ms
Lastly, a good 10ms is spent on the initial host ping process in StaticConnectionPool. The key here is that the time is not spent on network transfer/curl as one might expect, but mostly in the autoloader pulling PHP files off disk.
If you comment out the scheduleCheck() line, the numbers improve significantly:
Raw Timing: 0.0039072036743164 sec
Rounded: 0
Rounded then multiplied: 0 ms
True timing: 3.9072036743164 ms
But this number is deceptive, since it is basically avoiding the overhead associated with loading Guzzle and all the associated classes. If you measure the speed of these two snippets:
$start = microtime(true);
$client = new Client(); // Includes scheduleCheck()
$end = microtime(true);
print_r($end - $start);
And:
$start = microtime(true);
$client = new Client(); // Does not include scheduleCheck()
$client->ping(); // But includes a network call after instantiation
$end = microtime(true);
print_r($end - $start);
You get identical times:
0.020578145980835
0.020431041717529
Which basically shows that the cost is simply deferred until the first network operation. So, what is to be done? We could rip Guzzle out and replace it with a "lighter" component like my CurlMultiConnection class... but in my benchmarks it is identical to Guzzle for steady-state indexing/search, and it lacks all the knowledge/bugfixes that Guzzle has baked in.
We can remove the scheduleCheck() pinging process that happens when the client is instantiated, but this ultimately just delays the cost until your first network operation. I'm unsure if that is useful.
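To illustrate the deferral, here is a toy sketch. Everything in it is hypothetical: loadHttpLayer() stands in for autoloading the HTTP classes (simulated with a 5 ms sleep), and EagerClient/LazyClient are not the real client classes. It only shows that moving the expensive step from the constructor to the first call does not change the total.

```php
<?php
// Hypothetical stand-in for autoloading the HTTP layer (simulated cost: 5 ms).
function loadHttpLayer(): void
{
    usleep(5000);
}

// Pays the loading cost at instantiation.
final class EagerClient
{
    public function __construct()
    {
        loadHttpLayer();
    }

    public function ping(): bool
    {
        return true;
    }
}

// Pays the same cost, but only on the first network operation.
final class LazyClient
{
    private $loaded = false;

    public function ping(): bool
    {
        if (!$this->loaded) {
            loadHttpLayer();
            $this->loaded = true;
        }
        return true;
    }
}

// Returns the wall-clock time of $fn in milliseconds.
function timeMs(callable $fn): float
{
    $start = microtime(true);
    $fn();
    return (microtime(true) - $start) * 1000;
}

$eager = timeMs(function () { $c = new EagerClient(); $c->ping(); });
$lazy  = timeMs(function () { $c = new LazyClient();  $c->ping(); });
printf("Eager: %.3f ms\nLazy:  %.3f ms\n", $eager, $lazy);
```

Both paths should report roughly the same time, since each runs the simulated load exactly once.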
In practice, an opcode cache like APC or Zend should mitigate most of this performance problem. The vast majority of the overhead is loading the PHP files themselves, so once the opcodes are cached the autoloader no longer plays a big part. I don't, however, have numbers on this... I can start working on benchmarks to see if my theory is correct.
I'm definitely open to opinions/suggestions/thoughts.
What kind of hardware are you running? Xdebug or APC/Zend enabled? PHP Version?
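For anyone reporting timings, a small sketch like the following can collect the environment details asked about above. The function name profilingEnvironment is hypothetical; it relies only on standard PHP calls, and opcache_get_status() is checked with function_exists() because it is only available when the OPcache extension is loaded.

```php
<?php
// Hypothetical helper: summarise the settings that most affect these timings.
function profilingEnvironment(): array
{
    $opcache = false;
    if (function_exists('opcache_get_status')) {
        // Returns false when OPcache is loaded but disabled (e.g. on the CLI).
        $status = @opcache_get_status(false);
        $opcache = is_array($status) && !empty($status['opcache_enabled']);
    }

    return [
        'php_version' => PHP_VERSION,
        'sapi'        => PHP_SAPI,
        'xdebug'      => extension_loaded('xdebug'),
        'apc'         => extension_loaded('apc'),
        'opcache'     => $opcache,
    ];
}

print_r(profilingEnvironment());
```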
from elasticsearch-php.
Thanks for the detailed response - much appreciated.
The 40ms I cited was measured on a Windows box without APC enabled, but with the Composer autoloader already dumped (and xdebug on), running PHP 5.4.3.
My question was prompted mostly by the difference in time between running something like the following:
$search_host = 'localhost';
$search_port = '9200';
$baseUri = $search_host.':'.$search_port.'/_stats';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $baseUri);
curl_setopt($ch, CURLOPT_PORT, $search_port);
$response = curl_exec($ch);
vs. simply instantiating the client. This minimal curl example took 10ms (and it fetches stats as well), whereas instantiating the API client alone took 40ms. On my Linux box with APC enabled and xdebug disabled (sorry about jumping around OS/settings), the curl example ran in 1-2ms whereas the API client instantiation took 7-8ms. Admittedly these are now both very small times, but my use case is auto-complete (using a completion suggestion) and I've been trying to shave off as many milliseconds as possible.
Ah, I see. I agree with you...auto-complete needs to be as fast as possible, so even a few ms make a big difference.
I'll run some benchmarks on my CurlMultiConnection class and see how much faster startup is compared to Guzzle. I think I also have an uncommitted CurlSingleConnection class lying around. They both need some fixup since I stopped maintaining them a long time ago, but otherwise should work fine.
The client will still be slower than a simple curl, simply because there is some overhead associated with autoloading various client classes, etc. But with a lighter HTTP connection class I'm hoping we can shave a few more ms off and get it closer to a single curl instantiation.
I think it makes sense to create a StaticConnectionPool derivative that doesn't ping the initial host list too, for cases where you need absolute speed. It won't save on autoloading time, but it will save an extra network roundtrip.
Just following on from this comment by @polyfractal :
"I think it makes sense to create a StaticConnectionPool derivative that doesn't ping the initial host list too, for cases where you need absolute speed. It won't save on autoloading time, but it will save an extra network roundtrip"
I think this is a great idea... Also, currently in testing, if I take a node down the script pauses for a second (the time it takes for the timeout to occur), which is caused by the scheduleCheck() function. Shouldn't the check only be run when a node is chosen by the selector, and if that node is down, move on to another?
An example where this improves speed: if there were 3 nodes (and one was down), then only 1 in 3 connections would hit the timeout and stall for 1 second. Would this not be much better?
@bennimmo Fixed the StaticConnectionPool logic. The previous logic was very poor indeed... it now pings a node only the first time it is used (plus some other logic depending on whether it has failed previous pings or not).
Also opened a new ticket to implement a "no ping" connection pool, for cases where you don't want pinging at all. I'll implement and add tests for that class soon.
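The ping-on-first-use idea discussed above can be sketched roughly as follows. This is not the library's StaticConnectionPool: the class name LazyPingPool, the round-robin selection, and the injected $ping callable are all assumptions made for illustration, and the retry/timeout handling for previously failed nodes is left out.

```php
<?php
// Hypothetical sketch: ping a host only the first time the selector picks it,
// and skip hosts that are known to be down.
final class LazyPingPool
{
    /** @var string[] */
    private $hosts;
    /** @var callable function(string $host): bool */
    private $ping;
    /** @var array<string,bool> cached ping results, keyed by host */
    private $alive = [];
    /** @var int round-robin cursor */
    private $cursor = 0;

    public function __construct(array $hosts, callable $ping)
    {
        $this->hosts = array_values($hosts);
        $this->ping = $ping;
    }

    public function nextHost(): string
    {
        $count = count($this->hosts);
        for ($i = 0; $i < $count; $i++) {
            $host = $this->hosts[$this->cursor];
            $this->cursor = ($this->cursor + 1) % $count;

            // First use of this host: ping once and remember the result.
            if (!array_key_exists($host, $this->alive)) {
                $this->alive[$host] = (bool) call_user_func($this->ping, $host);
            }
            if ($this->alive[$host]) {
                return $host;
            }
        }
        throw new RuntimeException('No alive hosts found');
    }
}

// Usage: "b" is down, so it costs one failed ping and is skipped afterwards.
$pinged = [];
$pool = new LazyPingPool(['a', 'b', 'c'], function ($host) use (&$pinged) {
    $pinged[] = $host;
    return $host !== 'b';
});
echo $pool->nextHost(), "\n"; // a
echo $pool->nextHost(), "\n"; // c
echo $pool->nextHost(), "\n"; // a
```

Injecting the ping as a callable keeps the sketch free of network code; a real pool would issue an HTTP request there and would eventually re-check dead hosts.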
@polyfractal this looks really good... I will run through it in a little more detail this evening and do some tests.
Great turnaround time though. Loving ElasticSearch right now!!
Great! Let me know if you run into any strange problems, or some edge case the tests don't cover!
BTW, the timeout values (60s, max of 1hr) match the other clients...but it may make sense to reduce them for PHP. I doubt there are many PHP scripts running for 1+ hrs, so the ping timeouts may need to be set a bit more aggressively to compensate. In practice I suspect that the timeouts will not get used often, since a script will either succeed and exit, or fail all nodes and die. Curious to hear your thoughts.
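To make the effect of those two values concrete, here is a sketch of a capped retry delay. The doubling rule is an assumption (a common backoff scheme), not something confirmed for this client; only the 60 s base and the 1 hr cap come from the comment above. The function name deadNodeTimeout is hypothetical.

```php
<?php
// Assumption: the wait before re-pinging a dead node doubles with each
// consecutive failed ping, starting at $baseSeconds and capped at $maxSeconds.
function deadNodeTimeout(int $failedPings, int $baseSeconds = 60, int $maxSeconds = 3600): int
{
    // Clamp the exponent so the power stays within integer range.
    $exponent = max(0, min($failedPings, 30));
    return (int) min($baseSeconds * (2 ** $exponent), $maxSeconds);
}

foreach ([0, 1, 2, 5, 6] as $failures) {
    printf("%d failed pings -> wait %d s\n", $failures, deadNodeTimeout($failures));
}
```

Under this assumed rule the cap is reached after six failed pings, which is far longer than a typical PHP script lives. Lowering both arguments, for example deadNodeTimeout($n, 5, 60), would be one way to make retries more aggressive.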
@andy3rdworld Still working on your original problem. Will update this ticket when I have an update.
Just making a note: in addition to implementing a "lightweight" connection class, I'm going to investigate if we can use some of the tricks that Symfony uses: http://symfony.com/doc/current/book/performance.html
In particular, the dependency bootstrap file and caching the autoloader itself.
@andy3rdworld So I updated the self-contained connection class and have some promising results. These timings include the recent changes to the StaticConnectionPool (not pinging hosts, etc). All done on my laptop with xdebug turned off, no APC because it is command line:
Guzzle
No Network: 7.095 ms
Network Ping: 15.856 ms
CurlMultiConnection
No Network: 4.345 ms
Network Ping: 7.122 ms
So the self-contained curl class basically cuts initialization of both the client and the networking components in half. Note: the steady-state query speed is unaffected (they both perform identically) since this latency is almost entirely caused by autoloader speed.
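As a quick check on the "in half" figure, the relative reduction can be computed from the timings above. The helper name percentReduction is hypothetical.

```php
<?php
// Percentage by which $after is smaller than $before.
function percentReduction(float $before, float $after): float
{
    return (1 - $after / $before) * 100;
}

// Timings in ms from the Guzzle vs CurlMultiConnection comparison above.
printf("No network:   %.1f%%\n", percentReduction(7.095, 4.345));
printf("Network ping: %.1f%%\n", percentReduction(15.856, 7.122));
```

This works out to roughly 39% without the network ping and roughly 55% with it, so "about half" holds for the ping case and is somewhat generous for plain instantiation.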
I would still recommend using the default Guzzle class for everything that you can, since the Guzzle author has had much more time to find strange edge cases and bugs. It is going to be more robust. But if you absolutely need blistering speed for autocomplete, you may want to consider this class. To enable it:
$params['connectionClass'] = '\Elasticsearch\Connections\CurlMultiConnection';
$client = new Client($params);
As a final note, if you are really looking for bleeding-edge performance, consider using HHVM. I just ran the test suite on HHVM and everything passes. Once the JIT is warmed, performance is awesome:
Guzzle
No Network: APC 0.989 ms, HHVM 0.551 ms
Network Ping: APC 3.782 ms, HHVM 2.559 ms
CurlMultiConnection
No Network: APC 1.713 ms, HHVM 1.008 ms
Network Ping: APC 1.986 ms, HHVM 1.461 ms
Edit: Added APC timing for a truly fair comparison
@polyfractal Your code changes are great, very nice improvements. I'll give HHVM a test run as well.
Going to close this...feel free to comment if you have issues related to the ticket and I'll reopen, or make a new ticket for anything new/related!