Comments (13)

polyfractal avatar polyfractal commented on September 23, 2024 4

Ah, I think I see the problem. It's the way Elasticsearch operates, which can sometimes cause strange results when you drive it from code (usually in integration tests).

By default, Elasticsearch performs a refresh operation every second. A refresh makes newly indexed documents visible to search; until it runs, those documents are effectively invisible to search operations. A count request is just a special type of search that returns the total number of matching documents, so it is subject to the same refresh interval.

When you index into ES rapidly, it is possible to index a document and then issue a count request within a few milliseconds, well below the 1s refresh interval. When you run the curl request manually, enough time has elapsed that the docs are already "visible".

To fix this in your test code, just add a refresh command:

$client = new \Elasticsearch\Client();

$deleteParams['index'] = 'test';
$client->indices()->delete($deleteParams);

$indexParams['index'] = 'test';
$client->indices()->create($indexParams);

$doc = new \stdClass();
$doc->id = 123;
$doc->field = "abc";
$doc->field2 = "xyz";

$params = array();
$params['id'] = $doc->id;
$params['index'] = 'test';
$params['type'] = 'item';
$params['body'] = (array)$doc;
$client->index($params);

// This refresh command will force a refresh and you'll see correct counts
$client->indices()->refresh(array('index' => 'test'));

$result = $client->count(array('index' => 'test'));

// Print the count response; the expected output is shown below
print_r($result);

Array
(
    [count] => 1
    [_shards] => Array
        (
            [total] => 5
            [successful] => 5
            [failed] => 0
        )

)

Of course, this is just for testing...you shouldn't call a refresh after every document is indexed or you will hurt your indexing speed and performance.
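
As a rough sketch of the alternative, assuming you index several documents in a row: index the whole batch first and refresh once at the end. The helper name `indexBatchThenRefresh` and its arguments are made up for illustration and are not part of elasticsearch-php; it only uses the `index()` and `indices()->refresh()` calls already shown above.

```php
<?php
// Hypothetical helper (not part of elasticsearch-php): index a batch of
// documents, then issue a single refresh instead of one per document.
function indexBatchThenRefresh($client, $index, $type, array $docs)
{
    foreach ($docs as $doc) {
        // Same index call as in the example above, one per document
        $client->index(array(
            'index' => $index,
            'type'  => $type,
            'id'    => $doc['id'],
            'body'  => $doc,
        ));
    }

    // One refresh for the whole batch makes all of the documents searchable
    $client->indices()->refresh(array('index' => $index));

    return count($docs);
}
```

In a test you would call this once with all of your fixture documents and only then run the count or search you want to assert on.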

from elasticsearch-php.

bretrzaun avatar bretrzaun commented on September 23, 2024 1

Yes - that did help, I added the refresh just before the count and everything works as expected now.
Thanks for your help ! Made my starting experience with Elasticsearch very pleasant.

polyfractal avatar polyfractal commented on September 23, 2024

I'm a little confused, the output from your curl command also shows a count of zero?

[count] => 0

??

If you can paste your entire set of commands I'd be happy to run them myself and see if I can recreate the situation. Are you on Elasticsearch 1.0 or an older version?

bretrzaun avatar bretrzaun commented on September 23, 2024

I am sorry - I copied in the wrong example.
Now updated...

polyfractal avatar polyfractal commented on September 23, 2024

Ok, that makes more sense :)

Can you paste a set of commands which recreate the issue? It's much easier to debug an example than to just start digging through the code. Thanks!

bretrzaun avatar bretrzaun commented on September 23, 2024

I posted the commands I am using and the log output here:

http://pastebin.com/5u6dWNZV

PS: Yes - I am using Elasticsearch 1.0.0

polyfractal avatar polyfractal commented on September 23, 2024

Great! Glad to help...some parts of Elasticsearch can be confusing for new users, and you ran into one of them. I'll see about adding a blurb to the docs about this problem for future users.

Lemme know if you run into anything else, bug or otherwise! :)

eddiejaoude avatar eddiejaoude commented on September 23, 2024

@polyfractal @bretrzaun It is possible to refresh the index as part of the index request itself by passing an additional parameter. I have updated your example as follows:

$client = new \Elasticsearch\Client();

$deleteParams['index'] = 'test';
$client->indices()->delete($deleteParams);

$indexParams['index'] = 'test';
$client->indices()->create($indexParams);

$doc = new \stdClass();
$doc->id = 123;
$doc->field = "abc";
$doc->field2 = "xyz";

$params = array();
// -------------------------
$params['refresh'] = true; // no need for a separate refresh request
// -------------------------
$params['id'] = $doc->id;
$params['index'] = 'test';
$params['type'] = 'item';
$params['body'] = (array)$doc;
$client->index($params);

$result = $client->count(array('index' => 'test'));

polyfractal avatar polyfractal commented on September 23, 2024

Yep, you can indeed do this :)

I generally hide this option from new users because it is easy to slap a "refresh" onto all your indexing commands...and then forget that "refresh" has been toggled. This becomes very expensive because it is refreshing on each new document.

I prefer to introduce the concept as an explicit API call so that people are aware that it is an additional operation which is being performed, so they consider the overhead of calling it repeatedly.

But you're right, you can absolutely add it to individual commands instead of a second request.

eddiejaoude avatar eddiejaoude commented on September 23, 2024

> I prefer to introduce the concept as an explicit API call so that people are aware that it is an additional operation which is being performed, so they consider the overhead of calling it repeatedly.

Fair point 😄 👍

eddiejaoude avatar eddiejaoude commented on September 23, 2024

Where is the curl and JSON equivalent of this in the Elasticsearch docs? I found it once before, but can't seem to find it again.

polyfractal avatar polyfractal commented on September 23, 2024

@eddiejaoude It's described (briefly, without code sample) here: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-refresh

eddiejaoude avatar eddiejaoude commented on September 23, 2024

Thanks @polyfractal, that is the documentation I had found too. I was looking for an actual example.

However, I got more information on IRC and documented it here: https://github.com/TransformCore/elasticsearch-example-docs/blob/master/docs/4-runtime-parameters/refresh-parameter.md#example
