

taraslayshchuk commented on May 26, 2024

Hi,

I have added this check to prevent the multiple-reads exception and the script freeze (#9).

Elasticsearch has a 30m timeout per scroll page and a 120 s timeout per HTTP request.
You should provide more information about your Elasticsearch setup (version, architecture, index settings, index mapping) and about the es2csv arguments you used.
If you are losing some data, it could be a hardware issue. Elasticsearch logs from the scroll process would help dot the i's and cross the t's.

from es2csv.

taraslayshchuk commented on May 26, 2024

@WormsCH, @conradlee is this issue still reproduced for you?


conradlee commented on May 26, 2024

Yes, I encountered it again, even with your patch.

On Mon, Oct 24, 2016, 10:34 AM Taras Layshchuk wrote:
> @WormsCH, @conradlee is this issue still reproduced for you?



taraslayshchuk commented on May 26, 2024

@conradlee You should provide more information about your Elasticsearch setup (version, architecture, index settings, index mapping), the es2csv arguments and version, your Python and pip versions, and your OS version.


conradlee commented on May 26, 2024

Sorry, I'm on the road now, but I'll try to replicate this problem and document all those important details when I'm done traveling next week.

On Thu, Oct 27, 2016, 6:20 PM Taras Layshchuk wrote:
> @conradlee You should provide more information about your Elasticsearch (version, architecture, index settings, index mapping) and more information about es2csv args, version, python and pip versions, OS version.



conradlee commented on May 26, 2024

OK, here is some information:

  • Elasticsearch version: 1.7
  • elasticsearch-py version: 2.4.0
  • Python version: 2.7.3
  • pip version: 8.1.1

I have a theory about what's causing the infinite loop. The query I'm running selects all documents with a saved date earlier than a specified cutoff. It's a big query, though, so it takes around 12 hours for es2csv to scroll through all the results and save them. In the meantime, some of the documents in the original result set are re-saved, which removes them from the result set.

Depending on how the scrolling is implemented, this could mean that the final result set is smaller than the original one. If the while loop keeps scrolling until it has fetched as many documents as the first page reported, it would then never exit.
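To make the theory concrete, here is a minimal, self-contained Python sketch. It is not es2csv's actual code: `fake_scroll_pages`, `naive_export` and `guarded_export` are hypothetical names, and the fake scroll simply delivers fewer documents than the total reported up front, then keeps returning empty pages. A loop whose only exit condition is "fetched == total" never finishes, while a loop that also stops on an empty page does.

```python
from typing import Iterator, List, Tuple


def fake_scroll_pages(delivered: int, page_size: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for a scroll: yields pages of document ids,
    then yields empty pages forever, like an exhausted scroll."""
    docs = list(range(delivered))
    for start in range(0, delivered, page_size):
        yield docs[start:start + page_size]
    while True:
        yield []


def naive_export(total_hits: int, pages: Iterator[List[int]],
                 max_pages: int = 1000) -> Tuple[int, bool]:
    """Stops only when fetched == total_hits. Returns (fetched, finished).
    The max_pages cap exists only so this demo terminates."""
    fetched = 0
    for _ in range(max_pages):
        if fetched == total_hits:
            return fetched, True
        fetched += len(next(pages))
    return fetched, False


def guarded_export(total_hits: int, pages: Iterator[List[int]]) -> int:
    """Same loop, but an empty page ends the export instead of spinning."""
    fetched = 0
    while fetched != total_hits:
        page = next(pages)
        if not page:
            break  # result set shrank (or the scroll expired): stop here
        fetched += len(page)
    return fetched


# First page reported 1000 hits, but only 950 documents are still delivered.
print(naive_export(1000, fake_scroll_pages(950, 100)))    # (950, False)
print(guarded_export(1000, fake_scroll_pages(950, 100)))  # 950
```

The empty-page check trades completeness for termination: the export ends with whatever was delivered rather than waiting for documents that no longer match the query.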


taraslayshchuk commented on May 26, 2024

Under the hood, es2csv uses the scroll API, or to be precise, the elasticsearch-py scroll API.
I have never tested it on indexes that are being modified during the export, and I cannot find any documentation on how scrolling behaves in that case.
So my advice is to copy your index (so that the copy stays read-only) and run your query against the copy.
Elasticsearch logs could help too.


taraslayshchuk commented on May 26, 2024

@conradlee
Oh, it looks like I found the root cause (source):

For Elasticsearch 2.0 and later, use the major version 2 (2.x.y) of the library.

For Elasticsearch 1.0 and later, use the major version 1 (1.x.y) of the library.

So the issue may be that es2csv is using elasticsearch-py 2.4.0 while you are running Elasticsearch 1.7.
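A mismatch like this could be caught before exporting by comparing major versions. The sketch below uses hypothetical helpers (`major`, `client_matches_server`) that are not part of es2csv or elasticsearch-py; in a real setup the two strings could come from `elasticsearch.__versionstr__` and from `Elasticsearch().info()["version"]["number"]`.

```python
def major(version: str) -> int:
    """Return the major component of a dotted version string such as '2.4.0'."""
    return int(version.split(".")[0])


def client_matches_server(client_version: str, server_version: str) -> bool:
    """True when elasticsearch-py and the Elasticsearch server share a major
    version, per the rule quoted from the elasticsearch-py docs
    (1.x.y client for 1.x servers, 2.x.y client for 2.x servers)."""
    return major(client_version) == major(server_version)


print(client_matches_server("2.4.0", "1.7"))  # False: the setup reported here
print(client_matches_server("1.1.0", "1.7"))  # True
```

If the check fails, pinning the client to the matching major version, e.g. `pip install "elasticsearch>=1.0.0,<2.0.0"` for a 1.x cluster, would follow the quoted advice, assuming es2csv itself works with the 1.x client.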

