Comments (11)

vtajzich commented on July 28, 2024

Hi,

thank you for the detailed info. I would like to reproduce the issue and profile our CSV river to find the root cause.

  • Are you able to provide us with a test data set?
  • Have you tried it on a more powerful server? Just to rule out that the issue is caused by indexing in ES itself.

Regards,

Vitek

from elasticsearch-river-csv.

proxylab commented on July 28, 2024

Hi Vitek,

Thanks for the response. I haven't tested it on a stronger server yet, although I will soon. Previously I was doing bulk inserts with the bulk API and didn't see any similar behaviour. As for the sample data, I guess we would need to agree on pushing it to a public directory, since it is a couple of thousand CSV files.

Alex

vtajzich commented on July 28, 2024

Hi,

If you don't mind, you can share the data just with us so that we can fix the river. You don't have to share it publicly.

In the meantime I'll look for a big data set and try to profile the river.

proxylab commented on July 28, 2024

Hello Vitek, I have just found out that, apart from the 80-90% I/O (share of total program time spent on I/O) reported by iotop, I get values around 50% wa with the top command, which is a strong indicator that my CPU waits too long for the disk. It seems that I have an issue with the disk, although I have still sent some sample data to your e-mail.

Thank you,
Alex
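
For anyone who wants to check the same symptom without iotop: the wa figure shown by top can be approximated by sampling the aggregate cpu line of /proc/stat twice and comparing the counters. This is a minimal Python sketch, assuming a Linux host; the function names are hypothetical and nothing here is part of the river itself.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    # Field order per proc(5): user, nice, system, idle, iowait, irq, softirq, steal.
    # Guest time is already included in user, so only the first eight are summed.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait in percent."""
    with open(path) as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.read())
    return iowait_percent(before, after)
```

Calling `sample_iowait(5.0)` while the river is loading gives a number comparable to the wa column; a value that stays high for the whole load points at the disk rather than at the CPU.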

vtajzich commented on July 28, 2024

I'm going to look into it. However, I'm very busy these days. I just wanted to let you know that I'm keeping it in mind.

dagr9782 commented on July 28, 2024

Hello proxylab, I am in the same situation here. My first thought was to put the input files (CSV) and the ES database on different disks for better I/O, but that does not sound promising. Have you come to any conclusions on this issue?

vtajzich commented on July 28, 2024

@dagr9782 I'll try to reproduce the issue so I can fix it. How big a CSV file (columns & rows) is needed?

vtajzich commented on July 28, 2024

I just tried indexing some test data. 23M lines need ~10.5 GB of RAM.

Could you please try it once more on more powerful HW?

Also try to disable all plugins during such a heavy load (except csv river and head ...)

vtajzich commented on July 28, 2024

@proxylab do you need further help?

dagr9782 commented on July 28, 2024

Hello Vitek,

thanks a lot for your attention.

I loaded 3 indices:

  1. size: 18.0Gi / docs: 199.680.808 / 3 rows (took 4 hours)
  2. size: 27.0Gi / docs: 182.834.166 / 3 rows (took 5 hours)
  3. size: 115Gi / docs: 548.358.452 / 9 rows
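
As a rough cross-check of the figures above, the average indexing rate of the first two loads can be worked out as follows. This is only a sketch: the durations are the rounded hours quoted in the list, and the third index is left out because no load time was given for it.

```python
def docs_per_second(doc_count, hours):
    """Average indexing rate for a load that took `hours` hours."""
    return doc_count / (hours * 3600)


# Document counts and durations taken from the list above.
loads = [
    ("index 1", 199_680_808, 4),
    ("index 2", 182_834_166, 5),
]

for name, docs, hours in loads:
    print(f"{name}: {docs_per_second(docs, hours):,.0f} docs/s")
```

That comes to roughly 13,900 docs/s for the first index and roughly 10,200 docs/s for the second.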

I believe two things had the biggest impact on performance:

  1. increasing the csv-river bulk_size to 20000 (anything higher crashed the JVM
    with a heap space error on my limited HW)
  2. ordering and removing duplicates on the input file
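
The second point can be sketched in Python as an external sort that also drops duplicate rows, so the input does not have to fit in memory. This is an illustration under assumptions, not the procedure actually used here: the function names are hypothetical, rows are compared as whole lines, and the file is assumed to have no header line and no quoted fields that span several lines.

```python
import heapq
import itertools
import os
import tempfile


def _write_sorted_chunk(lines):
    """Sort one in-memory chunk, drop its duplicates, and spill it to a temp file."""
    handle = tempfile.NamedTemporaryFile("w", delete=False, suffix=".chunk")
    with handle:
        handle.writelines(sorted(set(lines)))
    return handle.name


def sort_unique_lines(src_path, dst_path, chunk_lines=1_000_000):
    """Write the lines of src_path to dst_path, sorted and without duplicates."""
    chunk_paths = []
    try:
        with open(src_path) as src:
            while True:
                chunk = list(itertools.islice(src, chunk_lines))
                if not chunk:
                    break
                # Normalise a missing final newline so equal rows compare equal.
                chunk = [line if line.endswith("\n") else line + "\n" for line in chunk]
                chunk_paths.append(_write_sorted_chunk(chunk))

        chunks = [open(path) for path in chunk_paths]
        try:
            with open(dst_path, "w") as dst:
                previous = None
                # heapq.merge streams the pre-sorted chunks in global order,
                # so duplicates across chunks end up next to each other.
                for line in heapq.merge(*chunks):
                    if line != previous:
                        dst.write(line)
                        previous = line
        finally:
            for chunk in chunks:
                chunk.close()
    finally:
        for path in chunk_paths:
            os.remove(path)
```

With very large inputs the number of chunk files is limited by the open-file limit of the process, so `chunk_lines` should be chosen as large as memory allows.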

Since I suspected the network was affecting the load speed (it was much
faster at night), I also disabled the refresh_interval and the replicas during the
load. After the load succeeded, I re-enabled both "on the fly".
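
The settings change described above can be sketched against the Elasticsearch index settings endpoint (PUT /<index>/_settings). The host, the index name and the restored values (1s refresh and 1 replica, which are the Elasticsearch defaults) are assumptions for illustration, not values taken from this thread, and the helper names are hypothetical.

```python
import json
import urllib.request


def bulk_load_settings():
    """Settings for a heavy load: no periodic refresh and no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restored_settings(refresh_interval="1s", replicas=1):
    """Settings to put back after the load (defaults here are assumptions)."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def settings_request(base_url, index, settings):
    """Build (but do not send) a PUT request for the index settings endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_settings(base_url, index, settings):
    """Send the settings update to a running cluster and return its JSON reply."""
    request = settings_request(base_url, index, settings)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Before the load one would call `apply_settings("http://localhost:9200", "my_index", bulk_load_settings())`, and afterwards the same call with `restored_settings()`. Both settings are dynamic, which is why they can be changed on a live index.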

So I believe it was a matter of adjusting things.

Best Regards,

Diego

2015-10-02 4:21 GMT-03:00 Vitek Tajzich:

> @proxylab do you need further help?

vtajzich commented on July 28, 2024

@dagr9782 thank you for your reply. As the river works correctly, I am going to close the issue.
