Comments (11)
Hi,
thank you for the detailed info. I would like to reproduce the issue and profile our CSV river to find the root cause.
- Are you able to provide us with a test data set?
- Did you try it on a more powerful server? Just to see whether the issue is caused by indexing in ES itself.
Regards,
Vitek
from elasticsearch-river-csv.
Hi Vitek,
Thanks for the response. I haven't tested it on a stronger server yet, although I will soon. Previously I was doing bulk inserts with the bulk API and didn't see any similar behaviour. As for the sample data, I guess we would have to agree on pushing it to a public directory, since it is a couple of thousand CSV files.
Alex
from elasticsearch-river-csv.
Hi,
If you don't mind, you can share the data just with us so that we can fix the river. You don't have to share it publicly.
In the meantime I'll look for a big data set and try to profile the river.
from elasticsearch-river-csv.
Hello Vitek, I have just found out that, apart from the 80-90% I/O reported by iotop (the share of total program time spent on I/O), the top command shows values around 50% wa, which is a strong indicator that my CPU waits too long for the disk. It seems that I have an issue with the disk, although I have sent you some sample data by e-mail.
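For anyone who wants to check the same symptom, the wa figure can be approximated from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. The following is only a minimal sketch: the function names and the one-second sampling interval are assumptions made for illustration, and the result is not guaranteed to match the output of top exactly.

```python
import time


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two /proc/stat samples.

    Each argument is the aggregate "cpu ..." line. The fields after the
    label are: user nice system idle iowait irq softirq steal ...
    """
    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Use user..steal only; guest time is already included in user.
        return [int(x) for x in parts[1:9]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


def sample_iowait(interval_seconds: float = 1.0) -> float:
    """Take two samples interval_seconds apart and return the iowait share."""
    first = read_cpu_line()
    time.sleep(interval_seconds)
    return iowait_percent(first, read_cpu_line())
```

Calling `sample_iowait()` while the river is indexing gives a rough number; a value that stays high suggests the disk, rather than the river, is the bottleneck.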
Thank you,
Alex
from elasticsearch-river-csv.
I'm going to look into it. However, I'm too busy these days. Just wanted to let you know I'm thinking about it.
from elasticsearch-river-csv.
Hello proxylab, I am in the same situation here. The first thing I thought of was to put the input files (CSV) and the ES database on different disks for better I/O, but that does not sound promising. Have you come to any conclusions on that issue?
from elasticsearch-river-csv.
@dagr9782 I'll try to reproduce the issue to fix it. How big a CSV file (columns and rows) is needed?
from elasticsearch-river-csv.
I just tried to index some test data. 23M lines need ~10.5 GB of RAM.
Could you please try it once more on more powerful HW?
Also try to disable all plugins during such a heavy load (except the csv river and head ...).
from elasticsearch-river-csv.
@proxylab do you need further help?
from elasticsearch-river-csv.
Hello Vitek,
thanks a lot for your attention.
I loaded 3 indices:
- size: 18.0Gi / docs: 199.680.808 / 3 rows (took 4 hours)
- size: 27.0Gi / docs: 182.834.166 / 3 rows (took 5 hours)
- size: 115Gi / docs: 548.358.452 / 9 rows
I believe two things had the most impact on performance:
- increasing the csv-river bulk_size to 20000 (more than that crashed the JVM with a heap space error; limited HW)
- ordering and removing duplicates on the input file
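The ordering and de-duplication step could look roughly like the sketch below. It is not the procedure actually used here: the function names and file paths are hypothetical, and it holds the whole file in memory, so inputs with hundreds of millions of rows would more likely need an external sort.

```python
import csv


def sort_and_dedupe_rows(rows):
    """Return the unique rows in sorted order, as a list of tuples."""
    return sorted(set(tuple(row) for row in rows))


def sort_and_dedupe_file(src_path, dst_path, has_header=False):
    """Rewrite a CSV file with its rows sorted and duplicates removed.

    Returns the number of duplicate rows that were dropped.
    Both paths are hypothetical examples supplied by the caller.
    """
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src))
    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else rows
    cleaned = sort_and_dedupe_rows(body)
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerows(header)   # keep the header as the first line
        writer.writerows(cleaned)
    return len(body) - len(cleaned)
```

Rows are compared as tuples of strings, so the ordering is lexicographic rather than numeric.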
As I suspected that the network was affecting the load speed (at night it was much faster), I also disabled the refresh_interval and the replicas during the load. After the successful load, I enabled them again "on the fly".
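A sketch of that settings toggle, using the Elasticsearch update index settings endpoint (`PUT /<index>/_settings`), might look as follows. The base URL, the index name, and the helper names are assumptions for illustration, and the restored values are simply the Elasticsearch defaults (`1s` and one replica), not necessarily the values used on this cluster.

```python
import json
import urllib.request


def bulk_load_settings():
    """Settings for the duration of the load: no refresh, no replicas."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restored_settings(refresh_interval="1s", replicas=1):
    """Settings to apply once the load has finished."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def settings_request(base_url, index, settings):
    """Build (but do not send) the PUT request for the index settings."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_settings(base_url, index, settings):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(settings_request(base_url, index, settings)) as resp:
        return resp.status
```

For example, `apply_settings("http://localhost:9200", "my_csv_data", bulk_load_settings())` before the load and the same call with `restored_settings()` afterwards; both the URL and the index name here are placeholders.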
So I believe it was a matter of adjusting things.
Best Regards,
Diego
from elasticsearch-river-csv.
@dagr9782 thank you for your reply. As the river works correctly, I am going to close the issue.
from elasticsearch-river-csv.