Comments (12)
Can you provide us with a few more details? What is the maximum configured HEAP size of that node?
Which Crate Data, JVM and OS versions are you running? Also, how many shards did you configure? What is the average document size?
As you can see, importing a lot of documents, especially into partitioned tables with a lot of shards configured, can result in huge memory consumption, which can have a lot of causes ;)
from crate.
One thing to mention: We're trying to prevent Out Of Memory exceptions in any case, but that is hard to achieve (we're fighting with the JVM GC ;). This is an ongoing task, so your information would be very helpful.
The HEAP size is the default (256m) because I couldn't quickly figure out the best way to set it for the upstart environment (any advice on this would be appreciated).
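For reference, here is a sketch of how the heap could be raised for an upstart-managed Crate. The job-file path and the CRATE_HEAP_SIZE variable are assumptions (check the bin/crate launcher script); _JAVA_OPTIONS is the generic JVM mechanism:

```
# /etc/init/crate.conf  (path assumed -- adjust to the actual job file)
# upstart supports "env" stanzas, so the JVM-standard _JAVA_OPTIONS
# variable can carry the heap flag:
env _JAVA_OPTIONS=-Xmx2g

# If the bin/crate launcher reads a CRATE_HEAP_SIZE variable (an
# assumption -- verify against the script), that is the more direct route:
# env CRATE_HEAP_SIZE=2g

exec /usr/share/crate/bin/crate
```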
Crate 0.40.2
java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
Ubuntu Linux 3.13.0-29-generic (latest Amazon AMI)
5 shards (default) but on a cluster of 3 nodes.
Example of one document:
{"x":0.3246917724609375,"device_id":"02de7e4a-ead2-4824-b43a-361383c890de","y":-0.725860595703125,"z":-0.6339263916015625,"d":"140411","ts":1397202979359}
I recently encountered an OutOfMemory exception while writing a tool to migrate this data from Cassandra, and what I learned is that it can happen when you load all the items into memory (of course) rather than providing a view or streaming them. I know it's difficult to pin down the cause of the issue, and I can provide Crate with the exact data file if needed. Regardless of the HEAP size set, it should be able to manage its resources without crashing.
For now I will try increasing the HEAP size (I was meaning to do this anyway) and see if it works, or break the file into smaller pieces.
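Breaking the file into pieces can be done with the standard `split` tool. A small demo, using a generated stand-in for the real export (the file name and line counts here are placeholders):

```shell
# Generate a stand-in for the real JSON-lines export (250,000 lines here;
# in practice the actual dump is used, with a chunk size like 100000).
seq 1 250000 > accelerometerdata.json

# Split into numbered 100,000-line chunks: .part_00, .part_01, .part_02
split -l 100000 -d accelerometerdata.json accelerometerdata.part_

# Each chunk can then be imported with its own COPY FROM statement.
wc -l accelerometerdata.part_*
```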
Also the schema:
create table accelerometerdata (
    d string,
    device_id string,
    ts timestamp,
    x double,
    y double,
    z double,
    primary key (d, device_id, ts)
)
partitioned by (d)
clustered by (device_id);
I just tried splitting the file into smaller files of 100,000 lines each and it still failed:
[2014-07-31 10:15:34,749][ERROR][io.crate.executor.transport.TransportCollectNodeAction] Error when creating result futures
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.action.admin.indices.stats.CommonStatsFlags.getFlags(CommonStatsFlags.java:85)
at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:122)
at org.elasticsearch.action.admin.indices.stats.ShardStats.<init>(ShardStats.java:49)
at org.elasticsearch.indices.InternalIndicesService.stats(InternalIndicesService.java:211)
at org.elasticsearch.indices.InternalIndicesService.stats(InternalIndicesService.java:174)
at org.elasticsearch.node.service.NodeService.stats(NodeService.java:138)
at io.crate.operation.reference.sys.node.NodeHostnameExpression.value(NodeHostnameExpression.java:42)
at io.crate.operation.reference.sys.node.NodeHostnameExpression.value(NodeHostnameExpression.java:28)
at io.crate.analyze.EvaluatingNormalizer.visitReference(EvaluatingNormalizer.java:92)
at io.crate.analyze.EvaluatingNormalizer.visitReference(EvaluatingNormalizer.java:49)
at io.crate.planner.symbol.Reference.accept(Reference.java:76)
at io.crate.planner.symbol.SymbolVisitor.process(SymbolVisitor.java:30)
at io.crate.analyze.EvaluatingNormalizer.normalize(EvaluatingNormalizer.java:146)
at io.crate.analyze.EvaluatingNormalizer.normalize(EvaluatingNormalizer.java:118)
at io.crate.planner.node.dql.CollectNode.normalize(CollectNode.java:246)
at io.crate.operation.collect.MapSideDataCollectOperation.handleNodeCollect(MapSideDataCollectOperation.java:173)
at io.crate.operation.collect.MapSideDataCollectOperation.collect(MapSideDataCollectOperation.java:157)
at io.crate.executor.transport.TransportCollectNodeAction.nodeOperation(TransportCollectNodeAction.java:115)
at io.crate.executor.transport.TransportCollectNodeAction.access$700(TransportCollectNodeAction.java:52)
at io.crate.executor.transport.TransportCollectNodeAction$AsyncAction$1.run(TransportCollectNodeAction.java:173)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[2014-07-31 10:15:34,752][ERROR][io.crate.executor.transport.TransportCollectNodeAction] Error when creating result futures
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:421)
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:431)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:60)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:55)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:45)
at io.crate.analyze.Id.bytes(Id.java:107)
at io.crate.analyze.Id.stringValue(Id.java:127)
at io.crate.operation.projectors.AbstractIndexWriterProjector.setNextRow(AbstractIndexWriterProjector.java:129)
at io.crate.operation.collect.files.FileReadingCollector.doCollect(FileReadingCollector.java:180)
at io.crate.operation.collect.MapSideDataCollectOperation.handleNodeCollect(MapSideDataCollectOperation.java:189)
at io.crate.operation.collect.MapSideDataCollectOperation.collect(MapSideDataCollectOperation.java:157)
at io.crate.executor.transport.TransportCollectNodeAction.nodeOperation(TransportCollectNodeAction.java:115)
at io.crate.executor.transport.TransportCollectNodeAction.access$700(TransportCollectNodeAction.java:52)
at io.crate.executor.transport.TransportCollectNodeAction$AsyncAction$1.run(TransportCollectNodeAction.java:173)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
I will try and increase the heap now.
Yes, you're totally right, it should never crash regardless of the given resources. But as I mentioned, this is not so easy to achieve... we'll definitely work on this, please be patient and increase your heap meanwhile. 256M is a small value; I recommend increasing it to at least 2G, either by starting crate with -Xmx2g or by setting an env var: _JAVA_OPTIONS=-Xmx2g
I've increased the heap to 2048m and it still fails with the files split into 1 million records each. I wonder if it is because it is trying to overwrite the existing entries this time?
From the crate upstart log (no visible warning in the crate log).
[2014-07-31 11:19:05,203][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goj2] already exists
[2014-07-31 11:19:11,430][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goj4] already exists
[2014-07-31 11:19:12,126][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goj8] already exists
[2014-07-31 11:19:49,742][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6kp3g] already exists
[2014-07-31 11:20:49,991][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goj2] already exists
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid959.hprof ...
Unable to create java_pid959.hprof: Permission denied
Can you please increase the HEAP further to find out how much HEAP is needed for your import?
Another thing: it's always a good idea to disable refreshing and replicas while importing, as these can also cause high memory consumption (while testing and developing with big datasets we've always disabled both):
alter table accelerometerdata set (refresh_interval=0, number_of_replicas=0)
and afterwards:
alter table accelerometerdata reset (refresh_interval, number_of_replicas)
to re-enable both with their default values.
I really don't think increasing the HEAP will fix the issue as it will continue to grow.
I will try disabling refreshing and replicas first and then if that doesn't work I'll upgrade the ec2 instance to a higher memory (perhaps 60GB) and try again.
It still falls over, trying a larger instance type now.
Maybe it makes sense to continue talking via IRC, in #crate on Freenode? ;)
Ok I had some luck with 48.8GB of HEAP:
COPY OK, 7830956 rows affected (727.139 sec)
It cycled up to ~36GB of used HEAP and then back down again during the import, settling at 24.4GB afterwards.
Here is the log output which includes an exception of some sort:
[2014-07-31 15:45:20,221][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goj2] already exists
[2014-07-31 15:45:21,972][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goj4] already exists
[2014-07-31 15:45:22,278][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goj8] already exists
[2014-07-31 15:45:29,822][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6kp3g] already exists
[2014-07-31 15:45:32,405][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6oo36] already exists
[2014-07-31 15:45:33,197][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6oo38] already exists
[2014-07-31 15:45:33,414][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6oo3e] already exists
[2014-07-31 15:45:34,046][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6op3i] already exists
[2014-07-31 15:45:34,264][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6soj6] already exists
[2014-07-31 15:45:34,979][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6go3i] already exists
[2014-07-31 15:46:21,970][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goj0] already exists
[2014-07-31 15:46:39,744][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6cp3e] already exists
[2014-07-31 15:46:50,923][INFO ][index.engine.internal ] [Wiccan] [.partitioned.accelerometerdata.043j2d1g6go3i][2] now throttling indexing: numMergesInFlight=6, maxNumMerges=5
[2014-07-31 15:46:50,937][INFO ][index.engine.internal ] [Wiccan] [.partitioned.accelerometerdata.043j2d1g6go3i][2] stop throttling indexing: numMergesInFlight=4, maxNumMerges=5
[2014-07-31 15:49:36,542][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6ko3e] already exists
[2014-07-31 15:50:17,573][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6goja] already exists
[2014-07-31 15:51:20,580][ERROR][io.crate.executor.transport.TransportCollectNodeAction] Error when creating result futures
io.crate.exceptions.UnhandledServerException: unknown shard id 1 on index '.partitioned.accelerometerdata.043j2d1g6go38'
at io.crate.operation.collect.MapSideDataCollectOperation.handleShardCollect(MapSideDataCollectOperation.java:280)
at io.crate.operation.collect.MapSideDataCollectOperation.collect(MapSideDataCollectOperation.java:160)
at io.crate.executor.transport.TransportCollectNodeAction.nodeOperation(TransportCollectNodeAction.java:115)
at io.crate.executor.transport.TransportCollectNodeAction.access$700(TransportCollectNodeAction.java:52)
at io.crate.executor.transport.TransportCollectNodeAction$TransportHandler.messageReceived(TransportCollectNodeAction.java:233)
at io.crate.executor.transport.TransportCollectNodeAction$TransportHandler.messageReceived(TransportCollectNodeAction.java:223)
at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.index.IndexShardMissingException: [.partitioned.accelerometerdata.043j2d1g6go38][1] missing
at org.elasticsearch.index.service.InternalIndexService.shardInjectorSafe(InternalIndexService.java:292)
at io.crate.operation.collect.MapSideDataCollectOperation.handleShardCollect(MapSideDataCollectOperation.java:272)
... 9 more
[2014-07-31 15:53:08,556][WARN ][action.bulk ] rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@3338106c, retrying
[2014-07-31 15:53:08,708][WARN ][action.bulk ] rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@666e6884, retrying
[2014-07-31 15:53:08,800][WARN ][action.bulk ] rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@46d33186, retrying
[2014-07-31 15:53:08,945][WARN ][action.bulk ] rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@5c31caa4, retrying
[2014-07-31 15:53:09,071][WARN ][action.bulk ] rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@7a2ba51f, retrying
[2014-07-31 15:53:09,213][WARN ][action.bulk ] rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@6cf65fdf, retrying
[2014-07-31 15:53:09,306][WARN ][action.bulk ] rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@784406bb, retrying
[2014-07-31 15:55:42,632][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6cp30] already exists
[2014-07-31 15:55:43,717][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6cp36] already exists
[2014-07-31 15:55:44,109][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6cp38] already exists
[2014-07-31 15:55:44,376][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6cp3a] already exists
[2014-07-31 15:55:45,288][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6ko3c] already exists
[2014-07-31 15:55:51,179][DEBUG][action.bulk ] copy from index [.partitioned.accelerometerdata.043j2d1g6sp34] already exists