
batch size too large · filodb · 11 comments · CLOSED

filodb commented:

batch size too large

Comments (11)

velvia commented:

Hi Alexander,

What is the specific error that you see? I'm about to merge in a very big PR; you might want to try the velvia/multiple-keys-refactor branch (be sure to re-read the README first, as the data model has been enhanced). Among the changes are a new throttling mechanism on writes that should work much better, the ability to configure the read and connect network timeouts, and the ability to change the number of segments batch-written at one time.

-Evan

On Feb 15, 2016, at 8:49 AM, alexander-branevskiy wrote:

Hi guys! While working with your project, I ran into a problem: Cassandra throws an exception when it tries to handle a batch that is too large. How can I configure this (maybe decrease the batch size, or something else)? I haven't found any examples, and a lot of time spent in your source got me nowhere. I worked around it by changing the Cassandra config (batch_size_fail_threshold_in_kb), but that isn't a good solution for me. Any ideas?
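For reference, the server-side workaround mentioned in the quote above lives in cassandra.yaml; a minimal sketch, with illustrative values (on Cassandra 2.2 the fail threshold defaults to 50 KB):

    # cassandra.yaml -- server-side batch size limits (values illustrative)
    batch_size_warn_threshold_in_kb: 64     # log a warning for batches above this size
    batch_size_fail_threshold_in_kb: 256    # reject batches above this size (default: 50)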


alexander-branevskiy commented:

Hello velvia! Thank you for the fast response. If I understand correctly, you use phantom to work with Cassandra, and maybe the problem lies there. The error appears when I invoke .saveAsFiloDataset. Here is the full stack trace:

ERROR phantom: Batch too large
ERROR DatasetCoordinatorActor: Error in reprojection task (test1/0)
filodb.core.StorageEngineException: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at filodb.cassandra.Util$ResultSetToResponse$$anonfun$toResponse$1.applyOrElse(Util.scala:19)
at filodb.cassandra.Util$ResultSetToResponse$$anonfun$toResponse$1.applyOrElse(Util.scala:18)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:180)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186)
at com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:44)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:754)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:576)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1007)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:930)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)


velvia commented:

Right now we write all the chunks in a single segment at once, so most likely your segment size is quite big. Would you know how much data is in a segment? (Run filo-cli --command analyze --dataset <dataset> for me and dump the output.)

I'll add a config for the batch size and make it configurable.
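Spelled out, that invocation looks like the sketch below; test1 is the dataset name taken from the error log above, and the exact flags may vary by filo-cli version:

    # dump segment/chunk statistics for the dataset (flags as given above)
    filo-cli --command analyze --dataset test1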



alexander-branevskiy commented:

Hi, here is the output:

numSegments: 1
numPartitions: 1
===== # Rows in a segment =====
Min: 201
Max: 201
Average: 201.0 (1)
| 00000100: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
===== # Chunks in a segment =====
Min: 21
Max: 21
Average: 21.0 (1)
| 00000010: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
===== # Segments in a partition =====
Min: 1
Max: 1
Average: 1.0 (1)
| 00000000: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

I used your dataset GDELT-1979-1984-100000.csv (I just tried to write it to Cassandra). Note that this output may be wrong because of the exception.


velvia commented:

Huh, that’s really interesting. What version of Cassandra are you running? You must have a really small batch size configured. I’m running 2.1.6 locally with default settings and have never run into this.



alexander-branevskiy commented:

I use Cassandra 2.2.4 with the default max batch size (50 KB). I could increase it, but I don't know how that would affect performance.


velvia commented:

Ah, OK. 50 KB is really small for FiloDB, because we write big binary blobs, so it would definitely need to be increased. From what I read, the threshold is in KB.



alexander-branevskiy commented:

Thank you. And one last question: is it possible to build your project with Scala 2.11.7?


velvia commented:

Sure, if that would help. I'll do that before releasing the next version, or at least make it possible to build it easily yourself. Spark 1.x is still on Scala 2.10, so 2.10 has to remain an option for people.

What is your use case, out of curiosity?
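For anyone who wants to try before that lands, cross-building in sbt is a small change; a minimal sketch, assuming a standard build.sbt (FiloDB's actual build definition may differ):

    // build.sbt -- illustrative cross-build setup, not FiloDB's actual build
    scalaVersion := "2.10.6"                         // default build version
    crossScalaVersions := Seq("2.10.6", "2.11.7")    // versions used by +-prefixed tasks

    // usage: `sbt ++2.11.7 compile` builds against 2.11.7 only;
    // `sbt +package` packages for every version in crossScalaVersions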



velvia commented:

Added an option, columnstore.chunk-batch-size, to control the number of statements per unlogged batch. It's in a branch that will be merged to master soon, along with lots of other improvements.
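As a sketch, that override would go in your application config; the filodb prefix and the value here are assumptions, so check the branch's reference.conf for the authoritative path and default:

    # application.conf -- illustrative override; verify against reference.conf
    filodb.columnstore.chunk-batch-size = 8   # statements per unlogged batch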



velvia commented:

@alexander-branevskiy this should be resolved with PR #30 merged.

