
batch size too large · filodb · 11 comments · CLOSED

filodb commented:

batch size too large

Comments (11)

velvia commented:

Hi Alexander,

What is the specific error that you see? I'm about to merge in a very big PR; you might want to try the velvia/multiple-keys-refactor branch (be sure to re-read the README first, as the data model has been enhanced). Among the changes are a new throttling mechanism on writes that should work much better, the ability to configure the read and connect network timeouts, and the ability to change the number of segments batch-written at one time.

-Evan

On Feb 15, 2016, at 8:49 AM, alexander-branevskiy wrote:

Hi guys! While working with your project, I ran into a problem: Cassandra throws an exception when it tries to handle a batch that is too large. How can I configure this (maybe decrease the batch size, or something else)? I haven't found any examples, and a lot of time spent in your source got me nowhere. I worked around it by changing the Cassandra config (batch_size_fail_threshold_in_kb), but that isn't a good solution for me. Any ideas?
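For reference, the server-side workaround mentioned in the quote above lives in cassandra.yaml; a minimal sketch, with illustrative values (on Cassandra 2.2 the fail threshold defaults to 50 KB):

    # cassandra.yaml -- server-side batch size limits (values illustrative)
    batch_size_warn_threshold_in_kb: 64     # log a warning for batches above this size
    batch_size_fail_threshold_in_kb: 256    # reject batches above this size (default: 50)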


alexander-branevskiy commented:

Hello velvia! Thank you for the fast response. If I understand correctly, you use phantom to work with Cassandra, and maybe the problem lies there. The error appears when I invoke .saveAsFiloDataset. Here is the full stack trace:

ERROR phantom: Batch too large
ERROR DatasetCoordinatorActor: Error in reprojection task (test1/0)
filodb.core.StorageEngineException: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at filodb.cassandra.Util$ResultSetToResponse$$anonfun$toResponse$1.applyOrElse(Util.scala:19)
at filodb.cassandra.Util$ResultSetToResponse$$anonfun$toResponse$1.applyOrElse(Util.scala:18)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:180)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186)
at com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:44)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:754)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:576)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1007)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:930)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)


velvia commented:

Right now we write all the chunks in a single segment at once, so most likely your segment size is quite big. Would you know how much data is in a segment? (Run filo-cli --command analyze --dataset <dataset> for me and dump the output.)

I'll add a config for the batch size and make it configurable.
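Spelled out, that invocation looks like the sketch below; test1 is the dataset name taken from the error log above, and the exact flags may vary by filo-cli version:

    # dump segment/chunk statistics for the dataset (flags as given above)
    filo-cli --command analyze --dataset test1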



alexander-branevskiy commented:

Hi, here is the output:

numSegments: 1
numPartitions: 1
===== # Rows in a segment =====
Min: 201
Max: 201
Average: 201.0 (1)
| 00000100: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
===== # Chunks in a segment =====
Min: 21
Max: 21
Average: 21.0 (1)
| 00000010: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
===== # Segments in a partition =====
Min: 1
Max: 1
Average: 1.0 (1)
| 00000000: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

I used your dataset GDELT-1979-1984-100000.csv (I just tried to write it to Cassandra). Note that this output may be wrong because of the exception.


velvia commented:

Huh, that’s really interesting. What version of Cassandra are you running? You must have a really small batch size configured. I’m running 2.1.6 locally with default settings and have never run into this.



alexander-branevskiy commented:

I use Cassandra 2.2.4 with the default max batch size (50 KB). I could increase it, but I don't know how that would affect performance.


velvia commented:

Ah, OK. 50 KB is really small for FiloDB, because we write big binary blobs, so it would definitely need to be increased. From what I read, the threshold is in KB.



alexander-branevskiy commented:

Thank you. And one last question: is it possible to build your project with Scala 2.11.7?


velvia commented:

Sure, if that would help. I'll do that before releasing the next version, or at least make it possible to build it easily yourself. Spark 1.x is still on Scala 2.10, so 2.10 has to remain an option for people.

What is your use case, out of curiosity?
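For anyone who wants to try before that lands, cross-building in sbt is a small change; a minimal sketch, assuming a standard build.sbt (FiloDB's actual build definition may differ):

    // build.sbt -- illustrative cross-build setup, not FiloDB's actual build
    scalaVersion := "2.10.6"                         // default build version
    crossScalaVersions := Seq("2.10.6", "2.11.7")    // versions used by +-prefixed tasks

    // usage: `sbt ++2.11.7 compile` builds against 2.11.7 only;
    // `sbt +package` packages for every version in crossScalaVersions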



velvia commented:

Added an option, columnstore.chunk-batch-size, to control the number of statements per unlogged batch. It's in a branch that will be merged to master soon, along with lots of other improvements.
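As a sketch, that override would go in your application config; the filodb prefix and the value here are assumptions, so check the branch's reference.conf for the authoritative path and default:

    # application.conf -- illustrative override; verify against reference.conf
    filodb.columnstore.chunk-batch-size = 8   # statements per unlogged batch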



velvia commented:

@alexander-branevskiy this should be resolved with PR #30 merged.

