Comments (11)
Hi Alexander,
What is the specific error that you see? I'm about to merge in a very big PR; you might want to try the velvia/multiple-keys-refactor branch (be sure to re-read the README first, as the data model has been enhanced). Among the changes are a new throttling mechanism on writes that should work much better, the ability to configure the read and connect network timeouts, and the ability to change the number of segments batch-written at one time.
-Evan
On Feb 15, 2016, at 8:49 AM, alexander-branevskiy [email protected] wrote:
Hi guys! Working with your project, I ran into a problem: Cassandra throws an exception when trying to handle a batch that is too large. How can I configure this (maybe decrease the batch size or something else)? I haven't found any examples, and spending a lot of time with your source has had no effect. I worked around it by changing the Cassandra config (param: batch_size_fail_threshold_in_kb), but that's not a good solution for me. Any ideas?
Hello Velvia! Thank you for the fast response. If I understand right, you use phantom to work with Cassandra, and maybe the problem is with it. The error appears when I invoke .saveAsFiloDataset. Here is the full stack trace:
ERROR phantom: Batch too large
ERROR DatasetCoordinatorActor: Error in reprojection task (test1/0)
filodb.core.StorageEngineException: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at filodb.cassandra.Util$ResultSetToResponse$$anonfun$toResponse$1.applyOrElse(Util.scala:19)
at filodb.cassandra.Util$ResultSetToResponse$$anonfun$toResponse$1.applyOrElse(Util.scala:18)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large
at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:180)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:186)
at com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:44)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:754)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:576)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1007)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:930)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
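For reference, a minimal sketch of the Spark write path being invoked here, assuming the filodb.spark API of this era; the CSV reader options and the dataset name are illustrative placeholders, not the reporter's exact code:

```scala
// Minimal sketch, assuming the filodb.spark implicits of this era.
// The CSV options and the dataset name are placeholders for illustration.
import filodb.spark._

val csvDF = sqlContext.read
  .format("com.databricks.spark.csv")   // spark-csv package for Spark 1.x
  .option("header", "true")
  .option("inferSchema", "true")
  .load("GDELT-1979-1984-100000.csv")

// Flushes segments to Cassandra; each segment's chunks go out in one
// unlogged batch, which is where "Batch too large" surfaces.
sqlContext.saveAsFiloDataset(csvDF, "gdelt")
```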
Right now, we write all the chunks in a single segment at once, so most likely your segment size is quite big. Would you know how much data is in a segment? (Run filo-cli --command analyze --dataset for me and dump the output.)
I’ll add a config for the batch size and make it configurable.
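Concretely, that invocation looks like the following; the dataset name is a placeholder:

```bash
filo-cli --command analyze --dataset gdelt
```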
Hi, here is the output:
numSegments: 1
numPartitions: 1
===== # Rows in a segment =====
Min: 201
Max: 201
Average: 201.0 (1)
| 00000100: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
===== # Chunks in a segment =====
Min: 21
Max: 21
Average: 21.0 (1)
| 00000010: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
===== # Segments in a partition =====
Min: 1
Max: 1
Average: 1.0 (1)
| 00000000: 00000001 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
I used your dataset GDELT-1979-1984-100000.csv (I just tried to write it into Cassandra). Actually, this output could be wrong (because of the exception).
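(A rough reading of those numbers, with assumed chunk sizes: the single segment holds 21 chunks, and all of a segment's chunks are written in one unlogged batch, so if each binary chunk blob averages even a few KB, the batch can easily exceed Cassandra's batch_size_fail_threshold_in_kb of 50 KB.)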
Huh, that’s really interesting. What version of Cassandra are you running? You must have a really small batch size configured. I’m running 2.1.6 locally with default settings and have never run into this.
I use Cassandra 2.2.4 with the default max batch size (50 KB). Actually, I can increase it, but I don't know what will happen to performance.
Ah, OK. 50 KB is really small for FiloDB, because we write big binary blobs, so it would definitely need to be increased.
From what I read, the size is in KB.
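For anyone hitting the same wall, these are the relevant cassandra.yaml thresholds; the defaults shown match Cassandra 2.2.x, and the raised value is only an illustrative choice:

```yaml
# cassandra.yaml -- batch size thresholds, both in KB
batch_size_warn_threshold_in_kb: 5    # default: log a warning above this size
batch_size_fail_threshold_in_kb: 50   # default: reject the batch above this size;
                                      # raising it (e.g. to 500) accepts FiloDB's
                                      # larger binary-blob batches
```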
Thank you. And the last question: is it possible to build your project with Scala 2.11.7?
Sure, if that would help. I'll do that before releasing the next version, or at least make it possible to build it easily yourself. Spark 1.x is still on Scala 2.10, so 2.10 has to remain an option for people.
What is your use case, out of curiosity?
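A minimal build.sbt sketch of what that cross-build looks like; the exact version numbers are illustrative:

```scala
// build.sbt -- cross-building for Scala 2.10 (Spark 1.x) and 2.11
scalaVersion := "2.10.6"                       // default build
crossScalaVersions := Seq("2.10.6", "2.11.7")  // `sbt +compile` / `sbt +test` build both
```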
Added an option, columnstore.chunk-batch-size, to control the number of statements per unlogged batch. This is in a branch and will be merged to master soon, along with lots of improvements.
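As a sketch, an override for that option in a Typesafe Config file might look like this; only the columnstore.chunk-batch-size key comes from the comment above, while the enclosing filodb namespace and the value are assumptions:

```hocon
# application.conf -- hypothetical override; only chunk-batch-size is
# confirmed above, the enclosing namespace and the value are assumed
filodb {
  columnstore {
    chunk-batch-size = 16   # statements per unlogged batch
  }
}
```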
@alexander-branevskiy this should be resolved with PR #30 merged.