
rax-maas / blueflood


A distributed system designed to ingest and process time series data

Home Page: http://www.blueflood.io

License: Apache License 2.0

Java 96.44% Shell 1.40% Python 0.74% Ruby 0.17% JavaScript 0.61% HTML 0.01% Dockerfile 0.02% PowerShell 0.61%

blueflood's Introduction

Blueflood


Discuss - Code - Site

Introduction

Blueflood is a multi-tenant, distributed metric processing system capable of ingesting, rolling up, and serving metrics at massive scale.

Getting Started

The latest code will always be here on GitHub.

git clone https://github.com/rax-maas/blueflood.git
cd blueflood

Building

Blueflood builds and runs on Java 8. Ensure you're using an appropriate JDK before proceeding.

Blueflood builds with Maven. Use typical Maven lifecycle phases:

  • mvn clean removes build artifacts.
  • mvn test runs unit tests.
  • mvn verify runs all tests.
  • mvn package builds a Blueflood package for release.

Important build profiles to know about:

  • skip-unit-tests skips unit tests in all modules.
  • skip-integration-tests skips the integration tests.

Blueflood's main artifact is an 'uber jar', produced by the blueflood-all module.

After compiling, you can also build a Docker image with mvn docker:build. See blueflood-docker for the Docker-related files.

Running

You can easily build a ready-to-run Blueflood jar from source:

mvn package -P skip-unit-tests,skip-integration-tests

However, Blueflood requires Cassandra in order to start, and Elasticsearch for all of its features to work. The best place to start is the 10 minute guide.
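
Once built, a minimal startup might look like the following sketch (the paths are placeholders, not canonical locations; compare the full command quoted in the Zookeeper issue further down):

java -Dblueflood.config=file:///path/to/blueflood.properties \
     -Dlog4j.configuration=file:///path/to/log4j.properties \
     -cp blueflood-all/target/blueflood-all-*-jar-with-dependencies.jar \
     com.rackspacecloud.blueflood.service.BluefloodServiceStarter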

Additional Tools

The Blueflood team maintains a number of tools that are related to the project but are not essential components of it. These tools live in various other repos.

Contributing

First, we welcome bug reports and contributions. If you would like to contribute code, just fork this project and send us a pull request. If you would like to contribute documentation, you should get familiar with our wiki.

Also, we have set up a Google Group to answer questions.

License

Copyright 2013-2017 Rackspace

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

blueflood's People

Contributors

actions-user, chandraaddala, chinmay-gupte, danielni, darxriggs, dlobue, dogild, dreid, fourk, gdusbabek, georgejahad, goru97, itzg, iwebi, izrik, kaustavha, lakshmi-kannan, rampage644, ratanasv, richarxt, robert-chiniquy, rohitsngh27, shintasmith, slevinbe, stackedsax, terryllowery, tilogaat, usnavi, vinnyq, zzantozz


blueflood's Issues

Using blueflood in private cloud/on premise

Blueflood is very interesting. I am currently using Cyanite. I am interested to know whether Blueflood only works at Rackspace, or whether I can use it in an on-premise/private cloud environment?

Thanks in advance

Paginate results from /metrics/search

I develop Rackspace Intelligence, which uses Blueflood extensively via Rackspace Monitoring and Cloud Metrics. I've been debugging a performance problem where we need to find a device that generates metrics so we can display it on a particular graph page, not knowing which device it will be ahead of time.

Currently we call an API in Rackspace Monitoring that delegates to Metrics (which is Blueflood). The Blueflood API does not support any limit on the size of the response beyond the hard-coded limit of 100,000 in the code. I believe this causes response time to grow without a reasonable bound on large accounts. On the customer's account that alerted us to the issue, I saw response times from Blueflood of up to 20 seconds.

We would like to speed up this operation by calling Rackspace Cloud Metrics directly, but we need an upper bound on the latency of this operation for it to make sense. It would be nice to expose the from and size parameters to Elasticsearch as queryable parameters to the /metrics/search API.
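
For example, a paginated search might then look like this (hypothetical; from and size are precisely the parameters this issue asks for and do not exist today):

curl "http://localhost:20000/v2.0/tenant-id/metrics/search?query=*&from=0&size=100"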

Startup time is too slow when zookeeper is enabled.

I had 3 blueflood rollup servers deployed, shard space initially split into 3 regions. I decided to use zookeeper clustering.

I realized that Blueflood startup time became very slow as soon as I enabled Zookeeper clustering. (My deployment environment has a health check of the form "if the given ports don't open within N seconds, consider the deploy failed".)

I enabled DEBUG logging, and I realized the bottleneck was in ZKBasedShardLockManager.prefetchLocksAndScheduleLocksScavenging, specifically in this loop:

for (int shard : shards) {
    worker.submit(lock.acquirer()).get();
    if (lock.isHeld() && ++locksObtained >= maxLocksToPrefetch) {
        break;
    }
}

Each iteration of the loop takes ~1 sec for me, so the entire startup takes about a minute. Relevant log entries:

2014-10-22 22:18:01 DEBUG asedShardLockManager:212 - Initial lock attempt for 30
2014-10-22 22:18:01 DEBUG asedShardLockManager:563 - Trying ZK lock for 30
2014-10-22 22:18:01 DEBUG asedShardLockManager:576 - Acquired ZK lock for 30
...
2014-10-22 22:19:11 DEBUG asedShardLockManager:212 - Initial lock attempt for 38
2014-10-22 22:19:11 DEBUG asedShardLockManager:563 - Trying ZK lock for 38
2014-10-22 22:19:12 DEBUG asedShardLockManager:581 - Acquire ZK failed for 38

I propose two possible solutions for this:

  1. Make this loop parallel; a rough sketch follows below. (One blocker seems to be the ThreadPoolExecutor worker having a max pool size of 1, effectively serializing the background worker pool. Any history behind this?)
  2. Allow the app to start without this task finishing. (I don't know why this logic is a prerequisite for starting the app - any background on this?)

Any thoughts?
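
To make option 1 concrete, a rough sketch of the parallel version (assumptions: a per-shard lock lookup such as locks.get(shard) exists, the worker pool's max size is raised above 1, and the maxLocksToPrefetch early exit is omitted for brevity):

List<Future<?>> attempts = new ArrayList<Future<?>>();
for (int shard : shards) {
    // fan out all acquire attempts instead of blocking on each one in turn
    attempts.add(worker.submit(locks.get(shard).acquirer()));
}
for (Future<?> attempt : attempts) {
    attempt.get(); // total wait is now roughly the slowest single attempt, not the sum
}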

Build error on Ubuntu 14.04

Hi,

I followed the 10 minute guide to build Blueflood from source, but I got the following errors.

[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single (make-assembly) on project blueflood-all: Execution make-assembly of goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single failed: A required class was missing while executing org.apache.maven.plugins:maven-assembly-plugin:2.4:single: org/apache/commons/lang/StringUtils
[ERROR] -----------------------------------------------------
[ERROR] realm = plugin>org.apache.maven.plugins:maven-assembly-plugin:2.4
[ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
[ERROR] urls[0] = file:/home/ubuntu/.m2/repository/org/apache/maven/plugins/maven-assembly-plugin/2.4/maven-assembly-plugin-2.4.jar
[ERROR] urls[1] = file:/home/ubuntu/.m2/repository/org/slf4j/slf4j-jdk14/1.5.6/slf4j-jdk14-1.5.6.jar
[ERROR] urls[2] = file:/home/ubuntu/.m2/repository/org/slf4j/slf4j-api/1.5.6/slf4j-api-1.5.6.jar
[ERROR] urls[3] = file:/home/ubuntu/.m2/repository/org/slf4j/jcl-over-slf4j/1.5.6/jcl-over-slf4j-1.5.6.jar
[ERROR] urls[4] = file:/home/ubuntu/.m2/repository/org/apache/maven/reporting/maven-reporting-api/2.2.1/maven-reporting-api-2.2.1.jar
[ERROR] urls[5] = file:/home/ubuntu/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.1/doxia-sink-api-1.1.jar
[ERROR] urls[6] = file:/home/ubuntu/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.1/doxia-logging-api-1.1.jar
[ERROR] urls[7] = file:/home/ubuntu/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar
[ERROR] urls[8] = file:/home/ubuntu/.m2/repository/org/codehaus/plexus/plexus-interactivity-api/1.0-alpha-4/plexus-interactivity-api-1.0-alpha-4.jar
[ERROR] urls[9] = file:/home/ubuntu/.m2/repository/backport-util-concurrent/backport-util-concurrent/3.1/backport-util-concurrent-3.1.jar
[ERROR] urls[10] = file:/home/ubuntu/.m2/repository/org/sonatype/plexus/plexus-sec-dispatcher/1.3/plexus-sec-dispatcher-1.3.jar
[ERROR] urls[11] = file:/home/ubuntu/.m2/repository/org/sonatype/plexus/plexus-cipher/1.4/plexus-cipher-1.4.jar
[ERROR] urls[12] = file:/home/ubuntu/.m2/repository/org/apache/maven/shared/maven-common-artifact-filters/1.4/maven-common-artifact-filters-1.4.jar
[ERROR] urls[13] = file:/home/ubuntu/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.15/plexus-interpolation-1.15.jar
[ERROR] urls[14] = file:/home/ubuntu/.m2/repository/org/codehaus/plexus/plexus-archiver/2.2/plexus-archiver-2.2.jar
[ERROR] urls[15] = file:/home/ubuntu/.m2/repository/org/apache/maven/shared/file-management/1.1/file-management-1.1.jar
[ERROR] urls[16] = file:/home/ubuntu/.m2/repository/org/apache/maven/shared/maven-shared-io/1.1/maven-shared-io-1.1.jar
[ERROR] urls[17] = file:/home/ubuntu/.m2/repository/org/apache/maven/shared/maven-filtering/1.1/maven-filtering-1.1.jar
[ERROR] urls[18] = file:/home/ubuntu/.m2/repository/org/sonatype/plexus/plexus-build-api/0.0.4/plexus-build-api-0.0.4.jar
[ERROR] urls[19] = file:/home/ubuntu/.m2/repository/org/codehaus/plexus/plexus-io/2.0.6/plexus-io-2.0.6.jar
[ERROR] urls[20] = file:/home/ubuntu/.m2/repository/org/apache/maven/maven-archiver/2.5/maven-archiver-2.5.jar
[ERROR] urls[21] = file:/home/ubuntu/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar
[ERROR] urls[22] = file:/home/ubuntu/.m2/repository/org/codehaus/plexus/plexus-utils/3.0.8/plexus-utils-3.0.8.jar
[ERROR] urls[23] = file:/home/ubuntu/.m2/repository/org/apache/maven/shared/maven-repository-builder/1.0-alpha-2/maven-repository-builder-1.0-alpha-2.jar
[ERROR] Number of foreign imports: 1
[ERROR] import: Entry[import from realm ClassRealm[maven.api, parent: null]]
[ERROR]
[ERROR] -----------------------------------------------------: org.apache.commons.lang.StringUtils
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginContainerException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :blueflood-all

Regards,
Erdinc

Perform batch writes for rollups

Currently, rollups are written to disk one at a time. We should have a way to write rollups in batches, which would reduce the number of Cassandra operations (a rough sketch follows the list below).

Things to keep in mind

  1. Figure out the optimum batch size in terms of performance

  2. Make sure that the slot state is marked as Rolled only after rollups are persisted on disk.
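
For illustration, batching could look roughly like this with Astyanax's MutationBatch (a hypothetical sketch, not the project's actual writer code; RollupWrite, pendingRollups, CF_METRICS_5M, and ttlSeconds are made-up names):

MutationBatch batch = keyspace.prepareMutationBatch();
for (RollupWrite w : pendingRollups) {               // hypothetical buffer of rollups awaiting a flush
    batch.withRow(CF_METRICS_5M, w.getLocatorKey())  // destination column family and row key
         .putColumn(w.getTimestamp(), w.getSerializedRollup(), ttlSeconds);
}
batch.execute();                                     // one Cassandra round trip instead of one per rollup
// Per point 2 above: mark the slot state Rolled only after execute() returns successfully.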

We're using netty 3 but depending on netty 4.

astyanax-cassandra requires netty 3.

When our web services were initially created, the netty 3 classes were inadvertently used.

We need to fix this, but it's going to require some refactoring, as the HttpRequest/Response API changed a good bit between 3 and 4.

Blob format

I'm having trouble reading data directly from Cassandra (from Python).

cqlsh:DATA> DESCRIBE KEYSPACE DATA;
....
CREATE TABLE metrics_5m (
key text,
column1 bigint,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
....

I see the 'value' is of type 'blob'.

#!/usr/bin/env python

import pycassa
import struct

pool = pycassa.ConnectionPool('DATA', server_list=['c1', 'c2'])
col_fam = pycassa.ColumnFamily(pool, 'metrics_5m')
res = col_fam.get('7023,70fa66a.diskuse.__srv')

for ts, v in res.items():
    # unpack one signed byte per 'b', sized to the value instead of hard-coding 42 b's
    print struct.unpack('%db' % len(v), v)

.... snip ....
(0, 5, 0, 110, 0, 0, 0, 0, 0, 0, 8, 64, 1, 110, 0, 0, 0, 0, 0, 0, 0, 0, 2, 110, 0, 0, 0, 0, 0, 0, 8, 64, 3, 110, 0, 0, 0, 0, 0, 0, 8, 64)
(0, 5, 0, 110, 0, 0, 0, 0, 0, 0, 8, 64, 1, 110, 0, 0, 0, 0, 0, 0, 0, 0, 2, 110, 0, 0, 0, 0, 0, 0, 8, 64, 3, 110, 0, 0, 0, 0, 0, 0, 8, 64)
.... snip ....

I'm sure that I am missing something simple, but what is the recommended way to read this data directly from Cassandra?
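
One hint from the bytes above: 110 is ASCII 'n', and each run of eight bytes following an (index, 110) pair decodes as a little-endian double. A quick check (the overall field layout is a guess, but the arithmetic holds):

import struct
# the eight bytes 0,0,0,0,0,0,8,64 from the output above, as a little-endian double
print struct.unpack('<d', '\x00\x00\x00\x00\x00\x00\x08\x40')[0]  # prints 3.0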

Zookeeper error when running Blueflood

I am trying to run blueflood-all.jar with the following configuration:

log4j.properties:

log4j.appender.console.additionalFields={'environment': 'dev', 'application': 'bf', 'instance_id': '0'}
log4j.appender.console.extractStacktrace=true
log4j.appender.console.addExtendedInformation=true
log4j.appender.console.facilityIsLogger=true
log4j.appender.console.layout=org.apache.log4j.PatternLayout

log4j.logger.httpclient.wire.header=WARN
log4j.logger.httpclient.wire.content=WARN

log4j.category.org.apache.zookeeper.ClientCnxn=WARN
log4j.category.org.apache.zookeeper.client.ZooKeeperSaslClient=ERROR

log4j.logger.org.apache.http.client.protocol=INFO
log4j.logger.org.apache.http.wire=INFO
log4j.logger.org.apache.http.impl=INFO
log4j.logger.org.apache.http.headers=INFO

log4j.rootLogger=INFO, console
Startup command:

java -cp /home/ubuntu/blueflood/blueflood-all/target/blueflood-all-2.0.0-SNAPSHOT-jar-with-dependencies.jar \
-Dblueflood.config=file:///home/ubuntu/blueflood/blueflood-core/src/main/resources/configDefaults/blueflood.properties \
-Dlog4j.configuration=file:///home/ubuntu/blueflood/blueflood-core/log4j.properties \
com.rackspacecloud.blueflood.service.BluefloodServiceStarter

But I am getting these errors:

log4j:ERROR Could not find value for key log4j.appender.console
log4j:ERROR Could not instantiate appender named "console".
log4j:ERROR Could not find value for key log4j.appender.console
log4j:ERROR Could not instantiate appender named "console".
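
The first two log4j errors point at the cause: the properties above configure attributes of an appender named console but never declare the appender class itself. A minimal fix would be a line like the following (ConsoleAppender is only a guess; the additionalFields/extractStacktrace keys suggest a GELF appender was actually intended):

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c:%L - %m%n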

bf-rollups-delay.py support new version of Cassandra with CQL 1.4.0

status error An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was TTransportException: Could not connect to 192.168.1.1:9160
Traceback (most recent call last):
File "blueflood-rollup-delay.py", line 139, in
main(sys.argv[1])
File "blueflood-rollup-delay.py", line 136, in main
raise ex
pycassa.pool.AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was TTransportException: Could not connect to 192.168.1.1:9160

refactor rollup read/write logic

It's gotten pretty complicated with the latest sets of changes (pre-aggregated metrics support, rollup write batching).

11:21 <@gdusbabek> jburkhart, lakspace: after this we may want to take a few steps back and examine
 how rollups are read and written.  I think we could improve our use of threadpools and make the code cleaner.
11:22 <@gdusbabek> it's getting a little difficult to grok.
...
11:23 <@lakspace> We have to revisit everything from LocatorFetchRunnable 
(that's where the confusion starts, I think)

While we're at it, it might make sense to find a way to do batch reads of metrics when doing rollups. We could see performance gains there.

One potential optimization could be to redo the way we shard locators by using org.apache.cassandra.dht.RandomPartitioner, that way we end up with better locality for all reads for a given shard.

That probably only improves things if the number of Cassandra servers divides evenly into the number of shards, but that seems like it would describe the vast majority of deployments.

Refactor ShardStateWorker

The main reason would be to simplify the boilerplate required to push/pull shard state.

One way to do this would be to refactor ShardStateWorker by adding start/stop methods, then creating another class (ShardStateServices?) that holds a reference to each of a push worker and a pull worker so both can be managed together.

Then the boilerplate becomes simply:

new ShardStateServices().start();
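
A rough sketch of the proposed shape (hypothetical; assumes the new start/stop methods exist on ShardStateWorker, and a no-arg constructor could wire up the default push/pull workers internally):

public class ShardStateServices {
    private final ShardStateWorker pusher;
    private final ShardStateWorker puller;

    public ShardStateServices(ShardStateWorker pusher, ShardStateWorker puller) {
        this.pusher = pusher;
        this.puller = puller;
    }

    // manage the push and pull workers as a single unit
    public void start() { pusher.start(); puller.start(); }
    public void stop()  { pusher.stop();  puller.stop();  }
}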

Loading the schema into cassandra fails

I just cloned the blueflood git repo and built the project.
Then I tried to import the schema into cassandra (2.0.16) (also tried on 2.2.3)

Here's the command that I used

cqlsh -f /blueflood/src/cassandra/cli/load.script

But it failed

/blueflood/src/cassandra/cli/load.script:4:Bad Request: Failed parsing statement: [CREATE KEYSPACE DATA
WITH placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
AND strategy_options = {replication_factor:1};] reason: NullPointerException null
/blueflood/src/cassandra/cli/load.script:5:Bad Request: Keyspace 'data' does not exist
/blueflood/src/cassandra/cli/load.script:7:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:8:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:9:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:10:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:11:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:13:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:14:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:15:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:16:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:17:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:18:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:20:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:21:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:22:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:23:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:24:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:25:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:27:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:28:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:29:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:30:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
/blueflood/src/cassandra/cli/load.script:31:Bad Request: line 1:7 no viable alternative at input 'COLUMN'
Aborted (core dumped)
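
An observation: the repeated no viable alternative at input 'COLUMN' errors suggest load.script is written in the legacy cassandra-cli syntax (CREATE COLUMN FAMILY ...), which cqlsh cannot parse. If so, the intended invocation is presumably the older CLI tool:

cassandra-cli -h localhost -f /blueflood/src/cassandra/cli/load.script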

Request: Graphite backend

I see there's a statsd backend, which is awesome. Requesting a graphite backend so that I can set my carbon-relay to stream metrics to blueflood in the same way we can stream to blueflood-statsd.

Would really allow us (and I'm sure many others) to hook in to blueflood much easier.

We have so many metrics going to graphite (not via statsd) right now that switching to blueflood is going to be super difficult and time consuming. A graphite backend would make this process much easier to get started with, allowing us to slowly convert everything else in the future while still getting the immediate benefits.

Error in running tests on Ubuntu

I get an error under both Java 7 and Java 8 when running mvn test in Ubuntu 15.04:

com.rackspacecloud.blueflood.service.BluefloodServiceStarterTest  Time elapsed: 0.794 sec  <<< ERROR!
java.lang.IllegalStateException: Failed to transform class with name com.rackspacecloud.blueflood.service.BluefloodServiceStarter. Reason: java.io.IOException: invalid constant type: 18
    at javassist.bytecode.ConstPool.readOne(ConstPool.java:1090)
    at javassist.bytecode.ConstPool.read(ConstPool.java:1033)
    at javassist.bytecode.ConstPool.<init>(ConstPool.java:149)
    at javassist.bytecode.ClassFile.read(ClassFile.java:764)
    at javassist.bytecode.ClassFile.<init>(ClassFile.java:108)
    at javassist.CtClassType.getClassFile2(CtClassType.java:190)
    at javassist.CtClassType.subtypeOf(CtClassType.java:303)
    at javassist.CtClassType.subtypeOf(CtClassType.java:318)
    at javassist.compiler.MemberResolver.compareSignature(MemberResolver.java:247)
    at javassist.compiler.MemberResolver.lookupMethod(MemberResolver.java:119)
    at javassist.compiler.MemberResolver.lookupMethod(MemberResolver.java:96)
    at javassist.compiler.TypeChecker.atMethodCallCore(TypeChecker.java:704)
    at javassist.expr.NewExpr$ProceedForNew.setReturnType(NewExpr.java:243)
    at javassist.compiler.JvstTypeChecker.atCallExpr(JvstTypeChecker.java:146)
    at javassist.compiler.ast.CallExpr.accept(CallExpr.java:45)
    at javassist.compiler.TypeChecker.atVariableAssign(TypeChecker.java:248)
    at javassist.compiler.TypeChecker.atAssignExpr(TypeChecker.java:217)
    at javassist.compiler.ast.AssignExpr.accept(AssignExpr.java:38)
    at javassist.compiler.CodeGen.doTypeCheck(CodeGen.java:241)
    at javassist.compiler.CodeGen.atStmnt(CodeGen.java:329)
    at javassist.compiler.ast.Stmnt.accept(Stmnt.java:49)
    at javassist.compiler.CodeGen.atStmnt(CodeGen.java:350)
    at javassist.compiler.ast.Stmnt.accept(Stmnt.java:49)
    at javassist.compiler.CodeGen.atIfStmnt(CodeGen.java:404)
    at javassist.compiler.CodeGen.atStmnt(CodeGen.java:354)
    at javassist.compiler.ast.Stmnt.accept(Stmnt.java:49)
    at javassist.compiler.Javac.compileStmnt(Javac.java:568)
    at javassist.expr.NewExpr.replace(NewExpr.java:206)
    at org.powermock.core.transformers.impl.MainMockTransformer$PowerMockExpressionEditor.edit(MainMockTransformer.java:428)
    at javassist.expr.ExprEditor.loopBody(ExprEditor.java:211)
    at javassist.expr.ExprEditor.doit(ExprEditor.java:90)
    at javassist.CtClassType.instrument(CtClassType.java:1384)
    at org.powermock.core.transformers.impl.MainMockTransformer.transform(MainMockTransformer.java:75)
    at org.powermock.core.classloader.MockClassLoader.loadMockClass(MockClassLoader.java:203)
    at org.powermock.core.classloader.MockClassLoader.loadModifiedClass(MockClassLoader.java:145)
    at org.powermock.core.classloader.DeferSupportingClassLoader.loadClass(DeferSupportingClassLoader.java:65)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at sun.reflect.generics.factory.CoreReflectionFactory.makeNamedType(CoreReflectionFactory.java:114)
    at sun.reflect.generics.visitor.Reifier.visitClassTypeSignature(Reifier.java:125)
    at sun.reflect.generics.tree.ClassTypeSignature.accept(ClassTypeSignature.java:49)
    at sun.reflect.annotation.AnnotationParser.parseSig(AnnotationParser.java:439)
    at sun.reflect.annotation.AnnotationParser.parseClassValue(AnnotationParser.java:420)
    at sun.reflect.annotation.AnnotationParser.parseClassArray(AnnotationParser.java:724)
    at sun.reflect.annotation.AnnotationParser.parseArray(AnnotationParser.java:531)
    at sun.reflect.annotation.AnnotationParser.parseMemberValue(AnnotationParser.java:355)
    at sun.reflect.annotation.AnnotationParser.parseAnnotation2(AnnotationParser.java:286)
    at sun.reflect.annotation.AnnotationParser.parseAnnotations2(AnnotationParser.java:120)
    at sun.reflect.annotation.AnnotationParser.parseAnnotations(AnnotationParser.java:72)
    at java.lang.Class.createAnnotationData(Class.java:3521)
    at java.lang.Class.annotationData(Class.java:3510)
    at java.lang.Class.getAnnotations(Class.java:3446)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.classAnnotations(PowerMockJUnit44RunnerDelegateImpl.java:163)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.getDescription(PowerMockJUnit44RunnerDelegateImpl.java:155)
    at org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl.run(PowerMockJUnit44RunnerDelegateImpl.java:118)
    at org.powermock.modules.junit4.common.internal.impl.JUnit4TestSuiteChunkerImpl.run(JUnit4TestSuiteChunkerImpl.java:102)
    at org.powermock.modules.junit4.common.internal.impl.AbstractCommonPowerMockRunner.run(AbstractCommonPowerMockRunner.java:53)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)

My Java versions:

java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

and

java version "1.8.0_72"
Java(TM) SE Runtime Environment (build 1.8.0_72-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.72-b15, mixed mode)

Have a way of tracking intervals of raw data

This would replace GET_BY_POINTS_ASSUME_INTERVAL and make GetByPoints queries return a number of points closer to the requested amount in cases where data is sent at an interval that does not match GET_BY_POINTS_ASSUME_INTERVAL.

Compatibility with Java 8

Are there any plans to make Blueflood ready for Java 8? The main reason for that decision could be that Java 7 has not been supported since April 2015.
Currently it is impossible to build Blueflood with JDK 8 because of a failing test in the blueflood-core module:

Running com.rackspacecloud.blueflood.io.serializers.IMetricSerializerTest
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.79 sec <<< FAILURE!
testSetSerialization(com.rackspacecloud.blueflood.io.serializers.IMetricSerializerTest)  Time elapsed: 1.24 sec  <<< FAILURE!
junit.framework.ComparisonFailure: expected:<..."count":9,"hashes":[[746007989,1875251108,98262,103159993,1727114331,-1034140067,98699,1062516268,99644]]}> but was:<..."count":9,"hashes":[[-1034140067,746007989,1875251108,98262,1062516268,1727114331,98699,99644,103159993]]}>
        at junit.framework.Assert.assertEquals(Assert.java:85)
        at junit.framework.Assert.assertEquals(Assert.java:91)
        at com.rackspacecloud.blueflood.io.serializers.IMetricSerializerTest.testSetSerialization(IMetricSerializerTest.java:76)
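
The failure above shows the same nine hashes in a different order, which is consistent with hash-collection iteration order changing between Java 7 and Java 8; any test that asserts the exact serialized form of a HashSet-backed value will trip on this. A minimal illustration (not Blueflood code):

import java.util.Arrays;
import java.util.HashSet;

public class HashOrderDemo {
    public static void main(String[] args) {
        // HashSet iteration order is unspecified and can differ across JDK versions.
        System.out.println(new HashSet<Integer>(Arrays.asList(746007989, 1875251108, 98262, 103159993)));
    }
}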

Also, Blueflood built with JDK 7 throws an exception during startup under a Java 8 JVM:

2015-08-14 14:56:39 INFO  efloodServiceStarter:302 - Starting blueflood services
2015-08-14 14:56:39 INFO  tionPoolMBeanManager:45  - Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=MyConnectionPool,ServiceType=connectionpool
2015-08-14 14:56:39 INFO  onnectionPoolMonitor:239 - AddHost: 127.0.0.1
2015-08-14 14:56:40 INFO  efloodServiceStarter:76  - Shard push and pull services started
2015-08-14 14:56:40 INFO  efloodServiceStarter:98  - Loading ingestion service module com.rackspacecloud.blueflood.service.HttpIngestionService
2015-08-14 14:56:40 INFO  efloodServiceStarter:106 - Starting ingestion service module com.rackspacecloud.blueflood.service.HttpIngestionService with writer: AstyanaxMetricsWriter
2015-08-14 14:56:40 ERROR Instrumentation     :60  - Unable to register mbean for Instrumentation
javax.management.NotCompliantMBeanException: Interface is not public: com.rackspacecloud.blueflood.io.InstrumentationMBean
        at com.sun.jmx.mbeanserver.MBeanAnalyzer.<init>(MBeanAnalyzer.java:114)
        at com.sun.jmx.mbeanserver.MBeanAnalyzer.analyzer(MBeanAnalyzer.java:102)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.getAnalyzer(StandardMBeanIntrospector.java:67)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.getPerInterface(MBeanIntrospector.java:192)
        at com.sun.jmx.mbeanserver.MBeanSupport.<init>(MBeanSupport.java:138)
        at com.sun.jmx.mbeanserver.StandardMBeanSupport.<init>(StandardMBeanSupport.java:60)
        at com.sun.jmx.mbeanserver.Introspector.makeDynamicMBean(Introspector.java:192)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:898)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
        at com.rackspacecloud.blueflood.io.Instrumentation.<clinit>(Instrumentation.java:58)
        at com.rackspacecloud.blueflood.io.AstyanaxReader.getShardState(AstyanaxReader.java:198)
        at com.rackspacecloud.blueflood.io.AstyanaxShardStateIO.getShardState(AstyanaxShardStateIO.java:19)
        at com.rackspacecloud.blueflood.service.ShardStatePuller.performOperation(ShardStatePuller.java:42)
        at com.rackspacecloud.blueflood.service.ShardStateWorker.run(ShardStateWorker.java:88)
        at java.lang.Thread.run(Thread.java:745)
2015-08-14 14:56:40 INFO  tricsIngestionServer:103 - Starting metrics listener HTTP server on port 19000
2015-08-14 14:56:40 INFO  tricsIngestionServer:112 - Starting tracker service
2015-08-14 14:56:40 INFO  efloodServiceStarter:109 - Successfully started ingestion service module com.rackspacecloud.blueflood.service.HttpIngestionService with writer: AstyanaxMetricsWriter
2015-08-14 14:56:40 INFO  efloodServiceStarter:128 - Started 1 ingestion services
2015-08-14 14:56:40 INFO  efloodServiceStarter:147 - Loading query service module com.rackspacecloud.blueflood.service.HttpQueryService
2015-08-14 14:56:40 INFO  efloodServiceStarter:152 - Starting query service module com.rackspacecloud.blueflood.service.HttpQueryService
2015-08-14 14:56:40 INFO  etricDataQueryServer:73  - Starting metric data query server (HTTP) on port 20000
2015-08-14 14:56:40 INFO  efloodServiceStarter:154 - Successfully started query service module com.rackspacecloud.blueflood.service.HttpQueryService
2015-08-14 14:56:40 INFO  efloodServiceStarter:173 - Started 1 query services
2015-08-14 14:56:40 INFO  efloodServiceStarter:240 - No event listener modules configured.
2015-08-14 14:56:40 INFO  efloodServiceStarter:308 - All blueflood services started

My environment:

  • Ubuntu 14.04
  • Maven 3.0.5
  • Java 7 build: 1.7.0u80
  • Java 8 build: 1.8.0u51

Documentation is incomplete

Documentation is the most important feature in an open source project.
It's impossible to use this as it is.

Explore bucketing rows

There are several motivations for this:

  1. Cassandra cache tunables after 1.0 are global--row and/or key caches are on everywhere or off everywhere. If we wish to use row cache, we'll need to tend toward smaller rows.
  2. Bucketed rows will allow BF to be a more viable option for high-frequency signals. (The current design was conceived with a 30-second period in mind.)

More details on implementing other ingestion/query protocols

It appears the ingestion protocol layer is pluggable, but the brief mention of Thrift in the readme assumes quite a bit of existing working knowledge of Blueflood internals.

To add a custom ingestion protocol, are these more or less the minimum pieces one must implement?

com.rackspacecloud.blueflood.inputs.handlers:

  • MyProtocolMetricsIngestionServer
  • MyProtocolMetricsServerPipelineFactory : ChannelPipelineFactory
  • MyProtocolMetricsIngestionHandler : ProtocolRequestHandler

com.rackspacecloud.blueflood.inputs.formats:

  • MyEncapsulationMetricsContainer : MetricsContainer

Further, I'm still digging through, but are the ingestion and query layers hard-coded to use the reference HTTP classes, or is there a configuration value?

It would be great to see more details or small examples on how one might implement a sample layer for something small and simple like MessagePack, or plain-text UDP (similar to statsd).
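
For orientation, the pipeline-factory piece against the netty 3 API the codebase uses might look roughly like this sketch (MyProtocolFrameDecoder and MyProtocolMetricsIngestionHandler are hypothetical names echoing the list above):

import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;

public class MyProtocolMetricsServerPipelineFactory implements ChannelPipelineFactory {
    public ChannelPipeline getPipeline() throws Exception {
        ChannelPipeline pipeline = Channels.pipeline();
        pipeline.addLast("decoder", new MyProtocolFrameDecoder());            // hypothetical: bytes -> frames
        pipeline.addLast("handler", new MyProtocolMetricsIngestionHandler()); // hypothetical: frames -> metrics
        return pipeline;
    }
}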

ElasticSearch search queries returning empty list.

ISSUE:

root@cassy1:/home/lakshmi# curl -X GET http://localhost:20000/v2.0/tenant-id/metrics/search?query="*"
[]root@cassy1:/home/lakshmi#

Hit elastic search directly:

root@cassy1:/home/lakshmi# curl -X GET "http://cassy1:9200/metric_metadata/_search?pretty=true&q=tenantId:st2analytics-tester&size=121"
{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 135,
</snip>

DEBUG INFO:
We are using Blueflood master (jar with all dependencies).

  • OS - Ubuntu 14.04
root@cassy1:/home/lakshmi# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"
root@cassy1:/home/lakshmi#
  • Java 8 runtime (jar probably compiled with java 7)
root@cassy1:/home/lakshmi# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
root@cassy1:/home/lakshmi#

Relevant elastic search config:

root@cassy1:/home/lakshmi# grep "ELASTICSEARCH" /opt/blueflood/blueflood.conf
ELASTICSEARCH_HOSTS=localhost:9300
ELASTICSEARCH_CLUSTERNAME=elasticsearch
root@cassy1:/home/lakshmi#

OTHER INFO

I browsed the code and it looks like the search is performed on this EVENT_INDEX https://github.com/rackerlabs/blueflood/blob/master/blueflood-elasticsearch/src/main/java/com/rackspacecloud/blueflood/io/EventElasticSearchIO.java#L82

but when metrics are indexed, the index used is metric_metadata
https://github.com/rackerlabs/blueflood/blob/master/blueflood-elasticsearch/src/main/java/com/rackspacecloud/blueflood/service/ElasticIOConfig.java#L22

Looking at the Elasticsearch cluster, I just have the metric_metadata index.

root@cassy1:/home/lakshmi# curl 'localhost:9200/_cat/indices?v'
health status index           pri rep docs.count docs.deleted store.size pri.store.size
yellow open   metric_metadata   5   1        135            0     45.3kb         45.3kb
root@cassy1:/home/lakshmi#

This looks like a bug in the configuration settings: the search should use the index from configuration. Am I missing something, or is the fix what I just said? I can make a PR. Just need pointers. Thanks for looking!

Class that manages schema

We need a class that verifies and updates schema on startup. It could start out simple, but we'd want to end up with something that:

  • Works across versions of Cassandra, beginning with 1.1
  • Allows schema to be injected by other modules but protects core parts of the schema.
