zalando-zmon / kairosdb

This project forked from kairosdb/kairosdb

Fast scalable time series database

License: Apache License 2.0

Groovy 0.98% HTML 9.36% JavaScript 25.37% Python 0.47% Java 63.20% Shell 0.29% Batchfile 0.04% CSS 0.28% Dockerfile 0.02%

cassandra kairosdb timeseries zmon

kairosdb's Issues

Switch GC algorithm to G1

Identify metrics to collect.
Provide evidence of performance enhancement.
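For the "metrics to collect" step, one candidate source is the JVM's own garbage collector MXBeans, which report collection counts and accumulated collection time per collector. The following is a minimal sketch, assuming these two counters are what should be compared before and after enabling G1 (`-XX:+UseG1GC`); the class name GcStatsSnapshot is hypothetical and not part of KairosDB.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of KairosDB: snapshots the JVM's GC counters.
final class GcStatsSnapshot {
    /** Returns collector name -> {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> collect() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            stats.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }
}
```

Taking a snapshot at fixed intervals and diffing consecutive snapshots would give collections per interval and time spent in GC per interval, which could serve as part of the evidence.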

Implement a more accurate failure response to data points write

Data points are written asynchronously, so a write can be reported to the caller as failed even though it eventually succeeds.

Example:

[Caller] ----put data points with timeout=5s---> [KairosDB]----data_points_insert takes 10s----|

The caller will assume the data points were not written and may start a retry, which amplifies the load on KairosDB.
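The distinction the caller needs is between "the write failed" and "the write has not finished yet". A minimal sketch of that classification, assuming the write is represented as a CompletableFuture; WriteOutcomeSketch and its Outcome values are hypothetical names, not KairosDB code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: separates "timed out, outcome unknown" from "failed".
final class WriteOutcomeSketch {
    enum Outcome { WRITTEN, PENDING, FAILED }

    /** Waits up to timeoutMs for the asynchronous write and classifies the result. */
    static Outcome await(CompletableFuture<Void> write, long timeoutMs) {
        try {
            write.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.WRITTEN;
        } catch (TimeoutException e) {
            // The write is still in flight and may yet succeed, so this is not a failure.
            return Outcome.PENDING;
        } catch (ExecutionException e) {
            // The write itself completed with an error.
            return Outcome.FAILED;
        } catch (InterruptedException e) {
            // Preserve the interrupt flag; the outcome of the write is still unknown.
            Thread.currentThread().interrupt();
            return Outcome.PENDING;
        }
    }
}
```

A response built from PENDING could tell the caller that the request was accepted but not yet confirmed, so that it does not retry blindly.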

Clean filesystem cache

Query cache cleanup is disabled, leading to full disks and degradation of service.

Grafana requests are sent with cache: 0, which skips the query cache

We might need to either:

Enable query cache cleanup
Disable query cache completely
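If cleanup is enabled, it could be as simple as deleting cache files older than a maximum age. The sketch below assumes the query cache is a directory of regular files and that age is judged by last-modified time; CacheCleanupSketch is a hypothetical name and does not reflect how KairosDB lays out its cache.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of an age-based cleanup for a file-based cache directory.
final class CacheCleanupSketch {
    /** Deletes regular files under cacheDir last modified before (now - maxAge); returns the number deleted. */
    static int deleteOlderThan(Path cacheDir, Duration maxAge, Instant now) {
        Instant cutoff = now.minus(maxAge);
        try (Stream<Path> paths = Files.walk(cacheDir)) {
            // Collect first so that deleting does not interfere with the directory walk.
            List<Path> files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
            int deleted = 0;
            for (Path file : files) {
                if (Files.getLastModifiedTime(file).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
            return deleted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```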

Implement circuit breaker for read CQL queries

Due to compactions in Cassandra, certain C* nodes consume all the threads/semaphores in KairosDB, resulting in broken queries for all Grafana dashboards. This issue is created to analyse and assess the impact of implementing a circuit breaker for CQL read queries in KairosDB.
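For the assessment, this is a minimal sketch of what such a breaker could look like, assuming one instance per Cassandra node, a threshold of consecutive failures, and a fixed cool-down. ReadCircuitBreaker and its parameters are hypothetical; the clock is injected only to make the behaviour testable.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a consecutive-failure circuit breaker for read queries.
final class ReadCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock;
    private int consecutiveFailures;
    private long openedAt;
    private boolean open;

    ReadCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    /** Returns true if a query may be sent. */
    synchronized boolean allowRequest() {
        if (!open) {
            return true;
        }
        // After the cool-down, let requests through again as probes (half-open).
        return clock.getAsLong() - openedAt >= openMillis;
    }

    /** A successful query closes the breaker and clears the failure count. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        open = false;
    }

    /** A failed query opens (or re-opens) the breaker once the threshold is reached. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            open = true;
            openedAt = clock.getAsLong();
        }
    }
}
```

While the breaker is open, queries to that node fail fast instead of holding a thread or semaphore permit, which is the effect this issue is after.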

Fix invalid row_key_split_index query

While querying row_key_split_index, there's a bug in setting the time parameters: in the second occurrence they are set on positions 1 and 2 instead of 3 and 4 (see the code). This should be fixed, and the code has to be simplified to avoid this kind of mistake.
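One way to simplify the code so that a position cannot be reused by accident is to append parameters in order instead of writing indices by hand. The sketch below assumes a statement with four positional placeholders (two time ranges); the real row_key_split_index statement is not shown in this issue, and TimeRangeParams is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: positions follow from append order, so no index is typed by hand.
final class TimeRangeParams {
    /** Builds the positional parameters for a statement that takes two time ranges. */
    static List<Object> forTwoRanges(long firstStart, long firstEnd, long secondStart, long secondEnd) {
        List<Object> params = new ArrayList<>();
        appendRange(params, firstStart, firstEnd);   // becomes positions 1 and 2
        appendRange(params, secondStart, secondEnd); // becomes positions 3 and 4
        return params;
    }

    private static void appendRange(List<Object> params, long start, long end) {
        params.add(start);
        params.add(end);
    }
}
```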

Fix kairosdb.http.datapoints_requested

This is supposed to be the number of datapoints returned by the KDB HTTP API. However, it looks like this data is not fully valid. The number of datapoints is stored in a class-level variable after every HTTP response, but it is written to Cassandra only once a minute by a trigger. So the stored value is a heavily sampled one, not a complete count.
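A possible fix is to accumulate the counts between two reports instead of keeping only a single value. A minimal sketch, assuming the once-a-minute trigger can call a drain method; DatapointsRequestedCounter is a hypothetical name, not an existing KairosDB class.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: sums data points over the whole reporting interval.
final class DatapointsRequestedCounter {
    private final LongAdder total = new LongAdder();

    /** Called after every HTTP response with the number of data points it returned. */
    void record(long datapointsInResponse) {
        total.add(datapointsInResponse);
    }

    /** Called by the periodic reporter: returns the sum since the previous call and resets it. */
    long drain() {
        return total.sumThenReset();
    }
}
```

Dividing the drained value by the length of the reporting interval would also give a datapoints requested rate.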

Upgrade Netty to 4.1.32

Optimize row_key_index to avoid causing Cassandra large partitions

The current implementation could cause large partitions in Cassandra.
Possible solutions:

SOLUTION I
Remove dependency on row_key_index

Can KairosDB work without a row_key_index?

SOLUTION II
time-bucket row_key_index

Rotate row_key_index with data_points
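For Solution II, the core of time-bucketing is to derive a bucket from each timestamp and make it part of the partition key, so that one metric's index entries are spread over many bounded partitions. A minimal sketch of the bucket computation, assuming millisecond timestamps and a fixed bucket width; RowKeyIndexBucket is a hypothetical name and the width is a design choice still to be made.

```java
// Hypothetical sketch: maps a timestamp to the start of its fixed-width time bucket.
final class RowKeyIndexBucket {
    /** Returns the start (ms since epoch) of the bucket that contains timestampMs. */
    static long bucketStart(long timestampMs, long bucketWidthMs) {
        if (bucketWidthMs <= 0) {
            throw new IllegalArgumentException("bucketWidthMs must be positive");
        }
        // floorDiv rounds towards negative infinity, so pre-epoch timestamps bucket correctly too.
        return Math.floorDiv(timestampMs, bucketWidthMs) * bucketWidthMs;
    }
}
```

A query for a time range would then enumerate the buckets it overlaps and read one index partition per bucket.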

Extend KDB critical query logging

Include the number of data points and rows (alternative C* log available?)

Fork Status

Hi,

Apologies for the possible abuse of this fork's issue tracker but I just wanted to ask about the current state and possible future of this fork of Kairos, and couldn't see a better public channel.

Looking at the network, and the commit history, it seems this fork is receiving a lot of updates, but has diverged a reasonable amount from the original.

If I may ask: is there a significant change in this fork such that, for someone looking to start experimenting and integrating with kairosdb, we might be better off looking at this one? (Hey, it's got a Dockerfile!! 👍 )

Or do you think the changes are so heavily angled towards your ZMON platform that it wouldn't be worth looking at, or that there are some other inherent risks?

Please feel free to delete this ( and email back ;) )

Thanks.

Rob

Describe the problems faced with our time series database

Context

Large Cassandra partition sizes cause many problems and can lead to Grafana not functioning.

What is Expected

Background information on why we're doing this investigation
Description of problems in current code
Description of limitations in current implementation
Risks of not fixing the issue

What is NOT expected:

Possible solutions

Acceptance

One document where all team members collect and collate their information.
All team members must review and provide feedback

Get rid of tag value and tag name caches

Just a memo that might help either to fix Kairos or to remove unused code.

User story

As a developer of the KairosDB fork
I want to remove all functionality related to storing tag names and tag values in the Cassandra table string_index
So that there is no dangling, unused functionality and no more unnecessary requests to Cassandra in the putDataPoint method

Explanation

The current fork of KairosDB supports storing tag names and tag values in the Cassandra table string_index, under the keys tag_names and tag_values. However, those values are never read, because the actual tags and values are fetched by selecting the latest 9999 data row keys for a specific metric. Start from org.kairosdb.core.http.rest.MetricsResource#getMeta, which serves the "/datapoints/query/tags" endpoint. Eventually you'll reach org.kairosdb.datastore.cassandra.CassandraDatastore#queryMetricTags:

    public TagSet queryMetricTags(DatastoreMetricQuery query) {
        TagSetImpl tagSet = new TagSetImpl();
        // Fetch up to 9999 row keys for the queried metric.
        Collection<DataPointsRowKey> rowKeys = getKeysForQueryIterator(query, 9999);

        MemoryMonitor mm = new MemoryMonitor(20);
        for (DataPointsRowKey key : rowKeys) {
            // Each row key carries its own tags, so string_index is not consulted here.
            for (Map.Entry<String, String> tag : key.getTags().entrySet()) {
                tagSet.addTag(tag.getKey(), tag.getValue());
                mm.checkMemoryAndThrowException();
            }
        }

        return tagSet;
    }

This works because every data row key contains its associated tags.
Thus, the tag_names and tag_values keys in the string_index table, and all related functionality, are in fact not needed and we can safely get rid of them.
However, the metric_names key in the string_index table is still in use, so just dropping the whole string_index table is not an option.

This change is specific to ZMON and moves the fork further from the original KairosDB.

Expose datapoints requested rate

Similar to the datapoints ingested rate, we need a metric for read/consumed datapoints.