m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform

Home Page: https://m3db.io/

License: Apache License 2.0

Topics: prometheus, kubernetes, graphite, metrics, tsdb, query-engine, aggregator, m3

m3's Introduction

M3


Distributed TSDB and Query Engine, Prometheus Sidecar, Metrics Aggregator, and more such as Graphite storage and query engine.

More Information

Community Meetings

You can find recordings of past meetups here: https://vimeo.com/user/120001164/folder/2290331.

Install

Dependencies

The simplest and quickest way to try M3 is to use Docker; read the M3 quickstart guide for other options.

This example uses jq to format the output of API calls. It is not essential for using M3DB.

Usage

The following is a simplified version of the M3 quickstart guide; we suggest reading that guide for more detail.

  1. Start a Container
docker run -p 7201:7201 -p 7203:7203 --name m3db -v $(pwd)/m3db_data:/var/lib/m3db quay.io/m3db/m3dbnode:v1.0.0
  2. Create a Placement and Namespace
#!/bin/bash
curl -X POST http://localhost:7201/api/v1/database/create -d '{
  "type": "local",
  "namespaceName": "default",
  "retentionTime": "12h"
}' | jq .
  3. Ready a Namespace
curl -X POST http://localhost:7201/api/v1/services/m3db/namespace/ready -d '{
  "name": "default"
}' | jq .
  4. Write Metrics
#!/bin/bash
curl -X POST http://localhost:7201/api/v1/json/write -d '{
  "tags": 
    {
      "__name__": "third_avenue",
      "city": "new_york",
      "checkout": "1"
    },
    "timestamp": '\"$(date "+%s")\"',
    "value": 3347.26
}'
  5. Query Results

Linux

curl -X "POST" -G "http://localhost:7201/api/v1/query_range" \
  -d "query=third_avenue" \
  -d "start=$(date "+%s" -d "45 seconds ago")" \
  -d "end=$( date +%s )" \
  -d "step=5s" | jq .  

macOS/BSD

curl -X "POST" -G "http://localhost:7201/api/v1/query_range" \
  -d "query=third_avenue > 6000" \
  -d "start=$(date -v -45S "+%s")" \
  -d "end=$( date +%s )" \
  -d "step=5s" | jq .

Contributing

You can ask questions and give feedback through the M3 community channels.

M3 welcomes pull requests; read the contributing guide to get set up for building and contributing to M3.


This project is released under the Apache License, Version 2.0.

m3's People

Contributors

andrewmains12, arnikola, benraskin92, cw9, dgromov, fishie9, gediminasgu, gibbscullen, haijuncao, jeromefroe, justinjc, linasm, martin-mao, mway, nbroyles, nikunjgit, notbdu, prateek, rallen090, richardartoul, robskillington, ryanhall07, schallert, soundvibe, teddywahle, vdarulis, vpranckaitis, wesleyk, xichen2020, yyin-sc


m3's Issues

Investigate why adding back a node being removed causes high fetch latency

After a node is added back during a remove (backing out of a remove node), we have noticed that fetch latency spikes to up to 100x the normal latency (which is well under 1s) for quite some time, or until the node is restarted. Write latency and other functions do not appear to be affected, however.

The latency spike seems to affect only that single node; overall cluster latency remains steady when using a read consistency of majority or lower.

Refactor series block merging

Changes:

  • Our integration tests caught (yay!) an edge case in repairs during block merging (discussed below). Address that.
  • Introduce a write lock within a series buffer, and modify the current series lock to be used appropriately (can become a read lock on the series, and a write lock on the buffer)
  • Block merging logic exists in both series.buffer and series.blocks. The buffer only attempts to rotate blocks out of the buffer; refactor this to transfer block ownership from buffer.blocks to series.blocks, and perform the merge lazily there.

Integration test issue:

  • Say we have 2 m3db replicas (R0, R1), each with a block for foo at time t0 (block b0 with R0 and block b1 with R1). Each block has content not present in the other, i.e. needs to be merged.
  • At some later time t1, R0 starts repairing the differences - it fetches metadata from its peer, observes the difference, and issues a Merge() on the block.
  • Underneath the covers, this queues a merge to be triggered when the block is read next, and because of the way we order the code, we swap metadata to reflect that of the peer block retrieved. The merge of the blocks is not triggered upon metadata retrieval.
  • So down the road, when R1 starts its repairs, it asks for metadata from R0 and gets back the same metadata it already has, so it doesn't attempt a merge.
  • It thereby fails to repair data that it should.

Speed up cache shard indices on bootstrap

Right now, bootstrapping takes some time simply to open all the file descriptors and read the shard indices when caching them during a bootstrap.

This could be done with some reasonable level of parallelism to speed it up, as sketched below.
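
A minimal sketch of what that could look like, assuming a bounded worker approach; the function and parameter names (cacheShardIndex, shards, workers) are illustrative rather than actual M3DB APIs:

package bootstrap

import (
	"sync"
)

// cacheShardIndicesParallel fans the per-shard work out across a fixed number
// of workers so that file descriptor opens and index reads overlap.
func cacheShardIndicesParallel(shards []uint32, workers int, cacheShardIndex func(shard uint32) error) error {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
		sem      = make(chan struct{}, workers) // bounds concurrency
	)
	for _, shard := range shards {
		shard := shard
		wg.Add(1)
		sem <- struct{}{}
		go func() {
			defer func() { <-sem; wg.Done() }()
			if err := cacheShardIndex(shard); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return firstErr
}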

Refactor M3DB to not rely upon commitlog filename timestamp

The fs package of M3DB has a function called "filesBefore" which returns a list of commit log files whose blocks start before a given time. Instead of relying on the timestamps in the filenames, we should delete this function and have the packages that depend on it (currently just the cleanup manager) use the fs package to get a list of all commit log files, and then use the ReadLogInfo function in the commitlog package to determine the block start time for each file.
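
An illustrative sketch of the proposed replacement, where readLogInfo stands in for the real commitlog.ReadLogInfo call (whose exact signature may differ):

package cleanup

import "time"

// commitLogsBefore returns files whose block starts before t, using the info
// read from each file rather than filename parsing.
func commitLogsBefore(files []string, t time.Time, readLogInfo func(path string) (blockStart time.Time, err error)) ([]string, error) {
	var out []string
	for _, f := range files {
		start, err := readLogInfo(f)
		if err != nil {
			return nil, err
		}
		if start.Before(t) {
			out = append(out, f)
		}
	}
	return out, nil
}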

Make `all-gen` idempotent on `master`

Currently, running `make all-gen` creates changes in generated files even when the source files are unchanged. Address this to minimise noise in PRs and overhead on developers.

Switch to model where metadata for unread blocks can be removed from memory

Right now we retain block metadata in memory after we unwire the actual data from memory. This means looking up whether a series exists is very fast, but it also means we use far more memory than necessary. Since our performance is currently adversely affected by how large our heap is (due to Go's GC), and since we also do not want to be memory bound long term, we need to move to a model where we can look this up on demand.

This requires a significant amount of change at the shard layer, because our current existence check is a simple map lookup, etc.

It is something that can drastically reduce the hardware footprint required for large datasets (100s of TBs).

Why not Prometheus or Influx

This seems like a big engineering effort; you are reinventing the wheel instead of using one of the existing solutions. Why not use Prometheus or Influx?

Better support for dynamic namespace updates/removals

The current implementation listens to KV for namespace changes, and does the following:

  • (a) If any new namespaces are listed, it creates the corresponding namespaces, bootstraps, and starts serving reads/writes for the new namespaces.
  • (b) If any namespaces currently running in the process are not listed in the KV update, it does NOT remove the corresponding in-memory objects. These changes will be applied when the process restarts.
  • (c) If any namespaces currently running in the process have different settings in the KV update (e.g. new retention), it does NOT apply the updates to the in-memory objects. These changes will be applied when the process restarts.

We should enhance the code to support (b) & (c) safely.
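
A minimal sketch of the diff logic that (b) and (c) would need, assuming a registry of running namespaces keyed by name; applyAdd, applyRemove and applyUpdate are hypothetical hooks into the database, not real M3DB APIs:

package namespace

// Metadata is a simplified stand-in for namespace settings.
type Metadata struct {
	Name      string
	Retention string
}

func applyKVUpdate(running map[string]Metadata, update map[string]Metadata,
	applyAdd, applyRemove, applyUpdate func(Metadata) error) error {
	// (a) new namespaces: create, bootstrap and start serving them.
	for name, md := range update {
		if _, ok := running[name]; !ok {
			if err := applyAdd(md); err != nil {
				return err
			}
		}
	}
	// (b) namespaces missing from the update: safely tear down in-memory objects.
	for name, md := range running {
		if _, ok := update[name]; !ok {
			if err := applyRemove(md); err != nil {
				return err
			}
		}
	}
	// (c) namespaces with changed settings: apply the new options in place.
	for name, md := range update {
		if cur, ok := running[name]; ok && cur != md {
			if err := applyUpdate(md); err != nil {
				return err
			}
		}
	}
	return nil
}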

Add back the ability to blocking flush after a bootstrap

The following PR works around an issue we encountered when naively trying to flush after a bootstrap:
#268

It does this by simply waiting for the next tick and a flush to occur to actually finish the bootstrap process.

Ideally we would actually flush directly after a bootstrap. However, we need to coordinate with the ticking procedure to ensure that all buffers have been rotated into each series, by waiting for a tick to rotate all the buffers before we can start a flush cycle (otherwise we miss blocks that are just about to rotate in for that time window when we flush, and we don't flush them again because the time window is marked flushed).

Add support for writing/reading arbitrary values not just floats

Currently all the interfaces both at RPC and the package level only allow for float64 values to be written and read into M3DB.

There is no need for such a restriction; as long as users can provide a stream encoder and decoder, there is no reason why they can't write/read arbitrary data structures to M3DB as time series data.

Perhaps we can just add some generic write/read methods alongside the current specialized float64 methods.
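
One possible shape for such generic methods, sketched with hypothetical interfaces; the real client API only supports float64 values today and none of these names are part of it:

package client

import (
	"io"
	"time"
)

// ValueEncoder and ValueDecoder would be supplied by users who want to store
// arbitrary structures in a series.
type ValueEncoder interface {
	Encode(w io.Writer, timestamp time.Time, value interface{}) error
}

type ValueDecoder interface {
	Decode(r io.Reader) (timestamp time.Time, value interface{}, err error)
}

// GenericSession illustrates write/read methods that could sit alongside the
// current float64-specific ones.
type GenericSession interface {
	WriteValue(namespace, id string, t time.Time, v interface{}, enc ValueEncoder) error
	FetchValues(namespace, id string, start, end time.Time, dec ValueDecoder) ([]interface{}, error)
}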

Avoid redundant creating pools during Options construction

In the production code path we call storage.NewOptions(), which constructs all the underlying pools, and then we overwrite them with the pools specified in the configuration. We can save a lot of allocations by avoiding this redundant creation. One possible way to do this is to add a constructor which doesn't initialise pool values.

We should audit the code-base to see where else this applies; pretty much all Options that have pools specified need to be considered.
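
One way the constructor split could look, sketched with placeholder pool fields rather than the real storage options:

package storage

type options struct {
	bytesPool   interface{} // stand-ins for the real pool types
	contextPool interface{}
}

// NewOptions allocates default pools (existing behaviour, sketched).
func NewOptions() *options {
	return &options{
		bytesPool:   newDefaultBytesPool(),
		contextPool: newDefaultContextPool(),
	}
}

// NewOptionsWithoutPools skips default pool construction entirely; callers are
// expected to set every pool explicitly (e.g. from configuration).
func NewOptionsWithoutPools() *options {
	return &options{}
}

func newDefaultBytesPool() interface{}   { return struct{}{} }
func newDefaultContextPool() interface{} { return struct{}{} }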

Experiment with drwmutex technique (using CPUID instruction) for faster pooling of objects

@prateek mentioned drwmutex the other day, which uses a CPU instruction that provides the current executing core - albeit only on Linux x86:
https://github.com/jonhoo/drwmutex

All our object pools currently use a channel backed object pool that might be a contention based bottleneck when scaling up further (this should first be proved).

When attempting to push writes per node to the limit, we should investigate a lock-free object pool implemented in Go assembler (at least the get() and put() calls) that uses CPUID to select the per-core queue of pooled objects and returns either the next available pooled object or nil when empty. Go assembler code is currently not preempted and is unlikely ever to be, which makes it safe to provide this functionality.
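
The real proposal needs CPUID support written in Go assembler; the sketch below only illustrates the per-core free-list layout, picks a shard pseudo-randomly instead of by executing core, and is not concurrency-safe as written:

package pool

import (
	"math/rand"
	"runtime"
)

type shardedPool struct {
	shards [][]interface{} // one free list per shard, would be per core
	alloc  func() interface{}
}

func newShardedPool(alloc func() interface{}) *shardedPool {
	return &shardedPool{
		shards: make([][]interface{}, runtime.NumCPU()),
		alloc:  alloc,
	}
}

// Get returns a pooled object from the chosen shard, or allocates on empty.
// A real implementation would index by CPUID and avoid locks entirely.
func (p *shardedPool) Get() interface{} {
	i := rand.Intn(len(p.shards))
	if n := len(p.shards[i]); n > 0 {
		v := p.shards[i][n-1]
		p.shards[i] = p.shards[i][:n-1]
		return v
	}
	return p.alloc()
}

// Put returns an object to a shard's free list.
func (p *shardedPool) Put(v interface{}) {
	i := rand.Intn(len(p.shards))
	p.shards[i] = append(p.shards[i], v)
}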

Optimise Commit Log Bootstrapping

Two changes:
(1) The commit log bootstrapper reads all the files present in the commit log directory, regardless of the time range it's bootstrapping for. This can be optimised to only read the correct files.
(2) The commit log bootstrapper is run per namespace, which means we read all commit log files present on disk for each namespace we bootstrap. Optimise this to only require a single read pass across all the namespaces.
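
A sketch of change (1), under the assumption that each file's block start can be obtained cheaply (from the filename or ReadLogInfo); fileBlockStart is a placeholder for that lookup and blockSize is the commit log block size:

package commitlog

import "time"

type timeRange struct{ start, end time.Time }

func filesInRange(files []string, r timeRange, blockSize time.Duration,
	fileBlockStart func(path string) (time.Time, error)) ([]string, error) {
	var out []string
	for _, f := range files {
		start, err := fileBlockStart(f)
		if err != nil {
			return nil, err
		}
		// Keep the file only if its block [start, start+blockSize) overlaps
		// the requested bootstrap range.
		if start.Before(r.end) && start.Add(blockSize).After(r.start) {
			out = append(out, f)
		}
	}
	return out, nil
}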

Rethink peer streaming fetch block retry semantics

At some point we should rethink our retry semantics here. If any of these fail, they won't be retried against the peer we want them from; instead we fall back to other peers and attempt to fetch from them.

Now that we are doing merged reads for the peers bootstrapper, we can't really fall back to the next peer (we want blocks from all the peers in the case where we need merges).

For now this is fine, but we probably want real retry semantics in the future for trying to retrieve from the same peer on failure.

Address peer bootstrapping retries strategy for nodes that go down

Right now they'll be retried against a different peer, but for checksums that don't agree, requests go out to all peers. Thus a retry against a different peer is a duplicative request and also not valuable, as the entire reason it was issued to that peer was to merge the results together.

Test faulty build using DTest

Using this ticket to track known bad builds we should test once we have a decent (any?) workload being sent in DTests. We should ensure the DTest suite is able to capture these issues:

  • 888db3e is known to have double frees

/cc @robskillington

Address panic in multireader iterator

panic: runtime error: index out of range

goroutine 2815650 [running]:
panic(0xa76940, 0xc42000e0e0)
        /usr/lib/go-1.7/src/runtime/panic.go:500 +0x1a1
code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding/m3tsz.(*readerIterator).Current(0xd2d51e8180, 0xecf9e1f01, 0x0, 0xeefe60, 0x28b4576c1162192c, 0x413c01, 0x0, 0x0, 0x0)
        /var/cache/udeploy/r/statsdex_m3dbnode/sjc1-produ-0000000283-v2/tmp/src/code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding/m3tsz/iterator.go:355 +0x139
code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding.(*multiReaderIterator).moveIteratorToValidNext(0xc9c3fb73b0, 0x7f010f41b378, 0xd2d51e8180, 0x7f010f41b378)
        /var/cache/udeploy/r/statsdex_m3dbnode/sjc1-produ-0000000283-v2/tmp/src/code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding/multi_reader_iterator.go:167 +0x8e
code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding.(*multiReaderIterator).moveIteratorsToNext(0xc9c3fb73b0)
        /var/cache/udeploy/r/statsdex_m3dbnode/sjc1-produ-0000000283-v2/tmp/src/code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding/multi_reader_iterator.go:146 +0x10d
code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding.(*multiReaderIterator).moveToNext(0xc9c3fb73b0)
        /var/cache/udeploy/r/statsdex_m3dbnode/sjc1-produ-0000000283-v2/tmp/src/code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding/multi_reader_iterator.go:95 +0x309
code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding.(*multiReaderIterator).Next(0xc9c3fb73b0, 0xecf9e1ffb)
        /var/cache/udeploy/r/statsdex_m3dbnode/sjc1-produ-0000000283-v2/tmp/src/code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/encoding/multi_reader_iterator.go:67 +0xa9
code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/client.(*blocksResult).mergeReaders(0xc597a55e60, 0xecf9df480, 0xc500000000, 0xeefe60, 0xc43bc842a0, 0x2, 0x2, 0x0, 0x0, 0x0, ...)
        /var/cache/udeploy/r/statsdex_m3dbnode/sjc1-produ-0000000283-v2/tmp/src/code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/client/session.go:1851 +0x174
code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/client.(*blocksResult).addBlockFromPeer(0xc597a55e60, 0xec70c0, 0xc536a033c0, 0xd0a6c55680, 0x0, 0x0)
        /var/cache/udeploy/r/statsdex_m3dbnode/sjc1-produ-0000000283-v2/tmp/src/code.uber.internal/infra/statsdex/vendor/github.com/m3db/m3db/client/session.go:1829 +0x81a

Investigate TestCommitLogBootstrap integration test failing for large blockNum values

The integration test TestCommitLogBootstrap currently passes when blockNum is set to 30. However, if you increase this value to 300 it fails (some series / datapoints simply don't get bootstrapped).

I've narrowed the issue down to the bootstrapper discarding the data due to the following check:

blockStart := dp.Timestamp.Truncate(blockSize)
blockEnd := blockStart.Add(blockSize)
blockRange := xtime.Range{
	Start: blockStart,
	End:   blockEnd,
}
if !ranges.Overlaps(blockRange) {
	// Data in this block does not match the requested ranges
	continue
}

Either the test is generating invalid data, or the bootstrapper logic is incorrect and it's throwing away correct data.

[bootstrap] Investigate increasing GOMAXPROCS to improve IO performance

I found this interesting discussion recently on improving the IO throughput of Go programs. The tldr is that setting GOMAXPROCS to a value which is higher than the number of cores can improve IO throughput at the expense of a negligible increase in CPU from increased context switching. I know we've been trying to cut down on commit log bootstrap time so I thought I would bring this up as a potential knob we can optimize if IO is a bottleneck contributing to longer bootstrapping times.

cc @robskillington @prateek
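
For experimentation, the knob can be set either via the GOMAXPROCS environment variable (no code change) or programmatically; a small example below, where the 2x multiplier is an arbitrary starting point rather than a recommendation:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Oversubscribe GOMAXPROCS relative to the core count and report the
	// previous value; the right factor would need benchmarking against
	// commit log bootstrap times.
	prev := runtime.GOMAXPROCS(2 * runtime.NumCPU())
	fmt.Printf("GOMAXPROCS raised from %d to %d\n", prev, 2*runtime.NumCPU())
}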

Fix approach to generating mocks

The current approach to generating mocks is untenable as we add developers who aren't the original authors, as it requires manual deduction to trace back from the generated code to the process used to generate it, especially since there isn't a 1:1 mapping between the generated mocks and the interfaces used to generate them. We need a simple, reproducible build target that can be used to re-generate mocks for the types that require them.
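
One possible shape for that target, sketched with placeholder file and interface names rather than the real M3DB packages: colocating go:generate directives with the interfaces they mock lets a single `go generate ./...` (or a make target wrapping it) reproduce every mock.

package storage

//go:generate mockgen -source=types.go -destination=types_mock.go -package=storage

// Database stands in for whichever interfaces need generated mocks.
type Database interface {
	Open() error
	Close() error
}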

Either state that M3TSZ is lossy or fix precision loss for some values interpreted as "int like"

This test case fails if adding it to roundtrip_test.go:

func TestExtraPreciseRoundTrip(t *testing.T) {
	start := time.Now().Truncate(time.Second)
	testRoundTrip(t, []ts.Datapoint{
		{Timestamp: start, Value: 484.81953300000004},
		{Timestamp: start.Add(2 * time.Second), Value: 336.34150700000004},
		{Timestamp: start.Add(3 * time.Second), Value: 138.22963599999997},
		{Timestamp: start.Add(4 * time.Second), Value: 442.91275199999995},
	})
}

cc @martin-mao @xichen2020

[database] Investigate use of a distributed Read-Write Mutex

I came across an interesting package recently that implements a distributed read/write mutex and I thought it might be worth investigating for m3db. Under the hood, the package uses a slice of sync.RWMutex whose length is equal to the number of CPUs on the server. Calls to RLock first check which CPU the goroutine is running on and use that to index into the underlying slice and call RLock on the resulting mutex. Calls to Lock, on the other hand, iterate through the slice and call Lock on each mutex. As a result, the lock can lead to pretty drastic performance improvements (the README provides some benchmarks) for workloads which run on servers with a large number of CPUs and for which reads greatly dominate writes. It seemed plausible to me that the workload characteristics of db might be such that it would see improvements, so I figured I might bring it up.

cc @robskillington @prateek
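
A cut-down sketch of the idea, assuming we approximate the per-CPU shard selection (the real drwmutex picks the shard from the executing CPU, which needs platform-specific support; here the reader supplies an index, e.g. random or per-goroutine):

package xsync

import (
	"runtime"
	"sync"
)

type paddedRWMutex struct {
	sync.RWMutex
	_ [64]byte // padding to reduce false sharing between shards
}

// distributedRWMutex holds one RWMutex per CPU: readers lock a single shard,
// writers lock all of them.
type distributedRWMutex []paddedRWMutex

func newDistributedRWMutex() distributedRWMutex {
	return make(distributedRWMutex, runtime.NumCPU())
}

// RLocker returns the shard a reader should RLock/RUnlock.
func (m distributedRWMutex) RLocker(i int) *sync.RWMutex {
	return &m[i%len(m)].RWMutex
}

// Lock acquires every shard, giving the writer exclusive access.
func (m distributedRWMutex) Lock() {
	for i := range m {
		m[i].Lock()
	}
}

func (m distributedRWMutex) Unlock() {
	for i := range m {
		m[i].Unlock()
	}
}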

Add ability to cancel a bootstrap during a topology change

Right now a bootstrap will continue to try and complete regardless of whether a topology change invalidates a whole set of shards it is bootstrapping.

To avoid wasted work we should make bootstraps cancellable and ensure we restart the cycle should another topology change occur.

Investigate long bootstrap times for the commit log bootstrapper

This may be simply a performance issue or could be an edge case with the commit log iterator, but it seems when bootstrapping from multiple large commit log files the commit log bootstrapper takes an exceptionally long time - much longer than previously benchmarked on a single file.

Add bootstrapping status endpoint

We need the ability to tell if an m3dbnode is currently bootstrapping for DTest. Expose that in the health endpoint if it's cheap; otherwise expose it under a new detailed status endpoint.
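
A hypothetical shape for the detailed endpoint; isBootstrapping would be wired to the database's real bootstrap state, and the route name is a placeholder:

package httpd

import (
	"encoding/json"
	"net/http"
)

// registerStatusHandler exposes a JSON status document including whether the
// node is currently bootstrapping.
func registerStatusHandler(mux *http.ServeMux, isBootstrapping func() bool) {
	mux.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]bool{
			"bootstrapping": isBootstrapping(),
		})
	})
}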

Add instrumentation for background processing tasks

Add more instrumentation for background processes (tick/flush/bootstrap/cleanup/...), i.e.
(1) Add logging per namespace (info level),
(2) Add metrics per namespace,
(3) Add logging per shard (probably debug level, with ability to turn up),
(4) Add write/read numbers per shard - useful to gauge load distribution

While we're here, also:
(5) Migrate to using Zap for logging
(6) Use lumberjack to specify output directory for logs, rotation policy
(7) Document dynamic log level changing per namespace/shard via curl
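
A minimal sketch of items (5) and (6), assuming zap writing JSON logs through lumberjack for size-based rotation; the file path and rotation numbers are placeholders:

package instrument

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
	"gopkg.in/natefinch/lumberjack.v2"
)

// newRotatingLogger builds a zap logger whose output is rotated by lumberjack.
func newRotatingLogger() *zap.Logger {
	writer := zapcore.AddSync(&lumberjack.Logger{
		Filename:   "/var/log/m3dbnode/m3dbnode.log", // placeholder path
		MaxSize:    256,                              // megabytes per file
		MaxBackups: 5,
		MaxAge:     7, // days
	})
	core := zapcore.NewCore(
		zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig()),
		writer,
		zap.InfoLevel,
	)
	return zap.New(core)
}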
