xephonhq / xephon-k

A time series database prototype with multiple backends

License: MIT License

Go 87.16% Shell 5.77% HTML 2.09% C 1.04% JavaScript 3.02% Python 0.22% Makefile 0.27% Dockerfile 0.44%
time-series database tsdb

xephon-k's Introduction

Xephon-K



Xephon-K is a time series database with multiple backends. It's a playground for comparing modern TSDB designs and implementations. It is not for production use, but it can show you simplified implementations of popular TSDBs. A detailed (but not well organized) survey can be found in doc/survey.

Supported backends

  • In Memory
  • Local disk, modeled after InfluxDB
  • Cassandra, modeled after KairosDB, but the partitioned schema is not implemented

The following are backends I plan to implement in the future:

  • RocksDB
  • Redis
  • MySQL, modeled after VividCortex
  • S3 + Dynamo, modeled after weaveworks' cortex

Related projects

About

  • Xephon comes from the animation RahXephon, a name originally chosen for Xephon-B
  • K comes from KairosDB, because this project was originally modeled after KairosDB, which is also the first TSDB I used in production.

Authors

xephon-k's People

Contributors

at15


xephon-k's Issues

Bench KairosDB

It got performance similar to Xephon-K, though it seems to lose its connection to Cassandra?

	INFO[0010] basic report finished via context pkg=k.b.reporter
	INFO[0010] total request 14191 pkg=k.b.reporter
	INFO[0010] fastest 237171 pkg=k.b.reporter
	INFO[0010] slowest 349506588 pkg=k.b.reporter

Data File Encoding Timestamp and Value

Tracker #32
Related #45 (Writer) #46 (Reader)

Actually the encoding problem is not limited to the on-disk store; it also applies to cold data in memory and data stored in external storage (when treating it as a large blob store), so it might be better to put it in pkg/encoding instead of pkg/storage/disk/encoding

general

  • interface for encoder/decoder
  • (discussion) let the encoder/decoder write to the underlying writer directly instead of returning a large chunk of encoded bytes, since the writer is already buffered

non-compression encoding

a1da8aa

  • big endian
    • timestamp (int64 instead of uint64 because there might be historical data older than the Unix epoch)
    • int64
    • float64
    • string?
      • need to store the length, and is Go's string representation universal? (kind of lost about the string encoding stuff, and I'd like to store strings like 服务器着火了 "the server is on fire" and 管理员跑路了 "the admin ran away")
    • boolean? (I don't think we plan to support it for now)
  • little endian
    • simply changing BigEndian to LittleEndian should work; there is an interface called ByteOrder for switching, so we don't need a generator. Do note that binary.BigEndian and binary.LittleEndian are package-level variables rather than exported struct types, so you can only distinguish them via their String method (see the sketch after this list)
    • maybe we can use unsafe to cast an int to a byte slice directly
  • benchmark to see which one is faster
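
A minimal sketch of the big-endian path (assuming one int64 timestamp plus one float64 value per point; the layout is illustrative, not the actual pkg code):

import (
	"bytes"
	"encoding/binary"
	"math"
)

// encodePoint appends a big-endian (timestamp, value) pair to buf.
// The timestamp stays int64 so pre-epoch data keeps its sign; the
// uint64 conversion only reinterprets the bits.
func encodePoint(buf *bytes.Buffer, t int64, v float64) {
	var b [8]byte
	binary.BigEndian.PutUint64(b[:], uint64(t))
	buf.Write(b[:])
	binary.BigEndian.PutUint64(b[:], math.Float64bits(v))
	buf.Write(b[:])
}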

variable length encoding

d8bdaef

delta encoding

delta + RLE

  • RLE is implemented, but without delta it actually makes the encoded data bigger
  • delta + RLE should perform pretty well if the data comes at a regular interval (delta is sketched below)
  • InfluxDB seems to use RLE only when a whole block can be encoded as a single run, so there can't be multiple runs per block?
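
A minimal sketch of delta + varint encoding (not the actual pkg/encoding code). With a regular interval the deltas become a run of identical small values, which is exactly what a following RLE pass can exploit:

import (
	"bytes"
	"encoding/binary"
)

// deltaEncode writes ts[0] as a varint and every later timestamp as
// the varint-encoded delta from its predecessor.
func deltaEncode(buf *bytes.Buffer, ts []int64) {
	var b [binary.MaxVarintLen64]byte
	var prev int64
	for _, t := range ts {
		n := binary.PutVarint(b[:], t-prev)
		buf.Write(b[:n])
		prev = t
	}
}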

general compression

Ref

Bench Prometheus

The problem is .... I can't find an HTTP API endpoint for pushing data

TODO

Ref

Replace sort.Sort with sort.Slice

Go 1.8 brought some new features, two of which are useful to us. First, GC was further optimized: in many cases GC can finish within 10 μs, which helps TiDB's stability. Second, the sort.Slice method was added; TiDB does a lot of sorting, and our benchmarks show it has a sizable advantage over the old sort.Sort.
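
For example, with a toy point type (not the project's own IntSeries/IntPoint):

import "sort"

type point struct{ T, V int64 }

func sortByTime(pts []point) {
	// Before Go 1.8 this needed a named type implementing sort.Interface;
	// sort.Slice only needs the less function.
	sort.Slice(pts, func(i, j int) bool { return pts[i].T < pts[j].T })
}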

Wrap go-kit services

Most services share a common interface. Though currently there are only three services (info, write, read), it's better to have a unified interface so the server can generate them easily

Differences from the go-kit example

  • write makeServiceEndpoint as the service's method makeEndpoint, though it is only called when creating the server
  • write decodeServiceRequest as the service's method decodeRequest

TODO

  • interface for ServiceFactory 2eea8a7
    • makeEndpoint
  • interface for ServiceHTTPFactory 2eea8a7
    • makeEncode
    • makeDecode
  • interface for service (this should be ServiceFactory; see the sketch after this list)
    • makeEndpoint
    • decodeRequest
  • encodeJsonResponse, go-kit provides a default implementation
  • logging middleware 73a06c6
  • logging won't happen if decode fails, because the request never reaches the endpoint; maybe we can attach information to the context so the time is calculated more precisely. Also the result is 500 instead of 400; this issue provides an example: go-kit/kit#133
  • does go-kit handle gzip transparently?
  • it might be better to log in the endpoint instead of wrapping a service with a logger; though that's what the example shows, it's quite overkill, so much useless code for that ....
  • instrument the response time etc.
    • TODO: we should allow a request to specify not collecting metrics, because we may log the metrics to this very instance; maybe we could also use a short circuit to avoid the socket overhead
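
A rough sketch of what the unified factory could look like (the interface and method names are hypothetical; endpoint.Endpoint is go-kit's real type):

import (
	"context"
	"net/http"

	"github.com/go-kit/kit/endpoint"
)

type ServiceFactory interface {
	// MakeEndpoint wraps the service call; only invoked when creating the server.
	MakeEndpoint() endpoint.Endpoint
	// DecodeRequest parses the transport request into the service's request type.
	DecodeRequest(ctx context.Context, r *http.Request) (interface{}, error)
}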

Bench InfluxDB extremely poor performance

Based on other reports, it should be much faster, around 400k/sec (though if that is points, then with our batch size of 100 it is 4k requests/sec ......). However, in the current test it can only handle around 100 requests over 5 seconds, each with a batch size of 100; that's 2k points/sec .....

Possible problems

  • wrong schema?
  • wrong configuration? I am using the out-of-the-box configuration
  • serialization problem in the loader? the start and end time do not count the time spent serializing the payload, but the duration is still very long

Maybe it's better to investigate other TSDBs ... Prometheus seems to be the best choice for now

Ref

https://docs.google.com/spreadsheets/d/1sMQe9oOKhMhIVw9WmuCEWdPtAoccJ4a-IuZv4fXDHxM/edit#gid=0

Record bench metrics and store in TSDB

As mentioned by YCSB https://github.com/brianfrankcooper/YCSB/wiki/Running-a-Workload

While a histogram of latencies is often useful, sometimes a timeseries is more useful.

  • every bench should have a tag to distinguish it from the others
  • determine the metrics we are going to collect
    • response time
    • http code?
  • handle duplicate times: it's likely that different workers report the same timestamp
    • do we take the max, min, avg?
    • can we use coarser-grained time when recording them?
  • how to collect them
    • use a channel like hey does (see the sketch after this list)
    • can we pass a channel by value, or by reference?
    • process after the bench finishes, or do it on the fly
  • where and how often we report the data to tsdb
  • how to process and visualize them
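
A minimal sketch of the hey-style channel collection (the result fields are assumptions). Note that a Go channel value is itself a reference to the shared channel, so passing it "by value" works fine:

import (
	"sync"
	"time"
)

type result struct {
	code     int
	duration time.Duration
}

// collect fans out workers, each sending one result per request over a
// shared channel, and aggregates them on the fly.
func collect(workers, requests int, do func() result) []result {
	ch := make(chan result, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < requests; i++ {
				ch <- do()
			}
		}()
	}
	go func() { wg.Wait(); close(ch) }()
	var all []result
	for r := range ch {
		all = append(all, r)
	}
	return all
}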

Timestamp precision (millisecond) does not match struct field name (TimeNano)

Currently the time field is called TimeNano in both IntSeries (common) and Generator, but the collector and test cases are using milliseconds. This is due to the following reasons:

  • the IntSeries code is copied from Xephon-B, which uses TimeNano
  • Cassandra and KairosDB use milliseconds (Cassandra's timestamp type is in milliseconds), so the early development last quarter just kept using milliseconds under the name TimeNano
  • JavaScript's timestamp is in milliseconds: Date.now()

Example of different precision Unix time-stamp

  • 2147483647 int32 max value
  • 1495087243 time.Now().Unix()
  • 1495087204207 time.Now().UnixNano() / int64(time.Millisecond)
  • 1495087266782634656 time.Now().UnixNano()

Drawbacks of nanoseconds

  • data is more likely to have irregular time intervals
  • need to truncate it when storing to Cassandra if still using timestamp as the type
  • need to multiply by 1e6 or 1e9 if incoming data is in milliseconds or seconds

Drawbacks of seconds & milliseconds

  • there seems to be no performance boost from computing over milliseconds rather than nanoseconds on a 64-bit machine, since both need int64 (even seconds may perform the same on a 64-bit machine)
  • some use cases need nanoseconds, like BTrDB's energy grid sensors (though I don't know why)

TODO

  • unify the naming and implementation
  • just use nanoseconds for everything OR
  • allow different precisions
    • add a precision field to series (sketched below)
  • change the code in bench
  • change the code in common and all its usages
    • how to specify precision in a series
    • maybe we can also add the interval as a hint to the database?
  • it makes no difference when compressing time series data as long as precisions are not mixed
  • Go also has an internal package called timeseries https://godoc.org/golang.org/x/net/internal/timeseries (found it when browsing TiDB)
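
A sketch of what the precision field could look like (names and representation are assumptions, not a final design):

// Precision is the number of nanoseconds per time unit of a series.
type Precision int64

const (
	Nanosecond  Precision = 1
	Millisecond Precision = 1e6
	Second      Precision = 1e9
)

// ToNano converts a timestamp in the series' own unit to nanoseconds.
func ToNano(t int64, p Precision) int64 { return t * int64(p) }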

Endianness

Came across the little vs. big endian problem when trying to start the on-disk storage #32. Looked around some existing TSDBs but couldn't figure out which one they really use and why

Prometheus

InfluxDB

SQLite

Gob

Floating-point numbers are always sent as a representation of a float64 value. That value is converted to a uint64 using math.Float64bits. The uint64 is then byte-reversed and sent as a regular unsigned integer. The byte-reversal means the exponent and high-precision part of the mantissa go first. Since the low bits are often zero, this can save encoding bytes. For instance, 17.0 is encoded in only three bytes (FE 31 40)

Parquet

Arrow

BDB

little endian system

254	fe 0 0 0
255	ff 0 0 0
256	 0 1 0 0
257	 1 1 0 0

sorts badly (in bytewise order)

256
257
254
255

big endian system

254	0 0 0 fe
255	0 0 0 ff
256	0 0 1 0
257	0 0 1 1
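
The effect is easy to reproduce: encode the same integers both ways and sort the keys bytewise, as an on-disk index or KV store would:

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

func demo() {
	nums := []uint32{254, 255, 256, 257}
	for _, order := range []binary.ByteOrder{binary.LittleEndian, binary.BigEndian} {
		keys := make([][]byte, len(nums))
		for i, n := range nums {
			keys[i] = make([]byte, 4)
			order.PutUint32(keys[i], n)
		}
		sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })
		// big endian preserves numeric order, little endian does not
		fmt.Println(order, keys)
	}
}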

Config Docker volume

On Ubuntu, it seems aufs is the default; there is also OverlayFS, and mounted volumes

Data File Writer

Tracker #32
Related #46 (Reader) #47 (Encoding)

Each data file is self-contained: it contains the full information for all the series stored in it (not just series ids). The full doc can be found in pkg/storage/disk/data_writer.go

  • header
  • footer
  • data block
  • index
    • index of indexes
    • IndexEntries, also contains series meta data, all encoded using protobuf
      • IndexEntry
    • aggregated values (min, max, time, avg, etc.); we need them to skip loading certain blocks into memory
      • min, max time is added to the Series interface
  • index of index
  • checksum
    • use a flag to enable and disable it; if checksum is disabled, we write 0 as a placeholder, and if the checksum value is 0 we skip the check when reading (see the sketch below)
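
A sketch of the zero-as-disabled convention from the last bullet (CRC32 is an assumption; the issue does not pin down the algorithm):

import "hash/crc32"

// verifyChecksum treats a stored value of 0 as "checksum disabled".
func verifyChecksum(data []byte, stored uint32) bool {
	if stored == 0 {
		return true // placeholder written when the flag disables checksums
	}
	return crc32.ChecksumIEEE(data) == stored
}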

Remove go-kit

Related to #6

It is not the best time to do this kind of refactoring, but I just can't help it

  • use plain net/http to serve HTTP API
  • support grpc server
    • client
  • internal metrics #28

Meta index and store

This is a tracking issue

When trying to implement the Web UI #19, queries need to filter series using tags, which the previous implementation did not deal with.

As the InfluxDB folks mentioned, a TSDB can be treated as two parts: the index and the data
(the Prometheus folks also mention it: https://fabxc.org/blog/2017-04-10-writing-a-tsdb/)

Data File Reader

Tracker #32
Related #45 (Writer)

Need to read what you write

  • header: actually we don't read the header; we read backward to avoid a seek, since the header data (magic + version) is duplicated in the footer
  • footer
  • data block
    • decoding
    • cache
    • evict (we can't cache everything from disk in memory unless memory is much larger than disk; in fact, we can let the user specify whether they are going to query it again)
    • push down aggregation?
  • index
    • index of indexes
    • IndexEntries, all decoded using protobuf
      • IndexEntry
  • aggregated value, not stored yet
  • debug
    • print index of data
    • print decoded data

Expose internal metrics

We need to expose some metrics, like memory usage and counters, for monitoring and benchmarks

TODO

Golang

  • GC
  • Heap?

Meta store

  • number of indexes
  • longest postings

Disk store

Distributed

Ref

Golang

Instrument and Tracing

Database

HTTP Write Service

Depends on

  • storage
    • memory
    • disk
    • naive cassandra
    • remote
  • memory data write #3
  • memory index write #3
  • cassandra write #11

TODO

  • service and factory
    • request encode
    • request decode
  • logging
  • e2e test for HTTP + Memory

GoDoc chose comment generated by protobuf as package documentation


So the fact is, godoc puts all the top-of-file comments into the package-level comment, but code generated by protobuf uses /* indent ... */ so it gets highlighted. It's not a problem for normal usage of protobuf since the generated code is usually in a package of its own, while xephon-k uses the structs generated by protobuf directly.

One solution is to automatically remove the comment and keep only the DO NOT EDIT. By the way, the DO NOT EDIT for the generated double series also shows up in the package-level comment.

  • remove protobuf comment
  • remove DO NOT EDIT for double series in godoc (not in source code)
  • add examples (I think I have created a similar issue somewhere, since Go supports runnable examples in docs: https://blog.golang.org/examples)

In general, the current godoc is a mess ...

Bench InfluxDB

Related to #12

  • create the schema using the API (just too lazy to use the shell; it seems the best way to do it is to use the shell)
  • serialize to its line protocol
  • the driver for loading the data: not needed, we just need to know the endpoint

Tags in series are passed by reference not value

The GetTags() method in common.Series, implemented by common.IntSeries and common.DoubleSeries, simply returns its field, which is a reference; this causes the following two problems:

  • Memory: SeriesStore only stores a reference to the Tags of the original series, so I believe the original series can't be garbage collected, though the impact is small when cardinality is low because we only reference the first series
  • Disk: IndexEntries needs to store series information like name and tags as well; one easy way is to encode the meta as JSON and put Name as "__name__": "cpu.idle", but if we do tags["__name__"] = name it will change the original series as well
// NewIntSeriesStore creates a IntSeriesStore
func NewIntSeriesStore(s common.IntSeries) *IntSeriesStore {
	series := common.NewIntSeries(s.Name)
	series.Tags = s.Tags
	series.Precision = s.Precision
	// TODO: maybe we should copy the points if any
	series.Points = make([]common.IntPoint, 0, initPointsLength)
	return &IntSeriesStore{series: *series, length: 0}
}

Possible solutions are

  • add GetTagsCopy to return a copy of the map[string]string (see the sketch after this list)
  • add GetMetaSeries to return a new MetaSeries, which can be embedded in SeriesStore and IndexEntries e4972f6
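
A sketch of the first option, assuming the Tags field layout shown above:

// GetTagsCopy returns a fresh map, so callers like IndexEntries can add
// "__name__" without mutating the original series.
func (s *IntSeries) GetTagsCopy() map[string]string {
	tags := make(map[string]string, len(s.Tags))
	for k, v := range s.Tags {
		tags[k] = v
	}
	return tags
}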

In Memory inverted index

Related to #25

TODO

NOTE

  • ca30754 was planning to use tagKey as the map key, map[string]InvertedIndex, with tagValue as the term for InvertedIndex, but this won't work if a key has more than one value; one solution is map[string][]InvertedIndex, which requires log(n) to find the tagValue, and the list of InvertedIndex must be sorted
  • ca30754 however, when using tagKey=tagValue as the map key and term, map[string]InvertedIndex, you can't query hasKey(key1), so we need to maintain another nested map for tagKey and tagValue, map[string]map[string]bool, where the first key is tagKey and the second key is tagValue; bool is used because Go doesn't have a built-in set. https://github.com/dyweb/gommon/blob/master/structure/set.go
  • 6047028 in fact, the intersect is just a join in RDBMS terms, which Pavlo's class has mentioned for sure (see the sketch after this list)
  • 18d9d93 exponential search seems pretty useful when you intersect 1 with [2, 3, 4, 5, 6], since a full binary search needs log(5) while exponential search is just log(1) ....
  • Union also got a paper k-way merging and k-ary sorts http://cs.uno.edu/people/faculty/bill/k-way-merge-n-sort-ACM-SE-Regl-1993.pdf
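
For reference, the intersect itself is just a linear merge of two sorted postings lists; a minimal sketch:

// intersect returns the ids present in both sorted postings lists, O(n+m).
func intersect(a, b []uint64) []uint64 {
	out := make([]uint64, 0)
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}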

Possible Improvement

  • add a fixed-size map as a cache to avoid checking the inverted index's postings (the id list): if an id is already in the postings there is no need to insert it, and lookup is still log(n) using binary search; for a normal workload the cache is likely to hit a lot
  • Union is now using a linear merge, which is O(nk); a heap merge would be better, but the default container/heap might be slow due to function calls. I am also not actually sure Union is correct; it might be that I just haven't figured out the test case that would break it
  • Support regular expression https://swtch.com/~rsc/regexp/regexp4.html

Ref

RESTful API

Related #3, #6

Originally I wanted to provide a gRPC API instead of a JSON API; however, a JSON API is easier to develop and test, and can work with visualization solutions like Grafana

  • API Spec Draft
  • Server struct
  • Mux wrapper
  • middleware
    • handle wrong request methods, i.e. using GET on POST endpoints
    • logging
      • may need some improvements to gommon/log, like disabling color by default, removing the prefix, etc.
  • interact with in memory store #3
  • test and cover

Example and/or e2e test

TL;DR: use Python. The Web UI itself #19 could be a good test for read

Currently e2e testing is done via Postman, which is not portable and not tracked. It would be OK to use Go for the tests, but since the request and response formats change very often at this early stage, a static language is not very flexible. Node.js is cool, but async is just a burden when I want to test things out.

TODO

  • generate fake series
  • test inserting series
  • test read
  • load data from CSV

Ref

System wide metrics collector

Related #10 #19

Need to feed in some real data for test/demo, and the best data is machine metrics

TODO

/proc/stat

/proc/meminfo d413b19

/proc/self

Ref

Support Protocol Buffers

  • use https://github.com/gogo/protobuf instead of official
  • play with protobuf
  • don't copy data from protobuf-generated structs into our own structs
  • migrate old code
    • support caching the series ID; may need to add ID to meta
  • support transport in go-kit

Ref

Protobuf is a fairly standard RPC serialization tool and is widely used; TiDB also uses protobuf as its serialization tool. When large volumes of data and large numbers of requests are involved, RPC efficiency is critical. We started with golang/protobuf, but while profiling we found the speed was not ideal and it allocated a lot of memory. Later we came across the gogo/protobuf project, which provides faster encoding and decoding, and for primitive types you can set the nullable=false tag, which reduces the number of pointers and the number of allocations, and also makes the code more convenient to write.

Simple Web UI

Using graph libs directly is faster than using Grafana

Waiting on #21 #9 (I don't think the read API is working)

TODO

  • serve static content using xkd
  • front end framework
  • single graph
    • static graph de024ab
    • streaming graph
  • multiple graph (use Vue components maybe)
  • adapt to change in series #37 , type, precision etc.

Chart (for drawing bench result)

system metrics reported by the collector always have a length of 10

Related #21

When using xkc to report system metrics, the series length is always 10, and the collector buffers 10 points for each series.

DEBU[0038] mem:data create new entry cpu.6.user in map pkg=k.s.mem 
DEBU[0038] mem:data create new entry cpu.total.nice in map pkg=k.s.mem 
DEBU[0038] mem:data create new entry cpu.2.system in map pkg=k.s.mem 
DEBU[0113] mem:data merge with existing series cpu.0.iowait pkg=k.s.mem 
DEBU[0113] length 10 pkg=k.s.mem 
DEBU[0113] mem:data merge with existing series cpu.5.irq pkg=k.s.mem 
DEBU[0113] length 10 pkg=k.s.mem 

But the test in series_store_test.go shows merging data is handled correctly; one explanation could be that the collector is using the same timestamp

TODO

  • write a test to reproduce this problem
    • reduce number of series
  • every time you restart the collector, the length grows: restart the collector and the length becomes 20, restart again, 30
  • the collector never resets the points for each series
  • the serializer has a problem: it is not reset? Yes
    • but the server does not report any error?

Conclusion

  • the serializer is not reset after each send, and the server silently drops the invalid part when parsing JSON

Cassandra Backend

Naive schema + Tags

  • cli for creating the schema (though actually not very needed) 7dfc37a
  • the current naive schema with tags is WRONG, tags should be in the partition key .... 16b0675
  • gocql can handle map properly
  • but you can't filter using a map; Cassandra considers this a very expensive operation and you must use ALLOW FILTERING
  • read 60cccf7
  • write 16b0675
  • unify the keyspace to avoid sprintf feb07ab
  • split int and double tables feb07ab
  • use a buffer like KairosDB does: write large batches into Cassandra; currently we just use the client's batch size

Can't decode nested array in JSON

Related #9

type readRequest struct {
	StartTime int64          `json:"start_time,omitempty"`
	EndTime   int64          `json:"end_time,omitempty"`
	Queries   []common.Query `json:"queries"`
}

You would always get Queries as empty. I don't know if a nested object would work, but that's not my preferred way; and what would it be for protobuf? Maybe there needs to be a conversion step.

http too many open file

When benchmarking on localhost, I got this error; it could be because the loader and the server are on the same machine. Need to containerize xephon-k to confirm this.

Seems related to traefik/traefik#157 and traefik/traefik#127, though I am not sure whether the MaxIdleConnection per host applies to the server; I know it works for the client.

Unmarshal IndexEntries is empty

Found this when PrintAll was implemented in data_reader.go. After adding more logging, I found the id written is not the id read; we can see the index bytes are all zero except the count.

INFO[0000] Meta:<id:15836903516738211145 type:1 precision:1000000 name:"s" tags:<key:"machine" value:"machine-01" > tags:<key:"os" value:"ubuntu" > > Entries:<Offset:9 BlockSize:34 >  pkg=k.s.disk 
INFO[0000] [10 57 8 201 154 195 202 244 250 254 227 219 1 16 1 24 192 132 61 34 1 115 42 12 10 2 111 115 18 6 117 98 117 110 116 117 42 21 10 7 109 97 99 104 105 110 101 18 10 109 97 99 104 105 110 101 45 48 49 18 4 8 9 16 34] pkg=k.s.disk 
INFO[0000] write: id 15836903516738211145 pkg=k.s.disk 
INFO[0000] write: index offset 0 pkg=k.s.disk 
INFO[0000] write: index length 65 pkg=k.s.disk 
INFO[0000] write: full bits of index pkg=k.s.disk 
INFO[0000] [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] pkg=k.s.disk 
INFO[0000] read: size 153 idx offset 43 idx of idx offset 65 series count 1 pkg=k.s.disk 
INFO[0000] read: full bits of index pkg=k.s.disk 
INFO[0000] [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0] pkg=k.s.disk 
INFO[0000] read: id 4294967296 pkg=k.s.disk 
INFO[0000] read: index offset 0 pkg=k.s.disk 
INFO[0000] read: index length 0 pkg=k.s.disk 
Print all data in /tmp/xephon680946347
size: 153 series count: 1
index size: 85
id: 4294967296 blocks: 0 meta: {%!s(uint64=0) %!s(int64=0) %!s(int64=0)  map[]}
All data printed

The reason is that in func (idx *LocalDataFileIndexWriter) WriteAll we wrote into the marshaled bytes of IndexEntries (b) instead of the pre-allocated index slice.

index := make([]byte, 4+IndexOfIndexUnitLength*len(ids))
binary.BigEndian.PutUint32(index[:4], uint32(len(ids)))
for i, id := range ids {
	b, err := entries.Marshal()
	start := 4 + i*IndexOfIndexUnitLength
	// BUG: the two writes below go into b, the marshaled IndexEntries,
	// instead of the pre-allocated index slice.
	binary.BigEndian.PutUint64(b[start:start+8], uint64(id))
	binary.BigEndian.PutUint32(b[start+8:start+12], uint32(N))
}

After fixing this problem in 430db5f, the print still got empty output. Another strange thing: when ReadAllIndexEntries is called twice, the full bytes are printed twice. The problem is that when looping with range, Go makes a copy of the value, so updates to the value are not reflected in the value stored in the map; the solution is to use a pointer. Does it work if I have the following?

for k, v := range m1 {
	_ = v            // the range value is already a copy
	v2 := m1[k]      // indexing the map also returns a copy of the struct value
	v2.prop1 = "abc" // mutates only the copy
	// m1[k] is unchanged; to update the map, write back: m1[k] = v2
}

INFO[0000] write: IndexEntries Meta:<id:15836903516738211145 type:1 precision:1000000 name:"s" tags:<key:"machine" value:"machine-01" > tags:<key:"os" value:"ubuntu" > > Entries:<Offset:9 BlockSize:34 >  pkg=k.s.disk 
INFO[0000] write: full bytes of IndexEntries pkg=k.s.disk 
INFO[0000] [10 57 8 201 154 195 202 244 250 254 227 219 1 16 1 24 192 132 61 34 1 115 42 12 10 2 111 115 18 6 117 98 117 110 116 117 42 21 10 7 109 97 99 104 105 110 101 18 10 109 97 99 104 105 110 101 45 48 49 18 4 8 9 16 34] pkg=k.s.disk 
INFO[0000] write: id 15836903516738211145 pkg=k.s.disk 
INFO[0000] write: index offset 0 pkg=k.s.disk 
INFO[0000] write: index length 65 pkg=k.s.disk 
INFO[0000] write: full bytes of index pkg=k.s.disk 
INFO[0000] [0 0 0 1 219 199 251 215 73 80 205 73 0 0 0 0 0 0 0 65] pkg=k.s.disk 
INFO[0000] read: size 153 idx offset 43 idx of idx offset 65 series count 1 pkg=k.s.disk 
INFO[0000] read: full bytes of index pkg=k.s.disk 
INFO[0000] [219 199 251 215 73 80 205 73 0 0 0 0 0 0 0 65] pkg=k.s.disk 
INFO[0000] read: id 15836903516738211145 pkg=k.s.disk 
INFO[0000] read: index offset 0 pkg=k.s.disk 
INFO[0000] read: index length 65 pkg=k.s.disk 
INFO[0000] read: full bytes of IndexEntries pkg=k.s.disk 
INFO[0000] [10 57 8 201 154 195 202 244 250 254 227 219 1 16 1 24 192 132 61 34 1 115 42 12 10 2 111 115 18 6 117 98 117 110 116 117 42 21 10 7 109 97 99 104 105 110 101 18 10 109 97 99 104 105 110 101 45 48 49 18 4 8 9 16 34] pkg=k.s.disk 
Print all data in /tmp/xephon608830327
size: 153 series count: 1
index size: 85
INFO[0000] read: full bytes of IndexEntries pkg=k.s.disk 
INFO[0000] [10 57 8 201 154 195 202 244 250 254 227 219 1 16 1 24 192 132 61 34 1 115 42 12 10 2 111 115 18 6 117 98 117 110 116 117 42 21 10 7 109 97 99 104 105 110 101 18 10 109 97 99 104 105 110 101 45 48 49 18 4 8 9 16 34] pkg=k.s.disk 
id: 15836903516738211145 blocks: 0 meta: {%!s(uint64=0) %!s(int64=0) %!s(int64=0)  map[]}
All data printed

Use Series interface for both IntSeries and DoubleSeries

Currently, only IntSeries is supported. While changing the time precision #35 (which goes over all the code), I decided to add double support at the same time. And while working on hashing the series #36, the new Series interface was added, so the API service and storage may all use this interface to avoid duplicate code and/or code generation, at the cost of some conditionals and type casts (whose cost I don't know).

TODO

  • read service #9 a29d4cf
  • write service #8 0db794b
  • in memory store #3
  • double series code is copy and pasted
  • on disk store (which has nothing implemented for now; it's the end of the quarter ....) #32
  • cassandra store #11

Future

  • update the Cassandra backend; just need to figure out a way to keep it from crashing the build for now

Switch to Vagrant from Docker on Fedora

Fedora requires (has):

  • starting the docker service
  • sudo when running an image
  • sudo when a host program connects to Docker-mapped ports
  • strange network problems, e.g. when running the Spark example; may have to do with the default network and security configuration (SELinux?)

So one easy solution is to switch to Vagrant and build a box with Cassandra etc. included; it could also benefit people who want to develop on Windows (if any)

Related xephonhq/xephon-b#4

Read query semantics

Now we need the syntax and semantics for read #9; mainly three problems

Other problems

  • human-friendly syntax, like "last 5 min" instead of epoch timestamps

Ref

Refactor payload serializer

The existing serializer only has one method, WriteInt, which makes it thread-safe since the method uses nothing from the object, but this introduces a problem when we want to write multiple series

A better implementation might be

WriteOpen() { buf.Write("[") }
WriteInt( .....
WriteDouble( ....
WriteClose() { buf.Write("]") }
Reader() // or maybe we can make it a reader closer
Reset()
  • deal with the commas in the JSON array (sketched below)
    • one way is to use a flag to indicate whether this is the first element in the array
    • a better way would be to always add the comma and remove the last one when finalizing
    • might use truncate
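
A minimal sketch using the first-element flag (method names follow the list above; the JSON point shape is an assumption):

import (
	"bytes"
	"fmt"
	"io"
)

type payloadSerializer struct {
	buf   bytes.Buffer
	first bool
}

func (s *payloadSerializer) WriteOpen()        { s.buf.WriteByte('['); s.first = true }
func (s *payloadSerializer) WriteClose()       { s.buf.WriteByte(']') }
func (s *payloadSerializer) Reset()            { s.buf.Reset() }
func (s *payloadSerializer) Reader() io.Reader { return bytes.NewReader(s.buf.Bytes()) }

// WriteInt appends one point, adding a comma before every element
// except the first, so no trailing comma needs to be removed.
func (s *payloadSerializer) WriteInt(t, v int64) {
	if !s.first {
		s.buf.WriteByte(',')
	}
	s.first = false
	fmt.Fprintf(&s.buf, `{"t":%d,"v":%d}`, t, v)
}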

In Memory Store

Though at the start the motivation was to rewrite KairosDB in Go, an in-memory store became very important for the following reasons:

  • needed by tsdb-ql, which is also where the idea comes from
  • improves performance; it can be a write-through/write-back cache
  • if no compression is added, an in-memory store is not very hard to write

TODO

  • hash series, name + sorted tags 6c9eb50
  • write series to correct IntSeriesStore 28ac98a
  • index tags
    • can't figure out a good data structure yet; might use sets and do set joins?
    • now using inverted index, see #27
  • read
    • read all by name
    • read using filter by time and tag
  • flush/drop data periodically or by threshold
  • write to disk #32

Ref

Use integer for series id

Currently, a string is used for the series id, which brings much more overhead for storage and computation. We are now using MD5.

cannot use h (type InlineFNV64a) as type io.Writer in argument to io.WriteString:
InlineFNV64a does not implement io.Writer (Write method has pointer receiver)
  • Add a Hashable interface for both Series (Int and Double) and Query

Ref

  • InfluxDB's implementation is in models, points.go HashID, which uses FNV64a (a sketch follows)
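
A sketch of an FNV64a series id over name + sorted tags, using the standard hash/fnv (whose hashes do implement io.Writer, avoiding the pointer-receiver error above); the exact string layout hashed is an assumption:

import (
	"hash/fnv"
	"io"
	"sort"
)

func seriesID(name string, tags map[string]string) uint64 {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // sorted tags make the id independent of map iteration order
	h := fnv.New64a()
	io.WriteString(h, name)
	for _, k := range keys {
		io.WriteString(h, ","+k+"="+tags[k])
	}
	return h.Sum64()
}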

Command line for metrics collector

  • create the go file in cmd
  • configuration
    • interval
    • batch size
    • Xephon-K
    • KairosDB
    • InfluxDB
  • payload builder
    • DoubleSeries is not supported; everything is int for now
    • the name TimeNano is misleading, since Cassandra's timestamp type only supports milliseconds
    • to add a point to a series, you need to manually write out all the series names, which is a headache
    • IntSeries could be pretty inefficient, since the underlying slice is neither reused nor created with a size hint

HTTP Read Service

Depend on

  • mem data read #3

TODO

  • service and factory
    • real service operation
  • response encode
  • request format
  • request decode
  • logging
  • e2e test

Columnar format in memory representation for series

Currently we are using a row format as the series representation

Pros (of switching to columnar)

  • in the row format each point is an object, which has the overhead of object creation and destruction (should profile this)
  • sometimes you just need the values
  • we could have a table-like representation, where multiple value columns share the same time column

Cons

  • when sorting the row format you can swap whole points; with columns you have to swap entries in two arrays in lockstep, which the sort package does not support
    • but we can inline the sorting code, unless there are special optimizations for the std lib in Go

row based

type IntSeries struct {
	SeriesMeta
	Points []IntPoint
}

type IntPoint struct {
	T int64
	V int64
}

column based

type IntSeriesColumnar struct {
	SeriesMeta
	T []int64
	V []int64
}

type TableSeriesColumnar struct {
	SeriesMeta
	FieldsMap map[string]uint8
	T         []int64
	// Go does support this: a slice of slices, where each inner slice is
	// a separate allocation (a fixed-size 2-D array would be [n][m]int64)
	IV [][]int64
	DV [][]float64
	SV [][]string
}

Local Disk store

This issue is used for tracking. Need to persist both the tag index and the data to disk (since I am taking a storage class this quarter)

Data File

Index File

  • Might just rebuild the inverted index in memory from the data file for now? I don't want to write a search engine (for now)

Legacy

Gogland refactor messed up the doc

When renaming index -> indexLength, it updated a lot of comments, including the doc for the file format.

  • searching for indexLength should find most (all?) of them
  • search for indexOffset
  • search for seriesBlock
  • cassandra.md, known in the cassandra backend .... saw it in an error, caused by renaming the markdown file for benchmark

Bench

Need to build something like a small xephon-b

DBs

  • InfluxDB #13
  • Xephon-K 9a81c05
    • the API allows different series in one batch, but only one series is used for now
  • KairosDB #16
  • Prometheus #17
    • have to use the push gateway; it seems there is no way to push to Prometheus directly over HTTP?
  • TimeScale https://github.com/timescale/timescaledb
    • the data model is a little bit different

Reporter

  • basic reporter #18

TODO

  • use simple random for simplicity
    • use constant and fixed time interval 71d65ea
  • support non-fixed intervals
  • generate data on the fly, no load and store
  • #14 collect response time and store in TSDB (might use other TSDB that supports average etc. for this, since we haven't implemented them yet)
  • serializer
  • generate specific amount of data
  • load at a fixed rate
  • load at fixed time
  • (optional) load from csv

Docker

Backoff

Dataset

Basic reporter

Related #12 #14

  • total request body size (might exceed int64? use uint64?)
  • total response body size
  • http response code counts
  • error percentage (in the current tests, errors are not triggered yet, though they should be)
    • how does the Go http client handle non-2xx response codes? treat them as errors?
  • distribution of response time
