terkwood / augustdb
Key/value store backed by LSM Tree architecture.

License: MIT License

Elixir 80.45% SCSS 0.70% CSS 11.44% JavaScript 4.70% HTML 2.71%
elixir key-value-store demo lsm-tree sstables phoenix database learning-exercise


augustdb's Issues

Partitioning

Use consistent hashing and vnodes to partition data
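A minimal sketch of how consistent hashing with vnodes could look, assuming each node is hashed onto the ring several times and a key is owned by the first vnode clockwise from its hash (module, ring size, and vnode count are illustrative):

```elixir
defmodule Ring do
  @vnodes_per_node 8
  @ring_space 1_000_000

  # Hash each physical node onto the ring several times (vnodes).
  def build(nodes) do
    vnodes =
      for node <- nodes, v <- 1..@vnodes_per_node do
        {:erlang.phash2({node, v}, @ring_space), node}
      end

    Enum.sort(vnodes)
  end

  # A key is owned by the first vnode clockwise from its hash.
  def owner(ring, key) do
    h = :erlang.phash2(key, @ring_space)

    case Enum.find(ring, fn {point, _node} -> point >= h end) do
      {_point, node} -> node
      # wrap around to the first vnode on the ring
      nil -> ring |> List.first() |> elem(1)
    end
  end
end
```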

Create sparse index for SSTables

Seeking through a few KB of data on disk is inexpensive. By keeping a sparse index of entries, we can hold byte offsets for many different keys in memory, even if our total data footprint is large.
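A minimal lookup sketch, assuming the in-memory index is a sorted list of {key, byte_offset} pairs covering only every Nth key:

```elixir
defmodule SparseIndex do
  # `index` is a sorted list of {key, byte_offset} pairs holding every Nth key.
  # Returns the file offset to start scanning from when looking for `target_key`.
  def seek_offset(index, target_key) do
    index
    |> Enum.take_while(fn {key, _offset} -> key <= target_key end)
    |> List.last()
    |> case do
      nil -> 0
      {_key, offset} -> offset
    end
  end
end
```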

SSTable compaction

  • Run periodically: use José Valim's recommended approach instead of :timer.apply_interval
  • Merge N tables by streaming the files side by side: look at the first key in each file, copy the lowest key to the output file, and repeat (see the sketch after this list)
  • Do nothing if there is only one table, or no tables
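A rough merge sketch, assuming each input table has already been decoded into a sorted list of {key, value} tuples and that newer tables come first so their values win on duplicate keys:

```elixir
defmodule Compaction do
  # Merge sorted lists of {key, value} tuples; on duplicate keys the
  # earliest (newest) input wins.
  def merge(inputs) do
    inputs
    |> Enum.reject(&(&1 == []))
    |> do_merge([])
  end

  defp do_merge([], acc), do: Enum.reverse(acc)

  defp do_merge(inputs, acc) do
    heads = Enum.map(inputs, &hd/1)

    # lowest first key across all inputs
    {low_key, _} = Enum.min_by(heads, fn {key, _value} -> key end)

    # the first (newest) input holding low_key supplies the surviving value
    {_key, value} = Enum.find(heads, fn {key, _value} -> key == low_key end)

    inputs
    |> Enum.map(&Enum.drop_while(&1, fn {key, _value} -> key == low_key end))
    |> Enum.reject(&(&1 == []))
    |> do_merge([{low_key, value} | acc])
  end
end
```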

Commit log

Append

  • Append to commit log before writing to memtable (see the sketch after this list)
  • Manage tombstones
  • Write TSV
  • Record monotonic time
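A rough append sketch, assuming a TSV record of key, value, and monotonic time; the path is illustrative, and escaping and tombstone handling are left out:

```elixir
defmodule CommitLog do
  @path "commit.log"

  # Append a key/value record before the memtable is updated.
  def append(key, value) do
    mono = Integer.to_string(:erlang.monotonic_time())
    line = Enum.join([key, value, mono], "\t") <> "\n"
    File.write!(@path, line, [:append])
  end
end
```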

Replay

  • Call CommitLog.replay() on app startup
  • Read every value since the beginning of the log, and push each one into the memtable (see the sketch after this list).
  • Read TSV
  • Do not update SSTable files
  • Test escaping -- append doesn't bother with it. Does the NimbleCSV parsing step choke on some values, e.g. \n?
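Continuing the CommitLog sketch under Append; Memtable.update/2 is an assumed API, and SSTable files are not touched during replay:

```elixir
def replay do
  "commit.log"
  |> File.stream!()
  |> Enum.each(fn line ->
    [key, value, _mono_time] =
      line |> String.trim_trailing("\n") |> String.split("\t")

    Memtable.update(key, value)
  end)
end
```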

Discard on Memtable flush

Background

An amusing article on Commit Log from Knoldus.

Flush memtable when it reaches multiple megabytes

strategy

Run a GenServer which holds a Map tracking size per key and an int tracking total memtable size. Update the total incrementally. Trigger a flush when reaching the limit. Clear the GenServer state on flush.


Make sure you size both the key and the value
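A minimal sketch of that GenServer, sizing both the key and the value; the byte limit and the Memtable.flush/0 call are assumptions:

```elixir
defmodule MemtableSizer do
  use GenServer

  @limit_bytes 4 * 1024 * 1024

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, %{sizes: %{}, total: 0}, name: __MODULE__)
  end

  def track(key, value), do: GenServer.cast(__MODULE__, {:track, key, value})

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_cast({:track, key, value}, %{sizes: sizes, total: total}) do
    # size both the key and the value
    new_size = byte_size(key) + byte_size(value)
    total = total - Map.get(sizes, key, 0) + new_size

    if total >= @limit_bytes do
      Memtable.flush()
      # clear tracking state once the memtable has been flushed
      {:noreply, %{sizes: %{}, total: 0}}
    else
      {:noreply, %{sizes: Map.put(sizes, key, new_size), total: total}}
    end
  end
end
```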

This ticket creates some debt.

For #51 (binary SSTables), we need to consider this new section of code.

Fix memtable seek

Seek is now broken due to #29.

The index is stored as an Erlang term_to_binary blob.

Write the index as a separate file.

Trim commit log


See #18

Rough example: monotonic time sampled once per second (see the sketch after these values):

-576460585406903497
-576460584406903735
-576460583406900449
-576460582406743977
-576460581406902914
-576460580406900571
-576460579406903413
-576460578406903531
-576460577406903611
-576460576406902549
-576460575406903716
-576460574406903760
-576460573406923925
-576460572406643840
-576460571406898130
-576460570406895297
-576460569406901522
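Values like these could be produced by sampling Erlang monotonic time once per second; they are typically negative and only meaningful relative to each other:

```elixir
for _ <- 1..17 do
  IO.puts(:erlang.monotonic_time())
  Process.sleep(1_000)
end
```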

Write idx files as map

See SSTable and Compaction modules

It's currently a list

You can remove the Enum.find calls
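A small sketch of writing the index as a map so lookups become Map.get/2 (paths and keys are illustrative):

```elixir
# write the sparse index as a map of key => byte offset
index = %{"apple" => 0, "mango" => 4096, "zebra" => 8192}
File.write!("table.idx", :erlang.term_to_binary(index))

# read it back; Map.get/2 replaces the Enum.find call
offset =
  "table.idx"
  |> File.read!()
  |> :erlang.binary_to_term()
  |> Map.get("mango")
```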

Store checksum in commit log

Unexpected failures such as power loss can cause partial writes. Store a checksum of the key/value pair in the commit log. Then on replay, we can detect malformed commit log records and discard them.
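One possible shape, using CRC32 as the checksum (the record layout and module name are assumptions):

```elixir
defmodule CommitLogChecksum do
  # append a CRC32 of the key/value to each record
  def encode(key, value) do
    crc = :erlang.crc32([key, ?\t, value])
    Enum.join([key, value, Integer.to_string(crc)], "\t") <> "\n"
  end

  # on replay, discard any record whose checksum does not match
  def valid?(line) do
    case line |> String.trim_trailing("\n") |> String.split("\t") do
      [key, value, crc] -> :erlang.crc32([key, ?\t, value]) == String.to_integer(crc)
      _ -> false
    end
  end
end
```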

Query all SSTables

If you find a :tombstone, stop. Otherwise keep searching backwards through time until you run out of files.
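A sketch of that lookup order, assuming a per-file SSTable.query/2 that returns {:ok, value}, :tombstone, or :none (names are assumptions):

```elixir
defmodule Query do
  def query_all(sstable_paths, key) do
    sstable_paths
    # newest file first, then keep searching backwards through time
    |> Enum.sort(:desc)
    |> Enum.reduce_while(:not_found, fn path, acc ->
      case SSTable.query(path, key) do
        {:ok, value} -> {:halt, {:ok, value}}
        :tombstone -> {:halt, :not_found}
        :none -> {:cont, acc}
      end
    end)
  end
end
```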

Binary SSTables

Goal

Stop using TSV for SSTable storage. Use a binary format with explicit lengths for keys and values. An encoding sketch follows the record layouts below.

Format

value records

  1. Length of key in bytes
  2. Length of value in bytes
  3. Raw key, not escaped
  4. Raw value, not escaped

tombstone records

  1. Length of key in bytes
  2. -1 to indicate tombstone
  3. Raw key, not escaped
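An encoding/decoding sketch for the layouts above, assuming 32-bit signed big-endian length prefixes (the integer width is an assumption; the issue only fixes the field order):

```elixir
defmodule BinaryRecord do
  # tombstone record: key length, -1, raw key
  def encode(key, :tombstone) do
    <<byte_size(key)::32-signed-big, -1::32-signed-big, key::binary>>
  end

  # value record: key length, value length, raw key, raw value
  def encode(key, value) when is_binary(value) do
    <<byte_size(key)::32-signed-big, byte_size(value)::32-signed-big, key::binary, value::binary>>
  end

  # decode one record off the front of a binary, returning the rest
  def decode(<<key_len::32-signed-big, value_len::32-signed-big, rest::binary>>) do
    <<key::binary-size(key_len), rest::binary>> = rest

    if value_len == -1 do
      {{key, :tombstone}, rest}
    else
      <<value::binary-size(value_len), rest::binary>> = rest
      {{key, value}, rest}
    end
  end
end
```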

Subtasks

Notes

SSTable compression

Compress the blocks referred to by sparse index #46

design

The Erlang stdlib has an interface to zlib, which looks like a good starting point. Use the gzip method so that we can compare checksums of uncompressed payloads to the footer bytes of compressed payloads (#79 & #80).
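A tiny round-trip sketch with the stdlib zlib functions (the block payload shape is illustrative); the gzip trailer already carries a CRC32 and the length of the uncompressed bytes:

```elixir
# gzip a block payload and round-trip it
payload = :erlang.term_to_binary([{"key1", "val1"}, {"key2", "val2"}])
compressed = :zlib.gzip(payload)

# the last 8 bytes of `compressed` are the gzip trailer (CRC32 + size),
# which is what #79/#80 can compare against
^payload = :zlib.gunzip(compressed)
```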

tasks

  • rework dump: store gzipped payload
  • rework compaction: store gzipped payload
  • rework query

prereqs

escaped double quotes break in commit.log

problem

curl -X PUT  -d value='no no  "try dbl" \nno\t\t new\n meh'  http://localhost:4000/api/values/3

Then restart the app; when commit.log is read, it will crash:

[error] Task #PID<0.417.0> started from AugustDb.Supervisor terminating
** (NimbleCSV.ParseError) unexpected escape character " in "3\tno no  \"try dbl\" \\nno\\t\\t new\\n meh\t-576460747596327185\n"
    (august_db 0.1.0) deps/nimble_csv/lib/nimble_csv.ex:422: CommitLogParser.separator/5

solution

ditch TSV entirely: #78

Benchmark Planning

integration tests in the existing project

Read the Phoenix testing manual https://hexdocs.pm/phoenix/testing.html

Or you could try to write some integration tests with https://github.com/boydm/phoenix_integration

end to end

Maybe try using Finch for HTTP requests in a standalone app: https://github.com/keathley/finch

Alternatively, you could write this in Rust and leverage https://bheisler.github.io/criterion.rs/book/getting_started.html

Finally, there's trusty ol' JMeter: https://jmeter.apache.org/

unit test benchmarks

Test Zip and Memtable using Benchee: https://elixirschool.com/en/lessons/libraries/benchee/
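A minimal Benchee run could look like this, assuming Memtable functions to exercise and {:benchee, "~> 1.0", only: :dev} in the mix.exs deps:

```elixir
Benchee.run(%{
  "memtable update" => fn -> Memtable.update("key", "value") end,
  "memtable query" => fn -> Memtable.query("key") end
})
```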

faker data

https://github.com/elixirs/faker#usage

Use bloom/cuckoo filters to reduce SSTable reads

Goal

Using SSTables, it takes a long time to determine that a certain record does not exist. In the case where there is neither a value nor a tombstone associated with a key, you need to read through all SSTables before you can return a negative result.

You can use a bloom or cuckoo filter to speed up queries for kv pairs which don't exist. These probabilistic data structures allow you to (mostly) determine set membership.

When the set membership test returns false, you can rely on the result. The K/V pair definitely does not exist.

When the set membership test returns true, there's a possibility that it's a false positive -- it may not be in the given table.
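A toy bloom filter sketch (not one of the candidate libraries below), using k hashes over a fixed bit space, with a MapSet of set positions standing in for a real bit array:

```elixir
defmodule Bloom do
  @bits 10_000
  @hashes 3

  def new, do: MapSet.new()

  def put(bloom, key) do
    Enum.reduce(positions(key), bloom, &MapSet.put(&2, &1))
  end

  # false means "definitely not present"; true means "possibly present"
  def member?(bloom, key) do
    Enum.all?(positions(key), &MapSet.member?(bloom, &1))
  end

  defp positions(key) do
    for seed <- 1..@hashes, do: :erlang.phash2({seed, key}, @bits)
  end
end
```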

When to create them in memory?

Background

https://stackoverflow.com/a/39331778

https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlAboutReads.html

Candidate libraries
