Seeking through a few KB of data on disk is inexpensive. By keeping a sparse index of entries, we can hold byte offsets for many different keys in memory, even if our total data footprint is large.
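
A minimal sketch of the lookup side, assuming the sparse index is a key-sorted list of `{key, byte_offset}` tuples. The module and function names are hypothetical, not taken from the project.

```elixir
defmodule SparseIndexSketch do
  # `index` holds only some of the table's keys, sorted ascending,
  # each paired with the byte offset where its record starts.
  # Returns the offset of the greatest indexed key <= `key`,
  # which is where a short forward scan on disk should begin.
  def seek_offset(index, key) do
    index
    |> Enum.take_while(fn {indexed_key, _offset} -> indexed_key <= key end)
    |> List.last()
    |> case do
      nil -> :none
      {_indexed_key, offset} -> {:ok, offset}
    end
  end
end

index = [{"apple", 0}, {"mango", 4096}, {"tomato", 8192}]
SparseIndexSketch.seek_offset(index, "peach")
```

A key that sorts before the first indexed key cannot be in the table, so the sketch returns `:none` without touching disk.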

Flush memtable when it reaches multiple megabytes

strategy

Run a GenServer which holds a Map tracking size per key and an integer tracking total memtable size. Update the total incrementally. Trigger a flush when the total reaches the limit. Clear the GenServer state on flush.

Make sure you size both the key and the value
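
The strategy above could be sketched roughly like this. The module name, the `track/3` API and the 4 MB default are assumptions, since the issue only says "multiple megabytes".

```elixir
defmodule MemtableSizerSketch do
  use GenServer

  # Hypothetical default limit; tune to taste.
  @limit_bytes 4 * 1024 * 1024

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, Keyword.get(opts, :limit, @limit_bytes))
  end

  # Returns :ok, or :flush once the tracked total has reached the limit.
  def track(pid, key, value), do: GenServer.call(pid, {:track, key, value})
  def total(pid), do: GenServer.call(pid, :total)
  def clear(pid), do: GenServer.call(pid, :clear)

  @impl true
  def init(limit), do: {:ok, %{sizes: %{}, total: 0, limit: limit}}

  @impl true
  def handle_call({:track, key, value}, _from, state) do
    # Size both the key and the value.
    new_size = byte_size(key) + byte_size(value)
    old_size = Map.get(state.sizes, key, 0)
    # Incremental update: swap the old size of this key for the new one.
    total = state.total - old_size + new_size
    state = %{state | sizes: Map.put(state.sizes, key, new_size), total: total}
    reply = if total >= state.limit, do: :flush, else: :ok
    {:reply, reply, state}
  end

  def handle_call(:total, _from, state), do: {:reply, state.total, state}

  def handle_call(:clear, _from, state) do
    {:reply, :ok, %{state | sizes: %{}, total: 0}}
  end
end
```

The caller is expected to flush the memtable when it sees `:flush` and then call `clear/1`.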

this ticket creates some debt

For #51 (binary SSTables), we need to consider this new section of code

integration tests in the existing project

Read the Phoenix testing manual https://hexdocs.pm/phoenix/testing.html

Or you could try to write some integration tests with https://github.com/boydm/phoenix_integration

end to end

Maybe try using finch for http requests in a standalone app https://github.com/keathley/finch

Alternatively you could write this in rust and leverage https://bheisler.github.io/criterion.rs/book/getting_started.html

Finally, there's trusty ol' JMeter https://jmeter.apache.org/

unit test benchmarks

Test Zip, Memtable using https://elixirschool.com/en/lessons/libraries/benchee/

faker data

https://github.com/elixirs/faker#usage

Partitioning

Use consistent hashing and vnodes to partition data
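
A rough sketch of a hash ring with virtual nodes, to make the idea concrete. Everything here (module name, `:erlang.phash2` as the hash, the ring size) is an illustrative assumption, not a design decision recorded in this issue.

```elixir
defmodule RingSketch do
  # :erlang.phash2/2 accepts a range up to 2^32.
  @ring_size 4_294_967_296

  # Each physical node is placed on the ring `vnodes` times.
  def build(nodes, vnodes) do
    points =
      for node <- nodes, v <- 1..vnodes do
        {:erlang.phash2({node, v}, @ring_size), node}
      end

    Enum.sort(points)
  end

  # A key belongs to the first vnode at or after its hash, wrapping around.
  def owner(ring, key) do
    hash = :erlang.phash2(key, @ring_size)

    case Enum.find(ring, fn {position, _node} -> position >= hash end) do
      {_position, node} -> node
      nil -> ring |> hd() |> elem(1)
    end
  end
end

ring = RingSketch.build([:node_a, :node_b, :node_c], 8)
RingSketch.owner(ring, "some key")
```

The useful property is that removing a node only reassigns the keys that node owned; keys owned by the remaining nodes stay put.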

Store checksum in commit log

An unexpected failure such as power loss can cause partial writes. Store a checksum of each key/value record in the commit log. Then, on replay, we can detect malformed commit log records and discard them.
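
One possible shape for this, as a sketch only. The record layout (4-byte CRC32, 4-byte length, payload) and the use of `:erlang.term_to_binary` for the payload are assumptions made for the example.

```elixir
defmodule ChecksumRecordSketch do
  # Hypothetical layout: <<crc32::32, payload_length::32, payload>>.
  def encode(key, value) do
    payload = :erlang.term_to_binary({key, value})
    <<:erlang.crc32(payload)::size(32), byte_size(payload)::size(32), payload::binary>>
  end

  # Decodes records in order and stops at the first one that is
  # truncated or fails its checksum, discarding the rest of the log.
  def replay(log, acc \\ [])

  def replay(
        <<crc::size(32), len::size(32), payload::binary-size(len), rest::binary>>,
        acc
      ) do
    if :erlang.crc32(payload) == crc do
      replay(rest, [:erlang.binary_to_term(payload) | acc])
    else
      Enum.reverse(acc)
    end
  end

  def replay(_partial_or_empty, acc), do: Enum.reverse(acc)
end
```

Stopping at the first bad record assumes the damage is a torn write at the tail of the log, which is the power-loss case described above.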

Query SSTables in reverse order of timestamp

HTTP PUT

Accept JSON (string) https://github.com/michalmuskala/jason/blob/64dc894450f2a61f1006ab9e7f0eb43eb0de3e3d/test/encode_test.exs#L22

Write to memtable

Return 204 status

Remove Macro.unescape from value_view.ex

Once we've transitioned to binary storage in SSTables, we can stop unescaping values in the web render layer

#51

HTTP GET

Accept JSON content types (just a string! Return 422 if the binary cannot be represented as a JSON string)

https://github.com/michalmuskala/jason/blob/64dc894450f2a61f1006ab9e7f0eb43eb0de3e3d/test/encode_test.exs#L22

Reconstruct memtable from commit log

Write idx files as map

See SSTable and Compaction modules

It's currently a list

You can remove the Enum.find calls

Run as a single node

Implement the log-structured merge tree strategy defined in Kleppmann and run it on a single computer.

SSTables

Other Big Stuff

Necessary Small Stuff

#45

SSTable read: Seek to position in file

Use :file.position

https://stackoverflow.com/a/37121804
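
A small sketch of seeking with `:file.position/2` before reading. The helper name `read_at/3` and the temp file are made up for the example.

```elixir
defmodule SeekSketch do
  # Reads `length` bytes starting at byte `offset` of the file at `path`.
  def read_at(path, offset, length) do
    {:ok, fd} = :file.open(path, [:read, :binary, :raw])

    try do
      # Jump straight to the offset instead of reading from the start.
      {:ok, ^offset} = :file.position(fd, offset)
      :file.read(fd, length)
    after
      :file.close(fd)
    end
  end
end

path = Path.join(System.tmp_dir!(), "seek_sketch.bin")
File.write!(path, "aaaabbbbcccc")
{:ok, chunk} = SeekSketch.read_at(path, 4, 4)
File.rm!(path)
```

Here `chunk` is the four bytes starting at offset 4, which is how an offset taken from the index would be used.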

Append

Append to commit log before writing to memtable
Manage tombstones
Write TSV
Record monotonic time

Replay

Call CommitLog.replay() on app startup
Read every value since the beginning of the log, and push each one into memtable.
Read TSV
Do not update SSTable files
test escaping -- append doesn't bother with it. Does NimbleCSV parsing step on some values, e.g. \n?

Discard on Memtable flush

#53

Background

An amusing article on Commit Log from Knoldus.

write tombstone in memtable.flush

Part of #17

Keep all sparse indices in memory

Load them at app startup

Finish #46 first

CommitLog: use binary file format

For simplicity's sake, you should finish #52 before reworking the file format.

Closing this issue resolves #56.

SS table compression

Compress the blocks referred to by sparse index #46

design

Erlang stdlib has an interface to zlib, which looks like a good starting point. Use gzip method so that we can compare checksums of uncompressed payloads to the footer bytes of compressed payloads (#79 & #80).
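
To illustrate the footer comparison: a gzip stream ends with an 8-byte trailer holding the CRC-32 of the uncompressed data followed by its length, both little-endian. A sketch, with a hypothetical module name:

```elixir
defmodule GzipFooterSketch do
  # Extracts the CRC-32 stored in the last 8 bytes of a gzip payload.
  def footer_crc(gzipped) do
    body_size = byte_size(gzipped) - 8

    <<_body::binary-size(body_size), crc::little-integer-size(32),
      _uncompressed_size::little-integer-size(32)>> = gzipped

    crc
  end

  # True when the footer matches a checksum of the uncompressed payload.
  def matches?(payload, gzipped), do: :erlang.crc32(payload) == footer_crc(gzipped)
end

payload = "key1\tvalue1\n"
gzipped = :zlib.gzip(payload)
GzipFooterSketch.matches?(payload, gzipped)
```

This means a checksum computed on the raw payload can be checked against the compressed block without decompressing it.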

tasks

rework dump : store gzipped payload
rework compaction : store gzipped payload
rework query

prereqs

#51

Flush memtable on application startup

This will effectively move any leftover commit.log entries into an SSTable

CommitLog: Use checksum to recover from partially written records

DB can crash while writing a record to commit log

Add checksum field to commit log

Detect and discard records which do not match checksum

Distributed DB

Insight from aphyr: the trouble with timestamps: Cassandra as a distributed system doesn't provide much of a time ordering guarantee.

first: #39
later: #40

Query all SSTables

If you find a :tombstone, stop. Otherwise keep searching backwards through time until you run out of files.
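
A sketch of that search loop. The `lookup` function passed in is a stand-in for whatever queries a single SSTable, and the `{:value, v}` / `:tombstone` / `:none` return shapes are assumed for the example.

```elixir
defmodule QueryAllSketch do
  # `tables` must be ordered newest first.
  # Stops at the first table that knows the key, whether it holds
  # a value or a tombstone; returns :none if no table has it.
  def query_all(tables, key, lookup) do
    Enum.reduce_while(tables, :none, fn table, _acc ->
      case lookup.(table, key) do
        :none -> {:cont, :none}
        found -> {:halt, found}
      end
    end)
  end
end

# Plain maps stand in for SSTable files here.
lookup = fn table, key -> Map.get(table, key, :none) end
tables = [%{"a" => :tombstone}, %{"a" => {:value, "old"}, "b" => {:value, "1"}}]
QueryAllSketch.query_all(tables, "a", lookup)
```

In this example the newer tombstone for "a" hides the older value, so the older table is never consulted for that key.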

SSTable compaction

Run periodically: use Jose Valim's code instead of :timer.apply_interval
Merge N tables by streaming the files side by side. Look at the first key in each file, copy the lowest key to the output file, and repeat
Do nothing if there is only one table, or no tables
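
The merge step could look roughly like the following, using in-memory lists in place of file streams. Keeping the record from the newest table when a key appears more than once is an assumption added here; the issue text only describes picking the lowest key.

```elixir
defmodule MergeSketch do
  # `tables` is a list of key-sorted [{key, value}] lists, newest table first.
  def merge(tables) do
    case Enum.reject(tables, &(&1 == [])) do
      [] ->
        []

      live ->
        # Look at the first key in each table and pick the lowest.
        lowest = live |> Enum.map(fn [{key, _} | _] -> key end) |> Enum.min()

        # The first (newest) table whose head is that key supplies the record.
        winner =
          Enum.find_value(live, fn [{key, _} = record | _] ->
            if key == lowest, do: record
          end)

        # Advance every table whose head is that key, dropping stale copies.
        advanced =
          Enum.map(live, fn
            [{^lowest, _} | rest] -> rest
            table -> table
          end)

        [winner | merge(advanced)]
    end
  end
end

MergeSketch.merge([[{"a", "new"}, {"c", "3"}], [{"a", "old"}, {"b", "2"}]])
```

A real implementation would stream from files and write to an output file rather than build a list, but the head-comparison logic is the same.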

Test with JMeter

http://jmeter.apache.org/

HTTP DELETE

Always return 204

escaped double quotes break in commit.log

problem

curl -X PUT  -d value='no no  "try dbl" \nno\t\t new\n meh'  http://localhost:4000/api/values/3

then restart the app and when commit.log is read, it will crash

[error] Task #PID<0.417.0> started from AugustDb.Supervisor terminating
** (NimbleCSV.ParseError) unexpected escape character " in "3\tno no  \"try dbl\" \\nno\\t\\t new\\n meh\t-576460747596327185\n"
    (august_db 0.1.0) deps/nimble_csv/lib/nimble_csv.ex:422: CommitLogParser.separator/5

solution

ditch TSV entirely: #78

Tinker with SSTable rep

You could define your own

Touch commit log on startup

Query memtable

https://elixir-lang.org/getting-started/mix-otp/agent.html#agents

Memtable

Use gb_trees as the internal rep.
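
A quick sketch of what a `:gb_trees`-backed memtable might look like. The wrapper module and its function names are hypothetical; the `:gb_trees` calls are from the Erlang stdlib.

```elixir
defmodule TreeMemtableSketch do
  def new, do: :gb_trees.empty()

  # enter/3 inserts the key or overwrites an existing entry.
  def put(tree, key, value), do: :gb_trees.enter(key, {:value, value}, tree)
  def delete(tree, key), do: :gb_trees.enter(key, :tombstone, tree)

  def get(tree, key) do
    case :gb_trees.lookup(key, tree) do
      {:value, entry} -> entry
      :none -> :none
    end
  end

  # Entries come back ordered by key, ready to be written as an SSTable.
  def to_sorted_list(tree), do: :gb_trees.to_list(tree)
end

TreeMemtableSketch.new()
|> TreeMemtableSketch.put("b", "2")
|> TreeMemtableSketch.put("a", "1")
|> TreeMemtableSketch.to_sorted_list()
```

The appeal over a plain Map is that the flush to disk gets its key ordering for free.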

#10
#50
#19 ⚠️
#20 ⚠️
~~delete (tombstone)~~ #22
#38
#41

Binary SSTables

Goal

Stop using TSV for SSTable storage. Use a binary format with explicit lengths for keys and values.

Format

value records

Length of key in bytes
Length of value in bytes
Raw key, not escaped
Raw value, not escaped

tombstone records

Length of key in bytes
-1 to indicate tombstone
Raw key, not escaped
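
A sketch of encoding and decoding this layout with binary pattern matching. The width of the length fields is not specified above, so 32-bit signed big-endian integers are assumed here; the module name is also made up.

```elixir
defmodule BinaryRecordSketch do
  # Value record: key length, value length, raw key, raw value.
  def encode(key, {:value, value}) do
    <<byte_size(key)::signed-integer-size(32),
      byte_size(value)::signed-integer-size(32), key::binary, value::binary>>
  end

  # Tombstone record: key length, -1, raw key.
  def encode(key, :tombstone) do
    <<byte_size(key)::signed-integer-size(32), -1::signed-integer-size(32),
      key::binary>>
  end

  # Decodes one record and returns it with the remaining bytes.
  def decode(
        <<klen::signed-integer-size(32), vlen::signed-integer-size(32),
          key::binary-size(klen), rest::binary>>
      )
      when vlen == -1 do
    {{key, :tombstone}, rest}
  end

  def decode(
        <<klen::signed-integer-size(32), vlen::signed-integer-size(32),
          key::binary-size(klen), value::binary-size(vlen), rest::binary>>
      )
      when vlen >= 0 do
    {{key, {:value, value}}, rest}
  end
end

bytes = BinaryRecordSketch.encode("k", {:value, "a\tb\n"})
BinaryRecordSketch.decode(bytes)
```

Because lengths are explicit, tabs, newlines and quotes in keys or values need no escaping.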

Subtasks

#60

Notes

https://erlang.org/doc/man/file.html#type-fd
Use IO.puts https://hexdocs.pm/elixir/IO.html#puts/2
Use IO.puts https://elixir-lang.org/getting-started/io-and-the-file-system.html

Use some elixir config

Fix memtable seek

seek is now broken due to #29

The index is stored as an erlang term-to-binary

write the index as a separate file

ValueController.show must fall back to SSTables.query_all

Fix SSTables index

Needs +1 for the offsets

Leaderless Replication

https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archGossipAbout.html

conflict resolution options

Use Last Write Wins to resolve conflict
Capture the "happens before" relationship using version vectors (see Kleppmann pp. 187-191)

tasks

#106

Close files in Memtable.flush

Use bloom/cuckoo filters to reduce SSTable reads

Goal

Using SSTables, it takes a long time to determine that a certain record does not exist. In the case where there is neither a value nor a tombstone associated with a key, you need to read through all SSTables before you can return a negative result.

You can use a bloom or cuckoo filter to speed up queries for kv pairs which don't exist. These probabilistic data structures allow you to (mostly) determine set membership.

When the set membership test returns false, you can rely on the result. The K/V pair definitely does not exist.

When the set membership test returns true, there's a possibility that it's a false positive -- it may not be in the given table.
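
To make the two outcomes concrete, here is a toy bloom filter. It is not one of the candidate libraries; the bit count, the number of hashes and the use of `:erlang.phash2` are arbitrary choices for illustration.

```elixir
defmodule BloomSketch do
  import Bitwise

  # Hypothetical sizing: 1024 bits, 3 hash functions.
  @bits 1024
  @hashes 3

  # The filter is a single integer used as a bitset.
  def new, do: 0

  def add(filter, key) do
    Enum.reduce(positions(key), filter, fn pos, acc -> acc ||| 1 <<< pos end)
  end

  # false: the key was definitely never added.
  # true: the key was probably added (false positives are possible).
  def member?(filter, key) do
    Enum.all?(positions(key), fn pos -> (filter &&& 1 <<< pos) != 0 end)
  end

  # Derive several bit positions per key by salting the hash with a seed.
  defp positions(key) do
    for seed <- 1..@hashes, do: :erlang.phash2({seed, key}, @bits)
  end
end

filter = BloomSketch.new() |> BloomSketch.add("k1") |> BloomSketch.add("k2")
BloomSketch.member?(filter, "k1")
```

With one filter per SSTable, a `false` answer lets a query skip that table entirely.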

when to create them in memory

on memtable flush (see the SO answer below)
on compaction (see https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsTuningBloomFilters.html)

Background

https://stackoverflow.com/a/39331778

https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlAboutReads.html

Candidate libraries

https://github.com/gyson/blex -- posted in elixir forum
https://github.com/mpope9/exor_filter
https://github.com/farhadi/cuckoo_filter -- most recent commits

Return explicit :tombstone in SSTable.query

Differentiate from :none so that you know to stop searching in other tables

Accept application/octet-stream for PUT, GET

https://github.com/dwyl/phoenix-content-negotiation-tutorial#9-use-the-contentreply5-in-quotescontroller

https://denvaar.github.io/articles/content_negotiation_and_phoenix.html

https://gist.github.com/Terkwood/b29cae052322706b3272748f7fe50974

Trim commit log

See #18

Use :timer.apply_interval
Wait until a large amount of monotonic time expires
Trim old entries

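Rough example: monotonic time sampled every second (consecutive readings differ by roughly 1,000,000,000, which suggests nanosecond units here)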

-576460585406903497
-576460584406903735
-576460583406900449
-576460582406743977
-576460581406902914
-576460580406900571
-576460579406903413
-576460578406903531
-576460577406903611
-576460576406902549
-576460575406903716
-576460574406903760
-576460573406923925
-576460572406643840
-576460571406898130
-576460570406895297
-576460569406901522

Write to memtable

https://elixir-lang.org/getting-started/mix-otp/agent.html#agents

Data should be either

{ :value, bin, time }
{ :tombstone, time }

Do not ignore tombstones. We need them to implement deletes.

Do not ignore time.
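
A minimal Agent-based sketch of these two entry shapes. The module and function names are assumptions, and a plain Map stands in for whatever structure the memtable ends up using.

```elixir
defmodule AgentMemtableSketch do
  def start_link, do: Agent.start_link(fn -> %{} end)

  # Stores {:value, bin, time} with the current monotonic time.
  def put(pid, key, value) when is_binary(value) do
    Agent.update(pid, &Map.put(&1, key, {:value, value, System.monotonic_time()}))
  end

  # A delete is recorded as {:tombstone, time} rather than removing the key.
  def delete(pid, key) do
    Agent.update(pid, &Map.put(&1, key, {:tombstone, System.monotonic_time()}))
  end

  def query(pid, key), do: Agent.get(pid, &Map.get(&1, key, :none))
end

{:ok, memtable} = AgentMemtableSketch.start_link()
:ok = AgentMemtableSketch.put(memtable, "k", "v")
:ok = AgentMemtableSketch.delete(memtable, "k")
AgentMemtableSketch.query(memtable, "k")
```

Keeping the tombstone in the map is what lets a later flush carry the delete into the SSTables.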

#11

Discard commit log on memtable flush

Child of #18

terkwood / augustdb

augustdb's Issues
