terkwood / augustdb
Key/value store backed by LSM Tree architecture.
License: MIT License
Use consistent hashing and vnodes to partition data
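A minimal sketch of the idea, assuming a ring of integer tokens produced by :erlang.phash2/2. The module name RingSketch, the function names, and the default of 8 vnodes per node are hypothetical and not taken from augustdb.

```elixir
defmodule RingSketch do
  # Hypothetical module for illustration; not part of augustdb.

  # Builds the ring: a list of {token, node} pairs sorted by token,
  # with `vnodes` tokens (virtual nodes) per physical node.
  def build(nodes, vnodes \\ 8) do
    pairs = for node <- nodes, i <- 1..vnodes, do: {token({node, i}), node}
    Enum.sort(pairs)
  end

  # Finds the owner of `key`: the first vnode whose token is >= the key's
  # token, wrapping around to the first vnode when none is found.
  def owner(ring, key) do
    t = token(key)

    case Enum.find(ring, fn {vnode_token, _node} -> vnode_token >= t end) do
      {_vnode_token, node} -> node
      nil -> ring |> hd() |> elem(1)
    end
  end

  # Maps any term onto the range 0..2^32-1.
  defp token(term), do: :erlang.phash2(term, 4_294_967_296)
end
```

Because each node owns several small slices of the ring, adding or removing a node only moves the keys in the slices next to that node's tokens.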
Insight from aphyr: the trouble with timestamps: Cassandra as a distributed system doesn't provide much of a time ordering guarantee.
Always return 204
Differentiate from :none so that you know to stop searching in other tables
Seeking thru a few KB of data on disk is inexpensive. By keeping a sparse index of entries, we can hold byte offsets for many different keys in memory, even if our total data footprint is large.
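A rough sketch of the lookup, assuming the sparse index is held in memory as a list of {key, byte_offset} pairs sorted by key. SparseIndexSketch and seek_offset/2 are hypothetical names.

```elixir
defmodule SparseIndexSketch do
  # Hypothetical module for illustration; not part of augustdb.

  # `index` is a list of {key, byte_offset} pairs sorted by key, holding
  # only a sample of the keys in the table. Returns the offset of the
  # greatest indexed key that is <= `key`, which is where a scan of the
  # file should begin, or :none when `key` sorts before every indexed key.
  def seek_offset(index, key) do
    index
    |> Enum.take_while(fn {k, _offset} -> k <= key end)
    |> List.last()
    |> case do
      nil -> :none
      {_k, offset} -> {:ok, offset}
    end
  end
end
```

The returned offset could then be passed to :file.position/2, followed by a short forward scan until the key is found or passed.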
Use :file.position
Part of #17
Needs +1 for the offsets
You could define your own
CommitLog.replay() on app startup
append doesn't bother with it. Does NimbleCSV parsing step on some values, e.g. \n?
Run a GenServer which holds a Map tracking size per key and an int tracking total memtable size. Update the total incrementally. Trigger a flush when reaching the limit. Clear the GenServer on flush.
Make sure you size both the key and the value
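One way this could look, as a sketch only. The module name MemtableSizerSketch, its function names, and the choice to reply :flush to the caller (rather than triggering the flush itself) are assumptions made for illustration.

```elixir
defmodule MemtableSizerSketch do
  # Hypothetical module for illustration; not part of augustdb.
  use GenServer

  def start_link(limit), do: GenServer.start_link(__MODULE__, limit)

  # Records the size of key + value. Replies :flush once the running
  # total reaches the limit, otherwise :ok.
  def track(pid, key, value), do: GenServer.call(pid, {:track, key, value})
  def total(pid), do: GenServer.call(pid, :total)
  def clear(pid), do: GenServer.call(pid, :clear)

  @impl true
  def init(limit), do: {:ok, %{limit: limit, sizes: %{}, total: 0}}

  @impl true
  def handle_call({:track, key, value}, _from, state) do
    # Size both the key and the value.
    new_size = byte_size(key) + byte_size(value)
    old_size = Map.get(state.sizes, key, 0)
    # Incremental update: subtract the previous entry for this key, if any.
    total = state.total - old_size + new_size
    state = %{state | sizes: Map.put(state.sizes, key, new_size), total: total}
    reply = if total >= state.limit, do: :flush, else: :ok
    {:reply, reply, state}
  end

  def handle_call(:total, _from, state), do: {:reply, state.total, state}

  def handle_call(:clear, _from, state) do
    {:reply, :ok, %{state | sizes: %{}, total: 0}}
  end
end
```

Keeping the per-key sizes in a Map means that overwriting a key does not count it twice.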
For #51 (binary SSTables), we need to consider this new section of code
seek is now broken due to #29
The index is stored as an erlang term-to-binary
write the index as a separate file
https://elixir-lang.org/getting-started/mix-otp/agent.html#agents
Data should be either
{ :value, bin, time }
{ :tombstone, time }
Do not ignore tombstones. We need them to implement deletes
Do not ignore time.
https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archGossipAbout.html
See #18
-576460585406903497
-576460584406903735
-576460583406900449
-576460582406743977
-576460581406902914
-576460580406900571
-576460579406903413
-576460578406903531
-576460577406903611
-576460576406902549
-576460575406903716
-576460574406903760
-576460573406923925
-576460572406643840
-576460571406898130
-576460570406895297
-576460569406901522
Correct the tabs and newlines
Once we've transitioned to binary storage in SSTables, we can stop unescaping values in the web render layer
DB can crash while writing a record to commit log
Add checksum field to commit log
Detect and discard records which do not match checksum
See SSTable and Compaction modules
It's currently a list
You can remove the Enum.find calls
Unexpected failure such as power loss can cause partial writes. Store a checksum of the kv in commit log. Then on replay, we can detect malformed commit log records and discard them
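A sketch of how checksummed records might be written and replayed. The record layout (a CRC32, a payload size, then the payload encoded with :erlang.term_to_binary/1) and the module name CommitLogChecksumSketch are assumptions, not the format augustdb uses.

```elixir
defmodule CommitLogChecksumSketch do
  # Hypothetical module for illustration; not part of augustdb.
  # Assumed record layout: <<crc32::32, payload_size::32, payload::binary>>

  def encode(key, value) do
    payload = :erlang.term_to_binary({key, value})
    <<:erlang.crc32(payload)::32, byte_size(payload)::32, payload::binary>>
  end

  # Returns the {key, value} pairs of every record whose checksum matches.
  def replay(log, acc \\ [])

  def replay(<<crc::32, size::32, payload::binary-size(size), rest::binary>>, acc) do
    if :erlang.crc32(payload) == crc do
      replay(rest, [:erlang.binary_to_term(payload) | acc])
    else
      # Checksum mismatch: discard this record and keep going.
      replay(rest, acc)
    end
  end

  # Empty input, or a trailing partial record left by an interrupted write.
  def replay(_empty_or_partial, acc), do: Enum.reverse(acc)
end
```

A partial write at the tail of the log fails the first clause's pattern match and is dropped by the second clause.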
If you find a :tombstone, stop. Otherwise keep searching backwards thru time until you run out of files.
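The search could be sketched as follows, assuming each table is modeled as a plain Map ordered newest first and that entries use the { :value, bin, time } and { :tombstone, time } shapes. ReadPathSketch and get/2 are hypothetical names.

```elixir
defmodule ReadPathSketch do
  # Hypothetical module for illustration; not part of augustdb.

  # `tables` is a list of maps, newest first. Returns the first entry
  # found for `key`, which may be {:value, bin, time} or {:tombstone, time},
  # or :none when no table has an entry for the key.
  def get(tables, key) do
    Enum.find_value(tables, :none, fn table ->
      case Map.get(table, key, :none) do
        # Not in this table: keep searching older tables.
        :none -> nil
        # A value or a tombstone: stop here.
        found -> found
      end
    end)
  end
end
```

Returning the tombstone itself, rather than :none, lets the caller tell "deleted" apart from "never written".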
Stop using TSV for SSTable storage. Use a binary format with explicit lengths for keys and values.
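A minimal sketch of a length-prefixed record, assuming 32-bit sizes for both key and value. The layout and the module name BinaryRecordSketch are hypothetical and do not describe the format augustdb settled on.

```elixir
defmodule BinaryRecordSketch do
  # Hypothetical module for illustration; not part of augustdb.
  # Assumed layout: <<key_size::32, value_size::32, key::binary, value::binary>>

  def encode(key, value) do
    <<byte_size(key)::32, byte_size(value)::32, key::binary, value::binary>>
  end

  # Decodes one record from the front of `bin` and returns the remainder.
  def decode(
        <<key_size::32, value_size::32, key::binary-size(key_size),
          value::binary-size(value_size), rest::binary>>
      ) do
    {:ok, {key, value}, rest}
  end

  def decode(_incomplete), do: :error
end
```

Because the lengths are explicit, tabs, newlines, and quotes in a value need no escaping.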
Load them at app startup
Finish #46 first
Accept json (just a string! 422 if impossible) and binary content types
Compress the blocks referred to by sparse index #46
Erlang stdlib has an interface to zlib, which looks like a good starting point. Use gzip method so that we can compare checksums of uncompressed payloads to the footer bytes of compressed payloads (#79 & #80).
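A sketch of the footer comparison. A gzip stream ends with 8 bytes: the CRC32 of the uncompressed data, then its size modulo 2^32, both little-endian. GzipFooterSketch and its function names are hypothetical.

```elixir
defmodule GzipFooterSketch do
  # Hypothetical module for illustration; not part of augustdb.

  def compress(block), do: :zlib.gzip(block)

  # Reads the gzip trailer: CRC32 and size of the uncompressed data,
  # both stored as little-endian 32-bit integers.
  def footer(gz) do
    prefix = byte_size(gz) - 8
    <<_::binary-size(prefix), crc::little-size(32), isize::little-size(32)>> = gz
    {crc, isize}
  end

  # True when the uncompressed block agrees with the trailer of `gz`.
  def matches?(block, gz) do
    footer(gz) == {:erlang.crc32(block), rem(byte_size(block), 4_294_967_296)}
  end
end
```

This lets a stored checksum of the uncompressed payload be compared against a compressed block without decompressing it.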
curl -X PUT -d value='no no "try dbl" \nno\t\t new\n meh' http://localhost:4000/api/values/3
then restart the app and when commit.log is read, it will crash
[error] Task #PID<0.417.0> started from AugustDb.Supervisor terminating
** (NimbleCSV.ParseError) unexpected escape character " in "3\tno no \"try dbl\" \\nno\\t\\t new\\n meh\t-576460747596327185\n"
(august_db 0.1.0) deps/nimble_csv/lib/nimble_csv.ex:422: CommitLogParser.separator/5
ditch TSV entirely: #78
Write to memtable
Child of #18
Read the Phoenix testing manual https://hexdocs.pm/phoenix/testing.html
Or you could try to write some integration tests with https://github.com/boydm/phoenix_integration
Maybe try using finch for http requests in a standalone app https://github.com/keathley/finch
Alternatively you could write this in rust and leverage https://bheisler.github.io/criterion.rs/book/getting_started.html
Finally, there's trusty ol' JMeter https://jmeter.apache.org/
Test Zip, Memtable using https://elixirschool.com/en/lessons/libraries/benchee/
This will effectively move any leftover commit.log entries into an SSTable
Using SSTables, it takes a long time to determine that a certain record does not exist. In the case where there is neither a value nor a tombstone associated with a key, you need to read through all SSTables before you can return a negative result.
You can use a bloom or cuckoo filter to speed up queries for kv pairs which don't exist. These probabilistic data structures allow you to (mostly) determine set membership.
When the set membership test returns false, you can rely on the result. The K/V pair definitely does not exist.
When the set membership test returns true, there's a possibility that it's a false positive -- it may not be in the given table.
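A toy bloom filter showing the two outcomes above. It assumes 1024 bit positions, 3 hash functions derived from :erlang.phash2/2, and a MapSet standing in for a real bit array. BloomSketch and all of its parameters are hypothetical.

```elixir
defmodule BloomSketch do
  # Hypothetical module for illustration; not part of augustdb.
  # The set bits are kept in a MapSet for clarity; a real filter would
  # use a compact bit array.
  @bits 1024
  @hashes 3

  def new, do: MapSet.new()

  # Sets every bit position derived from `key`.
  def add(filter, key) do
    Enum.reduce(positions(key), filter, fn pos, acc -> MapSet.put(acc, pos) end)
  end

  # false: the key was definitely never added.
  # true: the key was probably added, but may be a false positive.
  def member?(filter, key) do
    Enum.all?(positions(key), fn pos -> MapSet.member?(filter, pos) end)
  end

  # Derives @hashes bit positions by hashing the key with different seeds.
  defp positions(key) do
    for seed <- 1..@hashes, do: :erlang.phash2({seed, key}, @bits)
  end
end
```

With one filter per SSTable, a false result lets the read skip that table entirely.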
https://stackoverflow.com/a/39331778
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlAboutReads.html