terkwood / augustdb
Key/value store backed by LSM Tree architecture.
License: MIT License
Seeking through a few KB of data on disk is inexpensive. By keeping a sparse index of entries, we can hold byte offsets for many different keys in memory, even if our total data footprint is large.
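As a rough illustration of the sparse index idea, here is a minimal Python sketch (not the project's Elixir code). It assumes records are sorted by key and stored one per line as `key<TAB>value`. The names `build_sparse_index` and `lookup` and the `every` parameter are hypothetical.

```python
import bisect

def build_sparse_index(records, every=2):
    """Serialize sorted (key, value) pairs and index every `every`-th key.

    Returns (data, index) where index is a sorted list of (key, byte_offset).
    """
    data = bytearray()
    index = []
    for i, (key, value) in enumerate(records):
        if i % every == 0:
            index.append((key, len(data)))  # offset where this record starts
        data += f"{key}\t{value}\n".encode()
    return bytes(data), index

def lookup(data, index, key):
    """Find the nearest indexed key <= key, then scan only that small block."""
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first indexed key
    start = index[pos][1]
    end = index[pos + 1][1] if pos + 1 < len(index) else len(data)
    for line in data[start:end].decode().splitlines():
        k, v = line.split("\t", 1)
        if k == key:
            return v
    return None
```

Only every second key is held in memory here; a real table would index far fewer keys and scan a few KB per lookup.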
Run a GenServer which holds a Map tracking size per key and an int tracking total memtable size. Update the total incrementally. Trigger a flush when the limit is reached. Clear the GenServer state on flush.
Make sure you size both the key and the value
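The bookkeeping described above could look something like this. It is a Python sketch of the logic only, not a GenServer. The class name `MemtableSizer` and its methods are hypothetical.

```python
class MemtableSizer:
    """Tracks per-key sizes and a running total, as the note above suggests."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.sizes = {}   # key -> bytes used by key + value
        self.total = 0    # running total, updated incrementally

    def record_write(self, key: bytes, value: bytes) -> bool:
        """Account for a write; return True when a flush should be triggered."""
        new_size = len(key) + len(value)  # size both the key and the value
        # Overwrites replace the old size instead of adding to it.
        self.total += new_size - self.sizes.get(key, 0)
        self.sizes[key] = new_size
        return self.total >= self.limit_bytes

    def clear(self):
        """Reset all state after the memtable has been flushed."""
        self.sizes.clear()
        self.total = 0
```

Keeping the per-key map is what makes overwrites cheap to account for: the total never needs to be recomputed from scratch.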
For #51 (binary SSTables), we need to consider this new section of code
Read the Phoenix testing manual https://hexdocs.pm/phoenix/testing.html
Or you could try to write some integration tests with https://github.com/boydm/phoenix_integration
Maybe try using finch for http requests in a standalone app https://github.com/keathley/finch
Alternatively you could write this in rust and leverage https://bheisler.github.io/criterion.rs/book/getting_started.html
Finally, there's trusty ol' JMeter https://jmeter.apache.org/
Test Zip, Memtable using https://elixirschool.com/en/lessons/libraries/benchee/
Use consistent hashing and vnodes to partition data
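A minimal sketch of a hash ring with virtual nodes, in Python for illustration. The `Ring` class, the `vnodes` default, and the use of MD5 as the ring hash are all assumptions, not anything the project has settled on.

```python
import bisect
import hashlib

def _hash(s: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of its MD5)."""
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class Ring:
    def __init__(self, nodes, vnodes=64):
        # Each physical node owns `vnodes` points scattered around the ring.
        self.points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.points]

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next vnode point."""
        i = bisect.bisect_right(self.hashes, _hash(key)) % len(self.points)
        return self.points[i][1]
```

When a node joins, only the keys that now land on one of its vnode points move; every other key stays where it was.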
Unexpected failures such as power loss can cause partial writes. Store a checksum of each key/value record in the commit log. Then, on replay, we can detect malformed commit log records and discard them.
Correct the tabs and newlines
Write to memtable
Once we've transitioned to binary storage in SSTables, we can stop unescaping values in the web render layer
Accept JSON content types (the value is just a string! 422 if it can't be converted to a binary)
See SSTable and Compaction modules
It's currently a list
You can remove the Enum.find calls
Use :file.position
CommitLog.replay() on app startup
append doesn't bother with it. Does NimbleCSV parsing step on some values, e.g. \n?
Part of #17
Load them at app startup
Finish #46 first
Compress the blocks referred to by sparse index #46
Erlang stdlib has an interface to zlib, which looks like a good starting point. Use gzip method so that we can compare checksums of uncompressed payloads to the footer bytes of compressed payloads (#79 & #80).
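To show why the gzip method helps here: a gzip stream ends with an 8-byte trailer holding the CRC-32 of the uncompressed data, followed by its length mod 2^32, both little-endian. The Python sketch below reads that trailer. It assumes the checksum in #79 and #80 is a CRC-32, which this page does not confirm, and the function names are hypothetical.

```python
import gzip
import struct
import zlib

def gzip_block(payload: bytes) -> bytes:
    """Compress one block with the gzip container format."""
    return gzip.compress(payload)

def footer_fields(compressed: bytes):
    """Return (crc32, isize) from the 8-byte gzip trailer."""
    return struct.unpack("<II", compressed[-8:])

def footer_matches(compressed: bytes, expected_crc32: int) -> bool:
    """Compare a stored CRC-32 of the uncompressed payload to the trailer."""
    crc, _isize = footer_fields(compressed)
    return crc == expected_crc32
```

This lets a block be checked against a stored checksum without decompressing it first.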
This will effectively move any leftover commit.log entries into an SSTable
DB can crash while writing a record to commit log
Add checksum field to commit log
Detect and discard records which do not match checksum
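One way the checksum field could work, sketched in Python. The record layout (CRC-32, then key and value lengths, then the bytes) is an assumption made for illustration, and `encode_record` and `replay` are hypothetical names.

```python
import struct
import zlib

def encode_record(key: bytes, value: bytes) -> bytes:
    """Layout: crc32(body) | key_len | value_len | key | value (big-endian)."""
    body = struct.pack(">II", len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body)) + body

def replay(log: bytes):
    """Return the records that verify; stop at the first bad or partial one."""
    records = []
    pos = 0
    while pos + 12 <= len(log):
        crc, klen, vlen = struct.unpack_from(">III", log, pos)
        end = pos + 12 + klen + vlen
        if end > len(log):
            break  # truncated tail: the write never finished
        if zlib.crc32(log[pos + 4:end]) != crc:
            break  # corrupt record: lengths after this point can't be trusted
        key_start = pos + 12
        records.append((log[key_start:key_start + klen], log[key_start + klen:end]))
        pos = end
    return records
```

This sketch discards everything from the first bad record onward, which fits the crash-during-append case where the damage is at the tail of the log.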
Insight from aphyr: the trouble with timestamps: Cassandra as a distributed system doesn't provide much of a time ordering guarantee.
If you find a :tombstone, stop. Otherwise keep searching backwards through time until you run out of files.
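The search order can be sketched like this in Python, with each table modeled as a dict and tables ordered newest first. The tuple shapes mirror the `{ :value, bin, time }` and `{ :tombstone, time }` forms used elsewhere on this page; the `lookup` function itself is hypothetical.

```python
def lookup(key, tables):
    """Search tables from newest to oldest.

    Each table maps key -> ("value", data, time) or ("tombstone", time).
    Returns the first entry found, or ("none",) if no table has the key.
    """
    for table in tables:  # newest first, i.e. backwards through time
        entry = table.get(key)
        if entry is None:
            continue  # not in this table, keep looking in older ones
        # Value or tombstone: either way the newest entry wins, so stop here.
        return entry
    return ("none",)
```

Returning the tombstone as its own result, rather than folding it into "none", is what lets the caller know the search ended early.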
Always return 204
curl -X PUT -d value='no no "try dbl" \nno\t\t new\n meh' http://localhost:4000/api/values/3
then restart the app and when commit.log is read, it will crash
[error] Task #PID<0.417.0> started from AugustDb.Supervisor terminating
** (NimbleCSV.ParseError) unexpected escape character " in "3\tno no \"try dbl\" \\nno\\t\\t new\\n meh\t-576460747596327185\n"
(august_db 0.1.0) deps/nimble_csv/lib/nimble_csv.ex:422: CommitLogParser.separator/5
ditch TSV entirely: #78
You could define your own
Stop using TSV for SSTable storage. Use a binary format with explicit lengths for keys and values.
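A length-prefixed layout avoids escaping altogether, since tabs, quotes and newlines are just bytes. A minimal Python sketch follows, assuming 4-byte big-endian lengths; the field widths and function names are hypothetical and this is not the format the project settled on.

```python
import struct

HEADER = struct.Struct(">II")  # key length, value length (4 bytes each)

def encode_entry(key: bytes, value: bytes) -> bytes:
    """Write explicit lengths, then the raw key and value bytes."""
    return HEADER.pack(len(key), len(value)) + key + value

def decode_entries(buf: bytes):
    """Yield (key, value) pairs by reading lengths, never by scanning for separators."""
    pos = 0
    while pos < len(buf):
        klen, vlen = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        key = buf[pos:pos + klen]
        pos += klen
        value = buf[pos:pos + vlen]
        pos += vlen
        yield key, value
```

The value from the failing curl example elsewhere on this page round-trips unchanged under this scheme.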
seek is now broken due to #29
The index is stored as an Erlang term-to-binary
write the index as a separate file
Needs +1 for the offsets
https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archGossipAbout.html
Using SSTables, it takes a long time to determine that a certain record does not exist. In the case where there is neither a value nor a tombstone associated with a key, you need to read through all SSTables before you can return a negative result.
You can use a bloom or cuckoo filter to speed up queries for kv pairs which don't exist. These probabilistic data structures allow you to (mostly) determine set membership.
When the set membership test returns false, you can rely on the result. The K/V pair definitely does not exist.
When the set membership test returns true, there's a possibility that it's a false positive -- it may not be in the given table.
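A small Bloom filter sketch in Python shows both outcomes. The sizes, the use of SHA-256 to derive bit positions, and the class and method names are illustrative assumptions.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        """Derive num_hashes bit positions by salting the key with an index."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

With one filter per SSTable, a read can skip every table whose filter answers False and only open the ones that answer True.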
https://stackoverflow.com/a/39331778
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlAboutReads.html
Differentiate from :none so that you know to stop searching in other tables
See #18
-576460585406903497
-576460584406903735
-576460583406900449
-576460582406743977
-576460581406902914
-576460580406900571
-576460579406903413
-576460578406903531
-576460577406903611
-576460576406902549
-576460575406903716
-576460574406903760
-576460573406923925
-576460572406643840
-576460571406898130
-576460570406895297
-576460569406901522
https://elixir-lang.org/getting-started/mix-otp/agent.html#agents
Data should be either
{ :value, bin, time }
{ :tombstone, time }
Do not ignore tombstones. We need them to implement deletes.
Do not ignore time.
Child of #18