Comments (5)
Now that we've switched to FiloMemTable, need a fast in-memory off-heap storage that can also be persisted. Some options:
Chronicle-java
https://github.com/xerial/larray
@parekuti here are some guidelines for the write-ahead log implementation for the memtable.
Requirements
- Must be able to save the state of FiloAppendStore and FiloMemTable such that it could be restored if a crash happens
- Must be able to save new Filo chunks as appended by FiloAppendStore to disk
- If FiloAppendStore decides to rewrite the most current chunk, this must be handled (instead of appending new chunks, it replaces most recently appended chunks)
If a crash happens, the on-disk file must restore all the state of the FiloAppendStore as well as the partSegKeyMap in the FiloMemTable. However, the thought is that the partSegKeyMap does not need to be preserved on disk because the partition and segment keys for each row could be recovered from the chunks themselves.
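One possible way to handle the "rewrite the most current chunk" requirement is to remember where the last chunk section starts and truncate back to that offset before writing the replacement. The sketch below is only an illustration under that assumption: the class and method names are hypothetical, and it borrows the 0x0002 chunk-section marker and 4-byte length prefix from the file format proposed in this thread.

```python
import io
import struct

SECTION_CHUNKS = 0x0002  # marker for a section holding Filo chunks


class WalAppender:
    """Hypothetical helper; not part of FiloDB."""

    def __init__(self, f):
        self.f = f                      # seekable binary file-like object
        self.last_section_start = None  # offset of the most recent section

    def append(self, chunks):
        # Normal case: add a new chunk section at the end of the file.
        self.f.seek(0, io.SEEK_END)
        self.last_section_start = self.f.tell()
        self._write_section(chunks)

    def replace_last(self, chunks):
        # Rewrite case: drop the most recent section, then write its replacement.
        if self.last_section_start is None:
            raise ValueError("no section to replace")
        self.f.seek(self.last_section_start)
        self.f.truncate()
        self._write_section(chunks)

    def _write_section(self, chunks):
        self.f.write(struct.pack("<H", SECTION_CHUNKS))
        for chunk in chunks:
            # 4-byte little-endian length, then the chunk bytes.
            self.f.write(struct.pack("<I", len(chunk)) + chunk)
        self.f.flush()
```

Truncating keeps the file strictly sequential, so recovery can read it front to back without needing a separate "replaced" marker. Whether that trade-off suits the real implementation is an open question.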
At a higher level, we must be able to restore the state of all the active NodeCoordinatorActors. That means the active and flushing memtables and, for each NodeCoordinatorActor, the dataset, version, and ingestion schema / columns. This needs to be persisted somewhere.
Write-Ahead Log File Format
While the FiloMemTable already uses binary Filo chunks, we still need some file format for containing the chunks. So this is a proposal for the format.
File Header
The file header consists of the following bytes. The + signifies an offset in hex. Everything is written little endian.
- +0000: 8 bytes: The UTF8 for the string "FiloWAL" followed by 0x00
- +0008: 2 bytes: 0x0001 - signifying a header with column definitions
- +000a: 2 bytes: the little-endian number of columns
- +000c: 2 bytes: the number of bytes of column definitions (NN)
- +000e: NN bytes: The output of DataColumn.toString for each column, UTF8-encoded / written using DataWriter.write(string)
- +000e + NN: 2 bytes: 0x0002 - signifying a section holding Filo chunks, should have number of chunks corresponding to the number of columns
- +0010 + NN: 4 bytes: number of bytes of first Filo columnar chunk
- +0014 + NN: first Filo columnar chunk
The above pattern repeats for each columnar chunk.
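To make the offsets concrete, here is a minimal sketch of the layout above using Python's struct module. It is an illustration, not the FiloDB implementation. The proposal does not say how the individual column definition strings are delimited inside the NN bytes, so the sketch assumes a 2-byte length prefix per string. That choice, the function names, and the idea that chunk sections simply repeat until end of file are all assumptions.

```python
import struct

MAGIC = b"FiloWAL\x00"      # +0000: 8 bytes
SECTION_COLUMNS = 0x0001    # header with column definitions
SECTION_CHUNKS = 0x0002     # section holding Filo chunks


def encode_header(column_defs):
    # Assumption: each definition is a 2-byte length plus UTF-8 bytes.
    body = b""
    for definition in column_defs:
        raw = definition.encode("utf-8")
        body += struct.pack("<H", len(raw)) + raw
    # +0008 marker, +000a column count, +000c NN, all little endian.
    fields = struct.pack("<HHH", SECTION_COLUMNS, len(column_defs), len(body))
    return MAGIC + fields + body


def encode_chunk_section(chunks):
    out = struct.pack("<H", SECTION_CHUNKS)
    for chunk in chunks:
        out += struct.pack("<I", len(chunk)) + chunk
    return out


def decode(buf):
    if buf[:8] != MAGIC:
        raise ValueError("not a FiloWAL file")
    marker, num_cols, nn = struct.unpack_from("<HHH", buf, 8)
    if marker != SECTION_COLUMNS:
        raise ValueError("expected column definition header")
    pos = 0x0E + nn  # first chunk section starts right after the definitions
    sections = []
    while pos < len(buf):
        (marker,) = struct.unpack_from("<H", buf, pos)
        if marker != SECTION_CHUNKS:
            raise ValueError("unexpected section marker")
        pos += 2
        chunks = []
        for _ in range(num_cols):  # one chunk per column
            (size,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            chunks.append(buf[pos:pos + size])
            pos += size
        sections.append(chunks)
    return num_cols, sections
```

For example, `decode(encode_header(["a", "b"]) + encode_chunk_section([b"xyz", b"q"]))` yields the column count and a list with one section of two chunks.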
directory structure:
${memtable-wal-dir} / $dataset_$version / $timestamp.wal
Need to store datasets being written somewhere
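As a small illustration of the directory layout above, the path for a WAL file could be built like this. The function name and example values are hypothetical, and the sketch assumes the timestamp is an integer such as epoch milliseconds, which the comment does not specify.

```python
from pathlib import PurePosixPath


def wal_path(wal_dir, dataset, version, timestamp):
    # ${memtable-wal-dir} / $dataset_$version / $timestamp.wal
    return PurePosixPath(wal_dir) / f"{dataset}_{version}" / f"{timestamp}.wal"


# Hypothetical values, for illustration only.
print(wal_path("/tmp/filodb-wal", "mydataset", 0, 1462000000000))
```

Because the dataset and version are encoded in the directory name, listing the WAL directory on restart is one way to discover which datasets were being written, assuming dataset names never contain an underscore followed by digits.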
@parekuti is working on this issue, but for some reason I cannot assign it to her.
The PR for this has been merged.