couchbase / moss
moss - a simple, fast, ordered, persistable, key-val storage library for golang
License: Other
In the segment.kvs array, there's 16^H^H 8 bits that are currently unused / RESERVED for the future, out of the 128 bits for each entry... https://github.com/couchbase/moss/blob/master/segment.go#L149
Maybe we can use those 16^H^H 8 bits somehow. One thought is that as moss is appending dirty segments to the end of the file, perhaps it might opportunistically sometimes decide to compact some of the kvs arrays of the segment stack into a big kvs array. That is, leave all the old, already persisted buf's as-is/untouched, but the latest "compacted kvs" would use (some of?) the 8 bits to refer to the old buf that each entry points to.
The thinking is instead of performing binary search on the multiple kvs arrays down through the segment stack, there'd be only a single kvs array to binary search.
(Update: 16 -> 8 bits (thanks marty!); also if 8 bits ain't enough, then it's possible that keys and vals at runtime might not be near the max-keyLen and max-valLen limits, so that might be another place to get bits; and another place might be the offset into the buf)
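To make the bit-budget discussion concrete, here's a sketch of packing metadata into one of an entry's two uint64 words. The field widths below are illustrative assumptions for the example, not moss's actual layout in segment.go:

```go
package main

import "fmt"

// Illustrative (not moss's actual) packing of half of a 128-bit kvs
// entry: one uint64 holds operation / keyLen / valLen plus 8 reserved
// bits, while the other uint64 (not shown) holds the offset into the
// shared buf. All field widths here are assumptions for the sketch.
const (
	opBits       = 8
	reservedBits = 8 // the spare bits the issue is talking about
	keyLenBits   = 24
	valLenBits   = 24
)

func packMeta(op, keyLen, valLen uint64) uint64 {
	return op<<(reservedBits+keyLenBits+valLenBits) |
		keyLen<<valLenBits |
		valLen
}

func unpackMeta(meta uint64) (op, keyLen, valLen uint64) {
	op = meta >> (reservedBits + keyLenBits + valLenBits)
	keyLen = (meta >> valLenBits) & (1<<keyLenBits - 1)
	valLen = meta & (1<<valLenBits - 1)
	return
}

func main() {
	meta := packMeta(1, 300, 70000)
	op, kl, vl := unpackMeta(meta)
	fmt.Println(op, kl, vl) // 1 300 70000
}
```

With a layout like this, a "compacted kvs" could spend the 8 reserved bits on a buf index (up to 256 old bufs), and scavenge more bits from keyLen/valLen/offset if runtime limits allow, as the update suggests.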
...and then return an appropriate error
Here...
$ go test -v -race ./...
github.com/couchbase/moss.(*collection).updateStats()
/Users/steveyen/go/src/github.com/couchbase/moss/collection.go:427 +0x313
Previous write at 0x00c4204a4000 by goroutine 26:
github.com/couchbase/moss.(*segment).Swap()
/Users/steveyen/go/src/github.com/couchbase/moss/segment.go:274 +0x176
Currently moss store compactions only work for the Basic segment kind since it uses a custom bufferedSectionWriter. Adding an incremental persistence method to the Segment interface will help decouple the compaction logic from the actual implementation of the Segment interface.
This can make moss much more extensible, with custom segment types that support unique application needs such as btrees for faster indexing, compressed segments for smaller file sizes, finite state transducers, bloom filters, etc.
That is, the up-front example code should favor a code sample using OpenStoreCollection() instead of the current caching use-case of NewCollection().
@mschoch I enjoyed your talk on Moss at GopherCon. At the end you pointed out a situation (I couldn't discern exactly when) where a small write led to a lot of copying/write-amplification.
I just wanted to inquire if that issue had been addressed?
On both the regular persistence pathway and the compaction pathway, moss performs writes block-aligned with the starting byte of the to-be-written buffer. But, what about the end of the buffer?
Need to check/review the codepaths to see if moss is writing out block-sized buffers. If not, the filesystem/os might need to perform a wasteful read-modify-write of the last block.
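The fix, if needed, would be to round write lengths up to a block boundary. A minimal sketch, assuming a 4096-byte block size:

```go
package main

import "fmt"

const blockSize = 4096 // assumed filesystem block size

// roundUpToBlock pads a write length up to the next block boundary so
// the final partial block doesn't force the OS/filesystem into a
// wasteful read-modify-write of the last block.
func roundUpToBlock(n int64) int64 {
	return (n + blockSize - 1) / blockSize * blockSize
}

func main() {
	for _, n := range []int64{0, 1, 4096, 4097} {
		fmt.Println(n, "->", roundUpToBlock(n))
	}
	// 0 -> 0, 1 -> 4096, 4096 -> 4096, 4097 -> 8192
}
```

The padded tail bytes would be zeros; the footer already records the true data length, so readers are unaffected.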
mmap() is terrific, leading to a simple moss implementation.
But, due to the magic of page-faulting in data from disk from mmap(), which is an operation that's invisible to the app, the app thread may suddenly become blocked... and the golang scheduler has no idea it's happening and cannot arrange for the app thread to be used for other needs.
Perhaps various OS API's like madvise() and siblings can help manage this situation in more controlled fashion and reduce page faults.
The current restriction which disallows duplicate keys in a batch is potentially problematic for applications.
The reasons for this restriction are:
Ideas:
Add new collection option (possibly the default) which does the following:
Further, if we introduce these NOOPs, the binary-search code might also have to honor them by looking forward/backward.
Also, some segment persisters could also choose to elide these NOOP operations.
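A sketch of that NOOP-aware lookup, assuming an opNOOP marker value and a toy entry layout (both are assumptions for illustration):

```go
package main

import (
	"fmt"
	"sort"
)

// Sketch of the NOOP idea: when a batch contains duplicate keys,
// earlier duplicates are overwritten in place with a NOOP marker, and
// lookups skip over them. opNOOP and the entry layout are made up for
// the example.
const opNOOP = 0

type entry struct {
	op  uint64
	key string
	val string
}

// get binary-searches the sorted entries and, among equal keys,
// returns the last non-NOOP entry, so the batch's final write wins.
func get(entries []entry, key string) (string, bool) {
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].key >= key
	})
	found, val := false, ""
	for ; i < len(entries) && entries[i].key == key; i++ {
		if entries[i].op != opNOOP { // look past NOOP'ed duplicates
			found, val = true, entries[i].val
		}
	}
	return val, found
}

func main() {
	entries := []entry{ // the first "b" write was NOOP'ed by the second
		{1, "a", "1"}, {opNOOP, "b", "old"}, {1, "b", "new"}, {1, "c", "3"},
	}
	v, ok := get(entries, "b")
	fmt.Println(v, ok) // new true
}
```

A persister that elides NOOPs would simply skip `op == opNOOP` entries when writing the segment out.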
I was trying to use moss as a backend store for Dgraph (https://github.com/dgraph-io/dgraph/tree/try/moss). But faced this issue
panic: runtime error: slice bounds out of range
goroutine 348 [running]:
github.com/couchbase/moss.(*segment).FindStartKeyInclusivePos(0xc4201ee000, 0xc42a1cb7e0, 0x16, 0x16, 0x16)
/home/ashwin/go/src/github.com/couchbase/moss/segment.go:313 +0x19b
github.com/couchbase/moss.(*segmentStack).get(0xc4201cac30, 0xc42a1cb7e0, 0x16, 0x16, 0x1e, 0x0, 0x7f7f00, 0xc42006efc0, 0x1f, 0x2a, ...)
/home/ashwin/go/src/github.com/couchbase/moss/segment_stack.go:90 +0x26b
github.com/couchbase/moss.(*segmentStack).Get(0xc4201cac30, 0xc42a1cb7e0, 0x16, 0x16, 0xc4201cac00, 0x6, 0x6, 0x6, 0x0, 0x6)
/home/ashwin/go/src/github.com/couchbase/moss/segment_stack.go:74 +0x75
github.com/couchbase/moss.(*Footer).Get(0xc42006efc0, 0xc42a1cb7e0, 0x16, 0x16, 0x465600, 0x1, 0x6, 0xc424e85910, 0x465182, 0xfc7740)
/home/ashwin/go/src/github.com/couchbase/moss/store_footer.go:426 +0x8a
github.com/couchbase/moss.(*snapshotWrapper).Get(0xc420192fc0, 0xc42a1cb7e0, 0x16, 0x16, 0x0, 0x0, 0xc4200928f0, 0x6, 0x0, 0xc424e85948)
/home/ashwin/go/src/github.com/couchbase/moss/wrap.go:94 +0x62
github.com/couchbase/moss.(*segmentStack).get(0xc42a1f0d20, 0xc42a1cb7e0, 0x16, 0x16, 0xffffffffffffffff, 0x0, 0xc424e85a00, 0x1, 0x1, 0x6, ...)
/home/ashwin/go/src/github.com/couchbase/moss/segment_stack.go:110 +0xa2
github.com/couchbase/moss.(*segmentStack).Get(0xc42a1f0d20, 0xc42a1cb7e0, 0x16, 0x16, 0x0, 0x0, 0x414ea2, 0xc42a1f0cd0, 0x50, 0x48)
/home/ashwin/go/src/github.com/couchbase/moss/segment_stack.go:74 +0x75
github.com/dgraph-io/dgraph/posting.(*List).getPostingList(0xc42a1e9e00, 0x0, 0x0)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/posting/list.go:190 +0x1ef
github.com/dgraph-io/dgraph/posting.(*List).updateMutationLayer(0xc42a1e9e00, 0xc42a1f0cd0, 0x0)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/posting/list.go:263 +0x125
github.com/dgraph-io/dgraph/posting.(*List).addMutation(0xc42a1e9e00, 0xfdec00, 0xc425576e40, 0xc42226c660, 0x5, 0x5, 0xb090c8)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/posting/list.go:340 +0xd7
github.com/dgraph-io/dgraph/posting.(*List).AddMutationWithIndex(0xc42a1e9e00, 0xfdec00, 0xc425576e40, 0xc42226c660, 0x0, 0x0)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/posting/index.go:171 +0x2da
github.com/dgraph-io/dgraph/worker.runMutations(0x7f62912c2280, 0xc425576e40, 0xc4221e0000, 0x3e8, 0x400, 0x0, 0x0)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/worker/mutation.go:50 +0x21a
github.com/dgraph-io/dgraph/worker.(*node).processMutation(0xc42008c000, 0x2, 0x114, 0x0, 0xc420f4c000, 0x92c5, 0xa000, 0x0, 0x0, 0x0, ...)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/worker/draft.go:372 +0x13e
github.com/dgraph-io/dgraph/worker.(*node).process(0xc42008c000, 0x2, 0x114, 0x0, 0xc420f4c000, 0x92c5, 0xa000, 0x0, 0x0, 0x0, ...)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/worker/draft.go:405 +0x36a
created by github.com/dgraph-io/dgraph/worker.(*node).processApplyCh
/home/ashwin/go/src/github.com/dgraph-io/dgraph/worker/draft.go:444 +0x49b
Any help would be appreciated.
Thanks!
Thanks again for Moss!
How production-ready is moss these days? Is it used at Couchbase?
Are there any concerns about data loss?
Thanks!
Persistence is currently handled asynchronously by the background persister goroutine, and if folks want to ensure synchronous persistence, they have to go through backflip dance steps, like hooking up an event callback as seen in various unit tests.
Betcha some folks will just want a simple synchronous persistence API.
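The callback dance looks roughly like this generic sketch (the event kind, types, and method names here are illustrative stand-ins, not moss's actual API):

```go
package main

import "fmt"

// Generic sketch of the "event callback" workaround the unit tests
// use: register a callback, kick off persistence, then block on a
// channel until the background persister reports progress.
type event struct{ kind string }

type store struct {
	onEvent   func(event)
	persisted bool
}

// persistAsync simulates the background persister goroutine.
func (s *store) persistAsync() {
	go func() {
		s.persisted = true // ... data written to disk here ...
		s.onEvent(event{kind: "persister-progress"})
	}()
}

// syncPersist wraps the async path with a channel so callers block
// until persistence has actually happened -- the kind of one-call
// synchronous API the issue is asking moss to provide directly.
func syncPersist(s *store) {
	done := make(chan struct{})
	s.onEvent = func(e event) {
		if e.kind == "persister-progress" {
			close(done)
		}
	}
	s.persistAsync()
	<-done
}

func main() {
	s := &store{}
	syncPersist(s)
	fmt.Println("persisted:", s.persisted)
}
```

Baking this pattern behind a single blocking call (or option) would save every application from re-deriving it.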
This note occurs in the source:
https://github.com/couchbase/moss/blob/master/api.go#L69-L71
See Rob Pike's article for why we shouldn't care:
https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
My recommendation is that we always write moss data little-endian
Occasionally fsync()'ing during compaction (e.g., after every 16MB written) might have a performance impact -- see if an optional parameter for this might help moss?
see http://danluu.com/file-consistency/
...and might also need to fsync the grand-parent dir, if mossStore was the one that actually created the parent dir?
It appears that store.Persist() (and store.Close() ) return before persistence to disk is actually completed.
This creates a race and causes a simple write-then-read test (https://github.com/glycerine/bulletin/blob/master/bulletin_test.go#L10) to fail, unless a print is inserted to slow down the test code.
to reproduce: clone https://github.com/glycerine/bulletin and run go test -v
go test -v
=== RUN Test001BulletinPersistanceToDisk
read/write to persistent disk should work ✘
Failures:
* /Users/jaten/go/src/github.com/glycerine/bulletin/bulletin_test.go
Line 39:
Expected: '[]byte{0x61, 0x20, 0x76, 0x61, 0x6c, 0x75, 0x65}'
Actual: '[]byte(nil)'
(Should resemble)!
1 assertion thus far
--- FAIL: Test001BulletinPersistanceToDisk (0.00s)
FAIL
exit status 1
FAIL github.com/glycerine/bulletin 0.014s
Commenting in the line https://github.com/glycerine/bulletin/blob/master/bulletin.go#L58 slows down the test, causing it to succeed on my MacBook. Hence the code is timing finicky; it should not be. I expected that once Persist() or Close() completed, the data was safely on disk. However, inspection of disk reveals no files written before process shutdown. Hence there is background async processing going on that should be synchronous with the completion of Persist() and/or Close().
Sometimes fsync() can take a while -- might be good to know how long.
Saw this while doing a code review perusal, and wondering if the following allocates garbage...
pair := []uint64{opKlVl, uint64(keyStart)}
From: https://github.com/couchbase/moss/blob/master/store_compact.go#L534
If so, perhaps a pre-allocated pair can be held as part of the compactWriter and reused across multiple calls to compactWriter.Mutate()
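The suggested fix might look like the following sketch, where compactWriter is a stand-in for moss's actual type and the scratch pair lives on the struct instead of being allocated per call:

```go
package main

import "fmt"

// Sketch of the suggested fix: hold the scratch pair on the writer and
// reuse it across Mutate() calls instead of allocating a fresh slice
// each time. compactWriter here is illustrative, not moss's real type.
type compactWriter struct {
	pair [2]uint64 // reused scratch space; no per-call garbage
	out  []uint64  // stand-in for wherever the pair actually goes
}

func (cw *compactWriter) Mutate(opKlVl, keyStart uint64) {
	cw.pair[0], cw.pair[1] = opKlVl, keyStart
	cw.out = append(cw.out, cw.pair[:]...)
}

func main() {
	cw := &compactWriter{}
	cw.Mutate(42, 100)
	cw.Mutate(43, 200)
	fmt.Println(cw.out) // [42 100 43 200]
}
```

Since a `[2]uint64` array is a value field, escape analysis never needs to heap-allocate it, unlike the `[]uint64{...}` composite literal that escapes into the callee.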
This is probably too big for one ticket/issue, but...
Many systems have multiple I/O devices. And, today's moss only writes to a single directory, so isn't able to leverage multi-device I/O concurrency/bandwidth.
Some of those devices will be faster than others (SSD vs HDD).
And, some might be ephemeral (ex: EC2 ephemeral storage) versus long-lived (ex: EBS).
It might be debatable whether a "sufficiently advanced moss of the future" should be aware of such things, or whether such concerns should instead remain (like today) the application's responsibility: sharding their moss usage on their own and handling the movement of data amongst storage hierarchies on their own.
Currently, creating a snapshot copies the segment stack, which means memory allocations and copying of pointers. Instead, perhaps creating a snapshot should just bump a ref-count on an existing segmentStack, which would be treated as immutable (except for the ref-count).
Whenever a writer (such as ExecuteBatch, the background Merger, etc) wants to modify the segmentStack, it should use copy-on-write.
The existing moss implementation actually does parts of the above anyways, so we might be near to that already.
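The scheme could be sketched like this (types are illustrative, not moss's internals, and a real implementation would need atomics or locking around the ref-count):

```go
package main

import "fmt"

// Sketch of the ref-count + copy-on-write idea: snapshots share one
// immutable segmentStack and only bump a counter; writers clone the
// stack first if anyone else still holds a reference.
type segmentStack struct {
	refs     int
	segments []string // stand-in for the real segment slice
}

func (ss *segmentStack) addRef() *segmentStack { ss.refs++; return ss }

func (ss *segmentStack) decRef() { ss.refs-- }

// mutableCopy is the copy-on-write step: clone only when shared.
func (ss *segmentStack) mutableCopy() *segmentStack {
	if ss.refs <= 1 {
		return ss // sole owner: mutate in place
	}
	clone := &segmentStack{refs: 1,
		segments: append([]string(nil), ss.segments...)}
	ss.decRef()
	return clone
}

func main() {
	ss := &segmentStack{refs: 1, segments: []string{"seg0"}}
	snap := ss.addRef() // cheap snapshot: no copying, just a ref bump

	w := ss.mutableCopy() // writer must COW: snap still holds a ref
	w.segments = append(w.segments, "seg1")

	fmt.Println(len(snap.segments), len(w.segments)) // 1 2
}
```

The snapshot stays stable while ExecuteBatch or the background Merger work on their own clone, which matches the copy-on-write behavior described above.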
After an application calls collection.ExecuteBatch(batch), you're supposed to batch.Close() and not reuse it, as mentioned here...
https://godoc.org/github.com/couchbase/moss#Collection
But, it should probably be mentioned more prominently in the README or other doc places.
(See also #14)
Thanks to @mcpaddy for the idea & intro to this term.
(wonder if besides "goodput", there's also "badput" and "uglyput")
I'm using this code to store and retrieve records from moss.
But when I call GetStoredSQSMessages()
it only seems to return the last entry, as opposed to all the entries.
If I run strings data-0000000000000001.moss
I can see all the records I'm expecting, so I know they're somewhere in the moss, but I just can't get at them w/ the iterator.
Can you take a look at my GetStoredSQSMessages method and see if I'm doing anything wrong?
If nothing is obvious, should I try repro'ing this in a unit test? I'm storing the moss in a docker volume mount, so it's possible I'm doing something funny (but like I said, I can see all the records with strings, so it seems to be an iterator problem).
In moss, there might be a large segment (many megabytes) that needs to be written or appended to the end of the file.
Is it more efficient to call pwrite() once with the entire segment's large data buffer, where perhaps the OS/filesystem knows how to efficiently split that up into concurrent I/O channel use...
...or, should moss split the big buffer into multiple, smaller, block-aligned sections, where moss can spawn off multiple goroutines to invoke several pwrite()'s concurrently.
Which one is faster / utilizes more I/O bandwidth or channels?
moss defaults to asynchronous batch persistence, but users should be able to ask for synchronous batch persistence by just flipping on an option.
Currently, running an iterator through all the keys can blow the hot working set out of the OS page cache, potentially ejecting the hot, cached entries that the app might need.
On the other hand, perhaps this is the behavior that the app wanted, where iterating through all keys will happen to warm up entries into the OS page cache.
The idea is to add an optional IteratorOption that allows the user to have a little bit more control, as a hint to the iterator to avoid blowing caches if possible.
For example, perhaps the compactor could use such an option, as although it needs to iterate through all the active data, it's copying them to a new file. So old pages from the soon-to-be-deleted old file really don't need to stick around in cache.
Saw this one time only so far (sporadic)...
=== RUN TestOpenStoreCollection
--- FAIL: TestOpenStoreCollection (0.21s)
store_test.go:1035: expected reopen store to work
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x1180dfd]
goroutine 6028 [running]:
testing.tRunner.func1(0xc4200709c0)
/usr/local/go/src/testing/testing.go:622 +0x29d
panic(0x11cb760, 0x12f0710)
/usr/local/go/src/runtime/panic.go:489 +0x2cf
_/Users/steveyen/dev/couchbase-server.spock/godeps/src/github.com/couchbase/moss.TestOpenStoreCollection(0xc4200709c0)
/Users/steveyen/dev/couchbase-server.spock/godeps/src/github.com/couchbase/moss/store_test.go:1038 +0xebd
testing.tRunner(0xc4200709c0, 0x120e038)
/usr/local/go/src/testing/testing.go:657 +0x96
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:697 +0x2ca
exit status 2
FAIL _/Users/steveyen/dev/couchbase-server.spock/godeps/src/github.com/couchbase/moss 18.562s
To repro:
func TestCloseUnstartedCollection(t *testing.T) {
m, _ := NewCollection(CollectionOptions{})
m.Close()
}
Results in:
go test -v -run TestCloseUnstartedCollection
=== RUN TestCloseUnstartedCollection
fatal error: all goroutines are asleep - deadlock!
How hard would it be to make Close() a no-op if Start() had never been called?
Btw in the README Example I don't see Start() being called, so that code might deadlock too.
To repro: Add a call to m.Start() in the TestOpenStoreCollection test:
func TestOpenStoreCollection(t *testing.T) {
	tmpDir, _ := ioutil.TempDir("", "mossStore")
	defer os.RemoveAll(tmpDir)

	var mu sync.Mutex
	counts := map[EventKind]int{}
	eventWaiters := map[EventKind]chan bool{}

	co := CollectionOptions{
		MergeOperator: &MergeOperatorStringAppend{Sep: ":"},
		OnEvent: func(event Event) {
			mu.Lock()
			counts[event.Kind]++
			eventWaiter := eventWaiters[event.Kind]
			mu.Unlock()
			if eventWaiter != nil {
				eventWaiter <- true
			}
		},
	}

	store, m, err := OpenStoreCollection(tmpDir, StoreOptions{
		CollectionOptions: co,
	}, StorePersistOptions{})
	if err != nil || m == nil || store == nil {
		t.Errorf("expected open empty store collection to work")
	}

	m.Start() // <--------- add this line

	.... etc ...
Will give error:
go test -v -run TestOpenStoreCollection
=== RUN TestOpenStoreCollection
panic: close of closed channel
It seems like a confusing interface to have to call Start() on non-persistent collections, while you can't call Start() on persisted collections. Is this intentional, or should calling Start() work on persisted collections?
The leveled compaction policy decision only looks at the root collection when making its decision in compactMaybe(), so need to doc that up more clearly somewhere.
I noticed that tests fail on non-amd64 architectures. Failure logs are available on Debian's reproducible build project [1]. These failure logs were produced using rev:8ea508f. I've also reproduced this same behavior on rev:61afce4.
--- PASS: TestCollectionStatsClose (0.00s)
=== RUN TestCollectionGet
--- PASS: TestCollectionGet (0.00s)
=== RUN TestMossDGM
--- FAIL: TestMossDGM (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x804931b]
goroutine 1022 [running]:
testing.tRunner.func1(0x1881aa00)
/usr/lib/go-1.8/src/testing/testing.go:622 +0x226
panic(0x81d40c0, 0x82c2248)
/usr/lib/go-1.8/src/runtime/panic.go:489 +0x22a
sync/atomic.LoadUint64(0x186f42bc, 0x0, 0x0)
/usr/lib/go-1.8/src/sync/atomic/asm_386.s:159 +0xb
github.com/couchbase/moss.(*dgmTest).getDGMStats(0x186f4240, 0x23d, 0x0, 0x82c6080)
/build/golang-github-couchbase-moss-0.0~git20170828.0.61afce4/obj-i686-linux-gnu/src/github.com/couchbase/moss/dgm_moss_test.go:469 +0x88
github.com/couchbase/moss.TestMossDGM(0x1881aa00)
/build/golang-github-couchbase-moss-0.0~git20170828.0.61afce4/obj-i686-linux-gnu/src/github.com/couchbase/moss/dgm_moss_test.go:887 +0x2f9
testing.tRunner(0x1881aa00, 0x8204518)
/usr/lib/go-1.8/src/testing/testing.go:657 +0x7e
created by testing.(*T).Run
/usr/lib/go-1.8/src/testing/testing.go:697 +0x242
exit status 2
FAIL github.com/couchbase/moss 0.032s
[1] https://tests.reproducible-builds.org/debian/rb-pkg/buster/i386/golang-github-couchbase-moss.html
Most of the options structs are small, but they must be copied on each invocation, as opposed to simply copying the pointer.
Most applications will be fine creating one set of options and reusing it.
Passing nil, and having the library interpret it as "default" options, is a common pattern in Go.
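The pattern looks like this (Options and open() below are illustrative, not moss's actual API):

```go
package main

import "fmt"

// Sketch of the nil-means-defaults pattern: accept *Options and treat
// nil as "use the package defaults", so callers needn't construct and
// copy an options struct on every invocation.
type Options struct {
	BlockSize int
}

var defaultOptions = Options{BlockSize: 4096}

func open(path string, opts *Options) (int, error) {
	if opts == nil {
		opts = &defaultOptions // nil pointer means "defaults, please"
	}
	return opts.BlockSize, nil
}

func main() {
	bs, _ := open("data.moss", nil)
	fmt.Println(bs) // 4096
	bs, _ = open("data.moss", &Options{BlockSize: 8192})
	fmt.Println(bs) // 8192
}
```

Since the caller's struct is only read through the pointer, applications that do create one set of options can reuse it freely across calls.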
using commit 9fdd764, sometimes seeing go test failures
$ go test
--- FAIL: TestStoreCompaction (0.04s)
store_test.go:747: expected only 1 file, got: 2, fileNames; [data-000000000000001d.moss (33017) data-000000000000001e.moss (37113)]
store_test.go:758: expected seq: 29 to equal numBatches: 30
store_test.go:778: expected nextFNameseq to be seq+1
--- FAIL: TestStoreCompactionDeferredSort (0.04s)
store_test.go:747: expected only 1 file, got: 2, fileNames; [data-000000000000001d.moss (33017) data-000000000000001e.moss (37113)]
store_test.go:758: expected seq: 29 to equal numBatches: 30
store_test.go:778: expected nextFNameseq to be seq+1
FAIL
exit status 1
FAIL github.com/couchbase/moss 5.517s
See also the CompactionFilter feature from rocksdb.
Some use cases to consider...
the compaction callback (provided by the app) might want an entry to disappear -- for example, to implement expirations
the compaction callback might want to change the value associated with a key, perhaps to compress or cleanup the data within a particular value
the compaction callback might want to atomically create/update/delete a whole slew of other keys -- for example, if I expire the entry for "user-session:0321", I also want to delete other keys with the same prefix, like "user-session:0321-gestures" and "user-session:0321-stats". This might be an advanced, separate functionality that deserves a different ticket/issue.
...and then an appropriate error should be returned / noted
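The first two use cases could be served by a single callback type, sketched below; the type name and signature are assumptions modeled loosely on rocksdb's CompactionFilter, not an existing moss API:

```go
package main

import "fmt"

// Hypothetical rocksdb-style filter for moss: the app sees each live
// entry during compaction and can drop it or rewrite its value.
type CompactionFilter func(key, val []byte) (keep bool, newVal []byte)

// compact applies the filter while merging; dropped entries simply
// don't survive into the new file. The map is a stand-in for the real
// merge loop over segment entries.
func compact(entries map[string][]byte, filter CompactionFilter) map[string][]byte {
	out := map[string][]byte{}
	for k, v := range entries {
		if keep, nv := filter([]byte(k), v); keep {
			out[k] = nv
		}
	}
	return out
}

func main() {
	entries := map[string][]byte{
		"user-session:0321": []byte("expired"),
		"config:limits":     []byte("raw"),
	}
	// Example filter: expire dead sessions, tag (stand-in for
	// compress/cleanup) everything else.
	filtered := compact(entries, func(key, val []byte) (bool, []byte) {
		if string(val) == "expired" {
			return false, nil
		}
		return true, append([]byte("v2:"), val...)
	})
	fmt.Println(len(filtered), string(filtered["config:limits"]))
}
```

The third use case (atomically mutating other keys from within the callback) doesn't fit this shape and, as noted, probably deserves its own ticket.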
So, if we're willing to only support Go 1.7+ we should consider using sub-tests.
Here is an example failure you get today:
--- FAIL: TestIteratorSeekTo (0.00s)
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
But, when I look inside, there are several variations of seekTo being tested, in my case maybe they're all failing, but the point is you can't easily tell which ones are.
I started using the sub-test capability in vellum and it worked pretty nice. Here is a table style test that uses it:
https://github.com/couchbaselabs/vellum/blob/master/builder_test.go#L26-L109
I include a description field (as opposed to just a comment) explaining what this table entry is testing. Then I use that description when running the sub-test, so that if it fails, it prints out the details of which case failed. And of course description need not be a static string either, you could fmt.Sprintf() a parameterized case version.
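A condensed illustration of that style (commonPrefixLen and the cases below are made up for the example, mirroring rather than copying the vellum test):

```go
package main

import (
	"fmt"
	"testing"
)

// commonPrefixLen returns the length of the shared prefix of a and b.
func commonPrefixLen(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func TestCommonPrefixLen(t *testing.T) {
	tests := []struct {
		desc string // doubles as the sub-test name shown on failure
		a, b []byte
		want int
	}{
		{"both slices nil", nil, nil, 0},
		{"slice a substring of b", []byte("ab"), []byte("abc"), 2},
		{"slices a and b the same", []byte("abc"), []byte("abc"), 3},
	}
	for _, tc := range tests {
		t.Run(tc.desc, func(t *testing.T) {
			if got := commonPrefixLen(tc.a, tc.b); got != tc.want {
				t.Errorf("wanted: %d, got: %d", tc.want, got)
			}
		})
	}
}

func main() {
	fmt.Println(commonPrefixLen([]byte("moss"), []byte("mossy"))) // 4
}
```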
As for the output, if you run the one I mentioned above with -v you get:
--- PASS: TestCommonPrefixLen (0.00s)
--- PASS: TestCommonPrefixLen/both_slices_nil (0.00s)
--- PASS: TestCommonPrefixLen/slice_a_nil,_slice_b_not (0.00s)
--- PASS: TestCommonPrefixLen/slice_b_nil,_slice_a_not (0.00s)
--- PASS: TestCommonPrefixLen/both_slices_empty (0.00s)
--- PASS: TestCommonPrefixLen/slice_a_empty,_slice_b_not (0.00s)
--- PASS: TestCommonPrefixLen/slice_b_nil,_slice_a_not#01 (0.00s)
--- PASS: TestCommonPrefixLen/slices_a_and_b_the_same (0.00s)
--- PASS: TestCommonPrefixLen/slice_a_substring_of_b (0.00s)
--- PASS: TestCommonPrefixLen/slice_b_substring_of_a (0.00s)
--- PASS: TestCommonPrefixLen/slice_a_starts_with_prefix_of_b (0.00s)
--- PASS: TestCommonPrefixLen/slice_b_starts_with_prefix_of_a (0.00s)
And if I intentionally break one of them, you can see how the error output is more useful:
--- FAIL: TestCommonPrefixLen (0.00s)
--- FAIL: TestCommonPrefixLen/both_slices_nil (0.00s)
builder_test.go:105: wanted: 1, got: 0
Now I can go right to the both_slices_nil case to start debugging.
Also, the cmd-line lets you filter by these name as well, so if I wanted to just rerun a single case that I'm working on I can do:
go test -run "CommonPrefix/both"
--- FAIL: TestCommonPrefixLen (0.00s)
--- FAIL: TestCommonPrefixLen/both_slices_nil (0.00s)
builder_test.go:105: wanted: 1, got: 0
FAIL
exit status 1
FAIL github.com/couchbaselabs/vellum 0.007s
Lots more details in the blog: https://blog.golang.org/subtests
If you're OK, I could start migrating some of them over time.
@hisundar had an idea / discussion on checksums...
Every 4K written to either the kvs array or the buf array as part of a segment can be checksummed. The checksums can be stored in a new system child collection.
A new ReadOption might allow the user to ask for an optional checksum check (or to skip checksum checking) as the segment is accessed. We know mmap will page-fault in entire pages (like 4K), so after the page fault, accessing 4K to double-check the checksum should be cheap (in RAM, though not necessarily in cacheline).
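The per-page hashing itself is straightforward with the standard library; a minimal sketch (the system-child-collection storage part is the proposal, not implemented here):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const pageSize = 4096 // matches the mmap page-fault granularity noted above

// pageChecksums hashes each 4K page of a segment's buf; the resulting
// sums could then live in a system child collection keyed by page index.
func pageChecksums(buf []byte) []uint32 {
	var sums []uint32
	for off := 0; off < len(buf); off += pageSize {
		end := off + pageSize
		if end > len(buf) {
			end = len(buf)
		}
		sums = append(sums, crc32.ChecksumIEEE(buf[off:end]))
	}
	return sums
}

// verifyPage is what an optional ReadOption-driven check might do
// right after a page fault: re-hash the now-resident page and compare.
func verifyPage(buf []byte, page int, sums []uint32) bool {
	end := (page + 1) * pageSize
	if end > len(buf) {
		end = len(buf)
	}
	return crc32.ChecksumIEEE(buf[page*pageSize:end]) == sums[page]
}

func main() {
	buf := make([]byte, 3*pageSize+100)
	sums := pageChecksums(buf)
	fmt.Println(len(sums), verifyPage(buf, 2, sums)) // 4 true
}
```

crc32 with the Castagnoli table would get hardware acceleration on most CPUs, keeping the in-RAM verification cheap as hoped.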
Hi, from the description it is not obvious to me whether moss supports buckets (like bolt)?
There is the "child collections allow multiple related collections to be atomically grouped" feature, which I'm not exactly sure is something like buckets, or just a bunch of selected records manually put together?
Users can currently fake out multiple collections by explicitly adding a short collection name prefix to each of their keys. However, such a trick is suboptimal as it repeats a prefix for every key-val item.
Instead, a proposal to support multiple collections natively in moss would be by introducing a few additional methods to the API, so that we remain backwards compatible for current moss users.
The idea is that the current Collection in moss now logically becomes a "top-most" collection of an optional hierarchy of child collections.
To the Batch interface, the proposal is to add the methods...
NewChildCollectionBatch(childCollectionName string, hints) (Batch, error)
DelChildCollection(childCollectionName string) error
When a batch is executed, the entire hierarchy of a top-level batch and its batches for any child collections will be committed atomically.
Removing a child collection takes precedence over adding more child collection mutations.
To the Snapshot interface, the proposal is to add the methods...
ChildCollectionNames() ([]string, error)
ChildCollectionSnapshot(childCollectionName string) (Snapshot, error)
And, that's it.
The proposed API allows for deeply nested child collections of child collections, but the initial implementation might just return an error if the user tries to have deep nesting.
Perhaps add a paragraph or so to the design docs on lock acquisition precedence, to help avoid future lock-inversion deadlock bugs... see also 9fdd764
The leveled compaction policy decision only looks at the root collection when making its decision in compactMaybe(). Maybe need to recursively examine the child collections.
One thought is if any collection in the collection hierarchy needs a full compaction, then that's the overriding priority.
See also... #40
Hi,
this project looks very interesting. I have a lot of single-write transactions, rather than multiple writes per transaction.
How many such transaction writes of small data can moss handle per second on an average SSD? For example, with BoltDB it was only about 250, so I wonder if this project can perform better or if it is also limited by the file system.
From discussion with @mschoch and @sreekanth-cb, thoughts on changes to Segment interface to make it more generic, and not so specific to an array based implementation...
type Segment interface {
+	// DRAFT / NEW method...
+	Get(key []byte) (operation uint64, val []byte, err error)
+
+	// And also remove FindKeyPos().
+
+	// DRAFT - replace FindStartKeyInclusivePos() with an opaque
+	// cursor / handle based approach (Cursor declared below, since
+	// Go can't nest an interface declaration inside another)...
+	FindStartKeyInclusiveCursor(startKeyInclusive, endKeyExclusive []byte) (Cursor, error)

+type Cursor interface {
+	Current() (operation uint64, key []byte, val []byte, err error)
+	Next() error
+}
If a user just wants to lookup a single item in a collection, they have to first create a snapshot, then snapshot.Get(), then snapshot.Close().
One issue is creating a snapshot means memory allocations (of the snapshot instance and taking a copy of the segment stack).
A "onesie" API to just lookup a single item, if it can be implemented efficiently (without undue memory allocations and without having to hold a lock for a long time), may be more convenient for folks to grok.
(See also #14)
See also blevesearch/bleve#553 for some real-world use case.
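Today's three-step dance, and the convenience wrapper being proposed, can be sketched like this; the interfaces below are simplified stand-ins for moss's Collection and Snapshot so the example is self-contained:

```go
package main

import "fmt"

// Simplified stand-ins for moss's Snapshot and Collection interfaces.
type Snapshot interface {
	Get(key []byte) ([]byte, error)
	Close() error
}

type Collection interface {
	Snapshot() (Snapshot, error)
}

// collectionGet is the "onesie" convenience built from the existing
// snapshot / get / close dance; a native implementation inside moss
// could skip the snapshot allocation entirely.
func collectionGet(c Collection, key []byte) ([]byte, error) {
	ss, err := c.Snapshot()
	if err != nil {
		return nil, err
	}
	defer ss.Close()
	return ss.Get(key)
}

// mapSnapshot / mapCollection are toy implementations for the demo.
type mapSnapshot map[string][]byte

func (m mapSnapshot) Get(key []byte) ([]byte, error) { return m[string(key)], nil }
func (m mapSnapshot) Close() error                   { return nil }

type mapCollection struct{ data mapSnapshot }

func (mc *mapCollection) Snapshot() (Snapshot, error) { return mc.data, nil }

func main() {
	c := &mapCollection{data: mapSnapshot{"a": []byte("1")}}
	v, err := collectionGet(c, []byte("a"))
	fmt.Println(string(v), err) // 1 <nil>
}
```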
The Data Greater than Memory (DGM) performance of moss store's Log-Structured Merge Arrays is limited by the speed of the merge operation during compaction.
Currently this compaction is done in a single level which comes from the default setting of 0 as the compaction threshold. This can result in heavy write amplification.
Based on discussion with @steveyen, to mitigate this situation, moss store persistence can follow this simple approach:
maxSmallSegments=3, maxBigSegments=2
Initially, maxSmallSegments=0, maxBigSegments=0
Persistence appends small segments to end of the file...
|-seg0-||-seg1-||-seg2-| (maxSmallSegments=3)
On the next round of persistence, the above 3 segments can be compacted into a new file
|====seg0===| (maxSmallSegments=0, maxBigSegments=1)
Following this, further persistence rounds simply append smaller segments:
|====seg0====||-seg0-||-seg1-||-seg2-|
Now the next round of persistence, only compacts the small segments making the file look as follows..
|====seg0====||...seg0...||...seg1...||...seg2...||=====seg1=====|
The rationale behind this is that compacting fewer segments would be faster than constantly rewriting the whole file on every delta.
Later to support efficient persistence to disk, we can adopt a simple size-tiered leveled compaction support in mossStore by splitting the Footer across multiple levels:
data-L0-0000xx.moss: most recent segments.
data-L1-0000xx.moss: segments merged from L0
data-L2-0000xx.moss: segments merged from L1
We can then size tier these on the levels to achieve good tradeoff between space, read and write efficiencies.
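A toy model of the small/big segment scheme described above (segment sizes are stand-ins; the threshold matches the example value):

```go
package main

import "fmt"

// Small segments accumulate until maxSmallSegments, then get merged
// into one big segment; big segments are left alone until a later,
// rarer, higher-level compaction.
const maxSmallSegments = 3

type file struct {
	big   []int // merged "====segN====" segments (sizes)
	small []int // appended "-segN-" segments (sizes)
}

func (f *file) persist(segSize int) {
	f.small = append(f.small, segSize)
	if len(f.small) >= maxSmallSegments {
		merged := 0
		for _, s := range f.small {
			merged += s // compact only the small segments...
		}
		f.big = append(f.big, merged) // ...into one new big segment
		f.small = nil
	}
}

func main() {
	f := &file{}
	for i := 0; i < 7; i++ {
		f.persist(10)
	}
	// 7 persists: two merges of 3 smalls each, one small left over.
	fmt.Println(len(f.big), len(f.small)) // 2 1
}
```

The L0/L1/L2 file naming above would generalize this same accumulate-then-merge rule across levels, with size ratios between levels tuning the space / read / write tradeoff.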
If moss can somehow detect, for a batch or segment, that all the keys are exactly the same length, then one optimization might be to compress the kvs array -- no need to store the per-key length.
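A sketch of that fixed-key-length layout (types are illustrative, not moss's):

```go
package main

import "fmt"

// fixedKeySegment stores one shared key length plus the keys back to
// back, instead of a per-entry key length in the kvs array.
type fixedKeySegment struct {
	keyLen int
	keys   []byte // all keys, concatenated
}

// buildFixedKeySegment detects whether the optimization applies: it
// returns ok=false as soon as it sees mixed key lengths.
func buildFixedKeySegment(keys [][]byte) (*fixedKeySegment, bool) {
	if len(keys) == 0 {
		return nil, false
	}
	kl := len(keys[0])
	seg := &fixedKeySegment{keyLen: kl}
	for _, k := range keys {
		if len(k) != kl {
			return nil, false // mixed lengths: fall back to normal layout
		}
		seg.keys = append(seg.keys, k...)
	}
	return seg, true
}

// key returns the i'th key by simple arithmetic -- no length field needed.
func (s *fixedKeySegment) key(i int) []byte {
	return s.keys[i*s.keyLen : (i+1)*s.keyLen]
}

func main() {
	seg, ok := buildFixedKeySegment([][]byte{[]byte("aaaa"), []byte("bbbb")})
	fmt.Println(ok, string(seg.key(1))) // true bbbb
}
```

Fixed-size keys (UUIDs, counters, hashes) are common enough that the detection check could pay for itself.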