couchbase / moss
moss - a simple, fast, ordered, persistable, key-val storage library for golang
License: Other
In the segment.kvs array, there's 16^H^H 8 bits that are currently unused / RESERVED for the future, out of the 128 bits for each entry... https://github.com/couchbase/moss/blob/master/segment.go#L149
Maybe we can use those 16^H^H 8 bits somehow. One thought is that as moss is appending dirty segments to the end of the file, perhaps it might opportunistically sometimes decide to compact some of the kvs arrays of the segment stack into a big kvs array. That is, leave all the old, already persisted buf's as-is/untouched, but the latest "compacted kvs" would use (some of?) the 8 bits to refer to the old buf that each entry points to.
The thinking is instead of performing binary search on the multiple kvs arrays down through the segment stack, there'd be only a single kvs array to binary search.
(Update: 16 -> 8 bits (thanks marty!); also if 8 bits ain't enough, then it's possible that keys and vals at runtime might not be near the max-keyLen and max-valLen limits, so that might be another place to get bits; and another place might be the offset into the buf)
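To make the bit-budget discussion concrete, here's a sketch of packing metadata into one of an entry's two uint64 words. The field widths below are illustrative assumptions for the example, not moss's actual layout in segment.go:

```go
package main

import "fmt"

// Illustrative (not moss's actual) packing of half of a 128-bit kvs
// entry: one uint64 holds operation / keyLen / valLen plus 8 reserved
// bits, while the other uint64 (not shown) holds the offset into the
// shared buf. All field widths here are assumptions for the sketch.
const (
	opBits       = 8
	reservedBits = 8 // the spare bits the issue is talking about
	keyLenBits   = 24
	valLenBits   = 24
)

func packMeta(op, keyLen, valLen uint64) uint64 {
	return op<<(reservedBits+keyLenBits+valLenBits) |
		keyLen<<valLenBits |
		valLen
}

func unpackMeta(meta uint64) (op, keyLen, valLen uint64) {
	op = meta >> (reservedBits + keyLenBits + valLenBits)
	keyLen = (meta >> valLenBits) & (1<<keyLenBits - 1)
	valLen = meta & (1<<valLenBits - 1)
	return
}

func main() {
	meta := packMeta(1, 300, 70000)
	op, kl, vl := unpackMeta(meta)
	fmt.Println(op, kl, vl) // 1 300 70000
}
```

With a layout like this, a "compacted kvs" could spend the 8 reserved bits on a buf index (up to 256 old bufs), and scavenge more bits from keyLen/valLen/offset if runtime limits allow, as the update suggests.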
...and then return an appropriate error
Here...
$ go test -v -race ./...
github.com/couchbase/moss.(*collection).updateStats()
/Users/steveyen/go/src/github.com/couchbase/moss/collection.go:427 +0x313
Previous write at 0x00c4204a4000 by goroutine 26:
github.com/couchbase/moss.(*segment).Swap()
/Users/steveyen/go/src/github.com/couchbase/moss/segment.go:274 +0x176
Currently moss store compactions only work for the Basic segment kind since it uses a custom bufferedSectionWriter. Adding an incremental persistence method to the Segment interface will help decouple the compaction logic from the actual implementation of the Segment interface.
This can make moss much more extensible, with custom segment types that support unique application needs such as btrees for faster indexing, compressed segments for smaller file sizes, finite state transducers, bloom filters, etc.
That is, the up-front example code should favor a code sample using OpenStoreCollection() instead of the current caching use-case of NewCollection().
@mschoch I enjoyed your talk on Moss at GopherCon. At the end you pointed out a situation (I couldn't discern exactly when) where a small write led to a lot of copying/write-amplification.
I just wanted to inquire if that issue had been addressed?
On both the regular persistence pathway and the compaction pathway, moss performs writes block-aligned with the starting byte of the to-be-written buffer. But, what about the end of the buffer?
Need to check/review the codepaths to see if moss is writing out block-sized buffers. If not, the filesystem/os might need to perform a wasteful read-modify-write of the last block.
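The fix, if needed, would be to round write lengths up to a block boundary. A minimal sketch, assuming a 4096-byte block size:

```go
package main

import "fmt"

const blockSize = 4096 // assumed filesystem block size

// roundUpToBlock pads a write length up to the next block boundary so
// the final partial block doesn't force the OS/filesystem into a
// wasteful read-modify-write of the last block.
func roundUpToBlock(n int64) int64 {
	return (n + blockSize - 1) / blockSize * blockSize
}

func main() {
	for _, n := range []int64{0, 1, 4096, 4097} {
		fmt.Println(n, "->", roundUpToBlock(n))
	}
	// 0 -> 0, 1 -> 4096, 4096 -> 4096, 4097 -> 8192
}
```

The padded tail bytes would be zeros; the footer already records the true data length, so readers are unaffected.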
mmap() is terrific, leading to a simple moss implementation.
But, due to the magic of page-faulting in data from disk from mmap(), which is an operation that's invisible to the app, the app thread may suddenly become blocked... and the golang scheduler has no idea it's happening and cannot arrange for the app thread to be used for other needs.
Perhaps various OS API's like madvise() and siblings can help manage this situation in more controlled fashion and reduce page faults.
The current restriction which disallows duplicate keys in a batch is potentially problematic for applications.
The reasons for this restriction are:
Ideas:
Add new collection option (possibly the default) which does the following:
Further, if we introduce these NOOPs, the binary-search code might also have to honor them by looking forward/backward.
Also, some segment persisters could also choose to elide these NOOP operations.
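A sketch of that NOOP-aware lookup, assuming an opNOOP marker value and a toy entry layout (both are assumptions for illustration):

```go
package main

import (
	"fmt"
	"sort"
)

// Sketch of the NOOP idea: when a batch contains duplicate keys,
// earlier duplicates are overwritten in place with a NOOP marker, and
// lookups skip over them. opNOOP and the entry layout are made up for
// the example.
const opNOOP = 0

type entry struct {
	op  uint64
	key string
	val string
}

// get binary-searches the sorted entries and, among equal keys,
// returns the last non-NOOP entry, so the batch's final write wins.
func get(entries []entry, key string) (string, bool) {
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].key >= key
	})
	found, val := false, ""
	for ; i < len(entries) && entries[i].key == key; i++ {
		if entries[i].op != opNOOP { // look past NOOP'ed duplicates
			found, val = true, entries[i].val
		}
	}
	return val, found
}

func main() {
	entries := []entry{ // the first "b" write was NOOP'ed by the second
		{1, "a", "1"}, {opNOOP, "b", "old"}, {1, "b", "new"}, {1, "c", "3"},
	}
	v, ok := get(entries, "b")
	fmt.Println(v, ok) // new true
}
```

A persister that elides NOOPs would simply skip `op == opNOOP` entries when writing the segment out.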
I was trying to use moss as a backend store for Dgraph (https://github.com/dgraph-io/dgraph/tree/try/moss). But faced this issue
panic: runtime error: slice bounds out of range
goroutine 348 [running]:
github.com/couchbase/moss.(*segment).FindStartKeyInclusivePos(0xc4201ee000, 0xc42a1cb7e0, 0x16, 0x16, 0x16)
/home/ashwin/go/src/github.com/couchbase/moss/segment.go:313 +0x19b
github.com/couchbase/moss.(*segmentStack).get(0xc4201cac30, 0xc42a1cb7e0, 0x16, 0x16, 0x1e, 0x0, 0x7f7f00, 0xc42006efc0, 0x1f, 0x2a, ...)
/home/ashwin/go/src/github.com/couchbase/moss/segment_stack.go:90 +0x26b
github.com/couchbase/moss.(*segmentStack).Get(0xc4201cac30, 0xc42a1cb7e0, 0x16, 0x16, 0xc4201cac00, 0x6, 0x6, 0x6, 0x0, 0x6)
/home/ashwin/go/src/github.com/couchbase/moss/segment_stack.go:74 +0x75
github.com/couchbase/moss.(*Footer).Get(0xc42006efc0, 0xc42a1cb7e0, 0x16, 0x16, 0x465600, 0x1, 0x6, 0xc424e85910, 0x465182, 0xfc7740)
/home/ashwin/go/src/github.com/couchbase/moss/store_footer.go:426 +0x8a
github.com/couchbase/moss.(*snapshotWrapper).Get(0xc420192fc0, 0xc42a1cb7e0, 0x16, 0x16, 0x0, 0x0, 0xc4200928f0, 0x6, 0x0, 0xc424e85948)
/home/ashwin/go/src/github.com/couchbase/moss/wrap.go:94 +0x62
github.com/couchbase/moss.(*segmentStack).get(0xc42a1f0d20, 0xc42a1cb7e0, 0x16, 0x16, 0xffffffffffffffff, 0x0, 0xc424e85a00, 0x1, 0x1, 0x6, ...)
/home/ashwin/go/src/github.com/couchbase/moss/segment_stack.go:110 +0xa2
github.com/couchbase/moss.(*segmentStack).Get(0xc42a1f0d20, 0xc42a1cb7e0, 0x16, 0x16, 0x0, 0x0, 0x414ea2, 0xc42a1f0cd0, 0x50, 0x48)
/home/ashwin/go/src/github.com/couchbase/moss/segment_stack.go:74 +0x75
github.com/dgraph-io/dgraph/posting.(*List).getPostingList(0xc42a1e9e00, 0x0, 0x0)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/posting/list.go:190 +0x1ef
github.com/dgraph-io/dgraph/posting.(*List).updateMutationLayer(0xc42a1e9e00, 0xc42a1f0cd0, 0x0)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/posting/list.go:263 +0x125
github.com/dgraph-io/dgraph/posting.(*List).addMutation(0xc42a1e9e00, 0xfdec00, 0xc425576e40, 0xc42226c660, 0x5, 0x5, 0xb090c8)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/posting/list.go:340 +0xd7
github.com/dgraph-io/dgraph/posting.(*List).AddMutationWithIndex(0xc42a1e9e00, 0xfdec00, 0xc425576e40, 0xc42226c660, 0x0, 0x0)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/posting/index.go:171 +0x2da
github.com/dgraph-io/dgraph/worker.runMutations(0x7f62912c2280, 0xc425576e40, 0xc4221e0000, 0x3e8, 0x400, 0x0, 0x0)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/worker/mutation.go:50 +0x21a
github.com/dgraph-io/dgraph/worker.(*node).processMutation(0xc42008c000, 0x2, 0x114, 0x0, 0xc420f4c000, 0x92c5, 0xa000, 0x0, 0x0, 0x0, ...)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/worker/draft.go:372 +0x13e
github.com/dgraph-io/dgraph/worker.(*node).process(0xc42008c000, 0x2, 0x114, 0x0, 0xc420f4c000, 0x92c5, 0xa000, 0x0, 0x0, 0x0, ...)
/home/ashwin/go/src/github.com/dgraph-io/dgraph/worker/draft.go:405 +0x36a
created by github.com/dgraph-io/dgraph/worker.(*node).processApplyCh
/home/ashwin/go/src/github.com/dgraph-io/dgraph/worker/draft.go:444 +0x49b
Any help would be appreciated.
Thanks!
Thanks again for Moss!
How production-ready is moss these days? Is it used at Couchbase?
Are there any concerns about data loss?
Thanks!
Persistence is currently handled asynchronously by the background persister goroutine, and if folks want to ensure synchronous persistence, they have to go through backflip dance steps, like hooking up an event callback as seen in various unit tests.
Betcha some folks will just want a simple synchronous persistence API.
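The callback dance looks roughly like this generic sketch (the event kind, types, and method names here are illustrative stand-ins, not moss's actual API):

```go
package main

import "fmt"

// Generic sketch of the "event callback" workaround the unit tests
// use: register a callback, kick off persistence, then block on a
// channel until the background persister reports progress.
type event struct{ kind string }

type store struct {
	onEvent   func(event)
	persisted bool
}

// persistAsync simulates the background persister goroutine.
func (s *store) persistAsync() {
	go func() {
		s.persisted = true // ... data written to disk here ...
		s.onEvent(event{kind: "persister-progress"})
	}()
}

// syncPersist wraps the async path with a channel so callers block
// until persistence has actually happened -- the kind of one-call
// synchronous API the issue is asking moss to provide directly.
func syncPersist(s *store) {
	done := make(chan struct{})
	s.onEvent = func(e event) {
		if e.kind == "persister-progress" {
			close(done)
		}
	}
	s.persistAsync()
	<-done
}

func main() {
	s := &store{}
	syncPersist(s)
	fmt.Println("persisted:", s.persisted)
}
```

Baking this pattern behind a single blocking call (or option) would save every application from re-deriving it.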
This note occurs in the source:
https://github.com/couchbase/moss/blob/master/api.go#L69-L71
See Rob Pike's article for why we shouldn't care:
https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
My recommendation is that we always write moss data little-endian
Occasionally fsync()'ing during compaction (e.g., after every 16MB written) might have a performance impact -- see if an optional parameter for this might help moss?
see http://danluu.com/file-consistency/
...and might also need to fsync the grand-parent dir, if mossStore was the one that actually created the parent dir?
It appears that store.Persist() (and store.Close() ) return before persistence to disk is actually completed.
This creates a race and causes a simple write-then-read test (https://github.com/glycerine/bulletin/blob/master/bulletin_test.go#L10) to fail, unless a print is inserted to slow down the test code.
to reproduce: clone https://github.com/glycerine/bulletin and run go test -v
go test -v
=== RUN Test001BulletinPersistanceToDisk
read/write to persistent disk should work ✘
Failures:
* /Users/jaten/go/src/github.com/glycerine/bulletin/bulletin_test.go
Line 39:
Expected: '[]byte{0x61, 0x20, 0x76, 0x61, 0x6c, 0x75, 0x65}'
Actual: '[]byte(nil)'
(Should resemble)!
1 assertion thus far
--- FAIL: Test001BulletinPersistanceToDisk (0.00s)
FAIL
exit status 1
FAIL github.com/glycerine/bulletin 0.014s
Commenting in the line https://github.com/glycerine/bulletin/blob/master/bulletin.go#L58 slows down the test, causing it to succeed on my MacBook. Hence the code is timing finicky; it should not be. I expected that once Persist() or Close() completed, the data was safely on disk. However, inspection of disk reveals no files written before process shutdown. Hence there is background async processing going on that should be synchronous with the completion of Persist() and/or Close().
Sometimes fsync() can take a while -- might be good to know how long.
Saw this while doing a code review perusal, and wondering if the following allocates garbage...
pair := []uint64{opKlVl, uint64(keyStart)}
From: https://github.com/couchbase/moss/blob/master/store_compact.go#L534
If so, perhaps a pre-allocated pair can be held as part of the compactWriter and reused across multiple calls to compactWriter.Mutate()
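The suggested fix might look like the following sketch, where compactWriter is a stand-in for moss's actual type and the scratch pair lives on the struct instead of being allocated per call:

```go
package main

import "fmt"

// Sketch of the suggested fix: hold the scratch pair on the writer and
// reuse it across Mutate() calls instead of allocating a fresh slice
// each time. compactWriter here is illustrative, not moss's real type.
type compactWriter struct {
	pair [2]uint64 // reused scratch space; no per-call garbage
	out  []uint64  // stand-in for wherever the pair actually goes
}

func (cw *compactWriter) Mutate(opKlVl, keyStart uint64) {
	cw.pair[0], cw.pair[1] = opKlVl, keyStart
	cw.out = append(cw.out, cw.pair[:]...)
}

func main() {
	cw := &compactWriter{}
	cw.Mutate(42, 100)
	cw.Mutate(43, 200)
	fmt.Println(cw.out) // [42 100 43 200]
}
```

Since a `[2]uint64` array is a value field, escape analysis never needs to heap-allocate it, unlike the `[]uint64{...}` composite literal that escapes into the callee.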
This is probably too big for one ticket/issue, but...
Many systems have multiple I/O devices. And, today's moss only writes to a single directory, so isn't able to leverage multi-device I/O concurrency/bandwidth.
Some of those devices will be faster than others (SSD vs HDD).
And, some might be ephemeral (ex: EC2 ephemeral storage) versus long-lived (ex: EBS).
It might be debatable whether a "sufficiently advanced moss of the future" should be aware of such things, or whether such concerns should instead remain (like today) the application's responsibility: sharding their moss usage on their own and handling the movement of data amongst storage hierarchies on their own.
Currently, creating a snapshot copies the segment stack, which means memory allocations and copying of pointers. Instead, perhaps creating a snapshot should just bump a ref-count on an existing segmentStack, which would be treated as immutable (except for the ref-count).
Whenever a writer (such as ExecuteBatch, the background Merger, etc) wants to modify the segmentStack, it should use copy-on-write.
The existing moss implementation actually does parts of the above anyways, so we might be near to that already.
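The scheme could be sketched like this (types are illustrative, not moss's internals, and a real implementation would need atomics or locking around the ref-count):

```go
package main

import "fmt"

// Sketch of the ref-count + copy-on-write idea: snapshots share one
// immutable segmentStack and only bump a counter; writers clone the
// stack first if anyone else still holds a reference.
type segmentStack struct {
	refs     int
	segments []string // stand-in for the real segment slice
}

func (ss *segmentStack) addRef() *segmentStack { ss.refs++; return ss }

func (ss *segmentStack) decRef() { ss.refs-- }

// mutableCopy is the copy-on-write step: clone only when shared.
func (ss *segmentStack) mutableCopy() *segmentStack {
	if ss.refs <= 1 {
		return ss // sole owner: mutate in place
	}
	clone := &segmentStack{refs: 1,
		segments: append([]string(nil), ss.segments...)}
	ss.decRef()
	return clone
}

func main() {
	ss := &segmentStack{refs: 1, segments: []string{"seg0"}}
	snap := ss.addRef() // cheap snapshot: no copying, just a ref bump

	w := ss.mutableCopy() // writer must COW: snap still holds a ref
	w.segments = append(w.segments, "seg1")

	fmt.Println(len(snap.segments), len(w.segments)) // 1 2
}
```

The snapshot stays stable while ExecuteBatch or the background Merger work on their own clone, which matches the copy-on-write behavior described above.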
After an application calls collection.ExecuteBatch(batch), you're supposed to batch.Close() and not reuse it, as mentioned here...
https://godoc.org/github.com/couchbase/moss#Collection
But, it should probably be mentioned more prominently in the README or other doc places.
(See also #14)
Thanks to @mcpaddy for the idea & intro to this term.
(wonder if besides "goodput", there's also "badput" and "uglyput")
I'm using this code to store and retrieve records from moss.
But when I call GetStoredSQSMessages()
it only seems to return the last entry, as opposed to all the entries.
If I run strings data-0000000000000001.moss
I can see all the records I'm expecting, so I know they're somewhere in the moss, but I just can't get at them w/ the iterator.
Can you take a look at my GetStoredSQSMessages method and see if I'm doing anything wrong?
If nothing is obvious, should I try repro'ing this in a unit test? I'm storing the moss in a docker volume mount, so it's possible I'm doing something funny (but like I said, I can see all the records with strings, so it seems to be an iterator problem).
In moss, there might be a large segment (many megabytes) that needs to be written or appended to the end of the file.
Is it more efficient to call pwrite() once with the entire segment's large data buffer, where perhaps the OS/filesystem knows how to efficiently split that up into concurrent I/O channel use...
...or, should moss split the big buffer into multiple, smaller, block-aligned sections, where moss can spawn off multiple goroutines to invoke several pwrite()'s concurrently.
Which one is faster / utilizes more I/O bandwidth or channels?
moss defaults to asynchronous batch persistence, but users should be able to ask for synchronous batch persistence by just flipping on an option.
Currently, running an iterator through all the keys can blow the hot working set out of the OS page cache, potentially ejecting the hot, cached entries that the app might need.
On the other hand, perhaps this is the behavior that the app wanted, where iterating through all keys will happen to warm up entries into the OS page cache.
The idea is to add an optional IteratorOption that allows the user to have a little bit more control, as a hint to the iterator to avoid blowing caches if possible.
For example, perhaps the compactor could use such an option, as although it needs to iterate through all the active data, it's copying them to a new file. So old pages from the soon-to-be-deleted old file really don't need to stick around in cache.
Saw this one time only so far (sporadic)...
=== RUN TestOpenStoreCollection
--- FAIL: TestOpenStoreCollection (0.21s)
store_test.go:1035: expected reopen store to work
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x1180dfd]
goroutine 6028 [running]:
testing.tRunner.func1(0xc4200709c0)
/usr/local/go/src/testing/testing.go:622 +0x29d
panic(0x11cb760, 0x12f0710)
/usr/local/go/src/runtime/panic.go:489 +0x2cf
_/Users/steveyen/dev/couchbase-server.spock/godeps/src/github.com/couchbase/moss.TestOpenStoreCollection(0xc4200709c0)
/Users/steveyen/dev/couchbase-server.spock/godeps/src/github.com/couchbase/moss/store_test.go:1038 +0xebd
testing.tRunner(0xc4200709c0, 0x120e038)
/usr/local/go/src/testing/testing.go:657 +0x96
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:697 +0x2ca
exit status 2
FAIL _/Users/steveyen/dev/couchbase-server.spock/godeps/src/github.com/couchbase/moss 18.562s
To repro:
func TestCloseUnstartedCollection(t *testing.T) {
m, _ := NewCollection(CollectionOptions{})
m.Close()
}
Results in:
go test -v -run TestCloseUnstartedCollection
=== RUN TestCloseUnstartedCollection
fatal error: all goroutines are asleep - deadlock!
How hard would it be to make Close() a no-op if Start() had never been called?
Btw in the README Example I don't see Start() being called, so that code might deadlock too.
To repro: Add a call to m.Start() in the TestOpenStoreCollection test:
func TestOpenStoreCollection(t *testing.T) {
	tmpDir, _ := ioutil.TempDir("", "mossStore")
	defer os.RemoveAll(tmpDir)

	var mu sync.Mutex
	counts := map[EventKind]int{}
	eventWaiters := map[EventKind]chan bool{}

	co := CollectionOptions{
		MergeOperator: &MergeOperatorStringAppend{Sep: ":"},
		OnEvent: func(event Event) {
			mu.Lock()
			counts[event.Kind]++
			eventWaiter := eventWaiters[event.Kind]
			mu.Unlock()
			if eventWaiter != nil {
				eventWaiter <- true
			}
		},
	}

	store, m, err := OpenStoreCollection(tmpDir, StoreOptions{
		CollectionOptions: co,
	}, StorePersistOptions{})
	if err != nil || m == nil || store == nil {
		t.Errorf("expected open empty store collection to work")
	}

	m.Start() // <--------- add this line

	.... etc ...
Will give error:
go test -v -run TestOpenStoreCollection
=== RUN TestOpenStoreCollection
panic: close of closed channel
It seems like a confusing interface to have to call Start() on non-persistent collections, while you can't call Start() on persisted collections. Is this intentional, or should calling Start() work on persisted collections?
The leveled compaction policy decision only looks at the root collection when making its decision in compactMaybe(), so need to doc that up more clearly somewhere.
I noticed that tests fail on non-amd64 architectures. Failure logs are available on Debian's reproducible build project [1]. These failure logs were produced using rev:8ea508f. I've also reproduced this same behavior on rev:61afce4.
--- PASS: TestCollectionStatsClose (0.00s)
=== RUN TestCollectionGet
--- PASS: TestCollectionGet (0.00s)
=== RUN TestMossDGM
--- FAIL: TestMossDGM (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x804931b]
goroutine 1022 [running]:
testing.tRunner.func1(0x1881aa00)
/usr/lib/go-1.8/src/testing/testing.go:622 +0x226
panic(0x81d40c0, 0x82c2248)
/usr/lib/go-1.8/src/runtime/panic.go:489 +0x22a
sync/atomic.LoadUint64(0x186f42bc, 0x0, 0x0)
/usr/lib/go-1.8/src/sync/atomic/asm_386.s:159 +0xb
github.com/couchbase/moss.(*dgmTest).getDGMStats(0x186f4240, 0x23d, 0x0, 0x82c6080)
/build/golang-github-couchbase-moss-0.0~git20170828.0.61afce4/obj-i686-linux-gnu/src/github.com/couchbase/moss/dgm_moss_test.go:469 +0x88
github.com/couchbase/moss.TestMossDGM(0x1881aa00)
/build/golang-github-couchbase-moss-0.0~git20170828.0.61afce4/obj-i686-linux-gnu/src/github.com/couchbase/moss/dgm_moss_test.go:887 +0x2f9
testing.tRunner(0x1881aa00, 0x8204518)
/usr/lib/go-1.8/src/testing/testing.go:657 +0x7e
created by testing.(*T).Run
/usr/lib/go-1.8/src/testing/testing.go:697 +0x242
exit status 2
FAIL github.com/couchbase/moss 0.032s
[1] https://tests.reproducible-builds.org/debian/rb-pkg/buster/i386/golang-github-couchbase-moss.html
Most of the options structs are small, but they must be copied on each invocation, as opposed to simply copying the pointer.
Most applications will be fine creating one set of options and reusing it.
Passing nil, and having the library interpret it as "default" options, is a common pattern in Go.
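The pattern looks like this (Options and open() below are illustrative, not moss's actual API):

```go
package main

import "fmt"

// Sketch of the nil-means-defaults pattern: accept *Options and treat
// nil as "use the package defaults", so callers needn't construct and
// copy an options struct on every invocation.
type Options struct {
	BlockSize int
}

var defaultOptions = Options{BlockSize: 4096}

func open(path string, opts *Options) (int, error) {
	if opts == nil {
		opts = &defaultOptions // nil pointer means "defaults, please"
	}
	return opts.BlockSize, nil
}

func main() {
	bs, _ := open("data.moss", nil)
	fmt.Println(bs) // 4096
	bs, _ = open("data.moss", &Options{BlockSize: 8192})
	fmt.Println(bs) // 8192
}
```

Since the caller's struct is only read through the pointer, applications that do create one set of options can reuse it freely across calls.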
using commit 9fdd764, sometimes seeing go test failures
$ go test
--- FAIL: TestStoreCompaction (0.04s)
store_test.go:747: expected only 1 file, got: 2, fileNames; [data-000000000000001d.moss (33017) data-000000000000001e.moss (37113)]
store_test.go:758: expected seq: 29 to equal numBatches: 30
store_test.go:778: expected nextFNameseq to be seq+1
--- FAIL: TestStoreCompactionDeferredSort (0.04s)
store_test.go:747: expected only 1 file, got: 2, fileNames; [data-000000000000001d.moss (33017) data-000000000000001e.moss (37113)]
store_test.go:758: expected seq: 29 to equal numBatches: 30
store_test.go:778: expected nextFNameseq to be seq+1
FAIL
exit status 1
FAIL github.com/couchbase/moss 5.517s
See also the CompactionFilter feature from rocksdb.
Some use cases to consider...
the compaction callback (provided by the app) might want an entry to disappear -- for example, to implement expirations
the compaction callback might want to change the value associated with a key, perhaps to compress or cleanup the data within a particular value
the compaction callback might want to atomically create/update/delete a whole slew of other keys -- for example, if I expire the entry for "user-session:0321", I also want to delete other keys with the same prefix, like "user-session:0321-gestures" and "user-session:0321-stats". This might be an advanced, separate functionality that deserves a different ticket/issue.
...and then an appropriate error should be returned / noted
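The first two use cases could be served by a single callback type, sketched below; the type name and signature are assumptions modeled loosely on rocksdb's CompactionFilter, not an existing moss API:

```go
package main

import "fmt"

// Hypothetical rocksdb-style filter for moss: the app sees each live
// entry during compaction and can drop it or rewrite its value.
type CompactionFilter func(key, val []byte) (keep bool, newVal []byte)

// compact applies the filter while merging; dropped entries simply
// don't survive into the new file. The map is a stand-in for the real
// merge loop over segment entries.
func compact(entries map[string][]byte, filter CompactionFilter) map[string][]byte {
	out := map[string][]byte{}
	for k, v := range entries {
		if keep, nv := filter([]byte(k), v); keep {
			out[k] = nv
		}
	}
	return out
}

func main() {
	entries := map[string][]byte{
		"user-session:0321": []byte("expired"),
		"config:limits":     []byte("raw"),
	}
	// Example filter: expire dead sessions, tag (stand-in for
	// compress/cleanup) everything else.
	filtered := compact(entries, func(key, val []byte) (bool, []byte) {
		if string(val) == "expired" {
			return false, nil
		}
		return true, append([]byte("v2:"), val...)
	})
	fmt.Println(len(filtered), string(filtered["config:limits"]))
}
```

The third use case (atomically mutating other keys from within the callback) doesn't fit this shape and, as noted, probably deserves its own ticket.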
So, if we're willing to only support Go 1.7+ we should consider using sub-tests.
Here is an example failure you get today:
--- FAIL: TestIteratorSeekTo (0.00s)
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
iterator_test.go:439: expected done, got: <nil>
But, when I look inside, there are several variations of seekTo being tested, in my case maybe they're all failing, but the point is you can't easily tell which ones are.
I started using the sub-test capability in vellum and it worked pretty nice. Here is a table style test that uses it:
https://github.com/couchbaselabs/vellum/blob/master/builder_test.go#L26-L109
I include a description field (as opposed to just a comment) explaining what this table entry is testing. Then I use that description when running the sub-test, so that if it fails, it prints out the details of which case failed. And of course description need not be a static string either, you could fmt.Sprintf() a parameterized case version.
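A condensed illustration of that style (commonPrefixLen and the cases below are made up for the example, mirroring rather than copying the vellum test):

```go
package main

import (
	"fmt"
	"testing"
)

// commonPrefixLen returns the length of the shared prefix of a and b.
func commonPrefixLen(a, b []byte) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func TestCommonPrefixLen(t *testing.T) {
	tests := []struct {
		desc string // doubles as the sub-test name shown on failure
		a, b []byte
		want int
	}{
		{"both slices nil", nil, nil, 0},
		{"slice a substring of b", []byte("ab"), []byte("abc"), 2},
		{"slices a and b the same", []byte("abc"), []byte("abc"), 3},
	}
	for _, tc := range tests {
		t.Run(tc.desc, func(t *testing.T) {
			if got := commonPrefixLen(tc.a, tc.b); got != tc.want {
				t.Errorf("wanted: %d, got: %d", tc.want, got)
			}
		})
	}
}

func main() {
	fmt.Println(commonPrefixLen([]byte("moss"), []byte("mossy"))) // 4
}
```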
As for the output, if you run the one I mentioned above with -v you get:
--- PASS: TestCommonPrefixLen (0.00s)
--- PASS: TestCommonPrefixLen/both_slices_nil (0.00s)
--- PASS: TestCommonPrefixLen/slice_a_nil,_slice_b_not (0.00s)
--- PASS: TestCommonPrefixLen/slice_b_nil,_slice_a_not (0.00s)
--- PASS: TestCommonPrefixLen/both_slices_empty (0.00s)
--- PASS: TestCommonPrefixLen/slice_a_empty,_slice_b_not (0.00s)
--- PASS: TestCommonPrefixLen/slice_b_nil,_slice_a_not#01 (0.00s)
--- PASS: TestCommonPrefixLen/slices_a_and_b_the_same (0.00s)
--- PASS: TestCommonPrefixLen/slice_a_substring_of_b (0.00s)
--- PASS: TestCommonPrefixLen/slice_b_substring_of_a (0.00s)
--- PASS: TestCommonPrefixLen/slice_a_starts_with_prefix_of_b (0.00s)
--- PASS: TestCommonPrefixLen/slice_b_starts_with_prefix_of_a (0.00s)
And if I intentionally break one of them, you can see how the error output is more useful:
--- FAIL: TestCommonPrefixLen (0.00s)
--- FAIL: TestCommonPrefixLen/both_slices_nil (0.00s)
builder_test.go:105: wanted: 1, got: 0
Now I can go right to the both_slices_nil case to start debugging.
Also, the cmd-line lets you filter by these name as well, so if I wanted to just rerun a single case that I'm working on I can do:
go test -run "CommonPrefix/both"
--- FAIL: TestCommonPrefixLen (0.00s)
--- FAIL: TestCommonPrefixLen/both_slices_nil (0.00s)
builder_test.go:105: wanted: 1, got: 0
FAIL
exit status 1
FAIL github.com/couchbaselabs/vellum 0.007s
Lots more details in the blog: https://blog.golang.org/subtests
If you're OK, I could start migrating some of them over time.
@hisundar had an idea / discussion on checksums...
Every 4K written to either the kvs array or the buf array as part of a segment can be checksummed. The checksums can be stored in a new system child collection.
A new ReadOption might allow the user to ask for an optional checksum check (or to skip checksum checking) as the segment is accessed. We know mmap will page-fault in entire pages (like 4K), so after the page fault, accessing 4K to double-check the checksum should be cheap (in RAM, though not necessarily in cacheline).
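The per-page hashing itself is straightforward with the standard library; a minimal sketch (the system-child-collection storage part is the proposal, not implemented here):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const pageSize = 4096 // matches the mmap page-fault granularity noted above

// pageChecksums hashes each 4K page of a segment's buf; the resulting
// sums could then live in a system child collection keyed by page index.
func pageChecksums(buf []byte) []uint32 {
	var sums []uint32
	for off := 0; off < len(buf); off += pageSize {
		end := off + pageSize
		if end > len(buf) {
			end = len(buf)
		}
		sums = append(sums, crc32.ChecksumIEEE(buf[off:end]))
	}
	return sums
}

// verifyPage is what an optional ReadOption-driven check might do
// right after a page fault: re-hash the now-resident page and compare.
func verifyPage(buf []byte, page int, sums []uint32) bool {
	end := (page + 1) * pageSize
	if end > len(buf) {
		end = len(buf)
	}
	return crc32.ChecksumIEEE(buf[page*pageSize:end]) == sums[page]
}

func main() {
	buf := make([]byte, 3*pageSize+100)
	sums := pageChecksums(buf)
	fmt.Println(len(sums), verifyPage(buf, 2, sums)) // 4 true
}
```

crc32 with the Castagnoli table would get hardware acceleration on most CPUs, keeping the in-RAM verification cheap as hoped.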
Hi, from the description it is not obvious to me whether moss supports buckets (like bolt)?
There is the "child collections allow multiple related collections to be atomically grouped" feature, which I'm not exactly sure is something like buckets, or just a bunch of selected records manually put together?
Users can currently fake out multiple collections by explicitly adding a short collection name prefix to each of their keys. However, such a trick is suboptimal as it repeats a prefix for every key-val item.
Instead, a proposal to support multiple collections natively in moss would be by introducing a few additional methods to the API, so that we remain backwards compatible for current moss users.
The idea is that the current Collection in moss now logically becomes a "top-most" collection of an optional hierarchy of child collections.
To the Batch interface, the proposal is to add the methods...
NewChildCollectionBatch(childCollectionName string, hints) (Batch, error)
DelChildCollection(childCollectionName string) error
When a batch is executed, the entire hierarchy of a top-level batch and its batches for any child collections will be committed atomically.
Removing a child collection takes precedence over adding more child collection mutations.
To the Snapshot interface, the proposal is to add the methods...
ChildCollectionNames() ([]string, error)
ChildCollectionSnapshot(childCollectionName string) (Snapshot, error)
And, that's it.
The proposed API allows for deeply nested child collections of child collections, but the initial implementation might just return an error if the user tries to have deep nesting.
Perhaps add a paragraph or so to the design docs on lock acquisition precedence, to help avoid future lock-inversion deadlock bugs... see also 9fdd764
The leveled compaction policy decision only looks at the root collection when making its decision in compactMaybe(). Maybe need to recursively examine the child collections.
One thought is if any collection in the collection hierarchy needs a full compaction, then that's the overriding priority.
See also... #40
Hi,
this project looks very interesting. I have a lot of single-write transactions, rather than multiple writes per transaction.
How many such transaction writes of small data can moss handle per second on an average SSD? For example, with BoltDB it was only about 250, so I wonder if this project can perform better or if it is also limited by the file system.
From discussion with @mschoch and @sreekanth-cb, thoughts on changes to Segment interface to make it more generic, and not so specific to an array based implementation...
type Segment interface {
+	// DRAFT / NEW method...
+	Get(key []byte) (operation uint64, val []byte, err error)
+
+	// And also remove FindKeyPos().
+
+	// DRAFT - replace FindStartKeyInclusivePos() with an opaque
+	// cursor / handle based approach (Cursor declared below, since
+	// Go can't nest an interface declaration inside another)...
+	FindStartKeyInclusiveCursor(startKeyInclusive, endKeyExclusive []byte) (Cursor, error)

+type Cursor interface {
+	Current() (operation uint64, key []byte, val []byte, err error)
+	Next() error
+}
If a user just wants to lookup a single item in a collection, they have to first create a snapshot, then snapshot.Get(), then snapshot.Close().
One issue is creating a snapshot means memory allocations (of the snapshot instance and taking a copy of the segment stack).
A "onesie" API to just lookup a single item, if it can be implemented efficiently (without undue memory allocations and without having to hold a lock for a long time), may be more convenient for folks to grok.
(See also #14)
See also blevesearch/bleve#553 for some real-world use case.
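Today's three-step dance, and the convenience wrapper being proposed, can be sketched like this; the interfaces below are simplified stand-ins for moss's Collection and Snapshot so the example is self-contained:

```go
package main

import "fmt"

// Simplified stand-ins for moss's Snapshot and Collection interfaces.
type Snapshot interface {
	Get(key []byte) ([]byte, error)
	Close() error
}

type Collection interface {
	Snapshot() (Snapshot, error)
}

// collectionGet is the "onesie" convenience built from the existing
// snapshot / get / close dance; a native implementation inside moss
// could skip the snapshot allocation entirely.
func collectionGet(c Collection, key []byte) ([]byte, error) {
	ss, err := c.Snapshot()
	if err != nil {
		return nil, err
	}
	defer ss.Close()
	return ss.Get(key)
}

// mapSnapshot / mapCollection are toy implementations for the demo.
type mapSnapshot map[string][]byte

func (m mapSnapshot) Get(key []byte) ([]byte, error) { return m[string(key)], nil }
func (m mapSnapshot) Close() error                   { return nil }

type mapCollection struct{ data mapSnapshot }

func (mc *mapCollection) Snapshot() (Snapshot, error) { return mc.data, nil }

func main() {
	c := &mapCollection{data: mapSnapshot{"a": []byte("1")}}
	v, err := collectionGet(c, []byte("a"))
	fmt.Println(string(v), err) // 1 <nil>
}
```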
The Data Greater than Memory (DGM) performance of moss store's Log-Structured Merge Arrays is limited by the speed of the merge operation during compaction.
Currently this compaction is done in a single level which comes from the default setting of 0 as the compaction threshold. This can result in heavy write amplification.
Based on discussion with @steveyen, to mitigate this situation, moss store persistence can follow this simple approach:
maxSmallSegments=3, maxBigSegments=2
Initially, maxSmallSegments=0, maxBigSegments=0
Persistence appends small segments to end of the file...
|-seg0-||-seg1-||-seg2-| (maxSmallSegments=3)
On the next round of persistence, the above 3 segments can be compacted into a new file
|====seg0===| (maxSmallSegments=0, maxBigSegments=1)
Following this, further persistence rounds simply append smaller segments:
|====seg0====||-seg0-||-seg1-||-seg2-|
Now the next round of persistence, only compacts the small segments making the file look as follows..
|====seg0====||...seg0...||...seg1...||...seg2...||=====seg1=====|
The rationale behind this is that compacting fewer segments would be faster than constantly rewriting the whole file on every delta.
Later to support efficient persistence to disk, we can adopt a simple size-tiered leveled compaction support in mossStore by splitting the Footer across multiple levels:
data-L0-0000xx.moss: most recent segments.
data-L1-0000xx.moss: segments merged from L0
data-L2-0000xx.moss: segments merged from L1
We can then size tier these on the levels to achieve good tradeoff between space, read and write efficiencies.
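A toy model of the small/big segment scheme described above (segment sizes are stand-ins; the threshold matches the example value):

```go
package main

import "fmt"

// Small segments accumulate until maxSmallSegments, then get merged
// into one big segment; big segments are left alone until a later,
// rarer, higher-level compaction.
const maxSmallSegments = 3

type file struct {
	big   []int // merged "====segN====" segments (sizes)
	small []int // appended "-segN-" segments (sizes)
}

func (f *file) persist(segSize int) {
	f.small = append(f.small, segSize)
	if len(f.small) >= maxSmallSegments {
		merged := 0
		for _, s := range f.small {
			merged += s // compact only the small segments...
		}
		f.big = append(f.big, merged) // ...into one new big segment
		f.small = nil
	}
}

func main() {
	f := &file{}
	for i := 0; i < 7; i++ {
		f.persist(10)
	}
	// 7 persists: two merges of 3 smalls each, one small left over.
	fmt.Println(len(f.big), len(f.small)) // 2 1
}
```

The L0/L1/L2 file naming above would generalize this same accumulate-then-merge rule across levels, with size ratios between levels tuning the space / read / write tradeoff.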
If moss can somehow detect, for a batch or segment, that all the keys are exactly the same length, then one optimization might be to compress the kvs array -- no need to store the per-key length.
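A sketch of that fixed-key-length layout (types are illustrative, not moss's):

```go
package main

import "fmt"

// fixedKeySegment stores one shared key length plus the keys back to
// back, instead of a per-entry key length in the kvs array.
type fixedKeySegment struct {
	keyLen int
	keys   []byte // all keys, concatenated
}

// buildFixedKeySegment detects whether the optimization applies: it
// returns ok=false as soon as it sees mixed key lengths.
func buildFixedKeySegment(keys [][]byte) (*fixedKeySegment, bool) {
	if len(keys) == 0 {
		return nil, false
	}
	kl := len(keys[0])
	seg := &fixedKeySegment{keyLen: kl}
	for _, k := range keys {
		if len(k) != kl {
			return nil, false // mixed lengths: fall back to normal layout
		}
		seg.keys = append(seg.keys, k...)
	}
	return seg, true
}

// key returns the i'th key by simple arithmetic -- no length field needed.
func (s *fixedKeySegment) key(i int) []byte {
	return s.keys[i*s.keyLen : (i+1)*s.keyLen]
}

func main() {
	seg, ok := buildFixedKeySegment([][]byte{[]byte("aaaa"), []byte("bbbb")})
	fmt.Println(ok, string(seg.key(1))) // true bbbb
}
```

Fixed-size keys (UUIDs, counters, hashes) are common enough that the detection check could pay for itself.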