
chaindb's Introduction

chaindb

Discord

Go key-value store using BadgerDB

usage

go get github.com/ChainSafe/chaindb
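A minimal usage sketch follows. The constructor name and Config shape below are assumptions about the package's API, not confirmed signatures; the Put, Get, and Close methods are the ones discussed in the issues further down.

package main

import (
    "fmt"

    "github.com/ChainSafe/chaindb"
)

func main() {
    // NewBadgerDB and Config are hypothetical here; the real constructor may differ.
    db, err := chaindb.NewBadgerDB(&chaindb.Config{DataDir: "./data"})
    if err != nil {
        panic(err)
    }
    defer db.Close()

    if err := db.Put([]byte("key"), []byte("value")); err != nil {
        panic(err)
    }
    value, err := db.Get([]byte("key"))
    if err != nil {
        panic(err)
    }
    fmt.Println(string(value))
}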

chaindb's People

Contributors

noot, eclesiomelojunior, arijitad, mvdan, dutterbutter, edwardmack, timwu20, dimartiro

Stargazers

Chris Gianelloni, Tianwen, rms rob, Omer GOKSOY, Mauro Delazeri, Bruce Hervé, Crystal_Alchemist, tang, heipi, Mohammad Shahgolzadeh, Nigel Sheridan-Smith, Asad Khan, ForthXu, Jacob Gadikian, Qdigital, 浮生

Watchers

James Cloos, Amer Ameen, Ryan Noble, David Ansermino, ChainSafe Systems

chaindb's Issues

badger's batch writer should use upstream write batches

In a codebase of mine, I saw huge performance wins by directly using https://pkg.go.dev/github.com/dgraph-io/badger/v2?tab=doc#DB.NewWriteBatch, instead of the current mechanism of first storing the operations in a map[string][]byte.

We do use a WriteBatch when flushing via the Write method, but making every operation go through a map first is IMO missing the point of using a write batch in the first place.

For example, it seems like badger's write batching flushes the writes to the database in large chunks. That is, writing a million values in a batch shouldn't mean keeping a million key-values in memory, as it might be flushing them in chunks of hundreds or thousands.

However, our current implementation always keeps the entire batch in memory, so it doesn't work well at all for large numbers of writes. I think that's the main purpose of batching writes in the first place, so I think we should remove the map entirely.

One feature we will lose because of this is the Reset method, which empties the batch. We can't implement this with "pure" write batches, because once a Put/Del has been done, it might already have been flushed to disk. I don't think we should support this kind of "undo" semantics in write batches.

If gossamer really needs that feature, I think you could continue using write batches, but keep the "reset" logic in your own code. That is, only use Database.NewBatch once the entire final set of writes has been computed. It has the same disadvantage about the memory usage as now, but it keeps write batches in this package fast, so it doesn't force the added complexity and slow-down for everyone else.
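A minimal sketch of what a map-free batch could look like, backed directly by badger's WriteBatch. It assumes the badger/v2 API (DB.NewWriteBatch, WriteBatch.Set/Delete/Flush); the badgerBatch type and constructor names are illustrative, not this package's existing ones.

import badger "github.com/dgraph-io/badger/v2"

// badgerBatch forwards writes straight to badger's WriteBatch instead of
// buffering them in a map first.
type badgerBatch struct {
    wb *badger.WriteBatch
}

func newBadgerBatch(db *badger.DB) *badgerBatch {
    return &badgerBatch{wb: db.NewWriteBatch()}
}

// Put hands the write to badger, which may flush it to disk in chunks;
// memory usage no longer grows with the size of the batch.
func (b *badgerBatch) Put(key, value []byte) error {
    return b.wb.Set(key, value)
}

func (b *badgerBatch) Del(key []byte) error {
    return b.wb.Delete(key)
}

// Write commits anything badger hasn't flushed yet.
func (b *badgerBatch) Write() error {
    return b.wb.Flush()
}

// Note: no Reset. Once a Put/Del has been issued it may already be on disk,
// so "undo" semantics can't be offered on top of a pure write batch.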

split implementations to one per package

Right now this module is a single package, which is OK since we only have one implementation with badger.

However, it would be better long-term design to have the interface and generic code/tests at the root package, and one sub-package for each implementation that pulls in heavy dependencies like badger.

If we don't do that, we could easily end up in the case where trying to use badger also forces importing (and thus linking into the binary) other database software.

The first suggestion that comes to mind is chaindb/<dbname>, like chaindb/badger. However, this would be unfortunate as it clashes with the name of the upstream badger package itself, and it's entirely reasonable to want to use both at the same time (e.g. when using our package along with database options from upstream).

Another idea is to use slightly different names, like chaindb/badgerdb. Though that doesn't really avoid the confusion between the two names.

The real difference here is that our package is an implementation, or a wrapper around upstream. Perhaps chaindb/badgerimpl? It's a bit ugly, but I think it's the least confusing.

Another option is to just do chaindb/badger, and let importers rename it as they please, e.g. import badgerimpl "github.com/ChainSafe/chaindb/badger". Perhaps this is the simplest option.
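With that last option, the rename is trivial on the importer's side. Note that chaindb/badger is the proposed path here, not an existing package:

import (
    badgerimpl "github.com/ChainSafe/chaindb/badger" // our wrapper (proposed path)
    badger "github.com/dgraph-io/badger/v2"          // upstream, usable side by side
)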

stop compressing all keys and values with snappy

The idea sounds fine to me, but I think it's wrong to hard-code this into the badger implementation.

Instead, I think we should allow the badger constructor to pass options to upstream. For example, https://pkg.go.dev/github.com/dgraph-io/badger/v2?tab=doc#Options.WithCompression could be used to accomplish the same, and probably in a more performant way, since it's handled by upstream.
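A sketch of how that could look if the constructor let callers forward options to upstream. The openBadger function is hypothetical; the badger/v2 options API (DefaultOptions, WithCompression, options.Snappy) is upstream's.

import (
    badger "github.com/dgraph-io/badger/v2"
    "github.com/dgraph-io/badger/v2/options"
)

// openBadger forwards compression to upstream badger, so it becomes the
// caller's choice rather than being hard-coded into our implementation.
func openBadger(path string) (*badger.DB, error) {
    opts := badger.DefaultOptions(path).WithCompression(options.Snappy)
    return badger.Open(opts)
}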

In the future, if we're also interested in compression for DBs that don't implement it themselves, we could also add a DB overlay similar to the "with prefix" one we already have. In any case, it should not be coupled with any of the implementations.

clarify API concurrency guarantees

I think the interface should do a better job of clarifying what the semantics of each method are, and what the concurrency guarantees are.

In terms of concurrency, I think we could start with some basics (a -race test sketch follows the list):

  • All methods are thread safe by default. That is, they shouldn't trip the race detector or panic when called from many goroutines under -race.
  • Any method that isn't thread safe with any other method should be clearly documented as such. For example, is Close OK to call concurrently with Put?
  • All methods that modify the database are atomic. That is, running two concurrent Put calls can only result in one of the two written values in the end, never corrupted data or the original value.
  • While methods like Get and Put are thread safe, data consistency outside of atomic operations is not guaranteed. That is, if Get("foo") is run concurrently with Put("foo", "bar"), one can't predict the result of the Get.
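As a starting point for enforcing these, a -race smoke test along these lines could live next to the interface. The helper below is a sketch written against the Put/Get methods only; run it under go test -race.

import (
    "fmt"
    "sync"
    "testing"
)

// testConcurrentPutGet hammers a small key set from many goroutines.
// It checks only the thread-safety guarantee: no races, no corruption.
// Which value a given Get observes is deliberately unspecified.
func testConcurrentPutGet(t *testing.T, db interface {
    Put(key, value []byte) error
    Get(key []byte) ([]byte, error)
}) {
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            key := []byte(fmt.Sprintf("key-%d", i%10))
            if err := db.Put(key, []byte("value")); err != nil {
                t.Error(err)
            }
            // Get may see either value or key-not-found here; only the
            // absence of races and corruption is being exercised.
            _, _ = db.Get(key)
        }(i)
    }
    wg.Wait()
}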
