voldemort / voldemort Goto Github PK

An open source clone of Amazon's Dynamo.

License: Apache License 2.0

Python 0.97% Shell 0.83% C++ 2.34% Ruby 0.59% Java 94.47% HTML 0.11% CSS 0.01% Thrift 0.01% Batchfile 0.12% Makefile 0.08% M4 0.41% Haml 0.04% NASL 0.02%

voldemort's Introduction

Voldemort is a distributed key-value storage system

N.B.: Voldemort is no longer under development. LinkedIn was the primary maintainer and user of Voldemort, and stopped all production usage in 2018. Most of the Voldemort Read-Only use cases and some of the Read-Write use cases have migrated to Venice, which is actively maintained and also open sourced.

Overview

Data is automatically replicated over multiple servers across multiple datacenters.
Data is automatically partitioned so each server contains only a subset of the total data
Server failure is handled transparently
Pluggable serialization is supported to allow rich keys and values including lists and tuples with named fields, as well as to integrate with common serialization frameworks like Protocol Buffers, Thrift, and Java Serialization
Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system
Each node is independent of other nodes with no central point of failure or coordination
Pluggable storage engines, to cater to different workloads
SSD Optimized Read Write storage engine, with support for multi-tenancy
Built in mechanism to fetch & serve batch computed data from Hadoop
Support for pluggable data placement strategies to support things like distribution across data centers that are geographical far apart.

It is used at LinkedIn by numerous critical services powering a large portion of the site. .

QuickStart

You can refer to http://www.project-voldemort.com for more info

Download Code

cd ~/workspace
git clone https://github.com/voldemort/voldemort.git
cd voldemort
./gradlew clean build -x test

Start Server

# in one terminal
bin/voldemort-server.sh config/single_node_cluster

Use Client Shell

Client shell gives you fast access to the store. We already have a test store defined in the "single_node_cluster", whose key and value are both String.

# in another terminal
cd ~/workspace/voldemort
bin/voldemort-shell.sh test tcp://localhost:6666/

Now you have the the voldemort shell running. You can try these commands in the shell

put "k1" "v1"
put "k2" "v2"
get "k1"
getall "k1" "k2"
delete "k1"
get "k1"

You can find more commands by runninghelp

Want to dig into the detailed implementation or even contribute to Voldemort? A quick git guide for people who want to make contributions to Voldemort.

Comparison to relational databases

Voldemort is not a relational database, it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper like ActiveRecord or Hibernate this will provide horizontal scalability and much higher availability but at great loss of convenience. For large applications under internet-type scalability pressure, a system may likely consist of a number of functionally partitioned services or apis, which may manage storage resources across multiple data centers using storage systems which may themselves be horizontally partitioned. For applications in this space, arbitrary in-database joins are already impossible since all the data is not available in any single database. A typical pattern is to introduce a caching layer which will require hashtable semantics anyway. For these applications Voldemort offers a number of advantages:

Voldemort combines in memory caching with the storage system so that a separate caching tier is not required (instead the storage system itself is just fast).
Unlike MySQL replication, both reads and writes scale horizontally
Data partioning is transparent, and allows for cluster expansion without rebalancing all data
Data replication and placement is decided by a simple API to be able to accommodate a wide range of application specific strategies
The storage layer is completely mockable so development and unit testing can be done against a throw-away in-memory storage system without needing a real cluster (or even a real storage system) for simple testing

Contribution

The source code is available under the Apache 2.0 license. We are actively looking for contributors so if you have ideas, code, bug reports, or fixes you would like to contribute please do so.

For help please see the discussion group, or the IRC channel chat.us.freenode.net #voldemort. Bugs and feature requests can be filed on Github.

Special Thanks

We would like to thank JetBrains for supporting Voldemort Project by offering open-source license of their IntelliJ IDE to us.

voldemort's People

Contributors

Stargazers

Watchers

Forkers

jkreps bbansal readams mcaserta geir lindner agiamas zgdpj pgoodwin tommay yairwein gnufied dannya-zz willmuldrew jannehietamaki afeinberg up1 maharshi benschmaus ak2consulting edyskant claudiocherubino brunyuriy krummas kooer mrambacher eishay abagri mfr0st pkokati cloudzhou omega1 rakeshkaria howardhome atoulme xaze psemenyuk jcustenborder tarjei macflecknoe danny-gz rsumbaly mavrukin evilsquid888 fwu brmorris gholler nehanarkhede germanviscuso benhardy famousmoose p6 wspringer dannyye gxm leonhong madhub2v suranjan anuragranjan utsengar ravidontharaju jabley imaxxs matthayes marceltoben d5nguyenvan chaitanya79 sathya33 jchoate rdrushal anoopj jtraupman-li medallia jghoman mebigfatguy sbernatsky xiaoyang beobal cyxac leigao haroldl sathyaphoenix jayzalowitz mickeyhs daggerrz littleforker sbogrett ericturcotte avasilye ankurbaidya sarchak maksred mdw211126com jacklu mitultiwari wxfno1 dmidi mattstep sunilmallya vidyasagar1729

voldemort's Issues

voldemort-shell.sh handles lists but not arrays

I'd like a way to see values containing byte arrays.
It would be nice if voldemort-shell.sh could do this for me.

Pull request to follow.

VoldemortConfig validation should be made uniform

http://code.google.com/p/project-voldemort/issues/detail?id=174

Sporadic BDB test failure on hudson

No idea what this is about but it seems to happen in about 1/5 of the runs

voldemort.store.PersistenceFailureException: Shutdown failed.
at voldemort.store.bdb.BdbStorageEngine.close(BdbStorageEngine.java:306)
at voldemort.store.bdb.BdbStorageEngineTest.tearDown(BdbStorageEngineTest.java:73)
Caused by: Environment invalid because of previous exception: com.sleepycat.je.RunRecoveryException: java.lang.InterruptedException
at com.sleepycat.je.txn.LockManager.lock(LockManager.java:245)
at com.sleepycat.je.txn.Txn.lockInternal(Txn.java:425)
at com.sleepycat.je.txn.Locker.lock(Locker.java:360)
at com.sleepycat.je.dbi.CursorImpl.lockLNDeletedAllowed(CursorImpl.java:2516)
at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2438)
at com.sleepycat.je.dbi.CursorImpl.delete(CursorImpl.java:805)
at com.sleepycat.je.Cursor.deleteNoNotify(Cursor.java:1303)
at com.sleepycat.je.Cursor.deleteInternal(Cursor.java:1278)
at com.sleepycat.je.Cursor.delete(Cursor.java:417)
at voldemort.store.bdb.BdbStorageEngine.delete(BdbStorageEngine.java:267)
at voldemort.store.bdb.BdbStorageEngineTest$2.run(BdbStorageEngineTest.java:157)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at com.sleepycat.je.txn.LockManager.lock(LockManager.java:243)

Allow required reads to be specified during get

Let's walk through a scenario. A store with preferred/required reads = 1, required writes = 2 and replication factor = 2. To make it simple, let's say that there are only 2 nodes in the cluster. Client is ok with seeing some temporary inconsistency, but doesn't want to lose writes.

After operating for some time, there's a temporary issue with storage in one of the nodes, so it loses some of the updates. At this stage, reads may return either the latest or an older version of certain keys depending on which node the request goes to.

To write, the client gets a key, updates it and does a put. However, if the client got the key from the node that has stale keys, the put will fail with an ObsoleteVersionException. Fair enough, this is the right behaviour. The question now, is how can the client fix this? The way that seems obvious to me is to do a get with required reads == required writes. This would give it enough information to fix things (repairReads may even be enough to fix the issue).

As far as I can see, Voldemort doesn't provide a way for users to perform a get with a different required reads than the one configured by default though. Is there a better way to achieve what I outlined here or is this something worth adding?

Use protocol buffers API to improve the efficiency of the network protocol

As Rob Adams noted in a thread long ago:

"Also, I think what you want to implement the message size thing is to
call Message.ByteSize() on writing and when reading construct a
CodedInputStream and then use the API on that to limit it to read a
specific number of bytes from the stream.

This is referenced here in the pb docs:
http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming"

I will implement this in a branch.

build script seems broken

After running "ant" in a terminal, the dist folder has these jars:

voldemort-${curr.release}-src.jar
voldemort-${curr.release}.jar
voldemort-contrib-${curr.release}.jar
voldemort-test-${curr.release}.jar

IllegalArgumentException in RoutedStore.getAll()

If preferredReads > 1 but there is only one available node, then the following code in RoutedStore.getAll() will throw an IllegalArgumentException, though it has already passed the checkRequiredReads(availableNodes) which means the request can still succeed if no more nodes fail. I think the fix is just to threshold it at 0.

List extraNodes = Lists.newArrayListWithCapacity(availableNodes.size() - preferredReads);

Here is the exception (from my branch, though so the stack trace may be off):

Exception in thread "handler698896" java.lang.IllegalArgumentException: Illegal Capacity: -1
    at java.util.ArrayList.(ArrayList.java:111)
    at com.google.common.collect.Lists.newArrayListWithCapacity(Lists.java:150)
    at voldemort.store.routed.RoutedStore.getAll(RoutedStore.java:271)
    at voldemort.server.protocol.pb.ProtoBuffRequestHandler.handleGetAll(ProtoBuffRequestHandler.java:85)
    at voldemort.server.protocol.pb.ProtoBuffRequestHandler.handleRequest(ProtoBuffRequestHandler.java:51)
    at voldemort.server.socket.SocketServerSession.run(SocketServerSession.java:39)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:613)

RoutedStore.put timeout can be exceeded

It is possible in worst case situations to exceed the timeout during a put operation. I created test/unit/voldemort/store/routed/TimeoutTest.java in git://github.com/jtuberville/voldemort.git to illustrate this and have provided a proposed fix in my fork. This has been discussed in the thread http://groups.google.com/group/project-voldemort/browse_thread/thread/a5d5e4bdedad0d87

ArrayIndexOutOfBoundsException in src/java/voldemort/routing/ConsistentRoutingStrategy.java

The following line in file src/java/voldemort/routing/ConsistentRoutingStrategy.java, method routeRequest can cause an ArrayIndexOutOfBoundsException.

   public List routeRequest(byte[] key) {
        ...
        int index = Math.abs(hash.hash(key)) % this.partitionToNode.length;
        ...
    }

JsonReader.java reads int not long

http://code.google.com/p/project-voldemort/issues/detail?id=105

From what I see, currently getNumber returns a Number object and only seems to read int.

0.96 performance for 100M inserts?

Can anyone point me to expected performance on 0.96 for the internal voldemort-performance-tool.sh benchmark (or links to YCSB voldemort stats)?

In particular, I'm interested in inserts and retrievals for >100M records (~16byte key, ~256byte value).

I'm primarily focused on single node performance, and assuming that sharding across more nodes will provide near linear scalability.

So far, the best single-node performance that I've been able to achieve on a server-class 12-core blade machine using only the in-memory cache persistence storage settings (disabled bdb, disabled slop) results in 1M insertions taking 60sec.

-- start 'server.properties' --
node.id=0
max.threads=100
slop.enable=false
enable.readonly.engine=true
enable.bdb.engine=false
socket.enable=true
storage.configs=voldemort.store.memory.CacheStorageConfiguration
-- end --

Given the server blade has 96GB of RAM and everything should be in memory, why would 1M inserts take 60sec?

Any tips or pointers to silly things I might be doing? If you had to insert several terabytes of data into voldemort, what would you suggest (other than waiting a week or two)?

Thanks!

-[dg]

Incorrect coersion of version to short before passing to ClockEntry constructor

In VectorClock.java, line 249, Math.max(v1.getVersion(), v2.getVersion()) is incorrectly coerced from long to short. It results an exception: java.lang.IllegalArgumentException: Version -532 is not in the range (1, 32767) in ClockEntry constructor.

                newClock.versions.add(new ClockEntry(v1.getNodeId(),
                                                     (short) Math.max(v1.getVersion(),
                                                                      v2.getVersion())));

Client thread pool configured incorrectly

The client thread pool currently provides a single thread and it only grows when the queue is full and that queue is large. See the following thread for more information:

http://groups.google.com/group/project-voldemort/tree/browse_frm/thread/e2f3c35ff644a7c7/e97aea84f61cb9f4?rnum=11&_done=%2Fgroup%2Fproject-voldemort%2Fbrowse_frm%2Fthread%2Fe2f3c35ff644a7c7%2F04110e60b37acc90%3F#doc_04110e60b37acc90

Implement RoutedStoreTest.testObsoleteMasterFails

http://code.google.com/p/project-voldemort/issues/detail?id=91

Upgrade to google collections 1.0-rc2

I asked Kevin B about the stability of the google-collections API and he says that no major changes are in the works before 1.0 is released officially in the next 2 months.

It would be great to get this into the next release.

Mechanism to get a timeline consistent change stream out of Voldemort

http://code.google.com/p/project-voldemort/issues/detail?id=2

This is a very useful feature to have with tons of use cases- data backup , feedback loop back into offline processing.

WAR dependencies are a bit of a mess

This could be addressed with Maven (i.e., scoping of dependencies appropriately with test/provided), but right now, at least the junit and servlet JARs are superfluous if not also the jetty JARs, the tomcat Ant task JAR, and probably others.

voldemort binds to all interfaces instead of binding to just the host defined in the cluster.xml file for that server

http://code.google.com/p/project-voldemort/issues/detail?id=232

AbstractSocketPoolTest.startTest calls JUnit assert methods outside main thread

http://code.google.com/p/project-voldemort/issues/detail?id=141

Run C++ tests in Hudson

C++ tests should run in hudson. To do this best way is probably just to add an ant target that runs them and fails if any tests fail. We won't get the pretty xml output but we can live with that for now.

AdminClient is not reading the client properties file

When creating an instance of voldemort.client.protocol.admin.AdminClient it is creating a ClientConfig object to pass it to the SocketStoreClientFactory.

At the time this is done, the new SocketStoreClientFactory is created with the default ClientConfig values.

In my case I had jmx disabled on Voldemort, but I was seeing some JMX properties published. After investigating I saw that the cause was the one described before.

From my point of view this can be causing unexpected problems, as the one I found with JMX.

To me is a low priority bug, but AdminClient should load and use the client configuration properties file.

MetadataStore.put uses the same code for two branches

        } else if(CLUSTER_KEY.equals(keyStr)) {
            innerStore.put(keyStr, newVersioned);
        } else {
            // default put case
            innerStore.put(keyStr, newVersioned);
        }

Either the "else if" should be removed or there's some code missing in either branch.

Not all server side VoldemortExceptions are defined in ErrorCodeMapper.

PersistenceFailureException is not defined in ErrorCodeMapper. It caused the following exception:

Exception in thread "handler393980" java.lang.IllegalArgumentException: No mapping code for class voldemort.store.PersistenceFailureException
    at voldemort.store.ErrorCodeMapper.getCode(ErrorCodeMapper.java:66)
    at voldemort.server.protocol.vold.VoldemortNativeRequestHandler.writeException(VoldemortNativeRequestHandler.java:158)
    at voldemort.server.protocol.vold.VoldemortNativeRequestHandler.handlePut(VoldemortNativeRequestHandler.java:136)
    at voldemort.server.protocol.vold.VoldemortNativeRequestHandler.handleRequest(VoldemortNativeRequestHandler.java:46)
    at voldemort.server.socket.SocketServerSession.run(SocketServerSession.java:39)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

A few other VoldemortException subclasses are also missing from ErrorCodeMapper, e.g. SerializationException, NoSuchCapabilityException, etc.

Move site from svn to git

Create a site repository and move content from google code's SVN.

Create Hudson Accounts for everyone

Anyone who wants one should get an account to be able to set up build jobs for their repositories or branches.

async client library

Correct me if I'm wrong, but there's currently no asynchronous java client library for voldemort. Would be great if we had one..

Provide easy access to users who don't use git

I am not sure if there's a way to expose a git repository as a svn one. Maybe the easiest is to provide tarballs straight from Git. That way someone can just download it immediately, make the changes and provide a patch.

Enable server side routing strategy in Java client

Currently Java client only supports client side routing, we need to enable the server-side routing/

VersionIncrementingStore is not updating the Versioned object after a successful put

http://code.google.com/p/project-voldemort/issues/detail?id=189

Need to verify that this still happens.

Voldemort should have way of limiting the size of keys and values in a store

http://code.google.com/p/project-voldemort/issues/detail?id=237

Create email notification for hudson build failures

Currently the hudson server has no ability to send email. Need to install sendmail or something so we can send build failure emails.

The new voldemort commit list is probably a good place to send them.

JsonTypeDefinition Map type key order inconsistency.

JsonTypeDefintion can be created using

public JsonTypeDefinition(Object type)
public static JsonTypeDefinition fromJson(String typeSig)

in fromJson()[2] For Map type signatures we sort the key-sub entries to have consistent order while serializing/deserializing But in the constructor[1] we are not sorting and saving the type as it is causing serializing/deserializing issues if map has same elements but ordering is not.

Fix: change the constructor to create a new type with sorted entries same as fromJson()

Create contribution guidelines

To start with, this can be a paragraph or two.

Vector clock inconsistency resolver incorrect

Rob Adams found some issues with the Vector clock inconsistency resolver and posted about it in the mailing list (including a test case):

http://groups.google.com/group/project-voldemort/browse_frm/thread/c311b293b85e4a91

Delete the google code site

Once everything has been moved, the google code site can be deleted.

Pack repeated fields in protocol buffers protocol

As Ismael pointed out protocol buffers now supports packed arrays using the [packed = true] option. This appears to be a magical flag similar to OPTIMIZE_FOR=speed. We have several repeated fields and should use this as there seems to be no downside.

http://code.google.com/p/protobuf/source/detail?r=92

I don't think this is a backwards compatible change so we should make it soon before anyone depends on this.

Migrate "fun projects" to github wiki

People should be able to add projects to the list or mark themselves as working on something without having committer capabilities. Hopefully it won't turn into a spam target.

We should also add a page for "easy projects" for people who just want to do something small and useful to get started.

MysqlGrowth uses fi instead of fj

Moving http://code.google.com/p/project-voldemort/issues/detail?id=65 over.

Someone needs to investigate this and close it out, if all is well

Convert site html to point at github site.

Need to change site html to point to github site.

Use varints to improve space usage of serialized VectorClock

If we use varints, we can improve space usage of the serialized VectorClock. It also allows us to remove the limitation of 32k clock entries (well, at least replace it with a 2^31 limitation).

The implementation itself is easy, but we need a migration strategy for all the data out there. Jay, do you think this is worth it and any ideas for the migration strategy?

I implemented the serialization part to show how easy it is:

http://github.com/ijuma/voldemort/commit/f4de671e14829ede778c17fa4d100d5ae0b04cb1

The size of the serialized vector clock from VectorClockTest.testSerialization went from 26 bytes to 18 bytes.

Cleanup parameter names

Some issues mentioned here
http://code.google.com/p/project-voldemort/issues/detail?id=9

But the larger picture should cover the following.

-- Refactor VoldemortConfig, ClientConfig, AdminClientConfig etc.
-- Sanity check default values
-- Rename parameters if needed. Make sure javadoc is very descriptive

Sample cluster configuration interacts poorly with rebalancing

http://code.google.com/p/project-voldemort/issues/detail?id=246

To my understanding , this can still happen. Leaving it open for now.

Add OSGi Metadata to Voldemort jars

Suspicious integer division in RequestCounter.getThroughput

The division in the code below is suspicious (doing integer division and then assigning to a float makes no sense). Also, why bother with a float here? I'd just use double.

public float getThroughput() {
    Accumulator oldv = getValidAccumulator();
    float elapsed = (System.currentTimeMillis() - oldv.startTimeMS) / Time.MS_PER_SECOND;
    if(elapsed > 0f) {
        return oldv.count / elapsed;
    } else {
        return -1f;
    }
}

Support lists, strings, and byte[] of length greater than Short.MAX_VALUE in JsonTypeSerializer

We didn't do varints because we are lazy bastards, but we really need to support values larger than Short.MAX_VALUE without breaking compatibility.

Here is a semi-hacky approach that should work. There are three cases:

if the first two bits are 0x then the number is a 2 byte positive length and can be used directly as a length of the bytes to read.

if the first two bits are 11 then the number value is null.

If the first two bits are 10 then the remainder of the first 4 bytes are the size.

We should have a test of backwards compatibility to ensure we don't break everything.

Convert hudson to run against git.

Convert Hudson to use the git repository.

Move issues from Google Code to this tracker

We should move the issues from Google Code to this tracker. I suggest we just move the ones that are not in progress. For the ones that are likely to be finished within this month (and have already been started), it seems to me that it's easier to just leave them there and close them once it's done.

The enabled storage engines should be derived automatically from the stores.xml file

http://code.google.com/p/project-voldemort/issues/detail?id=166

Implement daemon functionality using Akuma

A quote from here[1]:

"As explained in here, writing a proper daemon requires various function calls that are traditionally only available to native applications. So typically, Java applications rely on external wrappers like Apache commons daemon or Java service wrapper, or even worse, just ignore those conventions and do Runtime.exec.

But this creates an unnecessary inconsistency between how you launch a program in the foreground vs how you launch a program in the background, and hurt the user/administration experience."

[1]
http://weblogs.java.net/blog/kohsuke/archive/2009/01/writing_a_unix.html"

Test issue

Creating a test issue.