voldemort / voldemort

An open source clone of Amazon's Dynamo.

Home Page: http://project-voldemort.com

License: Apache License 2.0

Python 0.97% Shell 0.83% C++ 2.34% Ruby 0.59% Java 94.47% HTML 0.11% CSS 0.01% Thrift 0.01% Batchfile 0.12% Makefile 0.08% M4 0.41% Haml 0.04% NASL 0.02%

voldemort's Introduction

Voldemort is a distributed key-value storage system

N.B.: Voldemort is no longer under development. LinkedIn was the primary maintainer and user of Voldemort, and stopped all production usage in 2018. Most of the Voldemort Read-Only use cases and some of the Read-Write use cases have migrated to Venice, which is actively maintained and also open sourced.

Overview

  • Data is automatically replicated over multiple servers across multiple datacenters.
  • Data is automatically partitioned so each server contains only a subset of the total data.
  • Server failure is handled transparently.
  • Pluggable serialization is supported to allow rich keys and values including lists and tuples with named fields, as well as to integrate with common serialization frameworks like Protocol Buffers, Thrift, and Java Serialization.
  • Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system.
  • Each node is independent of other nodes, with no central point of failure or coordination.
  • Pluggable storage engines, to cater to different workloads.
  • SSD-optimized read-write storage engine, with support for multi-tenancy.
  • Built-in mechanism to fetch and serve batch-computed data from Hadoop.
  • Support for pluggable data placement strategies to support things like distribution across data centers that are geographically far apart.

It was used at LinkedIn by numerous critical services powering a large portion of the site.

QuickStart

You can refer to http://www.project-voldemort.com for more info.

Download Code

cd ~/workspace
git clone https://github.com/voldemort/voldemort.git
cd voldemort
./gradlew clean build -x test

Start Server

# in one terminal
bin/voldemort-server.sh config/single_node_cluster

Use Client Shell

The client shell gives you quick access to the store. A test store is already defined in the "single_node_cluster" configuration; its keys and values are both Strings.

# in another terminal
cd ~/workspace/voldemort
bin/voldemort-shell.sh test tcp://localhost:6666/

Now you have the Voldemort shell running. You can try these commands in the shell:

put "k1" "v1"
put "k2" "v2"
get "k1"
getall "k1" "k2"
delete "k1"
get "k1"

You can find more commands by running help.

Want to dig into the detailed implementation or even contribute to Voldemort? See the quick git guide for people who want to make contributions to Voldemort.

Comparison to relational databases

Voldemort is not a relational database; it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper like ActiveRecord or Hibernate, this will provide horizontal scalability and much higher availability, but at a great loss of convenience.

For large applications under internet-type scalability pressure, a system will likely consist of a number of functionally partitioned services or APIs, which may manage storage resources across multiple data centers using storage systems that may themselves be horizontally partitioned. For applications in this space, arbitrary in-database joins are already impossible since all the data is not available in any single database. A typical pattern is to introduce a caching layer, which will require hashtable semantics anyway. For these applications Voldemort offers a number of advantages:

  • Voldemort combines in memory caching with the storage system so that a separate caching tier is not required (instead the storage system itself is just fast).
  • Unlike MySQL replication, both reads and writes scale horizontally.
  • Data partitioning is transparent, and allows for cluster expansion without rebalancing all data.
  • Data replication and placement is decided by a simple API to be able to accommodate a wide range of application specific strategies
  • The storage layer is completely mockable so development and unit testing can be done against a throw-away in-memory storage system without needing a real cluster (or even a real storage system) for simple testing

Contribution

The source code is available under the Apache 2.0 license. We are actively looking for contributors so if you have ideas, code, bug reports, or fixes you would like to contribute please do so.

For help please see the discussion group, or the IRC channel chat.us.freenode.net #voldemort. Bugs and feature requests can be filed on Github.

Special Thanks

We would like to thank JetBrains for supporting the Voldemort project by offering us an open-source license for their IntelliJ IDE.

voldemort's People

Contributors

abh1nay, afeinberg, arunthirupathi, baepiff, bbansal, bhasudha, bhsaurabh, bitti, cowtowncoder, ctasada, eliast, felixgv, gaojieliu, geir, gm10gen, gnb, ijuma, jayjwylie, jkreps, mebigfatguy, pbailis, readams, rsumbaly, sidw, singhsiddharth, stotch, tgockel, vinothchandar, voldemort, zhongjiewu

voldemort's Issues

Sporadic BDB test failure on hudson

No idea what this is about but it seems to happen in about 1/5 of the runs

voldemort.store.PersistenceFailureException: Shutdown failed.
at voldemort.store.bdb.BdbStorageEngine.close(BdbStorageEngine.java:306)
at voldemort.store.bdb.BdbStorageEngineTest.tearDown(BdbStorageEngineTest.java:73)
Caused by: Environment invalid because of previous exception: com.sleepycat.je.RunRecoveryException: java.lang.InterruptedException
at com.sleepycat.je.txn.LockManager.lock(LockManager.java:245)
at com.sleepycat.je.txn.Txn.lockInternal(Txn.java:425)
at com.sleepycat.je.txn.Locker.lock(Locker.java:360)
at com.sleepycat.je.dbi.CursorImpl.lockLNDeletedAllowed(CursorImpl.java:2516)
at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2438)
at com.sleepycat.je.dbi.CursorImpl.delete(CursorImpl.java:805)
at com.sleepycat.je.Cursor.deleteNoNotify(Cursor.java:1303)
at com.sleepycat.je.Cursor.deleteInternal(Cursor.java:1278)
at com.sleepycat.je.Cursor.delete(Cursor.java:417)
at voldemort.store.bdb.BdbStorageEngine.delete(BdbStorageEngine.java:267)
at voldemort.store.bdb.BdbStorageEngineTest$2.run(BdbStorageEngineTest.java:157)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at com.sleepycat.je.txn.LockManager.lock(LockManager.java:243)

Allow required reads to be specified during get

Let's walk through a scenario: a store with preferred/required reads = 1, required writes = 2 and replication factor = 2. To make it simple, let's say that there are only 2 nodes in the cluster. The client is OK with seeing some temporary inconsistency, but doesn't want to lose writes.

After operating for some time, there's a temporary issue with storage in one of the nodes, so it loses some of the updates. At this stage, reads may return either the latest or an older version of certain keys depending on which node the request goes to.

To write, the client gets a key, updates it and does a put. However, if the client got the key from the node that has stale keys, the put will fail with an ObsoleteVersionException. Fair enough, this is the right behaviour. The question now is: how can the client fix this? The way that seems obvious to me is to do a get with required reads == required writes. This would give it enough information to fix things (repairReads may even be enough to fix the issue).

As far as I can see, Voldemort doesn't provide a way for users to perform a get with a different required reads than the one configured by default though. Is there a better way to achieve what I outlined here or is this something worth adding?
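The scenario above can be illustrated with a toy model. This is only a sketch under strong simplifications: a scalar version counter stands in for Voldemort's vector clocks, and the class and method names (RequiredReadsSketch, read) are hypothetical and not part of the Voldemort API. It shows why a get that consults as many replicas as required writes would see the newest version, while a get with required reads = 1 may not.

```java
final class RequiredReadsSketch {

    // Consult the first `requiredReads` replicas and return the highest
    // version seen among them. replicaVersions[i] is the version of the
    // key held by replica i (a scalar stand-in for a vector clock).
    static long read(long[] replicaVersions, int requiredReads) {
        if (requiredReads < 1 || requiredReads > replicaVersions.length) {
            throw new IllegalArgumentException("requiredReads out of range: " + requiredReads);
        }
        long latest = replicaVersions[0];
        for (int i = 1; i < requiredReads; i++) {
            latest = Math.max(latest, replicaVersions[i]);
        }
        return latest;
    }

    public static void main(String[] args) {
        // Replica 0 lost some updates and is stuck at version 3; replica 1 is at 5.
        long[] replicas = {3L, 5L};
        System.out.println("requiredReads=1 sees version " + read(replicas, 1));
        System.out.println("requiredReads=2 sees version " + read(replicas, 2));
    }
}
```

With the stale replica consulted first, a single-replica read returns the old version, which is the one a subsequent put would be rejected for.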

Use protocol buffers API to improve the efficiency of the network protocol

As Rob Adams noted in a thread long ago:

"Also, I think what you want to implement the message size thing is to
call Message.ByteSize() on writing and when reading construct a
CodedInputStream and then use the API on that to limit it to read a
specific number of bytes from the stream.

This is referenced here in the pb docs:
http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming"

I will implement this in a branch.
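The idea in the quoted note is to write each message's size ahead of the message and, on the reading side, read exactly that many bytes. The sketch below illustrates only that framing idea using the Java standard library; it assumes a fixed 4-byte big-endian size prefix and does not use the protocol buffers API at all. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LengthDelimitedFramingSketch {

    // Writer side: prefix every message with its size in bytes.
    static byte[] frame(List<byte[]> messages) {
        int total = 0;
        for (byte[] m : messages) {
            total += 4 + m.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] m : messages) {
            buf.putInt(m.length);
            buf.put(m);
        }
        return buf.array();
    }

    // Reader side: read the size, then consume exactly that many bytes.
    static List<byte[]> unframe(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream);
        List<byte[]> messages = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int size = buf.getInt();
            if (size < 0 || size > buf.remaining()) {
                throw new IllegalArgumentException("Bad frame size: " + size);
            }
            byte[] m = new byte[size];
            buf.get(m);
            messages.add(m);
        }
        if (buf.hasRemaining()) {
            throw new IllegalArgumentException("Trailing bytes after last frame");
        }
        return messages;
    }

    public static void main(String[] args) {
        List<byte[]> in = new ArrayList<>();
        in.add("get k1".getBytes(StandardCharsets.UTF_8));
        in.add("put k2 v2".getBytes(StandardCharsets.UTF_8));
        for (byte[] m : unframe(frame(in))) {
            System.out.println(new String(m, StandardCharsets.UTF_8));
        }
    }
}
```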

build script seems broken

After running "ant" in a terminal, the dist folder has these jars:

voldemort-${curr.release}-src.jar
voldemort-${curr.release}.jar
voldemort-contrib-${curr.release}.jar
voldemort-test-${curr.release}.jar

IllegalArgumentException in RoutedStore.getAll()

If preferredReads > 1 but there is only one available node, then the following code in RoutedStore.getAll() will throw an IllegalArgumentException, though it has already passed the checkRequiredReads(availableNodes) which means the request can still succeed if no more nodes fail. I think the fix is just to threshold it at 0.

List extraNodes = Lists.newArrayListWithCapacity(availableNodes.size() - preferredReads);

Here is the exception (from my branch, though, so the stack trace may be off):

Exception in thread "handler698896" java.lang.IllegalArgumentException: Illegal Capacity: -1
    at java.util.ArrayList.<init>(ArrayList.java:111)
    at com.google.common.collect.Lists.newArrayListWithCapacity(Lists.java:150)
    at voldemort.store.routed.RoutedStore.getAll(RoutedStore.java:271)
    at voldemort.server.protocol.pb.ProtoBuffRequestHandler.handleGetAll(ProtoBuffRequestHandler.java:85)
    at voldemort.server.protocol.pb.ProtoBuffRequestHandler.handleRequest(ProtoBuffRequestHandler.java:51)
    at voldemort.server.socket.SocketServerSession.run(SocketServerSession.java:39)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:613)
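The "threshold it at 0" fix suggested above could look roughly like the following. This is a standalone sketch rather than a patch to RoutedStore: the helper name extraCapacity is hypothetical, and java.util.ArrayList stands in for the Lists.newArrayListWithCapacity call.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtraNodesCapacitySketch {

    // Clamp at zero so that having fewer available nodes than preferred
    // reads no longer yields a negative initial capacity.
    static int extraCapacity(int availableNodes, int preferredReads) {
        return Math.max(0, availableNodes - preferredReads);
    }

    public static void main(String[] args) {
        // One available node with preferredReads = 2: unclamped, this would be -1.
        int capacity = extraCapacity(1, 2);
        List<String> extraNodes = new ArrayList<>(capacity);
        System.out.println("capacity=" + capacity + ", size=" + extraNodes.size());
    }
}
```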

0.96 performance for 100M inserts?

Can anyone point me to expected performance on 0.96 for the internal voldemort-performance-tool.sh benchmark (or links to YCSB voldemort stats)?

In particular, I'm interested in inserts and retrievals for >100M records (~16byte key, ~256byte value).

I'm primarily focused on single node performance, and assuming that sharding across more nodes will provide near linear scalability.

So far, the best single-node performance that I've been able to achieve on a server-class 12-core blade machine using only the in-memory cache persistence storage settings (disabled bdb, disabled slop) results in 1M insertions taking 60sec.

-- start 'server.properties' --
node.id=0
max.threads=100
slop.enable=false
enable.readonly.engine=true
enable.bdb.engine=false
socket.enable=true
storage.configs=voldemort.store.memory.CacheStorageConfiguration
-- end --

Given the server blade has 96GB of RAM and everything should be in memory, why would 1M inserts take 60sec?

Any tips or pointers to silly things I might be doing? If you had to insert several terabytes of data into voldemort, what would you suggest (other than waiting a week or two)?

Thanks!

-[dg]

Incorrect coercion of version to short before passing to ClockEntry constructor

In VectorClock.java, line 249, Math.max(v1.getVersion(), v2.getVersion()) is incorrectly coerced from long to short. It results in an exception in the ClockEntry constructor: java.lang.IllegalArgumentException: Version -532 is not in the range (1, 32767).

                newClock.versions.add(new ClockEntry(v1.getNodeId(),
                                                     (short) Math.max(v1.getVersion(),
                                                                      v2.getVersion())));
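To see how such a cast can produce a negative version, here is a small standalone illustration. The input 65004 is a hypothetical value chosen because it wraps to -532; the actual version in the report is not known. The checkedNarrow guard is just one possible way to fail loudly and is not taken from the Voldemort source.

```java
final class ClockVersionCastSketch {

    // A narrowing cast keeps only the low 16 bits, so large values wrap.
    static short narrow(long version) {
        return (short) version;
    }

    // One possible guard: reject values that do not fit instead of wrapping.
    static short checkedNarrow(long version) {
        if (version < 1 || version > Short.MAX_VALUE) {
            throw new IllegalArgumentException("Version " + version + " does not fit in a short");
        }
        return (short) version;
    }

    public static void main(String[] args) {
        long v1 = 65004L; // hypothetical version larger than Short.MAX_VALUE
        long v2 = 12L;
        System.out.println(narrow(Math.max(v1, v2)));
    }
}
```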

Upgrade to google collections 1.0-rc2

I asked Kevin B about the stability of the google-collections API and he says that no major changes are in the works before 1.0 is released officially in the next 2 months.

It would be great to get this into the next release.

WAR dependencies are a bit of a mess

This could be addressed with Maven (i.e., scoping of dependencies appropriately with test/provided), but right now, at least the junit and servlet JARs are superfluous if not also the jetty JARs, the tomcat Ant task JAR, and probably others.

Run C++ tests in Hudson

C++ tests should run in Hudson. The best way to do this is probably just to add an ant target that runs them and fails if any tests fail. We won't get the pretty XML output, but we can live with that for now.

AdminClient is not reading the client properties file

When an instance of voldemort.client.protocol.admin.AdminClient is created, it creates a ClientConfig object to pass to the SocketStoreClientFactory.

At that point, the new SocketStoreClientFactory is created with the default ClientConfig values.

In my case I had JMX disabled on Voldemort, but I was seeing some JMX properties published. After investigating, I saw that the cause was the one described above.

From my point of view this can cause unexpected problems, such as the one I found with JMX.

To me this is a low-priority bug, but AdminClient should load and use the client configuration properties file.

MetadataStore.put uses the same code for two branches

        } else if(CLUSTER_KEY.equals(keyStr)) {
            innerStore.put(keyStr, newVersioned);
        } else {
            // default put case
            innerStore.put(keyStr, newVersioned);
        }

Either the "else if" should be removed or there's some code missing in either branch.

Not all server side VoldemortExceptions are defined in ErrorCodeMapper.

PersistenceFailureException is not defined in ErrorCodeMapper. It caused the following exception:

Exception in thread "handler393980" java.lang.IllegalArgumentException: No mapping code for class voldemort.store.PersistenceFailureException
    at voldemort.store.ErrorCodeMapper.getCode(ErrorCodeMapper.java:66)
    at voldemort.server.protocol.vold.VoldemortNativeRequestHandler.writeException(VoldemortNativeRequestHandler.java:158)
    at voldemort.server.protocol.vold.VoldemortNativeRequestHandler.handlePut(VoldemortNativeRequestHandler.java:136)
    at voldemort.server.protocol.vold.VoldemortNativeRequestHandler.handleRequest(VoldemortNativeRequestHandler.java:46)
    at voldemort.server.socket.SocketServerSession.run(SocketServerSession.java:39)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

A few other VoldemortException subclasses are also missing from ErrorCodeMapper, e.g. SerializationException, NoSuchCapabilityException, etc.

async client library

Correct me if I'm wrong, but there's currently no asynchronous Java client library for Voldemort. It would be great if we had one.

Provide easy access to users who don't use git

I am not sure if there's a way to expose a git repository as a svn one. Maybe the easiest is to provide tarballs straight from Git. That way someone can just download it immediately, make the changes and provide a patch.

Create email notification for hudson build failures

Currently the hudson server has no ability to send email. Need to install sendmail or something so we can send build failure emails.

The new voldemort commit list is probably a good place to send them.

JsonTypeDefinition Map type key order inconsistency.

JsonTypeDefinition can be created using

  1. public JsonTypeDefinition(Object type)
  2. public static JsonTypeDefinition fromJson(String typeSig)

In fromJson() [2], for Map type signatures we sort the key sub-entries so that the order is consistent when serializing and deserializing. In the constructor [1], however, we do not sort; the type is saved as is, which causes serialization/deserialization issues when two maps have the same elements in a different order.

Fix: change the constructor to create a new type with sorted entries, the same as fromJson().
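The ordering problem and the proposed fix can be shown with plain Java maps. This is a sketch only: the normalize helper is hypothetical, the field names are made up for illustration, and it handles a single level (nested map types would need the same treatment applied recursively).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

final class SortedTypeMapSketch {

    // Copy into a TreeMap so that iteration order depends only on the keys,
    // not on the order in which the caller inserted them.
    static Map<String, Object> normalize(Map<String, Object> type) {
        return new TreeMap<>(type);
    }

    public static void main(String[] args) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("name", "string");
        a.put("age", "int32");

        Map<String, Object> b = new LinkedHashMap<>();
        b.put("age", "int32");
        b.put("name", "string");

        // Same entries, different insertion order: the field order differs.
        System.out.println(a.keySet() + " vs " + b.keySet());
        // After normalization both iterate in the same order.
        System.out.println(normalize(a) + " vs " + normalize(b));
    }
}
```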

Migrate "fun projects" to github wiki

People should be able to add projects to the list or mark themselves as working on something without having committer capabilities. Hopefully it won't turn into a spam target.

We should also add a page for "easy projects" for people who just want to do something small and useful to get started.

Use varints to improve space usage of serialized VectorClock

If we use varints, we can improve space usage of the serialized VectorClock. It also allows us to remove the limitation of 32k clock entries (well, at least replace it with a 2^31 limitation).

The implementation itself is easy, but we need a migration strategy for all the data out there. Jay, do you think this is worth it and any ideas for the migration strategy?

I implemented the serialization part to show how easy it is:

http://github.com/ijuma/voldemort/commit/f4de671e14829ede778c17fa4d100d5ae0b04cb1

The size of the serialized vector clock from VectorClockTest.testSerialization went from 26 bytes to 18 bytes.
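For readers unfamiliar with varints, a minimal sketch of the common base-128 scheme follows (7 payload bits per byte, least significant group first, high bit set when more bytes follow). This is an independent illustration, not the code from the linked commit, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

final class VarintSketch {

    // Encode a non-negative value using 7 bits per byte; the high bit of
    // each byte signals that another byte follows.
    static byte[] encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("Negative values are not supported: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decode a single varint from the start of the array.
    static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("Truncated varint");
    }

    public static void main(String[] args) {
        // Small values take one byte; a fixed-width long would take eight.
        System.out.println(encode(1L).length + " byte(s) for 1");
        System.out.println(encode(300L).length + " byte(s) for 300");
        System.out.println(decode(encode(300L)));
    }
}
```

Small version numbers and node ids are the common case, which is where the space saving comes from.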

Suspicious integer division in RequestCounter.getThroughput

The division in the code below is suspicious (doing integer division and then assigning to a float makes no sense). Also, why bother with a float here? I'd just use double.

public float getThroughput() {
    Accumulator oldv = getValidAccumulator();
    float elapsed = (System.currentTimeMillis() - oldv.startTimeMS) / Time.MS_PER_SECOND;
    if(elapsed > 0f) {
        return oldv.count / elapsed;
    } else {
        return -1f;
    }
}
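A standalone sketch of the difference, and of the double-based alternative suggested above. The constant MS_PER_SECOND here is a local stand-in assumed to be an integral 1000 like Time.MS_PER_SECOND; the class and method names are hypothetical.

```java
final class ThroughputSketch {

    static final long MS_PER_SECOND = 1000L; // stand-in for Time.MS_PER_SECOND

    // Integer division truncates before the result is widened, so any
    // elapsed time under one second becomes 0.
    static double elapsedSecondsTruncated(long elapsedMs) {
        return elapsedMs / MS_PER_SECOND;
    }

    // Converting one operand to double first keeps the fractional part.
    static double elapsedSeconds(long elapsedMs) {
        return elapsedMs / (double) MS_PER_SECOND;
    }

    // Throughput in operations per second, or -1 if no time has elapsed.
    static double throughput(long count, long elapsedMs) {
        double elapsed = elapsedSeconds(elapsedMs);
        return elapsed > 0 ? count / elapsed : -1.0;
    }

    public static void main(String[] args) {
        System.out.println(elapsedSecondsTruncated(1500L));
        System.out.println(elapsedSeconds(1500L));
        System.out.println(throughput(300L, 1500L));
    }
}
```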

Support lists, strings, and byte[] of length greater than Short.MAX_VALUE in JsonTypeSerializer

We didn't do varints because we are lazy bastards, but we really need to support values larger than Short.MAX_VALUE without breaking compatibility.

Here is a semi-hacky approach that should work. There are three cases:

If the first two bits are 0x (that is, the first bit is 0), then the number is a 2-byte positive length and can be used directly as the number of bytes to read.

If the first two bits are 11, then the value is null.

If the first two bits are 10, then the remainder of the first 4 bytes is the size.

We should have a test of backwards compatibility to ensure we don't break everything.
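The three cases can be sketched as a small encoder and decoder. Assumptions in this sketch: null is written in the existing format as the 2-byte value -1 (0xFFFF), which is what makes the "11" prefix line up; the 4-byte form carries a 30-bit size; and the names LengthPrefixSketch, encode and decode are hypothetical, not JsonTypeSerializer methods.

```java
final class LengthPrefixSketch {

    static final int NULL_LENGTH = -1;
    static final int MAX_LENGTH = (1 << 30) - 1; // 30 bits remain after the "10" tag

    static byte[] encode(int length) {
        if (length == NULL_LENGTH) {
            // Assumed legacy null marker: short -1, whose first two bits are 11.
            return new byte[] {(byte) 0xFF, (byte) 0xFF};
        }
        if (length >= 0 && length <= Short.MAX_VALUE) {
            // Legacy 2-byte positive length: first bit is 0.
            return new byte[] {(byte) (length >>> 8), (byte) length};
        }
        if (length > Short.MAX_VALUE && length <= MAX_LENGTH) {
            // Extended 4-byte form: set the top bit so the first two bits are 10.
            int tagged = length | 0x80000000;
            return new byte[] {
                (byte) (tagged >>> 24), (byte) (tagged >>> 16), (byte) (tagged >>> 8), (byte) tagged
            };
        }
        throw new IllegalArgumentException("Unsupported length: " + length);
    }

    static int decode(byte[] encoded) {
        int first = encoded[0] & 0xFF;
        if ((first & 0x80) == 0) {
            return (first << 8) | (encoded[1] & 0xFF);
        }
        if ((first & 0xC0) == 0xC0) {
            return NULL_LENGTH;
        }
        return ((first & 0x3F) << 24)
                | ((encoded[1] & 0xFF) << 16)
                | ((encoded[2] & 0xFF) << 8)
                | (encoded[3] & 0xFF);
    }

    public static void main(String[] args) {
        int[] samples = {NULL_LENGTH, 0, 300, Short.MAX_VALUE, 100000, MAX_LENGTH};
        for (int n : samples) {
            byte[] e = encode(n);
            System.out.println(n + " -> " + e.length + " bytes -> " + decode(e));
        }
    }
}
```

Because lengths up to Short.MAX_VALUE and the null marker keep their 2-byte form, data written before the change would still decode the same way under these assumptions, which is the property a backwards-compatibility test would check.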

Move issues from Google Code to this tracker

We should move the issues from Google Code to this tracker. I suggest we just move the ones that are not in progress. For the ones that are likely to be finished within this month (and have already been started), it seems to me that it's easier to just leave them there and close them once it's done.

Implement daemon functionality using Akuma

A quote from here[1]:

"As explained in here, writing a proper daemon requires various function calls that are traditionally only available to native applications. So typically, Java applications rely on external wrappers like Apache commons daemon or Java service wrapper, or even worse, just ignore those conventions and do Runtime.exec.

But this creates an unnecessary inconsistency between how you launch a program in the foreground vs how you launch a program in the background, and hurt the user/administration experience."

[1] http://weblogs.java.net/blog/kohsuke/archive/2009/01/writing_a_unix.html
