
beansdb's Introduction

What is Beansdb?

Beansdb is a distributed key-value storage system designed for large-scale online systems, aiming for high availability and easy management. It takes its ideas from Amazon's Dynamo, with some simplifications to Keep It Simple, Stupid (KISS).

Clients write to N Beansdb nodes, then read from R of them (resolving conflicts). Data on different nodes is synced through a hash tree, run from a cron job.
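The write-to-N, read-from-R scheme can be sketched with in-memory stand-ins for nodes. This is a minimal illustration, not the Beansdb client: `Node`, `quorum_set` and `quorum_get` are hypothetical names, and resolving conflicts by "highest version wins" is an assumption. In Dynamo-style systems, choosing R + W > N makes every read set overlap the most recent successful write set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, bytes]  # (version, value)


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one storage node."""
    up: bool = True
    data: Dict[str, Versioned] = field(default_factory=dict)


def quorum_set(nodes: List[Node], key: str, value: bytes, version: int, w: int) -> bool:
    """Write to every reachable node; succeed if at least `w` accepted."""
    acks = 0
    for node in nodes:
        if node.up:
            node.data[key] = (version, value)
            acks += 1
    return acks >= w


def quorum_get(nodes: List[Node], key: str, r: int) -> Optional[Versioned]:
    """Read from the first `r` reachable nodes; the highest version wins (assumed rule)."""
    replies: List[Optional[Versioned]] = []
    for node in nodes:
        if node.up:
            replies.append(node.data.get(key))
            if len(replies) == r:
                break
    if len(replies) < r:
        raise RuntimeError("fewer than R replicas answered")
    found = [reply for reply in replies if reply is not None]
    return max(found, key=lambda reply: reply[0]) if found else None
```

With N=3 and W=2, a node that was down during a write keeps a stale value; reading with R=2 still sees the newer version, while R=1 may return the stale one.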

It conforms to the memcache protocol (not fully supported, see below), so any memcached client can interact with it without modification.

Beansdb is heavily used at http://www.douban.com/ to store images, MP3s, text fields and so on; see the benchmark below.

Any suggestions or feedback are welcome.

Features

  • High-availability data storage with multiple readable and writable replicas

  • Soft state and eventual consistency, synced with a hash tree

  • Easy scaling out without interrupting online service

  • High performance read/write for a key-value based object

  • Configurable availability/consistency by N,W,R

  • Memcache protocol compatibility

Supported memcache commands

  • get
  • set(with version support)
  • append
  • incr
  • delete
  • stats
  • flush_all

Private commands

  • get @xxx: list the content of the hash tree, such as @0f
  • get ?xxx: get the metadata of a key
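Because the private commands are ordinary `get` requests with a prefixed key, they can be sent with any client or a raw socket. The sketch below only formats the request lines of the memcached text protocol; the helper names are hypothetical, and the reply format is not shown because the text above does not describe it.

```python
def get_request(key: str) -> bytes:
    """Format a memcached text-protocol `get` request line for one key."""
    if not key or any(ch.isspace() for ch in key):
        raise ValueError("memcached keys must be non-empty and contain no whitespace")
    return f"get {key}\r\n".encode("utf-8")


def htree_request(prefix: str) -> bytes:
    """Request the hash-tree listing under a prefix, e.g. '0f' -> 'get @0f'."""
    return get_request("@" + prefix)


def meta_request(key: str) -> bytes:
    """Request the metadata of a key, e.g. 'hello' -> 'get ?hello'."""
    return get_request("?" + key)
```

The resulting bytes can be written to a socket connected to a node, for example `sock.sendall(htree_request("0f"))`.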

Python Example

from dbclient import Beansdb

# three beansdb nodes on localhost
BEANSDBCFG = {
    "localhost:7901": range(16),
    "localhost:7902": range(16),
    "localhost:7903": range(16),
}

db = Beansdb(BEANSDBCFG, 16)

db.set('hello', 'world')
db.get('hello')
db.delete('hello')

Benchmark

 $ beansdb -d 
 $ memstorm -s localhost:7900 -n 1000000 -k 10 -l 100 
   
  ---- 
  Num of Records : 1000000 
  Non-Blocking IO : 0 
  TCP No-Delay : 0 
   
  Successful [SET] : 1000000 
  Failed [SET] : 0 
  Total Time [SET] : 51.77594s 
  Average Time [SET] : 0.00005s 
   
  Successful [GET] : 1000000 
  Failed [GET] : 0 
  Total Time [GET] : 40.93667s 
  Average Time [GET] : 0.00004s 
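The totals above can be turned into throughput figures with simple division. The sketch below uses only the numbers printed by memstorm; the function name is illustrative.

```python
def ops_per_second(count: int, total_seconds: float) -> float:
    """Throughput derived from an operation count and the total elapsed time."""
    return count / total_seconds


# Figures copied from the memstorm output above.
set_rate = ops_per_second(1_000_000, 51.77594)
get_rate = ops_per_second(1_000_000, 40.93667)

print(f"SET: {set_rate:.0f} ops/s, GET: {get_rate:.0f} ops/s")
```

This works out to roughly 19,300 SET/s and 24,400 GET/s for this run, consistent with the reported per-operation averages of about 0.00005 s and 0.00004 s.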

Real performance in production

  • cluster 1: 1.1B records, 55TB data, 48 nodes, 1100 get/25 set per second, med/avg/90%/99% time is 12/20/37/186 ms.
  • cluster 2: 3.3B records, 3.5TB data, 15 nodes, 1000 get/500 set per second, med/avg/90%/99% time is 1/11/15/123 ms.

beansdb's People

Contributors

davies, hurricane1026, lembacon, sykp241095, tonyseek, windreamer, xtao, youngsofun, zzl0


beansdb's Issues

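Some data lost after upgrading from 0.6 to 0.7.1.4

The data set is about 1000 GB, and some data files are larger than 4000 MB. There are no error logs.
Stats output: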

STAT pid 30217
STAT uptime 5132
STAT time 1496715386
STAT version 0.7.1.4
STAT pointer_size 64
STAT rusage_user 35.517600
STAT rusage_system 55.968491
STAT rusage_maxrss 858868
STAT item_buf_size 4096
STAT curr_connections 1633
STAT total_connections 106287
STAT connection_structures 1645
STAT cmd_get 136505
STAT cmd_set 2753
STAT cmd_delete 0
STAT slow_cmd 80
STAT get_hits 24410
STAT get_misses 112096
STAT curr_items 2710
STAT total_items 11249364
STAT avail_space 800291680256
STAT total_space 1072578422016
STAT bytes_read 460723098
STAT bytes_written 3962555515
STAT threads 16

Request to create a tag

Could you please create a tag (i.e. a tagged version of beansdb), so I could submit a formula for beansdb to Homebrew since they no longer accept HEAD-only formulae.

Segmentation fault when updating to the latest beansdb

Hi,
We are updating beansdb to the latest version. However, we hit a segmentation fault on one of our servers (we have already updated a few successfully). The data was generated by the previous version of beansdb. The gdb backtrace is as follows:
#0 dc_decode_key_with_fmt (dc=0x7f706ac19070, buf=0x7f70b4d3f950 "{\006\003\376H\317\001", buf_size=256, src=0x7f7069c76936 "\214-\v\020n$\003\254\067\332\001", len=65) at codec.c:353
#1 dc_decode (dc=0x7f706ac19070, buf=0x7f70b4d3f950 "{\006\003\376H\317\001", buf_size=256, src=0x7f7069c76936 "\214-\v\020n$\003\254\067\332\001", len=65) at codec.c:483
#2 0x000000000040a3bb in key_hash (tree=0x7f70695d0cf0, node=0x7f70ac06af10) at htree.c:155
#3 split_node (tree=0x7f70695d0cf0, node=0x7f70ac06af10) at htree.c:295
#4 0x000000000040a969 in add_item (tree=0x7f70695d0cf0, node=0x7f70ac06af10, it=0x7f70695d0d38, keyhash=762181686, enlarge=true) at htree.c:276
#5 0x000000000040af9c in ht_add (tree=0x7f70695d0cf0, key=0x7f70995d6950 "4a3f35c3a0ac209dad06f4aace8b6e53_tn", pos=6, hash=25047, ver=1) at htree.c:990
#6 0x00000000004168a5 in bc_set (bc=0xdcf4d0, key=0x7f70995d6950 "4a3f35c3a0ac209dad06f4aace8b6e53_tn", value=<value optimized out>, vlen=<value optimized out>, flag=0, version=<value optimized out>) at bitcask.c:1367
#7 0x0000000000403725 in store_item (it=, comm=<value optimized out>) at beansdb.c:666
#8 0x0000000000407909 in complete_nread (c=0x7f6fa9b353c0) at beansdb.c:638
#9 drive_machine (c=0x7f6fa9b353c0) at beansdb.c:1607
#10 0x0000000000408b42 in worker_main (arg=) at thread.c:218
#11 0x00007f70baae59d1 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f70ba8328fd in clone () from /lib64/libc.so.6

Before the crash, beansdb-error.log shows many errors and warnings like:
2015-06-24 14:50:46.764666 WARN (0xf57fb700:htree.c:104) - BUG: bad version, oldv=-2096927116, newv=-637458766, key=, keyhash = 0x50c5d1f, oldpos = 1597585721
2015-06-24 14:50:46.764678 ERROR (0xf57fb700:codec.c:343) - invalid fmt index: 510
2015-06-24 14:50:46.764721 ERROR (0xf57fb700:codec.c:349) - invalid key: ff7
2015-06-24 14:50:46.764736 ERROR (0xf57fb700:codec.c:356) - invalid length of key: 231 != 9
2015-06-24 14:50:46.764752 ERROR (0xf57fb700:codec.c:356) - invalid length of key: 20 != 9
2015-06-24 14:50:46.764764 WARN (0xf57fb700:htree.c:104) - BUG: bad version, oldv=1873677826, newv=16890269, key=, keyhash = 0x811c9dc5, oldpos = 534574340
2015-06-24 14:50:46.779001 WARN (0x557fb700:htree.c:104) - BUG: bad version, oldv=-1303060716, newv=22352174, key=, keyhash = 0x811c9dc5, oldpos = 2908815360

After tracing into the code, it seems that f in the following code is not NULL, but f->nargs is not accessible:

static inline int dc_decode_key_with_fmt(Codec *dc, char *buf, int buf_size, const char *src, int len)
{
    if (len < 5)
        return 0;
    int intlen;
    int idx = decode_varint_old(src, &intlen);
    int32_t *args = (int32_t *)(src + intlen);
    Fmt *f = dc->dict[idx];
    ....

Beyond this, we cannot find exactly what is wrong. We hope you have more insight into this problem.
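One detail in the log excerpt may help narrow this down. Two of the WARN lines report an empty key together with keyhash = 0x811c9dc5, which is exactly the 32-bit FNV-1a offset basis, that is, the FNV-1a hash of zero bytes. Assuming the key hash is 32-bit FNV-1a (an assumption, not confirmed by anything quoted here), those entries would be consistent with keys that decoded to an empty string. A minimal sketch of the hash:

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


# Hashing an empty key never enters the loop, so the offset basis is returned.
empty_key_hash = fnv1a_32(b"")
```

The first WARN line (keyhash = 0x50c5d1f with an empty key) does not fit this pattern, so this is only a partial hint.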

There may be a problem in bc_optimize()

Code without comments is difficult to read; I think it is important to give every implementation file a description.

        HTree *cur_tree = optimizeDataFile(bc->tree, i, datapath, hintpath, limit, &recoverd);
        if (NULL == cur_tree) continue;
        pthread_mutex_lock(&bc->buffer_lock);
        bc->bytes -= recoverd;
        pthread_mutex_unlock(&bc->buffer_lock);

        pthread_mutex_lock(&bc->write_lock);
        ht_visit(cur_tree, update_items, bc->tree);
        pthread_mutex_unlock(&bc->write_lock);

optimizeDataFile() scans the datapath file and builds a new hash tree from the data file. But

    unlink(hintpath);
    unlink(path);
    rename(tmp, path);

unlinks datapath at the end.

ht_visit(cur_tree, update_items, bc->tree) is meant to update bc->tree so that it points to the relocated value positions.

Could there be a situation where unlink(path) has executed and the new data file has been renamed into place, but bc->tree has not yet been updated? A get request that reads this data file in that window may fail, because bc->tree still points to a stale position.
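The suspected window can be illustrated with a toy model: a data file as a byte string and an index mapping keys to (offset, length). This is only a sketch of the concern, with hypothetical names and data; whether the real code is exposed depends on locking that is not shown in the snippets above.

```python
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # key -> (offset, length)


def read(data: bytes, index: Index, key: str) -> bytes:
    """Read a value using whatever offsets the index currently holds."""
    offset, length = index[key]
    return data[offset:offset + length]


def compact(data: bytes, index: Index) -> Tuple[bytes, Index]:
    """Rewrite the file with only live records, returning new data and new offsets."""
    new_data = bytearray()
    new_index: Index = {}
    for key, (offset, length) in sorted(index.items(), key=lambda kv: kv[1][0]):
        new_index[key] = (len(new_data), length)
        new_data += data[offset:offset + length]
    return bytes(new_data), new_index


old_data = b"XXXXhelloYYYYworld"          # dead bytes around two live values
old_index = {"a": (4, 5), "b": (13, 5)}
new_data, new_index = compact(old_data, old_index)

# The file has been swapped but the index has not been updated yet.
stale_a = read(new_data, old_index, "a")  # wrong bytes
stale_b = read(new_data, old_index, "b")  # short read past the end of file
```

Once the index is replaced with `new_index`, both reads return the right values again, so the question is whether a reader can run between the two steps.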

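Are the memstorm test results credible?

I looked at memstorm: it sets a value and then immediately gets that same value, so the data for every get is already in the cache. These numbers do not reflect the performance of reads that go through the disk.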
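One way to reduce this effect is to separate the write phase from the read phase and read keys back in a different order. The sketch below only generates the operation sequence; the function names are hypothetical, it does not talk to a server, and whether reads actually reach the disk still depends on the data set being larger than the available cache.

```python
import random
from typing import Iterator, Tuple

Op = Tuple[str, str]  # (command, key)


def interleaved_plan(n: int) -> Iterator[Op]:
    """The pattern described above: each key is read right after it is written."""
    for i in range(n):
        yield ("set", f"key{i}")
        yield ("get", f"key{i}")


def phased_plan(n: int, seed: int = 0) -> Iterator[Op]:
    """Write every key first, then read them all back in shuffled order."""
    keys = [f"key{i}" for i in range(n)]
    for key in keys:
        yield ("set", key)
    rng = random.Random(seed)  # seeded so a run can be repeated
    rng.shuffle(keys)
    for key in keys:
        yield ("get", key)
```

Timing the get phase of `phased_plan` separately gives a read figure that is less dominated by values written a moment earlier.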

A beansdb Crash Issue

Hi,

We recently encountered a beansdb crash, and the gdb trace looks like this:
#0 decode_record (buf=0x7fd0a3953b90 <Address 0x7fd0a3953b90 out of bounds>, size=4124399360, decomp=true) at src/record.c:147
#1 0x000000000040bee4 in bc_get (bc=0x28204c0, key=0x7fd0d9a107e4 "-3418702703845858593") at src/bitcask.c:436
#2 0x000000000040e638 in hs_get (store=0x235b610, key=0x7fd0d9a107e4 "-3418702703845858593", vlen=0x7fd365c4231c, flag=0x7fd365c42318) at src/hstore.c:322

After tracing into the code, we found the cause in the following code:
DataRecord* bc_get(Bitcask *bc, const char *key)
{
Item *item = ht_get(bc->tree, key);
if (NULL == item) return NULL;
if (item->ver < 0){
    free(item);
    return NULL;
}

uint32_t bucket = item->pos & 0xff;
uint32_t pos = item->pos & 0xffffff00;
if (bucket > bc->curr) {
    fprintf(stderr, "BUG: invalid bucket %d > %d\n", bucket, bc->curr);
    ht_remove(bc->tree, key);
    free(item);
    return NULL;
}
DataRecord* r = NULL;
if (bucket == bc->curr) {
    pthread_mutex_lock(&bc->buffer_lock);
    if (bucket == bc->curr && pos >= bc->wbuf_start_pos){
        uint32_t p = pos - bc->wbuf_start_pos;
        r = decode_record(bc->write_buffer + p, bc->wbuf_curr_pos - p, true);
    }
    pthread_mutex_unlock(&bc->buffer_lock);

.....
}

The code gets the item and pos first. However, item->pos can apparently be changed by other threads (possibly delete or flush, because the crash happens more often on machines that are deleting data and using flush_all to reclaim space), which makes bc->write_buffer + p an invalid address. beansdb crashes when bc->write_buffer + p overflows, or reads an invalid value when it does not overflow.

We intended to change this, but it involves too many lock operations and we are not 100% sure a change would be safe. We would appreciate your help in solving this bug.
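The huge size in frame #0 (4124399360) looks like an unsigned 32-bit subtraction that wrapped around. The sketch below models the masking and subtraction from bc_get with plain integers. The buffer positions are picked by hand so that the wrapped result equals the size in the trace; they are not recovered from the core dump, and the assumption that the subtraction is performed in 32-bit unsigned arithmetic is not confirmed by the snippet.

```python
MASK32 = 0xFFFFFFFF


def split_pos(item_pos: int) -> tuple:
    """Low 8 bits select the bucket, the upper 24 bits are the file offset."""
    return item_pos & 0xFF, item_pos & 0xFFFFFF00


def u32_sub(a: int, b: int) -> int:
    """Subtract the way C does for uint32_t operands: modulo 2**32."""
    return (a - b) & MASK32


bucket, pos = split_pos(0x12345607)  # hypothetical packed position

# Hypothetical state: the offset into the write buffer lies beyond the bytes
# currently buffered, e.g. because the buffer was flushed after pos was read.
wbuf_curr_pos = 1024
p = 1024 + 170_567_936
size = u32_sub(wbuf_curr_pos, p)     # wraps instead of going negative
```

A guard such as checking `p < bc->wbuf_curr_pos` before calling decode_record would turn this case into a miss instead of a crash, although it would not remove the underlying race.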

Typo in the repo title

Yet anonther distributed key-value storage system from Douban Inc.

"anonther" should be "another". 😃

segmentation fault on latest version

Hi,

We hit a segmentation fault again with the latest beansdb version. All the data was generated by the new version. The crash happens on beansdb instances where we are deleting old data (if we never delete data, it does not crash).

Some possibly related error output in beansdb-error.log:

2015-09-07 07:07:30.302819 ERROR (0x1346c700:record.c:184) - invalid ksz=0, vsz=0, wbuf @891486464, key = (-896948590465773564)
2015-09-07 07:07:30.302990 ERROR (0x1346c700:record.c:380) - read file fail, /data/running/beansdb/storage/f/1/043.data @891486464, file size = 654926336, key = -896948590465773564
2015-09-07 07:07:30.303022 ERROR (0x1346c700:bitcask.c:1099) - Bug: get -896948590465773564 failed in /data/running/beansdb/storage/f/1/043.data @ 891486464

And the gdb info

(gdb) bt
#0 decode_record (buf=0x7fd4ef3a7f40 <Address 0x7fd4ef3a7f40 out of bounds>, size=4043926784, decomp=true, path=0x42b242 "wbuf", pos=884000768, key=0x7fd3c434a864 "5335789194110384689", do_logging=true, fail_reason=0x0) at record.c:180
#1 0x0000000000413cc9 in bc_get (bc=0x18a7b80, key=0x7fd3c434a864 "5335789194110384689", ret_pos=<value optimized out>, return_deleted=<value optimized out>) at bitcask.c:1006
#2 0x00000000004179f0 in hs_get (store=0x11d1570, key=0x7fd3c434a864 "5335789194110384689", vlen=0x7fd51526e51c, flag=0x7fd51526e518) at hstore.c:397
#3 0x00000000004084ea in item_get (key=0x7fd3c434a864 "5335789194110384689", nkey=19) at item.c:221
#4 0x0000000000405fc3 in process_get_command (c=0x7fd4983b4ce0, command=<value optimized out>) at beansdb.c:918
#5 process_command (c=0x7fd4983b4ce0, command=) at beansdb.c:1214
#6 0x00000000004073e4 in try_read_command (c=0x7fd4983b4ce0) at beansdb.c:1362
#7 drive_machine (c=0x7fd4983b4ce0) at beansdb.c:1590
#8 0x0000000000408b42 in worker_main (arg=) at thread.c:218
#9 0x00000033a08079d1 in start_thread () from /lib64/libpthread.so.0
#10 0x00000033a00e88fd in clone () from /lib64/libc.so.6

Hope to get your response soon:)

error 4 in libc-2.12.so

Apr 21 15:15:39 www kernel: beansdb[1082]: segfault at 7f76da9caff8 ip 00007f76da07272c sp 00007fffc1f8e648 error 4 in libc-2.12.so[7f76d9ff7000+18a000]

beansdb 0.6.0

System information
CentOS 6.5 x64
[root@www ~]# ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
[root@www ~]#

thread.c (worker_main): close(fd) is unexpected in some situations

        if (c == NULL){
//            fprintf(stderr, "Bug: conn %d should not be NULL\n", fd);
//            close(fd);
//            After worker 1 takes `conns[fd]` and unlocks `leader`,
//            worker 2 may still see fd as readable or writable, so
//            `c` can be NULL here. Calling close(fd) in that case
//            causes unexpected events.
            goto AGAIN;
        }

Tested on OS X 10.8.2 when a large batch of commands comes in:

./beansdb
Bug: conn 8 should not be NULL
Bug: conn 8 should not be NULL
Bug: conn 8 should not be NULL

Worse, the listening fd can be closed, which causes client connections to fail!

I think commit 04794ed1757738b3b178a4fbf7601701d16a1a3d is wrong

I am sorry: the source had already changed when I filed my last issue, but I think commit 04794ed1757738b3b178a4fbf7601701d16a1a3d does not fix the problem. I still hit a problem, although a different one from the previous issue.

After cloning the newest commit and rebuilding, I ran the following in a shell:

./beansdb
all ready.
key 20 (#8) should not in this tree (1:1)
key 21 (#8) should not in this tree (1:1)
key 21 (#8) should not in this tree (1:1)
key 9050 (#2) should not in this tree (1:3)
Segmentation fault: 11

My client script is

from socket import socket
from time import time

s = socket()
def cons(command, key, value):
    return '%s %s 0 0 %d\r\n%s\r\n' % (command, key, len(value), value)

port = 7900

start = time()
s.connect(('localhost', port))
total = 0
for i in range(1000000):
    message = cons('set', str(i), 'asdfadsfasdf')
    try:
        s.send(message)
        s.recv(100)
        total += 1
    except:
        s = socket()
        s.connect(('localhost', port))
        print total
        total = 0
end = time()
print end-start

I wrote this script only to test something.

        loop.nready --;
        int fd = loop.fired[loop.nready];
        conn *c = loop.conns[fd];
        if (c == NULL){
            fprintf(stderr, "Bug: conn %d should not be NULL\n", fd);
            delete_event(fd);
            close(fd);
            goto AGAIN;
        }
        //loop.conns[fd] = NULL; 
        pthread_mutex_unlock(&leader);

        if (drive_machine(c)) {
            if (update_event(fd, c->ev_flags, c)) conn_close(c);
        }

I think commenting out
loop.conns[fd] = NULL;
does not help.

While worker 1 is parsing the command from the client, worker 2 gets the same connection and enters drive_machine. The connection is being handled by two workers at once!

osx 10.8.2
python 2.7
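The ownership rule this issue argues for can be modelled in a few lines: a worker takes the connection out of the table while holding the leader lock, so a second worker that sees the same fd gets nothing and skips the event. This is a Python model with hypothetical names (`Loop`, `claim`, `release`), not the C implementation; `leader` and `conns` mirror the identifiers in the snippet above.

```python
import threading
from typing import Dict, Optional


class Loop:
    """Toy model of the event loop's connection table guarded by the leader lock."""

    def __init__(self) -> None:
        self.leader = threading.Lock()
        self.conns: Dict[int, Optional[str]] = {}

    def claim(self, fd: int) -> Optional[str]:
        """Take exclusive ownership of the connection for fd, or return None."""
        with self.leader:
            conn = self.conns.get(fd)
            self.conns[fd] = None  # mirrors loop.conns[fd] = NULL
            return conn

    def release(self, fd: int, conn: str) -> None:
        """Hand the connection back once the worker has finished with it."""
        with self.leader:
            self.conns[fd] = conn
```

A worker that receives None from `claim` should simply wait for the next event rather than close the fd, which matches the earlier report that closing it can shut down the listening socket.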

The word is "INADDR_ANY", not "INDRR_ANY"

In src/beansdb.c, line 1651:
"-l <ip_addr> interface to listen on, default is INDRR_ANY\n"

"INDRR_ANY" should be changed to "INADDR_ANY":
"-l <ip_addr> interface to listen on, default is INADDR_ANY\n"

Another Segmentation fault of the new version

Hi,
Sorry to trouble you again, but we have hit another segmentation fault, which caused beansdb to restart.
This issue seems related to issue 23; or maybe the bug from issue 23 is not fixed in the new version?
The gdb info:
#0 get_item_hash (tree=, node=, it=0x7f975472b020, keyhash=) at htree.c:422
#1 0x0000000000409e85 in ht_get_withbuf (tree=0x7f9745b97570, key=, len=, buf=0x7f975472b020 "\005p\220\036\001", lock=true) at htree.c:1090
#2 0x0000000000409f32 in ht_get_maybe_tmp (tree=0x7f9745b97570, key=0x7f964d90ea84 "8451159738723177110", is_tmp=0x7f975472b42c, buf=0x7f975472b020 "\005p\220\036\001") at htree.c:1107
#3 0x000000000041371c in bc_get (bc=0x188b080, key=0x7f964d90ea84 "8451159738723177110", ret_pos=0x7f975472b4cc, return_deleted=true) at bitcask.c:981
#4 0x00000000004179f0 in hs_get (store=0x1229470, key=0x7f964d90ea84 "8451159738723177110", vlen=0x7f975472b51c, flag=0x7f975472b518) at hstore.c:397
#5 0x00000000004084ea in item_get (key=0x7f964d90ea84 "8451159738723177110", nkey=19) at item.c:221
#6 0x0000000000405fc3 in process_get_command (c=0x7f964c9e9af0, command=) at beansdb.c:918
#7 process_command (c=0x7f964c9e9af0, command=) at beansdb.c:1214
#8 0x00000000004073e4 in try_read_command (c=0x7f964c9e9af0) at beansdb.c:1362
#9 drive_machine (c=0x7f964c9e9af0) at beansdb.c:1590
#10 0x0000000000408b42 in worker_main (arg=) at thread.c:218
#11 0x00000032248079d1 in start_thread () from /lib64/libpthread.so.0
#12 0x00000032244e88fd in clone () from /lib64/libc.so.6

The likely cause of the crash is that p_len is not accessible:
(gdb) p p_len
Cannot access memory at address 0x7f974800000a

I did not see any ERROR log in the beansdb-error.log

cannot compile on macosx 10.8.2

make  all-recursive
Making all in doc
make[2]: Nothing to be done for `all'.
gcc -DHAVE_CONFIG_H -I.  -DNDEBUG   -I/opt/boxen/homebrew/include -MT beansdb-htree.o -MD -MP -MF .deps/beansdb-htree.Tpo -c -o beansdb-htree.o `test -f 'src/htree.c' || echo './'`src/htree.c
src/htree.c: In function ‘ht_open’:
src/htree.c:494: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘off_t’
src/htree.c:494: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘off_t’
src/htree.c:499: error: ‘POSIX_FADV_SEQUENTIAL’ undeclared (first use in this function)
src/htree.c:499: error: (Each undeclared identifier is reported only once
src/htree.c:499: error: for each function it appears in.)
make[2]: *** [beansdb-htree.o] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
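The error comes from `POSIX_FADV_SEQUENTIAL`, which belongs to the `posix_fadvise` interface; that interface is available on Linux but not on OS X. In C the usual fix is to guard the call with `#ifdef POSIX_FADV_SEQUENTIAL`. The same idea is sketched below in Python as an analogue, not as a patch for htree.c: the read-ahead hint is applied only when the platform offers it, and skipped otherwise. The function name is hypothetical.

```python
import os
import tempfile


def advise_sequential(fd: int) -> bool:
    """Apply the sequential read-ahead hint if the platform supports it.

    Returns True when the hint was applied and False when it was skipped.
    The hint is purely an optimization, so skipping it is always safe.
    """
    if hasattr(os, "posix_fadvise") and hasattr(os, "POSIX_FADV_SEQUENTIAL"):
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)  # length 0 = to end of file
        return True
    return False


with tempfile.TemporaryFile() as f:
    used_hint = advise_sequential(f.fileno())
```

On Linux `used_hint` is expected to be True; on OS X the call is skipped and the rest of the program runs unchanged.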

Crash on get

Program terminated with signal 11, Segmentation fault.
#0 decode_record (buf=0x1e98c430 <Address 0x1e98c430 out of bounds>, size=3799723776, decomp=true) at src/record.c:147

147 int ksz = r->ksz, vsz = r->vsz;
