facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.

Home Page: http://rocksdb.org

License: GNU General Public License v2.0

Shell 0.90% Python 1.54% C++ 83.74% C 1.63% Makefile 0.69% Java 9.99% CMake 0.42% PowerShell 0.06% Perl 0.95% Assembly 0.05% Dockerfile 0.01% BitBake 0.03%
database storage-engine

rocksdb's Introduction

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat ([email protected]) and Jeff Dean ([email protected]).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.

rocksdb's People

Contributors

adamretter, agiardullo, ajkr, akankshamahajan15, cbi42, dhruba, emayanke, fyrz, haoboxu, hx235, igorcanadi, islamabdelrahman, jay-zhuang, jaykorean, jowlyzhang, lightmark, liukai, ltamasi, maysamyabandeh, mdcallag, miasantreble, mrambacher, pdillinger, riversand963, rven1, sagar0, siying, yhchiang, yuslepukhin, zhichao-cao


rocksdb's Issues

Provide 'make install'

make install is the standard way to copy library files into the system; rocksdb should provide make install as well.

strategies to reduce write amplification

From the benchmarks, we can see that a bottleneck of rocksdb is also write amplification: write throughput is one tenth of read throughput.
The amplification can be calculated as 2*(N+1)*(L-1), where N = Dn / Dn-1 (Dn is the total data size of level n; N is 10 in leveldb) and L is the total number of levels, roughly lg(total data size), perhaps 6 or 7.
Because each key-value pair eventually migrates to the highest level, it must migrate L-1 times from level n-1 to level n. Each migration reads 1 file in the current level plus N files in the higher level, and writes N+1 files in the higher level.
In the benchmark test, write amplification may be about 2*(10+1)*(6-1) = 110, given that read throughput is much larger than write throughput. There may be 2 strategies to reduce the amplification.

  1. Store more data in memory. If you store 80G of data in memory and N=10, then you need only one more level on disk, so write amplification can be dramatically reduced. The shortcoming is that recovering the in-memory data takes more time, perhaps about a minute; the recovery could be optimized further.
  2. Change N. If you change N to 5, then L will be about 8, and the amplification is 2*(5+1)*(8-1) = 84. The shortcoming is that read speed will go down, perhaps to only 3/4 of the original.

compile error with g++-4.7

It seems that there is a missing include with g++-4.7:

$ make CXX=g++-4.7
g++-4.7 -g -Wall -Werror -I. -I./include -std=gnu++11  -DROCKSDB_PLATFORM_POSIX  -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DGFLAGS -DZLIB -DBZIP2   -DHAVE_JEMALLOC -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -Woverloaded-virtual -c db/compaction.cc -o db/compaction.o 
In file included from ./db/compaction.h:11:0,
                 from db/compaction.cc:10:
./db/version_set.h:437:3: error: 'atomic' in namespace 'std' does not name a type
./db/version_set.h: In member function 'uint64_t rocksdb::VersionSet::LastSequence() const':
./db/version_set.h:333:12: error: 'last_sequence_' was not declared in this scope
./db/version_set.h:333:32: error: 'memory_order_acquire' is not a member of 'std'
./db/version_set.h: In member function 'void rocksdb::VersionSet::SetLastSequence(uint64_t)':
./db/version_set.h:338:5: error: 'last_sequence_' was not declared in this scope
./db/version_set.h:339:29: error: 'memory_order_release' is not a member of 'std'
make: *** [db/compaction.o] Error 1

Coverity scan results

I put RocksDB through a Coverity Scan. Looks pretty good.

block_based_table_builder.cc line 112:
filter_block(opt.filter_policy == nullptr ? nullptr : new FilterBlockBuilder(opt))
There is no destructor that deletes this new FilterBlockBuilder.

db_stats_logger.cc line 42:
Should probably lock the mutex before changing bg_logstats_scheduled_.

BZip2_Compress in port/port_posix.h has a stray line:
return output;

NewRandomRWFile can technically leak the file descriptor fd if it takes the if (options.use_mmap_writes || options.use_mmap_reads) path.

Most of the false positives are related to the use of std::unique_ptr and some shortcuts in test code.

Mailing list and update on prefix API

Hi, there is a document on "Proposal for prefix API". Is it available yet, or is there any information on its development?

Lastly, I could not find any reference to a rocksdb mailing list. If one does not exist, can we start it please?

Thanks and regards.

build_tools/build_detect_version rocksdb_build_git_sha

env -i git rev-parse HEAD 2>&1 |
awk '
BEGIN {
  print "#include \"build_version.h\"\n"
}
{ print "const char* rocksdb_build_git_sha = \"rocksdb_build_git_sha:" $0 "\";" }
' > ${VFILE}

On my PC, executing "env -i git rev-parse HEAD" failed and printed two lines of info, like this:
$ env -i git rev-parse HEAD
fatal: Not a git repository (or any parent up to mount parent /mnt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

That prints two 'const char* rocksdb_build_git_sha = "rocksdb_build_git_sha:XXX"' lines into util/build_version.cc, so a redefinition error happens when the source is compiled.

I think it is a bug: the result of the command "env -i git rev-parse HEAD" must be checked, to make sure the result is exactly one line.

Missing includes

Files

db/perf_context_test.cc
db/prefix_test.cc

seem to lack includes

#include "rocksdb/slice_transform.h"
#include "rocksdb/memtablerep.h"

Without these, compilation of the targets perf_context_test and prefix_test results in the following errors:

db/perf_context_test.cc: In function ‘std::shared_ptr<rocksdb::DB> rocksdb::OpenDb()’:
db/perf_context_test.cc:40:31: error: ‘NewFixedPrefixTransform’ is not a member of ‘rocksdb’
db/perf_context_test.cc:42:53: error: ‘NewHashSkipListRepFactory’ was not declared in this scope

db/prefix_test.cc: In member function ‘std::shared_ptr<rocksdb::DB> rocksdb::PrefixTest::OpenDb()’:
db/prefix_test.cc:109:56: error: ‘NewFixedPrefixTransform’ was not declared in this scope
db/prefix_test.cc:112:47: error: ‘NewHashSkipListRepFactory’ was not declared in this scope
db/prefix_test.cc: In member function ‘void rocksdb::_Test_PrefixHash::_Run()’:
db/prefix_test.cc:264:44: error: invalid use of incomplete type ‘const class rocksdb::SliceTransform’
In file included from ./include/rocksdb/db.h:17:0,
                 from db/prefix_test.cc:7:
./include/rocksdb/options.h:35:7: error: forward declaration of ‘const class rocksdb::SliceTransform’
db/prefix_test.cc:302:50: error: invalid use of incomplete type ‘const class rocksdb::SliceTransform’
In file included from ./include/rocksdb/db.h:17:0,
                 from db/prefix_test.cc:7:
./include/rocksdb/options.h:35:7: error: forward declaration of ‘const class rocksdb::SliceTransform’

Undefined symbol rocksdb::CreateDBStatistics()

Hi,

I got an error when compiling my test program below.

Undefined symbols for architecture x86_64:
  "rocksdb::CreateDBStatistics()", referenced from:
      _main in test-5grmFu.o
ld: symbol(s) not found for architecture x86_64

Is this function only intended for testing?

Regards,

#include <cassert>
#include <rocksdb/db.h>
#include <rocksdb/statistics.h>

int main() {
    rocksdb::DB* db;
    rocksdb::Options options;
    options.create_if_missing = true;
    options.statistics = rocksdb::CreateDBStatistics();
    rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/rocks.db", &db);
    assert(status.ok());
    return 0;
}

<cstdatomic> vs <atomic>

This check from build_tools/build_platform_detect won't define LEVELDB_CSTDATOMIC_PRESENT, because <cstdatomic> doesn't exist for gcc 4.7. Maybe it did for older gcc versions, but I don't track modern gcc/g++ enough to understand it. I think the check might need to use "#include <atomic>" instead.

# If -std=c++0x works, use <cstdatomic>.  Otherwise use port_posix.h.
$CXX $CFLAGS -std=c++0x -x c++ - -o /dev/null 2>/dev/null  <<EOF
  #include <cstdatomic>
  int main() {}

EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DLEVELDB_PLATFORM_POSIX -DLEVELDB_CSTDATOMIC_PRESENT"
PLATFORM_CXXFLAGS="-std=c++0x"
else
COMMON_FLAGS="$COMMON_FLAGS -DLEVELDB_PLATFORM_POSIX"
fi

compile error on Ubuntu

Thanks for making this project.

I tried to compile rocksdb on Ubuntu 12.04 LTS, but I got an error message.

g++ -g -Wall -Werror -I. -I./include -DOS_LINUX -fno-builtin-memcmp -DLEVELDB_PLATFORM_POSIX -DGFLAGS -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -std=gnu++0x -Woverloaded-virtual -c db/db_test.cc -o db/db_test.o 
In file included from db/db_test.cc:27:0:
db/db_test.cc: In member function ‘void rocksdb::_Test_TransactionLogIteratorCorruptedLog::_Run()’:
db/db_test.cc:4108:75: error: ‘truncate’ was not declared in this scope
       truncate(logfilePath.c_str(), wal_files.front()->SizeFileBytes() / 2)     
                                                                           ^
./util/testharness.h:111:78: note: in definition of macro ‘ASSERT_EQ’
 #define ASSERT_EQ(a,b) ::rocksdb::test::Tester(__FILE__, __LINE__).IsEq((a),(b))
                                                                              ^
make: *** [db/db_test.o] Error 1

So I added the code below:

#include <unistd.h>
#include <sys/types.h>

Am I wrong?

bulk load

How do I use the bulk load described in the benchmark? I cannot find the benchmark test code in the source tree.

Memory leak in LRU cache?

Hi,

I am seeing a strange memory leak issue with LRUCache. When the LRUCache capacity is set to 8MB, the process's memory footprint remains constant and does not grow with time. When the capacity is set above 8MB, the process's memory footprint slowly grows until the process is ultimately killed by the OS.

DB used for testing

The DB already has 650M rows, where each row's key is around 20 bytes and the value averages around 250 bytes. DB operations are put and multiGet; keys used for multi-gets are chosen at random. RocksDB is loaded as part of a Java process and DB operations go through a JNI layer. The cache is not exposed through the JNI layer, so rocksdb's default cache is used.

Cache testing

  1. The default rocksdb cache (8MB capacity) is used and the process's memory size remains constant.
  2. When the cache size is bumped up to, say, 512MB, the process's memory footprint slowly grows until the process is killed by the OS. Cache capacity was bumped up by changing db/db_impl.cc:
    result.block_cache = NewLRUCache(67108864);

Please note that between 1) and 2), there were no changes in code other than increasing cache capacity.

Can someone please comment if they have seen this behavior? Thanks!

PS: I am using the latest rocksdb code from the github repo. Other options used when creating the DB:

Write buffer size: 128MB
DisableSeekCompaction: false
block size: 4KB
maxWriteBufferNumber: 3
Background compaction threads: 5
level0 file num compaction trigger: 4
level 0 stop files trigger: 12
numLevels: 4
maxGrandparent Overlap Factor: 10
Max bytes for level base: 1GB
No Compression
Custom compaction filter

Compilation error if zlib is not found but bzip2 is

port/port_posix.h, line 339
A constant Z_BUF_ERROR is used, which seems to be undefined if zlib was not found while bzip2 was (since it is defined in zlib headers).

It seems that just deleting this line would be a nice workaround, since it is a case followed by the default case.

Z_BUF_ERROR is not valid for BZip2_Compress

This code doesn't compile when bzip2 is installed but zlib is not. Regardless, it is not correct to use Z_BUF_ERROR here.

see http://www.bzip.org/1.0.3/html/low-level.html

int old_sz = 0, new_sz = 0;
while (_stream.next_in != nullptr && _stream.avail_in != 0) {
  int st = BZ2_bzCompress(&_stream, BZ_FINISH);
  switch (st) {
    case BZ_STREAM_END:
      break;
    case BZ_FINISH_OK:
      // No output space. Increase the output space by 20%.
      // (Should we fail the compression since it expands the size?)
      old_sz = output->size();
      new_sz = (int)(output->size() * 1.2);
      output->resize(new_sz);
      // Set more output.
      _stream.next_out = (char *)&(*output)[old_sz];
      _stream.avail_out = new_sz - old_sz;
      break;
    case Z_BUF_ERROR:
    default:
      BZ2_bzCompressEnd(&_stream);
      return false;
  }
}

Document Installation on CentOS/RedHat Linux

It would be beneficial if you can add additional documentation for installing RocksDB on CentOS and other flavors of Linux that do not support apt-get. Example below:

Install zlib:
yum install zlib
yum install zlib-devel

Install bzip2:
yum install bzip2
yum install bzip2-devel

Install the latest C++ compiler that supports C++11:
yum install gcc47-c++

Install snappy:
wget https://snappy.googlecode.com/files/snappy-1.1.1.tar.gz
tar -xzvf snappy-1.1.1.tar.gz
cd snappy-1.1.1
./configure && make && make install

Install gflags:
wget https://gflags.googlecode.com/files/gflags-2.0-no-svn-files.tar.gz
tar -xzvf gflags-2.0-no-svn-files.tar.gz
cd gflags-2.0
./configure && make && make install

Compilation error on Ubuntu 13.04 (32-bit OS)

I tried to compile rocksdb under Ubuntu 13.04 (32-bit OS), but it failed with these errors:

rocksdb ) make
g++ -g -Wall -Werror -I. -I./include -std=gnu++11   -lpthread -lrt -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DZLIB -DBZIP2   -DHAVE_JEMALLOC -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -Woverloaded-virtual -c db/builder.cc -o db/builder.o 
In file included from ./db/dbformat.h:18:0,
             from db/builder.cc:13:
./util/coding.h: In function ‘uint32_t rocksdb::DecodeFixed32(const char*)’:
./util/coding.h:70:7: error: ‘port’ has not been declared
   if (port::kLittleEndian) {
       ^
./util/coding.h: In function ‘uint64_t rocksdb::DecodeFixed64(const char*)’:
./util/coding.h:84:7: error: ‘port’ has not been declared
if (port::kLittleEndian) {
    ^
./util/coding.h:94:1: error: control reaches end of non-void function [-Werror=return-type]
}
^
cc1plus: all warnings being treated as errors
make: *** [db/builder.o] Error 1

BTW, my gcc version is 4.8.1. Can anyone help me solve this problem?

compilation error on mac os mavericks

Hi, I just compiled the latest sources on my local Mac OS X (Mavericks).
During compilation, there is a single error:

util/env_posix.cc:1401:71: error: missing binary operator before token "("

My compiler matches the recommended version.

[junyoung@junyoung-2:/opt/rocksdb]$ g++ --version
g++ (GCC) 4.7.3
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and os information is here

[junyoung@junyoung-2:/opt/rocksdb]$ uname -a
Darwin junyoung-2.local 13.0.0 Darwin Kernel Version 13.0.0: Thu Sep 19 22:22:27 PDT 2013; root:xnu-2422.1.72~6/RELEASE_X86_64 x86_64

To fix the problem I mentioned, we need to apply the following patch to the trunk.

diff --git a/util/env_posix.cc b/util/env_posix.cc
index 16c3d1c..fc97465 100644
--- a/util/env_posix.cc
+++ b/util/env_posix.cc
@@ -1398,12 +1398,14 @@ class PosixEnv : public Env {
                 (unsigned long)t);

         // Set the thread name to aid debugging
-#if defined(_GNU_SOURCE) && defined(__GLIBC_PREREQ) && (__GLIBC_PREREQ(2, 12))
+#if defined(_GNU_SOURCE) && defined(__GLIBC_PREREQ)
+#if defined(__GLIBC_PREREQ(2, 12))
         char name_buf[16];
         snprintf(name_buf, sizeof name_buf, "rocksdb:bg%zu", bgthreads_.size());
         name_buf[sizeof name_buf - 1] = '\0';
         pthread_setname_np(t, name_buf);
 #endif
+#endif

         bgthreads_.push_back(t);
       }

Failed in running test units

After the make, I ran make check in rocksdb's directory.

Running testCountDelimDump...
ERunning testCountDelimIDump...
ERunning testDumpLoad...
ERunning testHexPutGet...
ERunning testInvalidCmdLines...
.Running testMiscAdminTask...
ERunning testSimpleStringPutGet...
ERunning testStringBatchPut...
ERunning testTtlPutGet...

E

ERROR: testCountDelimDump (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 150, in testCountDelimDump
self.assertRunOK("batchput x.1 x1 --create_if_missing", "OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test_wWULQa/testdb batchput x.1 x1 --create_if_missing |grep -v "Created bg thread"

'Make all' not building

I git-cloned RocksDB and tried building it with 'make all'. Am I missing something?

I see these errors:

In file included from ./db/dbformat.h:18:0,
                 from db/builder.cc:13:
./util/coding.h: In function ‘uint32_t rocksdb::DecodeFixed32(const char*)’:
./util/coding.h:70:7: error: ‘port’ has not been declared
   if (port::kLittleEndian) {
       ^
./util/coding.h: In function ‘uint64_t rocksdb::DecodeFixed64(const char*)’:
./util/coding.h:84:7: error: ‘port’ has not been declared
   if (port::kLittleEndian) {
       ^
./util/coding.h:94:1: error: control reaches end of non-void function [-Werror=return-type]
 }
 ^
cc1plus: all warnings being treated as errors
make: *** [db/builder.o] Error 1

Using more memory than the one specified in rocksdb::Options

Summary: RocksDB appears to use more memory than the total specified in rocksdb::Options. This leads the Linux kernel to kill a process using RocksDB after it exhausts physical memory and swap.

Created bg thread 0x7f748f3ff700
./master.sh: line 1: 29105 Killed

cat /var/log/messages
Nov 28 15:34:57 userver90-157 kernel: Out of memory: Kill process 29105 (ssdb-server) score 335 or sacrifice child
Nov 28 15:34:57 userver90-157 kernel: Killed process 29105, UID 1000, (ssdb-server) total-vm:9259152kB, anon-   rss:4827800kB, file-rss:804kB

I was trying to load 150GB of data into a system that has only 16GB of physical memory. The insertion rate was 15000 key/value pairs per second. Key size is 32 bytes; value size varies from 100 bytes to 2KB.

Following is the list of option values I set.

options.create_if_missing = true;
options.filter_policy = rocksdb::NewBloomFilterPolicy(10);
// 512M
options.block_cache = rocksdb::NewLRUCache(512 * 1048576);
// 32K
options.block_size = 32 * 1024;
// 64M
options.write_buffer_size = 64 * 1024 * 1024;
options.max_open_files = 200;
options.compression = rocksdb::kNoCompression;

Options that are not set :

block_cache_compressed
memtable_factory
table_factory

Is there a single option parameter that can be set to limit the maximum size of memory that RocksDB can use?

Following is the result of profiling with valgrind's massif tool.
The process is using 1.6GB. If I do not stop the process, it continues to allocate memory, which results in the kernel killing the process.

n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)

69 227,407,665,278 1,718,009,376 1,717,227,820 781,556 0
99.95% (1,717,227,820B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->97.68% (1,678,140,930B) 0x475E17: rocksdb::ArenaImpl::AllocateNewBlock(unsigned long) (arena_impl.cc:72)
| ->97.68% (1,678,140,930B) 0x475E99: rocksdb::ArenaImpl::AllocateFallback(unsigned long) (arena_impl.cc:43)
| ->77.34% (1,328,755,428B) 0x450D72: rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&) (arena_impl.h:81)
| | ->75.00% (1,288,490,112B) 0x471990: rocksdb::(anonymous namespace)::MemTableInserter::Put(rocksdb::Slice const&, rocksdb::Slice const&) (write_batch.cc:201)
| | | ->75.00% (1,288,490,112B) 0x471D89: rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const (write_batch.cc:84)
| | | ->75.00% (1,288,490,112B) 0x47225C: rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteBatch const*, rocksdb::MemTable*, rocksdb::Options const*, rocksdb::DB*, bool) (write_batch.cc:232)
| | | ->75.00% (1,288,490,112B) 0x43A489: rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*) (db_impl.cc:2856)
| | | | ->75.00% (1,288,490,112B) 0x41C03B: BinlogQueue::commit() (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | | ->75.00% (1,288,490,112B) 0x40D8AD: SSDB::set(Bytes const&, Bytes const&, char) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | | ->75.00% (1,288,490,112B) 0x422670: proc_set(Server*, Link*, std::vector<Bytes, std::allocator<Bytes> > const&, std::vector<std::string, std::allocator<std::string> >*) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | | ->75.00% (1,288,490,112B) 0x41F15F: Server::ProcWorker::proc(ProcJob) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | | ->75.00% (1,288,490,112B) 0x42A650: WorkerPool<Server::ProcWorker, ProcJob>::run_worker(void*) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | | ->75.00% (1,288,490,112B) 0x4E3084F: start_thread (in /lib64/libpthread-2.12.so)
| | | | ->75.00% (1,288,490,112B) 0x5CDC94B: clone (in /lib64/libc-2.12.so)
| | | |
| | | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
| | |
| | ->02.34% (40,265,316B) 0x47188C: rocksdb::(anonymous namespace)::MemTableInserter::Delete(rocksdb::Slice const&) (write_batch.cc:221)
| | ->02.34% (40,265,316B) 0x471D55: rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const (write_batch.cc:92)
| | ->02.34% (40,265,316B) 0x47225C: rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteBatch const*, rocksdb::MemTable*, rocksdb::Options const*, rocksdb::DB*, bool) (write_batch.cc:232)
| | ->02.34% (40,265,316B) 0x43A489: rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*) (db_impl.cc:2856)
| | ->01.95% (33,554,430B) 0x41D027: BinlogQueue::del_range(unsigned long, unsigned long) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->01.95% (33,554,430B) 0x41D148: BinlogQueue::log_clean_thread_func(void*) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->01.95% (33,554,430B) 0x4E3084F: start_thread (in /lib64/libpthread-2.12.so)
| | | ->01.95% (33,554,430B) 0x5CDC94B: clone (in /lib64/libc-2.12.so)
| | |
| | ->00.39% (6,710,886B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->20.34% (349,385,502B) 0x475F23: rocksdb::ArenaImpl::AllocateAligned(unsigned long) (arena_impl.cc:65)
| ->10.55% (181,193,922B) 0x48B638: rocksdb::SkipList<char const*, rocksdb::MemTableRep::KeyComparator&>::Insert(char const* const&) (skiplist.h:196)
| | ->10.55% (181,193,922B) 0x48B337: rocksdb::(anonymous namespace)::SkipListRep::Insert(char const*) (skiplistrep.cc:22)
| | ->10.55% (181,193,922B) 0x450D0C: rocksdb::MemTable::Add(unsigned long, rocksdb::ValueType, rocksdb::Slice const&, rocksdb::Slice const&) (memtable.cc:155)
| | ->08.20% (140,928,606B) 0x471990: rocksdb::(anonymous namespace)::MemTableInserter::Put(rocksdb::Slice const&, rocksdb::Slice const&) (write_batch.cc:201)
| | | ->08.20% (140,928,606B) 0x471D89: rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const (write_batch.cc:84)
| | | ->08.20% (140,928,606B) 0x47225C: rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteBatch const*, rocksdb::MemTable*, rocksdb::Options const*, rocksdb::DB*, bool) (write_batch.cc:232)
| | | ->08.20% (140,928,606B) 0x43A489: rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*) (db_impl.cc:2856)
| | | ->08.20% (140,928,606B) 0x41C03B: BinlogQueue::commit() (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.20% (140,928,606B) 0x40D8AD: SSDB::set(Bytes const&, Bytes const&, char) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.20% (140,928,606B) 0x422670: proc_set(Server*, Link*, std::vector<Bytes, std::allocator<Bytes> > const&, std::vector<std::string, std::allocator<std::string> >*) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.20% (140,928,606B) 0x41F15F: Server::ProcWorker::proc(ProcJob) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.20% (140,928,606B) 0x42A650: WorkerPool<Server::ProcWorker, ProcJob>::run_worker(void*) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.20% (140,928,606B) 0x4E3084F: start_thread (in /lib64/libpthread-2.12.so)
| | | ->08.20% (140,928,606B) 0x5CDC94B: clone (in /lib64/libc-2.12.so)
| | |
| | ->02.34% (40,265,316B) 0x47188C: rocksdb::(anonymous namespace)::MemTableInserter::Delete(rocksdb::Slice const&) (write_batch.cc:221)
| | ->02.34% (40,265,316B) 0x471D55: rocksdb::WriteBatch::Iterate(rocksdb::WriteBatch::Handler*) const (write_batch.cc:92)
| | ->02.34% (40,265,316B) 0x47225C: rocksdb::WriteBatchInternal::InsertInto(rocksdb::WriteBatch const*, rocksdb::MemTable*, rocksdb::Options const*, rocksdb::DB*, bool) (write_batch.cc:232)
| | ->02.34% (40,265,316B) 0x43A489: rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*) (db_impl.cc:2856)
| | ->02.34% (40,265,316B) 0x41D027: BinlogQueue::del_range(unsigned long, unsigned long) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | ->02.34% (40,265,316B) 0x41D148: BinlogQueue::log_clean_thread_func(void*) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | ->02.34% (40,265,316B) 0x4E3084F: start_thread (in /lib64/libpthread-2.12.so)
| | ->02.34% (40,265,316B) 0x5CDC94B: clone (in /lib64/libc-2.12.so)
| |
| ->09.79% (168,191,580B) 0x48A7B6: rocksdb::SkipListFactory::CreateMemTableRep(rocksdb::MemTableRep::KeyComparator&, rocksdb::Arena*) (skiplist.h:196)
| ->09.79% (168,191,580B) 0x451E7D: rocksdb::MemTable::MemTable(rocksdb::InternalKeyComparator const&, std::shared_ptr<rocksdb::MemTableRepFactory>, int, rocksdb::Options const&) (memtable.cc:52)
| ->09.37% (161,061,264B) 0x42F694: rocksdb::DBImpl::MakeRoomForWrite(bool) (db_impl.cc:3136)
| | ->09.37% (161,061,264B) 0x43A052: rocksdb::DBImpl::Write(rocksdb::WriteOptions const&, rocksdb::WriteBatch*) (db_impl.cc:2808)
| | ->08.98% (154,350,378B) 0x41C03B: BinlogQueue::commit() (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.98% (154,350,378B) 0x40D8AD: SSDB::set(Bytes const&, Bytes const&, char) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.98% (154,350,378B) 0x422670: proc_set(Server*, Link*, std::vector<Bytes, std::allocator<Bytes> > const&, std::vector<std::string, std::allocator<std::string> >*) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.98% (154,350,378B) 0x41F15F: Server::ProcWorker::proc(ProcJob) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.98% (154,350,378B) 0x42A650: WorkerPool<Server::ProcWorker, ProcJob>::run_worker(void*) (in /home/widerplanet/kangmo/refs/ssdb-rocks/ssdb-server)
| | | ->08.98% (154,350,378B) 0x4E3084F: start_thread (in /lib64/libpthread-2.12.so)
| | | ->08.98% (154,350,378B) 0x5CDC94B: clone (in /lib64/libc-2.12.so)
| | |
| | ->00.39% (6,710,886B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->00.42% (7,130,316B) in 1+ places, all below ms_print's threshold (01.00%)
|
->02.23% (38,237,022B) 0x4A3354: rocksdb::ReadBlockContents(rocksdb::RandomAccessFile*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockContents*, rocksdb::Env*, bool) (format.cc:88)
| ->02.04% (35,115,915B) 0x49792A: rocksdb::BlockBasedTable::ReadFilter(rocksdb::Slice const&, rocksdb::BlockBasedTable::Rep*, unsigned long*) (block_based_table_reader.cc:405)
| | ->02.04% (35,115,915B) 0x499D01: rocksdb::BlockBasedTable::GetFilter(bool) const (block_based_table_reader.cc:822)
| | ->02.04% (35,115,915B) 0x49C5A2: rocksdb::BlockBasedTable::Open(rocksdb::Options const&, rocksdb::EnvOptions const&, std::unique_ptr<rocksdb::RandomAccessFile, std::default_delete<rocksdb::RandomAccessFile> >&&, unsigned long, std::unique_ptr<rocksdb::TableReader, std::default_delete<rocksdb::TableReader> >*) (block_based_table_reader.cc:321)
| | ->02.04% (35,115,915B) 0x49621F: rocksdb::BlockBasedTableFactory::GetTableReader(rocksdb::Options const&, rocksdb::EnvOptions const&, std::unique_ptr<rocksdb::RandomAccessFile, std::default_delete<rocksdb::RandomAccessFile> >&&, unsigned long, std::unique_ptr<rocksdb::TableReader, std::default_delete<rocksdb::TableReader> >*) const (block_based_table_factory.cc:26)
| | ->02.04% (35,115,915B) 0x456DAC: rocksdb::TableCache::FindTable(rocksdb::EnvOptions const&, unsigned long, unsigned long, rocksdb::Cache::Handle**, bool*, bool) (table_cache.cc:76)
| | ->02.04% (35,063,961B) 0x457021: rocksdb::TableCache::NewIterator(rocksdb::ReadOptions const&, rocksdb::EnvOptions const&, unsigned long, unsigned long, rocksdb::TableReader*, bool) (table_cache.cc:104)
| | | ->01.24% (21,363,233B) 0x42E626: rocksdb::DBImpl::FinishCompactionOutputFile(rocksdb::DBImpl::CompactionState*, rocksdb::Iterator*) (db_impl.cc:2004)
| | | | ->01.13% (19,425,164B) 0x436942: rocksdb::DBImpl::DoCompactionWork(rocksdb::DBImpl::CompactionState*, rocksdb::DBImpl::DeletionState&) (db_impl.cc:2378)
| | | | | ->01.13% (19,425,164B) 0x437FCD: rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::DBImpl::DeletionState&) (db_impl.cc:1821)
| | | | | ->01.13% (19,425,164B) 0x43DC85: rocksdb::DBImpl::BackgroundCallCompaction() (db_impl.cc:1713)
| | | | | ->01.13% (19,425,164B) 0x47E23F: rocksdb::(anonymous namespace)::PosixEnv::ThreadPool::BGThreadWrapper(void*) (env_posix.cc:1362)
| | | | | ->01.13% (19,425,164B) 0x4E3084F: start_thread (in /lib64/libpthread-2.12.so)
| | | | | ->01.13% (19,425,164B) 0x5CDC94B: clone (in /lib64/libc-2.12.so)
| | | | |
| | | | ->00.11% (1,938,069B) in 1+ places, all below ms_print's threshold (01.00%)
| | | |
| | | ->00.80% (13,700,728B) in 1+ places, all below ms_print's threshold (01.00%)
| | |
| | ->00.00% (51,954B) in 1+ places, all below ms_print's threshold (01.00%)
| |
| ->00.18% (3,121,107B) in 1+ places, all below ms_print's threshold (01.00%)
|
->00.05% (849,868B) in 1+ places, all below ms_print's threshold (01.00%)

Error when "make"
g++ -g -I. -I./include -std=gnu++11 -DROCKSDB_PLATFORM_POSIX -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_ATOMIC_PRESENT -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS -DZLIB -DBZIP2 -DHAVE_JEMALLOC -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -Woverloaded-virtual -c util/autovector_test.cc -o util/autovector_test.o
util/autovector_test.cc: In member function ‘void rocksdb::_Test_PerfBench::_Run()’:
util/autovector_test.cc:242:66: error: unable to deduce ‘std::initializer_list<_Tp>&&’ from ‘{0ul, 1ul, 4u, rocksdb::kSize, 16u}’
util/autovector_test.cc:242:66: error: unable to deduce ‘auto’ from ‘’
util/autovector_test.cc:261:66: error: unable to deduce ‘std::initializer_list<_Tp>&&’ from ‘{0ul, 1ul, 4u, rocksdb::kSize, 16u}’
util/autovector_test.cc:261:66: error: unable to deduce ‘auto’ from ‘’
make: *** [util/autovector_test.o] Error 1

My environment:
Ubuntu 13.10; gcc 4.7.3; g++ 4.7.3

make error "./util/testharness.h:93:3: error:"

I compiled RocksDB with `make clean; make` and hit the following error:

In file included from util/arena_test.cc:12:0:
./util/testharness.h: In instantiation of ‘rocksdb::test::Tester& rocksdb::test::Tester::IsEq(const X&, const Y&) [with X = int; Y = long unsigned int; rocksdb::test::Tester = rocksdb::test::Tester]’:
util/arena_test.cc:66:3:   required from here
./util/testharness.h:93:3: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
cc1plus: all warnings being treated as errors
Linux yancey 3.2.0-57-generic #87-Ubuntu SMP Tue Nov 12 21:35:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
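
A workaround that keeps -Werror enabled is to make the signedness of the comparison explicit on one side. This is a minimal sketch (the `SizesEqual` helper is hypothetical, not part of RocksDB's test harness) of the cast that silences -Wsign-compare:

```cpp
#include <cstddef>

// Hypothetical helper illustrating the fix for the -Wsign-compare error:
// compare int and size_t through an explicit cast instead of letting the
// compiler perform an implicit signed/unsigned comparison.
bool SizesEqual(int expected, std::size_t actual) {
  // Reject negative expectations first; the cast below would otherwise wrap.
  if (expected < 0) return false;
  return static_cast<std::size_t>(expected) == actual;
}
```

The equivalent one-line fix inside a test is casting the signed operand at the assertion site.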

Redis Comparison

Hi,
I have a general question but don't know where to ask it:
How does RocksDB compare with Redis?

Misspellings in README

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has an Log-Stuctured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor(WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor(SAF). It has multi-threaded compactions,
making it specially suitable for storing multiple terabytes of data in a
single database.

RocksDB links against its bundled libsnappy but includes the system snappy header

Header + library should either both come from the system or both from RocksDB. Also, the check only passes when the system has header+library for snappy.

Is it time to remove snappy from the RocksDB distro and always use the system versions? That would make this check easier to fix.

# Test whether Snappy library is installed
# http://code.google.com/p/snappy/
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null  <<EOF
  #include <snappy.h>
  int main() {}

EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DSNAPPY"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS ${SNAPPY_LDFLAGS:-./snappy/libs/libsnappy.a}"
fi

compilation error on executing make command

error: ‘res.rocksdb::BatchResult::sequence’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
return res.sequence;
^
cc1plus: all warnings being treated as errors
make: *** [db/db_test.o] Error 1
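
The usual fix for this class of -Wmaybe-uninitialized warning is to give the member a defined value on every path. A minimal sketch, using a stand-in struct rather than the real rocksdb::BatchResult:

```cpp
#include <cstdint>

// Stand-in (not the real rocksdb::BatchResult) showing the common fix for
// -Wmaybe-uninitialized: a default member initializer guarantees that every
// return path yields a defined value.
struct BatchResult {
  uint64_t sequence = 0;  // value-initialized instead of left indeterminate
};

uint64_t ReadSequence() {
  BatchResult res;  // res.sequence is 0 even if nothing assigns to it
  return res.sequence;
}
```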

Unable to build on Ubuntu 13.04

make clean; make results in the following error:
overloaded-virtual -c db/builder.cc -o db/builder.o
In file included from ./db/dbformat.h:18:0,
from db/builder.cc:13:
./util/coding.h: In function ‘uint32_t rocksdb::DecodeFixed32(const char*)’:
./util/coding.h:70:7: error: ‘port’ has not been declared
if (port::kLittleEndian) {
^
./util/coding.h: In function ‘uint64_t rocksdb::DecodeFixed64(const char*)’:
./util/coding.h:84:7: error: ‘port’ has not been declared
if (port::kLittleEndian) {
^
./util/coding.h:94:1: error: control reaches end of non-void function [-Werror=return-type]
}
^
cc1plus: all warnings being treated as errors
make: *** [db/builder.o] Error 1

Any help would be greatly appreciated.

make check fails on range assertion

I've tested this on both OSX 10.9 and an Ubuntu 13.10 server VM. Both appear to build without errors, but make check fails with the following:

==== Test TableTest.ApproximateOffsetOfCompressed
Value 10029 is not in range [2000, 3000]
table/table_test.cc:1154: Assertion failure Between(c.ApproximateOffsetOf("k03"), 2000, 3000)
make: *** [check] Error 1

Can't build inside of openVZ container

Hi,
I am running Debian x64 inside an openVZ container with Proxmox 3.x.

When I try to build with "make", it fails:
db/compaction_picker.cc: In member function 'virtual rocksdb::Compaction* rocksdb::UniversalCompactionPicker::PickCompaction(rocksdb::Version*)':
db/compaction_picker.cc:556:41: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'std::vector<rocksdb::FileMetaData*>::size_type {aka unsigned int}' [-Werror=format]
cc1plus: all warnings being treated as errors
make: *** [db/compaction_picker.o] Error 1

If I disable "all warnings being treated as errors" in the makefile, it fails when I build the shared library:

db/db_bench.cc:1529:76: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat]
db/db_bench.cc:1536:73: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat]
..
util/options.cc:167:69: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat]
util/options.cc:170:77: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat]
util/options.cc:171:69: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat]
util/options.cc:228:25: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat]
util/options.cc:248:36: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat]
..

Any hints?
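
These warnings come from hard-coding %lu/%ld for size_t, whose width differs between 32- and 64-bit targets. A sketch of the two portable alternatives (the `FormatFileCount` function is illustrative, not RocksDB code):

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdio>

// %zu is the standard conversion for size_t; casting to a fixed-width type
// and using the PRIu64 macro also works. Either form compiles cleanly under
// -Wformat on both 32- and 64-bit targets, unlike a hard-coded %lu or %ld.
int FormatFileCount(char* buf, std::size_t buflen, std::size_t count) {
  return std::snprintf(buf, buflen, "files: %zu (%" PRIu64 ")",
                       count, static_cast<uint64_t>(count));
}
```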

Compilation error on Linux Arch

I am trying to compile version 2.6.fb on Arch Linux and get the following error, probably caused by the latest version of GCC.

$ make
....
g++ -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -g -Wall -Werror -I. -I./include -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -DOS_LINUX -fno-builtin-memcmp -DLEVELDB_PLATFORM_POSIX -DGFLAGS -DZLIB -DBZIP2  -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -std=gnu++0x -Woverloaded-virtual -c db/db_test.cc -o db/db_test.o 
g++ -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -g -Wall -Werror -I. -I./include -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -DOS_LINUX -fno-builtin-memcmp -DLEVELDB_PLATFORM_POSIX -DGFLAGS -DZLIB -DBZIP2  -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -std=gnu++0x -Woverloaded-virtual -c db/dbformat_test.cc -o db/dbformat_test.o 
g++ -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -g -Wall -Werror -I. -I./include -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -DOS_LINUX -fno-builtin-memcmp -DLEVELDB_PLATFORM_POSIX -DGFLAGS -DZLIB -DBZIP2  -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -std=gnu++0x -Woverloaded-virtual -c util/env_test.cc -o util/env_test.o 
g++ -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -g -Wall -Werror -I. -I./include -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -DOS_LINUX -fno-builtin-memcmp -DLEVELDB_PLATFORM_POSIX -DGFLAGS -DZLIB -DBZIP2  -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -std=gnu++0x -Woverloaded-virtual -c util/blob_store_test.cc -o util/blob_store_test.o 
g++ -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -g -Wall -Werror -I. -I./include -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -DOS_LINUX -fno-builtin-memcmp -DLEVELDB_PLATFORM_POSIX -DGFLAGS -DZLIB -DBZIP2  -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -std=gnu++0x -Woverloaded-virtual -c util/filelock_test.cc -o util/filelock_test.o 
db/db_test.cc: In function ‘rocksdb::SequenceNumber rocksdb::ReadRecords(std::unique_ptr<rocksdb::TransactionLogIterator>&, int&)’:
db/db_test.cc:3934:14: error: ‘res.rocksdb::BatchResult::sequence’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
   return res.sequence;
              ^

Major Compactions

I've been doing some testing on a service that uses rocksdb for storage internally and finding that major compactions are sometimes causing outages which last for a few minutes (since major compactions seem to block everything else). Also, when I restart the process, sometimes a major compaction is triggered which causes the db to take many minutes to open.

Wondering where I should start looking to alleviate these issues. Thanks!

Poor bulk load performance with vector memtable

Hi, I seem to be getting really poor ingest performance with the vector memtable. I'm running the "Bulk load database into L0..." command from Performance Benchmarks, changing only the database path:

bpl=10485760;overlap=10;mcz=2;del=300000000;levels=2;ctrig=10000000; delay=10000000; stop=10000000; wbn=30; mbc=20; mb=1073741824;wbs=268435456; dds=1; sync=0; r=1000000000; t=1; vs=800; bs=65536; cs=1048576; of=500000; si=1000000; ./db_bench --benchmarks=fillrandom --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=/mnt/db_bench --sync=$sync --disable_wal=1 --compression_type=zlib --stats_interval=$si --compression_ratio=50 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --memtablerep=vector --use_existing_db=0 --disable_auto_compactions=1 --source_compaction_factor=10000000

After 5 minutes the statistics are:

2013/12/23-00:00:32  ... thread 0: (1000000,5000000) ops and (14495.0,15005.6) ops/second in (68.989201,333.209596) seconds
                               Compactions
Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall Stall-cnt
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0       16     3015 301.5       119         0      3015         0         0      3015        0.0       0.0        25.2        0        0        0        0       16       0.0         0
Uptime(secs): 333.2 total, 69.0 interval
Writes cumulative: 4999999 total, 4999999 batches, 1.0 per batch, 3.87 ingest GB
WAL cumulative: 0 WAL writes, 0 WAL syncs, 0.00 writes per sync, 0.00 GB written
Compaction IO cumulative (GB): 3.87 new, 0.00 read, 2.94 write, 2.94 read+write
Compaction IO cumulative (MB/sec): 11.9 new, 0.0 read, 9.0 write, 9.0 read+write
Amplification cumulative: 0.8 write, 0.8 compaction
Writes interval: 1000000 total, 1000000 batches, 1.0 per batch, 793.5 ingest MB
WAL interval: 0 WAL writes, 0 WAL syncs, 0.00 writes per sync, 0.00 MB written
Compaction IO interval (MB): 793.46 new, 0.00 read, 565.31 write, 565.31 read+write
Compaction IO interval (MB/sec): 11.5 new, 0.0 read, 8.2 write, 8.2 read+write
Amplification interval: 0.7 write, 0.7 compaction
Stalls(secs): 0.000 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 0.000 leveln_slowdown
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown

When I rm -rf that and then run exactly the same command but with --memtablerep=skip_list, ingest is 6x faster over the first five minutes:

2013/12/23-00:08:09  ... thread 0: (1000000,28000000) ops and (76160.0,90472.0) ops/second in (13.130248,309.487848) seconds
                               Compactions
Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall Stall-cnt
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  0       87    15938 1593.8      2438         0     15938         0         0     15938        0.0       0.0         6.5        0        0        0        0       87       0.0         0
Uptime(secs): 309.5 total, 13.1 interval
Writes cumulative: 27999999 total, 27999999 batches, 1.0 per batch, 21.70 ingest GB
WAL cumulative: 0 WAL writes, 0 WAL syncs, 0.00 writes per sync, 0.00 GB written
Compaction IO cumulative (GB): 21.70 new, 0.00 read, 15.56 write, 15.56 read+write
Compaction IO cumulative (MB/sec): 71.8 new, 0.0 read, 51.5 write, 51.5 read+write
Amplification cumulative: 0.7 write, 0.7 compaction
Writes interval: 1000000 total, 1000000 batches, 1.0 per batch, 793.5 ingest MB
WAL interval: 0 WAL writes, 0 WAL syncs, 0.00 writes per sync, 0.00 MB written
Compaction IO interval (MB): 793.46 new, 0.00 read, 549.57 write, 549.57 read+write
Compaction IO interval (MB/sec): 60.4 new, 0.0 read, 41.8 write, 41.8 read+write
Amplification interval: 0.7 write, 0.7 compaction
Stalls(secs): 0.000 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 0.000 leveln_slowdown
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown

I'm running b26dc95 on ubuntu on an i2.xlarge EC2 instance. The db_bench prelude looks like this:

LevelDB:    version 2.0
Date:       Sun Dec 22 23:54:59 2013
CPU:        4 * Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPUCache:   25600 KB
Keys:       16 bytes each
Values:     800 bytes each (40000 bytes after compression)
Entries:    1000000000
RawSize:    778198.2 MB (estimated)
FileSize:   38162231.4 MB (estimated)
Write rate limit: 0
Compression: zlib
Memtablerep: vector
WARNING: Assertions are enabled; benchmarks unnecessarily slow

make check fails

Ran "make" and then "make check"
I am using Ubuntu 12.10

My repo is at:
commit 75df72f
Author: Kai Liu [email protected]
Date: Sat Nov 16 22:59:22 2013 -0800

Change the logic in KeyMayExist()

==== Test DBTest.CompressedCache
db/db_test.cc:1890: failed: 0 > 0
Created bg thread 0x2ada30469700
make: *** [check] Error 1


The ASSERT_GT fails

1885 case 1:
1886 // no block cache, only compressed cache
1887 ASSERT_EQ(options.statistics.get()->getTickerCount(BLOCK_CACHE_MISS),
1888 0);
1889 ASSERT_GT(options.statistics.get()->getTickerCount
1890 (BLOCK_CACHE_COMPRESSED_MISS), 0);
1891 break;

BackupableDBTest.CorruptionsTest failed

==== Test BackupableDBTest.CorruptionsTest
Created bg thread 0x2b414669c700
utilities/backupable/backupable_db_test.cc:588: Assertion failure file_manager_->FileExists(backupdir_ + "/meta/2")
#0 ./backupable_db_test() [0x41b38b] ~basic_string /usr/include/c++/4.8/bits/basic_string.h:539
#1 ./backupable_db_test() [0x41bff4] ~_Test_CorruptionsTest /home/jfan/dev/offlinedb/rocksdb/utilities/backupable/backupable_db_test.cc:522
#2 ./backupable_db_test() [0x4be6eb] rocksdb::test::RunAllTests() /home/jfan/dev/offlinedb/rocksdb/util/testharness.cc:46
#3 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x2b41460f6ed5] ?? ??:0
#4 ./backupable_db_test() [0x4132fe] _start ??:?

make: *** [check] Error 1

This was on Ubuntu 13.10, after resolving dependencies per the instructions in INSTALL.md. The build succeeded, but the unit test above failed.

make check Error

Hi,
I get an error when running the tests. I have all dependencies installed and didn't see an error while building.

Any ideas?

My system:
openSUSE 12.3
gcc version 4.7.2 20130108
zlib-devel - 1.2.7-7.1.1
libbz2-1 - 1.0.6-23.1.3
libbz2-devel - 1.0.6-23.1.3
libsnappy1 - 1.1.1-8.1
snappy-devel - 1.1.1-9.2
Latests gflags from github, compiled and installed without problems (https://github.com/schuhschuh/gflags)

git clone https://github.com/facebook/rocksdb.git
make clean; make

"make check" output:

python tools/ldb_test.py
Running testCountDelimDump...
ERunning testCountDelimIDump...
ERunning testDumpLoad...
ERunning testHexPutGet...
ERunning testInvalidCmdLines...
.Running testMiscAdminTask...
ERunning testSimpleStringPutGet...
ERunning testStringBatchPut...
ERunning testTtlPutGet...

E

ERROR: testCountDelimDump (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 150, in testCountDelimDump
self.assertRunOK("batchput x.1 x1 --create_if_missing", "OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test_r827wv/testdb batchput x.1 x1 --create_if_missing |grep -v "Created bg thread"

ERROR: testCountDelimIDump (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 159, in testCountDelimIDump
self.assertRunOK("batchput x.1 x1 --create_if_missing", "OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test_kuZOEm/testdb batchput x.1 x1 --create_if_missing |grep -v "Created bg thread"

ERROR: testDumpLoad (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 231, in testDumpLoad
"OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test_ZHtyAs/testdb batchput --create_if_missing x1 y1 x2 y2 x3 y3 x4 y4 |grep -v "Created bg thread"

ERROR: testHexPutGet (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 178, in testHexPutGet
self.assertRunOK("put a1 b1 --create_if_missing", "OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test_34UXEu/testdb put a1 b1 --create_if_missing |grep -v "Created bg thread"

ERROR: testMiscAdminTask (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 324, in testMiscAdminTask
"OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test__57ZVN/testdb batchput --create_if_missing x1 y1 x2 y2 x3 y3 x4 y4 |grep -v "Created bg thread"

ERROR: testSimpleStringPutGet (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 93, in testSimpleStringPutGet
self.assertRunOK("put --create_if_missing x1 y1", "OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test_bLX8DS/testdb put --create_if_missing x1 y1 |grep -v "Created bg thread"

ERROR: testStringBatchPut (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 140, in testStringBatchPut
self.assertRunOK("batchput x1 y1 --create_if_missing", "OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test_tqn8ny/testdb batchput x1 y1 --create_if_missing |grep -v "Created bg thread"

ERROR: testTtlPutGet (main.LDBTestCase)

Traceback (most recent call last):
File "tools/ldb_test.py", line 207, in testTtlPutGet
self.assertRunOK("put a1 b1 --ttl --create_if_missing", "OK")
File "tools/ldb_test.py", line 82, in assertRunOK
expectedOutput, unexpected)
File "tools/ldb_test.py", line 54, in assertRunOKFull
params, shell=True)
File "tools/ldb_test.py", line 25, in my_check_output
(retcode, cmd))
Exception: Exit code is not 0. It is 1. Command: ./ldb --db=/tmp/ldb_test_AkrY96/testdb put a1 b1 --ttl --create_if_missing |grep -v "Created bg thread"


Ran 9 tests in 0.072s

FAILED (errors=8)
make: *** [ldb_tests] Error 1

python bindings

I am planning to write RocksDB bindings for Python. So far, I am able to use basic RocksDB functionality from Python :)

I think I will use Python ctypes and the RocksDB C API. At the moment, rocksdb/include/c.h is almost the same as its LevelDB counterpart (e.g. all function names and structs start with leveldb). Will you keep it that way, or do you plan to change it? Or I can maintain and take care of that part happily :)

As a second option, I can directly use rocksdb C++ API, by writing CPython extension. But the outcome won't be portable between different python implementations and will only work with CPython.

What do you think about that?

compilation error on executing make command

/home/gaurav.chaudhary/Downloads/rocksdb-2.6.fb/db/db_bench.cc:2416: undefined reference to `google::SetUsageMessage(std::string const&)'
/home/gaurav.chaudhary/Downloads/rocksdb-2.6.fb/db/db_bench.cc:2417: undefined reference to `google::ParseCommandLineFlags(int*, char***, bool)'
collect2: error: ld returned 1 exit status
make: *** [db_bench] Error 1

32bit build with -fPIC

I have seen attempts to make RocksDB compile as 32-bit. I have the following local change to get rid of a compile error in util/crc32c.cc:

util/crc32c.cc: In function ‘uint32_t rocksdb::crc32c::Extend(uint32_t, const char*, size_t)’:
util/crc32c.cc:328:57: error: PIC register clobbered by ‘ebx’ in ‘asm’
util/crc32c.cc:328:57: error: PIC register clobbered by ‘ebx’ in ‘asm’
util/crc32c.cc:328:57: error: PIC register clobbered by ‘ebx’ in ‘asm’
diff --git a/util/crc32c.cc b/util/crc32c.cc
index bca955a..3a72a04 100644
--- a/util/crc32c.cc
+++ b/util/crc32c.cc
@@ -17,6 +17,7 @@
 #include <nmmintrin.h>
 #endif
 #include "util/coding.h"
+#include <cpuid.h>

 namespace rocksdb {
 namespace crc32c {
@@ -323,10 +324,16 @@ static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
 // Detect if SS42 or not.
 static bool isSSE42() {
   #ifdef __GNUC__
-  uint32_t c_;
-  uint32_t d_;
-  __asm__("cpuid" : "=c"(c_), "=d"(d_) : "a"(1) : "ebx");
-  return c_ & (1U << 20); // copied from CpuId.h in Folly.
+  unsigned int a_;
+  unsigned int b_;
+  unsigned int c_;
+  unsigned int d_;
+
+  if (__get_cpuid(1, &a_, &b_, &c_, &d_) == 0) {
+      return false;
+  }
+  return c_ & (1U << 20);
   #else
   return false;
   #endif

I don't know how portable __get_cpuid is across other compilers; I'm using gcc version 4.7.3. In any case, I'd like to share my fix.

make should create shared library

I ran make and saw that it created only a static library for librocksdb.

There are some systems (e.g. Linux Arch) that strongly prefer to use shared libraries. It would be great that the build system provide shared libs as well.

make: *** [db/table_properties_collector_test.o] Error 1

When I compiled RocksDB (`make clean; make`), I got the following error:

db/table_properties_collector_test.cc:137:15: error: converting to ‘rocksdb::TableProperties::UserCollectedProperties {aka std::unordered_map<std::basic_string<char>, std::basic_string<char> >}’ from initializer list would use explicit constructor ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map(std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type, const hasher&, const key_equal&, const allocator_type&) [with _Key = std::basic_string<char>; _Tp = std::basic_string<char>; _Hash = std::hash<std::basic_string<char> >; _Pred = std::equal_to<std::basic_string<char> >; _Alloc = std::allocator<std::pair<const std::basic_string<char>, std::basic_string<char> > >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type = long unsigned int; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::hasher = std::hash<std::basic_string<char> >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::key_equal = std::equal_to<std::basic_string<char> >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::allocator_type = std::allocator<std::pair<const std::basic_string<char>, std::basic_string<char> > >]’
   return {};
           ^
make: *** [db/table_properties_collector_test.o] Error 1
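
One workaround reported for this class of error is replacing the braced `return {};` with an explicit default construction, which avoids the overload resolution that picks the explicit unordered_map constructor. A minimal sketch, using a local alias rather than the real rocksdb::TableProperties type:

```cpp
#include <string>
#include <unordered_map>

// Local alias mirroring TableProperties::UserCollectedProperties.
using UserCollectedProperties = std::unordered_map<std::string, std::string>;

// With the experimental GCC 4.9 build above, `return {};` can resolve to an
// explicit unordered_map constructor and fail; naming the type and default-
// constructing it explicitly sidesteps the ambiguity.
UserCollectedProperties EmptyProperties() {
  return UserCollectedProperties();
}
```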

my environment:

uname -a

Darwin icycrystal4-MBP.local 13.0.0 Darwin Kernel Version 13.0.0: Thu Sep 19 22:22:27 PDT 2013; root:xnu-2422.1.72~6/RELEASE_X86_64 x86_64

g++ -v

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-apple-darwin13.0.0/4.9.0/lto-wrapper
Target: x86_64-apple-darwin13.0.0
Configured with: ../gcc-4.9-20130929/configure --enable-languages=c,c++,fortran
Thread model: posix
gcc version 4.9.0 20130929 (experimental) (GCC)

ll

-rw-r--r--    1 crystal  staff       607 Dec 20 17:40 CONTRIBUTING.md
-rw-r--r--    1 crystal  staff      2431 Dec 20 17:40 INSTALL.md
-rw-r--r--    1 crystal  staff      1646 Dec 20 17:40 LICENSE
-rw-r--r--    1 crystal  staff     17154 Dec 20 17:40 Makefile
-rw-r--r--    1 crystal  staff      1408 Dec 20 17:40 PATENTS
-rw-r--r--    1 crystal  staff      2892 Dec 20 17:40 README
-rw-r--r--    1 crystal  staff        96 Dec 20 17:40 README.fb
-rw-r--r--    1 crystal  staff      2165 Dec 23 11:31 build_config.mk
drwxr-xr-x   10 crystal  staff       340 Dec 23 10:32 build_tools
drwxr-xr-x    4 crystal  staff       136 Dec 20 17:40 coverage
drwxr-xr-x  116 crystal  staff      3944 Dec 23 11:13 db
-rwxr-xr-x    1 crystal  staff   2248372 Dec 23 11:31 db_bench
-rwxr-xr-x    1 crystal  staff   2688068 Dec 23 11:31 db_test
drwxr-xr-x    7 crystal  staff       238 Dec 20 17:40 doc
drwxr-xr-x    6 crystal  staff       204 Dec 20 17:40 hdfs
drwxr-xr-x    3 crystal  staff       102 Dec 20 17:40 helpers
drwxr-xr-x    4 crystal  staff       136 Dec 20 17:40 include
-rw-r--r--    1 crystal  staff  29860872 Dec 23 11:31 librocksdb.a
drwxr-xr-x    3 crystal  staff       102 Dec 20 17:40 linters
drwxr-xr-x   14 crystal  staff       476 Dec 23 11:13 port
-rwxr-xr-x    1 crystal  staff   2094136 Dec 23 11:31 signal_test
drwxr-xr-x   50 crystal  staff      1700 Dec 23 11:11 table
drwxr-xr-x   12 crystal  staff       408 Dec 22 17:09 tools
drwxr-xr-x  128 crystal  staff      4352 Dec 23 11:31 util
drwxr-xr-x    7 crystal  staff       238 Dec 20 17:40 utilities

git pull

Already up-to-date

OSX compilation

Thanks for open sourcing your work.

I think it would be extremely helpful if OSX were a supported platform for the purposes of experimentation.

I'm curious to hear your thoughts as I've begun preparing a patch that takes this approach with some tradeoffs, notably:

  • only microsecond timer accuracy
  • sacrificing some of the posix_fadvise hinting that is supported on other platforms
