sz-npe / caas-lsm

This project forked from asu-idi/caas-lsm

[SIGMOD '24] CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure

License: GNU General Public License v2.0


Paper

CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure

Qiaolin Yu, Chang Guo, Jay Zhuang, Viraj Thakkar, Jianguo Wang, Zhichao Cao.

ACM Conference on Management of Data (SIGMOD 2024), Research Track Full Paper.

Dependencies

Baselines

  • Note that multiple CSAs should be bound to a single Control Plane.
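The binding between several CSAs and one Control Plane can be pictured with the following C++ sketch. It is a hypothetical model, not code from procp_server.cc or csa_server.cc: the names CsaInfo, ControlPlane, RegisterCsa, AssignJob, and FinishJob are illustrative, and the least-loaded placement policy is an assumption chosen only to show how one Control Plane can route compaction jobs across the CSAs registered with it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical bookkeeping for one CSA registered with the Control Plane.
struct CsaInfo {
  std::string address;       // placeholder "host:port" of the CSA
  std::size_t running_jobs;  // compaction jobs currently assigned to it
};

// Hypothetical Control Plane: several CSAs register with this one instance.
class ControlPlane {
 public:
  void RegisterCsa(const std::string& address) {
    assert(!address.empty());
    csas_.push_back({address, 0});
  }

  // Assumed policy: pick the CSA with the fewest running jobs
  // (ties go to the CSA that registered first).
  std::optional<std::string> AssignJob() {
    if (csas_.empty()) return std::nullopt;
    CsaInfo* best = &csas_.front();
    for (auto& csa : csas_) {
      if (csa.running_jobs < best->running_jobs) best = &csa;
    }
    ++best->running_jobs;
    return best->address;
  }

  // Called when a CSA reports that one of its jobs has finished.
  void FinishJob(const std::string& address) {
    for (auto& csa : csas_) {
      if (csa.address == address && csa.running_jobs > 0) {
        --csa.running_jobs;
        return;
      }
    }
  }

 private:
  std::vector<CsaInfo> csas_;
};
```

With two registered CSAs, three consecutive jobs go to the first, second, and first CSA again; a CSA that finishes work becomes the preferred target for the next job.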

Rocks-Local

Search the repository for the following code and delete it, so that compactions run locally instead of being sent to a compaction service:

tmp_options.compaction_service = std::make_shared<MyTestCompactionService>(
      dbname, compaction_options, compaction_stats, remote_listeners,
      remote_table_properties_collector_factories);
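Deleting that assignment is what makes the build Rocks-Local: in RocksDB, DBOptions::compaction_service defaults to nullptr, and compactions are only handed to a service when one is set. The sketch below models just that branch; RemoteCompactionService, FakeRemoteService, MiniOptions, and RunCompaction are hypothetical names, not RocksDB APIs, and the real rocksdb::CompactionService interface is richer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a remote compaction service (not rocksdb::CompactionService).
struct RemoteCompactionService {
  virtual ~RemoteCompactionService() = default;
  virtual std::string Run(const std::string& job) = 0;
};

// Hypothetical fake service that only tags the job as remote.
struct FakeRemoteService : RemoteCompactionService {
  std::string Run(const std::string& job) override { return "remote:" + job; }
};

// Hypothetical options struct playing the role of DBOptions::compaction_service.
struct MiniOptions {
  std::shared_ptr<RemoteCompactionService> compaction_service;  // nullptr => local
};

// Decide where a compaction job runs, mirroring the Rocks-Local vs. remote split.
std::string RunCompaction(const MiniOptions& opts, const std::string& job) {
  assert(!job.empty());
  if (opts.compaction_service) {
    return opts.compaction_service->Run(job);  // offloaded to the service
  }
  return "local:" + job;  // runs on the DB's own background threads
}
```

Leaving compaction_service unset (as after deleting the line above) sends every job down the local path.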

CaaS-LSM

  • Configure the addresses of the Control Plane, CSA, and HDFS server in include/rocksdb/options.h
  • Build and compile
  • Run Control Plane and CSA
cd $build
./procp_server # run Control Plane server
./csa_server # run CSA server

Disaggre-RocksDB

  • Configure the addresses of the CSA and HDFS server in include/rocksdb/options.h
  • Build and compile
  • Run CSA
git checkout disaggre-rocksdb
cd $build
./csa_server # Same binary name, but this CSA functions differently from the CSA in CaaS-LSM

Terark-Local

Terark-Native

  • Check out the terark-native branch
sudo apt-get install libaio-dev
  • Before building, enable WITH_TOOLS and WITH_TERARK_ZIP; both are necessary for remote compaction mode.
./build.sh
  • Use remote_compaction_worker_101

Terark-CaaS

  • Copy the code in db/compaction/remote_compaction from CaaS-LSM, including procp_server.cc, csa_server.cc, utils.h, and compaction_service.proto.
  • Change CompactionArgs to a string, since TerarkDB transmits encoded strings over the network.
  • Start the services the same way as in CaaS-LSM.
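The CompactionArgs change can be illustrated with a length-prefixed string encoding. This is a hypothetical C++ sketch: the CompactionArgs fields shown (db_name, input_files, output_level) and the helpers EncodeCompactionArgs and DecodeCompactionArgs are assumptions, and TerarkDB's actual encoding may differ. Each string is written as a 4-byte little-endian length followed by its bytes, and integers as 4 little-endian bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical subset of CompactionArgs; the real struct has other fields.
struct CompactionArgs {
  std::string db_name;
  std::vector<std::string> input_files;
  std::uint32_t output_level = 0;
};

namespace {
void PutU32(std::string* out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out->push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}
std::uint32_t GetU32(const std::string& in, std::size_t* pos) {
  if (*pos + 4 > in.size()) throw std::runtime_error("truncated input");
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[*pos + i])) << (8 * i);
  *pos += 4;
  return v;
}
void PutBytes(std::string* out, const std::string& s) {
  PutU32(out, static_cast<std::uint32_t>(s.size()));  // length prefix
  out->append(s);
}
std::string GetBytes(const std::string& in, std::size_t* pos) {
  std::uint32_t n = GetU32(in, pos);
  if (*pos + n > in.size()) throw std::runtime_error("truncated input");
  std::string s = in.substr(*pos, n);
  *pos += n;
  return s;
}
}  // namespace

std::string EncodeCompactionArgs(const CompactionArgs& a) {
  std::string out;
  PutBytes(&out, a.db_name);
  PutU32(&out, static_cast<std::uint32_t>(a.input_files.size()));
  for (const auto& f : a.input_files) PutBytes(&out, f);
  PutU32(&out, a.output_level);
  return out;
}

CompactionArgs DecodeCompactionArgs(const std::string& in) {
  std::size_t pos = 0;
  CompactionArgs a;
  a.db_name = GetBytes(in, &pos);
  std::uint32_t count = GetU32(in, &pos);
  for (std::uint32_t i = 0; i < count; ++i) a.input_files.push_back(GetBytes(in, &pos));
  a.output_level = GetU32(in, &pos);
  if (pos != in.size()) throw std::runtime_error("trailing bytes");
  return a;
}
```

Because the encoded form contains raw length bytes, it is not guaranteed to be valid UTF-8; if it travels inside compaction_service.proto, a protobuf bytes field is the safer choice than a proto3 string field, which expects UTF-8.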

To evaluate the baselines

Run db_bench

./db_bench --benchmarks="fillrandom" --num=4000000 --statistics --threads=16 --max_background_compactions=8 --db=/xxx/xxx

OPS comparison

P99 latency comparison

Conclusion

CaaS-LSM achieves up to 61% higher OPS than Disaggre-RocksDB, and TerarkDB-CaaS achieves up to 42% higher OPS than native TerarkDB.

Test CaaS-LSM in distributed applications

Test Nebula

Test Kvrocks

  • Clone Kvrocks from https://github.com/apache/incubator-kvrocks

  • Before build:

    • Modify the following part of "cmake/rocksdb.cmake" so that Kvrocks fetches this repository instead of the default RocksDB:
    FetchContent_DeclareGitHubWithMirror(rocksdb
       facebook/rocksdb v7.8.3
       MD5=f0cbf71b1f44ce8f50407415d38b9d44
     )
    
  • Build: ./x.py build

  • Single mode:

    • build/kvrocks -c kvrocks.conf
  • Cluster mode:

    • Based on kvrocks controller https://github.com/KvrocksLabs/kvrocks_controller.git with commit df83752849ef41ce91037ca5c9cc6c670a480d56
    • Dependencies: etcd https://etcd.io/docs/v3.5/install/
    • Build kvrocks controller: make
    • Start controller server: ./_build/kvrocks-controller-server -c ./config/config.yaml
    • A fast way to build a cluster: python scripts/e2e_test.py
    • Check cluster status: ./_build/kvrocks-controller-cli -c ./config/kc_cli_config.yaml
    • Modify kvrocks.conf for each node: port (e.g., 30001-30006), cluster-enabled (yes), and dir (e.g., /tmp/kvrocks1-6 instead of /tmp/kvrocks)
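The per-node settings for a six-node local cluster follow a simple pattern, which the hypothetical C++ helper below generates. It assumes nodes are numbered 1 to 6 and emits only the three keys named above, in kvrocks.conf's "key value" line format; NodeConfig and ClusterConfigs are illustrative names, and copying the output into separate kvrocks.conf files is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: per-node overrides for node `index` (1-based).
// All other settings are assumed to stay as in the stock kvrocks.conf.
std::string NodeConfig(int index) {
  assert(index >= 1);
  return "port " + std::to_string(30000 + index) + "\n" +
         "cluster-enabled yes\n" +
         "dir /tmp/kvrocks" + std::to_string(index) + "\n";
}

// Build the overrides for nodes 1..nodes (ports 30001.., dirs /tmp/kvrocks1..).
std::vector<std::string> ClusterConfigs(int nodes) {
  std::vector<std::string> configs;
  for (int i = 1; i <= nodes; ++i) configs.push_back(NodeConfig(i));
  return configs;
}
```

Giving each node its own dir matters because two Kvrocks instances cannot share one RocksDB data directory.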

Evaluation Results

OPS of Nebula

Latency of Nebula

Conclusion

Nebula-Random-Sche achieves a total OPS of 5,669 and an average latency of 526 ms, which are about 86% lower and 6× higher, respectively, than those of Nebula-CaaS-LSM.
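To make the relative figures concrete, the statement can be inverted to estimate the Nebula-CaaS-LSM numbers. This is back-of-the-envelope arithmetic from the rounded percentages above, not a reported measurement, and it assumes "6× higher" means six times the Nebula-CaaS-LSM latency:

```latex
\begin{aligned}
1 - \frac{\mathrm{OPS}_{\text{Random-Sche}}}{\mathrm{OPS}_{\text{CaaS-LSM}}} &\approx 0.86
  &&\Rightarrow\quad \mathrm{OPS}_{\text{CaaS-LSM}} \approx \frac{5669}{0.14} \approx 4.0 \times 10^{4} \\
\frac{\mathrm{Lat}_{\text{Random-Sche}}}{\mathrm{Lat}_{\text{CaaS-LSM}}} &\approx 6
  &&\Rightarrow\quad \mathrm{Lat}_{\text{CaaS-LSM}} \approx \frac{526\ \text{ms}}{6} \approx 88\ \text{ms}
\end{aligned}
```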

OPS of Kvrocks

Latency of Kvrocks

Conclusion

With better scheduling of compaction jobs, Kvrocks-CaaS achieves about 20% higher overall OPS than Kvrocks-Local and a 30% improvement in average latency. In the cross-datacenter scenario, the log files show that Kvrocks-Local accumulates a backlog of pending compaction jobs and suffers a severe write slowdown once intensive compaction starts. In contrast, Kvrocks-CaaS runs smoothly, improving overall OPS by 28% and P99 latency by 65%.
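The write slowdown in Kvrocks-Local is consistent with how RocksDB, the storage engine under Kvrocks, throttles writes when compaction falls behind. The C++ sketch below is a simplified model of that mechanism, not CaaS-LSM or Kvrocks code: it looks only at the number of L0 files, uses RocksDB's default level0_slowdown_writes_trigger (20) and level0_stop_writes_trigger (36), and ignores the other stall conditions such as pending compaction bytes and too many memtables. ClassifyWrites and FirstSlowdownTick are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

enum class WriteState { kNormal, kSlowdown, kStop };

// Thresholds default to RocksDB's level0_slowdown_writes_trigger and
// level0_stop_writes_trigger defaults.
struct StallThresholds {
  int slowdown_trigger = 20;
  int stop_trigger = 36;
};

// Classify the write path given how many L0 files await compaction.
WriteState ClassifyWrites(int l0_files, const StallThresholds& t = {}) {
  assert(t.slowdown_trigger <= t.stop_trigger);
  if (l0_files >= t.stop_trigger) return WriteState::kStop;
  if (l0_files >= t.slowdown_trigger) return WriteState::kSlowdown;
  return WriteState::kNormal;
}

// Toy simulation: each tick, flushes add `flushes_per_tick` L0 files and
// compaction removes up to `compacted_per_tick`. Returns the first tick at
// which writes would be throttled, or -1 if that never happens within `ticks`.
int FirstSlowdownTick(int flushes_per_tick, int compacted_per_tick, int ticks,
                      const StallThresholds& t = {}) {
  int l0_files = 0;
  for (int tick = 1; tick <= ticks; ++tick) {
    l0_files += flushes_per_tick;
    l0_files -= std::min(compacted_per_tick, l0_files);
    if (ClassifyWrites(l0_files, t) != WriteState::kNormal) return tick;
  }
  return -1;
}
```

For example, if flushes add 4 L0 files per tick but compaction removes only 2, the backlog grows by 2 files per tick and FirstSlowdownTick(4, 2, 100) returns 10; when compaction keeps up, as with FirstSlowdownTick(2, 4, 100), it returns -1.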
