felis's Introduction

Papers

Caracal: Contention Management with Deterministic Concurrency Control - SOSP'21 (Paper, Slides, Talk, Long Talk)

Build

If you are on the CSL cluster, you don't need to install any dependencies. Otherwise, you need to install Clang 8 manually.

  1. First, run the configure script
./configure

to download the build tool buck and generate a local config file for it.

  2. Now you can build with buck. Use either the buck.pex downloaded by the script or, if you're on the CSL cluster, the buck installed in the environment.

The command

buck build db

will generate the debug binary at buck-out/gen/db#debug. If you need an optimized build, you can run

buck build db_release

to generate the release binary at buck-out/gen/db#release.

Run

Setting Things Up

Felis needs HugePages for memory allocation (to reduce TLB misses). Common CSL cluster machines should already have these set up, so you may skip this step there. The following command, run as root, pre-allocates 400GB of HugePages (204800 pages of 2MB each). Adjust the amount to your memory size. (Each HugePage is 2MB by default in Linux.)

echo 204800 > /proc/sys/vm/nr_hugepages
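The page count is just the desired memory divided by the HugePage size. A minimal sketch of that arithmetic, assuming the default 2MB page size (hugepages_for_gib is a hypothetical helper name, not part of Felis):

```shell
# Hypothetical helper: number of HugePages needed for a given amount of memory.
# $1 = memory in GB, $2 = HugePage size in MB (assumed to be 2 if omitted).
hugepages_for_gib() {
  echo $(( $1 * 1024 / ${2:-2} ))
}

hugepages_for_gib 400   # prints 204800, the value used above
```

On a machine with less memory you could, for example, write the result of hugepages_for_gib 64 (32768) to /proc/sys/vm/nr_hugepages instead. The actual page size on your machine is reported as Hugepagesize in /proc/meminfo.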

Run the Controller

To run the workload, the felis-controller is needed. It is in a separate git repository. Please check the README in felis-controller.

First, enter the configuration for the nodes and the controller in config.json in felis-controller.

Then, run the controller. We usually run the controller on localhost when doing single-node experiments, and on a separate machine when doing distributed experiments. It doesn't really matter though.

As long as the configuration doesn't change, you can let the controller run all the time.

Start the database on each node

Once the controller is initialized, on each node you can run:

buck-out/gen/db#release -c 127.0.0.1:<rpc_port> -n host1 -w tpcc -Xcpu16 -Xmem20G -XVHandleBatchAppend -XVHandleParallel

-c is the felis-controller IP address and RPC port (<rpc_port> and <http_port> below are specified in config.json as well), -n is the host name for this node, and -w is the workload it will run (tpcc/ycsb).

-X is for the extended arguments. For a list of -X options, please refer to opts.h. Mostly you will need -Xcpu and -Xmem to specify how many cores and how much memory to use. (Currently, the number of CPUs must be a multiple of 8. That's a bug, but we haven't had time to fix it.)
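Because of the multiple-of-8 restriction, it can help to check the core count before launching a node. A small sketch, where valid_cpu_count is a hypothetical helper and not something Felis provides:

```shell
# Hypothetical pre-flight check: succeeds only if the core count
# is a positive multiple of 8, as -Xcpu currently requires.
valid_cpu_count() {
  [ "$1" -gt 0 ] 2>/dev/null && [ $(( $1 % 8 )) -eq 0 ]
}

if valid_cpu_count 16; then echo "16 is a multiple of 8"; fi
if ! valid_cpu_count 12; then echo "12 is not a multiple of 8"; fi
```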

Start running the workload

Each node initializes the workload dataset and then idles, waiting for further commands from the controller. When all nodes have finished initialization, you can tell the controller that everybody can proceed:

curl localhost:<http_port>/broadcast/ -d '{"type": "status_change", "status": "connecting"}'

Upon receiving this, the controller broadcasts to every node to start running the benchmark. When the benchmark finishes, you can use the following command to shut down safely. (Optional)

curl localhost:<http_port>/broadcast/ -d '{"type": "status_change", "status": "exiting"}'
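The two broadcasts differ only in the status field, so they can be wrapped in one function. This is a sketch only: status_payload, broadcast_status and the HTTP_PORT variable are hypothetical names, and HTTP_PORT has to be set to the <http_port> from config.json.

```shell
# Build the JSON body for a status_change broadcast.
status_payload() {
  printf '{"type": "status_change", "status": "%s"}\n' "$1"
}

# Send a status change to the controller (not called here).
# HTTP_PORT is a placeholder; the function refuses to run if it is unset.
broadcast_status() {
  curl "localhost:${HTTP_PORT:?set HTTP_PORT first}/broadcast/" -d "$(status_payload "$1")"
}

status_payload connecting   # show the body that would be sent
```

With the controller running, broadcast_status connecting starts the benchmark and broadcast_status exiting shuts the nodes down.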

Logs

If you are running the debug version, the logging level is "debug" by default. Otherwise, the logging level is "info". You can always change the logging level by setting the LOGGER environment variable. Possible values for LOGGER are: trace, debug, info, warning, error, critical, off.

Debug-level messages are written to a log file named dbg-hostname.log, where hostname is your node name. This prevents debug logs from flooding your screen.
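A small sketch of how the level names and the log file name fit together. Both functions are hypothetical helpers for illustration and are not part of Felis:

```shell
# Succeeds only for the LOGGER values listed above.
valid_logger_level() {
  case "$1" in
    trace|debug|info|warning|error|critical|off) return 0 ;;
    *) return 1 ;;
  esac
}

# Name of the debug log file for a node, following the dbg-hostname.log pattern.
debug_log_name() {
  echo "dbg-$1.log"
}

valid_logger_level trace && echo "trace is accepted"
debug_log_name host1
```

To change the level for a single run, prefix the database command with the variable, for example LOGGER=trace followed by the buck-out/gen/db#release command line shown earlier.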

Development

ccls language server

We use ccls (https://github.com/MaskRay/ccls) to help our development. ccls is a C/C++/ObjC language server supporting cross references, hierarchies, completion and semantic highlighting. It is not essential for running the experiments.

If you have run the ./configure script, it will have generated a .ccls configuration file for you. ccls supports Emacs, Vim and VSCode.

Mike has a precompiled ccls binary on the cluster machine. You can download it from http://fs.csl.utoronto.ca/~mike/ccls.

Zhiqi has some experience with using ccls with VSCode.

Test

FIXME: Unit tests are broken now. You may skip this section.

Use

./buck build test

to build the test binary. Then run buck-out/gen/dbtest to run all unit tests. We use google-test. To run a subset of the tests, please look at https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#running-a-subset-of-the-tests .

felis's People

Contributors

farnasirim, mikeandmore, rayzgz, se9fault, shujianqian, xyene, zero747

felis's Issues

BatchPcCnt != 0 or segfault

Commit

on commit c64bd90

Config

-c 127.0.0.1:3148 -n host1 -w ycsb -XMaxNodeLimit1 -XOutputDir/mnt/home/rayzhang/workspace/felis/output -Xcpu32 -Xmem20G -XVHandleBatchAppend -XPriorityTxn -XStripBatched1 -XPercentagePriorityTxn20 -XEpochSize100000 -XNrEpoch20 -XSIDRowWTS -XRowRTS -XConflictRowRTS -XSIDRowRTS -XStripPriority32 -XSIDBitmap

Bug Symptom

  • OnXXXComplete is sometimes called when the batch/priority piece counter is not zero, causing the program to abort
  • Segfault, pending further investigation
  • Potentially, reset() is called when sched_pol is not empty; we haven't been able to reproduce this yet

Potential Fix

  • Understand how the phase switch mechanism works currently and investigate whether the aforementioned problem can happen

Reset() abort issue

If the interval between priority txns is tuned just right, the following function will sometimes abort.

  // routine_sched.cc
  void Reset() override {
    abort_if(len > 0, "Reset() called, but len {} > 0", len);
  }
