
haochenpan / rabia


Rabia: Simplifying State-Machine Replication Through Randomization (SOSP 2021)

License: Apache License 2.0

Languages: Go 54.02%, Shell 27.11%, Python 15.25%, Coq 3.05%, Dockerfile 0.57%
Topics: distributed-systems, reliability, fault-tolerance, formal-verification, state-machine-replication, consensus


Rabia

Introduction

We introduce Rabia, a simple, high-performance framework for implementing state-machine replication (SMR) within a datacenter. Rabia's main innovation is using randomization to simplify the design. Rabia provides two features: (i) it needs no fail-over protocol, and it makes auxiliary protocols such as log compaction, snapshotting, and reconfiguration trivial, even though these components are often considered the most challenging part of developing an SMR system; and (ii) it delivers high performance, with up to 1.5x higher throughput than its closest competitor (EPaxos) in a favorable setup (three replicas in the same availability zone), and comparable throughput with more replicas or when deployed across multiple availability zones.

Our SOSP paper, "Rabia: Simplifying State-Machine Replication Through Randomization," describes Rabia's design and evaluation in detail (SOSP Artifact Review Summary) and earned three badges: artifact available, artifact evaluated, and artifact reproduced.

Project Keywords:

  • state-machine replication (SMR), consensus, and formal verification

CCS Concepts:

  • Computer systems organization → Dependable and fault-tolerant systems and networks;
  • Computing methodologies → Distributed algorithms.

Repository structure

  • deployment, internal, roles, and main.go: Rabia's implementation in Go and the project's auxiliary code
  • proofs: proof scripts for the weak multivalued consensus component at the core of the Rabia protocol
  • redis-raft: Redis-Raft-related code and instructions
  • epaxos: compiled binaries of Paxos and EPaxos for CloudLab machines, built from various branches of the (E)Paxos and (E)Paxos-NP codebases, plus scripts to run them
  • docs: documentation (see below)

Documentation

Paper errata -- errata for our paper

How to install and run Rabia -- install and run Rabia on a single machine or a cluster of machines

How to read Rabia's codebase -- an introduction to Rabia's implementation

Package-level comments -- the comments of all Go packages, including some design assumptions and rationales; they can serve as an in-depth guide to this codebase.

Rabia's Roadmap and ToDos -- overarching objectives and granular items

Developer notes -- contains FAQs and some miscellaneous hints for developers

Main contributors

Lewis Tseng, Joseph Tassarotti, Haochen Pan, Jesse Tuğlu, Neo Zhou, Tianshu Wang, Yicheng Shen, Andrew Chapman and Matthew Abbene -- Boston College

Roberto Palmieri -- Lehigh University

Zheng Xiong -- The University of Texas at Austin


rabia's Issues

Definition of mid80Throughput, in the presence of client side batching

I ran an open loop experiment, as follows.

  1. Clone the repo
  2. Set NClients=3, ClientBatchSize=10, and Rabia_ClosedLoop=false in rabia/deployment/profile/profile0.sh
  3. Run the experiment; the output is:
<nil> WRN  Client Conn.=1 Interval not-NULL Slots=26125 Interval throughput (cmd/sec)=65313 NULL Slots=1 Normal Slots=295054 Svr Id=0 Unmatched Slots=1
<nil> WRN  Client Conn.=1 Interval not-NULL Slots=26115 Interval throughput (cmd/sec)=65288 NULL Slots=1 Normal Slots=295191 Svr Id=1 Unmatched Slots=0
<nil> WRN  Client Conn.=1 Interval not-NULL Slots=26096 Interval throughput (cmd/sec)=65240 NULL Slots=1 Normal Slots=295204 Svr Id=2 Unmatched Slots=0
<nil> WRN  ClientId=1 TotalRecv=1000000 TotalSent=1000000 avgLat=4616435 maxLat=5079694 maxLatIdx=63417 mid80Dur=43.913168199 mid80End=1654616177153984932 mid80RecvTimeDur=39.018847504 mid80Requests=80000 mid80Start=1654616133240816744 mid80Throughput (cmd/sec)=1821.7770040518685 mid80Throughput2 (cmd/sec)=2050.2912084166205 minLat=4227477 p50Lat=4620806 p95Lat=4840542 p99Lat=4992735 recvEnd=1654616181792460398 sendEnd=1654616177487489504 sendStart=1654616133037995977
<nil> WRN  ClientId=0 TotalRecv=1000000 TotalSent=1000000 avgLat=4625584 maxLat=5078731 maxLatIdx=63456 mid80Dur=43.936794374 mid80End=1654616177153773840 mid80RecvTimeDur=39.018529197 mid80Requests=80000 mid80Start=1654616133216979462 mid80Throughput (cmd/sec)=1820.7973781387368 mid80Throughput2 (cmd/sec)=2050.3079343685495 minLat=4229113 p50Lat=4629064 p95Lat=4850382 p99Lat=5004433 recvEnd=1654616181792315112 sendEnd=1654616177451300230 sendStart=1654616133037899793
<nil> WRN  ClientId=2 TotalRecv=1000000 TotalSent=1000000 avgLat=4608395 maxLat=5080453 maxLatIdx=62624 mid80Dur=43.883519311 mid80End=1654616177154144261 mid80RecvTimeDur=39.018827943 mid80Requests=80000 mid80Start=1654616133270624963 mid80Throughput (cmd/sec)=1823.0078456799365 mid80Throughput2 (cmd/sec)=2050.2922362728746 minLat=4230412 p50Lat=4618728 p95Lat=4828899 p99Lat=4983292 recvEnd=1654616181810146751 sendEnd=1654616177521779356 sendStart=1654616133038982731

The replica-side throughput is close to 65k cmd/sec, whereas the client-side throughput is about 1.8k cmd/sec. I believe this difference arises because the client-side throughput is calculated as len(array)/time, where each array entry is a batch, so multiplying the client-side throughput by the batch size of 10 should give roughly the same result.

Is my understanding correct?

Thank you
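The numbers in the log lines make the question's arithmetic checkable. For each client, mid80Throughput equals mid80Requests / mid80Dur and mid80Throughput2 equals mid80Requests / mid80RecvTimeDur (for ClientId=1, 80000 / 43.913168199 ≈ 1821.78). Also, mid80Requests=80000 is 80% of 100,000, which is TotalSent / ClientBatchSize; that is consistent with, though it does not prove, the hypothesis that the client counts batches. The sketch below is a minimal check under that assumption; `rate` and `aggregateCmdRate` are hypothetical helpers, not Rabia's client code. Summed over the three clients and scaled by 10, it gives roughly 54.7k cmd/sec, which is in the same range as, but not equal to, the ~65k interval throughput reported by the replicas.

```go
package main

import "fmt"

// rate divides a request count by a window length in seconds.
// Hypothetical helper, not part of Rabia.
func rate(requests int, windowSec float64) float64 {
	return float64(requests) / windowSec
}

// aggregateCmdRate sums per-client request rates over the given windows and
// scales the total by batchSize. ASSUMPTION: each counted request is one
// client batch of batchSize commands (the hypothesis in the issue).
func aggregateCmdRate(requests int, windowsSec []float64, batchSize int) float64 {
	total := 0.0
	for _, w := range windowsSec {
		total += rate(requests, w)
	}
	return total * float64(batchSize)
}

func main() {
	const (
		totalSent     = 1000000 // TotalSent per client, from the log
		mid80Requests = 80000   // mid80Requests per client, from the log
		batchSize     = 10      // ClientBatchSize set in profile0.sh
	)
	// mid80Dur values (seconds) for ClientId=1, 0 and 2, from the log.
	windows := []float64{43.913168199, 43.936794374, 43.883519311}

	// Reproduces ClientId=1's logged mid80Throughput.
	fmt.Printf("ClientId=1 mid80Throughput: %.2f\n", rate(mid80Requests, windows[0]))
	// If TotalSent counts commands, 80% of the batch count is:
	fmt.Println("80% of TotalSent/batchSize:", totalSent/batchSize*80/100)
	fmt.Printf("aggregate client-side cmd/sec: %.0f\n",
		aggregateCmdRate(mid80Requests, windows, batchSize))
}
```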

Error trying to run Rabia on a single VM - note: module requires Go 1.17

Hello
I am an undergraduate CS student reading your paper. I tried to run Rabia on a single VM following your directions, but it fails at step 3; as far as I can tell, the error is thrown in the "build_binary" step of single.sh.

  1. download the project to the default path (see section 1.3)
  2. install Rabia and its dependencies
  3. check the installation:
    cd ./run
    . single.sh # the default parameter runs a Rabia cluster on a single machine for 20 seconds

From the output, it looks like the golang.org/x/sys dependency requires a newer Go version (Go 1.17, according to that dependency's go.mod), while Rabia uses Go 1.15 according to install.sh.

What should I do in this case? I am not sure whether I should specify a higher Go version (1.17) in install.sh or in Rabia itself, or what the recommended way to solve this issue is.

Thanks in advance for your assistance.

The complete error output:
golang.org/x/sys/unix
../../../../pkg/mod/golang.org/x/[email protected]/unix/syscall.go:83:16: undefined: unsafe.Slice
../../../../pkg/mod/golang.org/x/[email protected]/unix/syscall_linux.go:1018:20: undefined: unsafe.Slice
../../../../pkg/mod/golang.org/x/[email protected]/unix/syscall_linux.go:2289:9: undefined: unsafe.Slice
../../../../pkg/mod/golang.org/x/[email protected]/unix/syscall_unix.go:118:7: undefined: unsafe.Slice
../../../../pkg/mod/golang.org/x/[email protected]/unix/sysvshm_unix.go:33:7: undefined: unsafe.Slice
note: module requires Go 1.17
chmod: cannot access '/home/angelinux/go/src/rabia/rabia': No such file or directory
3. start all servers
-bash: /home/angelinux/go/src/rabia/rabia: No such file or directory
-bash: /home/angelinux/go/src/rabia/rabia: No such file or directory
-bash: /home/angelinux/go/src/rabia/rabia: No such file or directory
-bash: /home/angelinux/go/src/rabia/rabia: No such file or directory
[1]- Exit 127 RC_Role=svr RC_Index=${idx} RC_SvrIp=${SvrIps[$idx]} RC_PPort=${SvrPPorts[$idx]} RC_NPort=${SvrNPorts[$idx]} RC_Peers=${RC_Peers[@]} ${RCFolder}/rabia
[2] Exit 127 RC_Role=svr RC_Index=${idx} RC_SvrIp=${SvrIps[$idx]} RC_PPort=${SvrPPorts[$idx]} RC_NPort=${SvrNPorts[$idx]} RC_Peers=${RC_Peers[@]} ${RCFolder}/rabia
[3] Exit 127 RC_Role=svr RC_Index=${idx} RC_SvrIp=${SvrIps[$idx]} RC_PPort=${SvrPPorts[$idx]} RC_NPort=${SvrNPorts[$idx]} RC_Peers=${RC_Peers[@]} ${RCFolder}/rabia
-bash: /home/angelinux/go/src/rabia/rabia: No such file or directory
Traceback (most recent call last):
File "/home/angelinux/go/src/rabia/deployment/analysis/analysis.py", line 256, in
print_statistics()
File "/home/angelinux/go/src/rabia/deployment/analysis/analysis.py", line 211, in print_statistics
for param in get_experiments(log_folder):
File "/home/angelinux/go/src/rabia/deployment/analysis/analysis.py", line 26, in get_experiments
files = [f for f in listdir(log_folder_path) if isfile(join(log_folder_path, f)) and "log" in f]
FileNotFoundError: [Errno 2] No such file or directory: '/home/angelinux/go/src/rabia/logs'
[4]+ Exit 127 RC_Role=cli RC_Index=${idx} RC_Proxy=${RC_Proxies[$proxy_idx]} ${RCFolder}/rabia
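The root error is `undefined: unsafe.Slice`: `unsafe.Slice` was added in Go 1.17, so a Go 1.15 toolchain cannot compile this version of golang.org/x/sys, the binary is never built, and every later step fails with "No such file or directory". A minimal sketch of a pre-build check is shown below; it is hypothetical and not part of Rabia's scripts. It compares `runtime.Version()` against a required minimum, where `atLeast` and `leadingDigits` are illustrative names.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// leadingDigits returns the numeric prefix of s, e.g. "21" for "21rc2".
func leadingDigits(s string) string {
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	return s[:i]
}

// atLeast reports whether a version string such as "go1.15.15" is at least
// wantMajor.wantMinor. Unrecognized strings (e.g. devel builds) return false.
func atLeast(version string, wantMajor, wantMinor int) bool {
	parts := strings.SplitN(strings.TrimPrefix(version, "go"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(leadingDigits(parts[1]))
	if err1 != nil || err2 != nil {
		return false
	}
	if major != wantMajor {
		return major > wantMajor
	}
	return minor >= wantMinor
}

func main() {
	v := runtime.Version()
	// unsafe.Slice, used by golang.org/x/sys/unix in the log above,
	// first shipped in Go 1.17.
	if atLeast(v, 1, 17) {
		fmt.Println(v, "provides unsafe.Slice")
	} else {
		fmt.Println(v, "predates Go 1.17 and cannot build this golang.org/x/sys")
	}
}
```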

What is the use of NetworkBatchSize and NetworkBatchTimeout in internal/config.go?

config.go defines two variables: (1) NetworkBatchSize and (2) NetworkBatchTimeout.

Except for logging, I cannot find anywhere these variables are used. What is their purpose?

In the Rabia codebase, there are two instances of batching: (1) client-side batching and (2) replica-side batching of client batches. Is my understanding correct? If so, what purpose does NetworkBatch serve?

Thanks

How to specify arrival rates in the client?

I am trying to do a performance analysis of Rabia and want to run an open-loop test with different arrival rates.

In roles/client/client.go, in OpenLoopClient(), the client sleeps for a fixed inter-arrival time before sending each request in sendOneRequest(i int). However, this differs from the standard Poisson arrival model, so I would like to know whether it is possible to send open-loop requests with Poisson arrivals, or how I could modify the code to achieve that.

Thanks
