Pelikan

Pelikan is Twitter's framework for developing cache services. It is:

  • Fast: Pelikan provides high-throughput and low-latency caching solutions.

  • Reliable: Pelikan is designed for large-scale deployment and the implementation is informed by our operational experiences.

  • Modular: Pelikan is a framework for rapidly developing new caching solutions by focusing on the inherent architectural similarity between caching services and providing reusable low-level components.

Website | Chat

Overview

After years of using and working on various cache services, we built a common framework that reveals the inherent architectural similarity among them.

By creating well-defined modules, most of the low-level functionalities are reused as we create different binaries. The implementation learns from our operational experiences to improve performance and reliability, and leads to software designed for large-scale deployment.

The framework approach allows us to develop new features and protocols quickly.

Products

Pelikan contains the following products:

  • pelikan_segcache_rs: a Memcached-like server with extremely high memory efficiency and excellent core scalability. See our NSDI'21 paper for design and evaluation details.
  • pelikan_pingserver_rs: an over-engineered, production-ready ping server useful as a tutorial and for measuring baseline RPC performance.
  • momento_proxy: a proxy which allows existing applications to use Momento instead of a Memcache-compatible cache backend.

Legacy

The Pelikan legacy codebase can be found in the legacy folder of this project. It comprises the original C codebase and backend implementations. It remains available as a reference, but is not recommended for production deployments.

Features

  • runtime separation of control and data plane
  • predictably low latencies via lockless data structures; the worker thread never blocks
  • per-module config options and metrics that can be composed easily
  • multiple storage and protocol implementations, easy to further extend
  • low-overhead command logger for hotkey and other important data analysis

Building Pelikan

Requirements

  • Rust stable toolchain
  • C toolchain: llvm/clang (>= 7.0)
  • Build tools: cmake (>= 3.2)

Build

git clone https://github.com/twitter/pelikan.git
cd pelikan
cargo build --release

Tests

cargo test

Usage

This section uses pelikan_segcache_rs as an example; the other executables behave very similarly.

To get information about the service, including usage format and options, run:

target/release/pelikan_segcache_rs --help

To launch the service with default settings, simply run:

target/release/pelikan_segcache_rs

To launch the service with the sample config file, run:

target/release/pelikan_segcache_rs config/segcache.toml

You should be able to try out the server using an existing memcached client, or simply with telnet.

$ telnet localhost 12321
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
set foo 0 0 3
bar
STORED

Attention: use the admin port for all non-data commands.

$ telnet localhost 9999
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
version
VERSION 0.1.0
stats
STAT pid 54937
STAT time 1459634909
STAT uptime 22
STAT version 100
STAT ru_stime 0.019172
...

Configuration

Pelikan takes a file-first approach to configuration, and currently supports configuration through files only. You can create a new config file by following the examples included under the config directory.
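
For illustration only, a minimal config file might look like the sketch below. The section and option names here are assumptions modeled on the ports used in the Usage examples above; config/segcache.toml remains the authoritative reference.

# illustrative sketch only; see config/segcache.toml for actual options
[admin]
host = "0.0.0.0"
port = "9999"

[server]
host = "0.0.0.0"
port = "12321"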

Community

Stay in touch

Contributing

Please take a look at our community manifesto and coding style guide.

If you want to submit a patch, please follow these steps:

  1. create a new issue
  2. fork on GitHub & clone your fork
  3. create a feature branch on your fork
  4. push your feature branch
  5. create a pull request linked to the issue

Documentation

We are actively working on documentation and will publish it on our website. Meanwhile, check out the current material under docs/.

License

This software is licensed under the Apache 2.0 license, see LICENSE for details.

Issues

partial parsing is unsafe with dbuf

The following scenario will lead to memory errors:

  1. an incomplete request arrives at the server, containing the full header (e.g. set foo 0 0 3\r\n) but not the value (e.g. bar\r\n)
  2. the header is parsed and pstate is set to REQ_VAL. Parsing stops here; PARSE_EUNFIN is returned, and the request object is kept around for more data, but the socket is otherwise done for the moment.
  3. more data arrives on the same socket, forcing a dbuf_double, during which realloc results in a new address for the buffer.
  4. now the bstring.data pointers in each element of keys become dangling pointers.

This will probably take a long time to surface, because requests that large are uncommon and realloc does not always move the base address. But once it does, the bug will be very difficult to debug.

Solution: don't do partial parsing.
Future consideration: We can reintroduce partial parsing if we switch to chained buffers.
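
To make the failure mode concrete, here is a minimal, self-contained C sketch (illustrative only, not Pelikan code) of how a pointer captured during partial parsing dangles once realloc moves the buffer:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(16);                /* the read buffer */
    strcpy(buf, "set foo 0 0 3");
    char *key = buf + 4;                   /* like bstring.data aimed at "foo" */
    uintptr_t before = (uintptr_t)buf;

    printf("key before growth: %.3s\n", key);

    buf = realloc(buf, 1 << 20);           /* like dbuf_double on more data */
    if ((uintptr_t)buf != before) {
        /* key still points into the old allocation: a dangling pointer */
        printf("buffer moved; key is now dangling\n");
    }
    free(buf);
    return 0;
}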

Race condition on line 22 of src/time/time.c

I compiled pelikan with clang-7 and -fsanitize=thread without error. I then ran ./pelikan_pingserver and was greeted with the following:

launching server with default values.
set up the ccommon::log module
Set up the ccommon::debug module
create logger with filename (null) cap 0
[Mon May 28 17:34:41 2018][INFO] /root/pelikan/deps/ccommon/src/cc_signal.c:77 override handler for SIGSEGV

**SNIP**

[Mon May 28 17:34:41 2018][INFO] /root/pelikan/deps/ccommon/src/event/cc_epoll.c:90 epoll fd 9 with nevent 1024
name: daemonize                       type: boolean          current: no                   ( default: no                   )

**SNIP**

name: tcp_poolsize                    type: unsigned int     current: 0                    ( default: 0                    )
==================
WARNING: ThreadSanitizer: data race (pid=31778)
  Write of size 4 at 0x0000011318a8 by thread T2:
    #0 time_update /root/pelikan/src/time/time.c:22:9 (pelikan_pingserver+0x4b7786)
    #1 _server_evwait /root/pelikan/src/core/data/server.c:301:5 (pelikan_pingserver+0x4b4a34)
    #2 core_server_evloop /root/pelikan/src/core/data/server.c:310 (pelikan_pingserver+0x4b4a34)

  Previous write of size 4 at 0x0000011318a8 by thread T1:
    #0 time_update /root/pelikan/src/time/time.c:22:9 (pelikan_pingserver+0x4b7786)
    #1 _worker_evwait /root/pelikan/src/core/data/worker.c:286:5 (pelikan_pingserver+0x4b58e4)
    #2 core_worker_evloop /root/pelikan/src/core/data/worker.c:297 (pelikan_pingserver+0x4b58e4)

  Location is global 'now' of size 4 at 0x0000011318a8 (pelikan_pingserver+0x0000011318a8)

  Thread T2 (tid=31781, running) created by main thread at:
    #0 pthread_create /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:965:3 (pelikan_pingserver+0x449da5)
    #1 core_run /root/pelikan/src/core/core.c:29:11 (pelikan_pingserver+0x4b5c93)
    #2 main /root/pelikan/src/server/pingserver/main.c:198:5 (pelikan_pingserver+0x4b2b71)

  Thread T1 (tid=31780, running) created by main thread at:
    #0 pthread_create /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:965:3 (pelikan_pingserver+0x449da5)
    #1 core_run /root/pelikan/src/core/core.c:23:11 (pelikan_pingserver+0x4b5c7a)
    #2 main /root/pelikan/src/server/pingserver/main.c:198:5 (pelikan_pingserver+0x4b2b71)

SUMMARY: ThreadSanitizer: data race /root/pelikan/src/time/time.c:22:9 in time_update
==================
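
A sketch of one possible remedy, assuming a relaxed atomic is acceptable for a coarse-grained clock (this only illustrates the idea and is not the actual fix; time_update follows the report, while time_now is an assumed name for the read side):

#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

static _Atomic uint32_t now;    /* was a plain global, hence the race */

void
time_update(void)
{
    /* concurrent stores from server and worker threads are well-defined */
    atomic_store_explicit(&now, (uint32_t)time(NULL), memory_order_relaxed);
}

uint32_t
time_now(void)
{
    return atomic_load_explicit(&now, memory_order_relaxed);
}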

investigate how test discovery and module import should be used for integration tests

A few questions here:

  • should we use python3? There's no historical baggage, so the question boils down to whether there are new features that would benefit testing.
  • what's the best way to do test discovery? We use unittest.TestCase a little differently from how it is normally done (by loading from a list of files to create individual TestCase's instead of specifying the entire test in-place), so we may or may not be able to use the standard unittest discovery mechanism.
  • what's the best way to import modules? Mostly just not familiar with how python2/3 differ on this aspect.

optimize travis ci

  • remove sudo requirement (so we can use containers)
  • check if caching can be used

de-register writable event on a socket after successful `_worker_event_write()`

Currently, the writable event for a socket is registered only when a write call would block on the corresponding fd; however, once added, this event is never removed until all events on the fd are, usually upon connection termination.

This behavior is incorrect, at least for our epoll implementation. Since we do not want to simply add the EPOLLONESHOT flag to event_add_write, which disables all future events upon firing (and the event fired could be either read or write, so that's not particularly useful), we have a couple of options:

  • disable read and add writable event to the fd with EPOLLONESHOT, and re-add read only when the next write is successful. This implies we will not try to process more data coming from the socket until we have flushed out previous responses for this connection;
  • add a return status to _worker_event_write(), and upon a successful write, modify the events associated with the file descriptor to remove write-related flags (see the sketch below).
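
A sketch of the second option, using the raw epoll API for illustration (not Pelikan's event wrappers; the function name is made up): once a write fully flushes, modify the fd's registration to drop write interest while keeping reads.

#include <sys/epoll.h>

static int
unwatch_write(int ep, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN;    /* keep read interest; drop EPOLLOUT */
    ev.data.fd = fd;

    /* re-arm the fd without write-related flags */
    return epoll_ctl(ep, EPOLL_CTL_MOD, fd, &ev);
}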

end-to-end tests for pelikan servers

We need some end-to-end test cases and harness to make sure the servers behave as expected. Initially looking into the most basic, deterministic scenarios. In these cases a sequence of request/response pairs will be used, the requests are sent to the server and the responses have to match exactly.

Since we don't have an official client library in Pelikan just yet ( @kevyang wrote one for memcached, but it needs a bit of polishing which I will do soon), we can start with sending serialized strings and treating responses the same way. Under this approach, the only part in the response that needs attention is the cas value for gets commands. Exact values can be ensured if we locally track the number of writes sent to the server; alternatively we can use regex to parse out the responses.

I'm proposing separating the test cases from the harness to maximize readability and flexibility.

Test cases can be stored in a file with some delimiter between individual request/response pairs, e.g.:

>>>
set foo 0 0 3
bar
<<<
STORED

>>>
gets foo
<<<
VALUE foo 0 3 1
bar
END

The test harness can then read from such a file, as well as a server config file to learn the whereabouts of the server, and play the tests against it. I've looked at test harnesses written in Perl, Tcl, Bash and Python, and I lean toward Python for readability, given that integration tests don't have particular performance requirements.
Perl, Tcl and Python are pretty much universally installed on all POSIX-compatible systems (not to mention bash), so availability is not a problem for any of them.
Anything else that's worth looking at, or is there an argument for/against any of them?

/cc @seppo0010

binaries created in source directory

The pelikan_twemcache and other binaries are left in SRCDIR/_bin rather than in BUILDDIR/_bin, which is where the documentation says they should go and where one would expect them.

Add peer to klog

Currently, the peer name is not logged in klog.

choose a linter/formatting tool

A few things to consider:

  • availability across all platforms we plan to support;
  • customized rules;
  • IDE/editor integration;
  • build integration.

implement stats

Also implement parse_reply and add reply tests for protocol/admin once stats are implemented.

slab unittest is failing

make test fails with the following information in slab.log:

/Users/yao/workspace/pelikan/test/storage/slab/check_slab.c:108:F:basic item api:test_insert_large:0: item_insert not OK - return status 1

flaky test in travis?

$ cd ../test && python -m unittest discover
F
======================================================================
FAIL: test_miss (integration.test_twemcache_basic.TwemcacheBasicTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/twitter/pelikan/test/integration/test_twemcache_basic.py", line 7, in test_miss
    self.assertMetrics(('request_parse', 1), ('get', 1), ('get_key_miss', 1))
  File "/home/travis/build/twitter/pelikan/test/integration/base.py", line 40, in assertMetrics
    stats.get(k, None),
AssertionError: Expected request_parse to be 1, got 0 instead
----------------------------------------------------------------------
Ran 1 test in 0.005s

FAILED (failures=1)

The command "cd ../test && python -m unittest discover" exited with 1.
Done. Your build exited with 1.

This occurred for all tests last night, went away this morning, and has now returned.

Change how integration test harness ingest test cases

Currently we need blank lines to separate individual commands inside the same test sequence, and the harness executes each command following the req-rsp-stat order.

It would be more flexible to allow a single stream of mixed req-rsp-stat lines, with empty lines being optional and for visual separation only. For example, we can check stats after sending half of a command and before receiving any responses. This requires changing how tests are loaded, and how the harness executes individual lines.

Because any tests that work with the existing harness will also work in the improved version, we can continue to add more test sequences in the meantime.

fold address information into channel

Each connection will then have an addr member; this in turn can be used for issue #37, and we can simplify the signature of channel_open_fn to take just a pointer to the channel object.
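
A hypothetical sketch of the shape this could take (all names illustrative, not the actual ccommon definitions):

#include <sys/socket.h>

struct channel {
    int                     sd;     /* socket descriptor */
    struct sockaddr_storage addr;   /* peer address folded into the channel */
};

/* simplified: only the channel pointer is needed */
typedef void *(*channel_open_fn)(void *ch);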

No tabs should be used for structure field alignment

The following note in our coding style contradicts the earlier no-literal-tabs policy.

"Try to make the structure readable by aligning the member names using either tabs or spaces.
Use only one space or tab if it suffices to align at least ninety percent of the member names."

Was the mention of tabs added here by mistake? Is it OK to change the guide so that we follow the no-tabs policy everywhere?

Refactor worker and admin to move out implementation specific code

Most of the code in core/worker.c::_post_read and core/admin.c::_admin_post_read is specific to a cache implementation. The current code works for twemcache (and slimcache), but if someone wants to create a new implementation (say, pingserver) or support a new protocol (say, redis), the current structure, I feel, is not ideal.

A few examples:
In the pingserver, I don't care about

  • REQ_GET/REQ_GETS/REQ_QUIT
  • request/response 'borrowing'
  • request cardinality

It might be better to move this post-read processing into implementation-specific directories.

fetching large amount of data

Hi,
I used twemcache for caching a large amount of data (approximately 111,000 characters); however, I cannot fetch more than 80,000 characters when I try to get the value back.

Create build targets

Current thoughts:
debug:

  • -DHAVE_ASSERT_LOG=on
  • -DHAVE_ASSERT_PANIC=on
  • -DHAVE_LOGGING=on
  • -DHAVE_STATS=on
  • -ggdb3 -O0

release:

  • -DHAVE_ASSERT_LOG=on
  • -DHAVE_ASSERT_PANIC=off
  • -DHAVE_LOGGING=on
  • -DHAVE_STATS=on
  • -ggdb3 -O2

bare:

  • -DHAVE_ASSERT_LOG=off
  • -DHAVE_ASSERT_PANIC=off
  • -DHAVE_LOGGING=off
  • -DHAVE_STATS=off
  • -ggdb3 -O2
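
For example, the debug target above might translate into a cmake invocation along these lines (illustrative only; how the targets are actually wired up is the open question):

mkdir _build && cd _build
cmake -DHAVE_ASSERT_LOG=on -DHAVE_ASSERT_PANIC=on -DHAVE_LOGGING=on \
      -DHAVE_STATS=on -DCMAKE_C_FLAGS="-ggdb3 -O0" ..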

Fix build in Linux

Linking C executable ../../../_bin/pelikan_slimcache
../../core/libcore.a(admin.c.o): In function `_admin_event':
/home/vagrant/pelikan/src/core/admin.c:237: undefined reference to `op_destroy'
../../core/libcore.a(admin.c.o): In function `_admin_post_read':
/home/vagrant/pelikan/src/core/admin.c:133: undefined reference to `parse_op'
/home/vagrant/pelikan/src/core/admin.c:151: undefined reference to `reply_create'
/home/vagrant/pelikan/src/core/admin.c:185: undefined reference to `compose_rep'
/home/vagrant/pelikan/src/core/admin.c:191: undefined reference to `op_reset'
/home/vagrant/pelikan/src/core/admin.c:192: undefined reference to `reply_destroy_all'
/home/vagrant/pelikan/src/core/admin.c:162: undefined reference to `reply_create'
/home/vagrant/pelikan/src/core/admin.c:202: undefined reference to `reply_destroy_all'
/home/vagrant/pelikan/src/core/admin.c:178: undefined reference to `compose_rep'
/home/vagrant/pelikan/src/core/admin.c:120: undefined reference to `op_create'
collect2: error: ld returned 1 exit status

To fork or not to fork?

Shall we each clone the repo under our own account, and submit pull requests from there, or shall we continue to create branches under the main repo?

Only those with write access to the main repo can create branches, which means the workflow is different for those who have this access and those who don't. On the other hand, it is easier to see what branches are outstanding when they are all in one place, before a PR is created, and it doesn't require that we modify how we set up our local repos yet again.

memory leak with buf_sock

For every new connection, two buf_sock objects are created instead of one. This has actually been observed in production, but it takes so long for the extra objects to become a memory problem that it remained undetected until recently.
