Pelikan

Pelikan is Twitter's framework for developing cache services. It is:

  • Fast: Pelikan provides high-throughput and low-latency caching solutions.

  • Reliable: Pelikan is designed for large-scale deployment and the implementation is informed by our operational experiences.

  • Modular: Pelikan is a framework for rapidly developing new caching solutions by focusing on the inherent architectural similarity between caching services and providing reusable low-level components.

Website | Chat

Overview

After years of using and working on various cache services, we built a common framework that reveals the inherent architectural similarity among them.

By creating well-defined modules, most of the low-level functionalities are reused as we create different binaries. The implementation learns from our operational experiences to improve performance and reliability, and leads to software designed for large-scale deployment.

The framework approach allows us to develop new features and protocols quickly.

Products

Pelikan contains the following products:

  • pelikan_segcache_rs: a Memcached-like server with extremely high memory efficiency and excellent core scalability. See our NSDI'21 paper for design and evaluation details.
  • pelikan_pingserver_rs: an over-engineered, production-ready ping server useful as a tutorial and for measuring baseline RPC performance.
  • momento_proxy: a proxy which allows existing applications to use Momento instead of a Memcache-compatible cache backend.

Legacy

The Pelikan legacy codebase can be found in the legacy folder of this project. It comprises the original C codebase and backend implementations. It remains available as a reference, but is not recommended for production deployments.

Features

  • runtime separation of control and data plane
  • predictably low latencies via lockless data structures; the worker thread never blocks
  • per-module config options and metrics that can be composed easily
  • multiple storage and protocol implementations, easy to further extend
  • low-overhead command logger for hotkey and other important data analysis

Building Pelikan

Requirements

  • Rust stable toolchain
  • C toolchain: llvm/clang (>= 7.0)
  • Build tools: cmake (>= 3.2)

Build

git clone https://github.com/twitter/pelikan.git
cd pelikan
cargo build --release

Tests

cargo test

Usage

This section uses pelikan_segcache_rs as an example; the other executables behave very similarly.

To get information about the service, including usage format and options, run:

target/release/pelikan_segcache_rs --help

To launch the service with default settings, simply run:

target/release/pelikan_segcache_rs

To launch the service with the sample config file, run:

target/release/pelikan_segcache_rs config/segcache.toml

You should be able to try out the server using an existing memcached client, or simply with telnet.

$ telnet localhost 12321
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
set foo 0 0 3
bar
STORED

Attention: use the admin port for all non-data commands.

$ telnet localhost 9999
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
version
VERSION 0.1.0
stats
STAT pid 54937
STAT time 1459634909
STAT uptime 22
STAT version 100
STAT ru_stime 0.019172
...

Configuration

Pelikan takes a file-first approach to configuration, and currently supports configuration through files only. You can create a new config file by following the examples included under the config directory.
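
For illustration only, a minimal config file might look like the sketch below. The section and option names here are assumptions modeled on the ports used in the Usage examples above; config/segcache.toml remains the authoritative reference.

# illustrative sketch only; see config/segcache.toml for actual options
[admin]
host = "0.0.0.0"
port = "9999"

[server]
host = "0.0.0.0"
port = "12321"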

Community

Stay in touch

Contributing

Please take a look at our community manifesto and coding style guide.

If you want to submit a patch, please follow these steps:

  1. create a new issue
  2. fork on GitHub & clone your fork
  3. create a feature branch on your fork
  4. push your feature branch
  5. create a pull request linked to the issue

Documentation

We are actively working on documentation and will publish it on our website. Meanwhile, check out the current material under docs/.

License

This software is licensed under the Apache 2.0 license, see LICENSE for details.

Issues

partial parsing is unsafe with dbuf

The following scenario will lead to memory errors:

  1. an incomplete request arrives at the server, containing the full header (e.g. set foo 0 0 3\r\n) but not the value (e.g. bar\r\n)
  2. the header is parsed and pstate is set to REQ_VAL. Parsing stops here; PARSE_EUNFIN is returned, and the request object is kept around for more data, but the socket is otherwise done for the moment.
  3. more data arrives on the same socket, forcing a dbuf_double, during which realloc results in a new address for the buffer.
  4. now the bstring.data pointers in each element of keys become dangling pointers.

This will probably take a long time to surface, because requests that large are uncommon and realloc does not always move the base address. But once it does, the bug will be very difficult to debug.

Solution: don't do partial parsing.
Future consideration: We can reintroduce partial parsing if we switch to chained buffers.
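
To make the failure mode concrete, here is a minimal, self-contained C sketch (illustrative only, not Pelikan code) of how a pointer captured during partial parsing dangles once realloc moves the buffer:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(16);                /* the read buffer */
    strcpy(buf, "set foo 0 0 3");
    char *key = buf + 4;                   /* like bstring.data aimed at "foo" */
    uintptr_t before = (uintptr_t)buf;

    printf("key before growth: %.3s\n", key);

    buf = realloc(buf, 1 << 20);           /* like dbuf_double on more data */
    if ((uintptr_t)buf != before) {
        /* key still points into the old allocation: a dangling pointer */
        printf("buffer moved; key is now dangling\n");
    }
    free(buf);
    return 0;
}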

Race condition on line 22 of src/time/time.c

I compiled pelikan with clang-7 and -fsanitize=thread without error. I then ran ./pelikan_pingserver and was greeted with the following:

launching server with default values.
set up the ccommon::log module
Set up the ccommon::debug module
create logger with filename (null) cap 0
[Mon May 28 17:34:41 2018][INFO] /root/pelikan/deps/ccommon/src/cc_signal.c:77 override handler for SIGSEGV

**SNIP**

[Mon May 28 17:34:41 2018][INFO] /root/pelikan/deps/ccommon/src/event/cc_epoll.c:90 epoll fd 9 with nevent 1024
name: daemonize                       type: boolean          current: no                   ( default: no                   )

**SNIP**

name: tcp_poolsize                    type: unsigned int     current: 0                    ( default: 0                    )
==================
WARNING: ThreadSanitizer: data race (pid=31778)
  Write of size 4 at 0x0000011318a8 by thread T2:
    #0 time_update /root/pelikan/src/time/time.c:22:9 (pelikan_pingserver+0x4b7786)
    #1 _server_evwait /root/pelikan/src/core/data/server.c:301:5 (pelikan_pingserver+0x4b4a34)
    #2 core_server_evloop /root/pelikan/src/core/data/server.c:310 (pelikan_pingserver+0x4b4a34)

  Previous write of size 4 at 0x0000011318a8 by thread T1:
    #0 time_update /root/pelikan/src/time/time.c:22:9 (pelikan_pingserver+0x4b7786)
    #1 _worker_evwait /root/pelikan/src/core/data/worker.c:286:5 (pelikan_pingserver+0x4b58e4)
    #2 core_worker_evloop /root/pelikan/src/core/data/worker.c:297 (pelikan_pingserver+0x4b58e4)

  Location is global 'now' of size 4 at 0x0000011318a8 (pelikan_pingserver+0x0000011318a8)

  Thread T2 (tid=31781, running) created by main thread at:
    #0 pthread_create /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:965:3 (pelikan_pingserver+0x449da5)
    #1 core_run /root/pelikan/src/core/core.c:29:11 (pelikan_pingserver+0x4b5c93)
    #2 main /root/pelikan/src/server/pingserver/main.c:198:5 (pelikan_pingserver+0x4b2b71)

  Thread T1 (tid=31780, running) created by main thread at:
    #0 pthread_create /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:965:3 (pelikan_pingserver+0x449da5)
    #1 core_run /root/pelikan/src/core/core.c:23:11 (pelikan_pingserver+0x4b5c7a)
    #2 main /root/pelikan/src/server/pingserver/main.c:198:5 (pelikan_pingserver+0x4b2b71)

SUMMARY: ThreadSanitizer: data race /root/pelikan/src/time/time.c:22:9 in time_update
==================
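
A sketch of one possible remedy, assuming a relaxed atomic is acceptable for a coarse-grained clock (this only illustrates the idea and is not the actual fix; time_update follows the report, while time_now is an assumed name for the read side):

#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

static _Atomic uint32_t now;    /* was a plain global, hence the race */

void
time_update(void)
{
    /* concurrent stores from server and worker threads are well-defined */
    atomic_store_explicit(&now, (uint32_t)time(NULL), memory_order_relaxed);
}

uint32_t
time_now(void)
{
    return atomic_load_explicit(&now, memory_order_relaxed);
}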

investigate how test discovery and module import should be used for integration tests

A few questions here:

  • should we use python3? There's no historical baggage, so the question boils down to whether there are new features that would benefit testing.
  • what's the best way to do test discovery? We use unittest.TestCase a little differently from how it is normally done (by loading from a list of files to create individual TestCase's instead of specifying the entire test in-place), so we may or may not be able to use the standard unittest discovery mechanism.
  • what's the best way to import modules? Mostly just not familiar with how python2/3 differ on this aspect.

optimize travis ci

  • remove sudo requirement (so we can use containers)
  • check if caching can be used

de-register writable event on a socket after successful `_worker_event_write()`

Currently, the writable event for a socket is registered only when a write call would block on the corresponding fd; however, once added, this event is never removed until all events on the fd are, usually upon connection termination.

This behavior is incorrect, at least for our epoll implementation. Since we do not want to simply add the EPOLLONESHOT flag to event_add_write, which disables all future events upon firing (and the event fired could be either read or write, so that's not particularly useful), we have a couple of options:

  • disable read and add writable event to the fd with EPOLLONESHOT, and re-add read only when the next write is successful. This implies we will not try to process more data coming from the socket until we have flushed out previous responses for this connection;
  • add a return status to _worker_event_write(), and upon a successful write, modify the events associated with the file descriptor to remove write-related flags (see the sketch below).
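
A sketch of the second option, using the raw epoll API for illustration (not Pelikan's event wrappers; the function name is made up): once a write fully flushes, modify the fd's registration to drop write interest while keeping reads.

#include <sys/epoll.h>

static int
unwatch_write(int ep, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN;    /* keep read interest; drop EPOLLOUT */
    ev.data.fd = fd;

    /* re-arm the fd without write-related flags */
    return epoll_ctl(ep, EPOLL_CTL_MOD, fd, &ev);
}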

end-to-end tests for pelikan servers

We need some end-to-end test cases and harness to make sure the servers behave as expected. Initially looking into the most basic, deterministic scenarios. In these cases a sequence of request/response pairs will be used, the requests are sent to the server and the responses have to match exactly.

Since we don't have an official client library in Pelikan just yet ( @kevyang wrote one for memcached, but it needs a bit of polishing which I will do soon), we can start with sending serialized strings and treating responses the same way. Under this approach, the only part in the response that needs attention is the cas value for gets commands. Exact values can be ensured if we locally track the number of writes sent to the server; alternatively we can use regex to parse out the responses.

I'm proposing separating the test cases from the harness to maximize readability and flexibility.

Test cases can be stored in a file with some delimiter between individual request/response pairs, e.g.:

>>>
set foo 0 0 3
bar
<<<
STORED

>>>
gets foo
<<<
VALUE foo 0 3 1
bar
END

The test harness can then read from such a file, as well as a server config file to learn the whereabouts of the server, and play the tests against it. I've looked at test harnesses written in Perl, Tcl, Bash and Python, and I lean toward Python for readability, given that integration tests don't have particular performance requirements.
Perl, Tcl and Python are pretty much universally installed on all POSIX-compatible systems (not to mention bash), so availability is not a problem for any of them.
Anything else that's worth looking at, or is there an argument for/against any of them?

/cc @seppo0010

binaries created in source directory

The pelikan_twemcache and other binaries are left in SRCDIR/_bin rather than in BUILDDIR/_bin, which is where the documentation says they should go and where one would expect them.

Add peer to klog

Currently, the peer name is not logged in klog.

choose a linter/formatting tool

A few things to consider:

  • availability across all platforms we plan to support;
  • customized rules;
  • IDE/editor integration;
  • build integration.

implement stats

Also implement parse_reply and add reply tests for protocol/admin once stats are implemented.

slab unittest is failing

make test fails with the following information in slab.log:

/Users/yao/workspace/pelikan/test/storage/slab/check_slab.c:108:F:basic item api:test_insert_large:0: item_insert not OK - return status 1

flaky test in travis?

$ cd ../test && python -m unittest discover
F
======================================================================
FAIL: test_miss (integration.test_twemcache_basic.TwemcacheBasicTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/twitter/pelikan/test/integration/test_twemcache_basic.py", line 7, in test_miss
    self.assertMetrics(('request_parse', 1), ('get', 1), ('get_key_miss', 1))
  File "/home/travis/build/twitter/pelikan/test/integration/base.py", line 40, in assertMetrics
    stats.get(k, None),
AssertionError: Expected request_parse to be 1, got 0 instead
----------------------------------------------------------------------
Ran 1 test in 0.005s

FAILED (failures=1)

The command "cd ../test && python -m unittest discover" exited with 1.
Done. Your build exited with 1.

This occurred for all tests last night, went away this morning, and has now returned.

Change how integration test harness ingest test cases

Currently we need blank lines to separate individual commands inside the same test sequence, and the harness executes each command following the req-rsp-stat order.

It would be more flexible to allow a single stream of mixed req-rsp-stat lines, with empty lines being optional and for visual separation only. For example, we can check stats after sending half of a command and before receiving any responses. This requires changing how tests are loaded, and how the harness executes individual lines.

Because any tests that work with the existing harness will also work in the improved version, we can continue to add more test sequences in the meantime.

fold address information into channel

Each connection will then have an addr member; this in turn can be used for issue #37, and we can simplify the signature of channel_open_fn to take just a pointer to the channel object.
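
A hypothetical sketch of the shape this could take (all names illustrative, not the actual ccommon definitions):

#include <sys/socket.h>

struct channel {
    int                     sd;     /* socket descriptor */
    struct sockaddr_storage addr;   /* peer address folded into the channel */
};

/* simplified: only the channel pointer is needed */
typedef void *(*channel_open_fn)(void *ch);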

No tabs should be used for structure field alignment

The following note in our coding style contradicts the earlier no-literal-tabs policy.

"Try to make the structure readable by aligning the member names using either tabs or spaces.
Use only one space or tab if it suffices to align at least ninety percent of the member names."

Was the mention of tabs added here by mistake? Is it OK to change the guide so that we follow the no-tabs policy everywhere?

Refactor worker and admin to move out implementation specific code

Most of the code in core/worker.c::_post_read and core/admin.c::_admin_post_read is specific to a cache implementation. The current code works for twemcache (and slimcache), but if someone wants to create a new implementation (say, pingserver) or support a new protocol (say, redis), the current structure, I feel, is not ideal.

A few examples:
In the pingserver, I don't care about

  • REQ_GET/REQ_GETS/REQ_QUIT
  • request/response 'borrowing'
  • request cardinality

It might be better to move this post-read processing into implementation-specific directories.

fetching large amount of data

Hi,
I used twemcache for caching a large amount of data (approximately 111,000 characters); however, I cannot fetch more than 80,000 characters when I try to get the value back.

Create build targets

Current thoughts:
debug:

  • -DHAVE_ASSERT_LOG=on
  • -DHAVE_ASSERT_PANIC=on
  • -DHAVE_LOGGING=on
  • -DHAVE_STATS=on
  • -ggdb3 -O0

release:

  • -DHAVE_ASSERT_LOG=on
  • -DHAVE_ASSERT_PANIC=off
  • -DHAVE_LOGGING=on
  • -DHAVE_STATS=on
  • -ggdb3 -O2

bare:

  • -DHAVE_ASSERT_LOG=off
  • -DHAVE_ASSERT_PANIC=off
  • -DHAVE_LOGGING=off
  • -DHAVE_STATS=off
  • -ggdb3 -O2
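
For example, the debug target above might translate into a cmake invocation along these lines (illustrative only; how the targets are actually wired up is the open question):

mkdir _build && cd _build
cmake -DHAVE_ASSERT_LOG=on -DHAVE_ASSERT_PANIC=on -DHAVE_LOGGING=on \
      -DHAVE_STATS=on -DCMAKE_C_FLAGS="-ggdb3 -O0" ..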

Fix build in Linux

Linking C executable ../../../_bin/pelikan_slimcache
../../core/libcore.a(admin.c.o): In function `_admin_event':
/home/vagrant/pelikan/src/core/admin.c:237: undefined reference to `op_destroy'
../../core/libcore.a(admin.c.o): In function `_admin_post_read':
/home/vagrant/pelikan/src/core/admin.c:133: undefined reference to `parse_op'
/home/vagrant/pelikan/src/core/admin.c:151: undefined reference to `reply_create'
/home/vagrant/pelikan/src/core/admin.c:185: undefined reference to `compose_rep'
/home/vagrant/pelikan/src/core/admin.c:191: undefined reference to `op_reset'
/home/vagrant/pelikan/src/core/admin.c:192: undefined reference to `reply_destroy_all'
/home/vagrant/pelikan/src/core/admin.c:162: undefined reference to `reply_create'
/home/vagrant/pelikan/src/core/admin.c:202: undefined reference to `reply_destroy_all'
/home/vagrant/pelikan/src/core/admin.c:178: undefined reference to `compose_rep'
/home/vagrant/pelikan/src/core/admin.c:120: undefined reference to `op_create'
collect2: error: ld returned 1 exit status

To fork or not to fork?

Shall we each clone the repo under our own account, and submit pull requests from there, or shall we continue to create branches under the main repo?

Only those with write access to the main repo can create branches, which means the workflow is different for those who have this access and those who don't. On the other hand, it is easier to see what branches are outstanding when they are all in one place, before a PR is created, and it doesn't require that we modify how we set up our local repos yet again.

memory leak with buf_sock

For every new connection, two buf_sock objects are created instead of one. This has actually been observed in production, but it takes so long for the extra objects to become a memory problem that it remained undetected until recently.
