
danielealbano / cachegrand

965 stars, 15 watchers, 34 forks, 9.32 MB

cachegrand - a modern data ingestion, processing and serving platform built for today's hardware

License: BSD 3-Clause "New" or "Revised" License

Languages: CMake 0.96%, C 59.43%, C++ 36.74%, Shell 0.66%, Dockerfile 0.10%, Assembly 0.10%, Python 1.06%, Tcl 0.95%
Topics: memcache, redis, caching, linux, high-performance, io-uring, redis-server, key-value, key-value-store, low-latency

cachegrand's People

Contributors

danielealbano, fireuse, fossabot, justinholmes, lgtm-com[bot], ryanrussell, valkyrie00

cachegrand's Issues

Honor the pidfile_path setting

The configuration file https://github.com/danielealbano/cachegrand/blob/main/etc/cachegrand.yaml.skel exposes a pidfile_path setting, which is currently not honored by cachegrand.

At startup, cachegrand should check whether the file is present and, if so, refuse to start; an option should be provided to force the start even if the pid file is present.
The pid file should be created (or updated) only right before cachegrand starts its event loop.
The pid file should be deleted on exit if it was created by the running instance.
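The following is a minimal sketch of the pidfile lifecycle described above. It assumes POSIX file APIs, and the function names are hypothetical, not taken from cachegrand. Opening with O_CREAT | O_EXCL makes the "already present" check and the creation a single atomic step. The force option falls back to O_TRUNC. A flag records whether this instance created the file, so that only the owner deletes it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Set only when this instance created the pidfile, so it deletes only its own file. */
static bool pidfile_created_by_us = false;

/* Hypothetical startup check: true if a pidfile is already present. */
bool pidfile_exists(const char *path) {
    return access(path, F_OK) == 0;
}

/*
 * Hypothetical helper, called right before the event loop starts.
 * Without force, O_EXCL makes the call fail with EEXIST if the file exists;
 * with force, an existing file is truncated and overwritten.
 * Returns 0 on success, -1 on failure (errno is set).
 */
int pidfile_create(const char *path, bool force) {
    int flags = O_WRONLY | O_CREAT | (force ? O_TRUNC : O_EXCL);
    int fd = open(path, flags, 0644);
    if (fd == -1) {
        return -1;
    }

    char buf[32];
    int len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
    ssize_t written = write(fd, buf, (size_t)len);
    int saved_errno = errno; /* close() may overwrite errno */
    close(fd);
    if (written != len) {
        unlink(path);
        errno = written == -1 ? saved_errno : EIO;
        return -1;
    }

    pidfile_created_by_us = true;
    return 0;
}

/* Hypothetical exit hook: removes the pidfile only if this instance created it. */
void pidfile_remove(const char *path) {
    if (pidfile_created_by_us) {
        unlink(path);
        pidfile_created_by_us = false;
    }
}
```

In this sketch, a stale pidfile left behind by a crashed instance still requires the force option. That matches the behaviour the issue asks for.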

GET and SET Redis commands must be able to process streams of data instead of expecting the payload to fit entirely in memory

Currently, Redis commands that don't fit in the network buffer (whose size is defined internally) are discarded because they are considered too long.

The various components in use already support commands of unbounded length, and all the processing is done using a sliding window on the buffers.

The issue lies only in the GET and SET implementations. They read from and write to memory directly because they don't use the KVDB, which wasn't in place when they were implemented.

Once the KVDB is implemented, these commands have to be updated to use it.
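The sliding-window idea can be sketched as follows. A reader callback stands in for the socket and a writer callback stands in for the KVDB. Both callbacks are hypothetical, as is the tiny 16-byte buffer, which is there only to make the chunking visible. The payload is moved in buffer-sized chunks, so memory use stays bounded no matter how long the value is.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately tiny buffer size so the chunking is visible. */
#define NETWORK_BUFFER_SIZE 16

/* Hypothetical source standing in for the socket: fills up to cap bytes, returns 0 on EOF. */
typedef size_t (*network_reader_t)(void *ctx, char *buf, size_t cap);
/* Hypothetical sink standing in for the KVDB: receives the value chunk by chunk. */
typedef int (*value_chunk_writer_t)(void *ctx, const char *chunk, size_t len);

/* Streams value_len bytes from the reader to the writer through a fixed-size window. */
int set_value_streamed(size_t value_len, network_reader_t read_fn, void *read_ctx,
                       value_chunk_writer_t write_fn, void *write_ctx) {
    char buffer[NETWORK_BUFFER_SIZE];
    size_t remaining = value_len;

    while (remaining > 0) {
        size_t want = remaining < sizeof(buffer) ? remaining : sizeof(buffer);
        size_t got = read_fn(read_ctx, buffer, want);
        if (got == 0) {
            return -1; /* connection closed before the whole value arrived */
        }
        if (write_fn(write_ctx, buffer, got) != 0) {
            return -1; /* storage rejected the chunk */
        }
        remaining -= got;
    }
    return 0;
}

/* In-memory reader and writer used to exercise the sketch. */
typedef struct { const char *data; size_t len, pos; } mem_reader_t;
typedef struct { char *data; size_t cap, len; } mem_writer_t;

size_t mem_read(void *ctx, char *buf, size_t cap) {
    mem_reader_t *r = ctx;
    size_t left = r->len - r->pos;
    size_t n = left < cap ? left : cap;
    memcpy(buf, r->data + r->pos, n);
    r->pos += n;
    return n;
}

int mem_write(void *ctx, const char *chunk, size_t len) {
    mem_writer_t *w = ctx;
    if (w->len + len > w->cap) {
        return -1;
    }
    memcpy(w->data + w->len, chunk, len);
    w->len += len;
    return 0;
}
```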

Honor socket keepalive settings

The current networking implementation in cachegrand already has basic support for socket keepalive, but it needs to be improved. It's not yet possible to set the time, interval and probes parameters, although they are already exposed in the config file.

These settings are extremely useful, especially when cachegrand runs behind reverse NATs (common on cloud providers, e.g. Amazon). Otherwise, long-running idle connections would be closed, resulting in failed operations.
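The following is a minimal sketch of applying the three keepalive parameters with the standard Linux socket options SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT (see tcp(7)). The function name and the mapping from the config file values are assumptions.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Enables TCP keepalive and applies the three tunables. The parameter names
 * are hypothetical stand-ins for the config file values:
 *   time_s     -> TCP_KEEPIDLE:  idle seconds before the first probe
 *   interval_s -> TCP_KEEPINTVL: seconds between probes
 *   probes     -> TCP_KEEPCNT:   unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on the first failing setsockopt (errno is set).
 */
int socket_set_keepalive(int fd, int time_s, int interval_s, int probes) {
    int enable = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) == -1) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &time_s, sizeof(time_s)) == -1) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) == -1) {
        return -1;
    }
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == -1) {
        return -1;
    }
    return 0;
}
```

The keepalive time only needs to be shorter than the idle timeout of the NAT in front of cachegrand for the probes to keep the mapping alive.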

Honor network timeouts

The current networking layer implemented in cachegrand doesn't honor the network timeouts that can be tuned in the config file.

The networking layer should be updated to enforce the read, write and inactivity timeouts as needed.

Add Valgrind memory tracking to the slab allocator

The Slab Allocator currently acts as the standard memory allocator for most of cachegrand. When configured to use hugepages, it behaves as a custom memory allocator, so Valgrind is not able to track memory usage properly.

To improve the code quality and reduce the risk of memory leaks, it's important to let Valgrind track the memory allocated and freed by the Slab Allocator.

More details can be found at the following page:
https://valgrind.org/docs/manual/mc-manual.html#mc-manual.options

The allocated memory doesn't use red zones, so the red zone size has to be set to zero. Also, the memory pools are not guaranteed to be zeroed. After the first use (e.g. pulled from the hugepages cache, pushed back in there and then pulled again) they no longer are.
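The following is a minimal sketch of the Memcheck client requests applied to a toy fixed-size slab. The slab is hypothetical and is not cachegrand's allocator. VALGRIND_MALLOCLIKE_BLOCK(addr, size, rzB, is_zeroed) and VALGRIND_FREELIKE_BLOCK(addr, rzB) are the documented requests. Here rzB is 0 because there are no red zones, and is_zeroed is 0 because recycled slots keep stale data. When the Valgrind headers are missing, the macros compile to no-ops. The free list is kept outside the slots, so the allocator itself never touches memory that Memcheck considers inaccessible.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Use the real Memcheck client requests when the headers are available, no-ops otherwise. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define SLAB_HAVE_VALGRIND 1
#  endif
#endif
#ifndef SLAB_HAVE_VALGRIND
#  define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rz, zeroed) ((void)(addr), (void)(size))
#  define VALGRIND_FREELIKE_BLOCK(addr, rz) ((void)(addr))
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, size) ((void)(addr), (void)(size))
#endif

#define SLOT_SIZE 64
#define SLOT_COUNT 8

/* Toy slab: one region split into equal slots, plus a stack of free slot indices. */
typedef struct {
    uint8_t region[SLOT_SIZE * SLOT_COUNT];
    size_t free_slots[SLOT_COUNT]; /* kept outside the slots on purpose */
    size_t free_count;
} slab_t;

void slab_init(slab_t *slab) {
    for (size_t i = 0; i < SLOT_COUNT; i++) {
        slab->free_slots[i] = SLOT_COUNT - 1 - i; /* slot 0 is handed out first */
    }
    slab->free_count = SLOT_COUNT;
    /* Until a slot is handed out, any access to it is a bug Memcheck should report. */
    VALGRIND_MAKE_MEM_NOACCESS(slab->region, sizeof(slab->region));
}

void *slab_alloc(slab_t *slab) {
    if (slab->free_count == 0) {
        return NULL;
    }
    void *ptr = slab->region + slab->free_slots[--slab->free_count] * SLOT_SIZE;
    /* rzB = 0: no red zones. is_zeroed = 0: recycled slots are not zeroed. */
    VALGRIND_MALLOCLIKE_BLOCK(ptr, SLOT_SIZE, 0, 0);
    return ptr;
}

void slab_free(slab_t *slab, void *ptr) {
    size_t index = (size_t)((uint8_t *)ptr - slab->region) / SLOT_SIZE;
    slab->free_slots[slab->free_count++] = index;
    /* Marks the slot inaccessible again and lets Memcheck match it with its allocation. */
    VALGRIND_FREELIKE_BLOCK(ptr, 0);
}
```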

Refactor the "protocols" to be loadable modules

cachegrand already supports the concept of "modules", which are internally called "protocols".

The current implementation, though, requires the "protocols" to be built within the main code base, and it invokes the initializers, the destructors, etc. directly.

Although this approach works, it is neither flexible nor extensible.

The implemented layer should be refactored to allow loading modules (as external libraries) at startup. These modules will take care of handling new incoming connections and of performing the various initialisation / cleanup operations as required.

This will also require changes to the file organisation, the headers, the build system, etc.

As the modules may need to perform I/O operations, the logic should be rewritten to run them once the scheduler has been set up, i.e. instantiating a new fiber per module.

(The same logic should also be used for the very first setup operations, before module loading. Examples include loading the keystore values from disk, populating the hashtable and the cache with the "hot" keys, and any other required operation.)
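The module contract could look like the sketch below; the descriptor layout and function names are hypothetical. For brevity, the descriptors are linked in statically. A real loader would open each shared library with dlopen(), fetch the descriptor with dlsym(), and run each init inside its own fiber as described above. The sketch shows the ordering rule: initialise in order, clean up in reverse, and roll back the already-initialised modules if one fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical descriptor a loadable module would export. */
typedef struct {
    const char *name;
    bool (*init)(void *config);               /* would run in the module's own fiber */
    void (*handle_connection)(int client_fd); /* entry point for new connections */
    void (*cleanup)(void);
} module_descriptor_t;

/* Initialises modules in order; on failure, cleans up the initialised ones in reverse. */
bool modules_init_all(const module_descriptor_t *modules, size_t count, void *config) {
    for (size_t i = 0; i < count; i++) {
        if (!modules[i].init(config)) {
            fprintf(stderr, "module %s failed to initialise\n", modules[i].name);
            while (i-- > 0) {
                modules[i].cleanup();
            }
            return false;
        }
    }
    return true;
}

/* Cleans up all modules in reverse initialisation order. */
void modules_cleanup_all(const module_descriptor_t *modules, size_t count) {
    for (size_t i = count; i-- > 0;) {
        modules[i].cleanup();
    }
}

/* Fake modules standing in for the Redis and Prometheus protocols. */
static int live_modules = 0;
static bool ok_init(void *config) { (void)config; live_modules++; return true; }
static bool failing_init(void *config) { (void)config; return false; }
static void count_cleanup(void) { live_modules--; }
static void no_connection(int fd) { (void)fd; }
```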

Migrate network I/O from a callback-based architecture to a fiber-based interface.

The callback-based architecture implemented in cachegrand to manage the I/O and the timers is extremely complex because of the various software layers:

  • each layer requires its own context, de facto requiring a number of nested structs that have to be passed around
  • it requires a number of internal double fetches, hops and branches that prevent scaling up: supporting more protocols and I/O backends would further increase the internal branching, seriously impairing the CPU's ability to predict which branch will be taken.

Switching to fibers will remove the need to pass the context around, as it can be stored and preserved on the stack assigned to each fiber. It will also improve performance: on a modern CPU, switching to a fiber and back takes less than 20ns, whereas 5 different hops take more than 25ns.
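The point about keeping the context on the fiber stack can be illustrated with POSIX ucontext (getcontext/makecontext/swapcontext). This technique is swapped in for illustration only. cachegrand uses its own assembly switch in fiber.s, and glibc's swapcontext also saves the signal mask with a system call, so its timings are not representative of the figures quoted above. The handler's state is an ordinary local variable that survives every yield; all names are hypothetical.

```c
#include <stdlib.h>
#include <ucontext.h>

#define FIBER_STACK_SIZE (64 * 1024)

static ucontext_t scheduler_ctx;
static ucontext_t fiber_ctx;
static int fiber_done = 0;
static int trace[8];
static int trace_len = 0;

/* Saves the fiber's registers and resumes the scheduler. */
static void fiber_yield(void) {
    swapcontext(&fiber_ctx, &scheduler_ctx);
}

/*
 * A connection-handler-like fiber: request_count lives on the fiber's own stack,
 * so nothing has to be packed into a context struct between I/O waits.
 */
static void connection_fiber(void) {
    int request_count = 0;
    for (int i = 0; i < 3; i++) {
        request_count++;
        trace[trace_len++] = request_count;
        fiber_yield(); /* e.g. waiting for the next I/O completion */
    }
    fiber_done = 1;
} /* returning resumes uc_link, i.e. the scheduler */

/* Runs the fiber until it finishes; returns how many times the scheduler switched to it. */
int run_fiber_demo(void) {
    char *stack = malloc(FIBER_STACK_SIZE);
    if (stack == NULL) {
        return -1;
    }
    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = FIBER_STACK_SIZE;
    fiber_ctx.uc_link = &scheduler_ctx;
    makecontext(&fiber_ctx, connection_fiber, 0);

    int switches = 0;
    while (!fiber_done) {
        swapcontext(&scheduler_ctx, &fiber_ctx);
        switches++;
    }
    free(stack);
    return switches;
}
```

The fiber yields three times and then returns, so the scheduler switches into it four times in total.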

Implement support for XXH3 in the hashtable

The hashtable implemented in cachegrand can easily be linked to different hashing algorithms.

To support a hashing algorithm, only a few lines of code need to be added to the following file, wrapped in the appropriate ifdef: https://github.com/danielealbano/cachegrand/blob/main/src/data_structures/hashtable/mcmp/hashtable_support_hash.c

The XXH3 repository has to be added as a submodule in the 3rdparty folder. The code has to be built as a static library via the CMake-based build infrastructure when the required build option is passed.

The CMake build option has to be called USE_HASH_ALGORITHM_XXH3, while the cachegrand define has to be called CACHEGRAND_CMAKE_CONFIG_USE_HASH_ALGORITHM_XXH3.
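The following is a sketch of the ifdef-based selection. It assumes the define is either absent or defined when the option is enabled; the exact form generated by cmake_config.h.in may differ. XXH3_64bits() is the xxHash entry point for 64-bit XXH3. The function name and the FNV-1a fallback are illustrative only: the fallback just keeps the sketch self-contained and stands in for the algorithms cachegrand already supports.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(CACHEGRAND_CMAKE_CONFIG_USE_HASH_ALGORITHM_XXH3)
#include <xxhash.h>
#endif

typedef uint64_t hashtable_hash_t;

/* Hypothetical wrapper: the build option decides which algorithm is compiled in. */
hashtable_hash_t hashtable_support_hash_sketch(const char *key, size_t key_size) {
#if defined(CACHEGRAND_CMAKE_CONFIG_USE_HASH_ALGORITHM_XXH3)
    /* XXH3_64bits(const void *input, size_t length) from the xxHash library */
    return (hashtable_hash_t)XXH3_64bits(key, key_size);
#else
    /* Sketch-only fallback: 64-bit FNV-1a */
    uint64_t hash = 14695981039346656037ULL;
    for (size_t i = 0; i < key_size; i++) {
        hash ^= (uint8_t)key[i];
        hash *= 1099511628211ULL;
    }
    return hash;
#endif
}
```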

Relevant source code references
https://github.com/danielealbano/cachegrand/blob/356214a25c3fb806d2238829b402f308d3e16e81/src/data_structures/hashtable/mcmp/hashtable_support_hash.h
https://github.com/danielealbano/cachegrand/blob/72b6716e8291482cb2c0dd415d2ecfb9c4e06674/src/data_structures/hashtable/mcmp/hashtable_support_hash.c
https://github.com/danielealbano/cachegrand/blob/41affe228c7b19a7ebdd4a8197216647570e7a67/src/cmake_config.h.in
https://github.com/danielealbano/cachegrand/blob/1ad09fe6aba43fd38e01265f22640abf4d6939cc/src/CMakeLists.txt
https://github.com/danielealbano/cachegrand/blob/356214a25c3fb806d2238829b402f308d3e16e81/tools/cmake/options.cmake
https://github.com/danielealbano/cachegrand/blob/1ad09fe6aba43fd38e01265f22640abf4d6939cc/3rdparty/t1ha.cmake

Rewrite the human-readable protocol parser to avoid memory leaks

The current implementation of the human-readable Redis protocol parser is buggy. It searches for a space and then performs a number of additional operations that expect data to be present after it.

The bug is triggered by the following line:

} while (arg_start_char_ptr++ && arg_start_char_ptr < buffer_end_ptr);

The following test triggers the issue:

// SECTION("multiple argument 1 byte at time, no quotes, with new line, clone data") {
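A bounds-safe way to split an inline command is sketched below. It is not the existing parser and it ignores quoting. Every read is guarded by a comparison with the end pointer before the dereference. A line without its '\n' consumes nothing, so feeding the buffer one byte at a time (as in the test above) simply reports "incomplete" until the terminator arrives. The slice type and the function name are hypothetical.

```c
#include <stddef.h>

/* Hypothetical argument slice pointing into the receive buffer (no copy). */
typedef struct {
    const char *start;
    size_t length;
} arg_slice_t;

/*
 * Splits one inline command line held in [buffer, buffer_end) on spaces.
 * *consumed is 0 when no '\n' is present yet (wait for more data); otherwise
 * it is the line length including the terminator. Returns the number of
 * arguments found; only the first max_args are stored.
 */
size_t inline_command_split(const char *buffer, const char *buffer_end,
                            arg_slice_t *args, size_t max_args, size_t *consumed) {
    *consumed = 0;

    /* 1. Find the '\n', never dereferencing buffer_end itself. */
    const char *line_end = buffer;
    while (line_end < buffer_end && *line_end != '\n') {
        line_end++;
    }
    if (line_end == buffer_end) {
        return 0; /* incomplete line */
    }
    *consumed = (size_t)(line_end - buffer) + 1;

    /* 2. Ignore an optional '\r' right before the '\n'. */
    const char *end = line_end;
    if (end > buffer && end[-1] == '\r') {
        end--;
    }

    /* 3. Split [buffer, end) on spaces; every read is guarded by ptr < end. */
    size_t count = 0;
    const char *ptr = buffer;
    while (ptr < end) {
        while (ptr < end && *ptr == ' ') {
            ptr++;
        }
        if (ptr == end) {
            break;
        }
        const char *arg_start = ptr;
        while (ptr < end && *ptr != ' ') {
            ptr++;
        }
        if (count < max_args) {
            args[count].start = arg_start;
            args[count].length = (size_t)(ptr - arg_start);
        }
        count++;
    }
    return count;
}
```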

Implement AARCH64 Armv8 support

The only platform currently supported is AMD64, but adding support for AARCH64 ARMv8 shouldn't be particularly challenging, as most of the code has been written with portability in mind.

A number of elements have to be taken into account to port cachegrand to AARCH64 ARMv8; the most relevant ones are listed below:

  • The ABI for the registers and the stack pointer is different: the right ones have to be used in fiber.s, and struct fiber has to be updated to hold the correct registers
  • The testing and benchmarking code relies on a specific Intel instruction, exposed as an intrinsic by gcc, that flushes the cached data of specific pages out of the L1D, L2 and L3 caches; this functionality needs to be ported
  • The T1HA hash algorithm isn't optimized for ARM, so support for XXH3 (issue #70) should be implemented first
  • The instructions for memory fences and atomic operations may behave or perform differently. It's important to review performance using the implemented benchmarks, as it may be worth changing the implementation to perform better on AARCH64 ARMv8.
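The cache-flush helper could be ported along these lines, assuming a 64-byte cache line; real code should read the line size, e.g. from CTR_EL0 on AArch64. On x86-64 the sketch uses the _mm_clflush intrinsic. On AArch64 it uses the DC CIVAC instruction, which Linux normally permits from user space, followed by a DSB barrier. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Assumed cache line size for this sketch. */
#define CACHE_LINE_SIZE 64

/* Evicts every cache line of [addr, addr + len) so benchmarks can start from a cold cache. */
void cache_flush_range(const void *addr, size_t len) {
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    for (uintptr_t line = start; line < end; line += CACHE_LINE_SIZE) {
#if defined(__x86_64__)
        _mm_clflush((const void *)line);
#elif defined(__aarch64__)
        /* Clean and invalidate by virtual address to the point of coherency */
        __asm__ volatile("dc civac, %0" : : "r"(line) : "memory");
#else
        (void)line; /* no-op on other architectures in this sketch */
#endif
    }
    /* Make sure the flushes complete before the measured code runs. */
#if defined(__x86_64__)
    _mm_mfence();
#elif defined(__aarch64__)
    __asm__ volatile("dsb ish" : : : "memory");
#endif
}
```

Flushing only evicts lines from the caches, so the data itself is unchanged afterwards.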

Implement a code generator to automatically create the scaffolding, argument parsing and command callbacks for all Redis commands

Implementing Redis commands currently requires writing a lot of scaffolding code because the module has no knowledge of the type of data it can receive. It only knows whether the data can be long, and therefore has to be processed as a stream, or is short enough to be read entirely from the buffer in one go.

The redis repository contains a number of JSON files, one per command, with the specs of the commands themselves (e.g. the command name, the command group, the ACLs required, the parameters it takes, the key specs, etc.).

It's necessary to implement a code generator that will:

  • automatically generate the necessary data structs for the commands
  • automatically generate all the scaffolding code for these commands to handle keys, values, tokens and token parameters, filling the command's data struct as needed
  • automatically take care of handling common errors (e.g. key too long, unsupported parameter, missing parameter for a specified token, etc.)

Link to the JSON files:
https://github.com/redis/redis/tree/unstable/src/commands

It's necessary to analyse the JSON structure before investigating how to build the generator.
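To give an idea of the output, below is a hand-written sketch of what the generator might emit for EXPIRE key seconds [NX | XX | GT | LT]. The spec for this command lives in expire.json in the linked folder. The struct, enum and function names are hypothetical. The point is that key-length checks, integer parsing and token matching become generated code instead of per-command scaffolding.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Hypothetical generated types for EXPIRE key seconds [NX | XX | GT | LT]. */
typedef enum {
    EXPIRE_CONDITION_NONE = 0,
    EXPIRE_CONDITION_NX,
    EXPIRE_CONDITION_XX,
    EXPIRE_CONDITION_GT,
    EXPIRE_CONDITION_LT,
} expire_condition_t;

typedef struct {
    const char *key;
    size_t key_length;
    int64_t seconds;
    expire_condition_t condition;
} command_expire_args_t;

typedef enum {
    COMMAND_PARSE_OK = 0,
    COMMAND_PARSE_ERR_WRONG_ARGS_COUNT,
    COMMAND_PARSE_ERR_KEY_TOO_LONG,
    COMMAND_PARSE_ERR_NOT_AN_INTEGER,
    COMMAND_PARSE_ERR_UNSUPPORTED_TOKEN,
} command_parse_result_t;

/* argv excludes the command name; arguments are NUL-terminated in this sketch. */
command_parse_result_t command_expire_parse(int argc, const char **argv, size_t max_key_length,
                                            command_expire_args_t *out) {
    if (argc < 2 || argc > 3) {
        return COMMAND_PARSE_ERR_WRONG_ARGS_COUNT;
    }

    out->key = argv[0];
    out->key_length = strlen(argv[0]);
    if (out->key_length > max_key_length) {
        return COMMAND_PARSE_ERR_KEY_TOO_LONG;
    }

    char *end = NULL;
    errno = 0;
    long long seconds = strtoll(argv[1], &end, 10);
    if (errno != 0 || end == argv[1] || *end != '\0') {
        return COMMAND_PARSE_ERR_NOT_AN_INTEGER;
    }
    out->seconds = (int64_t)seconds;

    out->condition = EXPIRE_CONDITION_NONE;
    if (argc == 3) {
        /* Token order matches the enum values NX=1 .. LT=4; tokens are case-insensitive. */
        static const char *tokens[] = { "NX", "XX", "GT", "LT" };
        for (int i = 0; i < 4; i++) {
            if (strcasecmp(argv[2], tokens[i]) == 0) {
                out->condition = (expire_condition_t)(i + 1);
                return COMMAND_PARSE_OK;
            }
        }
        return COMMAND_PARSE_ERR_UNSUPPORTED_TOKEN;
    }
    return COMMAND_PARSE_OK;
}
```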

Code coverage of spinlock.c changes randomly because an if branch can't easily be triggered on purpose

Code coverage changes randomly when the tests for spinlock.c run (e.g. https://app.codecov.io/gh/danielealbano/cachegrand/compare/75/changes ).

This is caused by the following branch:

if (spins == UINT32_MAX) {
    LOG_E(TAG, "Possible stuck spinlock detected for thread %lu in %s at %s:%u",
            pthread_self(), src_func, src_path, src_line);
}

The number of spins performed by the spinlocks during the tests is not always enough to reach this branch. A test should therefore be implemented to ensure that this case is triggered and that the log output matches the expectation. Logging from within a potentially stuck spinlock is questionable, though, as the stuck spinlock may itself be log related.

Also, the log level should be changed to warning instead of error. Reaching this point doesn't necessarily mean there is an error, as there can be genuine heavy contention.
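One way to make the branch testable is sketched below. It assumes the spin threshold and the warning hook become parameters; these are hypothetical names, not cachegrand's API. A test can then use a small threshold and a hook that records the warning and releases the lock. This reaches the branch deterministically instead of waiting for UINT32_MAX spins.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct spinlock_sketch spinlock_sketch_t;
/* Hypothetical hook invoked when the spin threshold is reached. */
typedef void (*spinlock_warn_fn_t)(spinlock_sketch_t *lock, uint32_t spins);

struct spinlock_sketch {
    atomic_flag locked;
};

void spinlock_unlock(spinlock_sketch_t *lock) {
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/*
 * Spins until the lock is acquired. When spins reaches warn_after_spins, the hook
 * is called (a warning, not an error: it may just be heavy contention) and spinning
 * continues, as in the original code, which only logs.
 */
void spinlock_lock(spinlock_sketch_t *lock, uint32_t warn_after_spins, spinlock_warn_fn_t warn) {
    uint32_t spins = 0;
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire)) {
        if (++spins == warn_after_spins && warn != NULL) {
            warn(lock, spins);
        }
    }
}

/* Example test hook: records the warning, then simulates the holder releasing the lock. */
static uint32_t warned_at_spins = 0;
static void test_warn_and_release(spinlock_sketch_t *lock, uint32_t spins) {
    warned_at_spins = spins;
    spinlock_unlock(lock);
}
```

A real test would also capture the log output produced by the hook and compare it with the expected message.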

Use OS keychain to load certificates and private keys

cachegrand currently loads the certificates and private keys directly from local files, but this is not a secure approach. The process could be dumped, and an attacker would then easily gain access to the private key.

In addition, the current implementation requires the server to be restarted when the certificate is rotated, which is extremely disruptive.

To avoid these scenarios, cachegrand should use the operating system keychain and rely on an external, authorized process to update / rotate the certificates and private keys stored in it.

Here is an example of how to populate a keyring on Linux using add_key() from libkeyutils (link with -lkeyutils):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <keyutils.h>

    /* Placeholders from the original example (not shown): read a PEM file into memory */
    char *read_the_rsa_certificate_file(size_t *len);
    char *read_the_rsa_private_key_file(size_t *len);

    int populate_keyring(void) {
        /* Create a keyring named "localhost" attached to the process keyring */
        key_serial_t keyring = add_key("keyring", "localhost", NULL, 0, KEY_SPEC_PROCESS_KEYRING);
        if (keyring == -1) {
            perror("add_key keyring");
            return EXIT_FAILURE;
        }
        printf("The Keyring id is <%jx>\n", (uintmax_t)keyring);

        size_t rsa_cert_len = 0, rsa_key_len = 0;
        char *rsa_cert = read_the_rsa_certificate_file(&rsa_cert_len);
        char *rsa_key = read_the_rsa_private_key_file(&rsa_key_len);

        /* add_key(type, description, payload, payload_len, keyring); "user" is a generic key type */
        if (add_key("user", "tls_cert", rsa_cert, rsa_cert_len, keyring) == -1) {
            perror("add_key tls_cert");
            return EXIT_FAILURE;
        }
        printf("Public key added to the keyring\n");

        if (add_key("user", "tls_priv", rsa_key, rsa_key_len, keyring) == -1) {
            perror("add_key tls_priv");
            return EXIT_FAILURE;
        }
        printf("Private key added to the keyring\n");
        return EXIT_SUCCESS;
    }

Add TLS support

Currently, cachegrand doesn't provide any kind of over-the-wire encryption, which leaves it exposed to MITM attacks.

To avoid this kind of problem, TLS encryption can be implemented for the connections. TLS is also supported by the protocols of the currently implemented modules (e.g. Redis & Prometheus).

There are multiple options for adding TLS support, such as OpenSSL, mbedtls, kTLS, etc. Although OpenSSL is the most common option, kTLS is better suited for cachegrand because it provides great performance. The trade-off is losing support for some older TLS versions (e.g. TLS 1.1 and earlier), which is not a problem per se because these shouldn't be used! Note that kTLS only handles the record layer: the handshake still has to be performed in user space (e.g. with OpenSSL or mbedtls) before the negotiated keys are handed to the kernel.

kTLS also has native support for hardware accelerators, which is a great advantage.

Some useful performance comparisons:
https://legacy.netdevconf.info/0x14/pub/slides/25/TLS%20Perf%20Characterization%20slides%20-%20Netdev%200x14%20v2.pdf
https://legacy.netdevconf.info/0x14/pub/papers/29/0x14-paper29-talk-paper.pdf (focuses on offloading)

More information on kTLS
https://docs.kernel.org/networking/tls-offload.html
https://github.com/ktls/af_ktls
https://docs.nvidia.com/networking/display/MLNXOFEDv531001/Kernel+Transport+Layer+Security+(kTLS)+Offloads

Some reference repos for the actual implementation
https://github.com/insanum/ktls_test
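As described in the kernel TLS documentation, the kernel side of kTLS boils down to two setsockopt steps. First, attach the "tls" upper layer protocol with TCP_ULP; then hand over the keys with SOL_TLS and TLS_TX / TLS_RX. The sketch below assumes TLS 1.3 with AES-GCM-128, and that the key, the 12-byte IV and the record sequence come from a user-space handshake. The function names are hypothetical, and the socket must already be connected.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tls.h>

#ifndef TCP_ULP
#define TCP_ULP 31
#endif
#ifndef SOL_TLS
#define SOL_TLS 282
#endif

/*
 * Fills the crypto info for one direction of a TLS 1.3 AES-GCM-128 connection.
 * The 12-byte static IV from the handshake is split into a 4-byte salt and an
 * 8-byte iv, the layout the kernel expects for TLS 1.3.
 */
void ktls_fill_aes_gcm_128(struct tls12_crypto_info_aes_gcm_128 *info,
                           const unsigned char key[16], const unsigned char iv12[12],
                           const unsigned char rec_seq[8]) {
    memset(info, 0, sizeof(*info));
    info->info.version = TLS_1_3_VERSION;
    info->info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(info->key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(info->salt, iv12, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(info->iv, iv12 + TLS_CIPHER_AES_GCM_128_SALT_SIZE, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(info->rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
}

/* Switches a connected TCP socket to kTLS in both directions. Returns 0, or -1 with errno set. */
int ktls_enable(int fd, const struct tls12_crypto_info_aes_gcm_128 *tx,
                const struct tls12_crypto_info_aes_gcm_128 *rx) {
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) == -1) {
        return -1;
    }
    if (setsockopt(fd, SOL_TLS, TLS_TX, tx, sizeof(*tx)) == -1) {
        return -1;
    }
    if (setsockopt(fd, SOL_TLS, TLS_RX, rx, sizeof(*rx)) == -1) {
        return -1;
    }
    return 0;
}
```

After ktls_enable() succeeds, plain send()/recv() (or the equivalent io_uring operations) carry encrypted TLS records. That is what would let the existing I/O layer stay mostly unchanged.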

Reduce max_clients in the default configuration from 10000 to 100

The io_uring I/O back-end requires a fixed amount of lockable memory (non-swappable, non-movable and long-lived) per client, as it uses this memory to exchange information with the kernel.
Although the amount of lockable memory available on Linux can easily be increased, the default is fairly low. If cachegrand is started in a desktop environment it can easily exhaust it, as the software running in the desktop environment may use it as well (e.g. gnome, kde, firefox, google chrome, etc.).
If this memory is needed but not available, the software requiring it may crash, possibly taking down the entire desktop environment and/or the browser!

The default config file has the max_clients parameter set to 10000, and it has to be reduced to 100.

Also, README.md should be updated to mention that memlock memory is accounted per user and is used by many kinds of software, especially desktop environments. Increasing max_clients too much without enough memlock memory available may therefore lead to random crashes or malfunctions in various software (e.g. gnome, kde, firefox, chrome, etc.).
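A rough sanity check could be sketched as follows: read the RLIMIT_MEMLOCK soft limit with getrlimit() and derive how many clients fit. The per-client figure and the reserve are assumptions to be measured on the target system, since the way io_uring accounts this memory has changed across kernel versions. The function names are hypothetical.

```c
#include <stdint.h>
#include <sys/resource.h>

/*
 * Upper bound on max_clients for a given memlock budget. per_client_bytes is an
 * assumed, measured figure; reserve_bytes leaves room for other consumers.
 * Returns 0 if nothing fits.
 */
uint64_t max_clients_for_memlock(uint64_t memlock_limit, uint64_t per_client_bytes,
                                 uint64_t reserve_bytes) {
    if (per_client_bytes == 0 || memlock_limit <= reserve_bytes) {
        return 0;
    }
    return (memlock_limit - reserve_bytes) / per_client_bytes;
}

/* Soft RLIMIT_MEMLOCK of the current process; UINT64_MAX if unlimited, 0 on error. */
uint64_t current_memlock_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return 0;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return UINT64_MAX;
    }
    return (uint64_t)rl.rlim_cur;
}
```

At startup, cachegrand could compare the configured max_clients with max_clients_for_memlock(current_memlock_limit(), ...) and log a warning instead of failing later.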

Honor max_key_length and max_command_length exposed by the redis settings

The Redis protocol is a text-based protocol and relies on line endings (i.e. \r\n) to identify the end of the data in a number of cases.

As it's desirable in some cases to limit the length of the data received, cachegrand already exposes 2 ad-hoc settings tunable in the config file:

  • max_key_length
  • max_command_length

These two settings allow users to limit the length of the keys used to operate on the internal cache, and the length of the whole command. This prevents excessively long commands that would require too much memory and could be used for DDoS attacks. It can also simply limit the amount of data being set in or retrieved from the cache.
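RESP bulk strings announce their length up front ("$<length>\r\n"), so both limits can be enforced before any payload is buffered. The sketch below parses such a header and rejects lengths above a limit. Applying it with max_key_length to the key argument covers the first setting. Comparing the running sum of all bulk lengths with max_command_length covers the second. The function name, the return codes and the "0 means unlimited" convention are assumptions.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Parses a RESP bulk string header "$<length>\r\n" in [p, end) and checks the
 * declared length against limit (0 = unlimited in this sketch).
 * Returns the length, -1 if the header is malformed or incomplete, -2 if too long.
 */
long long resp_bulk_length_checked(const char *p, const char *end, size_t limit) {
    if (p >= end || *p != '$') {
        return -1;
    }
    p++;

    long long len = 0;
    const char *digits = p;
    while (p < end && *p >= '0' && *p <= '9') {
        if (len > (LLONG_MAX - 9) / 10) {
            return -1; /* would overflow */
        }
        len = len * 10 + (*p - '0');
        p++;
    }
    if (p == digits || end - p < 2 || p[0] != '\r' || p[1] != '\n') {
        return -1;
    }
    if (limit != 0 && (unsigned long long)len > limit) {
        return -2; /* reject before reading the payload */
    }
    return len;
}
```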

Fortify builds

To improve the overall code quality, stability and security cachegrand should be built using extra options to fortify the builds.

Some work has already been started in an ad hoc branch, adding the necessary build options:
"-Wall"; "-Wextra"; "-Wpedantic"; "-Wformat=2"; "-Wformat-overflow=2"; "-Wformat-truncation=2";
"-Wformat-security"; "-Wnull-dereference"; "-Wstack-protector"; "-Wtrampolines"; "-Walloca"; "-Wvla";
"-Warray-bounds=2"; "-Wimplicit-fallthrough=3"; "-Wtraditional-conversion"; "-Wshift-overflow=2";
"-Wcast-qual"; "-Wstringop-overflow=4"; "-Wconversion"; "-Warith-conversion"; "-Wlogical-op";
"-Wduplicated-cond"; "-Wduplicated-branches"; "-Wformat-signedness"; "-Wshadow"; "-Wstrict-overflow=4";
"-Wundef"; "-Wstrict-prototypes"; "-Wswitch-default"; "-Wswitch-enum"; "-Wstack-usage=1000000";
"-Wcast-align=strict";
"-D_FORTIFY_SOURCE=2";
"-fstack-protector-strong"; "-fstack-clash-protection"; "-fPIE";
"-Wl,-z,relro"; "-Wl,-z,now"; "-Wl,-z,noexecstack"; "-Wl,-z,separate-code";

It's necessary to resolve all the warnings / errors to be able to merge this branch.

Implement redis commands APPEND, properly implement the SET command, implement SET command variations (e.g. SETNX and so on)

Below is the full list of commands to implement:

✔ APPEND
✔ COPY (missing the DB parameter)
✔ DBSIZE
✔ DECR
✔ DECRBY
✔ DEL
✔ EXISTS
✔ EXPIRE
✔ EXPIREAT
✔ EXPIRETIME
✔ FLUSHDB (missing the ASYNC parameter)
✔ GET
✔ GETDEL
✔ GETEX
✔ GETRANGE
✔ GETSET
✔ HELLO (missing the AUTH and SETNAME parameters)
✔ INCR
✔ INCRBY
✔ INCRBYFLOAT
✔ KEYS
✔ LCS (missing the IDX, MINMATCHLEN and WITHMATCHLEN parameters)
✔ MGET
✔ MSET
✔ MSETNX
✔ PERSIST
✔ PEXPIRE
✔ PEXPIREAT
✔ PEXPIRETIME
✔ PING
✔ PSETEX
✔ PTTL
✔ QUIT
✔ RANDOMKEY
✔ RENAME
✔ RENAMENX
✔ SCAN (missing the TYPE parameter)
✔ SET
✔ SETEX
✔ SETNX
✔ SETRANGE
✔ SHUTDOWN
✔ STRLEN
✔ SUBSTR
✔ TOUCH
✔ TTL
✔ UNLINK

Investigate sporadic protocol parser errors

When benchmarking, after hundreds of millions of commands, I have noticed some very sporadic protocol parser errors. These are most likely caused by improper accounting of the data processed in combination with the buffer rewind.

A test should be introduced to check the behaviour of cachegrand when the buffer has to be rewound while a line terminator is split across the rewind, as this is very likely the cause of the issue.
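The scenario the test should cover can be modelled with a tiny incremental scanner that remembers whether the previous chunk ended with '\r'. This is a hypothetical helper, not cachegrand's parser. It shows the state that must survive a buffer rewind for a "\r\n" split across two reads to be detected exactly once.

```c
#include <stdbool.h>
#include <stddef.h>

/* State carried across chunks: did the previous chunk end with '\r'? */
typedef struct {
    bool pending_cr;
} crlf_scanner_t;

/*
 * Scans chunk[0..len). Returns the offset just past the "\r\n" terminator within
 * this chunk, or -1 if no terminator ends here. A terminator split across two
 * chunks yields offset 1 in the second chunk.
 */
long crlf_scan(crlf_scanner_t *s, const char *chunk, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (s->pending_cr && chunk[i] == '\n') {
            s->pending_cr = false;
            return (long)i + 1;
        }
        s->pending_cr = (chunk[i] == '\r');
    }
    return -1;
}
```

A test along these lines would feed "GET key\r" and then "\nPING\r\n" as two separate reads, expecting exactly one terminator in each step and no bytes counted twice.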

Optimize fibers scheduling in the io_uring-based I/O interface

The fiber solution implemented in cachegrand comes with extra complexity and security concerns. Although some of these have already been addressed (e.g. stack guards), there is still wide room for improvement.

Currently, the fiber scheduler implemented in the io_uring I/O interface switches to a client fiber and then back to the scheduler fiber. This second switch can be avoided entirely if the scheduler context is kept on the thread (e.g. in a thread-local context) instead of on the stack, de facto halving the execution time from about 20ns to about 10ns.
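The idea of keeping the scheduler state in a thread-local variable, instead of on a scheduler fiber's stack, can be sketched with ucontext. Here ucontext is only a stand-in for cachegrand's fiber.s. Each fiber looks up the next ready fiber in the _Thread_local state and switches to it directly, so a hand-off costs one switch instead of two. The names are hypothetical, and the "ready queue" is simply the fibers run in index order.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)
#define FIBER_COUNT 3

/* Scheduler state lives in thread-local storage, not on a scheduler fiber's stack. */
typedef struct {
    ucontext_t main_ctx;
    ucontext_t fibers[FIBER_COUNT];
    int current;
    int order[FIBER_COUNT]; /* which fiber ran at each step, for inspection */
    int steps;
} thread_scheduler_t;

static _Thread_local thread_scheduler_t sched;

/* Each fiber does its work, then switches straight to the next fiber. */
static void fiber_entry(void) {
    sched.order[sched.steps++] = sched.current;
    int next = sched.current + 1;
    if (next < FIBER_COUNT) {
        sched.current = next;
        swapcontext(&sched.fibers[next - 1], &sched.fibers[next]);
    }
    /* The last fiber returns, which resumes uc_link: the thread's main context. */
}

/* Runs all fibers once; returns the number of fiber steps executed. */
int run_direct_switch_demo(void) {
    char *stacks = malloc((size_t)FIBER_COUNT * STACK_SIZE);
    if (stacks == NULL) {
        return -1;
    }
    for (int i = 0; i < FIBER_COUNT; i++) {
        getcontext(&sched.fibers[i]);
        sched.fibers[i].uc_stack.ss_sp = stacks + (size_t)i * STACK_SIZE;
        sched.fibers[i].uc_stack.ss_size = STACK_SIZE;
        sched.fibers[i].uc_link = &sched.main_ctx;
        makecontext(&sched.fibers[i], fiber_entry, 0);
    }
    sched.current = 0;
    sched.steps = 0;
    swapcontext(&sched.main_ctx, &sched.fibers[0]);
    free(stacks);
    return sched.steps;
}
```

Fibers 0 and 1 stay suspended after handing off. A real scheduler would resume them on their next I/O completion, again by switching to them directly.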
