danielealbano / cachegrand
cachegrand - a modern data ingestion, processing and serving platform built for today's hardware
License: BSD 3-Clause "New" or "Revised" License
Implement garbage collector for the KeyValue storage
Currently when storage_db_entry_index_allocate_key_chunks or storage_db_entry_index_allocate_value_chunks fails, the memory allocated for the chunks doesn't get freed.
Either function may fail because of insufficient memory or disk space, depending on the backend, so it's imperative to properly free up the already-allocated memory in case of failure.
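A minimal sketch of the cleanup path these functions need (the function names below are hypothetical, not the actual storage_db API): every chunk allocated before the failure is released before returning, so nothing leaks.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate chunk_count chunks; on failure, free everything allocated so far. */
void **chunks_allocate(size_t chunk_count, size_t chunk_size) {
    void **chunks = calloc(chunk_count, sizeof(void *));
    if (chunks == NULL) {
        return NULL;
    }

    for (size_t i = 0; i < chunk_count; i++) {
        chunks[i] = malloc(chunk_size);
        if (chunks[i] == NULL) {
            goto cleanup;  /* allocation failed: release the partial allocation */
        }
    }
    return chunks;

cleanup:
    for (size_t i = 0; i < chunk_count; i++) {
        free(chunks[i]);  /* free(NULL) is a no-op, safe for the tail */
    }
    free(chunks);
    return NULL;
}

void chunks_free(void **chunks, size_t chunk_count) {
    for (size_t i = 0; i < chunk_count; i++) {
        free(chunks[i]);
    }
    free(chunks);
}
```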
Implement support to automatically upsize or downsize the hashtable as needed, instead of having a fixed number of allowed keys
When an incoming connection fails to be accepted, the listener's fiber is terminated, along with the entire worker.
This behaviour is definitely wrong: failing to accept a connection may be caused by a temporary error and shouldn't cause the termination of the entire platform!
The configuration file https://github.com/danielealbano/cachegrand/blob/main/etc/cachegrand.yaml.skel exposes a pidfile_path setting which is currently not honored by cachegrand.
At startup, check if the file is present and, if so, don't start; an option should be provided to force the start even if the pid file is present.
The pid file should be created (or updated) only right before cachegrand starts its event loop.
The pid file should be deleted at exit, but only if it was created by the running instance.
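The lifecycle described above could be sketched as follows; pidfile_create and pidfile_remove are hypothetical helper names, not existing cachegrand functions. O_EXCL makes creation fail when the file already exists, which implements the refuse-to-start check, while the force flag overrides it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Create the pid file; fails if it already exists unless force is set. */
bool pidfile_create(const char *path, bool force) {
    int flags = O_WRONLY | O_CREAT | (force ? O_TRUNC : O_EXCL);
    int fd = open(path, flags, 0644);
    if (fd == -1) {
        return false;  /* already present (EEXIST) or not writable */
    }

    char buffer[32];
    int len = snprintf(buffer, sizeof(buffer), "%ld\n", (long)getpid());
    bool ok = write(fd, buffer, (size_t)len) == len;
    close(fd);
    return ok;
}

/* Delete the pid file at exit, only if this instance created it. */
void pidfile_remove(const char *path, bool created_by_us) {
    if (created_by_us) {
        unlink(path);
    }
}
```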
Implement a benchmarking suite based on memtier_benchmark to run on new PRs, PR updates and PR merges to collect and automatically publish benchmark data
Implement packaging for RHEL 9 and newer
Use the build docs for reference.
Implement packaging for Ubuntu 20.04 and newer
Use the build docs for reference.
Share as much as possible with the deb package build process
Add support for direct socket opening in io_uring when using a 5.19 kernel
Currently, Redis commands that don't fit in the network buffer (whose size is defined internally) are discarded because they are considered too long.
The various components in use already support commands of unbounded length, and all the processing is done using a sliding window over the buffers.
The issue lies only in the GET and SET implementations, which write to or read from memory directly because they don't use the KVDB, which wasn't in place at the time they were implemented.
Once the KVDB is implemented, these commands have to be updated to use it.
Implement redis hashes commands (HGET, HGETALL, HSET, HMSET, HDEL, HEXISTS, HKEYS, HLEN, HVALS)
Implement packaging for Fedora 34 and newer
Implement an AVX512F-optimized variant of the search algorithm and the get, set and delete backend operations
Pre-calculate the number of hugepages and the amount of lockable memory required by cachegrand and check whether enough is available at startup and/or during execution, taking into account the amounts already allocated / locked.
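A sketch of one part of such a check, comparing an estimated requirement against the per-process RLIMIT_MEMLOCK soft limit; memlock_check is a hypothetical helper, and a complete implementation would also inspect the hugepages pools and the amounts already locked.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Returns 0 when the estimated requirement fits within the RLIMIT_MEMLOCK
 * soft limit, 1 when it doesn't, -1 on error. */
int memlock_check(size_t required_bytes) {
    struct rlimit limit;

    if (getrlimit(RLIMIT_MEMLOCK, &limit) == -1) {
        return -1;
    }

    if (limit.rlim_cur != RLIM_INFINITY && required_bytes > limit.rlim_cur) {
        return 1;  /* not enough lockable memory for this process */
    }

    return 0;
}
```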
Implement the io_uring based storage I/O interface
The current networking implementation in cachegrand already has basic support for handling the socket keepalive settings, but this needs to be improved: it's currently not possible to set the time, interval and probes parameters that are already exposed in the config file.
These settings are extremely useful, especially when cachegrand runs behind NATs (common on cloud providers, e.g. Amazon), as otherwise long-running connections would be closed, resulting in failed operations.
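On plain sockets the three parameters map directly to the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options; a minimal sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Apply the three keepalive parameters from the config to a socket.
 * idle and interval are in seconds, probes is a count. */
int tcp_keepalive_configure(int fd, int idle, int interval, int probes) {
    int enable = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == -1) {
        return -1;  /* leave the caller to log and close the connection */
    }
    return 0;
}
```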
The current networking layer implemented in cachegrand doesn't honor the network timeouts that can be tuned in the config file.
The networking layer should be updated to enforce read, write and inactivity timeouts as needed.
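For blocking plain sockets, read and write timeouts map to SO_RCVTIMEO / SO_SNDTIMEO, as sketched below (hypothetical helper name); note that the io_uring path would instead rely on timeout operations linked to the read/write submissions rather than on these socket options.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Apply per-direction timeouts (milliseconds) from the config to a socket;
 * a blocked recv()/send() then fails once the timeout expires. */
int socket_timeouts_configure(int fd, long read_ms, long write_ms) {
    struct timeval read_tv = { read_ms / 1000, (read_ms % 1000) * 1000 };
    struct timeval write_tv = { write_ms / 1000, (write_ms % 1000) * 1000 };

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &read_tv, sizeof(read_tv)) == -1 ||
        setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &write_tv, sizeof(write_tv)) == -1) {
        return -1;
    }
    return 0;
}
```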
The Slab Allocator currently acts as a standard memory allocator for most of cachegrand, but because it behaves as a custom memory allocator when configured to use hugepages, Valgrind is not able to track the memory usage properly.
To improve the code quality and reduce the risk of memory leaks, it's important to let Valgrind track the memory allocated and freed by the Slab Allocator.
More details are available at the following page
https://valgrind.org/docs/manual/mc-manual.html#mc-manual.options
The allocated memory doesn't use red zones, so the red-zone size has to be set to zero; also, the memory pools are not guaranteed to be zeroed after the first usage (e.g. pulled from the hugepages cache, pushed back in there and then pulled again).
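The Memcheck client requests described on that page can be wrapped around the allocator's alloc/free paths roughly as below. The wrapper names are hypothetical and malloc stands in for the slab/hugepage allocation; the fallback macros make the sketch build even without the Valgrind headers installed. Red-zone size is 0 and is_zeroed is 0, matching the constraints above.

```c
#include <assert.h>
#include <stdlib.h>

#if defined(__has_include) && __has_include(<valgrind/memcheck.h>)
#include <valgrind/memcheck.h>
#else
/* no-op fallbacks so the code builds when the Valgrind headers are absent */
#define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rzB, is_zeroed) ((void)0)
#define VALGRIND_FREELIKE_BLOCK(addr, rzB) ((void)0)
#endif

/* Tell Valgrind about every allocation so it can track leaks even in
 * hugepage-backed memory it doesn't otherwise see. */
void *slab_allocator_mem_alloc_tracked(size_t size) {
    void *addr = malloc(size);  /* stands in for the slab/hugepage allocation */
    if (addr != NULL) {
        VALGRIND_MALLOCLIKE_BLOCK(addr, size, 0, 0);
    }
    return addr;
}

void slab_allocator_mem_free_tracked(void *addr) {
    VALGRIND_FREELIKE_BLOCK(addr, 0);
    free(addr);
}
```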
Implement KeyValue storage reload at startup
cachegrand already supports the concept of "modules"; internally they are called protocols.
The current implementation, though, requires the "protocols" to be built within the main code base and invokes the initializers, the destructors, etc. directly.
Although this approach works, it's neither flexible nor extensible.
The implemented layer should be refactored to allow loading modules (as external libraries) at startup; these will take care of handling new incoming connections and perform the various initialisation / cleanup operations as required.
This will also require changes to the file organisation, the headers, the build system, etc.
As the modules may need to perform I/O operations, the logic should be rewritten to let them run once the scheduler has been defined, i.e. instantiating a new fiber per module.
(The same logic should also be applied to the very initial setup operations performed before module loading, e.g. loading the keystore value from disk, populating the hashtable and the cache with the "hot" keys, and any other required operation.)
The callback-based architecture implemented in cachegrand to manage the I/O and the timers is extremely complex because of the various layers of the software:
Switching to fibers will avoid the need to pass the context around, as it can be stored and preserved on the stack assigned to each fiber, and will also improve performance: switching back and forth from a fiber takes less than 20ns on a modern CPU, while 5 different hops require more than 25ns.
The hashtable implemented in cachegrand supports the ability to be easily linked to different hashing algorithms.
Supporting a new hashing algorithm needs just a few lines of code in https://github.com/danielealbano/cachegrand/blob/main/src/data_structures/hashtable/mcmp/hashtable_support_hash.c, wrapping it in the appropriate ifdef.
The XXH3 repository has to be added as a submodule in the 3rdparty folder, and the code has to be built as a static library via the cmake-based build infrastructure if the required build options are passed.
The cmake build option has to be called USE_HASH_ALGORITHM_XXH3, while the cachegrand define has to be called CACHEGRAND_CMAKE_CONFIG_USE_HASH_ALGORITHM_XXH3.
Relevant source code references
https://github.com/danielealbano/cachegrand/blob/356214a25c3fb806d2238829b402f308d3e16e81/src/data_structures/hashtable/mcmp/hashtable_support_hash.h
https://github.com/danielealbano/cachegrand/blob/72b6716e8291482cb2c0dd415d2ecfb9c4e06674/src/data_structures/hashtable/mcmp/hashtable_support_hash.c
https://github.com/danielealbano/cachegrand/blob/41affe228c7b19a7ebdd4a8197216647570e7a67/src/cmake_config.h.in
https://github.com/danielealbano/cachegrand/blob/1ad09fe6aba43fd38e01265f22640abf4d6939cc/src/CMakeLists.txt
https://github.com/danielealbano/cachegrand/blob/356214a25c3fb806d2238829b402f308d3e16e81/tools/cmake/options.cmake
https://github.com/danielealbano/cachegrand/blob/1ad09fe6aba43fd38e01265f22640abf4d6939cc/3rdparty/t1ha.cmake
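The ifdef wiring described above could look roughly like this; the function name support_hash and the FNV-1a fallback are illustrative only (the real file dispatches to t1ha and friends), while XXH3_64bits and the define name come from xxhash and the issue text respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(CACHEGRAND_CMAKE_CONFIG_USE_HASH_ALGORITHM_XXH3)
#include <xxhash.h>  /* built as a static library from the 3rdparty submodule */

uint64_t support_hash(const void *key, size_t length) {
    return XXH3_64bits(key, length);
}
#else
/* FNV-1a stand-in for the existing hash path (e.g. t1ha) */
uint64_t support_hash(const void *key, size_t length) {
    const unsigned char *data = key;
    uint64_t hash = 1469598103934665603ULL;
    for (size_t i = 0; i < length; i++) {
        hash ^= data[i];
        hash *= 1099511628211ULL;
    }
    return hash;
}
#endif
```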
Add support for IORING_SETUP_TASKRUN_FLAG and IORING_SETUP_COOP_TASKRUN when using a 5.19 kernel
The redis protocol writer implemented in https://github.com/danielealbano/cachegrand/blob/main/src/protocol/redis/protocol_redis_writer.c doesn't have tests; these should be implemented.
The connection timeout exposed in the skel config file https://github.com/danielealbano/cachegrand/blob/main/etc/cachegrand.yaml.skel is not actually supported, and the config file validator reports it as an error, preventing cachegrand from starting with the default config.
The setting has to be removed from the default config file.
The current implementation of the human-readable Redis protocol parser is buggy: it searches for a space and then performs a number of additional operations expecting data to be present afterwards.
The bug is triggered by the code at the referenced line; the referenced test reproduces the issue.
Implement a module to support monitoring via Prometheus
The only platform currently supported is AMD64, but adding support for AARCH64 (ARMv8) shouldn't be particularly challenging, as most of the code has been written with portability in mind.
There are a number of elements that have to be taken into account to port cachegrand to AARCH64 (ARMv8); the most relevant is fiber.s, whose context-switch assembly has to be ported, and the struct fiber has to be updated to hold the correct registers.
Implementing Redis commands currently requires writing a lot of scaffolding code because the module has no knowledge of the type of data it can receive, only whether it may be long and therefore has to be processed as a stream of data, or short enough to be read entirely in one go from the buffer.
The redis repository contains a number of json files, one per command, with the specs of the commands themselves (e.g. the command name, the command group, the ACLs required, the parameters it takes, the specifics for the key, etc.).
It's necessary to implement a command generator that will:
Link to the json files
https://github.com/redis/redis/tree/unstable/src/commands
It's necessary to analyze the JSON structure before investigating how to build the generator
Code coverage changes randomly when the tests for spinlock.c run (e.g. https://app.codecov.io/gh/danielealbano/cachegrand/compare/75/changes ).
This is caused by the following code:
if (spins == UINT32_MAX) {
LOG_E(TAG, "Possible stuck spinlock detected for thread %lu in %s at %s:%u",
pthread_self(), src_func, src_path, src_line);
}
The number of "spins" of the spinlocks during the tests is not always high enough to hit this branch; a test should be implemented to ensure that this case is triggered and that the log output matches the expectation (although it's questionable that the software tries to print logs from within a potentially stuck spinlock, as the stuck spinlock may itself be log-related).
Also, the log level should be changed from error to warning, as this situation is not guaranteed to be an error: there can simply be a lot of real contention.
cachegrand currently loads the certificates and private keys directly from a local file, but this is not a secure approach: the process can potentially be dumped and an attacker would easily gain access to the private key.
In addition, the current implementation requires the server to be restarted when a certificate is rotated, which is extremely disruptive.
To avoid these scenarios, cachegrand should use the operating system keychain and rely on an external authorized process to update / rotate the certificates / private keys stored in there.
Here is an example of how to populate a keychain (requires <keyutils.h> and linking against libkeyutils):
key_serial_t keyring = add_key("keyring", "localhost", NULL, 0, 0);
if (keyring == -1) {
    perror("add_key keyring");
    exit(EXIT_FAILURE);
}
printf("The keyring id is <%jx>\n", (uintmax_t)keyring);
size_t rsa_cert_len = 0, rsa_key_len = 0;
char *rsa_cert = read_the_rsa_certificate_file(&rsa_cert_len);
char *rsa_key = read_the_rsa_private_key_file(&rsa_key_len);
/* "user" is a standard kernel key type; a custom type such as "tls_cert"
   would require a kernel-side key type to be registered first */
if (add_key("user", "public-key", rsa_cert, rsa_cert_len, keyring) == -1) {
    perror("add_key public-key");
    return EXIT_FAILURE;
}
printf("Public key added to the keyring\n");
if (add_key("user", "private-key", rsa_key, rsa_key_len, keyring) == -1) {
    perror("add_key private-key");
    return EXIT_FAILURE;
}
printf("Private key added to the keyring\n");
Currently cachegrand doesn't provide any kind of over-the-wire encryption, which might allow successful MITM attacks.
To avoid this kind of problem, TLS encryption can be implemented for the connections, as it's also supported by the currently implemented modules (e.g. Redis & Prometheus).
To add TLS support there are multiple options, such as OpenSSL, mbedtls, kTLS, etc. Although the most common option is OpenSSL, kTLS is better suited for cachegrand because it provides great performance, at the cost of losing support for some older TLS versions (TLS 1.1 and earlier), which is not a problem per se because these shouldn't be used anyway!
kTLS also has native support for hardware accelerators, which is a great advantage.
Here are some useful performance comparisons:
https://legacy.netdevconf.info/0x14/pub/slides/25/TLS%20Perf%20Characterization%20slides%20-%20Netdev%200x14%20v2.pdf
https://legacy.netdevconf.info/0x14/pub/papers/29/0x14-paper29-talk-paper.pdf (focuses on offloading)
More information on kTLS
https://docs.kernel.org/networking/tls-offload.html
https://github.com/ktls/af_ktls
https://docs.nvidia.com/networking/display/MLNXOFEDv531001/Kernel+Transport+Layer+Security+(kTLS)+Offloads
Some reference repos for the actual implementation
https://github.com/insanum/ktls_test
Review the source code license header before the v0.1 release; it may be missing in some C files, and it should be added to the headers and tests as well
Implement an HTTP module interface to get statistics and other general information out of cachegrand
Add a module for Prometheus to support the basic scraping of the metrics via the text protocol
Implement redis sets commands (SADD, SREM, SISMEMBER, SMEMBERS, SMOVE)
Implement zero-copy send for payloads bigger than 4K in the io_uring backend (waiting for ZC network writes support to be merged in io_uring)
Implement the KeyValue append-only database
The io_uring I/O back-end requires a fixed amount of lockable memory (non-swappable, non-movable and long-lived) per client, as it uses this memory to exchange information with the kernel.
Although the amount of lockable memory available on Linux can easily be increased, the default is fairly low, and if cachegrand is started in a desktop environment it can easily exhaust it, as the software running on the desktop (e.g. gnome, kde, firefox, google chrome, etc.) may use it as well.
If memlock memory is needed but not available, the software requiring it may crash, leading to a crash of the entire desktop environment and/or the browser!
The default config file has the max_clients parameter set to 10000; it has to be reduced to 100.
Also update the README.md to mention that memlock memory is accounted per user and used by a number of different kinds of software, especially desktop environments, and that increasing max_clients too much without enough memlock memory available may lead to random crashes or malfunctions of various software (e.g. gnome, kde, firefox, chrome, etc.)
The Redis protocol is text-based and relies on line endings (i.e. \r\n) to identify the end of the data in a number of cases.
As it's desirable in some cases to limit the length of the data received, cachegrand already exposes 2 ad-hoc settings tunable in the config file:
These 2 settings allow users to limit the length of the keys used to operate on the internal cache and to limit the length of the whole command, preventing excessively long commands that would require too much memory (and could be used for DoS attacks) and limiting the amount of data being set or retrieved in the cache.
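The command-length limit can be enforced while the data is still being streamed, before the terminator is even found, so an oversized command never gets fully buffered. A hedged sketch (hypothetical function, not the actual cachegrand parser):

```c
#include <assert.h>
#include <stddef.h>

/* Scan a chunk of streamed data for the \r\n terminator, tracking the total
 * accumulated length. Returns -2 when the limit is exceeded before the
 * terminator is seen, -1 when more data is needed, else the offset just
 * past the terminator. */
long scan_for_terminator(const char *chunk, size_t chunk_len,
                         size_t accumulated, size_t max_length) {
    for (size_t i = 0; i < chunk_len; i++) {
        if (accumulated + i + 1 > max_length) {
            return -2;  /* limit exceeded: stop buffering, reject the command */
        }
        if (i > 0 && chunk[i - 1] == '\r' && chunk[i] == '\n') {
            return (long)(i + 1);
        }
    }
    return -1;  /* terminator not yet received, keep reading */
}
```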
Implement redis lists commands (RPUSH, RPUSHX, LPUSH, LINDEX, LLEN, LPOP, LSET, RPOP)
Implement the memory based storage I/O interface
Implement tests for the worker_network_io_uring_op interface (it's possible to use as reference the worker_storage_io_uring_op tests)
To improve the overall code quality, stability and security cachegrand should be built using extra options to fortify the builds.
Some work has already been started in an ad-hoc branch adding the necessary build options:
"-Wall"; "-Wextra"; "-Wpedantic"; "-Wformat=2"; "-Wformat-overflow=2"; "-Wformat-truncation=2";
"-Wformat-security"; "-Wnull-dereference"; "-Wstack-protector"; "-Wtrampolines"; "-Walloca"; "-Wvla";
"-Warray-bounds=2"; "-Wimplicit-fallthrough=3"; "-Wtraditional-conversion"; "-Wshift-overflow=2";
"-Wcast-qual"; "-Wstringop-overflow=4"; "-Wconversion"; "-Warith-conversion"; "-Wlogical-op";
"-Wduplicated-cond"; "-Wduplicated-branches"; "-Wformat-signedness"; "-Wshadow"; "-Wstrict-overflow=4";
"-Wundef"; "-Wstrict-prototypes"; "-Wswitch-default"; "-Wswitch-enum"; "-Wstack-usage=1000000";
"-Wcast-align=strict";
"-D_FORTIFY_SOURCE=2";
"-fstack-protector-strong"; "-fstack-clash-protection"; "-fPIE";
"-Wl,-z,relro"; "-Wl,-z,now"; "-Wl,-z,noexecstack"; "-Wl,-z,separate-code";
It's necessary to resolve all the warnings / errors to be able to merge this branch.
Implement packaging for Debian 11 and newer
Use the build docs for reference.
Share as much as possible with the deb package build process
Improve pipeline performance by buffering the data to be sent via network_send
Implement the redis command APPEND, properly implement the SET command, and implement the SET command variations (e.g. SETNX and so on)
Below is the full list of commands to implement:
Command | Notes |
---|---|
✔ APPEND | |
✔ COPY | Missing DB parameter |
✔ DBSIZE | |
✔ DECR | |
✔ DECRBY | |
✔ DEL | |
✔ EXISTS | |
✔ EXPIRE | |
✔ EXPIREAT | |
✔ EXPIRETIME | |
✔ FLUSHDB | Missing ASYNC parameter |
✔ GET | |
✔ GETDEL | |
✔ GETEX | |
✔ GETRANGE | |
✔ GETSET | |
✔ HELLO | Missing AUTH and SETNAME parameters |
✔ INCR | |
✔ INCRBY | |
✔ INCRBYFLOAT | |
✔ KEYS | |
✔ LCS | Missing IDX, MINMATCHLEN and WITHMATCHLEN parameters |
✔ MGET | |
✔ MSET | |
✔ MSETNX | |
✔ PERSIST | |
✔ PEXPIRE | |
✔ PEXPIREAT | |
✔ PEXPIRETIME | |
✔ PING | |
✔ PSETEX | |
✔ PTTL | |
✔ QUIT | |
✔ RANDOMKEY | |
✔ RENAME | |
✔ RENAMENX | |
✔ SCAN | Missing TYPE parameter |
✔ SET | |
✔ SETEX | |
✔ SETNX | |
✔ SETRANGE | |
✔ SHUTDOWN | |
✔ STRLEN | |
✔ SUBSTR | |
✔ TOUCH | |
✔ TTL | |
✔ UNLINK |
When benchmarking, after hundreds of millions of commands, I have noticed some very sporadic protocol parser errors, most likely caused by improper accounting of the processed data in combination with the buffer rewind.
A test should be introduced to verify the behaviour of cachegrand when the buffer has to be rewound and a line terminator falls across the boundary, as it's very likely the issue is caused by that.
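The scenario can be reproduced in isolation: rewind the buffer while the tail ends on a lone '\r', append the next read, and check that both commands are still recognised. The helper names below are hypothetical, not the actual cachegrand parser:

```c
#include <assert.h>
#include <string.h>

/* Rewind step: the unparsed tail (which may end mid-terminator, e.g. with a
 * lone '\r') is moved to the front of the buffer so the next read appends
 * after it and the '\r\n' pair is seen whole. Returns the new data length. */
size_t buffer_rewind(char *buffer, size_t data_length, size_t parsed_offset) {
    size_t tail_length = data_length - parsed_offset;
    memmove(buffer, buffer + parsed_offset, tail_length);
    return tail_length;
}

/* Count complete \r\n-terminated lines, returning the parsed offset. */
size_t parse_lines(const char *buffer, size_t data_length, int *lines) {
    size_t offset = 0;
    for (size_t i = 1; i < data_length; i++) {
        if (buffer[i - 1] == '\r' && buffer[i] == '\n') {
            (*lines)++;
            offset = i + 1;
        }
    }
    return offset;
}
```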
The fiber solution implemented in cachegrand comes with extra complexity and security concerns; although some of these issues have already been addressed (e.g. stack guards), there is still room for wide improvements.
Currently the fiber scheduler implemented in the io_uring I/O interface switches to a client fiber and then back to the scheduler fiber; this second switch can be entirely avoided if the scheduler's context is kept on the thread (e.g. in thread-local storage) instead of on the stack, de facto halving the execution time from about 20ns to about 10ns.
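The idea can be sketched with ucontext standing in for cachegrand's hand-written fiber switch: the scheduler context lives in thread-local storage rather than on a fiber stack, so a client fiber yields straight back to it with a single swap. All names are hypothetical.

```c
#include <assert.h>
#include <ucontext.h>

static __thread ucontext_t scheduler_context;  /* thread-local, not on a fiber stack */
static ucontext_t client_context;
static int resumes = 0;

static void client_fiber(void) {
    resumes++;
    swapcontext(&client_context, &scheduler_context);  /* single direct swap back */
    resumes++;
}

int fiber_demo_run(void) {
    static char stack[64 * 1024];
    getcontext(&client_context);
    client_context.uc_stack.ss_sp = stack;
    client_context.uc_stack.ss_size = sizeof(stack);
    client_context.uc_link = &scheduler_context;  /* return here when the fiber ends */
    makecontext(&client_context, client_fiber, 0);

    swapcontext(&scheduler_context, &client_context);  /* run until first yield */
    swapcontext(&scheduler_context, &client_context);  /* run to completion */
    return resumes;
}
```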