danielealbano / cachegrand
cachegrand - a modern data ingestion, processing and serving platform built for today's hardware
License: BSD 3-Clause "New" or "Revised" License
Implement garbage collector for the KeyValue storage
Currently when storage_db_entry_index_allocate_key_chunks or storage_db_entry_index_allocate_value_chunks fails, the memory allocated for the chunks doesn't get freed.
Either function may fail because of insufficient memory or disk space, depending on the backend, so it's imperative to properly free up the already-allocated memory in case of failure.
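A minimal sketch of the cleanup path these functions need (the function names below are hypothetical, not the actual storage_db API): every chunk allocated before the failure is released before returning, so nothing leaks.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate chunk_count chunks; on failure, free everything allocated so far. */
void **chunks_allocate(size_t chunk_count, size_t chunk_size) {
    void **chunks = calloc(chunk_count, sizeof(void *));
    if (chunks == NULL) {
        return NULL;
    }

    for (size_t i = 0; i < chunk_count; i++) {
        chunks[i] = malloc(chunk_size);
        if (chunks[i] == NULL) {
            goto cleanup;  /* allocation failed: release the partial allocation */
        }
    }
    return chunks;

cleanup:
    for (size_t i = 0; i < chunk_count; i++) {
        free(chunks[i]);  /* free(NULL) is a no-op, safe for the tail */
    }
    free(chunks);
    return NULL;
}

void chunks_free(void **chunks, size_t chunk_count) {
    for (size_t i = 0; i < chunk_count; i++) {
        free(chunks[i]);
    }
    free(chunks);
}
```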
Implement support to automatically upsize or downsize the hashtable as needed, instead of having a fixed number of allowed keys
When an incoming connection fails to be accepted, the listener's fiber is terminated, along with the entire worker.
This behaviour is definitely wrong: failing to accept a connection may be caused by a temporary error and shouldn't cause the termination of the entire platform!
The configuration file https://github.com/danielealbano/cachegrand/blob/main/etc/cachegrand.yaml.skel exposes a pidfile_path setting which is currently not honored by cachegrand.
At startup, check if the file is present and, if so, don't start; an option should be provided to force the start even if the pid file is present.
The pid file should be created (or updated) only right before cachegrand starts its event loop.
The pid file should be deleted at exit, but only if it was created by the running instance.
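The lifecycle described above could be sketched as follows; pidfile_create and pidfile_remove are hypothetical helper names, not existing cachegrand functions. O_EXCL makes creation fail when the file already exists, which implements the refuse-to-start check, while the force flag overrides it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Create the pid file; fails if it already exists unless force is set. */
bool pidfile_create(const char *path, bool force) {
    int flags = O_WRONLY | O_CREAT | (force ? O_TRUNC : O_EXCL);
    int fd = open(path, flags, 0644);
    if (fd == -1) {
        return false;  /* already present (EEXIST) or not writable */
    }

    char buffer[32];
    int len = snprintf(buffer, sizeof(buffer), "%ld\n", (long)getpid());
    bool ok = write(fd, buffer, (size_t)len) == len;
    close(fd);
    return ok;
}

/* Delete the pid file at exit, only if this instance created it. */
void pidfile_remove(const char *path, bool created_by_us) {
    if (created_by_us) {
        unlink(path);
    }
}
```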
Implement a benchmarking suite based on memtier_benchmark to run on new PRs, PR updates and PR merges to collect and automatically publish benchmark data
Implement packaging for RHEL 9 and newer
Use the build docs for reference.
Implement packaging for Ubuntu 20.04 and newer
Use the build docs for reference.
Share as much as possible with the deb package build process
Add support for direct socket opening in io_uring when using a 5.19 kernel
Currently, Redis commands that don't fit in the network buffer (whose size is defined internally) are discarded because they are considered too long.
The various components in use already support commands of unbounded length, and all the processing is done using a sliding window over the buffers.
The issue lies only in the GET and SET implementations, which write to or read from memory directly because they don't use the KVDB, which wasn't in place at the time they were implemented.
Once the KVDB is implemented, these commands have to be updated to use it.
Implement redis hashes commands (HGET, HGETALL, HSET, HMSET, HDEL, HEXISTS, HKEYS, HLEN, HVALS)
Implement packaging for Fedora 34 and newer
Implement an AVX512F-optimized variant of the search algorithm and the get, set and delete backend operations
Pre-calculate the number of hugepages and the amount of lockable memory required by cachegrand and check whether enough is available at startup and/or during execution, taking into account the amounts already allocated / locked.
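A sketch of one part of such a check, comparing an estimated requirement against the per-process RLIMIT_MEMLOCK soft limit; memlock_check is a hypothetical helper, and a complete implementation would also inspect the hugepages pools and the amounts already locked.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Returns 0 when the estimated requirement fits within the RLIMIT_MEMLOCK
 * soft limit, 1 when it doesn't, -1 on error. */
int memlock_check(size_t required_bytes) {
    struct rlimit limit;

    if (getrlimit(RLIMIT_MEMLOCK, &limit) == -1) {
        return -1;
    }

    if (limit.rlim_cur != RLIM_INFINITY && required_bytes > limit.rlim_cur) {
        return 1;  /* not enough lockable memory for this process */
    }

    return 0;
}
```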
Implement the io_uring based storage I/O interface
The current networking implementation in cachegrand already has basic support for handling the socket keepalive settings, but this needs to be improved: it's currently not possible to set the time, interval and probes parameters that are already exposed in the config file.
These settings are extremely useful, especially when cachegrand runs behind NATs (common on cloud providers, e.g. Amazon), as otherwise long-running connections would be closed, resulting in failed operations.
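On plain sockets the three parameters map directly to the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options; a minimal sketch (the function name is hypothetical):

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Apply the three keepalive parameters from the config to a socket.
 * idle and interval are in seconds, probes is a count. */
int tcp_keepalive_configure(int fd, int idle, int interval, int probes) {
    int enable = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &enable, sizeof(enable)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval)) == -1 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == -1) {
        return -1;  /* leave the caller to log and close the connection */
    }
    return 0;
}
```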
The current networking layer implemented in cachegrand doesn't honor the network timeouts that can be tuned in the config file.
The networking layer should be updated to enforce read, write and inactivity timeouts as needed.
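For blocking plain sockets, read and write timeouts map to SO_RCVTIMEO / SO_SNDTIMEO, as sketched below (hypothetical helper name); note that the io_uring path would instead rely on timeout operations linked to the read/write submissions rather than on these socket options.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Apply per-direction timeouts (milliseconds) from the config to a socket;
 * a blocked recv()/send() then fails once the timeout expires. */
int socket_timeouts_configure(int fd, long read_ms, long write_ms) {
    struct timeval read_tv = { read_ms / 1000, (read_ms % 1000) * 1000 };
    struct timeval write_tv = { write_ms / 1000, (write_ms % 1000) * 1000 };

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &read_tv, sizeof(read_tv)) == -1 ||
        setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &write_tv, sizeof(write_tv)) == -1) {
        return -1;
    }
    return 0;
}
```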
The Slab Allocator currently acts as a standard memory allocator for most of cachegrand, but because it behaves as a custom memory allocator when configured to use hugepages, Valgrind is not able to track the memory usage properly.
To improve the code quality and reduce the risk of memory leaks, it's important to let Valgrind track the memory allocated and freed by the Slab Allocator.
More details are available at the following page
https://valgrind.org/docs/manual/mc-manual.html#mc-manual.options
The allocated memory doesn't use red zones, so the red-zone size has to be set to zero; also, the memory pools are not guaranteed to be zeroed after the first usage (e.g. pulled from the hugepages cache, pushed back in there and then pulled again).
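The Memcheck client requests described on that page can be wrapped around the allocator's alloc/free paths roughly as below. The wrapper names are hypothetical and malloc stands in for the slab/hugepage allocation; the fallback macros make the sketch build even without the Valgrind headers installed. Red-zone size is 0 and is_zeroed is 0, matching the constraints above.

```c
#include <assert.h>
#include <stdlib.h>

#if defined(__has_include) && __has_include(<valgrind/memcheck.h>)
#include <valgrind/memcheck.h>
#else
/* no-op fallbacks so the code builds when the Valgrind headers are absent */
#define VALGRIND_MALLOCLIKE_BLOCK(addr, size, rzB, is_zeroed) ((void)0)
#define VALGRIND_FREELIKE_BLOCK(addr, rzB) ((void)0)
#endif

/* Tell Valgrind about every allocation so it can track leaks even in
 * hugepage-backed memory it doesn't otherwise see. */
void *slab_allocator_mem_alloc_tracked(size_t size) {
    void *addr = malloc(size);  /* stands in for the slab/hugepage allocation */
    if (addr != NULL) {
        VALGRIND_MALLOCLIKE_BLOCK(addr, size, 0, 0);
    }
    return addr;
}

void slab_allocator_mem_free_tracked(void *addr) {
    VALGRIND_FREELIKE_BLOCK(addr, 0);
    free(addr);
}
```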
Implement KeyValue storage reload at startup
cachegrand already supports the concept of "modules"; internally they are called protocols.
The current implementation, though, requires the "protocols" to be built within the main code base and invokes the initializers, the destructors, etc. directly.
Although this approach works, it's neither flexible nor extensible.
The implemented layer should be refactored to allow loading modules (as external libraries) at startup; these will take care of handling new incoming connections and perform the various initialisation / cleanup operations as required.
This will also require changes to the file organisation, the headers, the build system, etc.
As the modules may need to perform I/O operations, the logic should be rewritten to let them run once the scheduler has been defined, i.e. instantiating a new fiber per module.
(The same logic should also be applied to the very initial setup operations performed before module loading, e.g. loading the keystore value from disk, populating the hashtable and the cache with the "hot" keys, and any other required operation.)
The callback-based architecture implemented in cachegrand to manage the I/O and the timers is extremely complex because of the various layers of the software:
Switching to fibers will avoid the need to pass the context around, as it can be stored and preserved on the stack assigned to each fiber, and will also improve performance: switching back and forth from a fiber takes less than 20ns on a modern CPU, while 5 different hops require more than 25ns.
The hashtable implemented in cachegrand supports the ability to be easily linked to different hashing algorithms.
Supporting a new hashing algorithm needs just a few lines of code in https://github.com/danielealbano/cachegrand/blob/main/src/data_structures/hashtable/mcmp/hashtable_support_hash.c, wrapping it in the appropriate ifdef.
The XXH3 repository has to be added as a submodule in the 3rdparty folder, and the code has to be built as a static library via the cmake-based build infrastructure if the required build options are passed.
The cmake build option has to be called USE_HASH_ALGORITHM_XXH3, while the cachegrand define has to be called CACHEGRAND_CMAKE_CONFIG_USE_HASH_ALGORITHM_XXH3.
Relevant source code references
https://github.com/danielealbano/cachegrand/blob/356214a25c3fb806d2238829b402f308d3e16e81/src/data_structures/hashtable/mcmp/hashtable_support_hash.h
https://github.com/danielealbano/cachegrand/blob/72b6716e8291482cb2c0dd415d2ecfb9c4e06674/src/data_structures/hashtable/mcmp/hashtable_support_hash.c
https://github.com/danielealbano/cachegrand/blob/41affe228c7b19a7ebdd4a8197216647570e7a67/src/cmake_config.h.in
https://github.com/danielealbano/cachegrand/blob/1ad09fe6aba43fd38e01265f22640abf4d6939cc/src/CMakeLists.txt
https://github.com/danielealbano/cachegrand/blob/356214a25c3fb806d2238829b402f308d3e16e81/tools/cmake/options.cmake
https://github.com/danielealbano/cachegrand/blob/1ad09fe6aba43fd38e01265f22640abf4d6939cc/3rdparty/t1ha.cmake
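The ifdef wiring described above could look roughly like this; the function name support_hash and the FNV-1a fallback are illustrative only (the real file dispatches to t1ha and friends), while XXH3_64bits and the define name come from xxhash and the issue text respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(CACHEGRAND_CMAKE_CONFIG_USE_HASH_ALGORITHM_XXH3)
#include <xxhash.h>  /* built as a static library from the 3rdparty submodule */

uint64_t support_hash(const void *key, size_t length) {
    return XXH3_64bits(key, length);
}
#else
/* FNV-1a stand-in for the existing hash path (e.g. t1ha) */
uint64_t support_hash(const void *key, size_t length) {
    const unsigned char *data = key;
    uint64_t hash = 1469598103934665603ULL;
    for (size_t i = 0; i < length; i++) {
        hash ^= data[i];
        hash *= 1099511628211ULL;
    }
    return hash;
}
#endif
```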
Add support for IORING_SETUP_TASKRUN_FLAG and IORING_SETUP_COOP_TASKRUN when using a 5.19 kernel
The redis protocol writer implemented in https://github.com/danielealbano/cachegrand/blob/main/src/protocol/redis/protocol_redis_writer.c doesn't have tests; these should be implemented.
The connection timeout exposed in the skel config file https://github.com/danielealbano/cachegrand/blob/main/etc/cachegrand.yaml.skel is not actually supported, and the config file validator reports it as an error, preventing cachegrand from starting with the default config.
The setting has to be removed from the default config file.
The current implementation of the human-readable Redis protocol parser is buggy: it searches for a space and then performs a number of additional operations expecting data to be present afterwards.
The bug is triggered by the code at the referenced line; the referenced test reproduces the issue.
Implement a module to support monitoring via Prometheus
The only platform currently supported is AMD64, but adding support for AARCH64 (ARMv8) shouldn't be particularly challenging, as most of the code has been written with portability in mind.
There are a number of elements that have to be taken into account to port cachegrand to AARCH64 (ARMv8); the most relevant is fiber.s, whose context-switch assembly has to be ported, and the struct fiber has to be updated to hold the correct registers.
Implementing Redis commands currently requires writing a lot of scaffolding code because the module has no knowledge of the type of data it can receive, only whether it may be long and therefore has to be processed as a stream of data, or short enough to be read entirely in one go from the buffer.
The redis repository contains a number of json files, one per command, with the specs of the commands themselves (e.g. the command name, the command group, the ACLs required, the parameters it takes, the specifics for the key, etc.).
It's necessary to implement a command generator that will:
Link to the json files
https://github.com/redis/redis/tree/unstable/src/commands
It's necessary to analyze the JSON structure before investigating how to build the generator
Code coverage changes randomly when the tests for spinlock.c run (e.g. https://app.codecov.io/gh/danielealbano/cachegrand/compare/75/changes ).
This is caused by the following code:
if (spins == UINT32_MAX) {
LOG_E(TAG, "Possible stuck spinlock detected for thread %lu in %s at %s:%u",
pthread_self(), src_func, src_path, src_line);
}
The number of "spins" of the spinlocks during the tests is not always high enough to hit this branch; a test should be implemented to ensure that this case is triggered and that the log output matches the expectation (although it's questionable that the software tries to print logs from within a potentially stuck spinlock, as the stuck spinlock may itself be log-related).
Also, the log level should be changed from error to warning, as this situation is not guaranteed to be an error: there can simply be a lot of real contention.
cachegrand currently loads the certificates and private keys directly from a local file, but this is not a secure approach: the process can potentially be dumped and an attacker would easily gain access to the private key.
In addition, the current implementation requires the server to be restarted when a certificate is rotated, which is extremely disruptive.
To avoid these scenarios, cachegrand should use the operating system keychain and rely on an external authorized process to update / rotate the certificates / private keys stored in there.
Here is an example of how to populate a keychain (requires <keyutils.h> and linking against libkeyutils):
key_serial_t keyring = add_key("keyring", "localhost", NULL, 0, 0);
if (keyring == -1) {
    perror("add_key keyring");
    exit(EXIT_FAILURE);
}
printf("The keyring id is <%jx>\n", (uintmax_t)keyring);
size_t rsa_cert_len = 0, rsa_key_len = 0;
char *rsa_cert = read_the_rsa_certificate_file(&rsa_cert_len);
char *rsa_key = read_the_rsa_private_key_file(&rsa_key_len);
/* "user" is a standard kernel key type; a custom type such as "tls_cert"
   would require a kernel-side key type to be registered first */
if (add_key("user", "public-key", rsa_cert, rsa_cert_len, keyring) == -1) {
    perror("add_key public-key");
    return EXIT_FAILURE;
}
printf("Public key added to the keyring\n");
if (add_key("user", "private-key", rsa_key, rsa_key_len, keyring) == -1) {
    perror("add_key private-key");
    return EXIT_FAILURE;
}
printf("Private key added to the keyring\n");
Currently cachegrand doesn't provide any kind of over-the-wire encryption, which might allow successful MITM attacks.
To avoid this kind of problem, TLS encryption can be implemented for the connections, as it's also supported by the currently implemented modules (e.g. Redis & Prometheus).
To add TLS support there are multiple options, such as OpenSSL, mbedtls, kTLS, etc. Although the most common option is OpenSSL, kTLS is better suited for cachegrand because it provides great performance, at the cost of losing support for some older TLS versions (TLS 1.1 and earlier), which is not a problem per se because these shouldn't be used anyway!
kTLS also has native support for hardware accelerators, which is a great advantage.
Here are some useful performance comparisons:
https://legacy.netdevconf.info/0x14/pub/slides/25/TLS%20Perf%20Characterization%20slides%20-%20Netdev%200x14%20v2.pdf
https://legacy.netdevconf.info/0x14/pub/papers/29/0x14-paper29-talk-paper.pdf (focuses on offloading)
More information on kTLS
https://docs.kernel.org/networking/tls-offload.html
https://github.com/ktls/af_ktls
https://docs.nvidia.com/networking/display/MLNXOFEDv531001/Kernel+Transport+Layer+Security+(kTLS)+Offloads
Some reference repos for the actual implementation
https://github.com/insanum/ktls_test
Review the source code license header before the v0.1 release; it may be missing in some C files, and it should be added to the headers and tests as well
Implement an HTTP module interface to get statistics and other general information out of cachegrand
Add a module for Prometheus to support the basic scraping of the metrics via the text protocol
Implement redis sets commands (SADD, SREM, SISMEMBER, SMEMBERS, SMOVE)
Implement zero-copy send for payloads bigger than 4K in the io_uring backend (waiting for ZC network writes support to be merged in io_uring)
Implement the KeyValue append-only database
The io_uring I/O back-end requires a fixed amount of lockable memory (non-swappable, non-movable and long-lived) per client, as it uses this memory to exchange information with the kernel.
Although the amount of lockable memory available on Linux can easily be increased, the default is fairly low, and if cachegrand is started in a desktop environment it can easily exhaust it, as the software running on the desktop (e.g. gnome, kde, firefox, google chrome, etc.) may use it as well.
If memlock memory is needed but not available, the software requiring it may crash, leading to a crash of the entire desktop environment and/or the browser!
The default config file has the max_clients parameter set to 10000; it has to be reduced to 100.
Also update the README.md to mention that memlock memory is accounted per user and used by a number of different kinds of software, especially desktop environments, and that increasing max_clients too much without enough memlock memory available may lead to random crashes or malfunctions of various software (e.g. gnome, kde, firefox, chrome, etc.)
The Redis protocol is text-based and relies on line endings (i.e. \r\n) to identify the end of the data in a number of cases.
As it's desirable in some cases to limit the length of the data received, cachegrand already exposes 2 ad-hoc settings tunable in the config file:
These 2 settings allow users to limit the length of the keys used to operate on the internal cache and to limit the length of the whole command, preventing excessively long commands that would require too much memory (and could be used for DoS attacks) and limiting the amount of data being set or retrieved in the cache.
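The command-length limit can be enforced while the data is still being streamed, before the terminator is even found, so an oversized command never gets fully buffered. A hedged sketch (hypothetical function, not the actual cachegrand parser):

```c
#include <assert.h>
#include <stddef.h>

/* Scan a chunk of streamed data for the \r\n terminator, tracking the total
 * accumulated length. Returns -2 when the limit is exceeded before the
 * terminator is seen, -1 when more data is needed, else the offset just
 * past the terminator. */
long scan_for_terminator(const char *chunk, size_t chunk_len,
                         size_t accumulated, size_t max_length) {
    for (size_t i = 0; i < chunk_len; i++) {
        if (accumulated + i + 1 > max_length) {
            return -2;  /* limit exceeded: stop buffering, reject the command */
        }
        if (i > 0 && chunk[i - 1] == '\r' && chunk[i] == '\n') {
            return (long)(i + 1);
        }
    }
    return -1;  /* terminator not yet received, keep reading */
}
```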
Implement redis lists commands (RPUSH, RPUSHX, LPUSH, LINDEX, LLEN, LPOP, LSET, RPOP)
Implement the memory based storage I/O interface
Implement tests for the worker_network_io_uring_op interface (it's possible to use as reference the worker_storage_io_uring_op tests)
To improve the overall code quality, stability and security cachegrand should be built using extra options to fortify the builds.
Some work has already been started in an ad-hoc branch adding the necessary build options:
"-Wall"; "-Wextra"; "-Wpedantic"; "-Wformat=2"; "-Wformat-overflow=2"; "-Wformat-truncation=2";
"-Wformat-security"; "-Wnull-dereference"; "-Wstack-protector"; "-Wtrampolines"; "-Walloca"; "-Wvla";
"-Warray-bounds=2"; "-Wimplicit-fallthrough=3"; "-Wtraditional-conversion"; "-Wshift-overflow=2";
"-Wcast-qual"; "-Wstringop-overflow=4"; "-Wconversion"; "-Warith-conversion"; "-Wlogical-op";
"-Wduplicated-cond"; "-Wduplicated-branches"; "-Wformat-signedness"; "-Wshadow"; "-Wstrict-overflow=4";
"-Wundef"; "-Wstrict-prototypes"; "-Wswitch-default"; "-Wswitch-enum"; "-Wstack-usage=1000000";
"-Wcast-align=strict";
"-D_FORTIFY_SOURCE=2";
"-fstack-protector-strong"; "-fstack-clash-protection"; "-fPIE";
"-Wl,-z,relro"; "-Wl,-z,now"; "-Wl,-z,noexecstack"; "-Wl,-z,separate-code";
It's necessary to resolve all the warnings / errors to be able to merge this branch.
Implement packaging for Debian 11 and newer
Use the build docs for reference.
Share as much as possible with the deb package build process
Improve pipeline performance by buffering the data to be sent via network_send
Implement the redis command APPEND, properly implement the SET command, and implement the SET command variations (e.g. SETNX and so on)
Below is the full list of commands to implement:
Command | Notes |
---|---|
✔ APPEND | |
✔ COPY | Missing DB parameter |
✔ DBSIZE | |
✔ DECR | |
✔ DECRBY | |
✔ DEL | |
✔ EXISTS | |
✔ EXPIRE | |
✔ EXPIREAT | |
✔ EXPIRETIME | |
✔ FLUSHDB | Missing ASYNC parameter |
✔ GET | |
✔ GETDEL | |
✔ GETEX | |
✔ GETRANGE | |
✔ GETSET | |
✔ HELLO | Missing AUTH and SETNAME parameters |
✔ INCR | |
✔ INCRBY | |
✔ INCRBYFLOAT | |
✔ KEYS | |
✔ LCS | Missing IDX, MINMATCHLEN and WITHMATCHLEN parameters |
✔ MGET | |
✔ MSET | |
✔ MSETNX | |
✔ PERSIST | |
✔ PEXPIRE | |
✔ PEXPIREAT | |
✔ PEXPIRETIME | |
✔ PING | |
✔ PSETEX | |
✔ PTTL | |
✔ QUIT | |
✔ RANDOMKEY | |
✔ RENAME | |
✔ RENAMENX | |
✔ SCAN | Missing TYPE parameter |
✔ SET | |
✔ SETEX | |
✔ SETNX | |
✔ SETRANGE | |
✔ SHUTDOWN | |
✔ STRLEN | |
✔ SUBSTR | |
✔ TOUCH | |
✔ TTL | |
✔ UNLINK |
When benchmarking, after hundreds of millions of commands, I have noticed some very sporadic protocol parser errors, most likely caused by improper accounting of the processed data in combination with the buffer rewind.
A test should be introduced to verify the behaviour of cachegrand when the buffer has to be rewound and a line terminator falls across the boundary, as it's very likely the issue is caused by that.
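The scenario can be reproduced in isolation: rewind the buffer while the tail ends on a lone '\r', append the next read, and check that both commands are still recognised. The helper names below are hypothetical, not the actual cachegrand parser:

```c
#include <assert.h>
#include <string.h>

/* Rewind step: the unparsed tail (which may end mid-terminator, e.g. with a
 * lone '\r') is moved to the front of the buffer so the next read appends
 * after it and the '\r\n' pair is seen whole. Returns the new data length. */
size_t buffer_rewind(char *buffer, size_t data_length, size_t parsed_offset) {
    size_t tail_length = data_length - parsed_offset;
    memmove(buffer, buffer + parsed_offset, tail_length);
    return tail_length;
}

/* Count complete \r\n-terminated lines, returning the parsed offset. */
size_t parse_lines(const char *buffer, size_t data_length, int *lines) {
    size_t offset = 0;
    for (size_t i = 1; i < data_length; i++) {
        if (buffer[i - 1] == '\r' && buffer[i] == '\n') {
            (*lines)++;
            offset = i + 1;
        }
    }
    return offset;
}
```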
The fiber solution implemented in cachegrand comes with extra complexity and security concerns; although some of these issues have already been addressed (e.g. stack guards), there is still room for wide improvements.
Currently the fiber scheduler implemented in the io_uring I/O interface switches to a client fiber and then back to the scheduler fiber; this second switch can be entirely avoided if the scheduler's context is kept on the thread (e.g. in thread-local storage) instead of on the stack, de facto halving the execution time from about 20ns to about 10ns.
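The idea can be sketched with ucontext standing in for cachegrand's hand-written fiber switch: the scheduler context lives in thread-local storage rather than on a fiber stack, so a client fiber yields straight back to it with a single swap. All names are hypothetical.

```c
#include <assert.h>
#include <ucontext.h>

static __thread ucontext_t scheduler_context;  /* thread-local, not on a fiber stack */
static ucontext_t client_context;
static int resumes = 0;

static void client_fiber(void) {
    resumes++;
    swapcontext(&client_context, &scheduler_context);  /* single direct swap back */
    resumes++;
}

int fiber_demo_run(void) {
    static char stack[64 * 1024];
    getcontext(&client_context);
    client_context.uc_stack.ss_sp = stack;
    client_context.uc_stack.ss_size = sizeof(stack);
    client_context.uc_link = &scheduler_context;  /* return here when the fiber ends */
    makecontext(&client_context, client_fiber, 0);

    swapcontext(&scheduler_context, &client_context);  /* run until first yield */
    swapcontext(&scheduler_context, &client_context);  /* run to completion */
    return resumes;
}
```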