spcl / rfaas

rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations.

Home Page: https://mcopik.github.io/projects/rfaas/

License: BSD 3-Clause "New" or "Revised" License

CMake 4.64% C++ 92.40% Shell 2.96%
serverless faas faas-platform rdma serverless-framework

rfaas's Introduction

rFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing

A high-performance FaaS platform with RDMA acceleration for function invocations.

The cloud paradigm Function-as-a-Service (FaaS) provides the ability to execute stateless and fine-grained functions on elastic and ephemeral resources. However, serverless struggles to achieve the performance needed in high-performance computing: slow invocations, low network bandwidth, and the overheads of the FaaS management system make it difficult to incorporate serverless functions when every millisecond counts. Therefore, we decided to combine the best of both worlds: the elasticity of FaaS and the high performance of cluster batch systems. We built a new FaaS platform with RDMA-accelerated network transport.

rFaaS is a serverless platform redesigned to support high-performance and low-latency invocations with a direct RDMA connection. In rFaaS, the centralized schedulers and API gateway are replaced with a decentralized allocation mechanism. Instead of using a traditional cloud trigger, HPC applications query executor servers, obtain resource allocation and establish RDMA connections to remote workers. Every function is invoked by writing input data directly to the memory of the worker. This allows us to achieve a single-digit microsecond hot invocation latency - hot invocations add less than 350 nanoseconds overhead on top of the fastest available network transmission.
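The hot invocation path can be pictured without any RDMA hardware. The sketch below is a minimal in-process simulation, not rFaaS code: a plain buffer stands in for the worker's registered memory, a `memcpy` followed by an atomic store stands in for the client's RDMA write, and the worker checks its own memory for new input. All names (`InvocationSlot`, `invoke`, `poll_once`) and the "sum of bytes" function are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Stand-in for a worker's registered buffer (hypothetical layout).
struct InvocationSlot {
  std::atomic<uint32_t> ready{0};  // 0 = no request, otherwise input size in bytes
  std::vector<unsigned char> payload;
  explicit InvocationSlot(std::size_t capacity) : payload(capacity) {}
};

// Client side: place the input in the worker's memory, then publish it.
// With real hardware this step would be an RDMA write instead of a memcpy.
inline void invoke(InvocationSlot& slot, const void* input, uint32_t size) {
  assert(size > 0 && size <= slot.payload.size());
  std::memcpy(slot.payload.data(), input, size);
  slot.ready.store(size, std::memory_order_release);
}

// Worker side: one iteration of the hot-polling loop. Returns the function
// result (here: the sum of the input bytes) if a request was waiting.
inline std::optional<uint64_t> poll_once(InvocationSlot& slot) {
  const uint32_t size = slot.ready.load(std::memory_order_acquire);
  if (size == 0) return std::nullopt;  // nothing written yet, keep polling
  uint64_t sum = 0;
  for (uint32_t i = 0; i < size; ++i) sum += slot.payload[i];
  slot.ready.store(0, std::memory_order_release);  // slot is free again
  return sum;
}
```

A hot worker would call `poll_once` in a tight loop; because no scheduler or gateway sits between the write and the poll, the added latency is limited to the cost of noticing the new data.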

To use rFaaS, please read the documentation on software and hardware requirements, installation instructions, and the basic example of using rFaaS. rFaaS comes with a set of benchmark applications and tests. We provide an extended set of C++ serverless functions, including multimedia and ML inference examples from the serverless benchmarking suite SeBS. Finally, you can find more details about rFaaS in the documentation on the system and the client rFaaS library.

Do you have further questions not answered by our documentation? Did you run into trouble installing or using rFaaS? Do you want to use rFaaS in your work and need new features? Feel free to reach us through GitHub issues or by writing to [email protected].

Paper

When using rFaaS, please cite our paper (available as an arXiv preprint). You can find more details about the research work in this paper summary. You can also cite our software repository, using the citation button on the right.

@inproceedings{copik2023rfaas,
  title={{r}FaaS: Enabling High Performance Serverless with RDMA and Leases},
  author={Marcin Copik and Konstantin Taranov and Alexandru Calotoiu and Torsten Hoefler},
  year={2023},
  series = {IPDPS '23},
  booktitle = {Proceedings of the 37th IEEE International Parallel and Distributed Processing Symposium},
  eprint={2106.13859},
}

Requirements

Hardware

rFaaS supports SoftROCE and RoCE RDMA NICs with the help of ibverbs. Evaluation and testing with IB fabric is currently in progress.

In future versions, we plan for rFaaS to support Cray interconnect through libfabric and its ugni provider.

Software

Currently, rFaaS works only on Linux systems as we rely heavily on POSIX interfaces. We require the following libraries and tools:

  • CMake >= 3.11.
  • C++ compiler with C++17 support.
  • libibverbs with headers installed.
  • librdmacm with headers installed.
  • pistache - HTTP and REST framework.

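Furthermore, we fetch and build additional dependencies during the CMake build, unless they are already found in the system. These include cxxopts and spdlog (see the CXXOPTS_PATH and SPDLOG_PATH options in the installation section).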

Containers

rFaaS supports two types of function executors: a bare-metal process and a Docker container. For Docker, we use the SR-IOV plugin from Mellanox to run Docker-based function executors with virtual NIC device functions. Please follow Mellanox documentation and instructions to install and configure the plugin. rFaaS expects the docker_rdma_sriov binary to be available in PATH.

In future versions, we plan to support Singularity containers and offer a simpler but less secure Docker networking option.

Installation

To build rFaaS, run the following CMake configuration:

cmake -DCMAKE_CXX_COMPILER=<your-cxx-compiler> -DCMAKE_BUILD_TYPE=Release <source-dir>
cmake --build .

To enable more verbose logging, change the CMake configuration parameter to: -DCMAKE_BUILD_TYPE=Debug.

The CMake installation has the following optional configuration parameters.

Arguments

  • WITH_EXAMPLES (EXPERIMENTAL): Build additional examples (see the examples subsection for details on additional dependencies).
  • WITH_TESTING (EXPERIMENTAL): Enable testing. Requires providing the JSON testing configuration as the value of this flag. See the testing subsection for details.
  • CXXOPTS_PATH: Path to an existing installation of the cxxopts library; disables the automatic fetch and build of the library.
  • SPDLOG_PATH: Path to an existing installation of the spdlog library; disables the automatic fetch and build of the library.
  • LIBRDMACM_PATH: Path to an installation directory of the librdmacm library.

Usage

To learn how to use rFaaS, please follow the tutorial.

For an in-depth analysis of each component and their configuration, please look at the system documentation.

Authors

rfaas's People

Contributors

marchrap, mattnappo, mcopik, taranovk, william-mou, yuanmxc

rfaas's Issues

Add dependency on read-writer queue

We should add readerwriterqueue as a proper dependency in the CMake build. Furthermore, we want to fetch and build Pistache to avoid the trouble of forcing users to install it manually.

  • Fetch the dependency with CMake.
  • Ensure that include directories are properly configured.
  • Add in the README proper acknowledgment of the other project.

Add testing framework

We need a custom system to execute integration and system tests without mocking.

  • Add device configuration (#1)
  • Add user-based configuration of testing endpoints
  • Integrate gtest with CMake
  • Add test: basic allocation
  • Add test: random-based allocation
  • Add test: simple invocation
  • Add test: warm invocations
  • Add test: async invocations
  • Add test: parallel invocations

Add documentation

We have no documentation whatsoever:

  • Citation + intro
  • Installation & dependencies
  • Supported devices and platforms
  • Platform configuration
  • Pointers to installing software emulation for RDMA (softROCE).
  • Limitations
  • Examples of using our tool
  • Authors

Correct executor shutdown

Currently, we use a repetition limit for executor shutdown, as it helps us with designing benchmarks. However, in a real deployment, we should use the normal shutdown signal from the client.

Add Docker registry

We are wrapping up the support for using Docker images and rFaaS (#10). However, we only use DockerHub, which has multiple restrictions - throttling and network latency. We want to locally deploy a Docker registry.

  • Add scripts for deploying the Docker registry and generating JSON configuration.
  • Add configuration of Docker registry in the rFaaS executor manager configuration.
  • Add pulling of Docker images on container allocation in the executor manager - only when the image is not already present.
  • Support registry credentials.

Add support for Singularity containers

Singularity is a containerization technology popular in HPC. rFaaS should support running functions inside Singularity containers.

Tasks:

  • Add a switch for choosing between Singularity and Docker.
  • Add JSON configuration (network, CPU affinity, memory).
  • Add configuration of deployment of images.
  • Invoke containers.
  • Correctly handle the shutdown.

Documentation generator

We should automatically generate docs from the code.

  • Add documentation generator
  • Integrate with CMake
  • Integrate with CI
  • Add missing annotations

Add network backend with libfabric

We want to extend rFaaS with support for Cray interconnects. The proper way would be to add a libfabric implementation with the ugni provider.

  • Isolate ibverbs/rdmacm network code and define network lib interface.
  • Initialize ugni with the help of cray-mpi
  • Implement basic P2P communication.
  • Implement and test atomic operations.
  • Evaluate communication between processes from different SLURM jobs (credentials exchange).
  • Extend test suite.

Invocation of user-defined Docker containers

We are currently missing two functionalities - local Docker registries and configuration of container launch.

  • Add Dockerfiles for base images with executor.
  • Pass Docker image name - right now, it's hardcoded.
  • Disable loading .so from memory
  • Add a Docker-based index of the function.

Correct port selection

Every service allows the OS to select the port, but the service does not update its port value afterwards and distributes the incorrect default value from the CLI.
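The usual remedy is to ask the OS which port it actually assigned before advertising the address to anyone else. The sketch below shows this with a plain TCP socket on the loopback interface; it is an illustration under that assumption, not the rFaaS code path, which binds through the RDMA connection manager and may need a different query. `BoundSocket` and `bind_and_resolve_port` are hypothetical names.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical result type: the open socket and the port it is bound to.
struct BoundSocket {
  int fd;
  uint16_t port;
};

// Binds to `requested_port` (0 = let the OS choose) and returns the port that
// was really assigned, so that callers never distribute the placeholder value.
inline BoundSocket bind_and_resolve_port(uint16_t requested_port) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) throw std::runtime_error("socket() failed");

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(requested_port);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    ::close(fd);
    throw std::runtime_error("bind() failed");
  }

  // Query the address after binding: this is where port 0 becomes a real port.
  socklen_t len = sizeof(addr);
  if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    ::close(fd);
    throw std::runtime_error("getsockname() failed");
  }
  assert(addr.sin_family == AF_INET);
  return BoundSocket{fd, ntohs(addr.sin_port)};
}
```

The caller keeps `fd` open for the lifetime of the service and publishes `port` rather than the value passed on the command line.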

[libfabric-sarus branch] --skip-resource-manager flag still requires an ip & port in config

In the libfabric-sarus branch, the executor manager will crash if the resource_manager_address or resource_manager_port fields in config/executor_manager.json are not set. This crash still happens even when the executor manager is run with --skip-resource-manager. I can work on a fix at some point, but I just wanted to document the bug.

Code breaks here: https://github.com/spcl/rFaaS/blob/libfabric-sarus/server/executor_manager/manager.cpp#L28

Compilation fails on archlinux

I tried to compile the latest master branch on my machine, but the build failed because of the spdlog dependency: the spdlog version installed by the Arch Linux package manager does not match the one the current code expects. We should bring spdlog into the project as a git submodule instead of relying on the system installation.
I'm willing to submit a PR for this.

OS : ArchLinux 6.2.2
spdlog version : 1.11.0-2

(compile error) unimplemented: non-trivial designated initializers not supported

Hi, really excellent work! For our research, we wanted to run a performance comparison with rFaaS, but I got an error when compiling.

OS: Ubuntu 18.04.6 LTS
Kernel: Linux 4.15.0-46-generic
MLNX_OFED_LINUX-4.9-3.1.5.0 with 100Gbs Infiniband
g++-7.5.0

I use rm -rf CMakeFiles && rm CMakeCache.txt && cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release /root/rFaaS && cmake --build .

Then I got a compile error:
...
[ 50%] Building CXX object CMakeFiles/executor_manager.dir/server/executor_manager/cli.cpp.o
In file included from /root/rFaaS/server/executor_manager/cli.cpp:16:0:
/root/rFaaS/server/executor_manager/manager.hpp: In member function ‘void rfaas::executor_manager::ResourceManagerConnection::close_lease(int32_t, uint64_t, uint64_t, uint64_t)’:
/root/rFaaS/server/executor_manager/manager.hpp:82:7: sorry, unimplemented: non-trivial designated initializers not supported
};
^
/root/rFaaS/server/executor_manager/manager.hpp:82:7: sorry, unimplemented: non-trivial designated initializers not supported
/root/rFaaS/server/executor_manager/manager.hpp:82:7: sorry, unimplemented: non-trivial designated initializers not supported
/root/rFaaS/server/executor_manager/manager.hpp:82:7: sorry, unimplemented: non-trivial designated initializers not supported

/root/rFaaS/server/executor_manager/manager.hpp:82:7: warning: missing initializer for member ‘rfaas::common::LeaseDeallocation::execution_time’ [-Wmissing-field-initializers]
/root/rFaaS/server/executor_manager/manager.hpp:75:49: warning: parameter ‘allocation_time’ set but not used [-Wunused-but-set-parameter]
void close_lease(int32_t lease_id, uint64_t allocation_time, uint64_t execution_time, uint64_t hot_polling_time)
^~~~~~~~~~~~~~~
/root/rFaaS/server/executor_manager/manager.hpp:75:75: warning: parameter ‘execution_time’ set but not used [-Wunused-but-set-parameter]
void close_lease(int32_t lease_id, uint64_t allocation_time, uint64_t execution_time, uint64_t hot_polling_time)
^~~~~~~~~~~~~~
/root/rFaaS/server/executor_manager/manager.hpp:75:100: warning: parameter ‘hot_polling_time’ set but not used [-Wunused-but-set-parameter]
void close_lease(int32_t lease_id, uint64_t allocation_time, uint64_t execution_time, uint64_t hot_polling_time)
^~~~~~~~~~~~~~~
CMakeFiles/executor_manager.dir/build.make:75: recipe for target 'CMakeFiles/executor_manager.dir/server/executor_manager/cli.cpp.o' failed

How to solve it please?
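The `sorry, unimplemented` message appears to come from GCC 7, which accepts designated initializers in C++ only in a restricted form; they became part of the standard with C++20, and newer GCC releases implement them. A likely workaround, assuming a newer compiler is not an option, is to value-initialize the structure and assign its members one by one. The sketch below uses a hypothetical `LeaseDeallocation` layout inferred from the names in the log; the real structure in rFaaS may differ, and `make_deallocation` is an illustrative helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout; only the member names are taken from the compiler log.
struct LeaseDeallocation {
  int32_t lease_id;
  uint64_t allocation_time;
  uint64_t hot_polling_time;
  uint64_t execution_time;
};

// Portable C++17 replacement for a designated-initializer expression such as
//   LeaseDeallocation{.lease_id = id, .allocation_time = t, ...}
inline LeaseDeallocation make_deallocation(int32_t lease_id,
                                           uint64_t allocation_time,
                                           uint64_t execution_time,
                                           uint64_t hot_polling_time) {
  LeaseDeallocation msg{};  // value-initialization zeroes every member
  msg.lease_id = lease_id;
  msg.allocation_time = allocation_time;
  msg.execution_time = execution_time;
  msg.hot_polling_time = hot_polling_time;
  return msg;
}

// Small usage example: members are set by name, independent of their order.
inline void deallocation_example() {
  const LeaseDeallocation msg = make_deallocation(7, 100, 200, 300);
  assert(msg.lease_id == 7 && msg.execution_time == 200);
}
```

Assigning by member name also uses all three time parameters, which would address the `-Wunused-but-set-parameter` warnings shown in the log.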

Free device selection

Currently, we require the user to specify the IP address of the interface. The user should be able to specify just the device name.
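One possible building block is a lookup from an interface name to its IPv4 address. The sketch below does this with the POSIX/Linux `getifaddrs` call. It assumes that "device name" means a network interface name such as `eth0`; mapping an ibverbs device name to its network interface is a separate step that is not covered here. `ipv4_of_interface` is a hypothetical helper name.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <optional>
#include <string>

// Returns the first IPv4 address assigned to the interface `name`, or nothing
// if the interface does not exist or has no IPv4 address.
inline std::optional<std::string> ipv4_of_interface(const std::string& name) {
  ifaddrs* list = nullptr;
  if (getifaddrs(&list) != 0) return std::nullopt;

  std::optional<std::string> result;
  for (ifaddrs* it = list; it != nullptr; it = it->ifa_next) {
    if (it->ifa_addr == nullptr || it->ifa_addr->sa_family != AF_INET) continue;
    if (name != it->ifa_name) continue;
    char buf[INET_ADDRSTRLEN];
    const auto* addr = reinterpret_cast<const sockaddr_in*>(it->ifa_addr);
    if (inet_ntop(AF_INET, &addr->sin_addr, buf, sizeof(buf)) != nullptr) {
      result = std::string(buf);
      break;
    }
  }
  freeifaddrs(list);  // the list is owned by the caller and must be released
  return result;
}

// Small usage example: an unknown interface yields an empty result.
inline void interface_lookup_example() {
  assert(!ipv4_of_interface("no-such-device-0").has_value());
}
```

A service could accept either form on the command line and fall back to this lookup when the argument does not parse as an IP address.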

Finish resource manager implementation

The current implementation of resource manager doesn't satisfy all of the requirements.

  • Correctly distribute executor data to clients.
  • Preload executor data from file for simplified testing.
  • Distinguish between client and executor connections.
  • Test multiple clients.
  • Test multi-executor allocation
  • Receive accounting data from an executor.

Spin off rdmalib

Internally, we have developed a small library that provides high-level concepts for verbs programming - allocators, server-client abstractions, buffer queues, etc. We should spin it off as a separate project since it has the potential to be used by many other applications, not only by rFaaS.

Add C++ allocator

Currently, we expose a very low-level interface for allocating memory - as shown in this documentation:

  rdmalib::Buffer<char> in(opts.input_size, rdmalib::functions::Submission::DATA_HEADER_SIZE), out(opts.input_size);
  in.register_memory(executor._state.pd(), IBV_ACCESS_LOCAL_WRITE);
  out.register_memory(executor._state.pd(), IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);

While rdmalib::Buffer is applicable, we need a higher-level concept based on the C++ std::allocator to integrate RDMA-managed memory into user applications.

  • Add allocator implementation in rfaaslib.
  • Encapsulate the memory registration in rdmalib - provide generic enums for local/remote read/write ops + atomics.
  • Support registration and deregistration of memory in the allocator.
  • Add test demonstrating standard memory allocation.
  • Add test demonstrating allocation with std::vector and our custom allocator.
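The tasks above can be sketched as a minimal C++17 allocator that registers memory on allocation and deregisters it on release. This is an illustration under stated assumptions, not the planned rfaaslib interface: `RegistrationLog` and `RegisteringAllocator` are hypothetical names, and the counters only mark the places where a real implementation would call `ibv_reg_mr` and `ibv_dereg_mr` with a protection domain and access flags.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for memory registration bookkeeping.
struct RegistrationLog {
  std::size_t registered = 0;
  std::size_t deregistered = 0;
  std::size_t live_bytes = 0;
};

template <typename T>
class RegisteringAllocator {
 public:
  using value_type = T;

  explicit RegisteringAllocator(RegistrationLog* log) noexcept : log_(log) {}

  // Rebinding constructor required by the standard Allocator requirements.
  template <typename U>
  RegisteringAllocator(const RegisteringAllocator<U>& other) noexcept
      : log_(other.log()) {}

  T* allocate(std::size_t n) {
    // Plain operator new is sufficient for types without extended alignment.
    void* p = ::operator new(n * sizeof(T));
    ++log_->registered;  // a real allocator would register [p, p + bytes) here
    log_->live_bytes += n * sizeof(T);
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t n) noexcept {
    assert(log_->live_bytes >= n * sizeof(T));
    ++log_->deregistered;  // a real allocator would deregister the region here
    log_->live_bytes -= n * sizeof(T);
    ::operator delete(p);
  }

  RegistrationLog* log() const noexcept { return log_; }

 private:
  RegistrationLog* log_;
};

// Allocators compare equal when memory from one can be released by the other.
template <typename T, typename U>
bool operator==(const RegisteringAllocator<T>& a,
                const RegisteringAllocator<U>& b) noexcept {
  return a.log() == b.log();
}

template <typename T, typename U>
bool operator!=(const RegisteringAllocator<T>& a,
                const RegisteringAllocator<U>& b) noexcept {
  return !(a == b);
}
```

With such a type, `std::vector<int, RegisteringAllocator<int>>` keeps its storage in registered memory without the user calling any registration function directly. Note that every reallocation of the vector triggers a new registration, so reserving capacity up front matters more than with the default allocator.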

Automatic configuration of devices and platform

We should have a script that simplifies the deployment of the platform.

  • Create JSON with device data
  • Add script that checks the max size of inline data
  • Add user-configured JSON for testing
