spcl / rfaas

rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations.

Home Page: https://mcopik.github.io/projects/rfaas/

License: BSD 3-Clause "New" or "Revised" License

CMake 4.64% C++ 92.40% Shell 2.96%
serverless faas faas-platform rdma serverless-framework

rfaas's Issues

Free device selection

Currently, we require the user to specify the IP address of the interface. The user should be able to specify just the device name instead.
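
One possible direction, sketched under the assumption that "device name" means a Linux network interface name such as eth0 or ib0: resolve the name to its IPv4 address with getifaddrs and pass the result to the existing IP-based code path. The helper name ipv4_of_interface is hypothetical and not part of rFaaS, and mapping an RDMA device name (as reported by the verbs library) to an interface is a separate step not shown here.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <optional>
#include <string>

// Hypothetical helper (not part of rFaaS): returns the first IPv4 address
// assigned to the interface `ifname`, or std::nullopt if none is found.
std::optional<std::string> ipv4_of_interface(const std::string& ifname)
{
  ifaddrs* list = nullptr;
  if (getifaddrs(&list) != 0)
    return std::nullopt;

  std::optional<std::string> result;
  for (ifaddrs* it = list; it != nullptr; it = it->ifa_next) {
    // Skip entries without an address and entries that are not IPv4.
    if (it->ifa_addr == nullptr || it->ifa_addr->sa_family != AF_INET)
      continue;
    if (ifname != it->ifa_name)
      continue;

    char buf[INET_ADDRSTRLEN];
    auto* addr = reinterpret_cast<sockaddr_in*>(it->ifa_addr);
    if (inet_ntop(AF_INET, &addr->sin_addr, buf, sizeof(buf)) != nullptr) {
      result = std::string(buf);
      break;
    }
  }
  freeifaddrs(list);
  return result;
}
```

A caller could try ipv4_of_interface(name) first and fall back to the explicit IP address option when it returns an empty value.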

Compilation fails on archlinux

I tried to compile the latest master branch on my machine, but the build fails because of the spdlog dependency: the spdlog version installed by the Arch Linux package manager does not match what the current code expects. We should include spdlog as a git submodule instead of relying on the system-provided version.
I'm willing to submit a PR for this.

OS : ArchLinux 6.2.2
spdlog version : 1.11.0-2

[libfabric-sarus branch] --skip-resource-manager flag still requires an ip & port in config

In the libfabric-sarus branch, the executor manager will crash if the resource_manager_address or resource_manager_port fields in config/executor_manager.json are not set. This crash still happens even when the executor manager is run with --skip-resource-manager. I can work on a fix at some point, but I just wanted to document the bug.

Code breaks here: https://github.com/spcl/rFaaS/blob/libfabric-sarus/server/executor_manager/manager.cpp#L28
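
A minimal sketch of the intended behavior, assuming the configuration can be represented with optional fields: the resource manager address and port are validated only when the resource manager is actually used. The ManagerConfig struct and the validate function are hypothetical and do not mirror the real code in manager.cpp.

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical, simplified view of config/executor_manager.json.
struct ManagerConfig {
  std::optional<std::string> resource_manager_address;
  std::optional<uint16_t> resource_manager_port;
};

// Throws only when the resource manager is in use and its endpoint is missing.
void validate(const ManagerConfig& cfg, bool skip_resource_manager)
{
  if (skip_resource_manager)
    return;  // Fields are irrelevant in this mode, so do not touch them.

  if (!cfg.resource_manager_address || !cfg.resource_manager_port) {
    throw std::invalid_argument(
      "resource_manager_address and resource_manager_port are required "
      "unless --skip-resource-manager is set");
  }
}
```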

Spin off rdmalib

Internally, we have developed a small library that provides high-level concepts for verbs programming - allocators, server-client abstractions, buffer queues, etc. We should spin it off as a separate project since it has the potential to be used by many other applications, not only by rFaaS.

Add testing framework

We need a custom system to execute integration and system tests without mocking.

  • Add device configuration (#1)
  • Add user-based configuration of testing endpoints
  • Integrate gtest with CMake
  • Add test: basic allocation
  • Add test: random-based allocation
  • Add test: simple invocation
  • Add test: warm invocations
  • Add test: async invocations
  • Add test: parallel invocations

Add C++ allocator

Currently, we expose a very low-level interface for allocating memory - as shown in this documentation:

  rdmalib::Buffer<char> in(opts.input_size, rdmalib::functions::Submission::DATA_HEADER_SIZE), out(opts.input_size);
  in.register_memory(executor._state.pd(), IBV_ACCESS_LOCAL_WRITE);
  out.register_memory(executor._state.pd(), IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);

While rdmalib::Buffer is applicable, we need a higher-level concept based on the C++ std::allocator to integrate RDMA-managed memory into user applications.

  • Add allocator implementation in rfaaslib.
  • Encapsulate the memory registration in rdmalib - provide generic enums for local/remote read/write ops + atomics.
  • Support registration and deregistration of memory in the allocator.
  • Add test demonstrating standard memory allocation.
  • Add test demonstrating allocation with std::vector and our custom allocator.
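
The tasks above could take roughly the following shape. This is only a sketch under stated assumptions: the Access enum, the RegistrationHooks struct, and the RdmaAllocator class are hypothetical names, and the hooks stand in for the real registration calls (for example, something that wraps ibv_reg_mr and ibv_dereg_mr inside rdmalib). Over-aligned types are not handled.

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <vector>

// Generic access flags (hypothetical names); the library would translate
// them into verbs flags such as IBV_ACCESS_LOCAL_WRITE.
enum class Access : unsigned { LocalWrite = 1, RemoteRead = 2, RemoteWrite = 4, RemoteAtomic = 8 };

constexpr Access operator|(Access a, Access b)
{
  return static_cast<Access>(static_cast<unsigned>(a) | static_cast<unsigned>(b));
}

// Hypothetical hooks standing in for memory registration and deregistration.
struct RegistrationHooks {
  std::function<void(void*, std::size_t, Access)> on_register;
  std::function<void(void*)> on_deregister;
};

// Minimal allocator: every allocation is registered, every deallocation
// is deregistered before the memory is released.
template <typename T>
class RdmaAllocator {
public:
  using value_type = T;

  RdmaAllocator(const RegistrationHooks* hooks, Access access) noexcept
    : hooks_(hooks), access_(access) {}

  // Converting constructor, needed when containers rebind the allocator.
  template <typename U>
  RdmaAllocator(const RdmaAllocator<U>& other) noexcept
    : hooks_(other.hooks()), access_(other.access()) {}

  T* allocate(std::size_t n)
  {
    void* ptr = ::operator new(n * sizeof(T));
    hooks_->on_register(ptr, n * sizeof(T), access_);
    return static_cast<T*>(ptr);
  }

  void deallocate(T* ptr, std::size_t) noexcept
  {
    hooks_->on_deregister(ptr);
    ::operator delete(ptr);
  }

  const RegistrationHooks* hooks() const noexcept { return hooks_; }
  Access access() const noexcept { return access_; }

private:
  const RegistrationHooks* hooks_;
  Access access_;
};

template <typename T, typename U>
bool operator==(const RdmaAllocator<T>& a, const RdmaAllocator<U>& b) noexcept
{
  return a.hooks() == b.hooks() && a.access() == b.access();
}

template <typename T, typename U>
bool operator!=(const RdmaAllocator<T>& a, const RdmaAllocator<U>& b) noexcept
{
  return !(a == b);
}
```

With such a type, a std::vector<char, RdmaAllocator<char>> constructed from an allocator instance would register its storage on every growth, which is the behavior the last test in the list is meant to demonstrate.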

Correct port selection

Each service allows the OS to select the port, but the service does not update its stored port afterwards and keeps distributing the incorrect default value from the CLI.
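
The usual fix is to query the bound address after binding and to advertise that value. The sketch below illustrates the idea with a plain TCP socket on the loopback interface, which is an assumption made for the sake of a self-contained example; the rFaaS services set up their listeners through the RDMA connection manager, where the query would look different. The function name bind_and_get_port is hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <optional>

// Binds `fd` to the loopback address. A requested port of 0 lets the OS
// choose; the port that was actually assigned is returned on success.
std::optional<uint16_t> bind_and_get_port(int fd, uint16_t requested_port)
{
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(requested_port);
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0)
    return std::nullopt;

  // Ask the kernel which port was assigned instead of trusting the CLI value.
  sockaddr_in bound{};
  socklen_t len = sizeof(bound);
  if (getsockname(fd, reinterpret_cast<sockaddr*>(&bound), &len) != 0)
    return std::nullopt;

  return static_cast<uint16_t>(ntohs(bound.sin_port));
}
```

The returned value, not the CLI default, is what should be stored and distributed to other components.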

Documentation generator

We should automatically generate docs from the code.

  • Add documentation generator
  • Integrate with CMake
  • Integrate with CI
  • Add missing annotations

Add Docker registry

We are wrapping up support for using Docker images with rFaaS (#10). However, we only use DockerHub, which has multiple restrictions, such as throttling and network latency. We want to deploy a Docker registry locally.

  • Add scripts for deploying the Docker registry and generating JSON configuration.
  • Add configuration of Docker registry in the rFaaS executor manager configuration.
  • Add pulling of Docker images on container allocation in the executor manager - only when the image is not already present.
  • Support registry credentials.

Add network backend with libfabric

We want to extend rFaaS with support for Cray interconnects. The proper way would be to add a libfabric implementation with the ugni provider.

  • Isolate ibverbs/rdmacm network code and define network lib interface.
  • Initialize ugni with the help of cray-mpi
  • Implement basic P2P communication.
  • Implement and test atomic operations.
  • Evaluate communication between processes from different SLURM jobs (credentials exchange).
  • Extend test suite.

(compile error) unimplemented: non-trivial designated initializers not supported

Hi, really excellent work! For our research, we wanted to run a performance comparison against rFaaS, but I got an error when compiling.

OS: Ubuntu 18.04.6 LTS
Kernel: Linux 4.15.0-46-generic
MLNX_OFED_LINUX-4.9-3.1.5.0 with 100 Gb/s InfiniBand
g++-7.5.0

I ran: rm -rf CMakeFiles && rm CMakeCache.txt && cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Release /root/rFaaS && cmake --build .

Then I got a compile error:
...
[ 50%] Building CXX object CMakeFiles/executor_manager.dir/server/executor_manager/cli.cpp.o
In file included from /root/rFaaS/server/executor_manager/cli.cpp:16:0:
/root/rFaaS/server/executor_manager/manager.hpp: In member function ‘void rfaas::executor_manager::ResourceManagerConnection::close_lease(int32_t, uint64_t, uint64_t, uint64_t)’:
/root/rFaaS/server/executor_manager/manager.hpp:82:7: sorry, unimplemented: non-trivial designated initializers not supported
};
^
/root/rFaaS/server/executor_manager/manager.hpp:82:7: sorry, unimplemented: non-trivial designated initializers not supported
/root/rFaaS/server/executor_manager/manager.hpp:82:7: sorry, unimplemented: non-trivial designated initializers not supported
/root/rFaaS/server/executor_manager/manager.hpp:82:7: sorry, unimplemented: non-trivial designated initializers not supported

/root/rFaaS/server/executor_manager/manager.hpp:82:7: warning: missing initializer for member ‘rfaas::common::LeaseDeallocation::execution_time’ [-Wmissing-field-initializers]
/root/rFaaS/server/executor_manager/manager.hpp:75:49: warning: parameter ‘allocation_time’ set but not used [-Wunused-but-set-parameter]
void close_lease(int32_t lease_id, uint64_t allocation_time, uint64_t execution_time, uint64_t hot_polling_time)
^~~~~~~~~~~~~~~
/root/rFaaS/server/executor_manager/manager.hpp:75:75: warning: parameter ‘execution_time’ set but not used [-Wunused-but-set-parameter]
void close_lease(int32_t lease_id, uint64_t allocation_time, uint64_t execution_time, uint64_t hot_polling_time)
^~~~~~~~~~~~~~
/root/rFaaS/server/executor_manager/manager.hpp:75:100: warning: parameter ‘hot_polling_time’ set but not used [-Wunused-but-set-parameter]
void close_lease(int32_t lease_id, uint64_t allocation_time, uint64_t execution_time, uint64_t hot_polling_time)
^~~~~~~~~~~~~~~
CMakeFiles/executor_manager.dir/build.make:75: recipe for target 'CMakeFiles/executor_manager.dir/server/executor_manager/cli.cpp.o' failed

How can I solve this?
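
The "sorry, unimplemented" message comes from the compiler itself: GCC 7 supports only a restricted form of designated initializers in C++, and the initializer in close_lease apparently falls outside it. If switching to a newer compiler is not an option, one workaround is to value-initialize the struct and assign the members one by one. The sketch below is an assumption-laden illustration: the struct is a hypothetical stand-in for rfaas::common::LeaseDeallocation, and only the execution_time member is confirmed by the log above.

```cpp
#include <cstdint>

// Hypothetical stand-in for rfaas::common::LeaseDeallocation. Only
// `execution_time` appears in the compiler output; the other members
// and their order are assumed for illustration.
struct LeaseDeallocation {
  int32_t lease_id;
  uint64_t allocation_time;
  uint64_t hot_polling_time;
  uint64_t execution_time;
};

// Builds the message without designated initializers, so the code does
// not depend on compiler support for them.
LeaseDeallocation make_deallocation(int32_t lease_id, uint64_t allocation_time,
                                    uint64_t execution_time, uint64_t hot_polling_time)
{
  LeaseDeallocation msg{};  // Value-initialization zeroes every member.
  msg.lease_id = lease_id;
  msg.allocation_time = allocation_time;
  msg.hot_polling_time = hot_polling_time;
  msg.execution_time = execution_time;
  return msg;
}
```

Member-wise assignment is independent of the declaration order of the fields, which also avoids the missing-initializer warning shown in the log.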

Automatic configuration of devices and platform

We should have a script that simplifies the deployment of the platform.

  • Create JSON with device data
  • Add script that checks the max size of inline data
  • Add user-configured JSON for testing

Add documentation

We have no documentation whatsoever:

  • Citation + intro
  • Installation & dependencies
  • Supported devices and platforms
  • Platform configuration
  • Pointers to installing software emulation for RDMA (softROCE).
  • Limitations
  • Examples of using our tool
  • Authors

Finish resource manager implementation

The current implementation of the resource manager doesn't satisfy all of the requirements.

  • Correctly distribute executor data to clients.
  • Preload executor data from file for simplified testing.
  • Distinguish between client and executor connections.
  • Test multiple clients.
  • Test multi-executor allocation
  • Receive accounting data from an executor.

Add support for Singularity containers

Singularity is a containerization technology popular in HPC. rFaaS should support running functions inside Singularity containers.

Tasks:

  • Add a switch for choosing between Singularity and Docker.
  • Add JSON configuration (network, CPU affinity, memory).
  • Add configuration of deployment of images.
  • Invoke containers.
  • Correctly handle the shutdown.

Correct executor shutdown

Currently, we use a repetition limit for executor shutdown, as it helps us with designing benchmarks. However, in a real deployment, we should use the normal shutdown signal from the client.

Invocation of user-defined Docker containers

We are currently missing two functionalities - local Docker registries and configuration of container launch.

  • Add Dockerfiles for base images with executor.
  • Pass Docker image name - right now, it's hardcoded.
  • Disable loading .so from memory
  • Add a Docker-based index of the function.

Add dependency on read-writer queue

We should add readerwriterqueue as a proper dependency in the CMake build. Furthermore, we want to fetch and build Pistache to avoid the trouble of forcing users to install it manually.

  • Fetch the dependency with CMake.
  • Ensure that include directories are properly configured.
  • Add proper acknowledgment of the other project to the README.
