Infiniswap enables unmodified applications to efficiently use disaggregated memory.

Infiniswap: Efficient Memory Disaggregation

Infiniswap is a remote memory paging system designed specifically for an RDMA network. It opportunistically harvests and transparently exposes unused memory to unmodified applications by dividing the swap space of each machine into many slabs and distributing them across many machines' remote memory. Because one-sided RDMA operations bypass remote CPUs, Infiniswap leverages the power of many choices to perform decentralized slab placements and evictions.

Extensive benchmarks on workloads from memory-intensive applications ranging from in-memory databases such as VoltDB and Memcached to popular big data software Apache Spark, PowerGraph, and GraphX show that Infiniswap provides order-of-magnitude performance improvements when working sets do not completely fit in memory. Simultaneously, it boosts cluster memory utilization by almost 50%.

Detailed design and performance benchmarks are available in our NSDI'17 paper.

Prerequisites

The following prerequisites are required to use Infiniswap:

  • Software

    • Operating system: Ubuntu 14.04 (kernel 3.13.0, also tested on 4.4.0/4.11.0)
    • Container: LXC (or any other container technology) with the memory and swap cgroups enabled
    • RDMA NIC driver: MLNX_OFED 3.2/3.3/3.4/4.1 (4.1 recommended); select the version that matches your operating system.
  • Hardware

    • Mellanox ConnectX-3/4 (InfiniBand)
    • An empty and unused disk partition

Code Organization

The Infiniswap codebase is organized under three directories.

  • infiniswap_bd: Infiniswap block device (kernel module).
  • infiniswap_daemon: Infiniswap daemon (user-level process) that exposes its local memory as remote memory.
  • setup: scripts for setup and installation.

Important Parameters

There are several important parameters to configure in Infiniswap:

  • Infiniswap block device (in infiniswap_bd/infiniswap.h)

    1. BACKUP_DISK [disk partition]
      The name of the disk partition that serves as the backup disk of the Infiniswap block device.
      To list the disk partitions and check their status, run:
      "sudo fdisk -l"
    2. STACKBD_SIZE_G [size in GB]
      It defines the size of Infiniswap block device (also backup disk).
    3. MAX_SGL_LEN [num of pages]
      It specifies how many pages can be included in a single swap-out request (IO request).
    4. BIO_PAGE_CAP [num of pages]
      It limits the maximum value of MAX_SGL_LEN.
    5. MAX_MR_SIZE_GB [size]
      It sets the maximum number of slabs from a single Infiniswap daemon. Each slab is 1GB.
    // example, in "infiniswap.h" 
    #define BACKUP_DISK "/dev/sda4"  
    #define STACKBD_SIZE_G 12  // 12GB
    #define MAX_SGL_LEN 32  // 32 x 4KB = 128KB, it's the max size for a single "struct bio" object.
    #define BIO_PAGE_CAP 32
    #define MAX_MR_SIZE_GB 32 //this infiniswap block device can get 32 slabs from each infiniswap daemon.
  • Infiniswap daemon (in infiniswap_daemon/rdma-common.h)

    1. MAX_FREE_MEM_GB [size]
      It is the maximum size (in GB) of remote memory this daemon can provide (from free memory of the local host).
    2. MAX_MR_SIZE_GB [size]
      It limits the maximum number of slabs this daemon can provide to a single infiniswap block device.
      This value should be the same as "MAX_MR_SIZE_GB" in "infiniswap.h".
    3. MAX_CLIENT [number]
      It defines how many infiniswap block devices a single daemon can connect to.
    4. FREE_MEM_EVICT_THRESHOLD [size in GB]
      This is the "HeadRoom" mentioned in our paper.
      When the remaining free memory of the host machine falls below this threshold, the infiniswap daemon starts to evict mapped slabs.
    // example, in "rdma-common.h" 
    #define MAX_CLIENT 32     
    
    /* The following values should be set based on the
    * memory information (DRAM capacity, regular memory usage, ...)
    * of the machine that hosts the infiniswap daemon.
    */
    #define MAX_FREE_MEM_GB 32    
    #define MAX_MR_SIZE_GB  32    
    #define FREE_MEM_EVICT_THRESHOLD 8    
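
As a sanity check on the two headers, a small shell sketch like the one below could read the #define values and verify two constraints stated above: MAX_SGL_LEN must not exceed BIO_PAGE_CAP, and MAX_MR_SIZE_GB should match between infiniswap.h and rdma-common.h. The function names (get_define, check_bd_params, check_mr_size_match) are hypothetical helpers and not part of Infiniswap. The parser assumes simple one-line "#define NAME VALUE" entries, and the size calculation assumes 4KB pages.

```shell
# Print the value of a "#define NAME VALUE" line from a C header.
# Assumes one definition per line; trailing comments are ignored.
get_define() {
    awk -v name="$2" '$1 == "#define" && $2 == name { print $3; exit }' "$1"
}

# Check that MAX_SGL_LEN does not exceed BIO_PAGE_CAP and report the
# largest swap-out request size, assuming 4KB pages.
check_bd_params() {
    header=$1
    sgl=$(get_define "$header" MAX_SGL_LEN)
    cap=$(get_define "$header" BIO_PAGE_CAP)
    if [ -z "$sgl" ] || [ -z "$cap" ]; then
        echo "missing MAX_SGL_LEN or BIO_PAGE_CAP in $header" >&2
        return 1
    fi
    if [ "$sgl" -gt "$cap" ]; then
        echo "MAX_SGL_LEN ($sgl) exceeds BIO_PAGE_CAP ($cap)" >&2
        return 1
    fi
    echo "max request size: $((sgl * 4))KB"
}

# Check that MAX_MR_SIZE_GB is identical in the block device header ($1)
# and the daemon header ($2).
check_mr_size_match() {
    bd=$(get_define "$1" MAX_MR_SIZE_GB)
    daemon=$(get_define "$2" MAX_MR_SIZE_GB)
    if [ -z "$bd" ] || [ "$bd" != "$daemon" ]; then
        echo "MAX_MR_SIZE_GB mismatch: bd='$bd' daemon='$daemon'" >&2
        return 1
    fi
    echo "MAX_MR_SIZE_GB=$bd in both headers"
}
```

For example, from the repository root: check_bd_params infiniswap_bd/infiniswap.h && check_mr_size_match infiniswap_bd/infiniswap.h infiniswap_daemon/rdma-common.h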

How to configure those parameters?

  • If you use the provided installation script (setup/install.sh), you can configure those parameters by changing the values of the variables in setup/install.sh before installation. The script documents each variable and the parameter it maps to. For example,

    #stackbd (backup) disk size, also the total size of remote memory of this bd
    #(STACKBD_SIZE), default is 12
    stackbd_size=12
  • If you choose to build Infiniswap manually, you need to pass configuration options to the configure command. You can list the available options with:

    # after ./autogen.sh
    ./configure --help

    See the "Optional Features" section of the output, for example:

    --enable-stackbd_size   User defines the size of stackbd (backup) disk which
                            should be >= the size of remote memory, default is
                            12
    

    For example, if your Infiniswap block device has 24GB of space in both its backup disk and remote memory, run:

    ./configure --enable-stackbd_size=24
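
If you prefer to script the change to setup/install.sh rather than edit it by hand, a helper along these lines could rewrite one variable. set_install_var is a hypothetical name, not something the repository provides, and the sketch assumes each variable is defined at the start of its own line as name=value, as in the stackbd_size example.

```shell
# Set NAME=VALUE in a shell-style config file by replacing the existing
# "NAME=..." line. Writes through a temporary file instead of "sed -i"
# so that it behaves the same with GNU and BSD sed.
# VALUE is assumed to be a simple token (no "|" or "&" characters).
set_install_var() {
    file=$1
    name=$2
    value=$3
    if ! grep -q "^${name}=" "$file"; then
        echo "variable ${name} not found in ${file}" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Copy the result back with cat so the file keeps its permissions.
    sed "s|^${name}=.*|${name}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, set_install_var setup/install.sh stackbd_size 24 would correspond to the 24GB case above.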

How to Build and Install

In a simple one-to-one experiment, we have two machines (M1 and M2).
Applications run in a container on M1, and M1 needs remote memory from M2.
We need to install the infiniswap block device on M1 and the infiniswap daemon on M2.

  1. Set up the InfiniBand NIC on both machines:
cd setup  
# ./ib_setup.sh <ip>    
# assume all IB NICs are connected to the same LAN (192.168.0.x)
# M1: 192.168.0.11, M2: 192.168.0.12
# on M1:
sudo ./ib_setup.sh 192.168.0.11
# on M2:
sudo ./ib_setup.sh 192.168.0.12
  2. Compile infiniswap daemon on M2:
cd setup
# edit the parameters in install.sh 
./install.sh daemon
  3. Install infiniswap block device on M1:
cd setup
# edit the parameters in install.sh
./install.sh bd

Or, how to manually build Infiniswap?

  • Infiniswap daemon
cd infiniswap_daemon
./autogen.sh
./configure [options] 
make
  • Infiniswap block device
cd infiniswap_bd
./autogen.sh
./configure [options] 
make
sudo make install

If you want to change the parameters of Infiniswap, add options when running configure. See "How to configure those parameters?" above for details.

How to Run

  1. Start infiniswap daemon on M2:

    cd infiniswap_daemon   
    # ./infiniswap-daemon <ip> <port> 
    # pick an unused port number
    ./infiniswap-daemon 192.168.0.12 9400
  2. Prepare server (portal) list on M1:

    # Edit the portal.list file (<infiniswap path>/setup/portal.list)
    # portal.list format; each server's port number is the one chosen when starting its daemon in step 1.
    Line1: number of servers
    Line2: <server1 ip>:<port>  
    Line3: <server2 ip>:<port>
    Line4: ...
    
    # in this example, M1 only has one server
    1
    192.168.0.12:9400
  3. Disable existing swap partitions on M1:

    # check existing swap partitions
    sudo swapon -s
    
    # disable existing swap partitions
    sudo swapoff <swap partitions>
  4. Create an infiniswap block device on M1:

    cd setup
    # create block device: nbdx-infiniswap0
    # make nbdx-infiniswap0 a swap partition
    sudo ./infiniswap_bd_setup.sh
    # If you get the error:
    #   "insmod: ERROR: could not insert module infiniswap.ko: Invalid parameters"
    # or see the following message from the kernel (dmesg):
    #   "infiniswap: disagrees about version of symbol: xxxx"
    # you need a proper Module.symvers file for the MLNX_OFED driver (kernel module).
    #
    cd ../infiniswap_bd
    make clean
    cd ../setup
    # Solution 1 (copy the Module.symvers file from the MLNX_OFED dkms folder):
    # pass mlnx_ofed_version (3.2, 3.3, 3.4, or 4.1); if omitted, the default is 4.*
    ./get_module.symvers.sh {mlnx_ofed_version}
    # e.g., ./get_module.symvers.sh 4.1
    # Or solution 2 (generate a new Module.symvers file):
    ./create_Module.symvers.sh {mlnx_ofed_version}
    # Then, recompile the infiniswap block device from step 3 in "How to Build and Install"
  5. Configure the memory limit of the container (LXC):

    # edit "memory.limit_in_bytes" in the "config" file of the container (LXC)
    
    # For example, this container on M1 can use at most 5GB of local memory.
    # Any additional memory data will be stored in the remote memory provided by M2.
    lxc.cgroup.memory.limit_in_bytes = 5G
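
The portal.list format from step 2 is simple enough to generate from a list of daemons. A minimal sketch follows; write_portal_list is a hypothetical helper and not part of the Infiniswap scripts.

```shell
# Write a portal.list file: the first line is the number of servers,
# followed by one "<ip>:<port>" entry per line.
write_portal_list() {
    out=$1
    if [ "$#" -lt 2 ]; then
        echo "usage: write_portal_list <file> <ip>:<port>..." >&2
        return 1
    fi
    shift
    # After the shift, "$#" is the server count and "$@" the entries.
    printf '%s\n' "$#" "$@" > "$out"
}
```

Calling write_portal_list setup/portal.list 192.168.0.12:9400 would produce the two-line file shown in step 2 for the single-server example.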

Now you can start your applications (in the container).
The extra memory data from applications will be stored in remote memory.
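
Before starting applications, it can be worth confirming that the Infiniswap device is actually registered as swap. The sketch below scans /proc/swaps (the same table that "swapon -s" prints) for a device by name. swap_is_active is a hypothetical helper, and the device name nbdx-infiniswap0 is taken from the comment in step 4; check the actual name on your system.

```shell
# Succeed if a swap entry whose path ends in DEVICE is listed in SWAPS_FILE
# (default: /proc/swaps). The first line of that file is a column header
# and is skipped.
swap_is_active() {
    device=$1
    swaps_file=${2:-/proc/swaps}
    awk -v dev="$device" '
        NR > 1 { n = split($1, parts, "/"); if (parts[n] == dev) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$swaps_file"
}
```

For example: swap_is_active nbdx-infiniswap0 && echo "infiniswap swap is active"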

FAQ

  1. Does infiniswap support transparent huge pages?
    Yes. Infiniswap relies on the swap mechanism of the original Linux kernel. Current kernels (we have tested up to 4.10) split a huge page into basic pages (4KB) before swapping it out.
    (In mm/vmscan.c, shrink_page_list() calls split_huge_page_to_list() to split the huge page.)
    Therefore, whether transparent huge pages are enabled or not makes no difference to infiniswap.

  2. Can we use Docker containers instead of LXC?
    Yes. Infiniswap requires a container-based environment, but it has no dependency on LXC. Any container technology that can limit memory and enable swapping should work.
    We haven't tried Docker yet. If you run into problems when running infiniswap in a Docker environment, please contact us.

  3. Invalid parameters error when inserting the module? There are two ways to compile infiniswap: against 1) the inbox driver or 2) Mellanox OFED. With the inbox driver, you compile and link against the kernel headers/modules. With Mellanox OFED, you need to compile and link against the OFED headers/modules. The configure script should handle this; refer to the Makefile that links the OFED modules.

  4. Other compatibility issues

    • lookup_bdev() has different input arguments in the kernel patch. By default, we assume the patch is not installed. If your OS has this patch, you should:
      • If you use setup/install.sh, please set
        # setup/install.sh
        have_lookup_bdev_patch=1  #the default value is 0.
      • Or, if you build infiniswap_bd manually, add --enable-lookup_bdev in the configuration step.

Contact

This work is by Juncheng Gu, Youngmoon Lee, Yiwen Zhang, Mosharaf Chowdhury, and Kang G. Shin. You can email us at infiniswap at umich dot edu, file issues, or submit pull requests.
