
zmq-collectives-rs

This library implements an SPMD (single program, multiple data) model and collective communication algorithms (Robert van de Geijn's binomial tree) in Rust using 0MQ. Each collective operation over N compute hosts completes in O(log2(N)) communication rounds.
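To make the log2(N) claim concrete, here is a minimal sketch of one common binomial-tree broadcast schedule rooted at rank 0. It only computes which partner each rank sends to or receives from in each round; no 0MQ sockets are involved. The names `Step`, `num_rounds`, and `broadcast_schedule` are hypothetical and not this library's API, and the library's own tree layout may order partners differently.

```rust
/// One action a rank performs in a binomial-tree broadcast rooted at rank 0.
#[derive(Debug, PartialEq)]
enum Step {
    Send { round: u32, to: usize },
    Recv { round: u32, from: usize },
}

/// Rounds needed for `nranks` processes: ceil(log2(nranks)).
fn num_rounds(nranks: usize) -> u32 {
    assert!(nranks > 0, "need at least one rank");
    // The log2 of the next power of two is its number of trailing zero bits.
    nranks.next_power_of_two().trailing_zeros()
}

/// The ordered list of sends/receives performed by `rank`.
fn broadcast_schedule(rank: usize, nranks: usize) -> Vec<Step> {
    let mut steps = Vec::new();
    for round in 0..num_rounds(nranks) {
        let stride = 1usize << round;
        if rank < stride {
            // This rank already holds the data: forward it `stride` ranks ahead.
            let to = rank + stride;
            if to < nranks {
                steps.push(Step::Send { round, to });
            }
        } else if rank < 2 * stride {
            // This rank receives the data for the first (and only) time.
            steps.push(Step::Recv { round, from: rank - stride });
        }
    }
    steps
}

fn main() {
    let nranks = 5;
    for rank in 0..nranks {
        for step in broadcast_schedule(rank, nranks) {
            match step {
                Step::Send { round, to } => println!("rank {rank}: round {round} send to {to}"),
                Step::Recv { round, from } => println!("rank {rank}: round {round} recv from {from}"),
            }
        }
    }
}
```

With 5 ranks the schedule takes 3 rounds: rank 0 sends to ranks 1, 2, and 4, and rank 1 forwards to rank 3. Every non-root rank receives exactly once, and the number of ranks holding the data doubles each round, which is where the log2(N) bound comes from.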

Collective communication algorithms are used in high-performance computing (HPC) and supercomputing libraries and runtime systems such as MPI and OpenSHMEM.

Documentation for this library can be found on its wiki.

Algorithms Implemented

  • Broadcast
  • Reduction
  • Scatter
  • Gather
  • Barrier
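Reduction can be viewed as the broadcast tree run in reverse: partial results flow from the leaves toward rank 0, and the number of active senders halves each round. The sketch below simulates this in a single process; `simulate_reduce` is a hypothetical name, and the code illustrates the round structure only, not this library's message-passing implementation.

```rust
/// Simulates a binomial-tree reduction rooted at rank 0 inside one process.
/// `values[r]` is rank r's input. Returns (result, number of rounds).
/// `op` is assumed associative and commutative (e.g. sum, max), because the
/// tree combines operands in a different order than a left-to-right fold.
fn simulate_reduce<T: Copy>(values: &[T], op: impl Fn(T, T) -> T) -> (T, u32) {
    assert!(!values.is_empty(), "need at least one rank");
    let n = values.len();
    let mut partial = values.to_vec();
    let rounds = n.next_power_of_two().trailing_zeros();
    // Walk the broadcast rounds backwards (largest stride first), so every
    // rank has collected its children's results before it sends upward.
    for round in (0..rounds).rev() {
        let stride = 1usize << round;
        // Ranks in [stride, 2*stride) send their partial result to rank - stride.
        for rank in stride..(2 * stride).min(n) {
            partial[rank - stride] = op(partial[rank - stride], partial[rank]);
        }
    }
    (partial[0], rounds)
}

fn main() {
    let inputs: Vec<i64> = (1..=5).collect();
    let (sum, rounds) = simulate_reduce(&inputs, |a, b| a + b);
    println!("sum = {sum} after {rounds} rounds"); // prints "sum = 15 after 3 rounds"
}
```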

Configuring Distributed Program Execution

This library uses environment variables to configure distributed runs of SPMD applications. Each of the following environment variables must be supplied to run programs correctly:

  • ZMQ_COLLECTIVES_NRANKS
  • ZMQ_COLLECTIVES_RANK
  • ZMQ_COLLECTIVES_ADDRESSES

ZMQ_COLLECTIVES_NRANKS - unsigned integer value indicating how many processes (instances or copies of the program) are running.

ZMQ_COLLECTIVES_RANK - unsigned integer value indicating which process instance this program represents. This is analogous to a user-provided thread id. The value must be in the range 0 to ZMQ_COLLECTIVES_NRANKS - 1.

ZMQ_COLLECTIVES_ADDRESSES - a ','-delimited list of IP addresses and ports. The number of entries must equal the integer value of ZMQ_COLLECTIVES_NRANKS. An example for a 2-rank application named app is below:

ZMQ_COLLECTIVES_NRANKS=2 ZMQ_COLLECTIVES_RANK=0 ZMQ_COLLECTIVES_ADDRESSES=127.0.0.1:5555,127.0.0.1:5556 ./app

ZMQ_COLLECTIVES_NRANKS=2 ZMQ_COLLECTIVES_RANK=1 ZMQ_COLLECTIVES_ADDRESSES=127.0.0.1:5555,127.0.0.1:5556 ./app

In this example, Rank 0 maps to 127.0.0.1:5555 and Rank 1 maps to 127.0.0.1:5556.
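These rules can be checked at startup, before any sockets are opened. The following sketch reads and validates the three variables using only the standard library, treating entry i of the address list as rank i's endpoint. `Config`, `parse_config`, and `config_from_env` are hypothetical names for illustration, not part of this library's API.

```rust
use std::env;

/// Settings read from the three ZMQ_COLLECTIVES_* variables.
#[derive(Debug, PartialEq)]
struct Config {
    nranks: usize,
    rank: usize,
    addresses: Vec<String>,
}

impl Config {
    /// The "ip:port" entry at index `rank`, i.e. this process's own endpoint.
    fn my_address(&self) -> &str {
        &self.addresses[self.rank]
    }
}

/// Validates raw variable values: rank < nranks and one address per rank.
fn parse_config(nranks: &str, rank: &str, addresses: &str) -> Result<Config, String> {
    let nranks = nranks
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("ZMQ_COLLECTIVES_NRANKS: {e}"))?;
    let rank = rank
        .trim()
        .parse::<usize>()
        .map_err(|e| format!("ZMQ_COLLECTIVES_RANK: {e}"))?;
    if rank >= nranks {
        return Err(format!(
            "ZMQ_COLLECTIVES_RANK ({rank}) must be less than ZMQ_COLLECTIVES_NRANKS ({nranks})"
        ));
    }
    // Comma-delimited list; entry i is the endpoint of rank i.
    let addresses: Vec<String> = addresses.split(',').map(|a| a.trim().to_string()).collect();
    if addresses.len() != nranks {
        return Err(format!(
            "ZMQ_COLLECTIVES_ADDRESSES has {} entries, expected {nranks}",
            addresses.len()
        ));
    }
    Ok(Config { nranks, rank, addresses })
}

/// Reads the three variables from the process environment.
fn config_from_env() -> Result<Config, String> {
    let get = |name: &str| env::var(name).map_err(|e| format!("{name}: {e}"));
    parse_config(
        &get("ZMQ_COLLECTIVES_NRANKS")?,
        &get("ZMQ_COLLECTIVES_RANK")?,
        &get("ZMQ_COLLECTIVES_ADDRESSES")?,
    )
}

fn main() {
    match config_from_env() {
        Ok(cfg) => println!("rank {} of {} uses {}", cfg.rank, cfg.nranks, cfg.my_address()),
        Err(e) => eprintln!("configuration error: {e}"),
    }
}
```

Launched with the rank 1 command line above, this would print `rank 1 of 2 uses 127.0.0.1:5556`.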

HPC batch scheduling systems like Slurm, TORQUE, PBS, etc. provide mechanisms to automatically define these environment variables when jobs are submitted.

Notes

0MQ uses sockets, each backed by a file descriptor, to handle communication and asynchrony control. GNU/Linux enforces a configurable limit (by default ~2063) on the number of file descriptors/sockets a user process may open during execution. The TcpBackend uses 2 file descriptors/sockets; in 0MQ terms, these are ZMQ_ROUTER sockets.

TCP is a "chatty" protocol: it requires round trips between clients and servers during data transmission to ensure data is delivered correctly. This makes it less than ideal for jobs requiring high performance. However, TCP is supported by 0MQ and is universally available (it is a commodity protocol), which makes it a reasonable starting point for an implementation.

This library requires libzmq. LD_LIBRARY_PATH and PKG_CONFIG_PATH need to point to the directories where libzmq has been installed. As an example, suppose a user has installed libzmq into the directory stored in the environment variable:

$LIBZMQ_INSTALL_PREFIX_PATH

libzmq.a or libzmq.so would be installed in the directory: $LIBZMQ_INSTALL_PREFIX_PATH/lib

libzmq.pc can be found in the directory: $LIBZMQ_INSTALL_PREFIX_PATH/lib/pkgconfig
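As a quick sanity check of this layout, the sketch below looks for the library and its pkg-config file under `$LIBZMQ_INSTALL_PREFIX_PATH` and prints shell `export` lines for `LD_LIBRARY_PATH` (the standard GNU/Linux dynamic-linker search path) and `PKG_CONFIG_PATH`. The helper names `LibzmqLayout`, `layout`, and `check` are hypothetical; this is not a tool shipped with the library.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Directories where libzmq files are expected, relative to an install prefix.
struct LibzmqLayout {
    lib_dir: PathBuf,
    pkgconfig_dir: PathBuf,
}

fn layout(prefix: &Path) -> LibzmqLayout {
    let lib_dir = prefix.join("lib");
    let pkgconfig_dir = lib_dir.join("pkgconfig");
    LibzmqLayout { lib_dir, pkgconfig_dir }
}

/// Returns a list of problems; an empty list means the expected files exist.
fn check(layout: &LibzmqLayout) -> Vec<String> {
    let mut problems = Vec::new();
    let has_lib = ["libzmq.so", "libzmq.a"]
        .iter()
        .any(|name| layout.lib_dir.join(*name).is_file());
    if !has_lib {
        problems.push(format!("no libzmq.so or libzmq.a in {}", layout.lib_dir.display()));
    }
    let pc = layout.pkgconfig_dir.join("libzmq.pc");
    if !pc.is_file() {
        problems.push(format!("missing {}", pc.display()));
    }
    problems
}

fn main() {
    let prefix = match env::var("LIBZMQ_INSTALL_PREFIX_PATH") {
        Ok(p) => PathBuf::from(p),
        Err(_) => {
            eprintln!("LIBZMQ_INSTALL_PREFIX_PATH is not set");
            return;
        }
    };
    let l = layout(&prefix);
    for problem in check(&l) {
        eprintln!("warning: {problem}");
    }
    // Prepend to any existing values; ${VAR:+:$VAR} avoids a trailing ':' when VAR is empty.
    println!("export LD_LIBRARY_PATH={}${{LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}}", l.lib_dir.display());
    println!("export PKG_CONFIG_PATH={}${{PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}}", l.pkgconfig_dir.display());
}
```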

License

Boost Software License, Version 1.0

Date

03MAY2021

Author

Christopher Taylor
