Coder Social home page Coder Social logo

cluster-smi-docker's Introduction

What is cluster-smi

It is essentially the same as nvidia-smi but for multiple machines. Running cluster-smi and will output something as the following;

user@host $ cluster-smi

Thu Jan 18 21:35:51 2018
+---------+------------------------+---------------------+----------+----------+
| Node    | Gpu                    | Memory-Usage        | Mem-Util | GPU-Util |
+---------+------------------------+---------------------+----------+----------+
| node-00 | 0: TITAN Xp            |  3857MiB / 12189MiB | 31%      | 0%       |
|         | 1: TITAN Xp            | 11689MiB / 12189MiB | 95%      | 0%       |
|         | 2: TITAN Xp            | 10787MiB / 12189MiB | 88%      | 0%       |
|         | 3: TITAN Xp            | 10965MiB / 12189MiB | 89%      | 100%     |
+---------+------------------------+---------------------+----------+----------+
| node-01 | 0: TITAN Xp            | 11667MiB / 12189MiB | 95%      | 100%     |
|         | 1: TITAN Xp            | 11667MiB / 12189MiB | 95%      | 96%      |
|         | 2: TITAN Xp            |  8497MiB / 12189MiB | 69%      | 100%     |
|         | 3: TITAN Xp            |  8499MiB / 12189MiB | 69%      | 98%      |
+---------+------------------------+---------------------+----------+----------+
| node-02 | 0: GeForce GTX 1080 Ti |  1447MiB / 11172MiB | 12%      | 8%       |
|         | 1: GeForce GTX 1080 Ti |  1453MiB / 11172MiB | 13%      | 99%      |
|         | 2: GeForce GTX 1080 Ti |  1673MiB / 11172MiB | 14%      | 0%       |
|         | 3: GeForce GTX 1080 Ti |  6812MiB / 11172MiB | 60%      | 36%      |
+---------+------------------------+---------------------+----------+----------+

A so called ROUTER (no GPU needed) acts a server which gathers all GPU info from the NODEs. There should be only 1 router. A NODE (having GPU(s)) sends its GPU INFO to the ROUTER. A client send (no GPU needed) sends a request to the ROZTER in order to receive GPU INFO of all NODES.

Please find more information on cluster-smi on https://github.com/PatWie/cluster-smi

Prerequisites

Config file

You have to provide a yml config file in order to launch a container. A good way to do that is to create a bind mount from the host system to the container via the -v docker run argument. Please see examples below.

You can find a sample config file in the original cluster-smi-repository.

Docker hub repository

For a ready to run Docker image (which is used in the examples below), You can use the already built image medgift/cluster-smi-docker on Docker Hub. If you wish to build your own image, please refer to the Dockerfile.

How to run cluster-smi-docker - Option 1: without docker-compose

Cluster-smi router:

$ docker run --d --name cluster-smi-router -net=host -v <local-config-file-path>:/cluster-smi.yml medgift/cluster-smi-docker:latest ./cluster-smi-router

Note: No nvidia container toolkit required for the router

Cluster-smi node:

$ docker run -d --name cluster-smi-node --gpus all --net=host -v <local-config-file-path>:/cluster-smi.yml medgift/cluster-smi-docker:latest ./cluster-smi-node

Note: nvidia container toolkit required for the node

Cluster-smi client:

$ docker run --rm --net=host -v <local-config-file-path>:/cluster-smi.yml medgift/cluster-smi-docker:latest ./cluster-smi

Note: No nvidia container toolkit required for the client

If you dont want to use the "docker run ..." syntax you can put the command into a bash script and call the bashscript. You can find an example in additional_files/client-script

How to run cluster-smi-docker - Option 2: with docker-compose

You can also use docker-compose for the node and router, which makes it convenient to run in a "service" mode by enabling "restart always". This will restart the container in case it fails or the machine is rebooted (Docker deamon needs to be started on bootup, which is normally the case by default).

Cluster-smi-router:

Change to additional_scripts/cluster-smi-node directory

$ cd <repo-root>/additional_scripts/cluster-smi-router

Edit the path to your local config file in docker-compose.yml:

volume: 
  - <local_config_file_path>:/cluster-smi.yml

Run docker-compose in deamon mode

$ docker-compose up -d

Cluster-smi-node:

Change to additional_scripts/cluster-smi-node directory

$ cd <repo-root>/additional_scripts/cluster-smi-node

Edit the path to your local config file in docker-compose.yml :

volume: 
  - <local_config_file_path>:/cluster-smi.yml

Run docker-compose in deamon mode

$ docker-compose up -d

Cluster-smi client:

Same as in option 1.

Note: Using docker-compose for the client adds no benefits.

If you dont want to use the "docker run ..." syntax you can put the command into a bash script and call the bashscript. You can find an example in additional_files/client-script

cluster-smi-docker's People

Contributors

ieggel avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

Forkers

consilium538

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.