slurm-centos7

Installation scripts and instructions for the Slurm job scheduler on a CentOS 7 cluster.

Prerequisites

Conventions

  • The head node is identified as headnode
  • The compute nodes are identified as node01, node02, and node03

Installation

  1. On each node, check the Slurm and MUNGE UIDs and GIDs set in install-slurm. They must be identical on every node; if any of them is already in use on a node, change the values in install-slurm on all of the nodes.

  2. On each node, run the installation script (this takes 20-30 minutes to complete)

    $ ./install-slurm

  3. On the head node, create a pseudorandom secret key for MUNGE to use on all of the nodes

    $ ./create-munge-key

  4. Copy the MUNGE secret key to all of the compute nodes

    $ bpush /etc/munge/munge.key /etc/munge/munge.key

  5. Change the owner of /etc/munge/munge.key to the munge user on all of the nodes

    $ chown munge: /etc/munge/munge.key
    $ bexec chown munge: /etc/munge/munge.key

  6. Enable and start the MUNGE service on all of the nodes

    $ systemctl enable munge
    $ systemctl start munge
    $ bexec systemctl enable munge
    $ bexec systemctl start munge

  7. From any computer, fill in the Slurm configuration file generator. Set the fields listed below; leave every other field at its default value, or empty if it has no default

    • ControlMachine: headnode
    • NodeNames: node[01-03]
    • CPUs, Sockets, CoresPerSocket, and ThreadsPerCore: Run the lscpu command on a compute node and use the values it reports
    • StateSaveLocation: /var/spool/slurm
    • SlurmctldLogFile: /var/log/slurm/slurmctld.log
    • SlurmdLogFile: /var/log/slurm/slurmd.log

  8. Click submit at the bottom of the page to generate the configuration file

  9. Copy the configuration file to the head node and save the file to /etc/slurm/slurm.conf

  10. Copy the configuration file to all of the compute nodes

    $ bpush /etc/slurm/slurm.conf /etc/slurm/slurm.conf

  11. On the head node, move the cgroup configuration file to /etc/slurm/cgroup.conf (this overwrites the file created by the install script)

    $ mv files/cgroup.conf /etc/slurm/cgroup.conf

  12. Copy the cgroup configuration file to all of the compute nodes

    $ bpush /etc/slurm/cgroup.conf /etc/slurm/cgroup.conf

  13. Disable and stop the firewalld service on all of the compute nodes

    $ bexec systemctl disable firewalld
    $ bexec systemctl stop firewalld

  14. On the head node, open TCP port 6817 (the default slurmctld port) for Slurm

    $ firewall-cmd --permanent --zone=public --add-port=6817/tcp
    $ firewall-cmd --reload

  15. On the head node, enable and start the slurmctld service

    $ systemctl enable slurmctld
    $ systemctl start slurmctld

  16. On all of the compute nodes, enable and start the slurmd service

    $ systemctl enable slurmd
    $ systemctl start slurmd
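
Step 7 takes CPUs, Sockets, CoresPerSocket, and ThreadsPerCore from lscpu. As a rough sketch, the helper below turns lscpu output into a NodeName line that can be compared against the generated slurm.conf. The function names lscpu_field and node_line are illustrative and are not part of Slurm or this repository; the sketch assumes lscpu prints English labels.

```shell
#!/bin/sh
# Sketch: build a slurm.conf-style NodeName line from lscpu output.
# lscpu_field and node_line are hypothetical helper names.

# Print the value of one "Label: value" line from lscpu output on stdin.
lscpu_field() {
    awk -F: -v key="$1" '
        { k = $1; v = $2
          gsub(/^[ \t]+/, "", k)   # newer lscpu versions indent some labels
          gsub(/^[ \t]+/, "", v) }
        k == key { print v; exit }'
}

# $1: full lscpu output, $2: node name or range, e.g. node[01-03]
node_line() {
    info=$1
    printf 'NodeName=%s CPUs=%s Sockets=%s CoresPerSocket=%s ThreadsPerCore=%s\n' \
        "$2" \
        "$(printf '%s\n' "$info" | lscpu_field 'CPU(s)')" \
        "$(printf '%s\n' "$info" | lscpu_field 'Socket(s)')" \
        "$(printf '%s\n' "$info" | lscpu_field 'Core(s) per socket')" \
        "$(printf '%s\n' "$info" | lscpu_field 'Thread(s) per core')"
}

# Usage on a compute node:
#   node_line "$(LC_ALL=C lscpu)" 'node[01-03]'
```

This assumes all three compute nodes have identical hardware; if they differ, each node needs its own NodeName line.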

Testing

Submit each test script in slurm-centos7/tests as a batch job

$ sbatch <file>
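
The scripts in slurm-centos7/tests are not reproduced here. To illustrate what sbatch expects, a minimal batch script might look like the sketch below. The file name hello.sh, the job name, and the output file pattern are assumptions made for this example, not files from the repository.

```shell
#!/bin/bash
# hello.sh: hypothetical smoke-test job, not one of the repository's tests.
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output=hello-%j.out

# Record which compute node ran the job.
node=$(uname -n)
echo "Hello from ${node}"
```

Submitting it with sbatch hello.sh should print a job ID; with the options above, the greeting is written to hello-<jobid>.out in the directory the job was submitted from.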

On each compute node, print the hardware configuration that slurmd detects and check that it matches the node definition in slurm.conf

$ slurmd -C

Confirm that all of the compute nodes are reporting to the head node

$ scontrol show nodes
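
On a larger cluster the scontrol output gets long. As a hedged sketch, the filter below reads one "node state" pair per line, as produced by sinfo with a node-oriented format, and prints only the nodes that are not ready for work. The function name not_ready_nodes and the choice of idle, mixed, and allocated as the "ready" states are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: list nodes whose state is not idle, mixed, or allocated.
# not_ready_nodes is a hypothetical helper name.

# Read "node state" lines on stdin; print the names of nodes needing attention.
not_ready_nodes() {
    awk '$2 != "idle" && $2 != "mixed" && $2 != "allocated" { print $1 }'
}

# Usage on the head node:
#   sinfo -N -h -o '%N %T' | not_ready_nodes
```

An empty result suggests that every compute node has registered with slurmctld and is accepting jobs.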

Run an interactive job

$ srun --pty bash
