SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.

Home Page: http://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

License: Other

Scalable Checkpoint / Restart (SCR) Library

The Scalable Checkpoint / Restart (SCR) library enables MPI applications to utilize distributed storage on Linux clusters to attain high file I/O bandwidth for checkpointing, restarting, and output in large-scale jobs. With SCR, jobs run more efficiently, recompute less work upon a failure, and reduce load on critical shared resources such as the parallel file system.

Users

Instructions to build and use SCR are hosted at scr.readthedocs.io.

New users can start with the Quick Start guide, which shows how to build and run an example using SCR.

For more detailed build instructions, refer to Build SCR.

Contribute

As an open source project, we welcome contributions via pull requests, as well as questions, feature requests, or bug reports via issues. Please refer to both our code of conduct and our contributing guidelines.

Developers

Developer documentation is provided at SCR-dev.ReadTheDocs.io.

SCR uses components from ECP-VeloC, which have their own user and developer docs.

A development build is useful for those who wish to modify how SCR works. It checks out and builds SCR and many of its dependencies separately. The process is more complicated than the user build described above, but the development build is helpful when one intends to commit changes back to the project.

For a development build of SCR and its dependencies on SLURM systems, one can use the bootstrap.sh script:

git clone https://github.com/LLNL/scr.git
cd scr

./bootstrap.sh

cd build
cmake -DCMAKE_INSTALL_PREFIX=../install ..
make install

When using a debugger with SCR, one can build with the following flags to disable compiler optimizations:

./bootstrap.sh --debug

cd build
cmake -DCMAKE_INSTALL_PREFIX=../install -DCMAKE_BUILD_TYPE=Debug ..
make install
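
The standard and debug development builds above differ only in the flag passed to bootstrap.sh and in the CMake build type. The following is a minimal sketch, assuming a POSIX shell, that prints the command sequence for either variant so it can be reviewed before running. The function name `scr_build_cmds` and its `release`/`debug` argument are hypothetical and not part of SCR; the printed commands are copied from the listings above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SCR): print the development-build
# commands from this README for either a "release" or a "debug" build.
# It only prints the commands; nothing is executed.
scr_build_cmds() {
    mode="${1:-release}"
    case "$mode" in
        release)
            bootstrap_cmd="./bootstrap.sh"
            cmake_cmd="cmake -DCMAKE_INSTALL_PREFIX=../install .."
            ;;
        debug)
            bootstrap_cmd="./bootstrap.sh --debug"
            cmake_cmd="cmake -DCMAKE_INSTALL_PREFIX=../install -DCMAKE_BUILD_TYPE=Debug .."
            ;;
        *)
            echo "usage: scr_build_cmds [release|debug]" >&2
            return 1
            ;;
    esac
    # One command per line, in the order used in the listings above.
    printf '%s\n' "$bootstrap_cmd" "cd build" "$cmake_cmd" "make install"
}

# Show the debug variant.
scr_build_cmds debug
```

Printing the commands instead of running them keeps the sketch safe to try anywhere; the output can be pasted into a shell from the top of the scr checkout once it looks right.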

One can then run a test program:

cd examples
srun -n4 -N4 ./test_api

For developers installing SCR on Fedora outside of an HPC cluster, and who have sudo access, the following steps install and activate most of the necessary base dependencies:

sudo dnf groupinstall "Development Tools"
sudo dnf install cmake gcc-c++ mpi mpi-devel environment-modules zlib-devel pdsh
# restart the shell before continuing
module load mpi
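
After installing the packages and loading the MPI module, it can help to confirm that the expected tools are on PATH before attempting a build. The sketch below assumes a POSIX shell. The function name `missing_tools` is hypothetical, and the tool names checked (`cmake`, `make`, `mpicc`, `pdsh`) are assumptions about what the packages above provide; adjust them to the local setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of SCR): print each named command that
# cannot be found on PATH, one per line. Returns nonzero if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# The tool names below are assumptions based on the packages installed above.
if missing_tools cmake make mpicc pdsh; then
    echo "all listed tools found"
else
    echo "the tools printed above were not found on PATH"
fi
```

If `mpicc` is reported missing, the MPI module is probably not loaded in the current shell.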

Authors

Numerous people have contributed to the SCR project.

To reference SCR in a publication, please cite the following paper:

Additional information and research publications can be found here:

https://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi
