Coder Social home page Coder Social logo

calculon's Introduction

DOI

Calculon - Co-design for large scale parallel applications

Running

Run Calculon like this:

$> PYTHONPATH=. ./bin/calculon <args>

Calculon is a hierarchical command line. To see the commands it accepts, use --help or -h:

$> PYTHONPATH=. ./bin/calculon -h

You can also see how to use any command specifically by using --help or -h on the command:

$> PYTHONPATH=. ./bin/calculon llm -h

LLM Example

Run a single calculation for LLM (~1 sec):

$> PYTHONPATH=. ./bin/calculon llm models/megatron-1T.json examples/3072_t4_p64_d12_mbs4_full.json systems/a100_80g.json -

Run a system execution optimizer for LLM (~1 min):

$> PYTHONPATH=. ./bin/calculon llm-optimal-execution models/turing-530B.json 5128 2520 float16 systems/a100_80g.json output.json -m

opt_exe.json will contain the optimal way to run Turing-530B across 5128 A100 GPUs.

To store results from all successful runs from the same experiment, run a special system optimizer (~1 min):

$> PYTHONPATH=. ./bin/calculon llm-all-executions models/turing-530B.json 5128 2520 float16 systems/a100_80g.json all_output.csv

Testing and validation (optional)

To make sure that the current build is working, use

$> make test

To validate Calculon performance modeling against Megatron run on NVIDIA's Selene A100-based supercomputer with results published in "Sequence parallelism" paper, use

$> PYTHONPATH=. ./bin/calculon llm-validation

Publications

  • Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models
    Mikhail Isaev, Nic McDonald, Larry Dennison, Richard Vuduc
    Paper

  • Scaling Infrastructure to Support Multi-Trillion Parameter LLM Training
    Mikhail Isaev, Nic McDonald, Richard Vuduc
    Paper

calculon's People

Contributors

nicmcd avatar michael-isaev avatar quetric avatar

Forkers

avnf

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.