Coder Social home page Coder Social logo

score_es_benchmark's Introduction

Self-contained reproducibility package for the following poster:

P.-Y. Chuang, N. Shah, A. Attia, E. Constantinescu, & W. Feng (2023). A Recipe for Transforming High-Productivity DSL Code into High-Performance Code for Exascale Systems. 2023 SciDAC PI Meeting

Disclaimer

The results are mostly computational performance. Thererfore, using different hardware may give different values on the figures.

License Notice

No license. Except for GitHub's implicit permissions, all other uses of this repository require explicit permission granted from the author.

Preparation

Most of dependencies can be managed by Anaconda or its derived tools. Assuming under the top-level repository directory, create an environment named scoring-test with the command in the terminal:

conda env create -n scoring-test -f environment.yaml

And then get into this conda environment:

conda activate scoring-test

Now we need to compile C-extensions needed. This requires both GCC and Clang compilers because we will also compare the two compilers' performances. Once having the GCC and Clang compiler on the host computer, do

source build.sh

Lastly, we need to create the training data (i.e., mock experimental data). Do

python data_generator.py surface
python data_generator.py training

The data will be saved to a folder called data.

Speedups of the loss function alone

This steps will create the figure as below:

First, do the calculation to obtain the results:

mpiexec -n <NPROCS> python earth_mover_perf.py

<NPROCS> is the wanted number of MPI tasks. Note that this is a single-core profiling, so when launching the MPI tasks, make sure they are properly configured so that tasks do not interfere with others. Also, several of the single-core target functions require as much as 30G of memory.

To save the hassle, you can use the script earth_mover_perf.sh to submit the calculation to a Slurm-based cluster. The resources (task topology, memory, etc) are pre-configured, but you may still need to edit it according to the actual configuration of the cluster.

Then, create the figure with

python postprocess_1.py

The figure will be saved to images/earth_mover_speedups.png

Surface of the loss function

The steps will create the following figure

This step is not parallelized by MPI but just the built-in multiprocessing. Do

python earth_mover_surface.py <NPROCS>

That being said, this Python script only works on a single node, and <NPROCS> cannot exceed what this single node has.

Once the calculation is done, create the figure with

python postprocess_2.py

The figure will be saved to images/earth_mover_surface.png

Whole-workflow profiling before and after optimizing the loss

This step will create the following two figures

and

Do the calculations one-by-one with

python stats_workflow_perf.py small numpy
python stats_workflow_perf.py small c
python stats_workflow_perf.py medium numpy
python stats_workflow_perf.py medium c 
python stats_workflow_perf.py large c 
python stats_workflow_perf.py large numpy 

Note that python stats_workflow_perf.py large numpy needs about 80G of memory and takes about 3.5 hours to finish on AMD EPYC 7702. Or, you can use the script stats_workflow_perf.sh to submit a batch job to a Slurm-based cluster. Again, the resources are pre-configured, but you still need to edit it accordingly.

Once the calculations are done, create the figures with

python postprocess_3.py
python postprocess_4.py

The figures will be saved to images/profiling_before.png and images/overall_speedups.png.

Ensembel analysis

This steps will create the following figure:

This calculation is parallelized with MPI. Do

mpiexec -n <NPROCS> python stats_workflow_uq.py

Again, <NPROCS> denotes the number of MPI processes wanted. It took about 70 minutes with 512 CPUs (AMD EPYC 7702). Alternatively, the script stats_workflow_uq.sh can be used for a Slurm-based cluster. It needs to be edited according to the cluster's configuration.

Once the calculation is done, do

python postprocess_5.py

The figure will be saved to images/ensemble_analysis.png.

Contact

Pi-Yueh Chuang [email protected]

score_es_benchmark's People

Contributors

piyueh avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.