This README describes what is included in the artifact for the paper "Isolating GPU Architectural Features using Parallelism-Aware Microbenchmarks". This is split into 3 separate microbenchmarks:
- Cache latency and bandwidth (`1.cache`)
- Memory throughput (`2.transpose`)
- Arithmetic performance (`3.arithmetic`)
For each of these microbenchmarks, we include the following:
- C++ and CUDA source code
- Build system using `make`
- Python helper scripts for data collection
- Gnuplot scripts for visualising the data
- The verbatim results used in the paper itself
In order to compile and run the microbenchmarks in this artifact, the following must be available:
- GNU Make
  - Other Make-like tools may also work, but we only support GNU Make at this time
- CUDA-capable GPU
- CUDA toolkit, version 10 or greater
  - `nvcc` must be available
  - A compatible host compiler must also be installed for the `nvcc` compiler driver to function
- (Optional) Python 3 to run the helper scripts
- (Optional) Gnuplot to reproduce the plots in the paper
Note that, due to their nature, individual microbenchmarks may require certain features to be present on the GPU. For example, our tensor core benchmarks require Compute Capability 7.0 or greater. A complete list of NVIDIA devices and their Compute Capability is available on NVIDIA's developer website.
The artifact has been tested on the following devices:
- NVIDIA GTX 1080 Ti (Pascal)
- NVIDIA RTX 2080 Ti (Turing)
- NVIDIA A100 (Ampere)
- NVIDIA GTX Titan X (Maxwell) (part 1 only)
- NVIDIA GTX 1660 Ti (Turing) (parts 2 and 3 only)
- NVIDIA RTX 2060 (Turing) (parts 2 and 3 only)
The execution of the benchmarks can be influenced with compile-time arguments
as well as runtime arguments. An explanation of these flags is included with
each individual microbenchmark. In addition to being able to individually
compile and run experiments, we also provide `run_suite.py`, which
automatically explores (part of) the parameter space for a given
microbenchmark. These Python scripts aid the user in running experiments and
aggregating their results.
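As an illustration, the kind of exploration-and-aggregation loop that `run_suite.py` performs might be sketched as follows. The binary path, flag names, and parameter values below are hypothetical placeholders, not the artifact's actual interface; the real values live inside the helper scripts themselves.

```python
import csv
import itertools
import sys

# Hypothetical parameter space; the real values are hard-coded in run_suite.py,
# and the flag names below are placeholders, not the artifact's actual flags.
SIZES = [1024, 4096, 16384]   # e.g. working-set size in KiB
STRIDES = [1, 2, 4]           # e.g. access stride in elements

def parameter_grid():
    """Yield one benchmark command line per point in the parameter space."""
    for size, stride in itertools.product(SIZES, STRIDES):
        yield ["./1.cache/cache", "--size", str(size), "--stride", str(stride)]

def main():
    # The real scripts execute each command and parse its output; here we only
    # print the commands that would be run, as one CSV row per experiment.
    writer = csv.writer(sys.stdout)
    writer.writerow(["size", "stride", "command"])
    for cmd in parameter_grid():
        writer.writerow([cmd[2], cmd[4], " ".join(cmd)])

if __name__ == "__main__":
    main()
```

Exhaustively enumerating a Cartesian product like this is only feasible because each microbenchmark exposes a small number of parameters; the actual scripts restrict the grid accordingly.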
Important note: The `run_suite.py` helper scripts encapsulate both
execution and compilation of the experiments. This means that any manually
compiled executables will be overwritten and lost when the helper script is
run.
Important note: For the time being, the parameter exploration in the
`run_suite.py` helper scripts is hard-coded in the Python files. This means
that users wanting to change the parameter space exploration need to modify
the scripts themselves. By default, the values are designed to be suitable for
the NVIDIA GeForce GTX 1080 Ti GPU.
In summary, our microbenchmarks are designed to be used in one of two different
ways, depending on the user's preference. Firstly, we provide `run_suite.py`
helper scripts which encapsulate the entire compilation-execution-exploration
cycle. Secondly, we provide a more fine-grained approach where the
microbenchmarks can be compiled (and executed) with custom flags in order to
reproduce a single experiment. This allows users to drive the binaries from
other programming languages and automated experimental setups, such as Bash
scripts.
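For example, a single experiment could be driven from Python instead of bash. The binary path and flags below are hypothetical and should be replaced with those documented in the relevant microbenchmark's README.

```python
import subprocess
from pathlib import Path

# Hypothetical binary and flags; consult the microbenchmark's README for the
# actual executable name produced by `make` and the flags it accepts.
binary = Path("3.arithmetic/arithmetic")
cmd = [str(binary), "--threads", "256", "--op", "fma"]

if binary.exists():
    # Capture the benchmark's stdout so it can be parsed or stored.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    print(result.stdout)
else:
    print("Benchmark not built yet; would run:", " ".join(cmd))
```

The same pattern applies to any of the three microbenchmarks: build once with the desired compile-time flags, then invoke the resulting binary with the runtime flags for the experiment of interest.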
For precise instructions on building and running the microbenchmarks, please
see the individual `README.md` files included in each microbenchmark's
subdirectory.