| Branch | Status |
|---|---|
| develop | |
| master | |
Benchmarks for unified memory handling on GPU
- How fast can system data be accessed?
  - System buffer fill
    - {aligned, unaligned} x {mapped/managed/system}
  - System buffer copy
    - {aligned, unaligned} x {mapped/managed/system}
- What is the granularity of access?
  - Interleave modifications to strided regions with multiple GPUs
    - {mapped/managed/system}
- Are system atomics supported?
  - {mapped/managed/system}
- How fast can a memory region be created?
  - Allocation + {no touch / cpu / gpu / both}
- Page Fault cost
- Triangle Counting
- GEMM
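To make the mapped and managed allocation kinds concrete, here is a minimal CUDA sketch that times a GPU fill of each kind at an aligned and an unaligned address. It is not taken from the benchmarks in this repository: the kernel `fill`, the helper `time_fill_ms`, the 64 MiB size, the 1-byte offset, and the launch configuration are all assumptions chosen for illustration. The system case (plain `malloc`) is left out because it only works on platforms where the GPU can access pageable memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                        \
  do {                                                          \
    cudaError_t err_ = (call);                                  \
    if (err_ != cudaSuccess) {                                  \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                   cudaGetErrorString(err_));                   \
      std::exit(EXIT_FAILURE);                                  \
    }                                                           \
  } while (0)

// Grid-stride fill: together the threads write `value` to all n bytes.
__global__ void fill(unsigned char *buf, size_t n, unsigned char value) {
  size_t stride = (size_t)gridDim.x * blockDim.x;
  size_t first = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
  for (size_t i = first; i < n; i += stride) {
    buf[i] = value;
  }
}

// Time one GPU fill of buf[0..n) in milliseconds using CUDA events.
float time_fill_ms(unsigned char *buf, size_t n) {
  cudaEvent_t start, stop;
  CUDA_CHECK(cudaEventCreate(&start));
  CUDA_CHECK(cudaEventCreate(&stop));
  CUDA_CHECK(cudaEventRecord(start));
  fill<<<256, 256>>>(buf, n, 0xAB);  // hypothetical launch configuration
  CUDA_CHECK(cudaGetLastError());
  CUDA_CHECK(cudaEventRecord(stop));
  CUDA_CHECK(cudaEventSynchronize(stop));
  float ms = 0.0f;
  CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));
  CUDA_CHECK(cudaEventDestroy(start));
  CUDA_CHECK(cudaEventDestroy(stop));
  return ms;
}

int main() {
  const size_t n = size_t(1) << 26;  // 64 MiB per fill (assumed size)
  const size_t offset = 1;           // hypothetical misalignment, in bytes

  // Mapped: pinned host memory that the GPU reads and writes in place.
  unsigned char *mapped_host = nullptr;
  unsigned char *mapped_dev = nullptr;
  CUDA_CHECK(cudaHostAlloc((void **)&mapped_host, n + offset, cudaHostAllocMapped));
  CUDA_CHECK(cudaHostGetDevicePointer((void **)&mapped_dev, mapped_host, 0));

  // Managed: one pointer valid on host and device; the driver moves the data.
  unsigned char *managed = nullptr;
  CUDA_CHECK(cudaMallocManaged((void **)&managed, n + offset));

  std::printf("mapped  aligned   %.3f ms\n", time_fill_ms(mapped_dev, n));
  std::printf("mapped  unaligned %.3f ms\n", time_fill_ms(mapped_dev + offset, n));
  std::printf("managed aligned   %.3f ms\n", time_fill_ms(managed, n));
  std::printf("managed unaligned %.3f ms\n", time_fill_ms(managed + offset, n));

  CUDA_CHECK(cudaFreeHost(mapped_host));
  CUDA_CHECK(cudaFree(managed));
  return 0;
}
```

The first timed launch also pays one-time startup costs, and the first fill of the managed buffer includes whatever data movement the driver performs, so a real measurement would warm up and repeat each case. Saved under a hypothetical name such as `fill.cu`, the sketch should build with `nvcc -o fill fill.cu`.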
mkdir build
cd build
cmake ..
make
Benchmarks are in `src/benchmarks/*`. Run `src/benchmark/[class]/[the-benchmark] --help` to see all of the options.
There are also some utilities in `src/`:
- `src/test-system-allocator`: check to see if the system allocator is working
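What `src/test-system-allocator` checks internally is not described here. As a rough illustration of one way to test for a working system allocator, the sketch below asks the CUDA runtime whether the device reports pageable memory access, and if so runs a kernel directly on `malloc`'d memory and verifies the result on the host. The function names `supports_system_allocator` and `system_allocator_works` and the kernel `increment` are hypothetical, not part of this repository.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread increments one element in place.
__global__ void increment(int *data, size_t n) {
  size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] += 1;
}

// True if `device` reports that it can access pageable host memory
// (memory from malloc/new) without registering it first.
bool supports_system_allocator(int device) {
  int value = 0;
  cudaError_t err =
      cudaDeviceGetAttribute(&value, cudaDevAttrPageableMemoryAccess, device);
  return err == cudaSuccess && value != 0;
}

// Run a kernel on malloc'd memory and check the result on the host.
// Only call this when supports_system_allocator() returned true.
bool system_allocator_works(size_t n) {
  int *data = static_cast<int *>(std::malloc(n * sizeof(int)));
  if (!data) return false;
  for (size_t i = 0; i < n; ++i) data[i] = static_cast<int>(i);

  const unsigned threads = 256;
  const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
  increment<<<blocks, threads>>>(data, n);
  bool ok = cudaGetLastError() == cudaSuccess;
  ok = (cudaDeviceSynchronize() == cudaSuccess) && ok;

  // Every element should now be one larger than its initial value.
  for (size_t i = 0; ok && i < n; ++i) {
    ok = (data[i] == static_cast<int>(i) + 1);
  }
  std::free(data);
  return ok;
}

int main() {
  int device = 0;
  if (cudaGetDevice(&device) != cudaSuccess) {
    std::printf("no usable CUDA device\n");
    return EXIT_FAILURE;
  }
  if (!supports_system_allocator(device)) {
    std::printf("device %d: pageable memory access not reported\n", device);
    return EXIT_SUCCESS;
  }
  std::printf("device %d: system allocator %s\n", device,
              system_allocator_works(size_t(1) << 20) ? "works" : "FAILED");
  return EXIT_SUCCESS;
}
```

Querying the attribute first avoids launching a kernel on memory the GPU cannot reach, which would otherwise fault and leave the CUDA context unusable.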
- Interactive Job (1 GPU)
srun --partition=gpu-debug --pty --nodes=1 \
--ntasks-per-node=12 --cores-per-socket=12 \
--gres=gpu:v100:1 --mem-per-cpu=1500 \
--time=2:00:00 --wait=0 --export=ALL /bin/bash
- Uses lyra for CLI option parsing.
- Uses hunter for package management.
- Uses spdlog for logging.
- Uses Atrox/github-actions-badge for the GitHub Actions status badge.
Landaverde, Raphael, et al. "An investigation of unified memory access performance in CUDA." 2014 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2014.
Li, Wenqiang, et al. "An evaluation of unified memory technology on NVIDIA GPUs." 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE, 2015.