UVM Smart

UVM Smart is the first repository to provide both functional and timing simulation support for Unified Virtual Memory (UVM). This framework extends GPGPU-Sim v3.2 from UBC. Currently, it supports cudaMallocManaged, cudaDeviceSynchronize, and cudaMemPrefetchAsync. It includes 10 benchmarks from various benchmark suites (Rodinia, Parboil, Lonestar, HPC Challenge). These benchmarks are modified to use UVM APIs.

If you use or build on this framework, please cite the following papers based on the functionalities you are leveraging.

Please cite the following paper when using the prefetchers and page eviction policies.

Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem. 2019. Interplay between hardware prefetcher and page eviction policy in CPU-GPU unified virtual memory. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA '19). ACM, New York, NY, USA, 224-235.

Please cite the following paper when using access-counter-based delayed migration, LFU eviction, cold vs hot data structure classification, and page migration and pinning.

Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem. 2020. Adaptive Page Migration for Irregular Data-intensive Applications under GPU Memory Oversubscription. In Proceedings of the 34th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2020). [TO APPEAR], New Orleans, Louisiana, USA.

Features

A fully associative last-level TLB, so a TLB lookup completes in a single core cycle,
A multi-threaded page table (last level shared) walker (configurable page table walk latency),
Workflow for replayable far-fault management (configurable far-fault handling latency),
PCIe transfer latency based on an equation derived by curve fitting transfer latency against transfer size,
PCIe read and write stage queues and transactions (serialized transfers and queueing delay for transaction processing),
Prefetchers (Tree-based neighborhood, Sequential-local 64KB, Random 4KB, On-demand migration),
Page replacement policies (Tree-based neighborhood, Sequential-local 64KB, LRU 4KB, Random 4KB, LRU 2MB, LFU 2MB),
32-bit access registers per 64KB (basic block),
Delayed migration based on access-counter threshold,
Rounding up managed allocation and maintaining large page (2MB) level full-binary trees.
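
To make the tree-based neighborhood prefetcher and the allocation rounding listed above more concrete, here is a minimal sketch in Python. It is not the simulator's code. It assumes that each 2MB large page is a full binary tree with 64KB basic blocks as leaves, that a remainder smaller than 2MB is rounded up to the next 2^i x 64KB, and that a tree node prefetches its missing blocks once more than 50% of it is resident. The function names and the 50% threshold are illustrative assumptions.

```python
BASIC_BLOCK = 64 * 1024            # 64KB basic block (leaf of the tree)
LARGE_PAGE = 2 * 1024 * 1024       # 2MB large page (root of a full tree)
LEAVES_PER_LARGE_PAGE = LARGE_PAGE // BASIC_BLOCK  # 32 leaves


def round_up_allocation(size_bytes):
    """Return the leaf count of each tree backing a managed allocation.

    Assumption: full 2MB chunks become 32-leaf trees; the remainder is
    rounded up to the next power-of-two number of 64KB blocks.
    """
    full_pages, remainder = divmod(size_bytes, LARGE_PAGE)
    trees = [LEAVES_PER_LARGE_PAGE] * full_pages
    if remainder:
        blocks = -(-remainder // BASIC_BLOCK)  # ceiling division
        leaves = 1
        while leaves < blocks:
            leaves *= 2
        trees.append(leaves)
    return trees


def blocks_to_migrate(resident, faulted_leaf):
    """Return the leaf indices migrated for a far-fault on faulted_leaf.

    resident is a list of booleans, one per leaf, whose length is a power
    of two. It is updated in place to reflect the new residency state.
    """
    migrate = {faulted_leaf}
    resident[faulted_leaf] = True
    size = 1
    # Walk from the faulted leaf up to the root, one tree level at a time.
    while size < len(resident):
        size *= 2
        start = (faulted_leaf // size) * size
        node = range(start, start + size)
        present = sum(resident[i] for i in node)
        # Assumed rule: prefetch the rest of a node that is >50% resident.
        if present * 2 > size:
            for i in node:
                if not resident[i]:
                    resident[i] = True
                    migrate.add(i)
    return sorted(migrate)
```

For example, on an empty 8-leaf (512KB) tree, faults on leaves 0 and 1 migrate only those blocks, a fault on leaf 2 also pulls in leaf 3, and a subsequent fault on leaf 4 pulls in leaves 5 to 7 because the root is then more than half resident.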

Note that we currently do not support heterogeneous systems for CPU-GPU or multi-GPU collaborative workloads. This means the CPU page table (validation/invalidation, CPU-memory page swapping) is not simulated.
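
Within this single-GPU scope, the access-counter-based delayed migration from the feature list can be pictured with the following sketch. It is a simplification under stated assumptions, not the simulator's implementation: one counter per 64KB basic block, a block served remotely until its counter reaches a threshold, and migration on the access that reaches it. The class name, method name, and return strings are hypothetical.

```python
from collections import defaultdict

BASIC_BLOCK = 64 * 1024  # one access counter per 64KB basic block


class DelayedMigration:
    """Toy model: migrate a block only after `threshold` far accesses."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counters = defaultdict(int)  # basic block index -> access count
        self.resident = set()             # blocks already in GPU memory

    def access(self, address):
        """Classify one access as 'local', 'remote', or 'migrate'."""
        block = address // BASIC_BLOCK
        if block in self.resident:
            return "local"
        self.counters[block] += 1
        if self.counters[block] >= self.threshold:
            # Threshold reached: the block is now worth migrating.
            self.resident.add(block)
            return "migrate"
        # Below the threshold: serve the access without migrating.
        return "remote"
```

With a threshold of 3, the first two accesses to a block are served remotely, the third triggers migration, and later accesses to the same block are local. Blocks that are touched only once or twice never occupy GPU memory.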

How to use?

Simple and hassle-free. No need to worry about dependencies. Use the Dockerfile in the root directory of the repository.

sudo docker build -t gpgpu_uvmsmart:latest .
sudo docker run --name <container_name> -it gpgpu_uvmsmart:latest
cd /root/gpgpu-sim_UVMSmart/benchmarks/Managed/<benchmark_folder>
vim gpgpusim.config
./run > <output_file>
sudo docker cp <container_name>:/root/gpgpu-sim_UVMSmart/benchmarks/Managed/<benchmark_folder>/<output_file> .

How to configure?

Currently, we provide architectural support for the GeForce GTX 1080 Ti with PCIe 3.0 x16. The additional configuration items are added to GeForceGTX1080Ti under configs. Change the respective parameters to simulate the desired configuration.

What is included?

A set of micro-benchmarks to determine the semantics of the prefetcher implemented in the NVIDIA UVM kernel module (can be found in micro-benchmarks under root).
A micro-benchmark to find the transfer bandwidth for a given transfer size (cudaMemcpy host to device).
A set of benchmarks with both the copy-then-execute model (in Unmanaged under the benchmarks folder) and unified virtual memory (in Managed under the benchmarks folder).
Specification of the working set, iterations, and number of kernels launched for the managed versions of the benchmarks.
Output logs, scripts to plot, and the derived plots for the ISCA and IPDPS papers in Results under the benchmarks folder.
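
As an illustration of how measurements from the transfer-size micro-benchmark could be turned into a latency equation, the sketch below fits a straight line with ordinary least squares. The linear form is an assumption made here for simplicity; this README does not state the functional form used in the simulator. The sample points are synthetic, generated from a known line, and are not measurements. All names and units are hypothetical.

```python
def fit_line(sizes, latencies):
    """Least-squares fit of latency = intercept + slope * size."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(latencies) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, latencies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_latency(size, intercept, slope):
    """Evaluate the fitted equation for one transfer size."""
    return intercept + slope * size


# Synthetic (size in KB, latency in arbitrary time units) pairs drawn from
# a made-up line. Replace these with real micro-benchmark output.
TRUE_INTERCEPT, TRUE_SLOPE = 10.0, 0.08
sizes_kb = [4, 64, 256, 1024, 2048]
latencies = [TRUE_INTERCEPT + TRUE_SLOPE * s for s in sizes_kb]

intercept, slope = fit_line(sizes_kb, latencies)
```

The intercept plays the role of a fixed per-transfer cost and the slope the per-KB cost. If real measurements are not well described by a line, a different model would be needed.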
