This project forked from debashisganguly/gpgpu-sim_uvmsmart

License: Other

UVM Smart

UVM Smart is the first repository to provide both functional and timing simulation support for Unified Virtual Memory (UVM). This framework extends GPGPU-Sim v3.2 from UBC. Currently, it supports cudaMallocManaged, cudaDeviceSynchronize, and cudaMemPrefetchAsync. It includes 10 benchmarks from various benchmark suites (Rodinia, Parboil, Lonestar, HPC Challenge). These benchmarks are modified to use UVM APIs.

If you use or build on this framework, please cite the following papers based on the functionalities you are leveraging.

  1. Please cite the following paper when using prefetches and page eviction policies.

    Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem. 2019. Interplay between hardware prefetcher and page eviction policy in CPU-GPU unified virtual memory. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA '19). ACM, New York, NY, USA, 224-235.

  2. Please cite the following paper when using access counter based delayed migration, LFU eviction, cold vs hot data structure classification, and page migration and pinning.

    Debashis Ganguly, Ziyu Zhang, Jun Yang, and Rami Melhem. 2020. Adaptive Page Migration for Irregular Data-intensive Applications under GPU Memory Oversubscription. In Proceedings of the 34th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2020). [TO APPEAR], New Orleans, Louisiana, USA.

Features

  1. A fully associative last-level TLB, so a TLB lookup completes in a single core cycle,
  2. A multi-threaded page table (last level shared) walker (configurable page table walk latency),
  3. Workflow for replayable far-fault management (configurable far-fault handling latency),
  4. PCIe transfer latency based on an equation derived by curve-fitting transfer latency against transfer size,
  5. PCIe read and write stage queues and transactions (serialized transfers and queueing delay for transaction processing),
  6. Prefetchers (Tree-based neighborhood, Sequential-local 64KB, Random 4KB, On-demand migration),
  7. Page replacement policies (Tree-based neighborhood, Sequential-local 64KB, LRU 4KB, Random 4KB, LRU 2MB, LFU 2MB),
  8. 32-bit access registers per 64KB (basic block),
  9. Delayed migration based on access-counter threshold,
  10. Rounding up managed allocation and maintaining large page (2MB) level full-binary trees.
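Items 8 to 10 can be illustrated with a minimal Python sketch. It is not the simulator's C++ implementation. The class and function names are hypothetical. The rule of rounding every allocation up to a whole number of 2MB large pages is an assumption, and the simulator's exact rounding rule may differ. The counter here is also simplified to count every access.

```python
KB = 1024
BASIC_BLOCK = 64 * KB            # basic block size from the feature list
LARGE_PAGE = 2 * 1024 * KB       # large page size from the feature list
LEAVES_PER_TREE = LARGE_PAGE // BASIC_BLOCK  # leaves of one full binary tree


def round_up_to_large_pages(size_bytes):
    """Round a size up to a whole number of 2MB large pages (assumed rule)."""
    if size_bytes <= 0:
        raise ValueError("allocation size must be positive")
    return -(-size_bytes // LARGE_PAGE) * LARGE_PAGE


class ManagedAllocation:
    """Hypothetical model with one access counter per 64KB basic block."""

    def __init__(self, size_bytes, threshold):
        self.size = round_up_to_large_pages(size_bytes)
        self.threshold = threshold
        n_blocks = self.size // BASIC_BLOCK
        self.counters = [0] * n_blocks       # per-block access counters
        self.resident = [False] * n_blocks   # True once migrated to the GPU

    def num_trees(self):
        """Number of 2MB full binary trees covering the allocation."""
        return self.size // LARGE_PAGE

    def access(self, offset):
        """Record one access; return True if the block is in device memory."""
        if not 0 <= offset < self.size:
            raise IndexError("offset outside the rounded-up allocation")
        block = offset // BASIC_BLOCK
        # Keep the counter within 32 bits, matching the stated register width.
        self.counters[block] = (self.counters[block] + 1) & 0xFFFFFFFF
        if not self.resident[block] and self.counters[block] >= self.threshold:
            self.resident[block] = True      # threshold reached: migrate now
        return self.resident[block]


alloc = ManagedAllocation(3 * 1024 * KB, threshold=3)
print(alloc.size // LARGE_PAGE, "large pages,", len(alloc.counters), "basic blocks")
```

With a threshold of 3, the first two accesses to a basic block are served without migration, and the third triggers it. A threshold of 1 corresponds to migrating on first touch.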

Note that we currently do not support heterogeneous systems for CPU-GPU or multi-GPU collaborative workloads. This means that the CPU page table (validation/invalidation, CPU-memory page swapping) is not simulated.

How to use?

Simple and hassle-free: there is no need to worry about dependencies. Use the Dockerfile in the root directory of the repository.

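# On the host: build the image and start an interactive container
sudo docker build -t gpgpu_uvmsmart:latest .
sudo docker run --name <container_name> -it gpgpu_uvmsmart:latest
# Inside the container: pick a benchmark, edit its configuration, and run it
cd /root/gpgpu-sim_UVMSmart/benchmarks/Managed/<benchmark_folder>
vim gpgpusim.config
./run > <output_file>
# Back on the host: copy the output file out of the container
sudo docker cp <container_name>:/root/gpgpu-sim_UVMSmart/benchmarks/Managed/<benchmark_folder>/<output_file> .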

How to configure?

Currently, we provide architectural support for the GeForce GTX 1080 Ti with PCIe 3.0 x16. The additional configuration items are added to GeForceGTX1080Ti under configs. Change the respective parameters to simulate the desired configuration.

What is included?

  1. A set of micro-benchmarks to determine the semantics of the prefetcher implemented in the NVIDIA UVM kernel module (can be found in micro-benchmarks under root).
  2. A micro-benchmark to find the transfer bandwidth for a given transfer size (cudaMemcpy host to device).
  3. A set of benchmarks with both the copy-then-execute model (in Unmanaged under the benchmarks folder) and unified virtual memory (in Managed under the benchmarks folder).
  4. Specification of the working set, iterations, and number of kernels launched for the managed versions of the benchmarks.
  5. Output logs, plotting scripts, and the derived plots for the ISCA and IPDPS papers in Results under the benchmarks folder.
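Measurements from a transfer-size micro-benchmark like the one in item 2 can be turned into a latency equation by curve fitting. The Python sketch below shows one way to do that, assuming a simple linear model (latency = intercept + slope × size). The equation actually used by the simulator is not given here and may have a different form. The function names are hypothetical, and the sample points are synthetic placeholders generated from a made-up line, not measured data.

```python
def fit_linear(sizes_kb, latencies_us):
    """Ordinary least-squares fit of latency_us = intercept + slope * size_kb."""
    n = len(sizes_kb)
    if n < 2 or n != len(latencies_us):
        raise ValueError("need at least two (size, latency) pairs")
    mean_x = sum(sizes_kb) / n
    mean_y = sum(latencies_us) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes_kb)
    if sxx == 0:
        raise ValueError("transfer sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sizes_kb, latencies_us))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_latency_us(size_kb, intercept, slope):
    """Evaluate the fitted line for a given transfer size."""
    return intercept + slope * size_kb


def bandwidth_gb_per_s(size_kb, latency_us):
    """Effective bandwidth implied by one (size, latency) pair."""
    return (size_kb * 1024) / (latency_us * 1e-6) / 1e9


# Synthetic placeholder points from a made-up line (NOT measurements).
sizes = [4.0, 16.0, 64.0, 256.0, 1024.0]
latencies = [5.0 + 0.1 * s for s in sizes]

intercept, slope = fit_linear(sizes, latencies)
print("fitted intercept (us):", intercept)
print("fitted slope (us per KB):", slope)
```

The intercept models the fixed per-transfer overhead and the slope the per-kilobyte cost, which is why small transfers achieve much lower effective bandwidth than large ones under such a model.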

Copyright Notice

Copyright (c) 2019 Debashis Ganguly, Department of Computer Science, School of Computing and Information, University of Pittsburgh. All rights reserved.
