Awesome-GPU

Resource Management

Papers

  1. ASPLOS'17-Locality-Aware CTA Clustering for Modern GPUs
  2. ASPLOS'17-Dynamic Resource Management for Efficient Utilization of Multitasking GPUs
  3. HPCA'17-Dynamic GPGPU Power Management Using Adaptive Model Predictive Control
  4. ISCA'16-Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

Parallelism

Papers

  1. HPCA'17-Controlled Kernel Launch for Dynamic Parallelism in GPUs
  2. ISCA'16-LaPerm: Locality Aware Scheduler for Dynamic Parallelism on GPUs
  3. ISCA'16-Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit
  4. Berkeley TechRpts'16-Understanding Latency Hiding on GPUs

Slides

  1. GTC'17-Cooperative Groups

Cache

Papers

  1. ISCA'16-APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs
  2. SC'15-Adaptive and Transparent Cache Bypassing for GPUs

Algorithm

Papers

  1. HPCA'17-Towards Pervasive and User Satisfactory CNN across GPU Microarchitectures
  2. ASPLOS'14-Paraprox: Pattern-Based Approximation for Data Parallel Applications

Slides

  1. GTC'18-CUTLASS: CUDA Template Library for Dense Linear Algebra at All Levels and Scales

Software

  1. CUTLASS

Performance Analysis

Papers

  1. GTC'18-Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking
  2. PLDI'18-GPU Code Optimization using Abstract Kernel Emulation and Sensitivity Analysis
  3. CGO'18-CUDAAdvisor: LLVM-Based Runtime Profiling for Modern GPUs
  4. CCGRID'18-Exposing Hidden Performance Opportunities in High Performance GPU Applications
  5. Euro-Par'15-Identifying Optimization Opportunities Within Kernel Execution in GPU Codes
  6. SC'13-Effective Sampling-Driven Performance Tools for GPU-Accelerated Supercomputers
  7. ISPASS'12-Lynx: A Dynamic Instrumentation System for Data-Parallel Applications on GPGPU Architectures
  8. ICPP'11-Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs
  9. ISPASS'10-Demystifying GPU Microarchitecture through Microbenchmarking
  10. ISPASS'10-Visualizing Complex Dynamics in Many-Core Accelerator Architectures
  11. ISPASS'09-Analyzing CUDA Workloads Using a Detailed GPU Simulator

Books

  1. Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
  2. Monitoring Heterogeneous Applications with the OpenMP Tools Interface

Slides

  1. ECP'19-Performance Tuning of Scientific Codes with the Roofline Model
  2. GTC'18-Volta Architecture and Performance Optimization
  3. SC'10-Fundamental Optimizations

Software

  1. Vampir|Score-P
  2. TAU
  3. PAPI
  4. Allinea MAP
  5. Open|SpeedShop
  6. HPCToolkit
  7. NVIDIA Nsight Systems
  8. NVIDIA Nsight Compute

Compiler

  1. LLVM'17-Implementing Implicit OpenMP Data Sharing on GPUs
  2. CGO'16-gpucc: An Open-Source GPGPU Compiler
  3. LLVM'16-Offloading Support for OpenMP in Clang and LLVM
  4. PMBS'15-Performance Analysis of OpenMP on a GPU using a CORAL Proxy Application
  5. LLVM'15-Integrating GPU Support for OpenMP Offloading Directives into Clang
  6. LLVM'14-Coordinating GPU Threads for OpenMP 4.0 in LLVM

GPU Binaries

Papers

  1. CGO'19-Decoding CUDA Binary
  2. ISCA'15-Flexible Software Profiling of GPU Architectures

Slides

  1. SASSI

Documentation

White Papers

  1. Ampere-NVIDIA A100 Tensor Core GPU Architecture
  2. Turing-NVIDIA Turing GPU Architecture
  3. Volta-NVIDIA Tesla V100
  4. Pascal-NVIDIA Tesla P100
  5. Kepler-NVIDIA’s Next Generation CUDA Compute Architecture: Kepler
  6. Fermi-NVIDIA’s Next Generation CUDA Compute Architecture: Fermi

APIs

  1. CUDA Toolkit Documentation

GTC

  1. GTC-GPU Technology Conference

Contributors

jokeren
