
gpuprofiler's Introduction

GPUProfiler

Latest Release

The latest release of GPUProfiler is v1.07a3 on 06-27-2021

https://github.com/JeremyMain/GPUProfiler/releases/tag/v1.07a3

The first release of GPUProfiler for Linux v1.07b can be found on the GPUProfiler for Linux project page

All Releases

https://github.com/JeremyMain/GPUProfiler/releases

DISCLAIMER:

I am an NVIDIA employee; however, GPUProfiler has been developed and released independently of my employment at NVIDIA. GPUProfiler is not an NVIDIA product, nor is it supported or endorsed by NVIDIA. It is provided as a binary-only, closed-source Freeware project.

GPUProfiler was created to accelerate analysis of resource utilization within physical environments, to allow for better resource sizing for virtual GPU environments, and to troubleshoot performance issues.

Why?

I needed a small tool to understand existing system configuration and performance metrics that impact the sizing decision making process.

After several years of attempting to extract configuration information, utilization metrics and software configuration details from partners or customers, I realized that a small tool that could be easily shared with partners and customers would help in understanding which resources are being highly utilized and the context of that utilization.

GPUProfiler is not a source code profiler but a resource and utilization profiler that can provide a snapshot of a system and select resource utilization metrics over a period of time.

Many of the important system details are captured to help correlate utilization information within a sea of desktop, workstation and server hardware. These include: CPU type, number of logical cores, frequency, system memory, OS, GPU and driver version.

To minimize measurement impact, a small number of metrics are collected using native calls or the NVIDIA APIs NVAPI and NVML. When an NVIDIA GPU is not detected, all other metrics can still be collected, allowing CPU-only and GPU workloads to be compared. I initially tried to stay away from WMI-based counters because of the additional overhead they incur; however, remoting protocol vendors only expose protocol metrics via WMI, so to enable capture of selected protocol metrics WMI is now used when a native query is not available.
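For illustration, the following minimal C++ sketch polls GPU and memory-controller utilization through NVML. It is not GPUProfiler's actual source (the project is closed source); it only shows the publicly documented NVML calls such a lightweight native query could use.

```cpp
// Minimal NVML polling sketch (illustrative only, not GPUProfiler source code).
// Link against the NVML library; requires an installed NVIDIA driver.
#include <nvml.h>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS)
        return 1;                                   // no NVIDIA driver / GPU present

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        nvmlUtilization_t util;                     // gpu = SM utilization, memory = memory-controller utilization
        if (nvmlDeviceGetUtilizationRates(device, &util) == NVML_SUCCESS)
            std::printf("GPU %u%%  MemCtrl %u%%\n", util.gpu, util.memory);
    }

    nvmlShutdown();
    return 0;
}
```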

The tool was intended to be small, easy to run, and to make it simple to collect, save and ultimately share data with our partners. It has, in my opinion, become invaluable in sizing for vGPU environments and for troubleshooting both physical and virtual environments.

Main Features

System Information

In this example we have a virtual machine running on VMware vSphere. For physical machines the tool collects the manufacturer, model and BIOS information. Also shown are the host name of the machine, the OS and build number, followed by the CPU model, number of logical cores and frequency, as well as the system memory.

If an NVIDIA GPU is detected, the GPU model or vGPU type, driver mode (WDDM or TCC), GPU memory and, for physical machines, the GPU VBIOS information are shown, followed by the GPU driver version and hypervisor agent version.
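The non-GPU portions of this information (logical core count, system memory, host name) are available through standard Win32 calls. The short C++ sketch below is illustrative only and is not the tool's actual implementation.

```cpp
// Win32 system-information sketch: logical core count, memory size and host name.
// (Illustrative only; GPUProfiler's own collection code is not public.)
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <cstdio>

int main()
{
    SYSTEM_INFO si = {};
    GetSystemInfo(&si);                               // logical processor count

    MEMORYSTATUSEX mem = {};
    mem.dwLength = sizeof(mem);
    GlobalMemoryStatusEx(&mem);                       // total physical memory

    char host[MAX_COMPUTERNAME_LENGTH + 1] = {};
    DWORD len = sizeof(host);
    GetComputerNameA(host, &len);                     // host name

    std::printf("Host: %s  Cores: %lu  RAM: %llu MB\n",
                host, si.dwNumberOfProcessors,
                mem.ullTotalPhys / (1024ull * 1024ull));
    return 0;
}
```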

Collection Options

Collection options include the sample interval and duration when used with the “start” button. The monitor button allows for continuous monitoring over a 5-minute window. While monitoring, you can stop monitor mode and save that data in GPD format as well. New will delete the current data and allow you to start collecting again. The save button saves the current data in GPD format, and the export button exports it to a CSV file.

Display Options

The display options will show or hide selected metrics. Each metric has its own hotkey binding: the first press bolds the line and the second press hides the metric. C for CPU, R for RAM, G for GPU, F for Framebuffer, E for Video Encode, D for Video Decode and P for Protocol metrics. Selecting the checkbox will hide or show the selected counter utilization. The hotkey B bolds all lines in three steps. If a metric does not have data in a loaded file, its checkbox will be disabled. Network support is not enabled in the current release, so its checkbox is always disabled.

Process Utilization

While collecting data, the processes that are currently using the GPU are displayed in the process utilization list. Processes that the user does not have permission to query will not be displayed but are still part of the TOTAL utilization. The elements shown are GPU, GPU memory controller, encoder and decoder utilization. The process utilization information is not serialized in the GPD file.
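NVML exposes per-process utilization samples covering exactly these fields (SM, memory, encoder and decoder utilization). The sketch below is a minimal, illustrative query assuming a recent NVML version that provides nvmlDeviceGetProcessUtilization; it is not GPUProfiler's actual code.

```cpp
// Per-process GPU utilization sketch via NVML (illustrative, not GPUProfiler source).
#include <nvml.h>
#include <vector>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        unsigned int count = 0;
        // First call with a null buffer reports how many samples are available.
        nvmlDeviceGetProcessUtilization(device, nullptr, &count, 0);

        std::vector<nvmlProcessUtilizationSample_t> samples(count);
        if (count > 0 &&
            nvmlDeviceGetProcessUtilization(device, samples.data(), &count, 0) == NVML_SUCCESS)
        {
            for (unsigned int i = 0; i < count; ++i)
                std::printf("PID %u  SM %u%%  Mem %u%%  Enc %u%%  Dec %u%%\n",
                            samples[i].pid, samples[i].smUtil, samples[i].memUtil,
                            samples[i].encUtil, samples[i].decUtil);
        }
    }

    nvmlShutdown();
    return 0;
}
```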

Utilization Graphs

Within the utilization graphs, you may zoom in or out using the left mouse button or scroll using the right mouse button, scroll wheel or page up/down keys. Control + scroll wheel will zoom into or out of the center of the graph.

Protocol Graphs

The protocol graph will be shown in physical or virtual environments where a remote session is detected. It displays the remoted FPS of the current protocol (HDX, PCoIP or Blast) as well as the protocol latency. The protocol latency is a function of the network latency and the latency of encoding/decoding the protocol stream. When the protocol latency is highly variable, consider investigating the endpoint configuration to confirm its ability to support the protocol at the desired number of displays and resolutions.

Value Inspector

Within the utilization graphs, the hotkey V can be used to show or hide the Value inspector, and U the Utilization inspector. These display the collected utilization values at a specific point in time.

Utilization Histogram

For the visible range, an analysis button will show a resource utilization histogram and the average value over the range. In a future version, the method used to display this information will need to be optimized to support multi-GPU configurations and new metrics.

Tool Window Mode

Double-clicking the graph area will change the display mode to a tool view, and via the options menu it can be set to stay always on top if desired. Double-clicking the graph area again will revert to the standard display mode.

Quick Start

Start by asking your customer to download GPUProfiler from the GitHub page; this allows me to understand the number of downloads of the tool. If customer security reasons make a direct download impossible, please provide the tool as you see fit.

The customer would then extract the binary from the archive and start GPUProfiler.

NOTE: The binary is not yet signed and on first execution a security dialog will be displayed.

Next determine the duration that you would like to collect data and click the “start” button.

Have the customer start their workload and select the “stop” button if the workload procedure has completed before the duration has been reached. Save the GPD datafile with a descriptive name that identifies the workload or unique configuration being measured.

Have the customer compress the .GPD file and share it with you. The customer may also wish to review the results and can do so by loading the GPD file in GPUProfiler, as well as export the collected data to CSV format for further analysis.

Here is what GPUProfiler looks like after it has collected data from a vGPU VM that was running Dassault Systèmes CATIA V5. In this case the remoting protocol was VMware Horizon Blast.

There is no requirement to view the resulting GPUProfiler data file on the host where it was generated, nor do you need a GPU to view the data files. A console-based version of GPUProfiler for Linux or Linux kernel derived hypervisors is in limited beta and will be released shortly.

Documentation

Development

(Last updated June 2021) Here is a brief look at some of the new additions in the new feature branch v1.07b

License State / Type

Collection of the current license state and license edition for vGPU customers. If a license is not detected it will display “Unlicensed”.

Displays and Resolutions

The number of connected displays and their current resolutions. This will be updated each time a display is added or a resolution is changed, and the current displays and resolutions are shown when used with the value inspector.

Display Capture (NVFBC/NVIFR) and Encoder (NVENC) Metrics

For protocols that use the NVIDIA capture SDKs NVFBC or NVIFR, the capture FPS and latency are collected. Metrics from the hardware video encoder SDK NVENC, for encoder FPS and encoder latency, are also collected. This was added for analyzing CloudXR encoder requirements and the impact of multiple vGPU VMs on a single GPU.
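The NVENC/NVDEC utilization side of such metrics can be read through NVML, while capture FPS and latency come from the capture SDKs themselves. A minimal NVML utilization sketch (illustrative only, not the tool's code) follows.

```cpp
// Encoder/decoder utilization sketch via NVML (illustrative only).
#include <nvml.h>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        unsigned int enc = 0, dec = 0, periodUs = 0;
        nvmlDeviceGetEncoderUtilization(device, &enc, &periodUs);   // NVENC utilization
        nvmlDeviceGetDecoderUtilization(device, &dec, &periodUs);   // NVDEC utilization
        std::printf("Encoder %u%%  Decoder %u%%  (sampling period %u us)\n", enc, dec, periodUs);
    }

    nvmlShutdown();
    return 0;
}
```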

Protocol Metrics

The protocol network Tx, Rx and loss information (if available) will be shown in a separate graph below the protocol metrics. Currently supported protocols are Citrix HDX Thinwire, Teradici PCoIP and VMware Blast.

Network Metrics

The overall network utilization (Tx/Rx) will be collected to support protocols that do not expose WMI metrics, as well as to monitor general network requirements. This is supported for both bare-metal and virtual environments.
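On Windows, overall interface Tx/Rx byte counters can be read with the IP Helper API and differenced between samples to obtain rates. The sketch below assumes that approach; it is illustrative and does not document the tool's actual implementation.

```cpp
// Overall network Tx/Rx byte counters via the Win32 IP Helper API (illustrative only).
// Link with iphlpapi.lib. Sampling these counters at the profile interval and
// differencing successive readings yields Tx/Rx rates.
#include <winsock2.h>
#include <iphlpapi.h>
#include <netioapi.h>
#include <cstdio>

int main()
{
    PMIB_IF_TABLE2 table = nullptr;
    if (GetIfTable2(&table) != NO_ERROR)
        return 1;

    unsigned long long rx = 0, tx = 0;
    for (ULONG i = 0; i < table->NumEntries; ++i)
    {
        rx += table->Table[i].InOctets;     // bytes received on this interface
        tx += table->Table[i].OutOctets;    // bytes sent on this interface
    }
    std::printf("Total Rx %llu bytes, Tx %llu bytes\n", rx, tx);

    FreeMibTable(table);
    return 0;
}
```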

Bug Reports

Please submit a bug report for issues that occur or feature requests that would make the application more useful.

License

This software is provided as Freeware; it is a closed-source project.

gpuprofiler's People

Contributors

jeremymain


gpuprofiler's Issues

Add GPU clock rate to advanced display settings

GPU utilization is a relative value in relation to the GPU clock. Correlating the GPU clock with the reported utilization helps in instances where intermittent, non-sustained GPU work does not trigger the boost clock and could otherwise be misinterpreted as a requirement for more GPU resources than the workload actually needs.
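Assuming NVML is available, the current graphics and memory clocks can be sampled alongside the utilization values. The following sketch is illustrative only and does not represent the planned implementation.

```cpp
// Reading the current graphics and memory clocks via NVML (illustrative only).
#include <nvml.h>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        unsigned int gfxMHz = 0, memMHz = 0;
        nvmlDeviceGetClockInfo(device, NVML_CLOCK_GRAPHICS, &gfxMHz);  // current graphics clock
        nvmlDeviceGetClockInfo(device, NVML_CLOCK_MEM, &memMHz);       // current memory clock
        std::printf("Graphics clock %u MHz, Memory clock %u MHz\n", gfxMHz, memMHz);
    }

    nvmlShutdown();
    return 0;
}
```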

Add command line option to pass a label command to the active profiling session

When using GPUProfiler with a batch file, allow calling the GPUProfiler executable with a command line option to simply add a user defined label to the profile timeline during profiling.

This will be useful when using batch files to automate testing of different configurations.
Because it will be a simple command I could either create a small EXE to perform this or add this to the main application.

This is not an implementation sample, just a mockup to illustrate the feature.
(mockup image)

Insert label during profiling via hotkey

While profiling is being performed, a global hotkey will pop up a dialog to capture a label (e.g. "Model load start") and insert that label into the graph output.
Storage of the label data was planned for and is part of the .GPD file format.

Prioritize use of NVML over NVAPI where supported

To enable more detailed performance metrics, I will prioritize the use of the NVML API over NVAPI.
There is a limitation in that NVML only supports x64 builds, therefore the x86 build will lack the ability to use NVML.
Viewing .GPD files will be unaffected by this limitation.

VM agent version

When used within a VM, capture the agent version information and save within the GPD file.

Document GPD file format

Define what information is collected and saved in a GPD file.
This would be useful for users that may wish to share profile data but are reluctant to because they do not know the scope of the collected data.

Add "monitor" mode

When the tool is being used simply to monitor for demo purposes, or for performing an initial investigation where the entire sample-term data is not intended to be saved, this mode would allow endless monitoring of the resource states.
When monitor mode is stopped, an option to save the data within the current visible range would be available.

Advanced GPU selection and information

Hello Jeremy, thanks a LOT for building a nice GUI instead of the sh** nvidia-smi.

now a lot of options I need are not there:

  • multi-GPU support and selection. With more than 1 GPU, I must be able to pick the one I want to analyze
  • missing GPU info:
    • bus ID
    • driver model: WDDM version or TCC
    • GPU boost: enabled or disabled
    • GPU clock: real clock (if boost is enabled, then always the max)
    • VRAM clock
    • [don't know if this is available]: a SEPARATE % usage of GPU and VRAM... in nvidia-smi both are reported in the single GPU %, so you don't know which one is starving first...

regards,
fred

Add timestamp collection at profile start and add it to the output (GPD/CSV) and the resource utilization inspector

When profiling for long periods of time, users may notice periods of performance difference in their normal application usage that they may wish to correlate with the data collected during the profiling run.

Having the ability to show in the graph the actual time the data was sampled at would simplify pinpointing when the event occurred.

Label insertion support via hotkey would also be a useful addition to support this end-user-assisted workload profiling.

Zoom-full after profile stop

Prior versions kept the graph scaled to the entire intended collection range even if the data collection process was stopped early. Now the graph will display the entire collected data when collection is stopped early, or when the view was zoomed during collection.

Histogram handling of not utilized (0) and fully utilized (100)

I had noticed that the histogram calculation was not including the 100% utilized values in the 90~99% bar.

Additionally, 0 values for a resource will not be added to a histogram bucket.
When a resource is not being used, rather than showing the 0~10 bucket as 100% probability, it will now display nothing, as no utilization data exists.

Before (v1.02 ~ v1.03): (screenshot)

After: (screenshot)

Compare the CPU utilization in the histogram (the 100% issue) with the GPU utilization, as well as the encoder/decoder utilization, to see the improved 0% handling.
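For reference, a minimal bucketing sketch consistent with the described fix (100% counted in the top bar, 0% samples excluded) might look like the following; this is illustrative, not the application's code.

```cpp
// Histogram bucketing sketch: ten 10%-wide buckets, 100% falls into the top
// bucket and 0% samples are skipped entirely (illustrative only).
#include <array>
#include <vector>
#include <cstdio>

std::array<unsigned, 10> BuildHistogram(const std::vector<unsigned>& samples)
{
    std::array<unsigned, 10> buckets{};
    for (unsigned v : samples)
    {
        if (v == 0)
            continue;                            // unused resource: contributes nothing
        unsigned idx = (v >= 100) ? 9 : v / 10;  // 100% lands in the 90~100% bucket
        ++buckets[idx];
    }
    return buckets;
}

int main()
{
    auto h = BuildHistogram({0, 0, 5, 37, 90, 100, 100});
    for (unsigned i = 0; i < h.size(); ++i)
        std::printf("%3u-%3u%%: %u\n", i * 10, i * 10 + 10, h[i]);
    return 0;
}
```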

GPU Boost state

Fred's request:
Record at profile time whether GPU boost is enabled and save it within the GPD file.

Alternative display "Dark Mode"

Adding an alternative display mode with a darker color palette.
(mockup image)

The biggest challenge for completing this is simply getting the Win32 controls to adhere to the new palette.

Add Memory controller, Bus utilization information

Fred's request:

A SEPARATE % usage of GPU and VRAM... in nvidia-smi both are reported in the single GPU %, so you don't know which one is starving first...

JJM:
[ nvidia-smi -q ] does list the various utilization data for SM, memory controller, bus, encoder and decoder.

GPU BUS ID information

Fred's request:

Add the GPU's BUS ID information to the system state information captured and saved in the GPD file

Option to temporarily bold display graph lines

Using a keyboard accelerator, display the lines thicker to aid in situations where fine details may be lost (during presentations, when using a projector, etc.).

The candidate would be the 'B' key.

Question: should there be two or three thickness levels?

Graph only view

Double-click on the graph output and the window will only display the graph output.
Double-clicking again will return the display to the standard view.
