
gpuprofiler's Introduction

GPUProfiler

Latest Release

The latest release of GPUProfiler is v1.07a3 on 06-27-2021

https://github.com/JeremyMain/GPUProfiler/releases/tag/v1.07a3

The first release of GPUProfiler for Linux v1.07b can be found on the GPUProfiler for Linux project page

All Releases

https://github.com/JeremyMain/GPUProfiler/releases

DISCLAIMER:

I am an NVIDIA employee; however, GPUProfiler has been developed and released independently of my employment at NVIDIA. GPUProfiler is not an NVIDIA product, nor is it supported or endorsed by NVIDIA. It is provided as a binary-only, closed-source Freeware project.

GPUProfiler was created to accelerate analysis of resource utilization within physical environments, to allow for better resource sizing for virtual GPU environments, and to troubleshoot performance issues.

Why?

I needed a small tool to understand existing system configuration and performance metrics that impact the sizing decision making process.

After several years of attempting to extract configuration information, utilization metrics and software configuration details from partners or customers, I realized that a small tool that could be easily shared with partners and customers would help in understanding which resources are being highly utilized and the context of that utilization.

GPUProfiler is not a source code profiler but a resource and utilization profiler that can provide a snapshot of a system and select resource utilization metrics over a period of time.

Many of the important system details are captured to help correlate utilization information within a sea of desktop, workstation and server hardware. These include: CPU type, number of logical cores, frequency, system memory, OS, GPU and driver version.

To minimize measurement impact, a small number of metrics are collected using native calls or the NVIDIA APIs NVAPI and NVML. When an NVIDIA GPU is not detected, all other metrics can still be collected, allowing CPU-only and GPU workloads to be compared. I initially tried to stay away from WMI-based counters because of the additional overhead they incur; however, remoting protocol vendors only expose protocol metrics via WMI, so to enable capture of selected protocol metrics WMI is now used when a native query is not available.
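For illustration, the following minimal C++ sketch polls GPU and memory-controller utilization through NVML. It is not GPUProfiler's actual source (the project is closed source); it only shows the publicly documented NVML calls such a lightweight native query could use.

```cpp
// Minimal NVML polling sketch (illustrative only, not GPUProfiler source code).
// Link against the NVML library; requires an installed NVIDIA driver.
#include <nvml.h>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS)
        return 1;                                   // no NVIDIA driver / GPU present

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        nvmlUtilization_t util;                     // gpu = SM utilization, memory = memory-controller utilization
        if (nvmlDeviceGetUtilizationRates(device, &util) == NVML_SUCCESS)
            std::printf("GPU %u%%  MemCtrl %u%%\n", util.gpu, util.memory);
    }

    nvmlShutdown();
    return 0;
}
```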

The tool was intended to be small, easy to run, and to make it simple to collect, save and ultimately share data with our partners. It has, in my opinion, become invaluable in sizing for vGPU environments and for troubleshooting both physical and virtual environments.

Main Features

System Information

In this example we have a virtual machine running on VMware vSphere. For physical machines the tool collects the manufacturer, model and BIOS information. Also shown are the host name of the machine, the OS and build number, followed by the CPU model, number of logical cores and frequency, as well as the system memory.

If an NVIDIA GPU is detected, the GPU model or vGPU type, driver mode (WDDM or TCC), GPU memory and, for physical machines, the GPU VBIOS information are shown, followed by the GPU driver version and hypervisor agent version.
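The non-GPU portions of this information (logical core count, system memory, host name) are available through standard Win32 calls. The short C++ sketch below is illustrative only and is not the tool's actual implementation.

```cpp
// Win32 system-information sketch: logical core count, memory size and host name.
// (Illustrative only; GPUProfiler's own collection code is not public.)
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <cstdio>

int main()
{
    SYSTEM_INFO si = {};
    GetSystemInfo(&si);                               // logical processor count

    MEMORYSTATUSEX mem = {};
    mem.dwLength = sizeof(mem);
    GlobalMemoryStatusEx(&mem);                       // total physical memory

    char host[MAX_COMPUTERNAME_LENGTH + 1] = {};
    DWORD len = sizeof(host);
    GetComputerNameA(host, &len);                     // host name

    std::printf("Host: %s  Cores: %lu  RAM: %llu MB\n",
                host, si.dwNumberOfProcessors,
                mem.ullTotalPhys / (1024ull * 1024ull));
    return 0;
}
```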

Collection Options

Collection options include the sample interval and duration when used with the “start” button. The monitor button allows for continuous monitoring over a 5-minute window. While monitoring, you can stop monitor mode and save that data in GPD format as well. New will delete the current data and allow you to start collecting again. The save button saves the current data in GPD format, and the export button exports it to a CSV file.

Display Options

The display options will show or hide selected metrics. Each metric has its own hotkey binding: the first press bolds the line and the second press hides the metric. C for CPU, R for RAM, G for GPU, F for Framebuffer, E for Video Encode, D for Video Decode and P for Protocol metrics. Selecting the checkbox will hide or show the selected counter utilization. The hotkey B bolds all lines in three steps. If a metric does not have data in a loaded file, its checkbox will be disabled. Network support is not enabled in the current release, so its checkbox is always disabled.

Process Utilization

While collecting data, the processes that are currently using the GPU are displayed in the process utilization list. Processes that the user does not have permission to query will not be displayed but are still part of the TOTAL utilization. The elements shown are GPU, GPU memory controller, encoder and decoder utilization. The process utilization information is not serialized in the GPD file.
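NVML exposes per-process utilization samples covering exactly these fields (SM, memory, encoder and decoder utilization). The sketch below is a minimal, illustrative query assuming a recent NVML version that provides nvmlDeviceGetProcessUtilization; it is not GPUProfiler's actual code.

```cpp
// Per-process GPU utilization sketch via NVML (illustrative, not GPUProfiler source).
#include <nvml.h>
#include <vector>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        unsigned int count = 0;
        // First call with a null buffer reports how many samples are available.
        nvmlDeviceGetProcessUtilization(device, nullptr, &count, 0);

        std::vector<nvmlProcessUtilizationSample_t> samples(count);
        if (count > 0 &&
            nvmlDeviceGetProcessUtilization(device, samples.data(), &count, 0) == NVML_SUCCESS)
        {
            for (unsigned int i = 0; i < count; ++i)
                std::printf("PID %u  SM %u%%  Mem %u%%  Enc %u%%  Dec %u%%\n",
                            samples[i].pid, samples[i].smUtil, samples[i].memUtil,
                            samples[i].encUtil, samples[i].decUtil);
        }
    }

    nvmlShutdown();
    return 0;
}
```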

Utilization Graphs

Within the utilization graphs, you may zoom in or out using the left mouse button or scroll using the right mouse button, scroll wheel or page up/down keys. Control + scroll wheel will zoom into or out of the center of the graph.

Protocol Graphs

The protocol graph will be shown in physical or virtual environments where a remote session is detected. It displays the remoted FPS of the current protocol (HDX, PCoIP or Blast) as well as the protocol latency. The protocol latency is a function of the network latency and the latency of encoding/decoding the protocol stream. When the protocol latency is highly variable, consider investigating the endpoint configuration to confirm its ability to support the protocol at the desired number of displays and resolutions.

Value Inspector

Within the utilization graphs, the hotkey V can be used to show or hide the Value inspector, and U the Utilization inspector. These display the collected utilization values at a specific point in time.

Utilization Histogram

For the visible range, an analysis button will show a resource utilization histogram and the average value over the range. In a future version, the method used to display this information will need to be optimized to support multi-GPU configurations and new metrics.

Tool Window Mode

Double-clicking the graph area will change the display mode to a tool view, and via the options menu it can be set to stay always on top if desired. Double-clicking the graph area again will revert to the standard display mode.

Quick Start

Start by asking your customer to download GPUProfiler from the GitHub page; this allows me to understand the number of downloads of the tool. If customer security reasons make a direct download impossible, please provide the tool as you see fit.

The customer would then extract the binary from the archive and start GPUProfiler.

NOTE: The binary is not yet signed and on first execution a security dialog will be displayed.

Next determine the duration that you would like to collect data and click the “start” button.

Have the customer start their workload and select the “stop” button if the workload procedure has completed before the duration has been reached. Save the GPD datafile with a descriptive name that identifies the workload or unique configuration being measured.

Have the customer compress the .GPD file and share it with you. The customer may also wish to review the results and can do so by loading the GPD file in GPUProfiler, as well as export the collected data to CSV format for further analysis.

Here is what GPUProfiler looks like after it has collected data from a vGPU VM that was running Dassault Systèmes CATIA V5. In this case the remoting protocol was VMware Horizon Blast.

There is no requirement to view the resulting GPUProfiler data file on the host where it was generated, nor do you need a GPU to view the data files. A console-based version of GPUProfiler for Linux or Linux kernel derived hypervisors is in limited beta and will be released shortly.

Documentation

Development

(Last updated June 2021) Here is a brief look at some of the new additions in the new feature branch v1.07b

License State / Type

Collection of the current license state and license edition for vGPU customers. If a license is not detected it will display “Unlicensed”.

Displays and Resolutions

The number of connected displays and their current resolutions. This will be updated each time a display is added or a resolution is changed, and the current displays and resolutions are shown when used with the value inspector.

Display Capture (NVFBC/NVIFR) and Encoder (NVENC) Metrics

For protocols that use the NVIDIA capture SDKs NVFBC or NVIFR, the capture FPS and latency are collected. Metrics from the hardware video encoder SDK NVENC, for encoder FPS and encoder latency, are also collected. This was added for analyzing CloudXR encoder requirements and the impact of multiple vGPU VMs on a single GPU.
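The NVENC/NVDEC utilization side of such metrics can be read through NVML, while capture FPS and latency come from the capture SDKs themselves. A minimal NVML utilization sketch (illustrative only, not the tool's code) follows.

```cpp
// Encoder/decoder utilization sketch via NVML (illustrative only).
#include <nvml.h>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        unsigned int enc = 0, dec = 0, periodUs = 0;
        nvmlDeviceGetEncoderUtilization(device, &enc, &periodUs);   // NVENC utilization
        nvmlDeviceGetDecoderUtilization(device, &dec, &periodUs);   // NVDEC utilization
        std::printf("Encoder %u%%  Decoder %u%%  (sampling period %u us)\n", enc, dec, periodUs);
    }

    nvmlShutdown();
    return 0;
}
```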

Protocol Metrics

The protocol network Tx, Rx and loss information (if available) will be shown in a separate graph below the protocol metrics. Currently supported protocols are Citrix HDX Thinwire, Teradici PCoIP and VMware Blast.

Network Metrics

The overall network utilization (Tx/Rx) will be collected to support protocols that do not expose WMI metrics, as well as to monitor general network requirements. This is supported for both bare-metal and virtual environments.
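On Windows, overall interface Tx/Rx byte counters can be read with the IP Helper API and differenced between samples to obtain rates. The sketch below assumes that approach; it is illustrative and does not document the tool's actual implementation.

```cpp
// Overall network Tx/Rx byte counters via the Win32 IP Helper API (illustrative only).
// Link with iphlpapi.lib. Sampling these counters at the profile interval and
// differencing successive readings yields Tx/Rx rates.
#include <winsock2.h>
#include <iphlpapi.h>
#include <netioapi.h>
#include <cstdio>

int main()
{
    PMIB_IF_TABLE2 table = nullptr;
    if (GetIfTable2(&table) != NO_ERROR)
        return 1;

    unsigned long long rx = 0, tx = 0;
    for (ULONG i = 0; i < table->NumEntries; ++i)
    {
        rx += table->Table[i].InOctets;     // bytes received on this interface
        tx += table->Table[i].OutOctets;    // bytes sent on this interface
    }
    std::printf("Total Rx %llu bytes, Tx %llu bytes\n", rx, tx);

    FreeMibTable(table);
    return 0;
}
```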

Bug Reports

Please submit a bug report for issues that occur or feature requests that would make the application more useful.

License

This software is provided as Freeware; it is a closed-source project.

gpuprofiler's People

Contributors

jeremymain


gpuprofiler's Issues

Add GPU clock rate to advanced display settings

GPU utilization is a relative value in relation to the GPU clock. Correlating the GPU clock with the reported utilization helps in instances where intermittent, non-sustained GPU work does not trigger the boost clock and could otherwise be misinterpreted as a requirement for more GPU resources than the workload actually needs.
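Assuming NVML is available, the current graphics and memory clocks can be sampled alongside the utilization values. The following sketch is illustrative only and does not represent the planned implementation.

```cpp
// Reading the current graphics and memory clocks via NVML (illustrative only).
#include <nvml.h>
#include <cstdio>

int main()
{
    if (nvmlInit() != NVML_SUCCESS) return 1;

    nvmlDevice_t device;
    if (nvmlDeviceGetHandleByIndex(0, &device) == NVML_SUCCESS)
    {
        unsigned int gfxMHz = 0, memMHz = 0;
        nvmlDeviceGetClockInfo(device, NVML_CLOCK_GRAPHICS, &gfxMHz);  // current graphics clock
        nvmlDeviceGetClockInfo(device, NVML_CLOCK_MEM, &memMHz);       // current memory clock
        std::printf("Graphics clock %u MHz, Memory clock %u MHz\n", gfxMHz, memMHz);
    }

    nvmlShutdown();
    return 0;
}
```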

Add command line option to pass a label command to the active profiling session

When using GPUProfiler with a batch file, allow calling the GPUProfiler executable with a command line option to simply add a user defined label to the profile timeline during profiling.

This will be useful when using batch files to automate testing of different configurations.
Because it will be a simple command I could either create a small EXE to perform this or add this to the main application.

This is not an implementation sample, just a mockup to illustrate the feature.
(mockup image)

Insert label during profiling via hotkey

While profiling is being performed, a global hotkey will pop up a dialog to capture a label (e.g. "Model load start") and insert that label into the graph output.
Storage of the label data was planned for and is part of the .GPD file format.

Prioritize use of NVML over NVAPI where supported

To enable more detailed performance metrics, I will prioritize the use of the NVML API over NVAPI.
There is a limitation in that NVML only supports x64 builds, therefore the x86 build will lack the ability to use NVML.
Viewing .GPD files will be unaffected by this limitation.

VM agent version

When used within a VM, capture the agent version information and save within the GPD file.

Document GPD file format

Define what information is collected and saved in a GPD file.
This would be useful for users that may wish to share profile data but are reluctant to because they do not know the scope of the collected data.

Add "monitor" mode

When the tool is being used simply to monitor for demo purposes, or for performing an initial investigation where the entire sample-term data is not intended to be saved, this mode would allow endless monitoring of the resource states.
When monitor mode is stopped, an option to save the data within the current visible range would be available.

Advanced GPU selection and information

Hello Jeremy, thanks a LOT for building a nice GUI instead of the sh** nvidia-smi.

now a lot of options I need are not there:

  • multi-GPU support and selection. With more than 1 GPU, I must be able to pick the one I want to analyze
  • missing GPU info:
    • bus ID
    • driver model: WDDM version or TCC
    • GPU boost: enabled or disabled
    • GPU clock: real clock (if boost is enabled, then always the max)
    • VRAM clock
    • [don't know if this is available]: a SEPARATE % usage of GPU and VRAM... in nvidia-smi both are reported in the single GPU %, so you don't know which one is starving first...

regards,
fred

Add timestamp collection at profile start and add it to the output (GPD/CSV) and the resource utilization inspector

When profiling for long periods of time, users may notice periods of performance difference in their normal application usage that they may wish to correlate with the data collected during the profiling run.

Having the ability to show in the graph the actual time the data was sampled at would simplify pinpointing when the event occurred.

Label insertion support via hotkey would also be a useful addition to support this end-user-assisted workload profiling.

Zoom-full after profile stop

Prior versions kept the graph scaled to the entire intended collection range even if the data collection process was stopped early. Now the graph will display the entire collected data when collection is stopped early, or when the view was zoomed during collection.

Histogram handling of not utilized (0) and fully utilized (100)

I had noticed that the histogram calculation was not including the 100% utilized values in the 90~99% bar.

Additionally, 0 values for a resource will not be added to a histogram bucket.
When a resource is not being used, rather than showing the 0~10 bucket as 100% probability, it will now display nothing, as no utilization data exists.

Before (v1.02 ~ v1.03): (screenshot)

After: (screenshot)

Compare the CPU utilization in the histogram (the 100% issue) with the GPU utilization, as well as the encoder/decoder utilization, to see the improved 0% handling.
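For reference, a minimal bucketing sketch consistent with the described fix (100% counted in the top bar, 0% samples excluded) might look like the following; this is illustrative, not the application's code.

```cpp
// Histogram bucketing sketch: ten 10%-wide buckets, 100% falls into the top
// bucket and 0% samples are skipped entirely (illustrative only).
#include <array>
#include <vector>
#include <cstdio>

std::array<unsigned, 10> BuildHistogram(const std::vector<unsigned>& samples)
{
    std::array<unsigned, 10> buckets{};
    for (unsigned v : samples)
    {
        if (v == 0)
            continue;                            // unused resource: contributes nothing
        unsigned idx = (v >= 100) ? 9 : v / 10;  // 100% lands in the 90~100% bucket
        ++buckets[idx];
    }
    return buckets;
}

int main()
{
    auto h = BuildHistogram({0, 0, 5, 37, 90, 100, 100});
    for (unsigned i = 0; i < h.size(); ++i)
        std::printf("%3u-%3u%%: %u\n", i * 10, i * 10 + 10, h[i]);
    return 0;
}
```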

GPU Boost state

Fred's request:
Record at profile time whether GPU boost is enabled and save it within the GPD file.

Alternative display "Dark Mode"

Adding an alternative display mode with a darker color palette.
(mockup image)

The biggest challenge for completing this is simply getting the Win32 controls to adhere to the new palette.

Add Memory controller, Bus utilization information

Fred's request:

A SEPARATE % usage of GPU and VRAM... in nvidia-smi both are reported in the single GPU %, so you don't know which one is starving first...

JJM:
[ nvidia-smi -q ] does list the various utilization data for SM, memory controller, bus, encoder and decoder.

GPU BUS ID information

Fred's request:

Add the GPU's BUS ID information to the system state information captured and saved in the GPD file

Option to temporarily bold display graph lines

Using a keyboard accelerator, display the lines thicker to aid in situations where fine details may be lost (during presentations, when using a projector, etc.).

The candidate would be the 'B' key.

Question: should there be two or three thickness levels?

Graph only view

Double-click on the graph output and the window will only display the graph output.
Double-clicking again will return the display to the standard view.
