Intel® VTune™ Profiler

User Guide

ID 766319
Date 6/24/2024
Public


Configure GPU Analysis from Command Line

Use the -knob option to configure Intel® VTune™ Profiler to profile applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computations. GPU analysis monitors overall GPU activity (graphics, media, and compute), collects Intel® HD Graphics and Intel® Iris® Graphics hardware metrics, and shows this data correlated with CPU processes and threads.
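The knobs that a given analysis type accepts can be listed with the built-in command-line help. The following is a minimal sketch; list_knobs is a hypothetical wrapper name chosen for illustration, and the check for vtune on PATH is an assumption about how the environment is set up.

```shell
#!/bin/sh
# List the knobs that an analysis type accepts, using vtune's built-in help.
# list_knobs is a hypothetical wrapper name, not part of VTune Profiler;
# "vtune -help collect <analysis type>" is the underlying command.
list_knobs() {
    if ! command -v vtune >/dev/null 2>&1; then
        printf '%s\n' "vtune not found on PATH" >&2
        return 127
    fi
    vtune -help collect "$1"
}

# Example: show the knobs for the GPU Compute/Media Hotspots analysis.
list_knobs gpu-hotspots || true
```

The output depends on the installed version, so it is a useful cross-check against the table that follows.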

The following knobs are supported for GPU analysis:

Knob Name

Supported Analysis Types

Description

enable-gpu-usage=true | false

runss, runsa

Analyze frame rate and usage of Processor Graphics engines.

gpu-counters-mode=none | overview | global-local-accesses | compute-extended | full-compute | render-basic

gpu-hotspots, graphics-rendering, gpu-offload, runss, runsa

Analyze performance data from Processor Graphics based on the GPU Metrics Reference.

  • overview - track general GPU memory accesses such as Memory Read/Write Bandwidth, GPU L3 Misses, Sampler Busy, Sampler Is Bottleneck, and GPU Memory Texture Read Bandwidth. These metrics can be useful for both graphics and compute-intensive applications.

  • global-local-accesses - include metrics that distinguish accesses to different types of data on a GPU: Untyped Memory Read/Write Bandwidth, Typed Memory Read/Write Transactions, SLM Read/Write Bandwidth, Render/GPGPU Command Streamer Loaded, and GPU EU Array Usage. These metrics are useful for compute-intensive workloads on the GPU.

  • compute-extended - analyze GPU activity on the Intel processor code name Broadwell. This metrics set is disabled for other systems.

  • full-compute - collect both overview and compute-basic metrics with the allow-multiple-runs option enabled, to analyze all types of EU array stalled/idle issues in the same view.

  • render-basic (preview) - collect Pixel Shader, Vertex Shader, and Output Merger metrics.

This option is available only for supported platforms with the Intel Graphics Driver installed.

gpu-sampling-interval=<value in us>

gpu-hotspots, runss, runsa

Set the interval between GPU samples, from 10 to 1000 microseconds. The default is 1000 us. An interval shorter than 100 us is not recommended.

enable-gpu-runtimes=true | false

gpu-hotspots, runss, runsa

Capture the execution time of OpenCL™ kernels and Intel Media SDK programs on a GPU, identify performance-critical GPU computing tasks, and analyze the performance per GPU hardware metrics.

NOTE:

OpenCL kernel analysis is currently supported for Windows and Linux target systems with Intel HD Graphics and Intel Iris Graphics. Intel® Media SDK program analysis is supported for Linux targets only and should be started with root privileges.
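The value constraints in the knob table can be checked in a wrapper script before vtune is invoked. The following is a minimal sketch under that assumption. The function names gpu_counters_knob and gpu_interval_knob are hypothetical and not part of VTune Profiler. The accepted modes and the 10 to 1000 microsecond range are copied from the table, and the restriction to whole numbers is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VTune Profiler) that check knob values
# against the names and ranges given in the knob table on this page.

# Print "-knob gpu-counters-mode=<mode>" for a mode listed in the table.
gpu_counters_knob() {
    case "$1" in
        none|overview|global-local-accesses|compute-extended|full-compute|render-basic)
            printf '%s\n' "-knob gpu-counters-mode=$1"
            ;;
        *)
            printf '%s\n' "unsupported gpu-counters-mode: $1" >&2
            return 1
            ;;
    esac
}

# Print "-knob gpu-sampling-interval=<us>" for an interval of 10..1000 us.
# Assumption of this sketch: the value is a whole number of microseconds.
gpu_interval_knob() {
    case "$1" in
        ''|*[!0-9]*)
            printf '%s\n' "interval must be a whole number of microseconds: $1" >&2
            return 1
            ;;
    esac
    if [ "$1" -lt 10 ] || [ "$1" -gt 1000 ]; then
        printf '%s\n' "interval out of range (10..1000 us): $1" >&2
        return 1
    fi
    if [ "$1" -lt 100 ]; then
        # Accepted, but the documentation advises against it.
        printf '%s\n' "warning: intervals below 100 us are not recommended" >&2
    fi
    printf '%s\n' "-knob gpu-sampling-interval=$1"
}

# Example usage: each call prints one knob argument.
gpu_counters_knob overview
gpu_interval_knob 500
```

Rejecting bad values in the wrapper keeps the failure close to its cause, instead of surfacing it later as a collection error.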

Examples

Example 1: Running Analysis for an Intel Media SDK Application

This example starts vtune as root and launches the GPU Compute/Media Hotspots analysis for an Intel Media SDK application running on Linux:

vtune -collect gpu-hotspots -knob enable-gpu-runtimes=true -r quadrant_r001 -- BitonicSort

To analyze a remote Linux target from a Windows system, the same example looks as follows:

vtune -target-system=ssh:user1@172.16.254.1 -collect gpu-hotspots -knob enable-gpu-runtimes=true -r quadrant_r001 -- BitonicSort.exe

Example 2: Running Analysis with OpenCL Kernels Tracing

Run the GPU Compute/Media Hotspots analysis or a custom analysis. Set the enable-gpu-usage knob to analyze GPU usage of the processor graphics engines, and select the overview counter set with the gpu-counters-mode knob. This counter set is available only on a supported platform with an Intel Graphics Driver installed. To trace the execution of OpenCL kernels, set the enable-gpu-runtimes knob.

For example, to run the GPU Compute/Media Hotspots analysis, collect GPU hardware metrics, and trace OpenCL kernels on the BitonicSort application (-g is an option of the application, not of vtune), enter:

vtune -collect gpu-hotspots -knob gpu-counters-mode=overview -knob enable-gpu-runtimes=true -- BitonicSort -g
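The custom-analysis variant of this example can be sketched with the runsa collector, which the knob table lists as supporting all three knobs. This is an illustration, not a command taken from this page. The function name run_custom_gpu_analysis and the DRY_RUN variable are hypothetical. By default the script only prints the command, so it can be reviewed before anything is collected.

```shell
#!/bin/sh
# Sketch: custom-analysis variant of Example 2, using the runsa collector.
# DRY_RUN is a hypothetical variable of this script, not a vtune setting.
DRY_RUN="${DRY_RUN:-1}"

run_custom_gpu_analysis() {
    # "$@" holds the target application and its own arguments.
    # Prepend the vtune command line to it.
    set -- vtune -collect-with runsa \
        -knob enable-gpu-usage=true \
        -knob gpu-counters-mode=overview \
        -knob enable-gpu-runtimes=true \
        -- "$@"
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

run_custom_gpu_analysis BitonicSort -g
```

Set DRY_RUN=0 in the environment to run the collection instead of printing the command.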

GPU Analysis on Android* System

You can enable GPU analysis for algorithm analysis types on Android systems with Intel HD Graphics and Intel Iris Graphics by using the following knobs:

  • enable-gpu-usage to analyze frame rate and usage of Intel HD Graphics and Intel Iris Graphics engines based on ftrace events

  • gpu-counters-mode to analyze performance data from Intel HD Graphics and Intel Iris Graphics based on the preset counter sets

  • gpu-sampling-interval to specify a data collection interval between GPU samples

This example runs the GPU Compute/Media Hotspots analysis and monitors GPU usage.

host> ./vtune -collect gpu-hotspots -target-system=android -r quadrant_r001 -target-process com.intel.fluid -knob enable-gpu-usage=true -knob gpu-counters-mode=overview
