Intel® Advisor User Guide

ID 766448
Date 10/31/2024
Public

Window: GPU Roofline Insights Summary

After running the GPU Roofline Insights perspective, use the GPU Roofline Insights Summary window to view the most important information about how your code executes on GPU and CPU devices.

Customize the window layout using drop-downs in the upper-right corner of each pane.

Create a snapshot of your GPU Roofline result using the button. For details, see Create a Read-only Result Snapshot.

NOTE:
Families of Intel® Xe graphics products starting with Intel® Arc™ Alchemist (formerly DG2) and newer generations feature GPU architecture terminology that shifts from legacy terms. For more information on the terminology changes and to understand their mapping with legacy content, see GPU Architecture Terminology for Intel® Xe Graphics.

Program Metrics Pane

View the most important metrics for the parts of your application executed on a GPU and on a CPU. This pane tells you how well your application uses GPU resources and how much room for improvement it has. The pane is divided into the following sub-sections:

  • GPU Time: view total elapsed time of all compute tasks executed on a GPU device.
  • FPU Utilization: view the average percentage of GPU time when both floating-point units (FPUs) are used.
  • EU Threading Occupancy: view the percentage of cycles on all execution units (EUs) and thread slots when a slot has a thread scheduled.
  • EU IPC Rate: view the average rate of instructions per cycle (IPC) for execution units when two FPUs are used.
  • CPU Time: view total elapsed time for a part of your application executed on a CPU.
  • Thread Count: view the number of threads used for execution of your application on a CPU.
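To make the EU Threading Occupancy definition above concrete, the following is a minimal sketch of the ratio it describes. The function name, its parameters, and all numbers are hypothetical illustrations; this is not how Intel Advisor computes the metric internally.

```python
def eu_threading_occupancy(busy_slot_cycles, num_eus, slots_per_eu, total_cycles):
    """Return the percentage of slot-cycles that had a thread scheduled.

    busy_slot_cycles: sum, over every thread slot on every EU, of the cycles
                      during which that slot had a thread scheduled.
    """
    # Total number of slot-cycles available during the observed interval.
    capacity = num_eus * slots_per_eu * total_cycles
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * busy_slot_cycles / capacity


# Hypothetical device: 96 EUs with 7 thread slots each, observed for 1,000,000 cycles.
occupancy = eu_threading_occupancy(
    busy_slot_cycles=504_000_000,
    num_eus=96,
    slots_per_eu=7,
    total_cycles=1_000_000,
)
print(f"EU Threading Occupancy: {occupancy:.1f}%")
```

A low value under this definition would suggest that many thread slots sit empty, for example because the kernel launches too few work items to fill the device.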

Open the drop-down menus below the main program metrics to view detailed information about GFLOPS, GINTOPS, and arithmetic intensity for INT and FLOP operation types.
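The GFLOPS and arithmetic intensity values shown in these menus follow the usual Roofline definitions: operations per second, and operations per byte of memory traffic. The sketch below illustrates that arithmetic with hypothetical inputs; the function names and numbers are assumptions for illustration and are not part of any Intel Advisor API.

```python
def giga_ops_per_second(op_count, elapsed_seconds):
    """Billions of operations per second (GFLOPS for FLOAT, GINTOPS for INT)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return op_count / elapsed_seconds / 1e9


def arithmetic_intensity(op_count, bytes_transferred):
    """Operations performed per byte of memory traffic (FLOP/byte or INTOP/byte)."""
    if bytes_transferred <= 0:
        raise ValueError("bytes_transferred must be positive")
    return op_count / bytes_transferred


# Hypothetical compute task: 4e11 floating-point operations,
# 1e11 bytes of memory traffic, 2 seconds of GPU time.
task_gflops = giga_ops_per_second(4.0e11, 2.0)
task_ai = arithmetic_intensity(4.0e11, 1.0e11)

print(f"GFLOPS: {task_gflops:.1f}")
print(f"Arithmetic intensity: {task_ai:.1f} FLOP/byte")
```

The same two functions apply to integer operations: pass an integer operation count to obtain GINTOPS and INTOP/byte.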

OP/S and Bandwidth Pane

View metrics for all compute tasks and functions/loops of your application against the hardware-imposed performance ceilings on preview Roofline charts for GPU and CPU. Explore how many floating-point and integer operations per second can be executed at different memory levels. For details about using the GPU Roofline chart, see Examine Bottlenecks on GPU Roofline Chart.
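A hardware-imposed ceiling in the basic Roofline model is the smaller of the peak compute rate and the product of arithmetic intensity and memory bandwidth. The sketch below applies that formula to several memory levels. The peak and bandwidth values are hypothetical, and the sketch uses a single arithmetic intensity for brevity; a real analysis would use the traffic measured at each memory level.

```python
def attainable_gops(ai_ops_per_byte, peak_compute_gops, bandwidth_gb_per_s):
    """Basic Roofline bound: min(compute roof, AI * memory bandwidth)."""
    return min(peak_compute_gops, ai_ops_per_byte * bandwidth_gb_per_s)


# Hypothetical hardware limits (not measured on any real device).
PEAK_COMPUTE_GFLOPS = 1000.0
BANDWIDTH_GB_PER_S = {"L3": 200.0, "SLM": 400.0, "GTI": 50.0}

# Hypothetical kernel performing 4 FLOP per byte of traffic.
kernel_ai = 4.0

ceilings = {
    level: attainable_gops(kernel_ai, PEAK_COMPUTE_GFLOPS, bandwidth)
    for level, bandwidth in BANDWIDTH_GB_PER_S.items()
}

for level, ceiling in ceilings.items():
    bound = "compute" if ceiling == PEAK_COMPUTE_GFLOPS else "memory"
    print(f"{level}: at most {ceiling:.0f} GFLOPS ({bound}-bound)")
```

With these assumed numbers, the kernel is limited by bandwidth at the L3 and GTI levels and by the compute roof at the SLM level, which is the kind of distinction the preview charts let you read off visually.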

Filter operations by type by switching between INT and FLOAT in the upper-right corner of preview Roofline charts.

Open the drop-down menus below the preview Roofline charts to view detailed information about GFLOPS, GINTOPS, and arithmetic intensity for INT and FLOP operation types on different memory levels (CARM, L3, SLM, GTI). For a GPU Roofline chart, view the instruction mix diagram, which shows the total number of instructions grouped by type (FLOAT, INT, STORE, LOAD, and MOVE).

Hover over a dot on a Roofline chart to view metrics for the selected function/loop. Click a dot to open it in source code and view it on a GPU Roofline chart.

Top Hotspots Pane

View key metrics (elapsed time, GFLOPS, GINTOPS) for the top five most time-consuming compute tasks on a GPU and functions/loops on a CPU, which are the best candidates for optimization. Click a function name to open it in source code and view it on a GPU Roofline chart.

Performance Characteristics Pane

View the execution time details for GPU- and CPU-executed parts of your application. This pane can tell you how well your application uses GPU resources on each memory level. Hover over the histogram to see the fractions of active, stalled, and idle EU arrays.
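The active, stalled, and idle fractions shown in the histogram tooltip can be read as shares of total EU array cycles. The following sketch shows that normalization under the assumption that the three states are mutually exclusive and cover all cycles; the function name and cycle counts are hypothetical.

```python
def eu_array_fractions(active_cycles, stalled_cycles, idle_cycles):
    """Normalize per-state cycle counts into fractions that sum to 1."""
    total = active_cycles + stalled_cycles + idle_cycles
    if total <= 0:
        raise ValueError("total cycle count must be positive")
    return {
        "active": active_cycles / total,
        "stalled": stalled_cycles / total,
        "idle": idle_cycles / total,
    }


# Hypothetical cycle counts for one compute task.
fractions = eu_array_fractions(active_cycles=600, stalled_cycles=300, idle_cycles=100)

for state, share in fractions.items():
    print(f"{state}: {share:.0%}")
```

A large stalled share usually points to EUs waiting on memory or synchronization, while a large idle share points to the GPU not being given enough work.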

Platform Information Pane

View system information, including a summary of the software and hardware.

Collection Information Pane

View information about Survey and Characterization data collection. Use drop-downs to show/hide information for each analysis type.