Intel® VTune™ Profiler

User Guide

ID 766319
Date 12/20/2024
Public

Hotspots Report

Use the hotspots command-line report to identify program units (for example, functions, modules, or objects) that take the most processor time (Hotspots analysis), underutilize available CPUs or have long waits (Threading analysis), and so on.

Use the hotspots report to view the hottest GPU computing tasks (or their instances) identified with the gpu-hotspots or gpu-offload analysis.

By default, the report lists the hottest program units in descending order, starting from the most performance-critical unit. The command-line reports provide the same data that is displayed in the default GUI analysis viewpoint.

NOTE:

To display a list of available groupings for a Hotspots report, enter vtune -report hotspots -r <result_dir> -group-by=? on the command line. If you do not specify a result directory, the latest result is used by default.

Examples

Example 1: Hotspots Report with Module Grouping

This example opens the Hotspots report for the r001hs Hotspots analysis result and groups the data by module.

vtune -report hotspots -r r001hs -group-by module


Module             CPU Time
-----------------  --------
analyze_locks      10.080s
KERNELBASE          0.679s
ntdll               0.164s
...

Example 2: Hotspots Report with Limited Items

This example displays the Hotspots report for the r001hs analysis result, limited to the two functions with the highest CPU Time values. All other functions, which have a smaller impact on performance, are excluded from the output.

vtune -report hotspots -r r001hs -limit 2

Function          CPU Time
----------------  --------
grid_intersect      5.489s
sphere_intersect    3.590s
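
The two-column text output above is easy to post-process in a script. The following is a minimal Python sketch, not part of VTune Profiler: it assumes the report text has exactly the layout shown in this example (a name column followed by a CPU Time value that ends in "s"). The function name parse_cpu_time_report is hypothetical, and the sample text is copied from the output above.

```python
def parse_cpu_time_report(text):
    """Return (name, seconds) pairs from a two-column hotspots text report.

    Assumes each data row ends with a CPU Time value such as "5.489s".
    """
    results = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the dashed rule, and rows that do not end in "s".
        if not line or line.startswith("-") or not line.endswith("s"):
            continue
        # Split on the last run of whitespace so names with spaces survive.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            seconds = float(value[:-1])  # drop the trailing "s"
        except ValueError:
            continue  # not a time value, for example a header row
        results.append((name, seconds))
    return results


# Sample text copied from the Example 2 output.
sample = """\
Function          CPU Time
----------------  --------
grid_intersect      5.489s
sphere_intersect    3.590s
"""

rows = parse_cpu_time_report(sample)
total = sum(seconds for _, seconds in rows)
for name, seconds in rows:
    # Share of the CPU time listed in the report, not of the whole run.
    print(f"{name}: {seconds:.3f}s ({seconds / total:.1%} of listed time)")
```

Because the report was produced with -limit 2, the percentages describe each function's share of the listed rows only, not of the total CPU time of the application.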

Example 3: Report per OpenCL Kernels

This example shows how to view the collected data grouped by the OpenCL kernels submitted to and executed on the GPU:

vtune -report hotspots -group-by=computing-task -r r000gh

Computing Task       Work Size:Global  Computing Task:Total Time  Data Transferred:Size  EU Array:Active(%)  L3 <-> GTI Total Bandwidth, GB/sec
-------------------  ----------------  -------------------------  ---------------------  ------------------  ----------------------------------
AdvancePaths                    65536                    13.170s                                      25.0%                              22.928
Init                            65536                     0.006s                                      34.4%                              45.802
Intersect                       65536                    49.139s                                      61.5%                              23.149
Sampler                         65536                     6.525s                                      76.4%                              11.745
InitFrameBuffer                362432                     0.000s                                       4.7%                              17.456
clEnqueueReadBuffer                                       1.045s                  3 GB                 1.5%                               8.840
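
Wide reports like this one can contain empty cells and values with embedded spaces (such as "3 GB"), so splitting rows on whitespace is unreliable. The following is a minimal Python sketch, not part of VTune Profiler, that instead takes the column boundaries from the rule of dashes under the header row. It assumes the text layout shown above, with columns aligned by spaces rather than tabs. The function name parse_fixed_width_report is hypothetical.

```python
import re


def parse_fixed_width_report(text):
    """Parse a text report whose header row is followed by a rule of dashes.

    Returns a list of dicts that map each column header to the cell text.
    Empty cells are returned as empty strings.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]

    # The rule line consists only of dashes and spaces and follows the header.
    rule_idx = next(
        (i for i, ln in enumerate(lines)
         if i > 0 and re.fullmatch(r"[- ]+", ln.rstrip())),
        None,
    )
    if rule_idx is None:
        raise ValueError("no rule of dashes found under a header row")

    # Each run of dashes marks the character span of one column.
    spans = [m.span() for m in re.finditer(r"-+", lines[rule_idx])]
    header = lines[rule_idx - 1]
    names = [header[start:end].strip() for start, end in spans]

    rows = []
    for ln in lines[rule_idx + 1:]:
        # Slicing past the end of a short line yields an empty cell.
        cells = [ln[start:end].strip() for start, end in spans]
        rows.append(dict(zip(names, cells)))
    return rows
```

To use the sketch, pass it the captured report text, for example the contents of a file to which the output of the command above was redirected. Each row then becomes a dictionary keyed by the column headers, such as "Computing Task:Total Time".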

Example 4: Report Grouped per SYCL Task Instances

This example groups the collected data by SYCL computing task instance:

vtune -report hotspots -group-by=computing-instance -r r000gh

Computing Task       Instance            Work Size:Global  Computing Task:Total Time  Data Transferred:Size  GPU Time
-------------------  ------------------  ----------------  -------------------------  ---------------------  --------
CopyVector2          2                            6553600                     0.190s                           0.190s
clEnqueueReadBuffer  1                                                        0.034s                400 MB     0.034s