Hotspots Report
Use the hotspots command-line report to identify the program units (for example, functions, modules, or objects) that matter most for the analysis type you ran: those that take the most processor time (Hotspots analysis), or those that underutilize available CPUs or have long waits (Threading analysis).
You can also use the hotspots report to view the hottest GPU computing tasks (or their instances) identified with the gpu-hotspots or gpu-offload analysis.
By default, the report lists program units in descending order, starting with the most performance-critical unit. The command-line reports provide the same data that is displayed in the default GUI analysis viewpoint.
To display a list of available groupings for a Hotspots report, enter vtune -report hotspots -r <result_dir> -group-by=? (the value is a literal question mark). If you do not specify a result directory, the latest result is used by default.
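When the report is generated from a script, it can help to assemble the command from its parts. The following is a minimal Python sketch, not part of VTune Profiler: build_hotspots_command and run_report are hypothetical helper names, and the sketch uses only the options that appear on this page (-r, -group-by, and -limit).

```python
import subprocess
from typing import List, Optional


def build_hotspots_command(result_dir: Optional[str] = None,
                           group_by: Optional[str] = None,
                           limit: Optional[int] = None) -> List[str]:
    """Assemble a `vtune -report hotspots` argument list (hypothetical helper)."""
    cmd = ["vtune", "-report", "hotspots"]
    if result_dir is not None:
        # Without -r, the page states that the latest result is used.
        cmd += ["-r", result_dir]
    if group_by is not None:
        cmd.append(f"-group-by={group_by}")
    if limit is not None:
        cmd += ["-limit", str(limit)]
    return cmd


def run_report(cmd: List[str]) -> str:
    """Run the command and return its standard output.

    Assumes the vtune executable is available on PATH.
    """
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    # Only prints the command line; call run_report() to execute it.
    print(" ".join(build_hotspots_command("r001hs", group_by="module")))
```

Passing the arguments as a list avoids shell quoting problems, for example with result directory names that contain spaces.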
Examples
Example 1: Hotspots Report with Module Grouping
This example opens the Hotspots report for the r001hs Hotspots analysis result and groups the data by module.
vtune -report hotspots -r r001hs -group-by module
Module             CPU Time
-----------------  --------
analyze_locks      10.080s
KERNELBASE         0.679s
ntdll              0.164s
...
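A text report with this two-column shape can be post-processed in a script. The sketch below is an illustration only: it assumes each data row is a name followed by a time in seconds with an "s" suffix, as in the sample output above, and the names ROW and parse_two_column_report are hypothetical. Header, separator, and "..." lines do not match the pattern and are skipped.

```python
import re
from typing import List, Tuple

# A data row is a name, whitespace, then a time such as "10.080s".
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<seconds>\d+(?:\.\d+)?)s\s*$")


def parse_two_column_report(text: str) -> List[Tuple[str, float]]:
    """Return (name, seconds) pairs for every data row in the report text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((match.group("name"), float(match.group("seconds"))))
    return rows


# First rows of the sample output, copied from the example above.
SAMPLE = """\
Module CPU Time
----------------- --------
analyze_locks 10.080s
KERNELBASE 0.679s
...
"""

if __name__ == "__main__":
    for name, seconds in parse_two_column_report(SAMPLE):
        print(f"{name}: {seconds:.3f} s")
```

The same function applies to any grouping that produces a name column and a single time column. Wider reports with several metric columns need a different parser.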
Example 2: Hotspots Report with Limited Items
This example displays the Hotspots report for the r001hs analysis result including only the top two functions with the highest CPU Time values. Functions having insignificant impact on performance are excluded from output.
vtune -report hotspots -r r001hs -limit 2
Function          CPU Time
----------------  --------
grid_intersect    5.489s
sphere_intersect  3.590s
Example 3: Report per OpenCL Kernels
This example shows how to view the collected data for each OpenCL kernel submitted and executed on the GPU:
vtune -report hotspots -group-by=computing-task -r r000gh
Computing Task       Work Size:Global  Computing Task:Total Time  Data Transferred:Size  EU Array:Active(%)  L3 <-> GTI Total Bandwidth, GB/sec
-------------------  ----------------  -------------------------  ---------------------  ------------------  ----------------------------------
AdvancePaths         65536             13.170s                                           25.0%               22.928
Init                 65536             0.006s                                            34.4%               45.802
Intersect            65536             49.139s                                           61.5%               23.149
Sampler              65536             6.525s                                            76.4%               11.745
InitFrameBuffer      362432            0.000s                                            4.7%                17.456
clEnqueueReadBuffer                    1.045s                     3 GB                   1.5%                8.840
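The rows in this sample output are not ordered by Total Time, so ranking them can make the dominant kernel easier to spot. The sketch below is an illustration under stated assumptions: the Total Time values are transcribed by hand from the sample output above rather than read from a result, rank_by_share is a hypothetical helper, and each share is relative to the sum of the listed tasks, not to the elapsed time of the application.

```python
from typing import Dict, List, Tuple

# Computing Task:Total Time values, transcribed from the sample output above.
TOTAL_TIME_S: Dict[str, float] = {
    "AdvancePaths": 13.170,
    "Init": 0.006,
    "Intersect": 49.139,
    "Sampler": 6.525,
    "InitFrameBuffer": 0.000,
    "clEnqueueReadBuffer": 1.045,
}


def rank_by_share(times: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort tasks by time, descending, and pair each with its share of the sum."""
    total = sum(times.values())
    ranked = sorted(times.items(), key=lambda item: item[1], reverse=True)
    if total == 0:
        # Avoid dividing by zero when every task reports 0.000s.
        return [(name, 0.0) for name, _ in ranked]
    return [(name, seconds / total) for name, seconds in ranked]


if __name__ == "__main__":
    for name, share in rank_by_share(TOTAL_TIME_S):
        print(f"{name:<20} {share:6.1%}")
```

For these sample values, Intersect ranks first and accounts for roughly 70% of the summed task time.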
Example 4: Report Grouped per SYCL Task Instances
This example groups the collected data by SYCL task instances:
vtune -report hotspots -group-by=computing-instance -r r000gh
Computing Task       Instance            Work Size:Global  Computing Task:Total Time  Data Transferred:Size  GPU Time
-------------------  ------------------  ----------------  -------------------------  ---------------------  --------
CopyVector2          2                   6553600           0.190s                                            0.190s
clEnqueueReadBuffer  1                                     0.034s                     400 MB                 0.034s