Intel® VTune™ Profiler

User Guide

ID 766319
Date 3/22/2024
Public

Callstacks Report

Intel® VTune™ Profiler collects call stack information during User-Mode Sampling and Tracing Collection, or during Hardware Event-based Sampling Collection with stack collection enabled. Use the callstacks report to see how the hot functions are called. This report type focuses on call sequences, starting with the functions that take the most CPU time.

You can use the -column option to filter the callstacks report and focus on a specific metric, for example:

vtune -report callstacks -r r001ah -column="CPI Rate"

NOTE:

To display a list of columns available for the callstacks report, enter: vtune -report callstacks -r <result_dir> -column=?

Examples

Example 1: Callstacks Report with Limited Items

The following example generates a callstacks report for the most recent analysis result and limits the number of functions and function stacks to 5 items.

vtune -report callstacks -limit 5

On Windows*:

Function        Function Stack     CPU Time  Module             Function (Full)                  Source File        Start Address
--------------  -----------------  --------  -----------------  -------------------------------  -----------------  -------------
grid_intersect                       5.436s  analyze_locks.exe  grid_intersect                   grid.cpp           0x40d340
                intersect_objects    1.918s  analyze_locks.exe  intersect_objects(struct ray *)  intersect.cpp      0x402840
                shader                   0s  analyze_locks.exe  shader(struct ray *)             shade.cpp          0x404730
                trace                    0s  analyze_locks.exe  trace(struct ray *)              trace_rest.cpp     0x402370
                render_one_pixel         0s  analyze_locks.exe  render_one_pixel                 analyze_locks.cpp  0x401db0
...

On Linux*:

Function              Function Stack     CPU Time  Module                 Function (Full)           Source File        Start Address
--------------------  -----------------  --------  ---------------------  ------------------------  -----------------  -------------
initialize_2D_buffer                      22.746s  tachyon_find_hotspots  initialize_2D_buffer      find_hotspots.cpp  0x4018f0
                      render_one_pixel    22.746s  tachyon_find_hotspots  render_one_pixel          find_hotspots.cpp  0x401950
                      draw_trace               0s  tachyon_find_hotspots  draw_trace(void)          find_hotspots.cpp  0x401d70
                      thread_trace             0s  tachyon_find_hotspots  thread_trace(thr_parms*)  find_hotspots.cpp  0x401ef0
                      trace_shm                0s  tachyon_find_hotspots  trace_shm                 trace_rest.cpp     0x410a20
                      trace_region             0s  tachyon_find_hotspots  trace_region              trace_rest.cpp     0x410aa0
                      rt_renderscene           0s  tachyon_find_hotspots  rt_renderscene(void*)     api.cpp            0x402360
                      tachyon_video            0s  tachyon_find_hotspots  tachyon_video             video.cpp          0x402240
                      main                     0s  tachyon_find_hotspots  main                      video.cpp          0x4013e0
                      __libc_start_main        0s  libc.so.6              __libc_start_main         libc-start.c       0x21dd0
                      _start                   0s  tachyon_find_hotspots  _start                    [Unknown]          0x40149c

grid_intersect                             7.282s  tachyon_find_hotspots  grid_intersect            grid.cpp           0x408930
                      intersect_objects    2.756s  tachyon_find_hotspots  intersect_objects(ray*)   intersect.cpp      0x40a400
                      shader                   0s  tachyon_find_hotspots  shader(ray*)              shade.cpp          0x40eae0
...
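
In the sample output above, the stack frames are indented under the function they belong to, so the hot-function rows can be picked out of a saved text report with a small filter. The following is a minimal sketch and not part of VTune Profiler: top_functions is a hypothetical helper name. It assumes that the input starts with the header row, that function names contain no spaces, and that the Function Stack column is empty on top-level rows.

```shell
# Hypothetical helper (not a VTune command): read a callstacks text report
# on stdin and print "<function> <time>" for each top-level function row.
top_functions() {
  awk '
    NR <= 2         { next }   # skip the header row and the dashed separator
    /^[[:space:]]/  { next }   # indented rows are stack frames, not hot functions
    /^\.\.\./       { next }   # skip the elision marker
    NF >= 2         { print $1, $2 }   # assumes the name has no embedded spaces
  '
}
```

For example, after saving a report with shell redirection (vtune -report callstacks -limit 5 > callstacks.txt, where callstacks.txt is an arbitrary file name), run top_functions < callstacks.txt. If the saved file contains any lines before the header row, remove them first.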

Example 2: Callstacks Report with Callstack Grouping

This example generates a callstacks report for the r001tr result, grouped by function call stacks.

vtune -report callstacks -r r001tr -group-by callstack

On Windows*:

Function/Function Stack                    Wait Time  Module             Function (Full)
-----------------------------------------  ---------  -----------------  -----------------------------------------
tbb::internal::acquire_binsem_using_event    20.005s  tbb.dll            tbb::internal::acquire_binsem_using_event

func@0x10003350                              13.857s  gdiplus.dll        func@0x10003350
func@0x1000c1f0                                   0s  gdiplus.dll        func@0x1000c1f0
BaseThreadInitThunk                               0s  KERNEL32.DLL       BaseThreadInitThunk
func@0x6b2dacf0                                   0s  ntdll.dll          func@0x6b2dacf0
func@0x6b2daccf                                   0s  ntdll.dll          func@0x6b2daccf

video::main_loop                             10.111s  analyze_locks.exe  video::main_loop(void)
main                                              0s  analyze_locks.exe  main
WinMain                                           0s  analyze_locks.exe  WinMain
_tmainCRTStartup                                  0s  analyze_locks.exe  _tmainCRTStartup
[Unknown stack frame(s)]                          0s  [Unknown]          [Unknown stack frame(s)]
BaseThreadInitThunk                               0s  KERNEL32.DLL       BaseThreadInitThunk
func@0x6b2dacf0                                   0s  ntdll.dll          func@0x6b2dacf0
...

On Linux*:

Function/Function Stack          Wait Time  Module                 Function (Full)                                                                                                                                            
-------------------------------  ---------  ---------------------  -----------------------------------------------------------
draw_task::operator()              98.698s  tachyon_analyze_locks  draw_task::operator()(tbb::blocked_range<int> const&) const                                                                                               
tbb::interface6::internal               0s  tachyon_analyze_locks  tbb::interface6::internal                      
execute<tbb::interface6::internal       0s  tachyon_analyze_locks  execute::interface6::internal                        
[TBB parallel_for on draw_task]         0s  tachyon_analyze_locks  tbb::interface6::internal::execute(void)                                            
[TBB Dispatch Loop]                     0s  libtbb.so.2            tbb::internal::local_wait_for_all(tbb::task&, tbb::task*)                                        
...
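
In the grouped output shown above, each call stack is a run of consecutive rows, and stacks are separated by blank lines. As a rough sketch of how such a report could be summarized, the filter below prints the first frame of each stack, its time value, and the number of frames listed. stack_summary is a hypothetical helper name and not a VTune command. It assumes the blank-line layout seen in the samples and that the time column is the first token of the form <number>s on a row.

```shell
# Hypothetical helper (not a VTune command): read a callstacks report grouped
# by callstack on stdin and print, tab-separated, the first frame of each
# stack, its time value, and how many frames the stack lists.
stack_summary() {
  awk '
    NR <= 2   { next }                  # skip the header row and dashed separator
    /^\.\.\./ { next }                  # skip the elision marker
    NF == 0   {                         # a blank line closes the current stack
      if (depth) printf "%s\t%s\t%d\n", top, time, depth
      depth = 0
      next
    }
    {
      depth++
      if (depth == 1) {
        # First frame: the name is everything before the first time-like token.
        top = ""; time = ""
        for (i = 1; i <= NF; i++) {
          if ($i ~ /^[0-9.]+s$/) { time = $i; break }
          top = (top == "" ? $i : top " " $i)
        }
      }
    }
    END { if (depth) printf "%s\t%s\t%d\n", top, time, depth }
  '
}
```

A stack that is cut off in the report is counted only up to the frames that are actually present, so treat the frame count as a lower bound.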