Intel® VTune™ Profiler

User Guide

ID 766319
Date 10/31/2024
Public

Pane: Timeline

Use the Timeline pane to see metrics over time, at the thread or platform level. Use this information to identify patterns and anomalies in the data.

In the Timeline pane, you can hover over, zoom in on, and filter the data at interesting points in time to get more detail. The Timeline pane is typically located at the bottom of the window. For views that focus on the distribution of metrics over time, the Timeline pane can occupy the upper or central part of the window. The analysis type and viewpoint determine the data presented in the Timeline pane.

The Timeline pane typically provides the following data:

Toolbar - Navigation controls to zoom in or out on areas of interest. For more details on the Timeline controls, see Managing Timeline View.

Platform metrics - Depending on the analysis type, the Timeline pane can present several areas with platform-specific metrics such as GPU engine usage, computing queue for OpenCL™ applications, bandwidth data, or power consumption. The most detailed analysis of the platform metrics is available with the Timeline pane in the Platform window.

Application metrics per grouping level - Depending on the viewpoint, the data may be represented by threads, modules, processes, cores, packages, and other units monitored by the data collector during the analysis run. For most viewpoints, the Thread grouping is the default. For some viewpoints, you can change the grouping level using the drop-down menu in the Legend area.

Hierarchical information in the Timeline pane is displayed in groups of the top five most critical results. You can expand a group to load additional results.

Note that the CPU Time metric value provided in the Thread area applies to a particular thread, where 100% is the maximum possible utilization for a thread. For example, a CPU Time value of 94.2% for the selected thread means that the thread was active 94.2% of the time and waiting for the remaining 5.8%.

Selected metrics - Data on the most representative metrics is displayed in separate rows. This data shows the overall application performance over time (for example, CPU Usage or GPU HW metrics) or system-wide execution (for example, GPU Usage). See Reference for Performance Metrics for detailed metric descriptions.

Note that the CPU Utilization metric in the Timeline pane is calculated as the sum of CPU time across all threads, where 100% is the maximum possible utilization per CPU. For example, suppose that at the moment selected in the example screenshot the application utilizes 1.91 of 4 logical CPU cores (with 100% per CPU, 191% corresponds to 1.91 CPUs), and the application threads spend 0.23 of a CPU on overhead or spinning. This means that the application utilizes only 1.68 CPUs effectively.
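The arithmetic in this note can be reproduced with a short Python sketch. It is an illustration only and not a VTune Profiler API: the function names and the per-thread CPU Time values are hypothetical, chosen so that they add up to the 191% used in the example.

```python
def cpu_utilization(thread_cpu_time_pct):
    """Sum per-thread CPU Time percentages and convert the total to logical CPUs.

    Each thread can contribute at most 100%, so 191% corresponds to 1.91 CPUs.
    """
    return sum(thread_cpu_time_pct) / 100.0


def effective_utilization(total_cpus, overhead_spin_cpus):
    """Subtract the share spent on overhead or spinning from the total."""
    return total_cpus - overhead_spin_cpus


# Hypothetical per-thread CPU Time values (percent) that add up to 191%.
thread_pct = [94.2, 60.0, 36.8]

total = cpu_utilization(thread_pct)               # about 1.91 logical CPUs
effective = effective_utilization(total, 0.23)    # about 1.68 logical CPUs

print(f"{total:.2f} of 4 logical CPUs used, {effective:.2f} effectively")
```

The sketch keeps the two steps separate (summing per-thread time, then removing overhead and spin time) to mirror how the note describes the metric.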

Legend - Types of data presented on the timeline. Filter any type of data in or out of the timeline by selecting or deselecting the corresponding check boxes. The list of performance metrics presented in the view depends on the selected analysis type and viewpoint.

VTune Profiler also uses special indicators to classify the presented data on the timeline:

  • Markers. Color markers indicate an area on the timeline when a particular task, frame, event, or similar element was executed. Hover over a marker to see the execution details for the selected element. The following markers are available:

    • Frame markers show frame duration. Available for applications using frames.

    • User task markers provide information on a task executed at this particular moment of time. Available for applications using Task API.

    • CPU sample markers indicate the exact points where profiling samples were taken during hardware event-based stack sampling collection. Use the marker density to estimate the data resolution. For example, VTune Profiler interpolates the sampling data, and the accuracy of the interpolation depends on the number of samples. In this case, the CPU sample markers give more accurate information and reveal sporadic CPU utilization for the thread.

      Sample markers also help you understand exactly how filtering and Spin/Overhead time calculation work. VTune Profiler filters or classifies each sample as a whole, so when you filter by time, it is important to know whether the sample point falls inside the selected time interval. No data interpolation is done for sampling data when filtering or classifying sample metrics.

    • VSync markers for vertical synchronization. If your application uses vertical synchronization, you can select the VSync timeline option, estimate the correlation between VSync events and application frames, identify frames missing VSync events and explore possible reasons.

    • Sampling point markers show the points at which a data sample was read during energy analysis. Hover over a marker to see the value(s) read at that time.

    • Wake-up object markers for energy analysis that show processor wake-ups on the timeline. Hover over a yellow marker to see the time when the selected wake-up happened and the name of the wake-up object.

    • Slow task markers show the duration of tasks (I/O Wait, Ftrace*, Atrace*, and so on) that are categorized as slow according to the thresholds set up in the Summary window.

    • I/O APIs markers

  • Context switches. The time threads are spending on context switches. Hover over a context switch area to see the details on its duration, reason, and affected CPU. If you choose the Context Switch Time option in the Call Stack pane and select a context switch in the Timeline pane, the Call Stack pane shows a call sequence at which a preceding thread execution quantum was interrupted.

  • Transitions. The execution flow between threads where one thread signals to another thread waiting to receive that signal. For example, one thread attempts to acquire a lock held by another thread, which then releases it. The release acts like a signal to the waiting thread. Hover over a transition for more details. Double-click a transition to open the source code.

  • Memory transfers. OpenCL routines responsible for transferring data from the host system to a GPU are marked with cross-diagonal hatching on a computing queue.

  • Synchronizations. OpenCL routines responsible for synchronization are marked with vertical hatching on a computing queue.

  • Scaling indicators. For GPU metrics and bandwidth graphs, VTune Profiler shows the maximum Y-axis values used to scale the graphs. The color of each value corresponds to the color of the relevant metric in the legend. For example, for the GPU L3 Cache Misses and Memory Access metrics, the maximum Y value for the selected scale can be 20.153 GB/sec for both GPU Memory Read Bandwidth and GPU Memory Write Bandwidth, and 521849224.729 Misses/sec for GPU L3 Misses.
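The CPU sample marker description above says that samples are filtered and classified as a whole, with no interpolation. The following Python sketch illustrates that idea under stated assumptions. It is not VTune Profiler code: the `Sample` type, its fields, the half-open interval convention, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: float  # sample point, in seconds from collection start
    cpu_time: float   # CPU time attributed to this sample, in seconds
    kind: str         # "effective", "spin", or "overhead"


def filter_samples(samples, start, end):
    """Keep each sample whose sample point lies in [start, end).

    A sample is either kept whole or dropped whole; its CPU time is never
    split or interpolated across the interval boundary.
    """
    return [s for s in samples if start <= s.timestamp < end]


def cpu_time_by_kind(samples):
    """Classify whole samples and total their CPU time per class."""
    totals = {}
    for s in samples:
        totals[s.kind] = totals.get(s.kind, 0.0) + s.cpu_time
    return totals


# Hypothetical samples taken roughly every 10 ms.
samples = [
    Sample(0.010, 0.010, "effective"),
    Sample(0.020, 0.010, "spin"),
    Sample(0.030, 0.010, "effective"),
    Sample(0.040, 0.010, "overhead"),
]

# Only the samples at 0.020 s and 0.030 s fall inside the selection.
selected = filter_samples(samples, 0.015, 0.035)
totals = cpu_time_by_kind(selected)
print(totals)
```

In this sketch, moving the selection boundary slightly changes the result only when the boundary crosses a sample point, which is why the position of the sample markers matters when you filter by time.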

Tooltips - Hover over a chart element to get statistics on this metric or program unit for the selected moment of time.

For the GPU analysis of applications using OpenCL software technology, the Timeline pane in the Graphics window provides the following tabs:

  • Platform tab that focuses on a per-thread and per-process distribution of the CPU and GPU hardware metrics collected during the analysis run.

  • Architecture Diagram tab that is provided for OpenCL application analysis results collected with the Analyze Processor Graphics hardware events option on systems with Intel® HD Graphics and Intel® Iris® Graphics. This tab helps you better understand the distribution of the GPU hardware metrics per architecture block for the period when the selected OpenCL kernel was running.

NOTE:

Collecting energy analysis data with Intel® SoC Watch is available for target Android*, Windows*, or Linux* devices. Import and viewing of the Intel SoC Watch results is supported with any version of Intel® VTune™ Profiler.