Intel® VTune™ Profiler

User Guide

ID 766319
Date 3/22/2024
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

User-Mode Sampling and Tracing Collection

When profiling application execution, the Intel® VTune™ Profiler takes snapshots of how that application utilizes the processors in the system. A thread is considered active at a specific moment if it is ready to execute or is executing (not blocking). The snapshots of the number of running threads at the moment provide a hint to the degree of parallelism of the application as well as how this application utilizes processor resources. VTune Profiler classifies utilization into the ranges: Idle, Poor, Ok, and Ideal.

The user-mode sampling and tracing collector interrupts a process, collects the value of all active instruction addresses and captures a calling sequence for each of these samples. Sampled instruction pointers along with their calling sequences (stacks) are stored in data collection files. Statistically collected IP samples with calling sequences enable the viewer to display a call graph or/and the most time-consuming paths. Use this data to understand the control flow for statistically important code sections.

On Linux* the user-mode sampling and tracing collector embeds an agent library into the profiled application. The agent sets up the OS timer for each thread in the application. Upon timer expiration, the application receives the SIGPROF or another runtime signal that is handled by the collector.

Average overhead of the user-mode sampling and tracing collector is about 5% when sampling is using the default interval of 10ms.

VTune Profiler uses the user-mode sampling and tracing collector to collect data for the following analysis types:

You can also create a custom analysis type based on the user-mode sampling and tracing collection.

Collecting Stack Data

When collecting data, the VTune Profiler analyzes no more than one stack per configured interval. It unwinds stacks each 10 milliseconds of thread execution. But the VTune Profiler may decide to skip or emulate stack unwinding for performance reasons. In this case, when processing the collected data during finalization, the VTune Profiler tries to find matching stacks in the history for events without stacks.

This approach reduces stack unwinding overhead but may provide incorrect stacks due to wrong matches. In such cases, the VTune Profiler displays pseudo nodes in the bottom-up/top-down trees marked as [Guessed frame(s)], and [Skipped frame(s)]. See Troubleshooting to learn how to overcome these problems.

VTune Profiler may also display [Unknown frame(s)] nodes if it could not locate symbol files for system or application modules when unwinding the stack. See Resolving Unknown Frame(s) for more details.