Custom Analysis Options
When you create a copy of a predefined analysis type, the new custom configuration inherits all options available for the original analysis and makes them editable.
This is a list of all available custom configuration options (knobs) in alphabetical order:
A |
---|---
Analyze I/O waits check box | Analyze the percentage of time each thread and CPU spends in the I/O wait state.
Analyze interrupts check box | Collect interrupt events that alter the normal execution flow of a program. Such events can be generated by hardware devices or by CPUs. Use this data to identify slow interrupts that affect the performance of your code.
Analyze loops check box | Extend loop analysis to collect advanced loop information, such as instruction set usage, and display analysis results by loops and functions.
Analyze memory bandwidth check box | Collect the events required to compute memory bandwidth.
Analyze memory consumption check box (for Linux* targets only) | Collect and analyze information about the memory objects with the highest memory consumption.
Analyze memory objects check box (for Linux* targets only) | Enable instrumentation of memory allocation/deallocation and map hardware events to memory objects.
Analyze OpenMP regions check box | Instrument OpenMP* regions in your application to group performance data by regions/work-sharing constructs and to detect inefficiencies such as imbalance, lock contention, or overhead from scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size.
Analyze PCIe bandwidth check box | Collect the events required to compute PCIe bandwidth. As a result, you can analyze the distribution of read/write operations on the timeline and identify where your application may stall because it approaches the bandwidth limits of the PCIe bus. In the Device class drop-down menu, choose a device class for which to analyze PCIe bandwidth: processing accelerators, mass storage controller, network controller, or all device classes (default). NOTE: This analysis is possible only on the Intel microarchitecture code name Haswell EP and later.
Analyze power usage check box | Track power consumption by the processor over time to see whether it can cause CPU throttling.
Analyze Processor Graphics hardware events drop-down menu | Analyze performance data from Intel HD Graphics and Intel Iris Graphics (further: Intel Graphics) based on predefined groups of GPU metrics.
Analyze system-wide context switches check box | Analyze the detailed scheduling layout for all threads on the system and identify the nature of context switches for a thread (preemption or synchronization).
Analyze user tasks, events, and counters check box | Analyze tasks, events, and counters specified in your code via the ITT API. This option causes higher overhead and increases the result size.
Analyze user histogram check box | Analyze the histogram specified in your code via the Histogram API. This option increases both overhead and result size.
Analyze user synchronization check box | Enable User Synchronization API profiling to analyze thread synchronization. This option causes higher overhead and increases the result size.
C |
---|---
Chipset events field | Specify a comma-separated list of chipset events (up to 5 events) to monitor with the hardware event-based sampling collector.
Collect context switches check box | Analyze the detailed scheduling layout for all threads in your application, explore the time spent on a context switch, and identify the nature of context switches for a thread (preemption or synchronization). NOTE: The type of a context switch (preemption or synchronization) cannot be identified if the analysis uses the Perf*-based driverless collection.
Collect CPU sampling data menu | Choose whether to collect information about CPU samples and related call stacks.
Collect highly accurate CPU time check box (for Windows* targets only) | Obtain more accurate CPU time data. This option causes more runtime overhead and increases the result size. Administrator privileges are required.
Collect I/O API data menu | Choose whether to collect information about I/O calls and related call stacks. This option helps identify where threads are waiting and enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases the result size.
Collect Parallel File System counters check box | Enable collection of Parallel File System counters to analyze Lustre* file system performance statistics, including Bandwidth, Package Rate, Average Packet Size, and others.
Collect signalling API data menu | Choose whether to collect information about synchronization objects and call stacks for signalling calls. This option helps identify synchronization transitions in the timeline and signalling call stacks for associated waits. The collector instruments signalling APIs, which causes higher overhead and increases the result size.
Collect stacks check box | Enable advanced collection of call stacks and thread context switches to analyze performance, parallelism, and power consumption per execution path.
Collect synchronization API data menu | Choose whether to collect information about synchronization wait calls and related call stacks. This option helps identify where threads are waiting and enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases the result size.
Collect thread affinity check box | Analyze thread pinning to sockets, physical cores, and logical cores. Identify incorrect affinity that utilizes logical cores instead of physical cores and contributes to poor physical CPU utilization. NOTE: Affinity information is collected at the end of the thread lifetime, so the resulting data may not reflect the whole issue for dynamic affinity that changes during the thread lifetime.
CPU Events table |
CPU sampling interval, ms field | Specify the interval between collected CPU samples, in milliseconds.
D |
---|---
Disable alternative stacks for signal handlers check box (for Linux* targets only) | Disable using alternative stacks for signal handlers. Consider this option for profiling standard Python 3 code on Linux.
E |
---|---
Enable driverless collection check box | Use the driverless Perf*-based hardware event-based collection when possible.
Evaluate max DRAM bandwidth check box | Evaluate the maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and to calculate thresholds.
Event mode drop-down list | Limit event-based sampling collection to the USER (user events) or OS (system events) mode. By default, all event types are collected.
G |
---|---
GPU Profiling mode drop-down menu | Select a profiling mode to either characterize GPU performance issues based on GPU hardware metric presets, or enable a source analysis to identify basic block latency caused by algorithm inefficiencies or memory latency caused by memory access issues. Use the Computing task of interest table to specify the kernels of interest and narrow down the GPU analysis to specific kernels, minimizing the collection overhead. If required, modify the instance step for each kernel, which is a sampling interval (in the number of kernels).
GPU sampling interval, ms field | Specify the interval between GPU samples, in milliseconds.
GPU Utilization check box (for Linux* targets, available with Intel HD Graphics and Intel Iris Graphics only) | Analyze GPU usage and identify whether your application is GPU-bound or CPU-bound.
L |
---|---
Limit PMU collection to counting check box | Collect counts of events instead of the default detailed context data for each PMU event (such as code or hardware context). Counting mode introduces less overhead but provides less information.
Linux Ftrace events / Android framework events field | Use the kernel events library to select Linux Ftrace* and Android* framework events to monitor with the collector. The collected data shows up as tasks in the Timeline pane. You can also apply the task grouping level to view performance statistics in the grid.
M |
---|---
Managed runtime type to analyze menu | Choose the type of managed runtime to analyze.
Minimal memory object size to track, in bytes spin box (for Linux* targets only) | Specify the minimal size of memory allocations to analyze. This option helps reduce the runtime overhead of the instrumentation.
P |
---|---
Profile with Hardware Tracing check box | Enable driverless hardware tracing collection to explore CPU activities of your code at the microsecond level and triage latency issues.
S |
---|---
Stack size, in bytes field | Specify the size of a raw stack to process, in bytes. Possible values are numbers from 0 to 2147483647. The Unlimited value in the GUI corresponds to 0 on the command line.
Stack type drop-down menu | Choose between the software stack and the hardware LBR-based stack types. Software stacks have no depth limitations and provide more data, while hardware stacks introduce less overhead. Typically, the software stack type is recommended unless the collection overhead becomes significant. Note that the hardware LBR stack type may not be available on all platforms.
Stack unwinding mode menu | Choose whether the collection requires online (during collection) or offline (after collection) stack unwinding. The offline mode reduces the analysis overhead and is typically recommended.
Stitch stacks check box | For applications using Intel® oneAPI Threading Building Blocks (oneTBB) or OpenMP* with Intel runtime libraries, restructure the call flow to attach stacks to the point introducing a parallel workload.
T |
---|---
Trace GPU Programming APIs check box | Capture the execution time of OpenCL™ kernels, SYCL tasks, and Intel Media SDK programs on a GPU, identify performance-critical GPU tasks, and analyze performance per GPU hardware metrics.
U |
---|---
Uncore sampling interval, ms field | Specify the interval between uncore event samples, in milliseconds.
Use precise multiplexing check box | Enable a fine-grain event multiplexing mode that switches event groups on each sample. This mode provides more reliable statistics for applications with a short execution time. Also consider applying the precise multiplexing algorithm if the MUX Reliability metric value for your results is low.
You may generate the command line for this configuration using the Command Line... button at the bottom.