Custom Analysis Options
If you create a copy of a predefined analysis type, a new custom configuration inherits all options available for the original analysis and makes them editable.
This is a list of all available custom configuration options (knobs) in alphabetical order:
A |
|
---|---|
Analyze I/O waits check box |
Analyze the percentage of time each thread and CPU spends in I/O wait state. |
Analyze interrupts check box |
Collect interrupt events that alter the normal execution flow of a program. Such events can be generated by hardware devices or by CPUs. Use this data to identify slow interrupts that affect the performance of your code. |
Analyze loops check box |
Extend loop analysis to collect advanced loop information, such as instruction set usage, and display analysis results by loops and functions. |
Analyze memory bandwidth check box |
Collect events required to compute memory bandwidth. |
Analyze memory consumption check box (for Linux targets only) |
Collect and analyze information about memory objects with the highest memory consumption. |
Analyze memory objects check box (for Linux* targets only) |
Enable the instrumentation of memory allocation/de-allocation and map hardware events to memory objects. |
Analyze OpenMP regions check box |
Instrument the OpenMP* regions in your application to group performance data by regions/work-sharing constructs and detect inefficiencies such as imbalance, lock contention, or overhead on performing scheduling, reduction, and atomic operations. Using this option may cause higher overhead and increase the result size. |
Analyze PCIe bandwidth check box |
Collect the events required to compute PCIe bandwidth. With this data, you can analyze the distribution of read/write operations on the timeline and identify where your application could stall as it approaches the bandwidth limits of the PCIe bus. In the Device class drop-down menu, choose the device class for which to analyze PCIe bandwidth: processing accelerators, mass storage controller, network controller, or all device classes (default).
NOTE:
This analysis is possible only on Intel microarchitecture code name Haswell EP and later. |
Analyze power usage check box |
Track power consumption by processor over time to see whether it can cause CPU throttling. |
Analyze Processor Graphics hardware events drop-down menu |
Analyze performance data from Intel HD Graphics and Intel Iris Graphics (further: Intel Graphics) based on the predefined groups of GPU metrics. |
Analyze system-wide context switches check box |
Analyze detailed scheduling layout for all threads on the system and identify the nature of context switches for a thread (preemption or synchronization). |
Analyze user tasks, events, and counters check box |
Analyze tasks, events, and counters specified in your code via the ITT API. This option causes a higher overhead and increases the result size. |
Analyze user histogram check box |
Analyze the histogram specified in your code via the Histogram API. This option increases both overhead and result size. |
Analyze user synchronization check box |
Enable User synchronization API profiling to analyze thread synchronization. This option causes higher overhead and increases result size. |
C |
|
Chipset events field |
Specify a comma-separated list of chipset events (up to 5 events) to monitor with the hardware event-based sampling collector. |
Collect context switches check box |
Analyze detailed scheduling layout for all threads in your application, explore the time spent on context switches, and identify the nature of context switches for a thread (preemption or synchronization).
NOTE:
The types of context switches (preemption or synchronization) cannot be identified if the analysis uses Perf*-based driverless collection. |
Collect CPU sampling data menu |
Choose whether to collect information about CPU samples and related call stacks. |
Collect I/O API data menu |
Choose whether to collect information about I/O calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size. |
Collect Parallel File System counters check box |
Enable collection of the Parallel File System counters to analyze Lustre* file system performance statistics, including Bandwidth, Package Rate, Average Packet Size, and others. |
Collect signalling API data menu |
Choose whether to collect information about synchronization objects and call stacks for signalling calls. This analysis option helps identify synchronization transitions in the timeline and signalling call stacks for associated waits. The collector instruments signalling APIs, which causes higher overhead and increases result size. |
Collect stacks check box |
Enable advanced collection of call stacks and thread context switches to analyze performance, parallelism, and power consumption per execution path. |
Collect synchronization API data menu |
Choose whether to collect information about synchronization wait calls and related call stacks. This analysis option helps identify where threads are waiting or enables you to compute thread concurrency. The collector instruments APIs, which causes higher overhead and increases result size. |
Collect thread affinity check box |
Analyze thread pinning to sockets, physical cores, and logical cores. Identify incorrect affinity that utilizes logical cores instead of physical cores and contributes to poor physical CPU utilization.
NOTE:
Affinity information is collected at the end of the thread lifetime, so the resulting data may not show the whole issue for dynamic affinity that is changed during the thread lifetime. |
CPU Events table |
|
CPU sampling interval, ms field |
Specify an interval between collected CPU samples in milliseconds. |
D |
|
Disable alternative stacks for signal handlers check box (available for Linux targets) |
Disable using alternative stacks for signal handlers. Consider this option for profiling standard Python 3 code on Linux. |
E |
|
Enable driverless collection check box |
Use driverless Perf*-based hardware event-based collection when possible. |
Evaluate max DRAM bandwidth check box |
Evaluate maximum achievable local DRAM bandwidth before the collection starts. This data is used to scale bandwidth metrics on the timeline and calculate thresholds. |
Event mode drop-down list |
Limit event-based sampling collection to USER (user events) or OS (system events) mode. By default, all event types are collected. |
G |
|
GPU Profiling mode drop-down menu |
Select a profiling mode to either characterize GPU performance issues based on GPU hardware metric presets, or enable source analysis to identify basic block latency due to algorithm inefficiencies or memory latency due to memory access issues. Use the Computing task of interest table to specify the kernels of interest and narrow down the GPU analysis to those kernels, which minimizes the collection overhead. If required, modify the instance step for each kernel, which is a sampling interval (in the number of kernels). |
GPU sampling interval, ms field |
Specify an interval between GPU samples. |
GPU Utilization check box (for Linux* targets available with Intel HD Graphics and Intel Iris® Graphics only) |
Analyze GPU usage and identify whether your application is GPU or CPU bound. |
L |
|
Limit PMU collection to counting check box |
Enable this option to collect event counts instead of the default detailed context data for each PMU event (such as code or hardware context). Counting mode introduces less overhead but gives less information. |
Linux Ftrace events / Android framework events field |
Use the kernel events library to select Linux Ftrace* and Android* framework events to monitor with the collector. The collected data show up as tasks in the Timeline pane. You can also apply the task grouping level to view performance statistics in the grid. |
M |
|
Managed runtime type to analyze menu |
Choose a type of the managed runtime to analyze. Available options are:
|
Minimal memory object size to track, in bytes spin box (for Linux targets only) |
Specify a minimal size of memory allocations to analyze. This option helps reduce runtime overhead of the instrumentation. |
P |
|
Profile with Hardware Tracing check box |
Enable driverless hardware tracing collection to explore CPU activities of your code at the microsecond level and triage latency issues. |
S |
|
Stack size, in bytes field |
Specify the size of a raw stack (in bytes) to process. The Unlimited value in the GUI corresponds to 0 on the command line. Possible values are integers from 0 to 2147483647. |
Stack type drop-down menu |
Choose between the software stack type and the hardware LBR-based stack type. Software stacks have no depth limitations and provide more data, while hardware stacks introduce less overhead. Typically, the software stack type is recommended unless the collection overhead becomes significant. Note that the hardware LBR stack type may not be available on all platforms. |
Stack unwinding mode menu |
Choose whether collection requires online (during collection) or offline (after collection) stack unwinding. Offline mode reduces analysis overhead and is typically recommended. |
Stitch stacks check box |
For applications using Intel® oneAPI Threading Building Blocks (oneTBB) or OpenMP* with Intel runtime libraries, restructure the call flow to attach stacks to a point introducing a parallel workload. |
T |
|
Trace GPU Programming APIs check box |
Capture the execution time of OpenCL™ kernels, SYCL tasks and Intel Media SDK programs on a GPU, identify performance-critical GPU tasks, and analyze the performance per GPU hardware metrics. |
U |
|
Uncore sampling interval, ms field |
Specify an interval (in milliseconds) between uncore event samples. |
Use precise multiplexing check box |
Enable a fine-grain event multiplexing mode that switches event groups on each sample. This mode provides more reliable statistics for applications with a short execution time. You can also consider applying the precise multiplexing algorithm if the MUX Reliability metric value for your results is low. |
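Two options in the table take constrained values: the Chipset events field accepts a comma-separated list of up to 5 events, and the Stack size field accepts values between 0 and 2147483647, where 0 stands for Unlimited. The following is a minimal Python sketch of checking such values before using them in a configuration. The function names and the placeholder event names are hypothetical and are not part of the product.

```python
MAX_CHIPSET_EVENTS = 5        # limit stated for the Chipset events field
MAX_STACK_SIZE = 2147483647   # upper bound stated for Stack size, in bytes


def parse_chipset_events(text):
    """Split a comma-separated event list and enforce the 5-event limit."""
    events = [name.strip() for name in text.split(",") if name.strip()]
    if len(events) > MAX_CHIPSET_EVENTS:
        raise ValueError(
            f"{len(events)} events given, at most {MAX_CHIPSET_EVENTS} allowed"
        )
    return events


def check_stack_size(size):
    """Return the size if it is in range; 0 stands for 'Unlimited'."""
    if not isinstance(size, int) or not 0 <= size <= MAX_STACK_SIZE:
        raise ValueError(f"stack size must be in 0..{MAX_STACK_SIZE}")
    return size


if __name__ == "__main__":
    # EVENT_A and EVENT_B are placeholders, not real chipset event names.
    print(parse_chipset_events("EVENT_A, EVENT_B"))
    print(check_stack_size(0))
```

Rejecting out-of-range values early, rather than clamping them, is a design choice of this sketch. The tool itself may handle invalid input differently.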
To run a configuration from the command line, use the Command Line... button at the bottom of the UI.
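For scripted runs, a generated command line can also be assembled programmatically. The sketch below builds a string of the general form `vtune -collect-with <collector> -knob <name>=<value> -- <application>`. The helper `build_command` is hypothetical, and the collector and knob names in the usage lines (`runsa`, `stack-size`, `enable-stack-collection`) are assumptions for illustration only. Treat the output of the Command Line... button as the authoritative source of names and values for your configuration.

```python
import shlex


def build_command(collector, knobs, app_args):
    """Assemble a command line: vtune -collect-with C -knob k=v ... -- app."""
    cmd = ["vtune", "-collect-with", collector]
    for name, value in knobs.items():
        # Render Python booleans in lowercase, as check-box style values.
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd += ["-knob", f"{name}={value}"]
    # Everything after "--" is the target application and its arguments.
    cmd += ["--", *app_args]
    return shlex.join(cmd)


if __name__ == "__main__":
    # Knob names here are assumed; copy the real ones from the GUI output.
    knobs = {"stack-size": 0, "enable-stack-collection": True}
    print(build_command("runsa", knobs, ["./my_app"]))
```

Using `shlex.join` keeps arguments that contain spaces or shell metacharacters correctly quoted when the string is pasted into a POSIX shell.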