Run Command Line Analysis
Default Installation Paths
Whether you downloaded Intel® VTune™ Profiler as a standalone component or with the Intel® oneAPI Base Toolkit, the default path for your <install-dir> is:
Operating System | Path to <install-dir> |
---|---|
Windows* OS | |
Linux* OS | |
macOS* | /opt/intel/oneapi/ |
Run Predefined Analysis
The predefined analysis configurations already have most of the knobs (configuration options) set by default for your convenience. To run a predefined performance analysis, use the -collect action:
vtune -collect <analysis_type> [-target-system=<system>] [-knob <knobName=knobValue>] [--] <target>
where:
<analysis_type> is the type of analysis to run. To see the list of available analysis types, enter:
vtune -help collect
-target-system is an option for remote analysis that specifies a remote Linux* system or an Android* device
-knob is a configuration option that modifies the analysis
<knobName=knobValue> is the name of the specified knob and its value
<target> is the path and name of the application to analyze. If you need to analyze a process, use the -target-process or -target-pid option to specify the process name or ID. For a system-wide analysis, no target specification is required.
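As a minimal sketch of how the parts of this syntax fit together, the Python helper below assembles a vtune -collect command line as an argument list. The helper name build_collect_cmd, the application path ./my_app, and the example knob sampling-mode=hw are illustrative assumptions and not part of the syntax above. Run vtune -help collect <analysis_type> to check which knobs apply to your analysis type.

```python
import shlex


def build_collect_cmd(analysis_type, target=None, knobs=None, target_system=None):
    """Assemble a `vtune -collect` command line as a list of arguments."""
    cmd = ["vtune", "-collect", analysis_type]
    if target_system:
        # Only needed for remote analysis.
        cmd.append(f"-target-system={target_system}")
    for name, value in (knobs or {}).items():
        # Each knob is passed as its own "-knob name=value" pair.
        cmd += ["-knob", f"{name}={value}"]
    if target:
        # "--" separates vtune options from the target and its arguments.
        cmd += ["--", *target]
    return cmd


# Hypothetical target and knob value, for illustration only.
cmd = build_collect_cmd("hotspots", target=["./my_app"], knobs={"sampling-mode": "hw"})
print(shlex.join(cmd))
```

Omitting the target, as in build_collect_cmd("system-overview"), yields a command without a target specification, which matches the system-wide case described above. The list form can be passed directly to subprocess.run on a machine where vtune is on the PATH.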
Intel® VTune™ Profiler supports the following predefined analysis types:
Analysis Type | Description |
---|---|
performance-snapshot | Get an overview of issues that affect application performance on your target system. |
hotspots | Analyze application flow and identify sections of code that take a long time to execute (hotspots). |
anomaly-detection (preview) | Identify performance anomalies in frequently recurring intervals of code like loop iterations. Perform fine-grained analysis at the microsecond level. |
threading | Collect data on how an application is using available logical CPU cores, discover where parallelism is incurring synchronization overhead, identify where an application is waiting on synchronization objects or I/O operations, and discover how waits affect application performance. |
hpc-performance | Identify opportunities to optimize CPU, memory, and FPU utilization for compute-intensive or throughput applications. The HPC Performance Characterization analysis type is a starting point for understanding the performance landscape of your application. Use this analysis type to improve application performance by increasing the number of floating-point operations per second (GFLOPS) and reducing the overall application run time. The analysis collects data related to CPU, memory, and FPU utilization. Additional scalability metrics are available for applications that use OpenMP* or MPI runtime libraries. |
memory-consumption | Analyze memory consumption by your Linux application, its distinct memory objects and their allocation stacks. |
uarch-exploration (formerly general-exploration) | Collect hardware events for analyzing a typical client application. This analysis calculates a set of predefined ratios used for the metrics and helps identify hardware-level performance problems. |
memory-access | Identify memory-related issues, such as NUMA problems and bandwidth-limited accesses, and attribute performance events to memory objects (data structures). This attribution relies on instrumenting memory allocations and deallocations and on reading static and global variables from symbol information. |
sgx-hotspots (deprecated) | Analyze hotspots inside security enclaves for systems with the Intel® Software Guard Extensions (Intel® SGX) feature enabled. This analysis type uses the INST_RETIRED.PREC_DIST hardware event that emulates precise clockticks and helps identify performance-critical program units inside enclaves. |
tsx-exploration (deprecated) | Collect events that help understand Intel® Transactional Synchronization Extensions (Intel® TSX) behavior and causes of transactional aborts. |
tsx-hotspots (deprecated) | Monitor the UOPS_RETIRED.ALL_PS hardware event that emulates precise clockticks and identify performance-critical program units inside transactions. |
gpu-hotspots (preview) | Identify Graphics Processing Unit (GPU) tasks with high GPU utilization and estimate the effectiveness of this utilization. This analysis type is intended for analysis of applications that use a GPU for rendering, video processing, and computations with explicit support of Intel® Media SDK and OpenCL™ software technology. |
gpu-offload | Explore code execution on various CPU and GPU cores on your platform, correlate CPU and GPU activity, and identify whether your application is GPU or CPU bound. |
vpp | Get a holistic view of system behavior. Gain insights into platform-level configuration, utilization, and imbalance issues that relate to compute, memory, storage, IO and interconnects. |
graphics-rendering (preview) | Analyze the CPU/GPU utilization of your code running on the Xen virtualization platform. Explore GPU usage per GPU engine and GPU hardware metrics that help understand where performance improvements are possible. If applicable, this analysis also detects OpenGL-ES API calls and displays them on the timeline. |
 | Analyze CPU/FPGA interaction issues by exploring OpenCL kernels running on an FPGA, and identify the most time-consuming FPGA kernels. |
io | Monitor utilization of the IO subsystems, CPU and processor buses. This analysis type uses the hardware event-based sampling collection and system-wide Ftrace* collection (for Linux* and Android* targets)/ETW collection (Windows* targets) to provide a consistent view of the storage sub-system combined with hardware events and an easy-to-use method to match user-level source code with I/O packets executed by the hardware. |
system-overview | Monitor the general behavior of your target system and identify platform-level factors that limit performance. |
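A script that launches analyses may want to warn when a requested type is marked preview or deprecated in the table above. The following is a minimal sketch under that assumption. The names PREVIEW, DEPRECATED, and analysis_status are hypothetical, and the sets only mirror the markers shown in this table, which may differ in other VTune Profiler versions.

```python
# Status markers copied from the table above; unmarked types are not listed.
PREVIEW = {"anomaly-detection", "gpu-hotspots", "graphics-rendering"}
DEPRECATED = {"sgx-hotspots", "tsx-exploration", "tsx-hotspots"}


def analysis_status(analysis_type):
    """Return 'deprecated', 'preview', or 'unmarked' for an analysis type name.

    Names that do not appear in the table at all are also reported as
    'unmarked'; this helper does not validate them.
    """
    if analysis_type in DEPRECATED:
        return "deprecated"
    if analysis_type in PREVIEW:
        return "preview"
    return "unmarked"


for name in ("hotspots", "gpu-hotspots", "tsx-hotspots"):
    print(f"{name}: {analysis_status(name)}")
```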
Run Custom Analysis
To run a modified version of a predefined analysis type, use the -collect-with action to specify a collection type and the required configuration options (knobs):
vtune -collect-with <collection_type> [-target-system=<system>] [-knob <knobName=knobValue>] [--] <target>
where:
<collection_type> is the type of collection to run. To see the list of available collection types, enter:
vtune -help collect-with
-target-system is an option for remote analysis that specifies a remote Linux* system or an Android* device
-knob is a configuration option that modifies the analysis
<knobName=knobValue> is the name of the specified knob and its value
<target> is the path and name of the application to analyze. If you need to analyze a process, use the -target-process or -target-pid option to specify the process name or ID. For a system-wide analysis, no target specification is required.
Intel® VTune™ Profiler supports the following collection types:
Collector | Description |
---|---|
runsa | Profile your application using the counter overflow feature of the Performance Monitoring Unit (PMU). |
runss | Profile the application execution and take snapshots of how that application utilizes the processors in the system. The collector interrupts a process, collects the value of all active instruction addresses and captures a calling sequence for each of these samples. |
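The -collect-with syntax and the collectors listed above can be combined as in the following Python sketch, which builds the argument list for a custom collection. The helper name build_collect_with_cmd and the target ./my_app are hypothetical. The event-config knob and the event names are assumptions used for illustration, so confirm them with vtune -help collect-with runsa on your system before relying on them.

```python
import shlex

# Collection types listed in the table above.
COLLECTORS = {"runsa", "runss"}


def build_collect_with_cmd(collector, target, knobs=None):
    """Assemble a `vtune -collect-with` command line as a list of arguments."""
    if collector not in COLLECTORS:
        raise ValueError(f"unknown collection type: {collector}")
    cmd = ["vtune", "-collect-with", collector]
    for name, value in (knobs or {}).items():
        # Each knob is passed as its own "-knob name=value" pair.
        cmd += ["-knob", f"{name}={value}"]
    # "--" separates vtune options from the target and its arguments.
    cmd += ["--", *target]
    return cmd


# Assumed knob and event names, hypothetical target.
cmd = build_collect_with_cmd(
    "runsa",
    ["./my_app"],
    knobs={"event-config": "CPU_CLK_UNHALTED.REF_TSC,INST_RETIRED.ANY"},
)
print(shlex.join(cmd))
```

Rejecting unknown collector names before launching anything keeps a typo from turning into a failed collection run.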
Next Steps
When the collection is complete, VTune Profiler saves the data as an analysis result in the default or specified result directory. You can either view the result in the GUI or generate a formatted analysis report.
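For the report path, a minimal Python sketch of assembling a report command is shown below. It assumes the vtune -report action and the -result-dir option from the VTune Profiler command-line interface. The helper name build_report_cmd and the result directory name r000hs are hypothetical placeholders, so substitute the directory that your collection actually produced.

```python
import shlex


def build_report_cmd(report_type, result_dir):
    """Assemble a `vtune -report` command line as a list of arguments."""
    return ["vtune", "-report", report_type, "-result-dir", result_dir]


# Hypothetical result directory name, for illustration only.
cmd = build_report_cmd("summary", "r000hs")
print(shlex.join(cmd))
```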