Tuning Recipes
These recipes explore typical application performance problems that you can detect with Intel® VTune™ Profiler or its predecessor, Intel® VTune™ Amplifier. Use the guidance in these recipes to optimize performance.
- Cache-Related Latency Issues in Segmented Cache Environment
This recipe demonstrates how to use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores.
- False Sharing
This recipe explores profiling a memory-bound linear_regression application using the General Exploration and Memory Access analyses of the Intel® VTune™ Amplifier.
- Frequent DRAM Accesses
This recipe explores profiling a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses of the Intel® VTune™ Profiler to understand the cause of the frequent DRAM accesses.
- Poor Port Utilization
Profile a core-bound matrix application using the Microarchitecture Exploration analysis in Intel® VTune™ Profiler. Understand the cause of poor port utilization and use Intel® Advisor to benefit from compiler vectorization.
- Page Faults
This recipe helps you identify and measure the impact of page faults on target application performance by using the Microarchitecture Exploration, System Overview, and Memory Consumption analyses of Intel® VTune™ Profiler.
- Instruction Cache Misses
This recipe explores profiling a front-end-bound application using the General Exploration analysis of the Intel® VTune™ Amplifier and using a PGO option to reduce ICache misses.
- Inefficient Synchronization
This recipe shows how to locate inefficient synchronization in your code by running the Advanced Hotspots analysis of the Intel® VTune™ Amplifier with stack collection enabled.
- Inefficient TCP/IP Synchronization
This recipe shows how to locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis of the Intel® VTune™ Amplifier with task collection enabled.
- OS Thread Migration
This recipe provides steps to identify OS thread migration on a NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler.
- OpenMP* Imbalance and Scheduling Overhead
Follow this recipe to detect and fix common parallel bottlenecks in OpenMP programs, such as imbalance on barriers and scheduling overhead.
- Processor Cores Underutilization: OpenMP* Serial Time
This recipe shows how to identify the fraction of serial execution in an application parallelized with OpenMP, discover additional opportunities for parallelization, and improve the scalability of the application.
- Scheduling Overhead in Intel® Threading Building Blocks (Intel® TBB) Apps
This recipe shows how to detect and fix scheduling overhead in an Intel TBB application.
- PMDK Application Overhead
This recipe shows how to detect and fix memory access overhead in a PMDK-based application.
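The OpenMP* Serial Time recipe listed above concerns the fraction of an application that runs serially. As a rough illustration of why that fraction limits scalability, the sketch below evaluates Amdahl's law, which bounds the achievable speedup at 1 / (s + (1 - s) / n) for a serial fraction s and n threads. This is not taken from the recipes themselves: the function name `amdahl_speedup` and the sample values are assumptions chosen for illustration, and a real application also pays the imbalance and scheduling overheads that other recipes cover.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Return the Amdahl's-law upper bound on speedup.

    serial_fraction: share of run time that cannot be parallelized (0.0 to 1.0).
    threads: number of worker threads (hypothetical parameter, must be >= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0.0 and 1.0")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # Illustrative values only; measure the real serial fraction with a profiler.
    for serial in (0.05, 0.20):
        for threads in (4, 16, 64):
            bound = amdahl_speedup(serial, threads)
            print(f"serial={serial:.0%} threads={threads:>2} bound={bound:.2f}x")
```

With a 20% serial fraction the bound stays below 5x no matter how many threads are added, which is why reducing serial time is often worth more than adding cores.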