Tuning Recipes
These recipes explore typical application performance problems that you can detect with Intel® VTune™ Profiler or its predecessor, Intel® VTune™ Amplifier. Use the guidance in these recipes to optimize performance.
- Cache-Related Latency Issues in Segmented Cache Environment
This recipe demonstrates how to use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores.
- False Sharing
This recipe explores profiling a memory-bound linear_regression application using the General Exploration and Memory Access analyses of the Intel® VTune™ Amplifier.
- Frequent DRAM Accesses
This recipe explores profiling a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses of the Intel® VTune™ Profiler to understand the cause of the frequent DRAM accesses.
- Poor Port Utilization
Profile a core-bound matrix application using the Microarchitecture Exploration analysis in Intel® VTune™ Profiler. Understand the cause of poor port utilization and use Intel® Advisor to benefit from compiler vectorization.
- Page Faults
Identify and measure the impact of page faults on application performance. Use the Microarchitecture Exploration, System Overview, and Memory Consumption analyses in Intel® VTune™ Profiler.
- Instruction Cache Misses
Profile an application that is bound on the front end and reduce ICache misses using the Microarchitecture Exploration analysis with the PGO option.
- OS Thread Migration
Identify OS thread migration on a NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler.
- OpenMP* Imbalance and Scheduling Overhead
Follow this recipe to detect and fix common parallel bottlenecks in OpenMP programs, such as imbalance on barriers and scheduling overhead.
- Processor Cores Underutilization: OpenMP* Serial Time
Use this recipe to identify the fraction of serial execution in an application that was parallelized with OpenMP. Discover additional opportunities for parallelization and improve the scalability of the application.
- Scheduling Overhead in an Intel® oneAPI Threading Building Blocks Application
Detect and fix scheduling overhead in an Intel® oneAPI Threading Building Blocks (oneTBB) application.
- PMDK Application Overhead
Find and fix memory access overhead in a PMDK-based application.