Tuning Recipes
These recipes explore typical application performance problems that you can detect with Intel® VTune™ Profiler or its predecessor, Intel® VTune™ Amplifier. Use the guidance in these recipes to optimize performance.
Recipe | Description |
---|---|
Cache-Related Latency Issues in Segmented Cache Environment | Use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores. |
False Sharing | Profile a memory-bound linear_regression application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. |
Frequent DRAM Accesses | Profile a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses in Intel® VTune™ Profiler. Understand the cause for frequent DRAM accesses. |
Poor Port Utilization | Profile a core-bound matrix application using the Microarchitecture Exploration analysis. Understand the cause for poor port utilization. |
Page Faults | Identify and measure the impact of page faults on target application performance. Use the Microarchitecture Exploration, System Overview, and Memory Access analyses in Intel® VTune™ Profiler. |
Instruction Cache Misses | Profile a front-end-bound application using the Microarchitecture Exploration analysis in Intel® VTune™ Profiler. Use a PGO option to reduce ICache misses. |
Inefficient Synchronization | Locate inefficient synchronization in your code by running the Advanced Hotspots analysis with the stack collection enabled. |
Inefficient TCP/IP Synchronization | Locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis in Intel® VTune™ Profiler, with the task collection enabled. |
OS Thread Migration | Identify OS thread migration on the NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler. |
OpenMP* Imbalance and Scheduling Overhead | Detect and fix frequent parallel bottlenecks of OpenMP programs such as imbalance on barriers and scheduling overhead. |
Processor Cores Underutilization: OpenMP* Serial Time | Identify a fraction of serial execution in an application parallelized with OpenMP. Find additional opportunities for parallelization, and improve the scalability of the application. |
Scheduling Overhead in Intel® Threading Building Blocks (Intel® TBB) Apps | Detect and fix scheduling overhead for an Intel® TBB application. |
PMDK Application Overhead | Detect and fix an overhead on memory accesses for a PMDK-based application. |
- Cache-Related Latency Issues in Segmented Cache Environment
  This recipe demonstrates how to use Cache Allocation Technology (CAT) to handle cache-related latency issues (cache misses) when you split a cache between cores.
- False Sharing
  This recipe explores profiling a memory-bound linear_regression application using the General Exploration and Memory Access analyses of the Intel® VTune™ Amplifier.
- Frequent DRAM Accesses
  This recipe explores profiling a memory-bound matrix application using the Microarchitecture Exploration and Memory Access analyses of the Intel® VTune™ Profiler to understand the cause of the frequent DRAM accesses.
- Poor Port Utilization
  This recipe explores profiling a core-bound matrix application using the Microarchitecture Exploration analysis (formerly, General Exploration) of the Intel® VTune™ Amplifier to understand the cause of the poor port utilization and Intel® Advisor to benefit from compiler vectorization.
- Page Faults
  This recipe helps identify and measure page faults impact on target application performance by using Intel® VTune™ Profiler's Microarchitecture Exploration, System Overview, and Memory Consumption analyses.
- Instruction Cache Misses
  This recipe explores profiling a front-end-bound application using the General Exploration analysis of the Intel® VTune™ Amplifier and using a PGO option to reduce ICache misses.
- Inefficient Synchronization
  This recipe shows how to locate inefficient synchronization in your code by running the Advanced Hotspots analysis of the Intel® VTune™ Amplifier with the stack collection enabled.
- Inefficient TCP/IP Synchronization
  This recipe shows how to locate inefficient TCP/IP synchronization in your code by running the Locks and Waits analysis of the Intel® VTune™ Amplifier with the task collection enabled.
- OS Thread Migration
  This recipe provides steps to identify OS thread migration on the NUMA architecture with the Hotspots analysis in Intel® VTune™ Profiler.
- OpenMP* Imbalance and Scheduling Overhead
  This recipe shows how to detect and fix frequent parallel bottlenecks of OpenMP programs such as imbalance on barriers and scheduling overhead.
- Processor Cores Underutilization: OpenMP* Serial Time
  This recipe shows how to identify a fraction of serial execution in an application parallelized with OpenMP, discover additional opportunities for parallelization, and improve scalability of the application.
- Scheduling Overhead in Intel® Threading Building Blocks (Intel® TBB) Apps
  This recipe shows how to detect and fix scheduling overhead for an Intel TBB application.
- PMDK Application Overhead
  This recipe shows how to detect and fix an overhead on memory accesses for a PMDK-based application.