Identify and Address Remote NUMA Performance Impact

Platform Analysis in Intel® VTune™ Profiler collects coarse-grained, system-level metrics to identify hardware bottlenecks and inefficient use of hardware. The example workload in this video has steady memory use, but it is memory bound and has a high percentage of remote accesses on a non-uniform memory access (NUMA) system: the data being accessed resides in DRAM attached to the other socket of a dual-socket system. Satisfying these remote NUMA accesses also generates heavy cross-socket Intel® Ultra Path Interconnect (Intel® UPI) traffic with many spikes.

By setting the affinity of all threads in the workload to cores on a single socket, you can keep memory accesses local to that socket and optimize memory access performance for this workload.
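As a rough illustration of the affinity approach described above, the following Python sketch pins the calling process to the cores of one NUMA node on Linux. It is not taken from the video. It assumes a Linux system that exposes NUMA topology under /sys/devices/system/node, and it assumes that NUMA node 0 corresponds to one socket (which may not hold when sub-NUMA clustering is enabled). The helper names parse_cpulist, cpus_on_node, and pin_to_node are hypothetical.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or trailing commas
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_on_node(node=0):
    """Return the CPU ids that Linux reports for the given NUMA node."""
    # Assumption: Linux sysfs layout; this path does not exist on other systems.
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_to_node(node=0):
    """Restrict the caller to the CPUs of one NUMA node and return that CPU set."""
    cpus = cpus_on_node(node)
    # pid 0 means "the caller"; threads started afterwards inherit this mask.
    os.sched_setaffinity(0, cpus)
    return cpus


if __name__ == "__main__":
    # Assumption: node 0 maps to a single socket on this machine.
    pinned = pin_to_node(0)
    print(f"Pinned to CPUs: {sorted(pinned)}")
```

Call pin_to_node before the workload starts its worker threads so that they inherit the mask. CPU affinity by itself does not move memory that is already allocated on the other socket, so a real workload may also need a memory placement policy, for example launching it with numactl --cpunodebind=0 --membind=0.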

Intel VTune Profiler visualizes memory access latency and the types of memory accesses across the platform. This makes it easier to identify the root cause of a performance issue in a memory-bound workload and to find a solution. Test your multisocket server workload with Intel VTune Profiler today.