Optimize Memory & Cache Use with Intel® VTune™ Profiler

This video demonstrates using Intel® VTune™ Profiler with other Intel® oneAPI Base Toolkit components, such as the Intel® oneAPI DPC++/C++ Compiler. The demo provides a hands-on example of how Intel VTune Profiler helps you optimize application performance.

This scenario looks at memory and cache accesses. A Hotspots analysis identifies a DRAM memory access that is taking too long. Further microarchitecture analysis points to cache misses and L3 cache accesses as the cause. The fix uses the compiler's built-in prefetch functions to improve data access times. Intel VTune Profiler provides several views to aid your analysis:

  • The platform diagram shows DRAM bus utilization.
  • The bandwidth utilization histogram shows how memory accesses are distributed across levels of bandwidth utilization.
  • The latency histogram shows how memory accesses are distributed by latency and lets you trace them back to their origin.

Together, this data lets you identify a remedy and reduce the number of high-latency memory and cache accesses.