Intel® VTune™ Profiler Functionality on Google Cloud

Introduction

Intel® VTune™ Profiler is a performance profiling tool that delivers software and hardware performance analysis through its graphical and command-line interfaces. Hardware analysis relies on the availability of performance monitoring unit (PMU) counters (PMCs). Google Cloud has enabled PMCs on two general-purpose machine types: C3 metal and C4. For more information, see: https://cloud.google.com/compute/docs/general-purpose-machines

C3 metal machine types such as c3-standard-192-metal are preconfigured to allow full access to the PMU, enabling all VTune functionality for CPU performance analysis.

C4 machine types offer different levels of PMU availability, and the PMU level must be configured for each VM. For more information, see: https://cloud.google.com/compute/docs/pmu-overview
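
As an illustration of configuring the PMU level for a C4 VM, the Python sketch below assembles a `gcloud compute instances create` command line without running it. It assumes the `--performance-monitoring-unit` flag described in the Google Cloud PMU documentation linked above; verify the flag and its accepted values there before use. The helper name `create_vm_command`, the VM name, and the zone are hypothetical, and only the two PMU options that appear in the tables in this article are accepted.

```python
# Hypothetical helper: builds (but does not run) a gcloud command line.
# Only the PMU options that appear in this article's tables are listed here;
# the Google Cloud PMU documentation gives the full set.
PMU_OPTIONS = ("standard", "enhanced")


def create_vm_command(name, zone, machine_type, pmu_option):
    """Return the argument list for creating a VM with a given PMU option."""
    if pmu_option not in PMU_OPTIONS:
        raise ValueError(f"unknown PMU option: {pmu_option!r}")
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        # Assumed flag name; check it against the Google Cloud PMU page.
        f"--performance-monitoring-unit={pmu_option}",
    ]


# Example with placeholder VM name and zone.
print(" ".join(create_vm_command("my-vm", "us-central1-a", "c4-standard-96", "enhanced")))
```

Returning an argument list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.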

Based on the PMU settings for Google Cloud VMs, these are the four general categories of VTune collections:

  1. Software (user-mode hotspots and threading) - these collections are software-based and do not rely on the availability of hardware events
  2. Basic Hardware (Software collections plus event-based hotspots and threading) - these collections are hardware-based and require the availability of some hardware events
  3. Microarchitecture (Basic Hardware collections plus microarchitecture exploration and HPC characterization) - these collections are hardware-based and require the availability of most hardware events
  4. All (Microarchitecture collections plus memory access and bandwidth analysis) - these collections are hardware-based and require the availability of events that occur outside of the CPU cores (uncore events); they are generally available only on metal VMs
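
The categories build on one another, which can be sketched in Python as follows. This is a minimal illustration, not part of VTune Profiler: the `CATEGORIES` and `ADDED` tables and both helper functions are hypothetical names, and the assignment of analyses to categories simply follows the list above. The analysis type names and knobs are the ones used with the `vtune -collect` command line option; `./my_app` is a placeholder application.

```python
# Category order follows the list above; each level adds to the previous ones.
CATEGORIES = ["Software", "Basic Hardware", "Microarchitecture", "All"]

# Analyses introduced at each level, as (analysis type, knobs) pairs.
ADDED = {
    "Software": [
        ("hotspots", {"sampling-mode": "sw"}),
        ("threading", {"sampling-and-waits": "sw"}),
    ],
    "Basic Hardware": [
        ("hotspots", {"sampling-mode": "hw"}),
        ("threading", {"sampling-and-waits": "hw"}),
    ],
    "Microarchitecture": [("uarch-exploration", {}), ("hpc-performance", {})],
    "All": [("memory-access", {})],
}


def analyses_for(category):
    """Return every (analysis, knobs) pair available at the given category."""
    last = CATEGORIES.index(category)
    result = []
    for name in CATEGORIES[: last + 1]:
        result.extend(ADDED[name])
    return result


def vtune_command(analysis, knobs, app):
    """Build (but do not run) a vtune command line for one analysis."""
    cmd = ["vtune", "-collect", analysis]
    for knob, value in knobs.items():
        cmd += ["-knob", f"{knob}={value}"]
    return cmd + ["--", *app]


# List the command lines a Basic Hardware VM could run.
for analysis, knobs in analyses_for("Basic Hardware"):
    print(" ".join(vtune_command(analysis, knobs, ["./my_app"])))
```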

VMs Tested

Note that these lists may not include all PMU-enabled VMs available in Google Cloud.

General Purpose C4

VTune Profiler Functionality by VM Type

VM              | VTune Profiler Collections Supported | PMU Option | Intel® Xeon® Scalable Processor Code Name
c4-standard-48  | Basic Hardware                       | standard   | Emerald Rapids
c4-standard-96  | Microarchitecture                    | enhanced   | Emerald Rapids
c4-standard-192 | Microarchitecture                    | enhanced   | Emerald Rapids
c4-highmem-48   | Basic Hardware                       | standard   | Emerald Rapids
c4-highmem-96   | Microarchitecture                    | enhanced   | Emerald Rapids
c4-highmem-192  | Microarchitecture                    | enhanced   | Emerald Rapids
c4-highcpu-48   | Basic Hardware                       | standard   | Emerald Rapids
c4-highcpu-96   | Microarchitecture                    | enhanced   | Emerald Rapids
c4-highcpu-192  | Microarchitecture                    | enhanced   | Emerald Rapids

General Purpose C3 Metal

VTune Profiler Functionality by VM Type

VM                    | VTune Profiler Collections Supported | PMU Option   | Intel® Xeon® Scalable Processor Code Name
c3-standard-192-metal | All                                  | Not Required | Sapphire Rapids
c3-highmem-192-metal  | All                                  | Not Required | Sapphire Rapids
c3-highcpu-192-metal  | All                                  | Not Required | Sapphire Rapids

VM Description

The VMs were tested using a standard Ubuntu 22.04 image. A complete list of supported OSes is here: https://www.intel.com/content/www/us/en/developer/articles/system-requirements/vtune-profiler-system-requirements.html

Performance Monitoring Unit (PMU)

The PMU is on-chip hardware that counts microarchitectural events such as cache misses, cache hits, and elapsed cycles. These counts show how the operating system or an application performs on the processor. Profiling tools work with two main types of events: hardware and software. Hardware events, which the PMU counts, include instructions, CPU cycles, and cache references. Software events, which the operating system kernel counts, include context switches and page faults.

VTune Profiler has two high-level ways of collecting these events in Linux*:

  • Linux Perf* tool - an interface that provides access to the PMU and its features. Perf also provides modes such as event-based sampling (EBS), which records a sample each time a threshold number of events is reached. Perf support is already included in the default kernel. See this guide on system configuration for VTune collections using Linux Perf: https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2024-2/profiling-hardware-without-sampling-drivers.html
  • VTune Profiler's drivers - provided as part of the VTune Profiler package and loaded if PMU access is detected. If VTune Profiler is unable to use its drivers, it collects using Linux Perf. The VTune drivers are only supported on metal VMs at this time.
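
When collection falls back to Linux Perf, what an unprivileged user can collect depends on the kernel's `perf_event_paranoid` setting. The Python sketch below reads that setting from `/proc/sys/kernel/perf_event_paranoid` and summarizes it. The helper names are hypothetical, the summaries paraphrase the Linux kernel documentation rather than VTune Profiler's own checks, and values above 2 are distribution-specific, so treat the output as a hint only.

```python
from pathlib import Path

# Standard location of the setting on Linux.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path=PARANOID_PATH):
    """Return the perf_event_paranoid level as an int, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe_level(level):
    """Summarize what unprivileged users may collect at a given level."""
    if level is None:
        return "setting not readable; Perf events may be unavailable"
    if level <= 0:
        return "unprivileged users may collect CPU-wide events"
    if level == 1:
        return "unprivileged users may profile their own processes, user and kernel space"
    if level == 2:
        return "unprivileged users may profile their own processes, user space only"
    return "distribution-specific restriction; unprivileged collection is likely blocked"


level = read_paranoid_level()
print(f"perf_event_paranoid = {level}: {describe_level(level)}")
```

Lower values are less restrictive. The system configuration guide linked above describes which setting VTune Profiler expects for each collection.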

Intel VTune Profiler - Application Performance Snapshot

Application Performance Snapshot (APS) is a utility packaged with VTune Profiler for Linux*. It gives a quick view of MPI and OpenMP imbalances, memory access efficiency, floating point unit (FPU) usage, and I/O and memory data in your application. After analyzing this data, it suggests ways to perform additional analysis with VTune Profiler.

APS has the same limitations as VTune Profiler hardware analysis types. It can only run when PMU events are accessible.