Intel® VTune™ Profiler

User Guide

ID 766319
Date 3/31/2025
Public

XPU Offload View

Use the XPU Offload viewpoint to assess and optimize the performance of AI or ML workloads on Intel Neural Processing Units (NPUs) and Graphics Processing Units (GPUs).

When the XPU Offload analysis runs, Intel® VTune™ Profiler collects data from the NOC metric set about the bandwidth between the NPU and DDR memory. Once data collection completes, Intel® VTune™ Profiler prepares the results and displays them in the Summary window.

XPU Offload Summary

The Summary window displays NPU performance data, starting with these sections:

  • NPU Device Load - This section indicates the amount of data transferred between the NPU and DDR memory.
  • NPU Top Compute Tasks - This section shows the total time that tasks spent executing on the NPU.

Next, see the list of Top Tasks to review the various host tasks which offloaded work onto the NPU.
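To make the idea of a "top tasks" ranking concrete, here is a minimal sketch of how total execution time per task could be summed and sorted. The record layout, task names, and durations are hypothetical values chosen for illustration. They are not VTune Profiler output or an export format.

```python
from collections import defaultdict

# Hypothetical records: (task name, execution time on the NPU in microseconds).
# These values are illustrative only and do not come from VTune Profiler.
records = [
    ("conv2d", 4000),
    ("matmul", 12000),
    ("conv2d", 6000),
    ("softmax", 1000),
]


def top_compute_tasks(records, limit=5):
    """Sum execution time per task name and return the largest totals first."""
    totals = defaultdict(int)
    for name, duration_us in records:
        totals[name] += duration_us
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


ranking = top_compute_tasks(records)
for name, total_us in ranking:
    print(f"{name}: {total_us} us")
```

Tasks near the top of such a ranking account for most of the device time, so they are usually the first candidates for closer inspection.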

XPU Exploration Compute Window

Continue your examination of host tasks by switching to the Bottom-up window. In the Grouping pull-down menu, select the Task Domain / Task Type / Function / Call Stack grouping.

Examine the execution of device tasks from the instant they started, that is, the instant when each task was appended to the Computing Queue.

In the Computing Queue section, the portion of the graph above the dotted line indicates the duration for which the task was executing on the NPU.

The portion of the graph below the dotted line indicates the duration for which the task was waiting in the queue for execution on the NPU. Tasks are removed from the Computing Queue when they finish executing on the NPU.
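The relationship between the two portions of the Computing Queue graph can be sketched with a small model. This is an illustrative sketch, assuming each task has three timestamps: when it was appended to the queue, when it began executing on the NPU, and when it finished. The class, field names, and sample values are hypothetical and do not reflect how VTune Profiler stores or exports data.

```python
from dataclasses import dataclass


@dataclass
class DeviceTask:
    """Hypothetical model of one device task; all times are in microseconds."""
    name: str
    queued_us: int      # task appended to the Computing Queue
    exec_start_us: int  # task begins executing on the NPU
    finished_us: int    # task finishes and is removed from the queue

    @property
    def wait_us(self) -> int:
        # Time spent waiting in the queue (below the dotted line).
        return self.exec_start_us - self.queued_us

    @property
    def exec_us(self) -> int:
        # Time spent executing on the NPU (above the dotted line).
        return self.finished_us - self.exec_start_us


def queue_depth(tasks, t_us):
    """Count tasks that are in the queue (waiting or executing) at time t_us."""
    return sum(1 for t in tasks if t.queued_us <= t_us < t.finished_us)


# Illustrative sample data, not measured values.
tasks = [
    DeviceTask("infer_0", queued_us=0, exec_start_us=150, finished_us=900),
    DeviceTask("infer_1", queued_us=100, exec_start_us=900, finished_us=1500),
]

for t in tasks:
    print(f"{t.name}: waited {t.wait_us} us, executed {t.exec_us} us")
```

In this sample, the second task waits longer than it executes because it cannot start until the first one finishes. A pattern like that in the graph suggests that tasks are being queued faster than the device can drain them.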