Visible to Intel only — GUID: GUID-55B76678-0837-4780-909A-60526DF2A39B
XPU Offload Analysis
Use the XPU Offload analysis to profile and optimize artificial intelligence (AI) workloads running on Intel architectures such as Graphics Processing Units (GPUs) and Neural Processing Units (NPUs).
XPU refers collectively to NPUs, GPUs, and CPU device cores. GPUs are a popular hardware choice for compute-intensive or graphics-intensive applications. An NPU accelerates AI workloads that the operating system has explicitly offloaded onto it; NPUs are designed specifically to improve the performance of AI and machine-learning (ML) workloads.
Use the Intel® Distribution of OpenVINO™ toolkit to offload popular ML models (such as speech or image recognition tasks) to Intel NPUs. Then use the XPU Offload analysis to profile these AI/ML workloads, collect performance data, and optimize their performance.
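Before profiling, an application typically selects the device it offloads to. The sketch below is illustrative: the `pick_device` helper and its preference order are hypothetical, though OpenVINO device names such as "NPU", "GPU", and "CPU" are the ones a compiled model would target.

```python
# Hypothetical helper: choose an offload target for an ML model.
# The NPU-first preference order is an assumption, not an OpenVINO API.
def pick_device(available_devices):
    """Prefer NPU, then GPU, then fall back to CPU."""
    for device in ("NPU", "GPU", "CPU"):
        if device in available_devices:
            return device
    return "CPU"

# With OpenVINO installed, the chosen name could be passed along the
# lines of core.compile_model(model, pick_device(core.available_devices)).
print(pick_device(["CPU", "GPU", "NPU"]))  # NPU preferred when present
print(pick_device(["CPU"]))
```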
Default Settings for XPU Data Collection
When you run the XPU Offload analysis to collect data for an XPU device, Intel® VTune™ Profiler collects the following information in Time-based mode:

| | Time-based mode |
|---|---|
| Data collection | Intel® VTune™ Profiler collects metrics system-wide, similar to CPU uncore metrics. |
| Size of typical workload | Large |
| Execution time of instance | >5 ms |
| Sampling interval | 1 ms |
| Benefits | Use this mode for larger workloads. Optimize applications with reasonable efficiency and reduced overhead. |
| Usage considerations | This mode imposes less overhead on the application. It requires the Level Zero backend to be installed, with standard NPU drivers. However, the application does not need to use Level Zero for metric collection, except for computing tasks. |
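As a quick back-of-the-envelope check on why the >5 ms instance duration and 1 ms sampling interval in the table fit together (the numbers come from the table; the helper itself is illustrative):

```python
# Estimate how many time-based samples land inside one task instance.
def expected_samples(instance_ms, interval_ms):
    return instance_ms // interval_ms

# A >5 ms instance sampled every 1 ms yields at least 5 samples,
# enough to attribute time to the instance with reasonable confidence.
print(expected_samples(5, 1))  # 5
```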
Configure and Run Analysis
In the VTune Profiler user interface, in the Accelerators group of the Analysis Tree, select XPU Offload (preview).
In the WHAT pane, specify the path to the AI/ML application in the Application field.
If necessary, specify relevant Application parameters as well.
In the HOW pane, select your Target Devices.
Set these collection options as needed:
- Trace computing programming APIs - Set this option to analyze SYCL, Level Zero, OpenCL™, and Intel® Video Processing Library (Intel® VPL) programs that run on Intel architectures (such as GPUs or NPUs). Selecting this option can impact CPU performance.
- Collect host stacks - Set this option to analyze call stacks executed on the CPU and identify critical paths. Examine the CPU-side stacks for GPU and NPU computing tasks to investigate the efficiency of your XPU offload. When results display, use the Call Stack mode in the filter bar to sort through SYCL*, Level Zero, or OpenCL™ runtime call stacks.
- Show GPU performance insights - Set this option to collect metrics based on the analysis of Processor Graphics events. Use these GPU performance metrics to estimate the efficiency of hardware usage and learn about next steps.
Click the Start button to run the analysis.
The XPU Offload analysis profiles these metrics related to the performance of your GPU:
| Performance Metric | Description |
|---|---|
| EU Array | Shows the breakdown of GPU core array cycles. |
| EU Threads Occupancy | Shows the normalized sum of all cycles on all cores and thread slots when a slot has a thread scheduled. |
| Computing Threads Started | Shows the number of threads started across all EUs for compute work. |
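The EU Threads Occupancy definition above can be sketched as a ratio: slot-cycles with a thread scheduled, divided by total slot-cycle capacity. This helper and its sample numbers are illustrative, not VTune Profiler's internal formula:

```python
# Illustrative occupancy calculation (assumed formula, not VTune internals).
def eu_threads_occupancy(scheduled_slot_cycles, total_cycles, slots_per_eu, num_eus):
    """Fraction of (cycle, thread-slot) pairs that had a thread scheduled."""
    capacity = total_cycles * slots_per_eu * num_eus
    return scheduled_slot_cycles / capacity

# 4 EUs with 7 thread slots each over 1000 cycles -> 28,000 slot-cycles
# of capacity; 14,000 scheduled slot-cycles gives 50% occupancy.
print(eu_threads_occupancy(14000, 1000, 7, 4))  # 0.5
```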
To run the XPU Offload analysis from the command line, type:
$ vtune -collect xpu-offload [-knob <knob_name=knob_option>] -- <target> [target_options]
To generate the command line for any analysis configuration, use the Command Line button at the bottom of the user interface.
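For scripted runs (for example, in CI), the same command line can be assembled programmatically. This sketch only reproduces the `vtune -collect xpu-offload` invocation shown above; the `./my_app` target and its arguments are placeholders:

```python
# Build the vtune command line shown above as an argument list.
# "./my_app" and its options are hypothetical placeholders.
def vtune_cmd(target, *target_args):
    return ["vtune", "-collect", "xpu-offload", "--", target, *target_args]

# The list form can be passed to subprocess.run() without shell quoting.
print(" ".join(vtune_cmd("./my_app", "--iterations", "10")))
```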
Once VTune Profiler completes data collection, the results of the XPU Offload analysis appear in the XPU Offload viewpoint.