Visible to Intel only — GUID: GUID-D241ADFB-4A7D-4F02-808D-5F4F88B7B17C
Visible to Intel only — GUID: GUID-D241ADFB-4A7D-4F02-808D-5F4F88B7B17C
Measure Kernel Performance
The Profiler instruments and connects performance counters in a daisy chain throughout the pipeline generated for the kernel program. The host then reads data collected by these counters. For example, in PCI Express® (PCIe®)-based systems, the host reads the Profiler data over the PCIe interface.
Consider the following SYCL example code:
// Vector Add Kernel
h.single_task<VectorAdd>([=]() {
for (int i = 0; i < kSize; ++i) {
r[i] = a[i] + b[i];
}
});
The profiler instruments and connects performance counters in a daisy chain throughout the pipeline generated for the kernel as shown in Figure 1. The host then reads the data collected by these counters. For example, in PCI Express® (PCIe)-based systems, the host reads the data via the PCIe control register access (CRA) or control and status register (CSR) port.
Applications that use many pipes or memory accesses might stall frequently to enable the completion of memory transfers. The dynamic profiler collects various performance metrics such as stall, occupancy, idle, and bandwidth data at these points in the pipeline to help identify memory or pipe operations that create stalls.