4. Profiling Your Kernel to Identify Performance Bottlenecks
Consider the following OpenCL kernel program:
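__kernel void add (__global int * a,
                   __global int * b,
                   __global int * c)
{
    /* Each work-item handles the element at its own global index. */
    int gid = get_global_id(0);
    /* Two global-memory loads and one global-memory store per work-item. */
    c[gid] = a[gid] + b[gid];
}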
As shown in the figure below, the Profiler instruments and connects performance counters in a daisy chain throughout the pipeline generated for the kernel program. The host then reads the data collected by these counters. For example, in PCI Express® (PCIe®)-based systems, the host reads the data via the PCIe control register access (CRA) or control and status register (CSR) port.
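To make the daisy-chain idea concrete, here is a minimal conceptual sketch in C. It is a software model only, not the hardware the Profiler generates. The names `profile_chain`, `chain_read_and_shift`, `chain_read_all`, and the chain length of 3 are hypothetical. The shift-on-read behavior is an assumption used to illustrate how a host could drain chained counters through a single port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conceptual model only (assumption): one counter per instrumented point,
 * linked so that a shift moves every value one position toward the end of
 * the chain that the host can read. */
#define CHAIN_LEN 3  /* hypothetical: load a, load b, store c */

typedef struct {
    uint64_t counter[CHAIN_LEN];
} profile_chain;

/* Return the value at the host-visible end of the chain, then shift the
 * remaining values one position closer and clear the far end. */
uint64_t chain_read_and_shift(profile_chain *chain)
{
    assert(chain != NULL);
    uint64_t head = chain->counter[0];
    for (size_t i = 0; i + 1 < CHAIN_LEN; ++i)
        chain->counter[i] = chain->counter[i + 1];
    chain->counter[CHAIN_LEN - 1] = 0;
    return head;
}

/* Host-side readout: one read per counter drains the chain in order. */
void chain_read_all(profile_chain *chain, uint64_t out[CHAIN_LEN])
{
    assert(chain != NULL && out != NULL);
    for (size_t i = 0; i < CHAIN_LEN; ++i)
        out[i] = chain_read_and_shift(chain);
}
```

In this model the host needs only one read location regardless of how many counters the chain holds, which is the property that makes a daisy chain convenient to expose through a single register port.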
Work-item execution stalls might occur at various stages of a pipeline. Applications with a large number of memory accesses (load and store operations) might stall frequently while they wait for memory transfers to complete. The Profiler helps identify the load and store operations or channel accesses that cause the majority of stalls within a kernel pipeline.
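As a rough illustration of how stall data can point to a bottleneck, the sketch below ranks accesses by stall cycles and converts a stall count into a percentage of elapsed cycles. The `access_profile` record, its field names, and both functions are hypothetical. They do not reflect the Profiler's actual report format or its exact metric definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-access record; the fields are illustrative only. */
typedef struct {
    const char *name;       /* label for the access, e.g. "load a[gid]" */
    uint64_t stall_cycles;  /* cycles during which this access stalled */
} access_profile;

/* Share of total_cycles, in percent, during which the access stalled. */
double stall_percent(const access_profile *access, uint64_t total_cycles)
{
    assert(access != NULL && total_cycles > 0);
    return 100.0 * (double)access->stall_cycles / (double)total_cycles;
}

/* Index of the access with the most stall cycles (first one wins ties). */
size_t worst_access(const access_profile *accesses, size_t count)
{
    assert(accesses != NULL && count > 0);
    size_t worst = 0;
    for (size_t i = 1; i < count; ++i)
        if (accesses[i].stall_cycles > accesses[worst].stall_cycles)
            worst = i;
    return worst;
}
```

For the `add` kernel above, such a ranking would cover three entries: the loads from `a` and `b` and the store to `c`. The entry that `worst_access` returns is the first candidate to examine when optimizing.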
For usage information on the Profiler, refer to the Profiling Your OpenCL Kernel section of the Standard Edition Programming Guide.