5. Profiling Your Kernel to Identify Performance Bottlenecks
Consider the following OpenCL kernel program:
__kernel void add (__global int * a,
                   __global int * b,
                   __global int * c)
{
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
As shown in the figure below, the Profiler instruments and connects performance counters in a daisy chain throughout the pipeline generated for the kernel program. The host then reads the data collected by these counters. For example, in PCI Express® (PCIe®)-based systems, the host reads the data via the PCIe control register access (CRA) or control and status register (CSR) port.
Work-item execution stalls might occur at various stages of an Intel® FPGA SDK for OpenCL™ pipeline. Applications that perform many memory accesses or load and store operations might stall frequently while waiting for memory transfers to complete. The Profiler helps identify the load and store operations or channel accesses that cause the majority of stalls within a kernel pipeline.
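To make the idea of "the access that causes the majority of stalls" concrete, the following C fragment is a minimal sketch of how counter readings could be ranked on the host. It is not the Profiler's implementation or data format: the access_sample structure, its field names, and the functions stall_percent and worst_staller are hypothetical and exist only for illustration. The sketch assumes each load, store, or channel access reports how many cycles it stalled the pipeline out of the cycles sampled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-access counter sample. The layout and field names are
 * assumptions for illustration, not the Profiler's actual data format. */
typedef struct {
    const char *name;       /* label for the access, e.g. "load a[gid]" */
    uint64_t stall_cycles;  /* cycles in which this access stalled the pipeline */
    uint64_t total_cycles;  /* cycles over which the counter was sampled */
} access_sample;

/* Percentage of sampled cycles in which the access stalled the pipeline.
 * Returns 0.0 when no cycles were sampled, to avoid dividing by zero. */
double stall_percent(const access_sample *s)
{
    assert(s != NULL);
    if (s->total_cycles == 0) {
        return 0.0;
    }
    return 100.0 * (double)s->stall_cycles / (double)s->total_cycles;
}

/* Index of the sample with the highest stall percentage.
 * The first sample wins ties. Requires n > 0. */
size_t worst_staller(const access_sample *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    size_t worst = 0;
    for (size_t i = 1; i < n; ++i) {
        if (stall_percent(&samples[i]) > stall_percent(&samples[worst])) {
            worst = i;
        }
    }
    return worst;
}
```

For the add kernel above, such a table would hold three entries: the loads from a and b and the store to c. Calling worst_staller on that table returns the index of the access to examine first.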
- Best Practices for Profiling Your Kernel
- Instrumenting the Kernel Pipeline with Performance Counters (-profile)
- Obtaining Profiling Data During Runtime
- Reducing Area Resource Use While Profiling
- Temporal Performance Collection
- Performance Data Types
- Interpreting the Profiling Information
- Profiler Analyses of Example OpenCL Design Scenarios
- Intel FPGA Dynamic Profiler for OpenCL Limitations