Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide
5. Profiling Your Kernel to Identify Performance Bottlenecks
Consider the following OpenCL kernel program:
__kernel void add(__global int *a,
                  __global int *b,
                  __global int *c)
{
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
As shown in the figure below, the Profiler instruments the pipeline generated for the kernel program with performance counters and connects those counters in a daisy chain. The host then reads the data collected by these counters. For example, in PCI Express® (PCIe®)-based systems, the host reads the data via the PCIe control register access (CRA) or control and status register (CSR) port.
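The daisy-chain readout described above can be pictured with a small software model. The following C sketch is only a conceptual illustration, not the SDK's hardware implementation: the names counter_chain, chain_shift_out, and chain_read_all, and the chain length of four, are hypothetical. It assumes that values leave the chain only through its last position, so the host performs one read per counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conceptual model only (assumption): one counter per instrumented
   pipeline stage, linked so values leave through the last slot. */
#define CHAIN_LEN 4  /* hypothetical number of instrumented stages */

typedef struct {
    uint64_t slot[CHAIN_LEN];  /* slot[CHAIN_LEN - 1] is the end the host reads */
} counter_chain;

/* Return the value at the output end, then shift every remaining value
   one position toward the output. */
uint64_t chain_shift_out(counter_chain *chain)
{
    assert(chain != NULL);
    uint64_t out = chain->slot[CHAIN_LEN - 1];
    for (size_t i = CHAIN_LEN - 1; i > 0; --i) {
        chain->slot[i] = chain->slot[i - 1];
    }
    chain->slot[0] = 0;  /* the vacated first position is cleared */
    return out;
}

/* Drain the whole chain: one read per counter. Values arrive in order
   from the output end back to the start of the chain. */
void chain_read_all(counter_chain *chain, uint64_t out[CHAIN_LEN])
{
    for (size_t i = 0; i < CHAIN_LEN; ++i) {
        out[i] = chain_shift_out(chain);
    }
}
```

In this model, the cost of a full readout grows with the number of counters in the chain, because each value must be shifted to the output end before the host can read it.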
Work-item execution stalls might occur at various stages of an Intel® FPGA SDK for OpenCL™ pipeline. Applications that perform a large number of memory accesses (load and store operations) might stall frequently while they wait for memory transfers to complete. The Profiler helps identify the load and store operations or channel accesses that cause the majority of stalls within a kernel pipeline.
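As a rough illustration of how stall data can point to a bottleneck, the following C sketch ranks memory or channel accesses by the fraction of cycles they spent stalled. It rests on assumptions: the access_sample structure, the function names, and the definition of stall percentage as stalled cycles divided by total cycles are hypothetical and used here only for illustration. The metrics that the Profiler actually reports are described in Performance Data Types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-access sample; the fields are illustrative and do
   not mirror the Profiler's real output format. */
typedef struct {
    const char *name;        /* label for a load, store, or channel access */
    uint64_t stall_cycles;   /* cycles in which the access was stalled */
    uint64_t total_cycles;   /* cycles over which the access was observed */
} access_sample;

/* Assumed definition: percentage of observed cycles spent stalled. */
double stall_percent(const access_sample *s)
{
    assert(s != NULL);
    if (s->total_cycles == 0) {
        return 0.0;  /* nothing observed, so report no stall */
    }
    return 100.0 * (double)s->stall_cycles / (double)s->total_cycles;
}

/* Return the index of the access with the highest stall percentage.
   On a tie, the earliest entry wins. */
size_t worst_stall_index(const access_sample *samples, size_t count)
{
    assert(samples != NULL && count > 0);
    size_t worst = 0;
    for (size_t i = 1; i < count; ++i) {
        if (stall_percent(&samples[i]) > stall_percent(&samples[worst])) {
            worst = i;
        }
    }
    return worst;
}
```

For the add kernel above, such a ranking would cover the two loads from a and b and the store to c. The access at the returned index is the first candidate to examine when optimizing.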
- Best Practices for Profiling Your Kernel
- Instrumenting the Kernel Pipeline with Performance Counters (-profile)
- Obtaining Profiling Data During Runtime
- Reducing Area Resource Use While Profiling
- Temporal Performance Collection
- Performance Data Types
- Interpreting the Profiling Information
- Profiler Analyses of Example OpenCL Design Scenarios
- Intel FPGA Dynamic Profiler for OpenCL Limitations