Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide
8.6. Minimizing the Memory Dependencies for Loop Pipelining
Loop-carried memory dependencies might introduce bottlenecks in single work-item kernels because of the latency of the memory accesses involved. The offline compiler defers a memory operation until the memory operation it depends on completes, which can increase the loop initiation interval (II). The offline compiler reports these memory dependencies in the optimization report.
- Ensure that the offline compiler does not assume false dependencies.
When the static memory dependence analysis fails to prove that a dependency does not exist, the offline compiler assumes that the dependency exists and modifies the kernel execution to enforce it. The impact of the dependency enforcement is lower if the memory system is stall-free.
- Write after read operations with data dependency on a load-store unit can take just two clock cycles (II=2). Other stall-free scenarios can take up to seven clock cycles.
- A read after write (control dependency) operation can be fully resolved by the offline compiler.
- Override the static memory dependence analysis by adding the line #pragma ivdep immediately before the loop in your kernel code, but only if you are sure that the loop carries no memory dependencies.
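As a minimal sketch of the last point, the two kernels below contrast a false dependency with a real one. The kernel names, arguments, and the permutation guarantee are hypothetical and chosen only for illustration; they do not come from this guide. The small preprocessor shim is also an assumption, added so the same source builds with an ordinary C compiler for a quick functional check. It has no effect under an OpenCL compiler.

```c
/* Shim (assumption, not part of the SDK): lets this sketch compile as plain C.
   OpenCL C compilers predefine __OPENCL_VERSION__ and skip this block. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical kernel with a FALSE dependency.
   Assumption: the caller guarantees that idx[0..n-1] contains no repeated
   values, so every iteration touches a different element of data[].
   The compiler cannot prove this statically and would otherwise assume a
   loop-carried dependency through data[]. */
__kernel void scale_indirect(__global int *restrict data,
                             __global const int *restrict idx,
                             int n)
{
    /* Assert that the loop carries no memory dependencies. */
    #pragma ivdep
    for (int i = 0; i < n; i++) {
        data[idx[i]] = data[idx[i]] * 2 + 1;
    }
}

/* Hypothetical kernel with a TRUE dependency.
   Iteration i reads the value that iteration i-1 wrote, so adding
   #pragma ivdep here would be incorrect. */
__kernel void running_sum(__global int *restrict data, int n)
{
    for (int i = 1; i < n; i++) {
        data[i] = data[i] + data[i - 1];
    }
}
```

If the guarantee on idx does not hold, for example because two entries are equal, the pragma can produce wrong results in hardware. A host C compiler simply ignores the pragma, possibly with an unknown-pragma warning, so a plain C run checks only the functional behavior and not the resulting II.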