Intel® FPGA SDK for OpenCL™ Standard Edition: Programming Guide

ID 683342
Date 4/22/2019
Public

5.2.3. Specifying a Loop Initiation Interval (II)

The initiation interval, or II, is the number of clock cycles between the launch of successive loop iterations.
Use the ii pragma to direct the Intel® FPGA SDK for OpenCL™ Offline Compiler to attempt to set the II for the loop that follows the pragma declaration. If the offline compiler cannot achieve the specified II for the loop, compilation fails with an error.

The ii pragma applies to single work-item kernels (that is, single-threaded kernels) in which loops are pipelined. Refer to the Single Work-Item Kernel versus NDRange Kernel section of the Intel® FPGA SDK for OpenCL™ Best Practices Guide for information on loop pipelining, and on kernel properties that drive the offline compiler's decision on whether to treat a kernel as single-threaded.

The higher the II value, the longer the wait before the subsequent loop iteration starts executing. Refer to the Reviewing Your Kernel's report.html File section of the Intel® FPGA SDK for OpenCL™ Best Practices Guide for information on II, and on the compiler reports that provide you with details on the performance implications of II on a specific loop.

For some loops in your kernel, using the ii pragma to specify an II value higher than the one the compiler chooses by default can increase the maximum operating frequency (fmax) of your kernel without decreasing throughput.

A loop is a good candidate to have the ii pragma applied to it if the loop meets the following conditions:
  • The loop is pipelined because the kernel is single-threaded.
  • The loop is not critical to the throughput of your kernel.
  • The running time of the loop is small compared to other loops it might contain.
To specify a loop initiation interval for a loop, specify the pragma before the loop as follows:
#pragma ii <desired_initiation_interval>
The <desired_initiation_interval> parameter is required and is an integer that specifies the number of clock cycles to wait between the beginning of execution of successive loop iterations.
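As an illustration of where the pragma goes, the following is a minimal sketch of a single work-item kernel that requests an II of 2 for one loop. The kernel name running_sum and its arguments are hypothetical. The #ifndef shim at the top is an assumption added only so that the same source also compiles with an ordinary C compiler for a functional check; a C compiler does not recognize the ii pragma and ignores it, so the shim says nothing about the II that the offline compiler achieves.

```c
#ifndef __OPENCL_VERSION__
/* Shim for compiling as plain C (an assumption for testing, not SDK code). */
#define kernel
#define global
#endif

/* Hypothetical single work-item kernel: writes the running sum of src to dst. */
kernel void running_sum(global const int *restrict src,
                        global int *restrict dst,
                        int n)
{
    int sum = 0;

    /* Request an II of 2 for the loop that immediately follows the pragma. */
    #pragma ii 2
    for (int i = 0; i < n; i++) {
        sum += src[i];   /* loop-carried dependence on sum */
        dst[i] = sum;
    }
}
```

The pragma changes only how often the hardware launches a new iteration. The values written to dst are the same with or without it.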

Example

Consider a case where your kernel has two distinct, pipelineable loops: a short-running initialization loop that has a loop-carried dependence and a long-running loop that does the bulk of your processing. In this case, the compiler does not know that the initialization loop has a much smaller impact on the overall throughput of your design. If possible, the compiler attempts to pipeline both loops with an II of 1.

Because the initialization loop has a loop-carried dependence, the generated hardware contains a feedback path. To achieve an II of 1 with such a feedback path, the compiler might have to sacrifice some clock frequency. Depending on the feedback path in the main loop, the rest of your design could otherwise have run at a higher operating frequency.

If you specify #pragma ii 2 on the initialization loop, you tell the compiler that it can be less aggressive in optimizing II for this loop. Less aggressive optimization allows the compiler to pipeline the path limiting the fmax and could allow your overall kernel design to achieve a higher fmax.
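The scenario described here might look like the following sketch. The kernel name scale_by_table, the constant INIT_LEN, and the arithmetic in both loops are hypothetical and chosen only to give the initialization loop a loop-carried dependence. The #ifndef shim is an assumption that lets the source also compile as plain C for a functional check, where the ii pragma is ignored.

```c
#ifndef __OPENCL_VERSION__
/* Shim for compiling as plain C (an assumption for testing, not SDK code). */
#define kernel
#define global
#endif

#define INIT_LEN 16   /* hypothetical trip count of the short setup loop */

/* Hypothetical single work-item kernel with two pipelineable loops. */
kernel void scale_by_table(global const float *restrict in,
                           global float *restrict out,
                           int n)
{
    float coeff[INIT_LEN];
    float seed = 1.0f;

    /* Short initialization loop. Each iteration needs the previous value of
       seed, which creates the feedback path. Relax its II to 2. */
    #pragma ii 2
    for (int i = 0; i < INIT_LEN; i++) {
        seed = seed * 0.5f + 1.0f;
        coeff[i] = seed;
    }

    /* Long-running main loop. No pragma, so the compiler chooses the II. */
    for (int i = 0; i < n; i++) {
        out[i] = in[i] * coeff[i % INIT_LEN];
    }
}
```

Only the initialization loop carries the pragma, so the main loop is still pipelined as aggressively as the compiler can manage.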

The initialization loop takes longer to run with its new II. However, the decrease in the running time of the long-running loop due to the higher fmax compensates for the increase in the running time of the initialization loop.
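To make the trade-off concrete, the following back-of-the-envelope model estimates kernel run time from II and fmax. It uses the common approximation that a pipelined loop takes about latency + II × (trip count − 1) clock cycles. The trip counts, the pipeline latency, and the fmax values used with it are hypothetical and do not come from any compiler report; substitute the figures from your own report.html file.

```c
/* Approximate cycle count of a pipelined loop: the first iteration takes
   `latency` cycles, and each later iteration launches `ii` cycles after
   the previous one. */
static double loop_cycles(double trip_count, double ii, double latency)
{
    return latency + ii * (trip_count - 1.0);
}

/* Estimated kernel time in microseconds for the initialization loop plus
   the main loop, given the initialization loop's II and the kernel fmax. */
static double kernel_time_us(double init_ii, double fmax_mhz)
{
    const double init_trips = 64.0;    /* hypothetical short loop */
    const double main_trips = 1.0e6;   /* hypothetical long loop */
    const double latency    = 20.0;    /* hypothetical pipeline depth */

    double cycles = loop_cycles(init_trips, init_ii, latency)
                  + loop_cycles(main_trips, 1.0, latency);

    return cycles / fmax_mhz;          /* MHz is cycles per microsecond */
}
```

With these assumed numbers, kernel_time_us(1.0, 200.0) is about 5000.5 microseconds and kernel_time_us(2.0, 240.0) is about 4167.4 microseconds. The initialization loop costs 63 extra cycles at an II of 2, which is negligible next to the million iterations of the main loop that now run at the higher clock rate.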