Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 7/13/2023
Public

Optimize Loops With Loop Speculation

Loop speculation is an optimization technique that enables more efficient loop pipelining by allowing future iterations to be initiated before it is known whether the loop has already exited. Consider the following simple loop example:

while (m*m*m < N) {
  m+=1;
}

Logically, the exit condition (m*m*m < N) for an iteration must be evaluated before determining whether another iteration needs to be initiated. This means that, in the absence of speculation, the loop II cannot be lower than the number of cycles it takes to compute this exit condition. Speculated iterations are iterations that launch before the exit condition computation has completed. However, all operations with side effects, such as stores to memory, are predicated by the exit condition. This means that operations with side effects still wait for the exit condition to be computed. Loop speculation is beneficial when the exit condition is the bottleneck preventing the loop from achieving a lower II. In the loop shown above, the exit condition contains two multiplications that cannot complete within a single clock cycle. However, loop speculation allows this loop to achieve II=1.
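The following host-side sketch may help make the predication idea concrete. It is a software model only, not a description of the generated hardware: the names run_sequential, run_speculative, and SpeculativeResult are hypothetical, and the model assumes one iteration is launched per step and that an iteration's exit condition becomes known only after s younger iterations have been launched. Speculated iterations advance a private copy of m, and only the value belonging to the iteration whose exit condition fails becomes visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// The loop exactly as written in the text.
std::uint64_t run_sequential(std::uint64_t m, std::uint64_t N) {
  while (m * m * m < N) {
    m += 1;
  }
  return m;
}

struct SpeculativeResult {
  std::uint64_t m;        // value of m visible after the loop
  std::size_t launched;   // iterations initiated, including speculated ones
  std::size_t discarded;  // speculated iterations whose effects were suppressed
};

// Software model (an assumption, not the compiler's implementation):
// an iteration's exit condition resolves only after s younger iterations
// have already been launched.
SpeculativeResult run_speculative(std::uint64_t m, std::uint64_t N,
                                  std::size_t s) {
  std::deque<std::uint64_t> in_flight;  // m values with a pending exit condition
  std::uint64_t next_m = m;             // value seen by the next launched iteration
  std::size_t launched = 0;
  for (;;) {
    // Launch an iteration without waiting for older exit conditions.
    in_flight.push_back(next_m);
    next_m += 1;
    ++launched;
    if (in_flight.size() > s) {
      // The oldest pending exit condition is now known.
      const std::uint64_t v = in_flight.front();
      in_flight.pop_front();
      if (!(v * v * v < N)) {
        // The loop exits here. Everything still in flight was speculated,
        // and its effects are dropped (the predication step).
        assert(in_flight.size() == s);
        return {v, launched, in_flight.size()};
      }
    }
  }
}
```

In this model, starting from m = 0 with N = 1000 and s = 4, both functions leave m at 10, while the speculative version launches 15 iterations and discards 4 of them. The extra launches change no visible state, which is the property that makes speculation safe.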

For a given iteration i with exit condition Ei, the number of speculated iterations s is the number of iterations initiated after i has been initiated but before Ei has been evaluated. By default, the compiler determines this number on a per-loop basis, and you can find it in the per-loop details of the Loop Analysis report.

The speculated_iterations attribute allows you to directly control the number of speculated iterations for a loop. If the exit condition calculation is the bottleneck to lowering II (as shown in the Loop Analysis report), increasing the number of speculated iterations may improve the II (this is not guaranteed as other bottlenecks may be uncovered). For more information, refer to speculated_iterations Attribute.

Speculated iterations introduce some overhead in nested loops because a new invocation of a loop cannot begin until all speculated iterations of its previous invocation have completed. Where a loop body with low latency is expected to be invoked frequently (for example, an inner loop with a short trip count), use the speculated_iterations attribute to reduce the number of speculated iterations. You can estimate this overhead by multiplying the number of speculated iterations by the II of the loop (as shown in the Loop Analysis report). Using the speculated_iterations attribute can reduce this overhead, but be aware that choosing an attribute value that is too low may increase the II because there is not enough time to evaluate the exit condition.
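The overhead estimate described above can be written out as a small helper. This is a rough sketch: the function names are hypothetical, and the overhead_fraction helper additionally assumes that the useful work of one invocation takes about trip_count times II cycles, ignoring pipeline fill latency and any other stalls.

```cpp
#include <cassert>
#include <cstdint>

// Estimate from the text: cycles spent draining speculated iterations
// before the next invocation of the loop can start.
constexpr std::uint64_t speculation_overhead_cycles(
    std::uint64_t speculated_iterations, std::uint64_t ii) {
  return speculated_iterations * ii;
}

// Share of one invocation spent on that overhead.
// Assumption (not from the source): useful work is trip_count * ii cycles.
double overhead_fraction(std::uint64_t trip_count,
                         std::uint64_t speculated_iterations,
                         std::uint64_t ii) {
  assert(ii >= 1);
  const std::uint64_t overhead =
      speculation_overhead_cycles(speculated_iterations, ii);
  const std::uint64_t total = trip_count * ii + overhead;
  if (total == 0) {
    return 0.0;  // empty loop with no speculation: nothing to attribute
  }
  return static_cast<double>(overhead) / static_cast<double>(total);
}
```

Under these assumptions, an inner loop with a trip count of 8, four speculated iterations, and II=1 spends about a third of each invocation on speculation overhead, whereas the same settings with a trip count of 1000 spend well under one percent. This is why the attribute matters most for short, frequently invoked inner loops.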

Consider the following example:

while (m*m*m < N) {
  m+=1;
}
dst[0] = m;  

In this example, the exit condition, which has two multiplies and a compare, is the bottleneck preventing II=1. The compiler's choice of four speculated iterations results in II=2 because the exit condition takes seven cycles (each multiply takes three cycles and the compare takes one cycle), and four speculated iterations times a two-cycle II gives eight cycles to cover this evaluation.

[[intel::speculated_iterations(7)]]
while (m*m*m < N) {
  m+=1;
}
dst[0] = m;  

Here, the number of speculated iterations is increased to seven to cover the seven-cycle exit condition calculation, which allows the loop to achieve II=1.

[[intel::speculated_iterations(0)]]
while (m*m*m < N) {
  m+=1;
}
dst[0] = m;

Setting the speculated_iterations attribute to 0 lets you verify that the II increases to 7, which matches the latency of the exit condition bottleneck.
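The three II values in this example follow from the same arithmetic, sketched below. This is a simplified model inferred from the numbers in the text (the speculated iterations times the II must cover the exit condition latency), not the compiler's actual scheduling algorithm, and the names estimated_ii and kExitConditionCycles are hypothetical. Other bottlenecks in a real design can raise the II further.

```cpp
#include <cassert>

// Per-operation latencies quoted in the text for this example.
constexpr unsigned kMultiplyCycles = 3;
constexpr unsigned kCompareCycles = 1;

// m*m*m < N needs two dependent multiplies followed by one compare.
constexpr unsigned kExitConditionCycles =
    2 * kMultiplyCycles + kCompareCycles;

// Smallest II for which speculated_iterations * II covers the exit
// condition latency. With no speculation, each iteration waits for
// the whole exit condition, so the II equals that latency.
unsigned estimated_ii(unsigned exit_latency_cycles,
                      unsigned speculated_iterations) {
  assert(exit_latency_cycles >= 1);
  if (speculated_iterations == 0) {
    return exit_latency_cycles;
  }
  // Integer ceiling of exit_latency_cycles / speculated_iterations.
  return (exit_latency_cycles + speculated_iterations - 1) /
         speculated_iterations;
}
```

With the seven-cycle exit condition, this model gives II=2 for four speculated iterations, II=1 for seven, and II=7 for zero, matching the values reported for the three variants of the loop.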

NOTE: For additional information, refer to the FPGA tutorial sample Speculated Iterations listed in the Intel® oneAPI Samples Browser on Linux* or Windows*, or access the code sample in GitHub.