When a loop is unrolled, each iteration of the loop is replicated in hardware and executes simultaneously if the iterations are independent. Unrolling loops trades an increase in FPGA area use for a reduction in the latency of your component.
Consider the following basic loop with three stages and three iterations. Each stage represents the operations that occur in the loop within one clock cycle.
Figure 31. Basic loop with three stages and three iterations
If each stage of this loop takes one clock cycle to execute and the iterations run one after another, then this loop has a latency of nine cycles (three stages for each of three iterations).
The following figure shows the loop from Figure 31 unrolled three times.
Figure 32. Unrolled loop with three stages and three iterations
Three iterations of the loop can now be completed in only three clock cycles, but three times as many hardware resources are required.
You can control how the compiler unrolls a loop with the #pragma unroll directive, but this directive works only if the compiler knows the trip count for the loop in advance or if you specify the unroll factor. In addition to replicating the hardware, the compiler also reschedules the circuit such that each operation runs as soon as the inputs for the operation are ready.
For an example of using the #pragma unroll directive, see the best_practices/resource_sharing_filter tutorial.
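As a minimal sketch of how the directive might appear in source code, the following C++ fragment shows one loop that is fully unrolled because its trip count is a compile-time constant, and one loop that is unrolled by an explicit factor because its trip count is known only at run time. The names `N`, `vec_add`, and `scale_by_two` are hypothetical and are not taken from the tutorial. The functions are written as plain C++ rather than as HLS components, so a compiler that does not recognize `#pragma unroll` ignores the directive and produces the same results.

```cpp
// Hypothetical compile-time trip count, chosen only for illustration.
constexpr int N = 4;

// Trip count N is known in advance, so a bare "#pragma unroll" can
// request a full unroll. The iterations are independent: each one
// reads and writes its own array element.
void vec_add(const int (&a)[N], const int (&b)[N], int (&c)[N]) {
#pragma unroll
  for (int i = 0; i < N; ++i) {
    c[i] = a[i] + b[i];
  }
}

// Trip count n is known only at run time, so an explicit unroll
// factor (here 2) is given to request a partial unroll.
void scale_by_two(int *data, int n) {
#pragma unroll 2
  for (int i = 0; i < n; ++i) {
    data[i] = data[i] * 2;
  }
}
```

In both functions the directive is placed immediately before the loop it applies to. A larger unroll factor replicates more hardware in exchange for lower latency, so the factor is a design choice to tune against the available FPGA area.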