5.2.9. Loop Interleaving Control (max_interleaving Pragma)
As an example, consider the loop nest in the following code snippet:
// Loop j is pipelined with ii=1
for (int j = 0; j < M; j++) {
  int a[N];
  // Loop i is pipelined with ii=2
  for (int i = 1; i < N; i++) {
    a[i] = foo(i);
  }
}
In this example, the inner i loop is pipelined with a loop II of 2. Under normal pipelining, the inner loop hardware achieves only 50% utilization because one i iteration is initiated every other cycle. To take advantage of these idle cycles, the compiler interleaves a second invocation of the i loop from the next iteration of the outer j loop. Here, a loop invocation means one start of pipelined execution of a loop body. Because the i loop resides inside the j loop, and the j loop has a trip count of M, the i loop is invoked M times. Because the j loop is the outermost loop, it is invoked once. The following table illustrates the difference between normal pipelined execution of the i loop and interleaved execution for this example, where N=5:
Cycle | Pipelined | Interleaved |
---|---|---|
0 | (0,0) | (0,0) |
1 | --- | (1,0) |
2 | (0,1) | (0,1) |
3 | --- | (1,1) |
4 | (0,2) | (0,2) |
5 | --- | (1,2) |
6 | (0,3) | (0,3) |
7 | --- | (1,3) |
8 | (0,4) | (0,4) |
9 | --- | (1,4) |
10 | (1,0) | (2,0) |
11 | --- | (3,0) |
12 | (1,1) | (2,1) |
13 | --- | (3,1) |
14 | (1,2) | (2,2) |
15 | --- | (3,2) |
16 | (1,3) | (2,3) |
17 | --- | (3,3) |
18 | (1,4) | (2,4) |
19 | --- | (3,4) |
The table shows the values (j,i) for each inner loop iteration that is initiated at each cycle. At cycle 0, both modes of execution initiate the (0,0)th iteration of the i loop. Under normal pipelined execution, no i loop iteration is initiated at cycle 1. Under interleaved execution, the (1,0)th iteration of the innermost loop, that is, the first iteration of the next (j=1) invocation of the i loop, is initiated at cycle 1. By cycle 10, interleaved execution has initiated all of the iterations of both the j=0 invocation and the j=1 invocation of the i loop, whereas normal pipelined execution has initiated only the j=0 invocation. This represents twice the efficiency of normal pipelined execution.
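The schedule in the table can be reproduced with a small software model. The following is a minimal sketch, not compiler output: the names model_schedule, start_cycle, Initiation, and degree are hypothetical, iterations are numbered from 0 as in the table, and the model assumes that interleaved invocations are offset from each other by exactly one cycle.

```cpp
#include <cassert>
#include <vector>

// One inner-loop iteration start: outer index j, inner iteration i, and the
// cycle at which the iteration is initiated.
struct Initiation {
  int j;
  int i;
  int cycle;
};

// Idealized model (an assumption, not the compiler's actual scheduler):
// up to `degree` consecutive invocations of the inner loop share the
// pipeline, each offset by one cycle, and every invocation initiates one
// iteration every `ii` cycles.
// degree == 1 models normal pipelining; degree == ii models the interleaved
// column of the table.
std::vector<Initiation> model_schedule(int M, int N, int ii, int degree) {
  assert(degree >= 1 && degree <= ii);  // more than ii slots would collide
  std::vector<Initiation> out;
  for (int j = 0; j < M; ++j) {
    int group = j / degree;  // batch of invocations that run together
    int slot = j % degree;   // cycle offset of this invocation in its batch
    for (int i = 0; i < N; ++i) {
      out.push_back({j, i, group * N * ii + i * ii + slot});
    }
  }
  return out;
}

// Returns the cycle at which iteration (j, i) is initiated, or -1 if the
// schedule does not contain it.
int start_cycle(const std::vector<Initiation>& schedule, int j, int i) {
  for (const Initiation& e : schedule) {
    if (e.j == j && e.i == i) return e.cycle;
  }
  return -1;
}
```

With M=4, N=5, and ii=2, the model places iteration (1,0) at cycle 10 for degree 1 and at cycle 1 for degree 2, which agrees with the Pipelined and Interleaved columns of the table.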
In some cases, you may decide that the performance benefit from interleaving does not justify the area cost of enabling it. In these cases, you can limit the amount of interleaving to reduce FPGA area utilization. To limit the number of interleaved invocations of an inner loop that can execute simultaneously, annotate the inner loop with the max_interleaving pragma. The annotated loop must be contained inside another pipelined loop. The required parameter (n) specifies an upper bound on the degree of interleaving allowed, that is, how many invocations of the containing loop can execute the annotated loop at a given time.
Specify the max_interleaving pragma in one of the following ways:
- #pragma max_interleaving 1
The compiler restricts the annotated (inner) loop to be invoked only once per outer loop iteration. That is, all iterations of the inner loop travel through the pipeline before the next invocation of the inner loop can occur.
- #pragma max_interleaving 0
The compiler allows the pipeline to contain a number of simultaneous invocations of the inner loop equal to the loop initiation interval (II) of the inner loop. For example, an inner loop with an II of 2 can have iterations from two invocations in the pipeline at a time. This is the compiler's default behavior when you do not specify the max_interleaving pragma.
// Loop j is pipelined with ii=1
for (int j = 0; j < M; j++) {
  int a[N];
  // Loop i is pipelined with ii=2
  #pragma max_interleaving 1
  for (int i = 1; i < N; i++) {
    a[i] = foo(i);
  }
  …
}
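As a rough way to weigh the two settings, the idealized schedule in the table suggests the following count of cycles needed to initiate every inner loop iteration. The symbols d (the number of invocations allowed in the pipeline at once) and T are introduced here for illustration only, and the formula assumes that nothing else stalls the pipeline.

```latex
T(d) = \left\lceil \frac{M}{d} \right\rceil \cdot N \cdot \mathit{II},
\qquad
T(1) = 4 \cdot 5 \cdot 2 = 40,
\quad
T(2) = 2 \cdot 5 \cdot 2 = 20
\quad (M = 4,\ N = 5,\ \mathit{II} = 2)
```

Under these assumptions, max_interleaving 1 corresponds to d = 1, and the default setting with an II of 2 corresponds to d = 2, so restricting interleaving in this example trades roughly twice the initiation time for the reduced area.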