Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 7/13/2023
Public

max_interleaving Attribute

Use the max_interleaving attribute to control loop interleaving in a loop nest. Interleaving maximizes the throughput and hardware resource occupancy of pipelined inner loops by issuing new inner loop iterations as frequently as possible. When the compiler cannot achieve a loop initiation interval (II) of 1 for an inner loop, it configures the loop nest to interleave iterations of one invocation of the inner loop with iterations of other invocations of the inner loop.

Syntax

[[intel::max_interleaving(n)]]

The required parameter n is an upper bound on the number of invocations of the annotated (inner) loop that the Intel® oneAPI DPC++/C++ Compiler can interleave in the pipeline at one time. When you specify n=0, the compiler allows the pipeline to contain a number of simultaneous invocations of the annotated loop equal to the loop initiation interval (II) of that loop. For example, an annotated inner loop with an II of 2 can have iterations from two invocations in the pipeline at a time. This is the default behavior if you do not specify the max_interleaving attribute.

As an example, consider the loop nest in the following code snippet:

int a[N];
// ...
// Loop j is pipelined with ii=1
for (int j = 0; j < M; j++) {
  // Loop i is pipelined with ii=2
  for (int i = 0; i < N; i++) {
    a[i] += foo(i);
  }
}

In this example, the inner i loop is pipelined with a loop II of 2. Under normal pipelining, the inner loop hardware achieves only 50% utilization because one i iteration is initiated every other cycle. To take advantage of the idle cycles, the compiler interleaves a second invocation of the i loop from the next iteration of the outer j loop. Here, a loop invocation means one start of pipelined execution of a loop. Because the i loop resides inside the j loop, and the j loop has a trip count of M, the i loop is invoked M times. Because the j loop is an outermost loop, it is invoked once. The following table illustrates the difference between normal pipelined execution of the i loop and interleaved execution for this example where N=5 (the table shows the first 20 cycles, so it assumes that M is at least 4):

Difference Between Normal Pipelined Execution and Interleaved Execution

Cycle   Pipelined   Interleaved
0       (0,0)       (0,0)
1       ---         (1,0)
2       (0,1)       (0,1)
3       ---         (1,1)
4       (0,2)       (0,2)
5       ---         (1,2)
6       (0,3)       (0,3)
7       ---         (1,3)
8       (0,4)       (0,4)
9       ---         (1,4)
10      (1,0)       (2,0)
11      ---         (3,0)
12      (1,1)       (2,1)
13      ---         (3,1)
14      (1,2)       (2,2)
15      ---         (3,2)
16      (1,3)       (2,3)
17      ---         (3,3)
18      (1,4)       (2,4)
19      ---         (3,4)

The table shows the values (j,i) of the inner loop iteration that is initiated at each cycle; --- marks a cycle in which no iteration is initiated. At cycle 0, both modes of execution initiate the (0,0)th iteration of the i loop. Under normal pipelined execution, no i loop iteration is initiated at cycle 1. Under interleaved execution, the (1,0)th iteration of the innermost loop, that is, the first iteration of the next (j=1) invocation of the i loop, is initiated at cycle 1. By cycle 10, interleaved execution has initiated all of the iterations of both the j=0 and the j=1 invocations of the i loop, whereas normal pipelined execution has initiated only the iterations of the j=0 invocation. This represents twice the throughput of normal pipelined execution.
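The schedule in the table can be reproduced with a small host-side model. The following is only an idealized sketch for building intuition, not a description of the compiler's actual scheduler: it ignores pipeline latency and stalls, and the names issue_schedule, format_issue, and Issue are hypothetical. It models only the two documented settings, max_interleaving(1) (no interleaving) and max_interleaving(0) (up to II invocations in flight).

```cpp
#include <cassert>
#include <string>
#include <vector>

// One issue slot: outer index j and inner index i of the iteration that is
// initiated in a cycle. j == -1 marks an idle cycle (shown as --- in the table).
struct Issue {
  int j;
  int i;
};

// Idealized issue schedule (assumption: no stalls, latency ignored) for an
// inner loop with initiation interval `ii` and trip count `n_inner` that is
// invoked `m_outer` times. `max_interleaving` must be 0 or 1, the two
// settings described in this section.
std::vector<Issue> issue_schedule(int m_outer, int n_inner, int ii,
                                  int max_interleaving) {
  assert(m_outer >= 0 && n_inner >= 0 && ii >= 1);
  assert(max_interleaving == 0 || max_interleaving == 1);
  // Number of invocations allowed in the pipeline at one time.
  const int degree = (max_interleaving == 0) ? ii : 1;

  std::vector<Issue> schedule;  // index in this vector == cycle number
  // Handle the outer iterations in groups of `degree` interleaved invocations.
  for (int base = 0; base < m_outer; base += degree) {
    for (int i = 0; i < n_inner; ++i) {
      // Each inner index i spans `ii` cycles. Slot s is used by invocation
      // base + s when interleaving allows it; otherwise the cycle is idle.
      for (int slot = 0; slot < ii; ++slot) {
        const int j = base + slot;
        const bool active = slot < degree && j < m_outer;
        schedule.push_back(active ? Issue{j, i} : Issue{-1, -1});
      }
    }
  }
  return schedule;
}

// Formats a slot the way the table does: "(j,i)" or "---".
std::string format_issue(const Issue& x) {
  if (x.j < 0) return "---";
  return "(" + std::to_string(x.j) + "," + std::to_string(x.i) + ")";
}
```

Calling issue_schedule(4, 5, 2, 1) and issue_schedule(4, 5, 2, 0) and formatting the first 20 entries of each gives the Pipelined and Interleaved columns of the table, respectively.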

In some cases, you may decide that the performance benefit from interleaving does not justify the area cost of enabling it. In these cases, you can limit the amount of interleaving to reduce FPGA area utilization. To limit the number of interleaved invocations of an inner loop that can execute simultaneously, annotate the inner loop with the [[intel::max_interleaving(n)]] attribute. The annotated loop must be contained inside another pipelined loop. The required parameter (n) specifies an upper bound on the degree of interleaving allowed, that is, how many invocations of the annotated loop (one per iteration of the containing loop) can be in the pipeline at a given time.

Specify the [[intel::max_interleaving(n)]] attribute in one of the following ways:

  • [[intel::max_interleaving(1)]]

    The compiler does not interleave invocations of the annotated (inner) loop, so only one invocation can be in the pipeline at a time. That is, all iterations of one invocation of the inner loop travel the pipeline before the next invocation of the inner loop can begin.

  • [[intel::max_interleaving(0)]]

    The compiler allows the pipeline to contain a number of simultaneous invocations of the inner loop equal to the loop initiation interval (II) of the inner loop. For example, an inner loop with an II of 2 can have iterations from two invocations in the pipeline at a time. This is the default behavior if you do not specify the [[intel::max_interleaving(n)]] attribute.
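To weigh the two settings against each other, a rough estimate of the issue time can help. The following is a back-of-the-envelope model, not a formula from this guide: it assumes no stalls, ignores pipeline latency, and uses d (a symbol introduced here) for the number of invocations allowed in the pipeline at one time. M is the outer loop trip count, N is the inner loop trip count, and II is the inner loop initiation interval.

```latex
C_{\text{issue}} \approx \left\lceil \frac{M}{d} \right\rceil \cdot N \cdot \mathit{II},
\qquad
d =
\begin{cases}
1 & \text{with max\_interleaving(1)} \\
\mathit{II} & \text{with max\_interleaving(0)}
\end{cases}
```

For M=4, N=5, and II=2, the estimate is 4 x 5 x 2 = 40 cycles with max_interleaving(1) and 2 x 5 x 2 = 20 cycles with max_interleaving(0). Under these assumptions, interleaving halves the issue time, and that gain is what you trade for the additional area.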

Example

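In the following code snippet, the [[intel::max_interleaving(1)]] attribute restricts the pipelined execution of the i loop so that its invocations are not interleaved. A new invocation of the i loop, which corresponds to the subsequent iteration of the j loop, begins only after all iterations of the current invocation have traveled the pipeline.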

// Loop j is pipelined with ii=1
for (int j = 0; j < M; j++) {
  int a[N];
  // Loop i is pipelined with ii=2
  [[intel::max_interleaving(1)]]
  for (int i = 1; i < N; i++) {
    a[i] = foo(i);
  }
  …
}

NOTE: For additional information, refer to the max_interleaving FPGA tutorial sample listed in the Intel® oneAPI Samples Browser on Linux* or Windows*, or access the code sample on GitHub.
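When experimenting with the loop nest from the Example section, it can be convenient to check its functional behavior on an ordinary host compiler before building for an FPGA. The following is a minimal sketch under several assumptions: the macro DEMO_MAX_INTERLEAVING, the function row_sums, the constants kRows and kCols, and the body of foo are all hypothetical and do not come from this guide. The macro expands to the attribute only when SYCL_LANGUAGE_VERSION is defined, on the assumption that the build is then a SYCL build that understands the attribute. The attribute affects generated hardware only when the loop is part of FPGA device code, so this host function demonstrates placement and results, not interleaving.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: apply the attribute only in SYCL builds; other compilers see
// an empty macro and compile the loop nest as plain C++.
#if defined(SYCL_LANGUAGE_VERSION)
#define DEMO_MAX_INTERLEAVING(n) [[intel::max_interleaving(n)]]
#else
#define DEMO_MAX_INTERLEAVING(n)
#endif

constexpr std::size_t kRows = 4;  // hypothetical value for M
constexpr std::size_t kCols = 5;  // hypothetical value for N

// Hypothetical stand-in for foo(i); the guide does not define it.
inline int foo(std::size_t i) { return static_cast<int>(i * i); }

// Runs the loop nest from the Example section and returns, for each outer
// iteration j, the sum of the private array a.
std::array<int, kRows> row_sums() {
  std::array<int, kRows> sums{};
  for (std::size_t j = 0; j < kRows; ++j) {
    int a[kCols] = {};  // private to each j iteration, as in the Example
    DEMO_MAX_INTERLEAVING(1)
    for (std::size_t i = 1; i < kCols; ++i) {
      a[i] = foo(i);
    }
    assert(a[0] == 0);  // element 0 is never written because i starts at 1
    int sum = 0;
    for (std::size_t i = 0; i < kCols; ++i) {
      sum += a[i];
    }
    sums[j] = sum;
  }
  return sums;
}
```

Because the attribute changes only how invocations are scheduled, the values that row_sums returns are the same with or without it; only throughput and area in the generated hardware would differ.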