Intel® High Level Synthesis Compiler Pro Edition: Reference Manual

ID 683349
Date 4/01/2024
Public

13.7. Intel® HLS Compiler Pro Edition Loop Pragmas

Use the Intel® HLS Compiler loop pragmas to control how the compiler pipelines the loops in your component.

Table 43.   Intel® HLS Compiler Pro Edition Loop Pragmas Summary
| Pragma | Description |
|---|---|
| disable_loop_pipelining | Prevents the compiler from pipelining a loop. |
| ii | Forces a loop to have a loop initiation interval (II) of a specified value. |
| ivdep | Ignores memory dependencies between iterations of this loop. |
| loop_coalesce | Tries to fuse all loops nested within this loop into a single loop. |
| loop_fuse | Directs the compiler to try to fuse pairs of adjacent loops. |
| max_concurrency | Limits the number of iterations of a loop that can simultaneously execute at any time. |
| max_interleaving | Controls whether iterations of a pipelined inner loop in a loop nest from one invocation of the inner loop can be interleaved in the component data pipeline with iterations from other invocations of the inner loop. |
| nofusion | Prevents the annotated loop from being fused with adjacent loops. |
| speculated_iterations | Specifies the number of clock cycles that a loop exit condition can take to compute. |
| unroll | Unrolls the loop completely or by a number of times. |

disable_loop_pipelining Loop Pragma

Syntax
#pragma disable_loop_pipelining
Description
Tells the compiler to not pipeline this loop.

Disable loop pipelining for a loop when the loop-carried dependencies cause the loop iterations to effectively execute sequentially. With loop pipelining disabled, the Intel® HLS Compiler can generate a simpler datapath and reduce the FPGA area utilization of your component.

Example:
#pragma disable_loop_pipelining
for (int i = 1; i < N; i++) {
    int j = a[i-1];
    // Memory dependency induces a high-latency loop feedback path
    a[i] = foo(j);
}

ii Loop Pragma

Syntax
#pragma ii N
Description
Forces the loop to which you apply this pragma to have a loop initiation interval (II) of <N>, where <N> is a positive integer value.

Forcing a loop II value can have an adverse effect on the fMAX of your component because using this pragma to get a lower loop II combines pipeline stages together and creates logic with a long propagation delay.

Using this pragma with a larger loop II inserts more pipeline stages and can give you a better component fMAX value.

Example:
#pragma ii 2
for (int i = 0; i < 8; i++) {
 // Loop body
}

ivdep Loop Pragma

Syntax
#pragma ivdep safelen(N) array(array_name)
Description
Tells the compiler to ignore memory dependencies between iterations of this loop.

The pragma accepts an optional array argument that specifies the name of the array to which it applies. If you do not specify array, all component memory dependencies are ignored. If the loop has real loop-carried dependencies that you tell the compiler to ignore, your generated RTL produces incorrect results.

The safelen parameter specifies the dependency distance. The dependency distance is the number of iterations between successive loads and stores that depend on each other. It is safe to omit safelen only when the dependency distance is infinite (that is, there are no real dependencies).

Example:
#pragma ivdep safelen(2)
for (int i = 0; i < 8; i++) {
 // Loop body
}
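
The following sketch illustrates a loop with a real loop-carried dependency at a known distance. The function chained_sum, the array buf, and the constants are hypothetical names chosen for illustration only. Each iteration reads the element written four iterations earlier, so the dependency distance is assumed to be 4 and safelen(4) is specified. A standard C++ compiler that does not recognize the pragma ignores it, so you can check the function behavior in software before synthesis.

```cpp
// Illustrative sketch only: chained_sum, buf, kSize, and kDistance are
// hypothetical names, not part of the Intel HLS Compiler API.
constexpr int kSize = 16;
constexpr int kDistance = 4;

int chained_sum(int seed) {
  int buf[kSize];

  // Seed the first kDistance elements so that later iterations have inputs.
  for (int i = 0; i < kDistance; i++) {
    buf[i] = seed + i;
  }

  // Iteration i reads buf[i - 4], which iteration i - 4 wrote. The dependency
  // distance is therefore 4, so safelen(4) is used rather than a bare ivdep.
  // Assumption: the array() clause accepts the name of this local array.
  #pragma ivdep safelen(4) array(buf)
  for (int i = kDistance; i < kSize; i++) {
    buf[i] = buf[i - kDistance] + 1;
  }

  return buf[kSize - 1];
}
```

Omitting safelen here would tell the compiler that no dependencies exist at all, which is not true for this loop.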

To learn more, review the tutorial: <quartus_installdir>/hls/examples/tutorials/best_practices/loop_memory_dependency.

loop_coalesce Loop Pragma

Syntax
#pragma loop_coalesce N
Description
Tells the compiler to try to fuse all loops nested within this loop into a single loop. This pragma accepts an optional value N that indicates the number of levels of loops to coalesce together.

Example:
#pragma loop_coalesce 2
for (int i = 0; i < 8; i++) {
 for (int j = 0; j < 8; j++) {
 // Loop body 
 } 
}
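
As a conceptual aid, the following sketch compares a two-level loop nest annotated with loop_coalesce against a hand-written single loop that visits the same iterations in the same order. The function names and the loop body are hypothetical, and the single loop is only a software model of the idea of coalescing. It does not describe the hardware that the compiler actually generates.

```cpp
// Illustrative sketch only: all names here are hypothetical.
constexpr int kRows = 8;
constexpr int kCols = 8;

// Nested form, annotated to coalesce both loop levels.
int sum_nested(int scale) {
  int total = 0;
  #pragma loop_coalesce 2
  for (int i = 0; i < kRows; i++) {
    for (int j = 0; j < kCols; j++) {
      total += scale * (i * kCols + j);
    }
  }
  return total;
}

// Hand-written single loop with the same iteration order. The original
// indices are recovered from the flat counter k.
int sum_flat(int scale) {
  int total = 0;
  for (int k = 0; k < kRows * kCols; k++) {
    const int i = k / kCols;
    const int j = k % kCols;
    total += scale * (i * kCols + j);
  }
  return total;
}
```

Both functions return the same value for any scale, which is the property that coalescing must preserve.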

loop_fuse Block-Scope Loop Pragma

Syntax
#pragma loop_fuse [depth(N)] [independent]
Description
Apply this pragma to a block of code to indicate to the compiler that adjacent loops in the code block should be fused when safe, overriding the compiler profitability analysis of the fusion.

The depth(N) clause sets the number of nesting depths the compiler should consider when fusing adjacent loops. Specifying depth(1) is equivalent to indicating that only adjacent top-level loops should be considered for fusing.

The independent clause overrides the safety checks. If you specify the independent option, you are guaranteeing to the compiler that fusing pairs of loops affected by the loop_fuse pragma is safe. If it is not safe, you might get functional errors in your component.

For details of the safety checks, see the Fusion Criteria section of Loop Fusion.

Example:

#pragma loop_fuse
{
 for (int j=0; j < N; ++j){
   data[j] += Q;
 }
 for (int i = 0; i < N; ++i){
   output[i] = Q * data[i];
 }
}
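
The following sketch shows, under stated assumptions, why fusing two loops like these preserves results. The function names, the array size, and the checksum are hypothetical. The second function fuses the loops by hand as a software model. It is not a description of the compiler safety checks or of the generated hardware.

```cpp
// Illustrative sketch only: all names here are hypothetical.
constexpr int kN = 16;

// Two adjacent loops inside a loop_fuse block.
int checksum_with_pragma(int q) {
  int data[kN];
  int output[kN];
  for (int k = 0; k < kN; ++k) {
    data[k] = k;
  }

  #pragma loop_fuse
  {
    for (int j = 0; j < kN; ++j) {
      data[j] += q;
    }
    for (int i = 0; i < kN; ++i) {
      output[i] = q * data[i];
    }
  }

  int sum = 0;
  for (int k = 0; k < kN; ++k) {
    sum += output[k];
  }
  return sum;
}

// The same work fused by hand. Iteration i of the second loop reads only
// data[i], which iteration i of the first loop has already updated.
int checksum_fused_by_hand(int q) {
  int data[kN];
  int output[kN];
  int sum = 0;
  for (int i = 0; i < kN; ++i) {
    data[i] = i;
    data[i] += q;
    output[i] = q * data[i];
    sum += output[i];
  }
  return sum;
}
```

If the second loop read data[i + 1] instead, the hand-fused version would read a value that has not been updated yet, which is the kind of case where specifying independent would be unsafe.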

max_concurrency Loop Pragma

Syntax
#pragma max_concurrency N
Description
This pragma limits the number of iterations of a loop that can simultaneously execute at any time.

This pragma is useful mainly when private copies of a component memory are created to improve the throughput of the loop. The high-level design report (report.html) notes the private copies in the details pane for the loop in the Loop Analysis pane and in the Bank view of the Function Memory Viewer.

Private copies can be created only when the scope of a component memory (through its declaration or access pattern) is limited to this loop. You can add this pragma to reduce the area that the loop consumes at the cost of some throughput.

Example:
// Without this pragma,
// multiple private copies
// of the array "arr" might be created
#pragma max_concurrency 1
for (int i = 0; i < 8; i++) {
 int arr[1024];
 // Loop body
}

max_interleaving Loop Pragma

Syntax
#pragma max_interleaving <option>
Description
This pragma controls whether iterations of a pipelined inner loop in a loop nest from one invocation of the inner loop can be interleaved in the component data pipeline with iterations from other invocations of the inner loop.

By default, the Intel® HLS Compiler tries to interleave a number of simultaneous invocations of the inner loop equal to the loop initiation interval (II) of the inner loop. For example, an inner loop with an II of 2 can have iterations from two invocations in the pipeline at a time.

In cases where the interleaving of loop iterations from different loop invocations does not yield a performance benefit, limiting or restricting the amount of interleaving can result in reduced FPGA area utilization.

Supported values for <option>:
  • 1

    The compiler restricts the annotated (inner) loop to be invoked only once per outer loop iteration. That is, all iterations of the inner loop travel the pipeline before the next invocation of the inner loop can occur.

  • 0

    Use the default interleaving behavior.

Example:
// Loop j is pipelined with ii=1
for (int j = 0; j < M; j++) {
  int a[N];
  // Loop i is pipelined with ii=2 
  #pragma max_interleaving 1
  for (int i = 1; i < N; i++) {
      a[i] = foo(i);
  }
  …
}

nofusion Loop Pragma

Syntax
#pragma nofusion
Description
This pragma directs the compiler to not fuse the annotated loop with any adjacent loops.
Example:
#pragma nofusion
L1: for (int j=0; j < N; ++j){
 data[j] += Q;
}
L2: for (int i = 0; i < N; ++i) {
 output[i] = Q * data[i];
}

speculated_iterations Loop Pragma

Syntax
#pragma speculated_iterations N
Description
This pragma specifies the number of loop iterations to wait before considering a loop exit condition. That is, you estimate that a loop takes at least N loop iterations before the exit condition is met.

If you specify a value that is too low, then the loop II increases to accommodate the iterations required to determine whether the loop exit condition is met.

Example:

component int loop_speculate (int N) {
    int m = 0;
    // The exit condition needs two multiplies and a compare,
    // which makes it the most critical part of the loop feedback path
    #pragma speculated_iterations 2
    while (m*m*m < N) {
      m += 1;
    }
    return m;
  }

unroll Loop Pragma

Syntax
#pragma unroll N
Description
This pragma unrolls the loop completely or by <N> times, where <N> is optional and is a positive integer value.
Important: Unrolling nested loops with large bounds might generate a large number of instructions that could result in very long compile times for your component.
Example:
#pragma unroll 8
for (int i = 0; i < 8; i++) {
 // Loop body
}
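
To illustrate partial unrolling, the following sketch compares a loop annotated with an unroll factor of 4 against a hand-unrolled loop that performs four copies of the body per iteration. The function names and the loop body are hypothetical. The hand-unrolled form is a software model of the transformation and assumes that the trip count is a multiple of the unroll factor.

```cpp
// Illustrative sketch only: all names here are hypothetical.
constexpr int kLen = 8;

// Loop annotated for partial unrolling by a factor of 4.
int scaled_index_sum(int scale) {
  int total = 0;
  #pragma unroll 4
  for (int i = 0; i < kLen; i++) {
    total += scale * i;
  }
  return total;
}

// Hand-unrolled model: two iterations, each with four copies of the body.
// Valid as written only because kLen is a multiple of 4.
int scaled_index_sum_unrolled(int scale) {
  int total = 0;
  for (int i = 0; i < kLen; i += 4) {
    total += scale * (i + 0);
    total += scale * (i + 1);
    total += scale * (i + 2);
    total += scale * (i + 3);
  }
  return total;
}
```

Specifying the pragma without a factor corresponds to replicating the body for every iteration, which is why large loop bounds can lead to long compile times.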

To learn more, review the tutorial: <quartus_installdir>/hls/examples/tutorials/best_practices/resource_sharing_filter.