13.7. Intel® HLS Compiler Pro Edition Loop Pragmas
Use the Intel® HLS Compiler loop pragmas to control how the compiler pipelines the loops in your component.
Pragma | Description |
---|---|
disable_loop_pipelining | Prevents the compiler from pipelining a loop. |
ii | Forces a loop to have a loop initiation interval (II) of a specified value. |
ivdep | Ignores memory dependencies between iterations of this loop. |
loop_coalesce | Tries to fuse all loops nested within this loop into a single loop. |
loop_fuse | Directs the compiler to try to fuse pairs of adjacent loops. |
max_concurrency | Limits the number of iterations of a loop that can simultaneously execute at any time. |
max_interleaving | Controls whether iterations of a pipelined inner loop in a loop nest from one invocation of the inner loop can be interleaved in the component data pipeline with iterations from other invocations of the inner loop. |
nofusion | Prevents the annotated loop from being fused with adjacent loops. |
speculated_iterations | Specifies the number of clock cycles that a loop exit condition can take to compute. |
unroll | Unrolls the loop completely or by a number of times. |
disable_loop_pipelining Loop Pragma
- Syntax
- #pragma disable_loop_pipelining
- Description
Tells the compiler to not pipeline this loop.
Disable loop pipelining for a loop when the loop-carried dependencies cause the loop iterations to effectively execute sequentially. With loop pipelining disabled, the Intel® HLS Compiler can generate a simpler datapath and reduce the FPGA area utilization of your component.
Example:

    #pragma disable_loop_pipelining
    for (int i = 1; i < N; i++) {
      int j = a[i-1];
      // Memory dependency induces a high-latency loop feedback path
      a[i] = foo(j);
    }
ii Loop Pragma
- Syntax
- #pragma ii N
- Description
Forces the loop to which you apply this pragma to have a loop initiation interval (II) of <N>, where <N> is a positive integer value.
Forcing a loop II value can have an adverse effect on the fMAX of your component because using this pragma to get a lower loop II combines pipeline stages together and creates logic with a long propagation delay.
Using this pragma with a larger loop II inserts more pipeline stages and can give you a better component fMAX value.
Example:

    #pragma ii 2
    for (int i = 0; i < 8; i++) {
      // Loop body
    }
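As a rough way to reason about the throughput side of this trade-off, the sketch below applies the common pipelining approximation that a loop needs about latency + II × (iterations − 1) clock cycles. The helper name `estimate_loop_cycles` and the latency values passed to it are hypothetical and are not part of the Intel® HLS Compiler; real figures come from the high-level design report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Intel HLS Compiler API): estimates the clock
// cycles a pipelined loop needs, assuming a new iteration launches every
// `ii` cycles and each iteration takes `latency` cycles to complete.
std::uint64_t estimate_loop_cycles(std::uint64_t iterations,
                                   std::uint64_t ii,
                                   std::uint64_t latency) {
    assert(ii >= 1 && "II is a positive integer");
    if (iterations == 0) {
        return 0;  // an empty loop launches nothing
    }
    // The first iteration pays the full latency; each later iteration
    // adds one initiation interval.
    return latency + ii * (iterations - 1);
}
```

Under this model, an 8-iteration loop with an assumed latency of 5 cycles takes 5 + 2 × 7 = 19 cycles at an II of 2 and 12 cycles at an II of 1. Whether a higher fMAX at the larger II makes up for the extra cycles depends on the design.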
ivdep Loop Pragma
- Syntax
- #pragma ivdep safelen(N) array(array_name)
- Description
Tells the compiler to ignore memory dependencies between iterations of this loop.
It accepts an optional array argument that specifies the name of the array whose dependencies are ignored. If array is not specified, all component memory dependencies are ignored. If the ignored dependencies are real loop-carried dependencies, your generated RTL produces incorrect results.
The safelen parameter specifies the dependency distance. The dependency distance is the number of iterations between successive loads and stores that depend on each other. It is safe to omit safelen only when the dependence distance is infinite (that is, there are no real dependencies).
Example:

    #pragma ivdep safelen(2)
    for (int i = 0; i < 8; i++) {
      // Loop body
    }
To learn more, review the tutorial: <quartus_installdir>/hls/examples/tutorials/best_practices/loop_memory_dependency.
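To make the dependency distance concrete, the following sketch is a plain software model, not a description of how the compiler schedules hardware. It assumes a loop body of the form `a[i] = a[i - dist] + 1`, so each iteration depends on the store made `dist` iterations earlier. The functions `run_sequential` and `run_overlapped` are hypothetical names introduced for illustration; `run_overlapped` lets `width` consecutive iterations load before any of them stores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference behavior: iterations run strictly one after another.
std::vector<int> run_sequential(std::vector<int> a, std::size_t dist) {
    assert(dist >= 1);
    for (std::size_t i = dist; i < a.size(); ++i) {
        a[i] = a[i - dist] + 1;  // reads the value stored `dist` iterations ago
    }
    return a;
}

// Simplified overlap model: `width` consecutive iterations perform all of
// their loads before any of them performs its store.
std::vector<int> run_overlapped(std::vector<int> a, std::size_t dist,
                                std::size_t width) {
    assert(dist >= 1 && width >= 1);
    for (std::size_t base = dist; base < a.size(); base += width) {
        std::vector<int> loaded;
        for (std::size_t i = base; i < base + width && i < a.size(); ++i) {
            loaded.push_back(a[i - dist]);  // load phase
        }
        for (std::size_t k = 0; k < loaded.size(); ++k) {
            a[base + k] = loaded[k] + 1;    // store phase
        }
    }
    return a;
}
```

In this model, overlapping 2 iterations of a loop whose dependency distance is 2 gives the same result as sequential execution, while overlapping 4 iterations reads stale values. This mirrors why a safelen value larger than the real dependency distance can lead to incorrect results.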
loop_coalesce Loop Pragma
- Syntax
- #pragma loop_coalesce N
- Description
Tells the compiler to try to fuse all loops nested within this loop into a single loop. This pragma accepts an optional value N which indicates the number of levels of loops to coalesce together.
Example:

    #pragma loop_coalesce 2
    for (int i = 0; i < 8; i++) {
      for (int j = 0; j < 8; j++) {
        // Loop body
      }
    }
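The effect of coalescing a two-level nest can be pictured with a hand-written equivalent. The sketch below is only an illustration of the idea in portable C++; the function names are hypothetical and the single-loop form is not necessarily what the compiler generates.

```cpp
#include <cassert>
#include <vector>

// Original form: a two-level loop nest that records each (i, j) pair.
std::vector<int> nested_loops(int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    std::vector<int> visited;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            visited.push_back(i * 100 + j);  // stand-in for the loop body
        }
    }
    return visited;
}

// Hand-coalesced form: one loop over rows * cols iterations that maintains
// the original i and j indices itself.
std::vector<int> coalesced_loop(int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    std::vector<int> visited;
    int i = 0;
    int j = 0;
    for (int k = 0; k < rows * cols; k++) {
        visited.push_back(i * 100 + j);  // same body, same iteration order
        if (++j == cols) {               // wrap the inner index
            j = 0;
            ++i;
        }
    }
    return visited;
}
```

Both functions visit the same index pairs in the same order; the coalesced version has a single loop to control instead of two.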
loop_fuse Block-Scope Loop Pragma
- Syntax
- #pragma loop_fuse [depth(N)] [independent]
- Description
Apply this pragma to a block of code to indicate to the compiler that adjacent loops in the code block should be fused when safe, overriding the compiler profitability analysis of the fusion.
The depth(N) clause sets the number of nesting depths the compiler should consider when fusing adjacent loops. Specifying depth(1) is equivalent to indicating that only adjacent top-level loops should be considered for fusing.
The independent clause overrides the safety checks. If you specify the independent option, you are guaranteeing to the compiler that fusing pairs of loops affected by the loop_fuse pragma is safe. If it is not safe, you might get functional errors in your component.
For details of the safety checks, see the Fusion Criteria section of Loop Fusion.
Example:

    #pragma loop_fuse
    {
      for (int j = 0; j < N; ++j) {
        data[j] += Q;
      }
      for (int i = 0; i < N; ++i) {
        output[i] = Q * data[i];
      }
    }
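The following sketch shows, in portable C++, what a fused version of two adjacent loops looks like and why fusion is not always safe. The functions and the `offset` parameter are hypothetical, introduced only for illustration: with `offset` 0 the second loop reads the element the first loop just wrote, and with `offset` 1 it reads the next element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two adjacent loops: the first finishes completely before the second starts.
std::vector<int> separate_loops(std::vector<int> data, int q,
                                std::size_t offset) {
    assert(offset <= data.size());
    std::vector<int> output(data.size(), 0);
    for (std::size_t j = 0; j < data.size(); ++j) {
        data[j] += q;
    }
    for (std::size_t i = 0; i + offset < data.size(); ++i) {
        output[i] = q * data[i + offset];
    }
    return output;
}

// Hand-fused form: both loop bodies run in the same iteration.
std::vector<int> fused_loop(std::vector<int> data, int q, std::size_t offset) {
    assert(offset <= data.size());
    std::vector<int> output(data.size(), 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] += q;
        if (i + offset < data.size()) {
            // With offset > 0, data[i + offset] has not been updated yet.
            output[i] = q * data[i + offset];
        }
    }
    return output;
}
```

With `offset` 0 the two forms agree, which corresponds to a fusion that is safe. With `offset` 1 the fused form reads values that the first loop body has not yet updated, which is the kind of functional error that can occur if the independent clause is used on loops that are not actually safe to fuse.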
max_concurrency Loop Pragma
- Syntax
- #pragma max_concurrency N
- Description
This pragma limits the number of iterations of a loop that can simultaneously execute at any time.
This pragma is useful mainly when private copies of a component memory are created to improve the throughput of the loop. The creation of private copies is reported in the details pane for the loop in the Loop Analysis pane and in the Bank view of the Function Memory Viewer of the high-level design report (report.html).
Private copies can be created only when the scope of a component memory (through its declaration or access pattern) is limited to this loop. Adding this pragma can reduce the area that the loop consumes at the cost of some throughput.
Example:

    // Without this pragma, multiple private copies
    // of the array "arr" are created
    #pragma max_concurrency 1
    for (int i = 0; i < 8; i++) {
      int arr[1024];
      // Loop body
    }
max_interleaving Loop Pragma
- Syntax
- #pragma max_interleaving <option>
- Description
This pragma controls whether iterations of a pipelined inner loop in a loop nest from one invocation of the inner loop can be interleaved in the component data pipeline with iterations from other invocations of the inner loop.
By default, the Intel® HLS Compiler tries to interleave a number of simultaneous invocations of the inner loop equal to the loop initiation interval (II) of the inner loop. For example, an inner loop with an II of 2 can have iterations from two invocations in the pipeline at a time.
In cases where the interleaving of loop iterations from different loop invocations does not yield a performance benefit, limiting or restricting the amount of interleaving can result in reduced FPGA area utilization.
Supported values for <option>:
- 1
The compiler restricts the annotated (inner) loop to be invoked only once per outer loop iteration. That is, all iterations of the inner loop travel the pipeline before the next invocation of the inner loop can occur.
- 0
Use the default interleaving behavior.
Example:

    // Loop j is pipelined with ii=1
    for (int j = 0; j < M; j++) {
      int a[N];
      // Loop i is pipelined with ii=2
      #pragma max_interleaving 1
      for (int i = 1; i < N; i++) {
        a[i] = foo(i);
      }
      …
    }
nofusion Loop Pragma
- Syntax
- #pragma nofusion
- Description
This pragma directs the compiler to not fuse the annotated loop with any adjacent loops.
Example:

    #pragma nofusion
    L1: for (int j = 0; j < N; ++j) {
      data[j] += Q;
    }
    L2: for (int i = 0; i < N; ++i) {
      output[i] = Q * data[i];
    }
speculated_iterations Loop Pragma
- Syntax
- #pragma speculated_iterations N
- Description
This pragma specifies the number of loop iterations to wait before considering a loop exit condition. That is, you estimate that a loop takes at least N loop iterations before the exit condition is met.
If you specify a value that is too low, then the loop II increases to accommodate the iterations required to determine whether the loop exit condition is met.
Example:

    component int loop_speculate (int N) {
      int m = 0;
      // The exit condition needs 2 multiplies and a compare,
      // which makes it the most critical part of the loop feedback path
      #pragma speculated_iterations 2
      while (m*m*m < N) {
        m += 1;
      }
      return m;
    }
unroll Loop Pragma
- Syntax
- #pragma unroll N
- Description
This pragma unrolls the loop completely or by <N> times, where <N> is optional and is a positive integer value.
Important: Unrolling nested loops with large bounds might generate a large number of instructions that could result in very long compile times for your component.
Example:

    #pragma unroll 8
    for (int i = 0; i < 8; i++) {
      // Loop body
    }
To learn more, review the tutorial: <quartus_installdir>/hls/examples/best_practices/resource_sharing_filter.
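To picture what unrolling by a factor means, the sketch below writes out by hand a loop unrolled 4 times and compares it with the rolled form. This is an illustration in portable C++ with hypothetical function names; it assumes a trip count that is a multiple of 4, and it does not claim to match the code the compiler generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rolled form: one addition per loop iteration.
int sum_rolled(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); i++) {
        total += v[i];
    }
    return total;
}

// Hand-unrolled by a factor of 4: each iteration of the new loop contains
// four copies of the original body, so the loop runs a quarter as many times.
int sum_unrolled_by_4(const std::vector<int>& v) {
    assert(v.size() % 4 == 0 && "sketch assumes a trip count divisible by 4");
    int total = 0;
    for (std::size_t i = 0; i < v.size(); i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    return total;
}
```

Each copy of the body becomes its own logic when the loop is implemented in hardware, which is why larger unroll factors trade FPGA area and compile time for fewer loop iterations.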