Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/19/2022
Public

5.2.4. Fusing Adjacent Loops (loop_fuse Pragma)

Use the loop_fuse pragma to direct the Intel® FPGA SDK for OpenCL™ Offline Compiler to fuse adjacent loops into a single loop without affecting either loop's functionality. The loop_fuse construct defines a region of code where the compiler always attempts to fuse adjacent loops when it is safe to do so.

Fusing adjacent loops can reduce your kernel's area use by reducing the overhead required for loop control, and it can increase kernel performance by executing both original loops concurrently as one (fused) loop.
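As a rough illustration of the transformation, the following plain C sketch shows two adjacent loops next to a hand-fused equivalent. The function names and the array length N are hypothetical, and the fused form is written by hand to show the effect; it is not output from the offline compiler.

```c
#include <assert.h>
#include <stddef.h>

#define N 8  /* hypothetical trip count, chosen for illustration */

/* Two adjacent loops with the same trip count and no dependence between
   them: the kind of pattern that is a candidate for fusion. */
void separate_loops(const int *in, int *doubled, int *squared) {
  assert(in != NULL && doubled != NULL && squared != NULL);
  for (size_t i = 0; i < N; i++) {
    doubled[i] = 2 * in[i];
  }
  for (size_t i = 0; i < N; i++) {
    squared[i] = in[i] * in[i];
  }
}

/* Hand-written equivalent of the fused loop: a single loop-control
   structure, with both bodies executed in the same iteration. */
void fused_loop(const int *in, int *doubled, int *squared) {
  assert(in != NULL && doubled != NULL && squared != NULL);
  for (size_t i = 0; i < N; i++) {
    doubled[i] = 2 * in[i];
    squared[i] = in[i] * in[i];
  }
}
```

Both functions fill doubled and squared with the same values; the fused version simply needs one set of loop control instead of two.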

To mark a block of program code within which the compiler attempts to fuse loops, specify the pragma as follows:

#pragma loop_fuse [clause[[,]clause]...] new-line
    structured_block

where clause is one of the following:

depth(constant-integer-expression)
If a depth clause is present, its constant-integer-expression parameter defines the number of nesting depths at which the fusion of adjacent loops is attempted. The depth clause extends the loop_fuse construct to all loops nested in the construct's top-level loops at a nesting depth less than or equal to the clause parameter, including loops that become adjacent because their containing loops are fused. Without a depth clause, the compiler attempts to fuse only the loops at the top level of the loop_fuse construct (that is, loops not contained in other loops defined within the construct). A depth clause with a parameter of 1 is equivalent to omitting the depth clause.
independent
If an independent clause is present, the compiler assumes that adjacent loops that are fusion candidates within the loop_fuse construct have no negative-distance data access dependencies. That is, for two adjacent loops considered for fusion, iterations of the logically second loop do not access data elements produced in a later iteration of the logically first loop. The independent clause overrides the offline compiler's static analysis during loop fusion safety analysis.
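To make the depth clause more concrete, the following C sketch places two two-level loop nests inside a loop_fuse region with depth(2), next to a hand-written version of the result one might expect if fusion succeeds at both levels. The function names and the ROWS and COLS sizes are hypothetical. A host C compiler ignores the pragma, and whether the offline compiler actually fuses these loops depends on its own safety analysis.

```c
#include <assert.h>

#define ROWS 4  /* hypothetical sizes, chosen for illustration */
#define COLS 3

/* With depth(2), the outer loops (depth 1) are fusion candidates, and the
   inner loops (depth 2) become adjacent candidates once the outer loops fuse. */
void nested_with_pragma(int a[ROWS][COLS], int b[ROWS][COLS]) {
  assert(a != 0 && b != 0);
  #pragma loop_fuse depth(2)
  {
    for (int i = 0; i < ROWS; i++) {
      for (int j = 0; j < COLS; j++) {
        a[i][j] = i + j;
      }
    }
    for (int i = 0; i < ROWS; i++) {
      for (int j = 0; j < COLS; j++) {
        b[i][j] = i * j;
      }
    }
  }
}

/* Hand-fused equivalent at both nesting depths (not compiler output). */
void nested_fused_by_hand(int a[ROWS][COLS], int b[ROWS][COLS]) {
  assert(a != 0 && b != 0);
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLS; j++) {
      a[i][j] = i + j;
      b[i][j] = i * j;
    }
  }
}
```

Without the depth clause, or with depth(1), only the two outer loops would be fusion candidates, which would leave two separate inner loops inside a single outer loop.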

If a function call is present in a loop_fuse construct at any of the applicable nesting depths and inlining the function call materializes a loop, then the resulting loop is considered to be a candidate for fusion.
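As a sketch of this case, the C code below calls a helper whose body is a loop, placed next to an ordinary loop inside a loop_fuse region. The helper name scale_all, the function scale_and_offset, and the length LEN are hypothetical. Assuming the compiler inlines the call, the helper's loop would sit adjacent to the second loop and become a fusion candidate; a host C compiler ignores the pragma.

```c
#include <assert.h>

#define LEN 6  /* hypothetical array length */

/* Hypothetical helper: once inlined, its loop appears at the call site. */
static void scale_all(const int *src, int *dst, int factor) {
  for (int i = 0; i < LEN; i++) {
    dst[i] = factor * src[i];
  }
}

void scale_and_offset(const int *src, int *scaled, int *shifted) {
  assert(src != 0 && scaled != 0 && shifted != 0);
  #pragma loop_fuse
  {
    scale_all(src, scaled, 3);        /* a loop materializes here if inlined */
    for (int i = 0; i < LEN; i++) {   /* adjacent loop in the same region */
      shifted[i] = src[i] + 1;
    }
  }
}
```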

CAUTION:
No clauses apply by default, so the loop_fuse construct cannot introduce a functional error on its own. Adding an independent clause is a guarantee from you that bypasses the corresponding part of the compiler's safety analysis, and it might lead to functional errors if the guarantee does not hold.
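The kind of functional error that a mistaken independent clause can cause may be sketched in plain C. In this hypothetical example, the second loop reads an element that a later iteration of the first loop produces, which is a negative-distance dependence. The hand-fused function stands in for what fusion would do if that dependence were ignored; it is not compiler output, and the names and the length M are assumptions made for illustration.

```c
#include <assert.h>

#define M 5  /* hypothetical array length */

/* Original order: the first loop finishes before the second starts, so the
   second loop sees every updated element of a. */
void unfused(int *a, int *b) {
  assert(a != 0 && b != 0);
  for (int i = 0; i < M; i++) {
    a[i] = 10 * (i + 1);
  }
  for (int i = 0; i < M; i++) {
    b[i] = a[(i + 1) % M];  /* for i < M - 1, written by a later iteration */
  }
}

/* Naive fusion that ignores the dependence: for i < M - 1, a[i + 1] has not
   been written yet when it is read, so b receives stale values. */
void fused_naively(int *a, int *b) {
  assert(a != 0 && b != 0);
  for (int i = 0; i < M; i++) {
    a[i] = 10 * (i + 1);
    b[i] = a[(i + 1) % M];
  }
}
```

Starting from a zero-initialized a, the unfused version stores 20 in b[0], while the naively fused version stores 0. A loop pair like this is one for which the independent clause should not be specified.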

Nested Depth Clauses

In programs where loop_fuse constructs are nested and their implied sets of fusion candidates overlap, the overall set of fusion candidates is the union of all loops covered by the distinct loop_fuse regions. The clauses of each loop_fuse directive apply only to the fusion candidates implied by that directive.

#pragma loop_fuse depth(2) independent
{
  L1: for(...) {}
  L2: for(...) {
    #pragma loop_fuse depth(2)
    {
      L3: for(...) {}
      L4: for(...) {
        L5: for(...) {}
        L6: for(...) {}
      }
    }
  }
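}

In this example, loops L1, L2, L3, L4, L5, and L6 are all considered for fusion. Only loops L1, L2, L3, and L4 are considered for fusion with the compiler's dependence analysis overridden, because the independent clause belongs to the outer directive, whose depth(2) clause reaches only as far as L3 and L4. Loops L5 and L6 are covered only by the inner directive, which has no independent clause.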