5.2.4. Fusing Adjacent Loops (loop_fuse Pragma)
Fusing adjacent loops can reduce your kernel's area use by cutting the overhead required for loop control, and can increase your kernel's performance by executing both original loops concurrently as one (fused) loop.
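To make the transformation concrete, the following plain C sketch shows two adjacent loops next to a hand-fused equivalent. The function names and the trip count N are hypothetical and chosen only for illustration; the offline compiler performs fusion internally and does not necessarily produce code of this shape.

```c
#include <stddef.h>

#define N 8  /* hypothetical trip count, chosen for illustration */

/* Two adjacent loops with the same trip count: two sets of loop control. */
void scale_and_offset_unfused(const int *in, int *a, int *b) {
    for (size_t i = 0; i < N; i++) {
        a[i] = in[i] * 2;
    }
    for (size_t i = 0; i < N; i++) {
        b[i] = in[i] + 1;
    }
}

/* Hand-fused equivalent: one set of loop control, both bodies per iteration. */
void scale_and_offset_fused(const int *in, int *a, int *b) {
    for (size_t i = 0; i < N; i++) {
        a[i] = in[i] * 2;
        b[i] = in[i] + 1;
    }
}
```

Because neither loop body reads what the other writes, both versions compute the same results; only the amount of loop control differs.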
To specify a block of program code within which the compiler attempts to fuse loops, specify the pragma as follows:
#pragma loop_fuse [clause[[,]clause]...] new-line
structured_block
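As an illustration of this syntax, the sketch below wraps two adjacent loops in a loop_fuse block with no clauses. It is written as a plain C function so that it can also be compiled on a host, where a compiler that does not recognize the pragma typically ignores it (possibly with a warning); in a kernel, the same block would sit inside the kernel body. The function name, array names, and LEN are hypothetical.

```c
#define LEN 16  /* hypothetical array length */

void square_and_negate(const int *src, int *squares, int *negated) {
    /* Ask the compiler to attempt fusing the two top-level loops below. */
    #pragma loop_fuse
    {
        for (int i = 0; i < LEN; i++) {
            squares[i] = src[i] * src[i];
        }
        for (int j = 0; j < LEN; j++) {
            negated[j] = -src[j];
        }
    }
}
```

The pragma is a request, not a guarantee: the compiler attempts fusion within the block, and the function's results are the same whether or not the loops are fused.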
where clause is one of the following:
- depth(constant-integer-expression)
- If a depth clause is present, the constant-integer-expression parameter defines the number of nesting depths at which the compiler attempts to fuse adjacent loops. The depth clause extends the applicability of the loop_fuse construct to all loops nested in top-level loops contained in the construct at a nesting depth less than or equal to the clause parameter, including loops that become adjacent as a result of fusing their containing loops. In the absence of a depth clause, the compiler attempts to fuse only loops at the top level of the loop_fuse construct (that is, loops not contained in other loops defined within the construct). A depth clause with a parameter of 1 is equivalent to the absence of a depth clause.
- independent
- If an independent clause is present, adjacent loops that are fusion candidates within a loop_fuse construct are assumed to have no negative-distance data access dependencies. That is, for two adjacent loops considered for fusion, iterations of the logically second loop do not access data elements produced in a later iteration of the logically first loop. The independent clause overrides the offline compiler's static analysis during loop fusion safety analysis.
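The following plain C sketch illustrates a negative-distance dependence of the kind that the independent clause asserts is absent. The function names and the trip count M are hypothetical. In the original loop order, iteration i of the second loop reads a[i + 1], which the first loop writes in its later iteration i + 1; a naive hand fusion reads that element before it has been written.

```c
#define M 4  /* hypothetical trip count */

/* Original order: the first loop finishes before the second loop starts. */
void shift_unfused(const int *in, int *a, int *out) {
    for (int i = 0; i < M; i++) {
        a[i] = in[i] * 2;
    }
    for (int i = 0; i < M; i++) {
        /* Reads a[i + 1], which the first loop wrote in iteration i + 1. */
        out[i] = (i + 1 < M) ? a[i + 1] : 0;
    }
}

/* Naive hand fusion: iteration i reads a[i + 1] before it is written. */
void shift_fused_naive(const int *in, int *a, int *out) {
    for (int i = 0; i < M; i++) {
        a[i] = in[i] * 2;
        out[i] = (i + 1 < M) ? a[i + 1] : 0;  /* stale value of a[i + 1] */
    }
}
```

With in = {1, 2, 3, 4} and a zero-initialized, the unfused version yields out = {4, 6, 8, 0}, while the naive fused version yields out = {0, 0, 0, 0}. Because the independent clause overrides the compiler's safety analysis, specifying it for loops with this access pattern could produce incorrect results.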
If a function call is present in a loop_fuse construct at any of the applicable nesting depths and inlining the function call materializes a loop, then the resulting loop is considered to be a candidate for fusion.
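A minimal sketch of this situation follows, again written as plain C with hypothetical names (add_one, double_and_increment, CNT). The loop_fuse block contains one explicit loop and one call to a helper whose body is a loop; if the compiler inlines the call, the helper's loop becomes a second top-level loop adjacent to the first. Whether inlining actually happens is up to the compiler.

```c
#define CNT 8  /* hypothetical array length */

/* Hypothetical helper; if inlined, its loop lands inside the caller. */
static inline void add_one(const int *src, int *dst) {
    for (int i = 0; i < CNT; i++) {
        dst[i] = src[i] + 1;
    }
}

void double_and_increment(const int *src, int *doubled, int *incremented) {
    #pragma loop_fuse
    {
        for (int i = 0; i < CNT; i++) {
            doubled[i] = src[i] * 2;
        }
        /* After inlining, this call materializes a second top-level loop. */
        add_one(src, incremented);
    }
}
```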
Nested Depth Clauses
In programs where loop_fuse constructs are nested and their implied sets of fusion candidates overlap, the overall set of fusion candidates is the union of all loops covered by the distinct loop_fuse regions. The clauses of each loop_fuse directive apply only to the fusion candidates implied by that directive.
#pragma loop_fuse depth(2) independent
{
  L1: for(...) {}
  L2: for(...) {
    #pragma loop_fuse depth(2)
    {
      L3: for(...) {}
      L4: for(...) {
        L5: for(...) {}
        L6: for(...) {}
      }
    }
  }
}
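In this example, loops L1, L2, L3, L4, L5, and L6 are all considered for fusion. Loops L1, L2, L3, and L4 fall within the outer directive's depth of 2, so its independent clause applies to them and they are considered for fusion with the compiler's dependence analysis overridden. Loops L5 and L6 are covered only by the inner directive, which has no independent clause, so the compiler's dependence analysis still applies to them.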