Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/13/2021
Public

5.2.3. Coalescing Nested Loops

Use the loop_coalesce pragma to direct the Intel® FPGA SDK for OpenCL™ Offline Compiler to coalesce nested loops into a single loop without affecting the loop functionality. Coalescing loops can help reduce your kernel area usage by directing the compiler to reduce the overhead needed for loop control.

Coalescing nested loops also reduces the latency of the kernel, which can further reduce your kernel area usage. However, in some cases, coalescing loops might lengthen the critical loop initiation interval path, so coalescing loops might not be suitable for all kernels.

For NDRange kernels, the compiler automatically attempts to coalesce loops even if they are not annotated with the loop_coalesce pragma. Coalescing loops in NDRange kernels improves throughput and reduces kernel area usage. To prevent the automatic coalescing of a loop in an NDRange kernel, annotate that loop with #pragma loop_coalesce 1.

To coalesce nested loops, specify the pragma as follows:
#pragma loop_coalesce <loop_nesting_level>

The <loop_nesting_level> parameter is an optional integer that specifies how many nested loop levels you want the compiler to attempt to coalesce. If you do not specify the <loop_nesting_level> parameter, the compiler attempts to coalesce all of the nested loops.

For example, consider the following set of nested loops:
for (A)
  for (B)
    for (C)
      for (D)
    for (E)
If you place the pragma before loop (A), then the loop nesting level for these loops is defined as:
  • Loop (A) has a loop nesting level of 1.
  • Loop (B) has a loop nesting level of 2.
  • Loop (C) has a loop nesting level of 3.
  • Loop (D) has a loop nesting level of 4.
  • Loop (E) has a loop nesting level of 3.
Depending on the loop nesting level that you specify, the compiler attempts to coalesce loops differently:
  • If you specify #pragma loop_coalesce 1 on loop (A), the compiler does not attempt to coalesce any of the nested loops.
  • If you specify #pragma loop_coalesce 2 on loop (A), the compiler attempts to coalesce loops (A) and (B).
  • If you specify #pragma loop_coalesce 3 on loop (A), the compiler attempts to coalesce loops (A), (B), (C), and (E).
  • If you specify #pragma loop_coalesce 4 on loop (A), the compiler attempts to coalesce all of the loops, that is, loops (A) through (E).
Important: If you specify #pragma loop_coalesce 1 for a loop in an NDRange kernel, you prevent automatic loop coalescing for that loop.
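
To make the nesting-level parameter concrete, the following host-side C sketch applies #pragma loop_coalesce 2 to a three-level nest and pairs it with a hand-written loop that approximates what coalescing only the outer two levels means. The function names, the bounds N, M, and K, and the out buffer are hypothetical and chosen only for illustration. A host C compiler ignores the pragma (possibly with a warning), and the hardware that the offline compiler actually generates may differ from this rewrite.

```c
#define N 3
#define M 4
#define K 5

/* Annotated form: the pragma asks the offline compiler to try to coalesce
   two nesting levels, that is, the loops over i and j. The loop over k is
   at nesting level 3 and is left as a separate inner loop. */
int visit_annotated(int *out)
{
    int count = 0;
    #pragma loop_coalesce 2
    for (int i = 0; i < N; i++)           /* nesting level 1 */
        for (int j = 0; j < M; j++)       /* nesting level 2 */
            for (int k = 0; k < K; k++)   /* nesting level 3 */
                out[count++] = i * 100 + j * 10 + k;
    return count;
}

/* Hand-written approximation (an assumption, not compiler output): the
   i and j loops share a single loop control, and the k loop is unchanged. */
int visit_coalesced_by_hand(int *out)
{
    int count = 0;
    int i = 0;
    int j = 0;
    while (i < N) {
        for (int k = 0; k < K; k++)
            out[count++] = i * 100 + j * 10 + k;
        j++;
        if (j == M) {   /* inner bound reached: wrap j and advance i */
            j = 0;
            i++;
        }
    }
    return count;
}
```

Both functions record the iteration tuples in the same order, which is the sense in which coalescing leaves the loop functionality unchanged.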

Example

The following simple example shows how the compiler coalesces two loops into a single loop.

Consider a simple nested loop written as follows:
#pragma loop_coalesce
for (int i = 0; i < N; i++)
  for (int j = 0; j < M; j++)
    sum[i][j] += i+j;
The compiler coalesces the two loops together so that they run as if they were a single loop written as follows:
int i = 0;
int j = 0;
while (i < N) {
  sum[i][j] += i + j;
  j++;

  // When the inner index reaches its bound, wrap it and advance the outer index.
  if (j == M) {
    j = 0;
    i++;
  }
}
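
As a quick check of this equivalence, the sketch below runs both forms on the host and lets you compare the results. It is an illustration under stated assumptions rather than a description of the generated hardware: the function names are hypothetical, the array is stored as a flat row-major buffer so that the bounds n and m can be runtime values, and the single-loop form is equivalent only when m is at least 1 (with m equal to 0 the wrap condition j == m is never met after j++).

```c
#include <assert.h>

/* Reference form: the two-level nest from the example. */
void accumulate_nested(int n, int m, int *sum)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            sum[i * m + j] += i + j;
}

/* Single-loop form: one loop control drives both indices. */
void accumulate_coalesced(int n, int m, int *sum)
{
    assert(m > 0);  /* assumption: the inner loop runs at least once */
    int i = 0;
    int j = 0;
    while (i < n) {
        sum[i * m + j] += i + j;
        j++;
        if (j == m) {   /* inner bound reached: wrap j and advance i */
            j = 0;
            i++;
        }
    }
}
```

Calling both functions on identically initialized buffers of n * m elements should leave the buffers equal element by element.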