5.2.2. Coalescing Nested Loops
Coalescing nested loops into a single loop reduces the latency of the component, which can in turn reduce your kernel area usage. However, in some cases, coalescing loops might lengthen the critical loop initiation interval path, so coalescing might not be suitable for every kernel.
For NDRange kernels, the compiler automatically attempts to coalesce loops even if they are not annotated with the loop_coalesce pragma. Coalescing loops in NDRange kernels improves throughput and reduces kernel area usage. You can use the loop_coalesce pragma to prevent the automatic coalescing of loops in NDRange kernels.
#pragma loop_coalesce <loop_nesting_level>
The <loop_nesting_level> parameter is optional. It is an integer that specifies how many nested loop levels you want the compiler to attempt to coalesce. If you do not specify the <loop_nesting_level> parameter, the compiler attempts to coalesce all of the nested loops.
Consider the following loop nest:
    for (A)
      for (B)
        for (C)
          for (D)
        for (E)
- Loop (A) has a loop nesting level of 1.
- Loop (B) has a loop nesting level of 2.
- Loop (C) has a loop nesting level of 3.
- Loop (D) has a loop nesting level of 4.
- Loop (E) has a loop nesting level of 3.
Depending on the loop nesting level that you specify, the compiler attempts to coalesce different loops:
- If you specify #pragma loop_coalesce 1 on loop (A), the compiler does not attempt to coalesce any of the nested loops.
- If you specify #pragma loop_coalesce 2 on loop (A), the compiler attempts to coalesce loops (A) and (B).
- If you specify #pragma loop_coalesce 3 on loop (A), the compiler attempts to coalesce loops (A), (B), (C), and (E).
- If you specify #pragma loop_coalesce 4 on loop (A), the compiler attempts to coalesce all of the loops, (A) through (E).
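The nesting-level rule in the list above can be modelled in a few lines of plain C. This is only an illustrative sketch, not compiler behavior or an SDK API: the function name coalesce_candidates and the two lookup tables are hypothetical, and the rule it encodes (a loop is a candidate when its nesting level does not exceed the pragma argument) is inferred from the four cases listed above.

```c
#include <assert.h>

/* Loops (A)..(E) and their nesting levels, copied from the example above. */
enum { NUM_LOOPS = 5 };
static const char loop_name[NUM_LOOPS]  = { 'A', 'B', 'C', 'D', 'E' };
static const int  loop_level[NUM_LOOPS] = {  1,   2,   3,   4,   3  };

/*
 * Hypothetical helper: models "#pragma loop_coalesce n" placed on loop (A).
 * Writes the names of the loops the compiler would attempt to coalesce
 * into out (NUL-terminated, needs room for NUM_LOOPS + 1 chars) and
 * returns how many there are.
 */
int coalesce_candidates(int n, char *out) {
    assert(n >= 1);
    int count = 0;
    for (int k = 0; k < NUM_LOOPS; k++) {
        if (loop_level[k] <= n) {      /* inferred rule: level <= n */
            out[count++] = loop_name[k];
        }
    }
    if (count < 2) {                   /* a lone loop has nothing to merge with */
        count = 0;
    }
    out[count] = '\0';
    return count;
}
```

Under this model, calling the helper with n = 3 fills the buffer with loops A, B, C, and E, and calling it with n = 1 leaves it empty, which matches the cases listed above.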
Example
The following simple example shows how the compiler coalesces two loops into a single loop.
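    #pragma loop_coalesce
    for (int i = 0; i < N; i++)
      for (int j = 0; j < M; j++)
        sum[i][j] += i+j;

The compiler coalesces the two loops so that they execute as if they were a single loop, equivalent to the following:

    int i = 0;
    int j = 0;
    while (i < N) {
      sum[i][j] += i+j;
      j++;
      if (j == M) {   // end of an inner pass: reset j, advance i
        j = 0;
        i++;
      }
    }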
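To check that the two forms in this example compute the same result, the following host-side C sketch runs both on zero-initialized buffers and compares them. It is an assumption-laden illustration rather than kernel code: the function names add_nested, add_coalesced, and forms_match are hypothetical, the 2D array is flattened to a 1D buffer, and the 3 x 4 size is arbitrary. The hand-coalesced form assumes the inner trip count is greater than zero, which the assert makes explicit.

```c
#include <assert.h>
#include <string.h>

/* Nested form; in a kernel, "#pragma loop_coalesce" would precede the outer loop. */
void add_nested(int n, int m, int *sum) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            sum[i * m + j] += i + j;
}

/* Hand-coalesced form: a single loop in which j wraps and i advances. */
void add_coalesced(int n, int m, int *sum) {
    assert(m > 0);  /* with m == 0 the single loop would still touch sum */
    int i = 0;
    int j = 0;
    while (i < n) {
        sum[i * m + j] += i + j;
        j++;
        if (j == m) {
            j = 0;
            i++;
        }
    }
}

/* Returns 1 if both forms leave identical contents in their buffers. */
int forms_match(void) {
    enum { N = 3, M = 4 };  /* arbitrary illustrative sizes */
    int a[N * M] = { 0 };
    int b[N * M] = { 0 };
    add_nested(N, M, a);
    add_coalesced(N, M, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

The comparison only confirms functional equivalence on the host. It says nothing about the latency, area, or initiation interval effects described earlier, which depend on the FPGA compiler.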