Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 3/22/2024
Public



block_loop/noblock_loop

Enables or disables loop blocking for the immediately following nested loops. block_loop enables loop blocking for the nested loops. noblock_loop disables loop blocking for the nested loops.

Syntax

#pragma block_loop [clause[,clause]...]

#pragma noblock_loop

Arguments

clause

Can be any of the following:

factor (expr)

expr is a positive scalar constant integer expression representing the blocking factor for the specified loops. This clause is optional. If the factor clause is not present, the blocking factor will be determined based on processor type and memory access patterns and will be applied to the specified levels in the nested loop following the pragma.

At most one factor clause can appear in a block_loop pragma.

level (level_expr[, level_expr]... )

level_expr has the form const1 or const1:const2. const1 is a positive integer constant m <= 8 representing the loop at level m, where the immediately following loop is level 1. const2 is a positive integer constant n <= 8, with n > m, representing the loop at level n. const1:const2 denotes the nested loops from level const1 through level const2.

The clauses can be specified in any order. If you do not specify any clause, the compiler chooses the best blocking factor to apply to all levels of the immediately following nested loop.

Description

The block_loop pragma lets you exert greater control over optimizations on a specific loop inside a nested loop.

Using a technique called loop blocking, the block_loop pragma splits loops with large iteration counts into smaller groups of iterations. Executing these smaller groups can make more efficient use of the cache and improve performance.

If neither a level clause nor a factor clause is present, the blocking factor is determined based on the processor type and memory access patterns and is applied to all levels of the nested loops following the pragma.

You can use the noblock_loop pragma to tune the performance by disabling loop blocking for nested loops.

Loop-carried dependences are ignored when block_loop pragmas are processed.

The block_loop pragma is supported in host code only.

#pragma block_loop factor(256) level(1)   /* applies blocking factor 256 to the top-level loop in the following nested loop */
#pragma block_loop factor(512) level(2)   /* and blocking factor 512 to the 2nd-level (1st nested) loop */

#pragma block_loop factor(256) level(2)
#pragma block_loop factor(512) level(1)   /* levels can be specified in any order */

#pragma block_loop factor(256) level(1:2) /* adjacent loops can be specified as a range */

#pragma block_loop factor(256)            /* the blocking factor applies to all levels of the loop nest */

#pragma block_loop                        /* the blocking factor is determined based on processor type and memory access patterns and is applied to all levels in the following nested loop */

#pragma noblock_loop                      /* none of the levels in the following nested loop has a blocking factor applied */

Consider the following:

#pragma block_loop factor(256) level(1:2)
for (j = 1; j < n; j++) {
    f = 0;
    for (i = 1; i < n; i++) {
        f = f + a[i] * b[i];
    }
    c[j] = c[j] + f;
}

After loop blocking, the code above is equivalent to the following:

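/* min(x, y) denotes the smaller of x and y.              */
/* (n-2)/256 + 1 blocks of 256 cover the indices 1 .. n-1. */
for (jj = 1; jj <= (n-2)/256 + 1; jj++) {
    for (ii = 1; ii <= (n-2)/256 + 1; ii++) {
        for (j = (jj-1)*256 + 1; j < min(jj*256 + 1, n); j++) {
            f = 0;
            for (i = (ii-1)*256 + 1; i < min(ii*256 + 1, n); i++) {
                f = f + a[i] * b[i];
            }
            c[j] = c[j] + f;
        }
    }
}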