Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 3/22/2024
Public

block_loop/noblock_loop

Enables or disables loop blocking for the immediately following nested loops. block_loop enables loop blocking for the nested loops. noblock_loop disables loop blocking for the nested loops.

Syntax

#pragma block_loop [clause[,clause]...]

#pragma noblock_loop

Arguments

clause

Can be any of the following:

factor (expr)

expr is a positive scalar constant integer expression representing the blocking factor for the specified loops. This clause is optional. If the factor clause is not present, the blocking factor will be determined based on processor type and memory access patterns and will be applied to the specified levels in the nested loop following the pragma.

At most one factor clause can appear in a block_loop pragma.

level (level_expr[, level_expr]... )

level_expr takes the form const1 or const1:const2. const1 is a positive integer constant m <= 8 that identifies the loop at level m, where the loop immediately following the pragma is level 1. const2 is a positive integer constant n <= 8 that identifies the loop at level n, where n > m. The form const1:const2 selects the nested loops from level const1 through level const2.

The clauses can be specified in any order. If you do not specify any clause, the compiler chooses the best blocking factor to apply to all levels of the immediately following nested loop.
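As an illustration of how the two clauses combine, the following sketch applies a blocking factor of 64 to levels 1 and 2 of a three-deep matrix-multiply nest and leaves level 3 unnamed. The function name matmul_blocked_hint, the dimension N, and the guard on the __INTEL_LLVM_COMPILER macro are assumptions made for this example; the guard only keeps other compilers from warning about an unknown pragma.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 96 };  /* hypothetical matrix dimension for this sketch */

/* Computes c += a * b for N x N row-major matrices. */
void matmul_blocked_hint(const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* Request blocking with factor 64 on levels 1 and 2 only. */
#if defined(__INTEL_LLVM_COMPILER)
#pragma block_loop factor(64) level(1:2)
#endif
    for (int i = 0; i < N; i++) {          /* level 1 */
        for (int j = 0; j < N; j++) {      /* level 2 */
            for (int k = 0; k < N; k++) {  /* level 3, not named in level() */
                c[i * N + j] += a[i * N + k] * b[k * N + j];
            }
        }
    }
}
```

The pragma is a request about how the loops are scheduled, so the function should produce the same values with or without it.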

Description

The block_loop pragma lets you exert greater control over optimizations on a specific loop inside a nested loop.

Using a technique called loop blocking, the block_loop pragma splits loops with large iteration counts into smaller groups of iterations. Executing these smaller groups can make more efficient use of cache space and improve performance.
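To make the idea concrete, here is a minimal hand-written sketch of loop blocking applied to a matrix transpose. It is not the compiler's output; the names transpose_blocked, min_size, and the tile parameter are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the smaller of two sizes. */
static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Transposes the rows x cols row-major matrix src into dst (cols x rows),
   visiting the data in tile x tile blocks so that each block of src and
   dst is reused while it is likely still in cache. */
void transpose_blocked(const double *src, double *dst,
                       size_t rows, size_t cols, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < rows; ii += tile) {        /* block row    */
        for (size_t jj = 0; jj < cols; jj += tile) {    /* block column */
            size_t i_end = min_size(ii + tile, rows);   /* clip partial blocks */
            size_t j_end = min_size(jj + tile, cols);
            for (size_t i = ii; i < i_end; i++) {
                for (size_t j = jj; j < j_end; j++) {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

The two outer loops step over blocks and the two inner loops walk the elements of one block; the clipping with min_size handles dimensions that are not a multiple of the tile size.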

If neither a level clause nor a factor clause is specified, the blocking factor is determined from the processor type and memory access patterns, and it applies to all levels of the nested loops that follow the pragma.

You can use the noblock_loop pragma to tune the performance by disabling loop blocking for nested loops.

Loop-carried dependencies are ignored when the compiler processes block_loop pragmas, so it is your responsibility to apply the pragma only where reordering the iterations is safe.
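The following sketch shows why ignoring a loop-carried dependence matters. It emulates blocking by hand rather than showing compiler output, and it assumes that blocking the inner loop moves the column-tile loop outermost. The function names and the sizes ROWS, COLS, and TILE are hypothetical.

```c
#include <assert.h>

enum { ROWS = 3, COLS = 4, TILE = 2 };  /* hypothetical sizes for this sketch */

/* Original order: row by row. Each element reads the element one row up
   and one column to the right. Column COLS is a read-only boundary. */
void sweep_original(int a[ROWS][COLS + 1])
{
    for (int i = 1; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            a[i][j] = a[i - 1][j + 1] + 1;
        }
    }
}

/* Hand-blocked order: column tiles outermost. The last column of a tile
   reads a value from the next tile, which has not been updated yet. */
void sweep_blocked(int a[ROWS][COLS + 1], int tile)
{
    assert(tile > 0);
    for (int jj = 0; jj < COLS; jj += tile) {
        for (int i = 1; i < ROWS; i++) {
            for (int j = jj; j < jj + tile && j < COLS; j++) {
                a[i][j] = a[i - 1][j + 1] + 1;
            }
        }
    }
}
```

Starting from an all-zero array, the original order stores 2 in a[2][1], while the blocked order with TILE columns per tile stores 1 there, because a[1][2] is read before its tile has been processed.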

The block_loop pragma is supported in host code only.


#pragma  block_loop factor(256) level(1)    /* applies blocking factor 256 to               */
#pragma  block_loop factor(512) level(2)    /* the top level loop in the following          
                                               nested loop and blocking factor 512 to       
                                               the 2nd level (1st nested) loop              */

#pragma  block_loop factor(256) level(2) 
#pragma  block_loop factor(512) level(1)     /* levels can be specified in any order        */

#pragma  block_loop factor(256) level(1:2)   /* adjacent loops can be specified as a range  */

#pragma  block_loop factor(256)              /* the blocking factor applies to all levels   
                                                of loop nest                                */

#pragma  block_loop               /* the blocking factor will be determined based on 
                                     processor type and memory access patterns and will 
                                     be applied to all the levels in the nested loop 
                                     following the directive                                */

#pragma  noblock_loop             /* None of the levels in the nested loop following this 
                                     directive will have a blocking factor applied          */

Consider the following:

#pragma block_loop factor(256) level(1:2)
for (j = 1; j < n; j++) {
  f = 0;
  for (i = 1; i < n; i++) {
    f = f + a[i] * b[i];
  }
  c[j] = c[j] + f;
}

The above code produces the following result after loop blocking:

for (jj = 1; jj <= n/256+1; jj++) {
  for (ii = 1; ii <= n/256+1; ii++) {
    for (j = (jj-1)*256+1; j <= min(jj*256, n-1); j++) {
      f = 0;
      for (i = (ii-1)*256+1; i <= min(ii*256, n-1); i++) {
        f = f + a[i] * b[i];
      }
      c[j] = c[j] + f;
    }
  }
}
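As a rough check of this transformation, the sketch below writes both forms as C functions so their results can be compared. It is a hand-written emulation, not compiler output. The names accumulate_plain, accumulate_blocked, and min_int, and the bf parameter that stands in for the fixed factor 256, are hypothetical. The blocked version uses inclusive upper bounds so that every index from 1 to n-1 is visited exactly once.

```c
#include <assert.h>

/* Smaller of two ints; stands in for the conceptual min(). */
static int min_int(int x, int y) { return x < y ? x : y; }

/* Reference nest: indices run from 1 to n-1, as in the example. */
void accumulate_plain(int n, const double *a, const double *b, double *c)
{
    for (int j = 1; j < n; j++) {
        double f = 0;
        for (int i = 1; i < n; i++) {
            f += a[i] * b[i];
        }
        c[j] += f;
    }
}

/* Hand-blocked nest with blocking factor bf applied to both levels. */
void accumulate_blocked(int n, int bf, const double *a, const double *b,
                        double *c)
{
    assert(bf > 0);
    for (int jj = 1; jj <= n / bf + 1; jj++) {
        for (int ii = 1; ii <= n / bf + 1; ii++) {
            for (int j = (jj - 1) * bf + 1; j <= min_int(jj * bf, n - 1); j++) {
                double f = 0;
                for (int i = (ii - 1) * bf + 1; i <= min_int(ii * bf, n - 1); i++) {
                    f += a[i] * b[i];
                }
                c[j] += f;  /* adds the partial sum for this ii block */
            }
        }
    }
}
```

Because f is reset for every ii block, c[j] receives one partial sum per block, and those partial sums add up to the full sum computed by the reference nest. With floating-point data the different summation order can change the rounding slightly.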