block_loop/noblock_loop
Enables or disables loop blocking for the immediately following nested loops. block_loop enables loop blocking for the nested loops. noblock_loop disables loop blocking for the nested loops.
Syntax
#pragma block_loop [clause[,clause]...]
#pragma noblock_loop
Arguments
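clause
Can be any of the following:
factor(expr): sets the blocking factor, as in factor(256).
level(n) or level(m:n): selects the loop levels to block, either a single level, as in level(1) for the top-level loop, or a range of adjacent levels, as in level(1:2).
The clauses can be specified in any order. If you do not specify any clause, the compiler chooses the best blocking factor to apply to all levels of the immediately following nested loop.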
Description
The block_loop pragma lets you exert greater control over optimizations on a specific loop inside a nested loop.
Using a technique called loop blocking, the block_loop pragma splits loops with large iteration counts into smaller groups of iterations. Executing these smaller groups can make more efficient use of cache space and improve performance.
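To make the idea concrete, here is a minimal hand-written sketch of loop blocking applied to a matrix transpose. It is an illustration only, not the code the compiler generates; the function names, the min_size helper, and the row-major layout are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the smaller of two sizes. */
static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* Plain transpose of an n-by-n row-major matrix (src and dst must not overlap). */
void transpose_plain(size_t n, const double *src, double *dst) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Hand-blocked transpose: each of the two loops is split into an outer
   loop that steps from block to block (ii, jj) and an inner loop that
   walks the iterations inside one block (i, j). */
void transpose_blocked(size_t n, size_t block, const double *src, double *dst) {
    assert(block > 0); /* a zero block size would never advance */
    for (size_t ii = 0; ii < n; ii += block)
        for (size_t jj = 0; jj < n; jj += block)
            for (size_t i = ii; i < min_size(ii + block, n); i++)
                for (size_t j = jj; j < min_size(jj + block, n); j++)
                    dst[j * n + i] = src[i * n + j];
}
```

Both functions write the same values; only the order in which elements are visited differs. In the blocked version each block of src and dst is finished before the next one is started, which is the property that can help cache reuse.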
If both the level and factor clauses are omitted, the blocking factor is determined from the processor type and memory access patterns, and it applies to all levels of the nested loops that follow the pragma.
You can use the noblock_loop pragma to tune the performance by disabling loop blocking for nested loops.
Loop-carried dependences are ignored when block_loop pragmas are processed.
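The following sketch illustrates why a loop-carried dependence matters when a nest is blocked. It blocks a nest by hand; it does not claim to reproduce what the compiler does with this particular nest. The function names and the tiny problem size are assumptions for illustration. Each element depends on the element one row up and one column to the right, and blocking both levels changes the order in which those elements are produced.

```c
#include <assert.h>

/* a[i][j] reads a[i-1][j+1], which an earlier iteration of the i loop
   writes: a loop-carried dependence. a is an n-by-n row-major array. */
void sweep_plain(int n, int *a) {
    for (int i = 1; i < n; i++)
        for (int j = 0; j < n - 1; j++)
            a[i * n + j] = a[(i - 1) * n + (j + 1)] + 1;
}

/* The same nest blocked by hand on both levels. With a small block,
   a[i-1][j+1] can fall in a block that has not been processed yet,
   so the stale value is read instead. */
void sweep_blocked(int n, int block, int *a) {
    assert(block > 0);
    for (int ii = 1; ii < n; ii += block)
        for (int jj = 0; jj < n - 1; jj += block)
            for (int i = ii; i < ii + block && i < n; i++)
                for (int j = jj; j < jj + block && j < n - 1; j++)
                    a[i * n + j] = a[(i - 1) * n + (j + 1)] + 1;
}
```

For a zero-initialized 4-by-4 array and a block size of 2, the two functions disagree on a[2][1], because the blocked version reads a[1][2] before it has been computed. If a nest carries a dependence of this kind, it is worth checking results after adding the pragma.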
The block_loop pragma is supported in host code only.
Examples
/* Apply blocking factor 256 to the top-level loop of the following
   nested loop and blocking factor 512 to the 2nd-level (1st nested) loop. */
#pragma block_loop factor(256) level(1)
#pragma block_loop factor(512) level(2)

/* Levels can be specified in any order. */
#pragma block_loop factor(256) level(2)
#pragma block_loop factor(512) level(1)

/* Adjacent loops can be specified as a range. */
#pragma block_loop factor(256) level(1:2)

/* The blocking factor applies to all levels of the loop nest. */
#pragma block_loop factor(256)

/* The blocking factor is determined from the processor type and memory
   access patterns and is applied to all levels of the nested loop
   following the directive. */
#pragma block_loop

/* None of the levels in the nested loop following this directive
   has a blocking factor applied. */
#pragma noblock_loop
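As a sketch of how the pragmas might sit in a complete function, the example below attaches block_loop and noblock_loop to the same two-level nest. The function names and the factor of 64 are assumptions chosen for illustration, not recommended values. The preprocessor guard is also an assumption of this sketch: it keeps the pragma away from compilers that do not define the Intel compiler macros, which would otherwise ignore it or warn about an unknown pragma.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose with a request to block both loop levels by 64. */
void transpose_hint_blocked(size_t n, const double *src, double *dst) {
    assert(src != dst); /* this simple form is not an in-place transpose */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma block_loop factor(64) level(1:2)
#endif
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* The same nest with loop blocking disabled, useful as a baseline
   when comparing timings. */
void transpose_hint_unblocked(size_t n, const double *src, double *dst) {
    assert(src != dst);
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma noblock_loop
#endif
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}
```

Both functions compute the same result; the pragmas only influence how the compiler schedules the iterations. Timing the two variants on representative data is one way to judge whether a given factor helps.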
Consider the following:
#pragma block_loop factor(256) level(1:2)
for (j = 1; j < n; j++) {
  f = 0;
  for (i = 1; i < n; i++) {
    f = f + a[i] * b[i];
  }
  c[j] = c[j] + f;
}
The above code produces the following result after loop blocking:
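/* min(x, y) denotes the smaller of x and y. Each loop covers the
   n - 1 original iterations 1..n-1 in blocks of at most 256. */
for (jj = 1; jj <= (n + 254) / 256; jj++) {
  for (ii = 1; ii <= (n + 254) / 256; ii++) {
    for (j = (jj - 1) * 256 + 1; j <= min(jj * 256, n - 1); j++) {
      f = 0;
      for (i = (ii - 1) * 256 + 1; i <= min(ii * 256, n - 1); i++) {
        f = f + a[i] * b[i];
      }
      c[j] = c[j] + f;
    }
  }
}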
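One way to convince yourself that a blocked nest of this shape covers exactly the original iterations is to run both forms side by side. The sketch below does that with integer data, so the regrouped partial sums compare exactly. The function names, the min_int helper, and the block parameter (standing in for the fixed factor of 256) are assumptions for illustration. As in the example above, index 0 of each array is unused.

```c
#include <assert.h>

/* Hypothetical helper: the smaller of two ints. */
static int min_int(int x, int y) { return x < y ? x : y; }

/* The original nest: iterations 1..n-1 on both levels. */
void nest_plain(int n, const long *a, const long *b, long *c) {
    for (int j = 1; j < n; j++) {
        long f = 0;
        for (int i = 1; i < n; i++)
            f += a[i] * b[i];
        c[j] += f;
    }
}

/* The blocked nest: jj and ii step over blocks, j and i walk one block.
   Each (jj, ii) pair adds one partial sum into c[j]. */
void nest_blocked(int n, int block, const long *a, const long *b, long *c) {
    assert(block > 0);
    int last = n - 1;                         /* highest original index */
    int nblocks = (last + block - 1) / block; /* ceil(last / block) */
    for (int jj = 1; jj <= nblocks; jj++)
        for (int ii = 1; ii <= nblocks; ii++)
            for (int j = (jj - 1) * block + 1; j <= min_int(jj * block, last); j++) {
                long f = 0;
                for (int i = (ii - 1) * block + 1; i <= min_int(ii * block, last); i++)
                    f += a[i] * b[i];
                c[j] += f;
            }
}
```

With floating-point data the blocked form adds the partial sums in a different order, so results may differ in the last bits even though every iteration is still executed exactly once.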