Intel® Fortran Compiler Classic and Intel® Fortran Compiler Developer Guide and Reference

ID 767251
Date 7/13/2023
Public

TILE

OpenMP* Fortran Compiler Directive: Tiles (or blocks) one or more loops of a loop nest. This feature is only available for ifx.

Syntax

!$OMP TILE clause

   loop-nest

[!$OMP END TILE]

clause

Is SIZES (size-list)

The clause is required, and it can appear only once.

The size-list is a list of positive integer expressions (s1, … sn), where n is less than or equal to the number of loops in the loop-nest.

When n is the number of integer expressions in size-list, the depth of the loop nest must be at least n. The TILE construct replaces the outer n loops with 2n perfectly nested loops.

The outer n loops are called the floor loops, f1, … fn, from outermost to innermost floor loop.

The inner n loops, nested within floor loop fn, are the tile loops, t1, … tn, from outermost to innermost tile loop. The resulting tile loops do not have canonical form.

The value of the expression si specifies the maximum number of iterations of the tile loop ti.

A tile loop tk that executes exactly sk iterations, where sk is the kth expression in size-list, is a complete tile loop; otherwise, it is a partial tile loop.

loop-nest

Is a nest of DO loops in canonical form.
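
As a worked illustration of the floor-loop and tile-loop terms defined above, the following sketch computes, for a single loop with n iterations and tile size s, how many floor-loop iterations (tiles) result and how many iterations the last tile loop executes. The module and function names (tile_math, num_tiles, last_tile_len) are hypothetical helpers for this illustration only; they are not part of OpenMP or the Intel compiler, and the sketch assumes n and s are both positive.

```fortran
MODULE tile_math
  IMPLICIT NONE
CONTAINS
  ! Number of floor-loop iterations (tiles) for n iterations and tile size s.
  ! Assumes n >= 1 and s >= 1.
  PURE INTEGER FUNCTION num_tiles(n, s)
    INTEGER, INTENT(IN) :: n, s
    num_tiles = (n + s - 1) / s        ! ceiling(n/s) using integer division
  END FUNCTION num_tiles

  ! Iterations executed by the last tile loop.
  ! The result equals s when the last tile is complete, and is less than s
  ! when the last tile is partial.
  PURE INTEGER FUNCTION last_tile_len(n, s)
    INTEGER, INTENT(IN) :: n, s
    last_tile_len = n - (num_tiles(n, s) - 1) * s
  END FUNCTION last_tile_len
END MODULE tile_math
```

For a loop of 20 iterations and a tile size of 7, num_tiles returns 3 and last_tile_len returns 6, so the third tile loop is a partial tile loop. For 64 iterations and a tile size of 8, all 8 tile loops are complete.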

Description

Loop tiling, also known as loop blocking, is a loop transformation performed on loops within loop nests. The transformation splits the processing of data into smaller segments called tiles (or blocks), which allows the data in the tiles (or blocks) to be accessed in parallel.

The loops associated with the construct must be perfectly nested. Each loop associated with the construct must be rectangular, that is, the loop control expressions of an inner loop cannot depend on the loop control variable of an outer loop.
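
The following sketch contrasts a loop nest that meets these requirements with one that does not. The module and subroutine names (tile_shape_demo, scale_all, scale_upper) and the tile sizes are assumptions chosen for illustration. When OpenMP is not enabled, the !$OMP line is treated as a comment and the loops run untiled.

```fortran
MODULE tile_shape_demo
  IMPLICIT NONE
CONTAINS
  ! Rectangular and perfectly nested: the bounds of j do not depend on i,
  ! and no statement sits between the two DO statements, so TILE can apply.
  SUBROUTINE scale_all(a)
    INTEGER, INTENT(INOUT) :: a(:,:)
    INTEGER :: i, j
    !$OMP TILE SIZES(4,4)
    DO i = 1, SIZE(a, 1)
      DO j = 1, SIZE(a, 2)
        a(i,j) = a(i,j) * 10
      END DO
    END DO
  END SUBROUTINE scale_all

  ! Non-rectangular (triangular): the lower bound of j depends on i,
  ! so this nest is deliberately left without a TILE directive.
  SUBROUTINE scale_upper(a)
    INTEGER, INTENT(INOUT) :: a(:,:)
    INTEGER :: i, j
    DO i = 1, SIZE(a, 1)
      DO j = i, SIZE(a, 2)
        a(i,j) = a(i,j) * 10
      END DO
    END DO
  END SUBROUTINE scale_upper
END MODULE tile_shape_demo
```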

Tiled loops perform best when a loop's iteration count is a multiple of the tile size for that loop. When the iteration count is not a multiple of the tile size, the loop nest may be transformed in a number of different ways. This allows partial tiles to execute in a manner that is optimal for the target hardware. The specified order of iterations must be preserved in the complete tile loops.

TILE constructs can be nested. If two TILE constructs are nested, the result is as if the outer TILE construct is applied to the resulting transformed loop nest created by the inner TILE construct.
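
To make the nesting rule concrete, the sketch below hand-writes one loop nest that nested TILE constructs could plausibly produce for a single 32-iteration loop, with an inner SIZES(4) and an outer SIZES(2). This is an illustrative stand-in and not compiler output: the code the compiler generates is implementation-defined, and the names nested_tile_demo and visit_order are hypothetical.

```fortran
MODULE nested_tile_demo
  IMPLICIT NONE
CONTAINS
  ! Records the order in which iterations 1..32 are visited by a
  ! hand-written stand-in for:
  !   !$OMP TILE SIZES(2)     (outer construct)
  !   !$OMP TILE SIZES(4)     (inner construct)
  !   DO i = 1, 32
  SUBROUTINE visit_order(order)
    INTEGER, INTENT(OUT) :: order(32)
    INTEGER :: g, f, i, k
    k = 0
    DO g = 1, 32, 2*4              ! floor loop of the outer TILE
      DO f = g, g + (2-1)*4, 4     ! tile loop of the outer TILE (floor loop of the inner TILE)
        DO i = f, f + (4-1)        ! tile loop of the inner TILE
          k = k + 1
          order(k) = i
        END DO
      END DO
    END DO
  END SUBROUTINE visit_order
END MODULE nested_tile_demo
```

The inner construct first splits the loop into a floor loop with a stride of 4 and a tile loop of 4 iterations. The outer construct then tiles that floor loop in groups of 2, which gives the stride of 8 on the outermost loop. In this stand-in, the iterations are still visited in their original order.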

Examples

In the following example, the loop iteration counts are each a multiple of their corresponding tile size, so there are no resulting partial tile loops. The PARALLEL DO construct is applied to the transformed loop nest.

The outer loop runs over the first (row) index, and the inner loop runs over the second (column) index, of a 64 x 20 matrix. The size expressions 8 and 4 specified in the SIZES clause of the TILE construct indicate an 8 x 4 blocking applied to the outer and inner loops, respectively:

  INTEGER,DIMENSION (64,20) :: arr
  INTEGER                   :: i, j
  !$OMP PARALLEL DO
  !$OMP TILE SIZES(8,4)
  DO i = 1, 64
    DO j = 1, 20
      arr(i,j) = arr(i,j)*10
    END DO
  END DO 

The transformed tiled loops are the following:

INTEGER,DIMENSION (64,20) :: arr
INTEGER                   :: i_inner, i_outer, j_inner, j_outer
!$OMP PARALLEL DO
DO i_outer = 1, 64, 8
  DO j_outer = 1, 20, 4
    DO i_inner = i_outer, i_outer+(8-1)
      DO j_inner = j_outer, j_outer+(4-1)
        arr(i_inner,j_inner) = arr(i_inner,j_inner)*10
      END DO 
    END DO
  END DO 
END DO 

In the following example, the inner loop iteration count 20 is not a multiple of the corresponding tile size 7. To handle the remaining iterations, there may be a partial tile loop depending on how the loop is transformed:

  INTEGER,DIMENSION (64,20) :: arr
  INTEGER                   :: i, j
  !$OMP TILE SIZES(8,7)
  DO i = 1, 64
    DO j = 1, 20
      arr(i,j) = arr(i,j) * 10
    END DO
  END DO 

In this case, various transformations are possible. The compiler is free to pick a transformation that is optimal for the target hardware. The order of execution of one tile with respect to other tiles can be changed, but within a given tile, the order of iteration execution must be preserved.

One possible transformation for the above loop nest is the following:

INTEGER,DIMENSION (64,20) :: arr
INTEGER                    :: i_inner, i_outer, j_inner, j_outer, j
! Complete tiles
DO i_outer = 1, 64, 8
  DO j_outer = 1, 14, 7
    DO i_inner = i_outer, i_outer + (8-1)
      DO j_inner = j_outer, j_outer + (7-1)
        arr(i_inner,j_inner) = arr(i_inner,j_inner)*10
      END DO 
    END DO
  END DO 
END DO 

! Partial tiles
DO i_outer = 1, 64, 8
  DO i_inner = i_outer, i_outer + (8-1)
    DO j = 15, 20
      arr(i_inner,j) = arr(i_inner,j) * 10
    END DO
  END DO 
END DO 

An equivalent transformation for the tiled loop nest is the following:

INTEGER,DIMENSION (64,20) :: arr
INTEGER                   :: i_inner, i_outer, j_inner, j_outer
DO i_outer = 1, 64, 8
  DO j_outer = 1, 20, 7
    DO i_inner = i_outer, i_outer + (8-1)
      DO j_inner = j_outer, MIN(j_outer+(7-1),20)
        arr(i_inner,j_inner) = arr(i_inner,j_inner) * 10
      END DO 
    END DO
  END DO 
END DO 
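
The MIN-clamped form above can be generalized to arbitrary array extents and tile sizes. The following is a minimal sketch under that assumption: it clamps both tile loops so that partial tiles at either edge are handled by the same code as complete tiles. The module name tile_clamp_demo, the subroutine scale_tiled, and its arguments ti and tj are hypothetical and serve only as a hand-written illustration; they do not represent the code the compiler emits.

```fortran
MODULE tile_clamp_demo
  IMPLICIT NONE
CONTAINS
  ! Multiplies every element of arr by 10 using a hand-tiled loop nest
  ! with tile sizes ti x tj (both assumed positive). MIN clamps the upper
  ! bound of each tile loop to the array extent.
  SUBROUTINE scale_tiled(arr, ti, tj)
    INTEGER, INTENT(INOUT) :: arr(:,:)
    INTEGER, INTENT(IN)    :: ti, tj
    INTEGER :: i_outer, j_outer, i_inner, j_inner, n, m
    n = SIZE(arr, 1)
    m = SIZE(arr, 2)
    DO i_outer = 1, n, ti                                  ! floor loop over rows
      DO j_outer = 1, m, tj                                ! floor loop over columns
        DO i_inner = i_outer, MIN(i_outer + (ti-1), n)     ! tile loop over rows
          DO j_inner = j_outer, MIN(j_outer + (tj-1), m)   ! tile loop over columns
            arr(i_inner,j_inner) = arr(i_inner,j_inner) * 10
          END DO
        END DO
      END DO
    END DO
  END SUBROUTINE scale_tiled
END MODULE tile_clamp_demo
```

Calling scale_tiled(arr, 8, 7) on the 64 x 20 array from the example touches each element exactly once, so the result matches the untiled loop nest.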
