TILE
OpenMP* Fortran Compiler Directive: Tiles (or blocks) one or more loops of a loop nest. This feature is only available for ifx.
Syntax
!$OMP TILE clause
loop-nest
[!$OMP END TILE]
clause
Is SIZES (size-list). The clause is required, and it can appear only once. The size-list is a list of positive integer expressions s1, …, sn, where n is less than or equal to the number of loops in loop-nest; that is, the loop nest must be at least n loops deep.
The TILE construct replaces the outer n loops with 2n perfectly nested loops. The outer n loops are called the floor loops, f1, …, fn, numbered from the outermost to the innermost floor loop. The inner n loops, nested within floor loop fn, are the tile loops, t1, …, tn, numbered from the outermost to the innermost tile loop. The resulting tile loops do not have canonical form.
The value of the expression si specifies the maximum number of iterations of tile loop ti. A tile loop tk that executes exactly sk iterations, where sk is the kth expression in size-list, forms a complete tile; one that executes fewer iterations forms a partial tile.
loop-nest
Is a nest of DO loops in canonical form.
Description
Loop tiling, also known as loop blocking, is a transformation performed on loops within a loop nest. It splits the iteration space into smaller blocks called tiles, so that the data used by one tile can be reused while it is still in cache, and so that tiles can be distributed for parallel execution.
The loops associated with the construct must be perfectly nested. Each loop associated with the construct must also be rectangular; that is, the loop control expressions of an inner loop cannot depend on the loop control variable of any outer loop.
Tiled loops perform best when each loop's iteration count is a multiple of its tile size. When an iteration count is not a multiple of the tile size, the compiler can transform the loop nest in several different ways, which lets it execute the partial tiles in the manner best suited to the target hardware. Whatever transformation is chosen, the iterations within each complete tile must execute in the specified order.
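The split into complete and partial tiles can be sketched in plain Fortran. The following minimal sketch, which assumes n >= 0 and s > 0, computes how many iterations each tile executes when a single loop of n iterations is tiled with tile size s. The module tile_math and the function tile_lengths are hypothetical names used only for illustration; they are not part of the compiler or of OpenMP.

```fortran
module tile_math
  implicit none
contains
  ! Returns the number of iterations in each tile when a loop with
  ! n iterations is tiled with tile size s (assumes n >= 0, s > 0).
  ! Every tile has s iterations (a complete tile) except possibly the
  ! last one, which holds the remaining MOD(n, s) iterations (a partial tile).
  pure function tile_lengths(n, s) result(lens)
    integer, intent(in) :: n, s
    integer, allocatable :: lens(:)
    integer :: ntiles, k

    ntiles = (n + s - 1) / s          ! number of floor-loop iterations
    allocate(lens(ntiles))
    do k = 1, ntiles
      ! Clamp the last tile to the iterations that remain
      lens(k) = min(s, n - (k - 1) * s)
    end do
  end function tile_lengths
end module tile_math
```

For example, tile_lengths(20, 7) returns [7, 7, 6]: two complete tiles followed by one partial tile of 6 iterations, which corresponds to the j loop of the SIZES(8,7) example later in this topic. tile_lengths(64, 8) returns eight complete tiles of 8 iterations each.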
TILE constructs can be nested. When two TILE constructs are nested, the result is as if the outer TILE construct were applied to the loop nest produced by the inner TILE construct.
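As a hedged illustration of this rule, consider a single DO loop preceded by an outer TILE SIZES(2) and an inner TILE SIZES(8). The inner construct produces a floor loop that steps through 8-iteration tiles, with a tile loop inside it; the outer construct then tiles that floor loop, grouping the inner tiles into pairs. The sketch below writes one such result by hand in plain Fortran and records the order in which i is visited. The module nested_tile_demo and the subroutine visit_order are hypothetical, and the compiler is not required to generate exactly this code.

```fortran
module nested_tile_demo
  implicit none
contains
  ! One possible hand-written equivalent of the nested constructs
  ! (shown as plain comments, so this module compiles without OpenMP):
  !   !$OMP TILE SIZES(2)      <- outer construct: tiles the floor loop
  !   !$OMP TILE SIZES(8)      <- inner construct: tiles the i loop
  !   DO i = 1, 64
  subroutine visit_order(order)
    integer, intent(out) :: order(64)
    integer :: ii_outer, i_outer, i, pos

    pos = 0
    ! Floor loop of the outer construct: each iteration covers
    ! 2 inner tiles, that is, 16 values of i
    do ii_outer = 1, 64, 16
      ! Tile loop of the outer construct: steps over the 2 inner tiles
      do i_outer = ii_outer, ii_outer + 8, 8
        ! Tile loop of the inner construct: the 8 iterations of one tile
        do i = i_outer, i_outer + (8 - 1)
          pos = pos + 1
          order(pos) = i
        end do
      end do
    end do
  end subroutine visit_order
end module nested_tile_demo
```

Because every tile in this sketch is complete, visit_order returns 1, 2, ..., 64: nesting the constructs changes how the iterations are grouped, not which iterations run.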
Examples
In the following example, the loop iteration counts are each a multiple of their corresponding tile size, so there are no resulting partial tile loops. The PARALLEL DO construct is applied to the transformed loop nest.
The outer loop (i) iterates through the 64 rows, and the inner loop (j) iterates through the 20 columns, of a 64 x 20 matrix. The size expressions 8 and 4 specified in the SIZES clause of the TILE construct indicate an 8 x 4 blocking: a tile size of 8 for the outer i loop and 4 for the inner j loop:
INTEGER,DIMENSION (64,20) :: arr
INTEGER :: i, j
!$OMP PARALLEL DO
!$OMP TILE SIZES(8,4)
DO i = 1, 64
  DO j = 1, 20
    arr(i,j) = arr(i,j)*10
  END DO
END DO
The transformed tiled loops are the following:
INTEGER,DIMENSION (64,20) :: arr
INTEGER :: i_inner, i_outer, j_inner, j_outer
!$OMP PARALLEL DO
DO i_outer = 1, 64, 8
  DO j_outer = 1, 20, 4
    DO i_inner = i_outer, i_outer+(8-1)
      DO j_inner = j_outer, j_outer+(4-1)
        arr(i_inner,j_inner) = arr(i_inner,j_inner)*10
      END DO
    END DO
  END DO
END DO
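To confirm that this tiling leaves the result unchanged, the following sketch packages the original loop nest and the hand-tiled loops as two subroutines that can be run on copies of the same input and compared. The module tile_8x4_check and its subroutines are hypothetical. The !$OMP PARALLEL DO line takes effect only when the code is compiled with OpenMP enabled; otherwise it is treated as a comment.

```fortran
module tile_8x4_check
  implicit none
contains
  ! Original (untiled) loop nest from the example.
  subroutine scale_untiled(arr)
    integer, intent(inout) :: arr(64, 20)
    integer :: i, j
    do i = 1, 64
      do j = 1, 20
        arr(i, j) = arr(i, j) * 10
      end do
    end do
  end subroutine scale_untiled

  ! Hand-tiled equivalent of TILE SIZES(8,4): floor loops i_outer and
  ! j_outer, tile loops i_inner and j_inner. 64 and 20 are multiples
  ! of 8 and 4, so every tile is complete.
  subroutine scale_tiled(arr)
    integer, intent(inout) :: arr(64, 20)
    integer :: i_inner, i_outer, j_inner, j_outer
    !$OMP PARALLEL DO
    do i_outer = 1, 64, 8
      do j_outer = 1, 20, 4
        do i_inner = i_outer, i_outer + (8 - 1)
          do j_inner = j_outer, j_outer + (4 - 1)
            arr(i_inner, j_inner) = arr(i_inner, j_inner) * 10
          end do
        end do
      end do
    end do
  end subroutine scale_tiled
end module tile_8x4_check
```

Initializing two arrays with the same values, calling scale_untiled on one and scale_tiled on the other, and comparing them with ANY(a /= b) should find no differences, because each element is updated exactly once in both versions.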
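In the following example, the inner loop iteration count, 20, is not a multiple of its tile size, 7. Because 20 = 2 x 7 + 6, the j dimension holds two complete tiles of 7 iterations plus 6 remaining iterations (j = 15 through 20). Depending on how the loop nest is transformed, these remaining iterations may be executed by a partial tile loop: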
INTEGER,DIMENSION (64,20) :: arr
INTEGER :: i, j
!$OMP TILE SIZES(8,7)
DO i = 1, 64
  DO j = 1, 20
    arr(i,j) = arr(i,j) * 10
  END DO
END DO
In this case, various transformations are possible. The compiler is free to pick a transformation that is optimal for the target hardware. The order of execution of one tile with respect to other tiles can be changed, but within a given tile, the order of iteration execution must be preserved.
One possible transformation for the above loop nest is the following:
INTEGER,DIMENSION (64,20) :: arr
INTEGER :: i_inner, i_outer, j_inner, j_outer, j
! Complete tiles
DO i_outer = 1, 64, 8
  DO j_outer = 1, 14, 7
    DO i_inner = i_outer, i_outer + (8-1)
      DO j_inner = j_outer, j_outer + (7-1)
        arr(i_inner,j_inner) = arr(i_inner,j_inner)*10
      END DO
    END DO
  END DO
END DO
! Partial tiles
DO i_outer = 1, 64, 8
  DO i_inner = i_outer, i_outer + (8-1)
    DO j = 15, 20
      arr(i_inner,j) = arr(i_inner,j) * 10
    END DO
  END DO
END DO
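As a sanity check on this split form, the following hypothetical sketch runs the complete-tile loops and then the partial-tile loops on a counter array instead of arr, so that one can confirm every (i, j) pair of the 64 x 20 iteration space is visited exactly once. The module split_tile_check and the subroutine count_visits are illustrative names only.

```fortran
module split_tile_check
  implicit none
contains
  ! Runs the complete-tile loops and then the partial-tile loops of the
  ! SIZES(8,7) transformation, counting how often each (i, j) is visited.
  subroutine count_visits(hits)
    integer, intent(out) :: hits(64, 20)
    integer :: i_inner, i_outer, j_inner, j_outer, j

    hits = 0
    ! Complete tiles: j = 1..14 in two tiles of 7 iterations
    do i_outer = 1, 64, 8
      do j_outer = 1, 14, 7
        do i_inner = i_outer, i_outer + (8 - 1)
          do j_inner = j_outer, j_outer + (7 - 1)
            hits(i_inner, j_inner) = hits(i_inner, j_inner) + 1
          end do
        end do
      end do
    end do
    ! Partial tiles: the remaining j = 15..20 (6 iterations)
    do i_outer = 1, 64, 8
      do i_inner = i_outer, i_outer + (8 - 1)
        do j = 15, 20
          hits(i_inner, j) = hits(i_inner, j) + 1
        end do
      end do
    end do
  end subroutine count_visits
end module split_tile_check
```

After calling count_visits, every element of hits should equal 1: the complete tiles cover columns 1 through 14, and the partial tiles cover columns 15 through 20 with no overlap.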
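An equivalent transformation, which instead clamps the upper bound of the j_inner tile loop with MIN so that the last j tile (j = 15 through 20) is partial, is the following: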
INTEGER,DIMENSION (64,20) :: arr
INTEGER :: i_inner, i_outer, j_inner, j_outer
DO i_outer = 1, 64, 8
  DO j_outer = 1, 20, 7
    DO i_inner = i_outer, i_outer + (8-1)
      DO j_inner = j_outer, MIN(j_outer+(7-1),20)
        arr(i_inner,j_inner) = arr(i_inner,j_inner) * 10
      END DO
    END DO
  END DO
END DO
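The MIN-based form generalizes to any array extents and tile sizes. The following minimal sketch is a hypothetical helper, scale_tiled_any, that applies the same two-level tiling with tile sizes si and sj to an array of any shape, clamping each tile loop's upper bound so that the last tile in each dimension may be partial. It assumes si and sj are positive, as the SIZES clause requires, and it is an illustration of the technique rather than the code the compiler generates.

```fortran
module tile_any_size
  implicit none
contains
  ! Multiplies every element of arr by 10 using floor loops
  ! (i_outer, j_outer) and tile loops (i_inner, j_inner) with tile
  ! sizes si and sj. MIN clamps the tile loops, so the last tile in
  ! each dimension is partial when the extent is not a multiple of
  ! the tile size. Assumes si > 0 and sj > 0.
  subroutine scale_tiled_any(arr, si, sj)
    integer, intent(inout) :: arr(:, :)
    integer, intent(in) :: si, sj
    integer :: m, n, i_outer, j_outer, i_inner, j_inner

    m = size(arr, 1)
    n = size(arr, 2)
    do i_outer = 1, m, si
      do j_outer = 1, n, sj
        do i_inner = i_outer, min(i_outer + (si - 1), m)
          do j_inner = j_outer, min(j_outer + (sj - 1), n)
            arr(i_inner, j_inner) = arr(i_inner, j_inner) * 10
          end do
        end do
      end do
    end do
  end subroutine scale_tiled_any
end module tile_any_size
```

Calling scale_tiled_any(arr, 8, 7) on the 64 x 20 array performs the same updates as the transformation above; other size pairs, including ones larger than the array extents, should give the same final array as the untiled loop nest.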
See Also
To learn more about loops in canonical form, see the OpenMP* specification.