Intel® Fortran Compiler Classic and Intel® Fortran Compiler Developer Guide and Reference

ID 767251
Date 6/24/2024
Public

DO Directive

OpenMP* Fortran Compiler Directive: Specifies that the iterations of the immediately following DO loop must be executed in parallel.

Syntax

!$OMP DO [clause[[,] clause] ... ]

   do_loop

[!$OMP END DO [NOWAIT]]

clause

Is one of the following:

  • ALLOCATE ([allocator :] list)

  • COLLAPSE (n)

  • FIRSTPRIVATE (list)

  • LASTPRIVATE ([CONDITIONAL:] list)

  • LINEAR (linear-list [: linear-modifier [, linear-modifier]])

  • NOWAIT

  • ORDER ([order-modifier :] CONCURRENT) (ifx only)

  • ORDERED [ (n) ]

    Must be used if ordered sections are contained in the dynamic extent of the DO directive. For more information about ordered sections, see the ORDERED directive.

    If n is specified, it must be a positive scalar integer constant expression.

    The ORDERED clause must not appear in the worksharing-loop (DO) directive if the loops associated with the worksharing-loop construct include loops generated as the result of a TILE directive, or if the ORDER clause is specified.

  • PRIVATE (list)

  • REDUCTION ([reduction-modifier, ]reduction-identifier : list)

    If the REDUCTION clause contains the INSCAN reduction-modifier, the DO directive must not contain an ORDERED or a SCHEDULE clause.

  • SCHEDULE ([modifier [, modifier]:] kind[, chunk_size])

    Specifies how iterations of the DO loop are divided among the threads of the team. chunk_size must be a loop invariant positive scalar integer expression. The value of chunk_size must be the same for all threads in the team. The following kinds are permitted, only some of which allow the optional parameter chunk_size:

    STATIC

    Divides the iterations into contiguous pieces by dividing the number of iterations by the number of threads in the team. Each piece is then dispatched to a thread before loop execution begins.

    If chunk_size is specified, the iterations are divided into pieces of size chunk_size. The pieces are statically dispatched to the threads of the team in a round-robin fashion in the order of the thread number.

    DYNAMIC

    Threads obtain sets of iterations dynamically: as each thread finishes a piece of the iteration space, it gets the next available piece. chunk_size defaults to 1 unless it is specified.

    If chunk_size is specified, the iterations are broken into pieces of size chunk_size.

    GUIDED

    Similar to DYNAMIC, except that the size of each dispatched piece is reduced exponentially with each succeeding dispatch. chunk_size specifies the minimum number of iterations to dispatch each time; it defaults to 1 unless it is specified. If fewer than chunk_size iterations remain, the rest are dispatched.

    AUTO (no chunk_size is permitted)

    Delegates the scheduling decision to the compiler and/or the run-time system, which are free to choose any possible mapping of iterations to threads in the team. The resulting schedule is implementation dependent.

    RUNTIME (no chunk_size is permitted)

    Defers the scheduling decision until run time. You can choose a schedule kind and chunk size at run time by using the OMP_SCHEDULE environment variable.

    At most one SCHEDULE clause can appear. If the SCHEDULE clause is not used, the default schedule type is STATIC.

    modifier can be one of the following:

    MONOTONIC

    Each thread executes the chunks that it is assigned in increasing logical iteration order.

    NONMONOTONIC (can only be specified with the DYNAMIC or GUIDED schedule kind)

    Chunks are assigned to threads in any order, and the behavior of an application that depends on any execution order of the chunks is unspecified.

    SIMD

    When do_loop is associated with an OMP SIMD construct, the chunk_size for all chunks except the first and last chunks is:

    new_chunk_size = (chunk_size / simd_width) * simd_width

    where simd_width is an implementation-defined value. For example, if chunk_size is 10 and simd_width is 4, new_chunk_size is 8.

    The first chunk will have at least new_chunk_size iterations unless it is also the last chunk. The last chunk may have fewer iterations than new_chunk_size.

    If SIMD is specified and the loop is not associated with an OMP SIMD construct, the modifier is ignored.

    If the schedule kind is STATIC or if the ORDERED clause appears, and MONOTONIC does not appear, the effect is as if MONOTONIC were specified. NONMONOTONIC cannot be specified if the ORDERED clause appears. Either MONOTONIC or NONMONOTONIC can appear, but not both.

    modifier cannot appear if the LINEAR clause appears.

    The SIMD modifier can be used with MONOTONIC or NONMONOTONIC in either order. The SIMD modifier and the MONOTONIC modifier can be used with all kinds.
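
    For illustration, the following sketch requests a DYNAMIC schedule with the NONMONOTONIC modifier and a chunk size of 4; the arrays A and B and the bound N are placeholders, not part of the directive syntax:

      !$OMP PARALLEL
      !$OMP DO SCHEDULE(NONMONOTONIC: DYNAMIC, 4)
            DO I=1,N
              A(I) = A(I) + B(I)
            END DO
      !$OMP END DO
      !$OMP END PARALLEL

    Chunks of four iterations are handed to threads as they become idle, and no particular assignment order of chunks to threads may be assumed.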

do_loop

Is an iterative DO loop. It cannot be a DO WHILE loop or a DO loop without loop control. The DO loop iteration variable must be of type integer.

The iterations of the DO loop are distributed across the existing team of threads. The values of the loop control parameters of the DO loop associated with a DO directive must be the same for all the threads in the team.

You cannot branch out of a DO loop associated with a DO directive.

The binding thread set for a DO construct is the current team. A DO loop region binds to the innermost enclosing parallel region.

If used, the END DO directive must appear immediately after the end of the loop. If you do not specify an END DO directive, an END DO directive is assumed at the end of the DO loop.

If you specify NOWAIT, threads do not synchronize at the end of the parallel loop. Threads that finish early proceed straight to the instruction following the loop without waiting for the other members of the team to finish the DO directive.

Parallel DO loop control variables are block-level entities within the DO loop. If the loop control variable also appears in the LASTPRIVATE list of the parallel DO, it is copied out to a variable of the same name in the enclosing PARALLEL region. The variable in the enclosing PARALLEL region must be SHARED if it is specified in the LASTPRIVATE list of a DO directive.

At most one SCHEDULE clause, one COLLAPSE clause, and one ORDERED clause can appear in a DO directive.

ORDERED (n) specifies how many loops are associated with the DO directive and it specifies that those associated loops form a doacross loop nest. n does not affect how the logical iteration space is divided.

If you specify COLLAPSE (M) ORDERED (N) for loops nested K deep, the following rules apply:

  • If either M > K or N > K, the behavior is unspecified.

  • N must be greater than M.

A LINEAR clause or an ORDERED (n) clause can be specified in a DO directive, but not both.
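
As an illustration of the ORDERED (n) form, the following sketch uses ORDERED (2) together with ORDERED directives carrying DEPEND clauses to express a doacross dependence between iterations; the array A and the bounds N and M are placeholders:

  !$OMP PARALLEL
  !$OMP DO ORDERED(2)
        DO I=2,N
          DO J=2,M
  !$OMP ORDERED DEPEND(SINK: I-1,J) DEPEND(SINK: I,J-1)
            A(I,J) = A(I-1,J) + A(I,J-1)
  !$OMP ORDERED DEPEND(SOURCE)
          END DO
        END DO
  !$OMP END DO
  !$OMP END PARALLEL

Each iteration waits until the iterations named in its DEPEND(SINK) clauses have passed their DEPEND(SOURCE) point, so the updates to A proceed in a wavefront order.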

A DO directive must be encountered by all threads in a team or by none at all. It must also be encountered in the same order by all threads in the team.

Example

In the following example, the loop iteration variable is private by default, and it is not necessary to explicitly declare it. The END DO directive is optional:

  !$OMP PARALLEL
  !$OMP DO
        DO I=2,N
          B(I) = (A(I) + A(I-1)) / 2.0
        END DO
  !$OMP END DO
  !$OMP END PARALLEL

If there are multiple independent loops within a parallel region, you can use the NOWAIT keyword in the END DO directive, or the NOWAIT clause in the DO directive, to avoid the implied BARRIER at the end of the DO directive, as follows:

  !$OMP PARALLEL
  !$OMP DO
        DO I=2,N
          B(I) = (A(I) + A(I-1)) / 2.0
        END DO
  !$OMP END DO NOWAIT
  !$OMP DO
        DO I=1,M
          Y(I) = SQRT(Z(I))
        END DO
  !$OMP END DO NOWAIT
  !$OMP END PARALLEL
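
Equivalently, the NOWAIT clause can be placed on the DO directive itself. The following sketch rewrites the second loop above in that form; the optional END DO directive is omitted:

  !$OMP DO NOWAIT
        DO I=1,M
          Y(I) = SQRT(Z(I))
        END DO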

Correct execution sometimes depends on the value that the last iteration of a loop assigns to a variable. Such programs must list all such variables as arguments to a LASTPRIVATE clause so that the values of the variables are the same as when the loop is executed sequentially, as follows:

  !$OMP PARALLEL
  !$OMP DO LASTPRIVATE(I)
        DO I=1,N
          A(I) = B(I) + C(I)
        END DO
  !$OMP END PARALLEL
        CALL REVERSE(I)

In this case, the value of I at the end of the parallel region equals N+1, as in the sequential case.

Ordered sections are useful for sequentially ordering the output from work that is done in parallel. Assuming that a reentrant I/O library exists, the following program prints out the indexes in sequential order:

  !$OMP DO ORDERED SCHEDULE(DYNAMIC)
        DO I=LB,UB,ST
          CALL WORK(I)
        END DO
        ...
        SUBROUTINE WORK(K)
  !$OMP ORDERED
        WRITE(*,*) K
  !$OMP END ORDERED
        END SUBROUTINE WORK

In the next example, the loops over J1 and J2 are collapsed and their combined iteration space is divided among the threads of the current team:

!$OMP DO COLLAPSE(2) PRIVATE(J1, J2, J3)
    DO J1 = J1_L, J1_U, J1_S
        DO J2 = J2_L, J2_U, J2_S
            DO J3 = J3_L, J3_U, J3_S
                CALL BAR(A, J1, J2, J3)
            ENDDO
        ENDDO
    ENDDO
!$OMP END DO
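
The REDUCTION clause with the INSCAN reduction-modifier, described above, computes a prefix (scan) reduction; the loop body must contain a SCAN directive that separates the reduction update from the use of the running value. The following is a minimal sketch of an inclusive prefix sum; the arrays A and B, the bound N, and the scalar S are placeholders:

        S = 0.0
  !$OMP PARALLEL
  !$OMP DO REDUCTION(INSCAN, +: S)
        DO I=1,N
          S = S + A(I)
  !$OMP SCAN INCLUSIVE(S)
          B(I) = S
        END DO
  !$OMP END DO
  !$OMP END PARALLEL

After the loop, B(I) holds the sum of A(1) through A(I) for each I.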