Intel® Advisor User Guide

ID 766448
Date 3/22/2024
Public

Parallelize Data - OpenMP Counted Loops

When tasks are loop iterations, and the iterations are over a range of values that are known before the loop starts, the loop is easily expressed in OpenMP.

Consider the following annotated serial C/C++ loop:

    ANNOTATE_SITE_BEGIN(sitename);
        for (int i = lo; i < hi; ++i) {
            ANNOTATE_ITERATION_TASK(taskname);
                statement;
        }
    ANNOTATE_SITE_END();

OpenMP makes it easy to introduce parallelism into loops. With C or C++ programs, replace the annotations with the omp parallel for pragma, placed immediately before the C/C++ for statement:

    ...
    #pragma omp parallel for
    for (int i = lo; i < hi; ++i) {
        statement;
    }
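
As a concrete illustration of the pattern above, the following minimal sketch replaces the placeholder statement with an element-wise array update. The function name scale_array and its parameters are hypothetical and chosen only for this example. The sketch assumes that each iteration writes a distinct array element, so the iterations are independent of one another.

```c
/* Hypothetical helper: multiplies each of the n elements of data by factor. */
void scale_array(double *data, int n, double factor)
{
    /* The trip count n is fixed before the loop starts, so this is a
       counted loop that OpenMP can divide among threads. */
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        data[i] *= factor;  /* each iteration touches only its own element */
    }
}
```

If the code is compiled without OpenMP support enabled, the compiler typically ignores the pragma and the loop runs serially with the same result.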

Consider the following annotated serial Fortran loop:

    call annotate_site_begin("sitename")
        do i = 1, N
            call annotate_iteration_task("taskname")
            statement
        end do
    call annotate_site_end

With Fortran programs, add the !$omp parallel do directive immediately before the Fortran do statement:

    ...
    !$omp parallel do
    do i = 1, N
        statement
    end do
    !$omp end parallel do

TIP:
After you rewrite your code to use the OpenMP* parallel framework, you can analyze its performance with Intel® Advisor perspectives. Use the Vectorization and Code Insights perspective to analyze how well your OpenMP code is vectorized, or use the Offload Modeling perspective to model its performance on a GPU.

The OpenMP support in the compiler encapsulates the parallel construct, so you do not write the thread-management code yourself. The data-sharing rules for local variables used in the loop can be left at their defaults or specified explicitly as clauses on the pragma or directive. The loop control variable is private by default, so each iteration sees its allotted value.
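
The following minimal sketch shows one way to specify the data-sharing rules explicitly instead of relying on the defaults. The function name sum_of_squares and the variable names are hypothetical and chosen only for this example. It uses the standard OpenMP clauses default(none), shared, private, and reduction. The loop control variable i is not listed because it is private automatically.

```c
/* Hypothetical helper: returns the sum of squares of values[0..n-1]. */
double sum_of_squares(const double *values, int n)
{
    double total = 0.0;
    double sq;  /* scratch variable declared outside the loop */

    /* default(none) requires every variable referenced in the loop to be
       given an explicit data-sharing attribute:
         shared(values, n)   - read by all threads, never written
         private(sq)         - each thread gets its own scratch copy
         reduction(+:total)  - per-thread partial sums are combined at the end */
    #pragma omp parallel for default(none) shared(values, n) private(sq) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        sq = values[i] * values[i];
        total += sq;
    }
    return total;
}
```

Without the private(sq) clause, sq would be shared by default and the threads would race on it. Without the reduction clause, the same would be true of the updates to total.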