Intel® Advisor User Guide

ID 766448
Date 10/31/2024
Public
Replace Annotations with OpenMP* Code

This topic explains how to implement the parallelism proposed by your Intel Advisor annotations by replacing them with OpenMP* parallel framework code.

  • Add OpenMP code to provide appropriate synchronization of shared resources, using the LOCK annotations as a guide.

  • Add code to create OpenMP tasks, using the SITE/TASK annotations as a guide.

The recommended order for replacing the annotations with OpenMP code:

  1. Add appropriate synchronization of shared resources, using LOCK annotations as a guide.

  2. Test the program to verify that the synchronization changes did not break anything, before parallel tasks introduce the possibility of non-deterministic behavior.

  3. Add code to create OpenMP parallel sections or equivalent, using the SITE/TASK annotations as a guide.

  4. Test with one thread to verify that your program still works correctly. For example, set the environment variable OMP_NUM_THREADS to 1 before you run your program.

  5. Test with more than one thread to see that the multithreading works as expected.
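
Steps 4 and 5 are easier to carry out when the program can check its own output against a serial reference. The following is a minimal sketch in C, not part of Intel Advisor: the names scale_serial, scale_parallel, and outputs_match, the array size, and the kernel itself are hypothetical stand-ins for your own annotated loop.

```c
#define N 1000  /* hypothetical problem size */

/* Serial reference version of the loop that was annotated as a site. */
void scale_serial(const double *x, double *y, int n, double a) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i];
}

/* Same loop with the annotations replaced by an OpenMP work-sharing loop.
   Each iteration writes a different element, so no synchronization is needed. */
void scale_parallel(const double *x, double *y, int n, double a) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i];
}

/* Returns 1 when the parallel output equals the serial output, 0 otherwise. */
int outputs_match(void) {
    static double x[N], y_ref[N], y_par[N];
    for (int i = 0; i < N; ++i)
        x[i] = 0.5 * i;
    scale_serial(x, y_ref, N, 3.0);
    scale_parallel(x, y_par, N, 3.0);
    for (int i = 0; i < N; ++i)
        if (y_ref[i] != y_par[i])
            return 0;
    return 1;
}
```

Build with your compiler's OpenMP option enabled (for example, -fopenmp with GCC or -qopenmp with Intel compilers), run the check once with OMP_NUM_THREADS set to 1, and then repeat it with a larger value. If the option is omitted, the pragma is ignored and both functions run serially.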

TIP:
After you rewrite your code to use the OpenMP* parallel framework, you can analyze its performance with Intel® Advisor perspectives. Use the Vectorization and Code Insights perspective to analyze how well your OpenMP code is vectorized, or use the Offload Modeling perspective to model its performance on a GPU.

OpenMP creates worker threads automatically. In general, concern yourself only with the tasks, and let the parallel framework create and destroy the worker threads.

If you do need some control over creation and destruction of worker threads, see the compiler documentation. For example, to limit the number of threads, set the OMP_THREAD_LIMIT or the OMP_NUM_THREADS environment variable.
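
Besides the environment variables, the OpenMP num_threads clause can request a team size for a single parallel region from within the code. The sketch below assumes a hypothetical helper, fill_squares, and a hypothetical self-check, fill_squares_ok; neither comes from Intel Advisor. The runtime may still provide fewer threads than requested, for example when OMP_THREAD_LIMIT is lower.

```c
/* Fill out[i] = i * i, requesting at most max_threads threads for this loop.
   max_threads must be a positive integer. */
void fill_squares(int *out, int n, int max_threads) {
    #pragma omp parallel for num_threads(max_threads)
    for (int i = 0; i < n; ++i)
        out[i] = i * i;
}

/* Returns 1 when every element holds the expected square, 0 otherwise. */
int fill_squares_ok(int max_threads) {
    int out[8];
    fill_squares(out, 8, max_threads);
    for (int i = 0; i < 8; ++i)
        if (out[i] != i * i)
            return 0;
    return 1;
}
```

Calling fill_squares_ok(1) exercises the single-thread case from step 4 without changing the environment; fill_squares_ok(2) or higher exercises the multithreaded case.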

The table below shows the serial, annotated program code in the left column and the equivalent OpenMP C/C++ and Fortran parallel code in the right column for some typical code to which parallelism can be applied.

Serial C/C++ and Fortran Code with Intel Advisor Annotations | Parallel C/C++ and Fortran Code using OpenMP
// Synchronization, C/C++
ANNOTATE_LOCK_ACQUIRE(0);
  Body();
ANNOTATE_LOCK_RELEASE(0);
// Synchronization can use OpenMP 
// critical sections, atomic operations, locks, 
// and reduction operations (shown later)
! Synchronization, Fortran
call annotate_lock_acquire(0)
    body
call annotate_lock_release(0)
! Synchronization can use OpenMP
! critical sections, atomic operations, locks,
! and reduction operations (shown later)
// Parallelize data - one task within a
// C/C++ counted loop
ANNOTATE_SITE_BEGIN(site);
  for (i = lo; i < n; ++i) {
    ANNOTATE_ITERATION_TASK(task);
      statement;
  }
ANNOTATE_SITE_END();
// Parallelize data - one task, C/C++ counted loops
#pragma omp parallel for
for (int i = lo; i < n; ++i) {
  statement;
}
! Parallelize data - one task within a
! Fortran counted loop
call annotate_site_begin("site1")
do i = 1, N
  call annotate_iteration_task("task1")
    statement
end do
call annotate_site_end
! Parallelize data - one task within a
! Fortran counted loop
!$omp parallel do
do i = 1, N
  statement
end do
!$omp end parallel do

// Parallelize C/C++ functions
ANNOTATE_SITE_BEGIN(site);
  ANNOTATE_TASK_BEGIN(task1);
    function_1();
  ANNOTATE_TASK_END();
  ANNOTATE_TASK_BEGIN(task2);
    function_2();
  ANNOTATE_TASK_END();
ANNOTATE_SITE_END();
// Parallelize C/C++ functions
#pragma omp parallel // start parallel region
{
  #pragma omp sections
  {
    #pragma omp section
      function_1();
    #pragma omp section
      function_2();
  }
} // end parallel region
! Parallelize Fortran subroutines
call annotate_site_begin("site1")
call annotate_task_begin("task1")
   call subroutine_1
call annotate_task_end
call annotate_task_begin("task2")
   call subroutine_2
call annotate_task_end
call annotate_site_end
! Parallelize Fortran subroutines
!$omp parallel ! start parallel region
  !$omp sections
   !$omp section
    call subroutine_1
   !$omp section
    call subroutine_2
  !$omp end sections
!$omp end parallel ! end parallel region
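
The synchronization rows in the table name critical sections, atomic operations, and reductions as possible replacements for the LOCK annotations. The following C sketch shows one way each could look for a shared counter. The function names (count_critical, count_atomic, count_reduction, sync_variants_agree) and the sample data are hypothetical, and which construct fits best depends on what the locked region in your code does.

```c
/* Count elements greater than limit. The critical section plays the role of
   the region between ANNOTATE_LOCK_ACQUIRE(0) and ANNOTATE_LOCK_RELEASE(0). */
int count_critical(const int *v, int n, int limit) {
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (v[i] > limit) {
            #pragma omp critical
            { ++count; }
        }
    }
    return count;
}

/* Same count, protecting the single update with an atomic operation. */
int count_atomic(const int *v, int n, int limit) {
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (v[i] > limit) {
            #pragma omp atomic
            ++count;
        }
    }
    return count;
}

/* Same count, using a reduction: each thread accumulates a private copy
   that OpenMP combines when the loop ends. */
int count_reduction(const int *v, int n, int limit) {
    int count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = 0; i < n; ++i) {
        if (v[i] > limit)
            ++count;
    }
    return count;
}

/* Returns 1 when all three variants give the expected count of 3. */
int sync_variants_agree(void) {
    const int v[6] = {5, 12, 7, 30, 1, 18};  /* 12, 30, and 18 exceed 10 */
    return count_critical(v, 6, 10) == 3
        && count_atomic(v, 6, 10) == 3
        && count_reduction(v, 6, 10) == 3;
}
```

A critical section can protect an arbitrary block of statements, an atomic operation covers only a single simple update, and a reduction avoids contention on the shared variable altogether. Fortran offers the corresponding !$omp critical and !$omp atomic directives and the reduction clause.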