Intel® Advisor User Guide

ID 766448
Date 3/22/2024
Public

Parallelize Data - Intel® oneAPI Threading Building Blocks (oneTBB) Loops with Complex Iteration Control

Sometimes the loop control is spread across complex control flow. Using Intel® oneAPI Threading Building Blocks (oneTBB) in this situation requires more features than simple loops do. Note that the task body must not access any of the automatic (local) variables defined within the annotation site, because they may be destroyed before or while the task is running. Consider this serial code:

 extern char a[];
 int previousEnd = -1;
 ANNOTATE_SITE_BEGIN(sitename);
 for (int i=0; i<=100; i++) {
     if (!a[i] || i==100) {
         ANNOTATE_TASK_BEGIN(do_something);
         DoSomething(previousEnd+1,i);
         ANNOTATE_TASK_END();
         previousEnd=i;
     }
 }
 ANNOTATE_SITE_END();

In general, counted loops have better scalability than loops with complex iteration control, because the complex control is inherently sequential. Consider reformulating your code as a counted loop if possible.

The annotated serial loop above can be parallelized with the task_group feature of oneTBB:

 
 #include <tbb/tbb.h>
 ...
 extern char a[];
 int previousEnd = -1;
 tbb::task_group g;
 for (int i=0; i<=100; i++) {
     if (!a[i] || i==100) {
         // Capture i and previousEnd by value; both change after this point
         g.run([=]{ DoSomething(previousEnd+1,i); });
         previousEnd=i;
     }
 }
 g.wait(); // Wait until all tasks in the group finish
 

Here the lambda expression captures by value ([=]) so that it copies the values of i and previousEnd at the moment the functor is constructed. This matters because both variables change afterwards, while the task itself may run at any later time before g.wait() returns.
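The effect of the capture mode can be seen without oneTBB. The following is a minimal sketch, not oneTBB code: a std::vector of std::function objects stands in for tasks that run later, and the function names RunDeferredByValue and RunDeferredByReference are hypothetical. Each deferred task returns the (first, last) pair it would have passed to DoSomething.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using Range = std::pair<int, int>;

// Capture by value: each lambda stores its own copies of previousEnd and i,
// taken at the moment the lambda is created.
// 'a' must have at least n+1 elements, as in the original loop bounds.
std::vector<Range> RunDeferredByValue(const char* a, int n) {
    assert(a != nullptr && n >= 0);
    std::vector<std::function<Range()>> tasks;
    int previousEnd = -1;
    for (int i = 0; i <= n; i++) {
        if (!a[i] || i == n) {
            tasks.push_back([=] { return std::make_pair(previousEnd + 1, i); });
            previousEnd = i;
        }
    }
    // Run the tasks only after the loop has finished, as a scheduler might.
    std::vector<Range> results;
    for (auto& task : tasks) results.push_back(task());
    return results;
}

// Capture by reference: every lambda reads the live variables when it runs,
// so all of them observe the values left behind after the loop.
std::vector<Range> RunDeferredByReference(const char* a, int n) {
    assert(a != nullptr && n >= 0);
    std::vector<std::function<Range()>> tasks;
    int previousEnd = -1;
    int i = 0;  // declared outside the loop so the references remain valid below
    for (; i <= n; i++) {
        if (!a[i] || i == n) {
            tasks.push_back([&] { return std::make_pair(previousEnd + 1, i); });
            previousEnd = i;
        }
    }
    // At this point i == n + 1 and previousEnd == n.
    std::vector<Range> results;
    for (auto& task : tasks) results.push_back(task());
    return results;
}
```

In this sketch the by-reference version is kept well-defined only because i is declared outside the loop and the tasks run before the function returns. With real concurrent tasks, reference captures of these variables would additionally race with the loop that modifies them.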

For more information on tbb::task_group, see the oneTBB documentation.
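Returning to the earlier advice about counted loops, one possible reformulation is a two-phase approach: a short serial pass records the segment boundaries, and the work then runs over a plain counted loop. The sketch below uses only the C++ standard library; the names CollectSegments and ForEachSegment are hypothetical, and whether the extra pass pays off depends on how expensive DoSomething is relative to the scan.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Phase 1 (serial): scan once and record each segment as a (first, last)
// index pair. A segment ends at a zero element or at index n.
// 'a' must have at least n+1 elements, as in the original loop bounds.
std::vector<std::pair<int, int>> CollectSegments(const char* a, int n) {
    assert(a != nullptr && n >= 0);
    std::vector<std::pair<int, int>> segments;
    int previousEnd = -1;
    for (int i = 0; i <= n; i++) {
        if (!a[i] || i == n) {
            segments.emplace_back(previousEnd + 1, i);
            previousEnd = i;
        }
    }
    return segments;
}

// Phase 2: a counted loop over the recorded segments. The iteration count is
// known up front and iterations do not depend on one another, provided the
// body itself is independent across segments.
template <typename Body>
void ForEachSegment(const std::vector<std::pair<int, int>>& segments, Body body) {
    for (std::size_t k = 0; k < segments.size(); ++k) {
        body(segments[k].first, segments[k].second);
    }
}
```

A loop of the Phase 2 shape is the kind that a counted-loop construct such as tbb::parallel_for can distribute across threads, since the sequential boundary detection has been moved out of it.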