Intel® Advisor User Guide

ID 766448
Date 3/22/2024
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Data and Task Parallelism

This topic describes two fundamental types of program execution - data parallelism and task parallelism - and the task patterns of each.

Data Parallelism

In many programs, most of the work is done processing items in a collection of data, often in a loop. The data parallelism pattern is designed for this situation. The idea is to process each data item or a subset of the data items in separate task instances. In general, the parallel site contains the code that invokes the processing of each data item, and the processing is done in a task.

In the most common version of this pattern, the serial program has a loop that iterates over the data items, and the loop body processes each item in turn. The data parallelism pattern makes the whole loop a parallel site, and the loop body is a task. Consider this C/C++ simple loop:

ANNOTATE_SITE_BEGIN(sitename);
for (int I = 0; I != n; ++I) {
    ANNOTATE_ITERATION_TASK(task_process);
    process(a[i]);
}
ANNOTATE_SITE_END();

The following C/C++ code shows a situation where the data items to be processed are in the nodes of a tree. The recursive tree walk is part of the serial execution of the parallel site - only the process_node calls are executed in separate tasks.

ANNOTATE_SITE_BEGIN(sitename);
process_subtree(root);
ANNOTATE_SITE_END(sitename);
. . .
void process_subtree(node) // in the dynamic extent of the parallel site
{
    ANNOTATE_TASK_BEGIN(task_process);
    process_node(node);
    ANNOTATE_TASK_END();
    for (child = first_child(node);
         child;
         child = next_child(child) )
    {
        process_subtree(child);
    }
}

In the data parallelism pattern, the parallel site usually contains a single task.

The sample tachyon_Advisor demonstrates data parallelism.

Task Parallelism

When work is divided into several activities which you cannot parallelize individually, you may be able to take advantage of the task parallelism pattern.

NOTE:

The word task in task parallelism is used in the general sense of an activity or job. It is just a coincidence that we use the same word to refer to "a body of code that is executed independently of other bodies of code".

In this pattern, you have multiple distinct task bodies in a parallel site performing different activities at the same time.

Suppose that neither the display nor the update operation from the previous example can be parallelized individually. You still might be able to do the display and the update simultaneously. Consider this C/C++ code:

initialize(data);
while (!done) {
    old_data = data;
    ANNOTATE_SITE_BEGIN(sitename);
    ANNOTATE_TASK_BEGIN(task_display);
    display_on_screen(old_data);
    ANNOTATE_TASK_END();
    ANNOTATE_TASK_BEGIN(task_updatedata);
    update(data);
    ANNOTATE_TASK_END();
    ANNOTATE_SITE_END();
}

The most obvious shortcoming of the task-parallel pattern is that it cannot take advantage of more cores than the number of distinct tasks. In this example, any more than two cores would be wasted. On the other hand, the task parallel pattern may be applicable to programs that simply do not fit the data parallel pattern - some parallelism may be better than none.

The tasks used in task parallelism are not limited to called functions. For example, consider this C/C++ code that creates two tasks that separately increment variables X and Y:

main() {
ANNOTATE_SITE_BEGIN(sitename);
  ANNOTATE_TASK_BEGIN(task_x);     
     X++;
  ANNOTATE_TASK_END();

  ANNOTATE_TASK_BEGIN(task_y);
     Y++;
  ANNOTATE_TASK_END();
ANNOTATE_SITE_END(); 
}

The sample stats demonstrates task parallelism.