Intel® Advisor User Guide

ID 766448
Date 3/22/2024
Public

Parallel Processing Terminology

A serial (non-parallel) program uses a single thread, so you do not need to control the side effects that can occur when multiple threads interact with shared resources.

A serial program uses only a single core, so running it on a system with multiple cores does not reduce its run time. However, if you add parallel processing (parallelism) to parts of the program, those parts can use more cores, so the program can finish sooner.

Threads and Tasks

An operating system process has an address space, open files, and other resources. A thread executes instructions within a process. Each process has one or more threads active at a time. Threads share the address space of the process, but each thread has its own stack, program counter, and other registers. A program that uses multiple threads is called a multithreaded or parallel program.

A task is a portion of a program that can be run in parallel with other portions of the program and other instances of that task. Each task instance is run by a thread, and the operating system assigns threads to cores.
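The relationship between a process and its threads can be illustrated with a minimal C++ sketch. It assumes the standard std::thread library; the helper name run_threads and the squared-ID computation are hypothetical and chosen only for illustration. The vector is shared through the process address space, while each thread keeps its own local variable on its own stack.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Intel Advisor): starts num_threads threads
// that each write one element of a vector shared through the process address space.
std::vector<int> run_threads(int num_threads) {
    assert(num_threads > 0);
    std::vector<int> shared(num_threads, 0);  // visible to every thread
    std::vector<std::thread> threads;
    for (int id = 0; id < num_threads; ++id) {
        threads.emplace_back([&shared, id] {
            int local = id * id;  // private: lives on this thread's own stack
            shared[id] = local;   // each thread writes a distinct element, so no data race
        });
    }
    for (std::thread& t : threads) {
        t.join();  // wait for every thread to finish before reading the results
    }
    return shared;
}
```

For example, run_threads(4) starts four threads. The operating system decides which core runs each thread.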

Hotspots - Find Where a Program Spends Its Time

A hotspot is a small code region that consumes much of the program's run time. You can use profiling tools such as the Survey tool provided with Intel Advisor to identify where your program spends its time. To improve your program's performance when you add parallelism:

  • Find the hotspots and hot parts of the call tree, such as hot loops or hot routines. The Intel Advisor Survey tool's report provides an extended top-down call tree that identifies the top hot loops.

  • Examine all the functions in the call tree from main() to each hot routine or loop. You want to distribute frequently executed instructions to different tasks that can run at the same time.

Data and Task Parallelism

If the hot part of the call tree is caused by executing the same region of code many times, it may be possible to divide its execution by running multiple instances of its code, each on a separate core. This is called data parallelism because each execution is processing different parts of the same composite data item. Compute-intensive loops over arrays are often good candidates for data parallelism. For example, the line process(a[i]); below is a possible task:

 for (int i = 0; i != n; ++i) {
    process(a[i]);
 }
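One possible way to run the loop above in parallel is to split the array into contiguous chunks and give each chunk to its own thread. The following is a minimal sketch using std::thread, not the method Intel Advisor prescribes. The body of process(), the function name process_all, and the num_threads parameter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the guide's process(): doubles one element.
void process(int& x) { x *= 2; }

// Splits a into contiguous chunks and processes each chunk on its own thread.
void process_all(std::vector<int>& a, unsigned num_threads) {
    assert(num_threads > 0);
    const std::size_t n = a.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // no elements left for this thread
        workers.emplace_back([&a, begin, end] {
            for (std::size_t i = begin; i != end; ++i) {
                process(a[i]);  // each call touches a different element
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();  // wait for all chunks to finish
    }
}
```

This split is safe only because each call to process() works on a different element and does not depend on any other iteration. A loop whose iterations share data needs additional synchronization or restructuring.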

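If two or more hotspots are close to each other in the serial execution, and do not share data, it may be possible to execute the hotspots as separate tasks that run at the same time. This is task parallelism. For example, in the loop below, display_on_screen(old_data) and update(data) work on separate copies of the data, so they are candidates to run as two tasks: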

 initialize(data); 
 while (!done) {
    old_data = data;
    display_on_screen(old_data);
    update(data);
 }
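The loop above could be restructured so that the display and update steps overlap. The following is a minimal sketch using std::async, offered as one possible approach rather than a definitive implementation. The bodies of display_on_screen() and update(), the run_loop wrapper, and the fixed steps count (standing in for the done flag) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in: "displays" a frame by returning the sum of its values.
int display_on_screen(const std::vector<int>& frame) {
    int sum = 0;
    for (int v : frame) sum += v;  // placeholder for real drawing work
    return sum;
}

// Hypothetical stand-in: advances the data by one step.
void update(std::vector<int>& data) {
    for (int& v : data) ++v;
}

// Runs a fixed number of iterations and returns what was displayed each time.
std::vector<int> run_loop(std::vector<int> data, int steps) {
    assert(steps >= 0);
    std::vector<int> displayed;
    for (int s = 0; s < steps; ++s) {
        const std::vector<int> old_data = data;  // snapshot, so the tasks share no data
        // Task 1: display the snapshot on another thread.
        std::future<int> shown = std::async(std::launch::async, [&old_data] {
            return display_on_screen(old_data);
        });
        // Task 2: update the live data on this thread in the meantime.
        update(data);
        displayed.push_back(shown.get());  // wait for the display task to finish
    }
    return displayed;
}
```

The copy into old_data is what makes the two tasks independent: the display task reads only the snapshot, while update() writes only the live data.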

Making effective use of multiple cores may require both data parallelism, to process large amounts of data, and task parallelism, to overlap the execution of unrelated portions of the program.