Developer Guide and Reference

ID 767253
Date 10/31/2024

Control Thread Allocation

The KMP_HW_SUBSET and KMP_AFFINITY environment variables allow you to control how the OpenMP* runtime uses the hardware threads on the processors. These environment variables allow you to try different thread distributions on the cores of the processors and determine how these threads are bound to the cores. You can use the environment variables to work out what is optimal for your application.

The KMP_HW_SUBSET variable controls the allocation of hardware resources and the KMP_AFFINITY variable controls how the OpenMP threads are bound to those resources.

Control Thread Distribution

The KMP_HW_SUBSET variable controls the hardware resources that will be used by the program. This variable often specifies three layers of machine topology: the number of sockets to use, how many cores to use per socket, and how many threads to use per core. For example, KMP_HW_SUBSET=2s,12c,2t means to use two sockets, 12 cores per socket, and two threads per core, giving a total of 48 available hardware threads.
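
Because each count applies per unit of the layer above it, the total is the product of the counts. The following Python sketch makes that arithmetic concrete. It is not the OpenMP runtime's parser. The helpers parse_subset and total_hw_threads are hypothetical, and the sketch assumes each item is only a count followed by layer letters, with no attributes or other extended syntax.

```python
# Simplified model of a KMP_HW_SUBSET value such as "2s,12c,2t".
# Assumption: every item is "<count><layer letters>"; the real runtime
# accepts more syntax (attributes, "*", and so on) than this sketch.
import re
from math import prod

_ITEM = re.compile(r"^(\d+)([A-Za-z]+)$")

def parse_subset(value):
    """Split "2s,12c,2t" into [(2, "s"), (12, "c"), (2, "t")]."""
    items = []
    for token in value.split(","):
        match = _ITEM.match(token.strip())
        if match is None:
            raise ValueError(f"unsupported item: {token!r}")
        items.append((int(match.group(1)), match.group(2).lower()))
    return items

def total_hw_threads(value):
    """Each count is per unit of the layer above it, so multiply them."""
    return prod(count for count, _ in parse_subset(value))

print(total_hw_threads("2s,12c,2t"))  # 48
```

The same function returns 64 for "2s,2n,8c,2t".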

When more layers exist in the machine topology (NUMA domain, tile, and so on), you can specify those layers as well. For example, KMP_HW_SUBSET=2s,2n,8c,2t means to use two sockets, two NUMA domains per socket, eight cores per NUMA domain, and two threads per core, giving a total of 64 available hardware threads. For historical reasons, when a layer is not explicitly specified in KMP_HW_SUBSET, it is assumed you want all the resources in that unspecified layer. For example, KMP_HW_SUBSET=2s,2t is interpreted to mean two sockets, all cores per socket (and possibly all resources of other detected layers as well), and two threads per core. You can use KMP_AFFINITY=verbose to see all the layers detected on the machine.
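
The rule that an unspecified layer means "all of it" can be sketched as follows. The topology here (two sockets, 24 cores per socket, four hardware threads per core) is hypothetical, as are the helpers resolve and total. On a real machine, KMP_AFFINITY=verbose reports the detected layers.

```python
# Hypothetical detected topology, outermost layer first, with the number of
# units per parent unit: 2 sockets, 24 cores per socket, 4 threads per core.
TOPOLOGY = [("s", 2), ("c", 24), ("t", 4)]

def resolve(requested, topology=TOPOLOGY):
    """Keep requested counts; use the full count for any layer not listed."""
    return {layer: requested.get(layer, available) for layer, available in topology}

def total(resolved):
    """Hardware threads available: the product of the per-layer counts."""
    result = 1
    for count in resolved.values():
        result *= count
    return result

# KMP_HW_SUBSET=2s,2t on this machine: all 24 cores per socket are used.
layers = resolve({"s": 2, "t": 2})
print(layers, total(layers))  # {'s': 2, 'c': 24, 't': 2} 96
```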

When available, you can specify core attributes to choose different sets of cores. The core attributes are appended to the regular core layer specification with a colon (:) and attribute. There are two attributes to help filter types of cores:

  1. Core type, specified as intel_core or intel_atom.
  2. Core efficiency, specified as effnum, where num is a non-negative integer from zero to the number of detected core efficiencies minus one. The larger the efficiency number, the more performant the core. For example, KMP_HW_SUBSET=4c:eff0,5c:eff1 selects all sockets, four cores of efficiency 0, five cores of efficiency 1, and all threads on those cores.
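
Attribute filtering can be pictured as picking cores from a per-core list. In this sketch, the core list, the pairing of core types with efficiencies, and the helper select_by_efficiency are all hypothetical. Taking the first matching cores is an assumption about which cores are chosen. The real runtime obtains core types and efficiencies from the hardware.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Core:
    core_id: int
    core_type: str   # "intel_core" or "intel_atom"
    efficiency: int  # larger value = more performant core

# Hypothetical hybrid socket: 6 cores with eff1 and 8 cores with eff0.
# The pairing of core types with efficiencies is assumed for illustration.
CORES = (
    [Core(i, "intel_core", 1) for i in range(6)]
    + [Core(i, "intel_atom", 0) for i in range(6, 14)]
)

def select_by_efficiency(cores, requests):
    """requests maps efficiency -> core count; {0: 4, 1: 5} models
    KMP_HW_SUBSET=4c:eff0,5c:eff1. Takes the first matching cores (assumed)."""
    chosen = []
    for eff, count in requests.items():
        matching = [c for c in cores if c.efficiency == eff]
        if count > len(matching):
            raise ValueError(f"only {len(matching)} cores have eff{eff}")
        chosen.extend(matching[:count])
    return chosen

selected = select_by_efficiency(CORES, {0: 4, 1: 5})
print([c.core_id for c in selected])  # [6, 7, 8, 9, 0, 1, 2, 3, 4]
```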

There is also a special syntax to explicitly request all resources at a specific layer: instead of a positive integer count, write an asterisk (*) or omit the count. For example, KMP_HW_SUBSET=*c:eff0 and KMP_HW_SUBSET=c:eff0 both request all the cores of efficiency 0.
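
In other words, the count in a core item is optional. A minimal sketch of interpreting such items follows. It assumes the simplified form [count or *]c[:attribute]. The helper parse_core_item is hypothetical and not part of the runtime, and the real syntax accepts more forms.

```python
import re

# Simplified core item: optional count or "*", the letter "c", and an
# optional ":attribute" such as eff0 or intel_atom (assumed grammar).
_CORE_ITEM = re.compile(r"^(\d+|\*)?c(?::(\w+))?$")

def parse_core_item(item):
    """Return (count, attribute); count is None when all cores are requested."""
    match = _CORE_ITEM.match(item)
    if match is None:
        raise ValueError(f"not a core item: {item!r}")
    count_text, attribute = match.groups()
    count = None if count_text in (None, "*") else int(count_text)
    return count, attribute

print(parse_core_item("4c:eff0"))  # (4, 'eff0')
print(parse_core_item("*c:eff0"))  # (None, 'eff0')
print(parse_core_item("c:eff0"))   # (None, 'eff0')
```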

Consider a system with 24 cores and four hardware threads per core. While specifying two threads per core often yields better performance than one thread per core, specifying three or four threads per core may or may not improve the performance. This variable enables you to conveniently measure the performance of up to four threads per core.

For example, you can determine the effects of assigning 24, 48, 72, or the maximum 96 OpenMP threads in a system with 24 cores by specifying the following variable settings:

To Assign This Number of Threads    Use This Setting
24                                  KMP_HW_SUBSET=24c,1t
48                                  KMP_HW_SUBSET=24c,2t
72                                  KMP_HW_SUBSET=24c,3t
96                                  KMP_HW_SUBSET=24c,4t
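
The settings in this table follow one pattern: keep the core count fixed and vary the threads per core. A small sketch that generates them follows. It assumes 24 cores with up to four hardware threads each, as in the example. The name subset_settings is hypothetical.

```python
def subset_settings(cores, max_threads_per_core):
    """Return (total OpenMP threads, KMP_HW_SUBSET value) for 1..max threads per core."""
    return [(cores * t, f"{cores}c,{t}t")
            for t in range(1, max_threads_per_core + 1)]

for threads, value in subset_settings(24, 4):
    print(f"{threads:>3}  KMP_HW_SUBSET={value}")
```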

CAUTION:

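Take care when using the OMP_NUM_THREADS variable along with this variable. If OMP_NUM_THREADS requests more threads than the hardware threads made available by KMP_HW_SUBSET, the hardware threads are oversubscribed. If it requests fewer, some of the allocated hardware threads are left idle (undersubscription).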
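
One way to catch a mismatch before a run is a check like the following. It is a sketch: available_hw_threads must be computed separately (for example, as the product of the KMP_HW_SUBSET counts), and the status strings are made up for illustration. Only the first value of a comma-separated OMP_NUM_THREADS list is compared, because that value sets the team size for the outermost parallel level.

```python
import os

def subscription_status(available_hw_threads, env=None):
    """Compare OMP_NUM_THREADS with the hardware threads made available by
    KMP_HW_SUBSET (available_hw_threads is computed elsewhere)."""
    env = os.environ if env is None else env
    requested = env.get("OMP_NUM_THREADS")
    if requested is None:
        return "unset"  # no explicit thread count requested
    # Only the first list element is used: it sets the outermost team size.
    n = int(requested.split(",")[0].strip())
    if n > available_hw_threads:
        return "oversubscribed"   # more OpenMP threads than hardware threads
    if n < available_hw_threads:
        return "undersubscribed"  # some allocated hardware threads stay idle
    return "matched"

# KMP_HW_SUBSET=24c,2t makes 48 hardware threads available.
print(subscription_status(48, {"OMP_NUM_THREADS": "96"}))  # oversubscribed
```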

NOTE:

If you use KMP_HW_SUBSET to specify more resources than the system has, the runtime will issue a warning and ignore the setting. For example, setting KMP_HW_SUBSET=24c,5t will be ignored on a system where each core has four hardware threads.
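
The kind of check behind this behavior can be sketched by comparing each requested layer count with what the machine has. The topology dictionary describes a hypothetical single-socket machine with 24 cores and four hardware threads per core. The name exceeds_hardware is illustrative, not a runtime function. Treating an undetected layer as exceeding the hardware is an assumption of the sketch.

```python
# Hypothetical single-socket machine: 24 cores, 4 hardware threads per core.
TOPOLOGY = {"c": 24, "t": 4}

def exceeds_hardware(requested, topology=TOPOLOGY):
    """Return the layers whose requested count is larger than what exists.
    A layer the machine does not have counts as exceeding (assumption)."""
    return [layer for layer, count in requested.items()
            if count > topology.get(layer, 0)]

print(exceeds_hardware({"c": 24, "t": 5}))  # ['t']  (24c,5t would be ignored)
print(exceeds_hardware({"c": 24, "t": 4}))  # []
```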

Control Thread Bindings

The KMP_AFFINITY variable controls how the OpenMP threads are bound to the hardware resources allocated by the KMP_HW_SUBSET variable. Although this variable accepts several binding or affinity types, the following two are recommended for running your OpenMP threads on the processor:

  • compact: Distribute the threads sequentially, filling the hardware threads of one core before moving on to the next core.

  • scatter: Distribute the threads among the cores in a round-robin manner: each core receives one thread first, and then the distribution repeats across the cores.

The following table shows how the threads are bound to the cores when you want to use three threads per core on two cores by specifying KMP_HW_SUBSET=2c,3t:

Affinity                  OpenMP Threads on Core 0    OpenMP Threads on Core 1
KMP_AFFINITY=compact      0, 1, 2                     3, 4, 5
KMP_AFFINITY=scatter      0, 2, 4                     1, 3, 5
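
The two placements in this table can be reproduced with a simplified model. It numbers OpenMP threads from 0 and considers only cores and hardware threads on one socket. This is an assumption for illustration: the runtime applies compact and scatter to the full detected topology, including sockets and NUMA domains. The functions compact and scatter below are hypothetical helpers.

```python
def compact(cores, threads_per_core):
    """Fill every hardware thread of a core before moving to the next core."""
    return {
        core: [core * threads_per_core + t for t in range(threads_per_core)]
        for core in range(cores)
    }

def scatter(cores, threads_per_core):
    """Give each core one thread in turn, then wrap around (round-robin)."""
    placement = {core: [] for core in range(cores)}
    for thread in range(cores * threads_per_core):
        placement[thread % cores].append(thread)
    return placement

# KMP_HW_SUBSET=2c,3t
print(compact(2, 3))  # {0: [0, 1, 2], 1: [3, 4, 5]}
print(scatter(2, 3))  # {0: [0, 2, 4], 1: [1, 3, 5]}
```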

Determine the Best Setting

To determine the best thread distribution and bindings using these variables, follow these steps:

  1. Ensure that your OpenMP code is working properly before using these environment variables.

  2. Establish a baseline measurement of your current OpenMP code, so that you can compare it with the performance of each thread allocation you try.

  3. Measure the performance of distributing one, two, three, or four threads per core by using the KMP_HW_SUBSET variable.

  4. Measure the performance of binding the threads to the cores by using the KMP_AFFINITY variable.
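
These steps can be automated with a small driver script. This sketch assumes a hypothetical executable ./my_app, a machine with 24 cores and four hardware threads per core, and a single wall-clock run as the metric. In practice you would repeat each run and check that the results are still correct (step 1). Every run clears KMP_HW_SUBSET, KMP_AFFINITY, and OMP_NUM_THREADS first. That keeps the baseline on the runtime defaults and keeps an inherited OMP_NUM_THREADS from over- or undersubscribing the tested settings.

```python
# Hypothetical driver for the tuning steps; "./my_app" and the machine shape
# (24 cores, 4 hardware threads per core) are assumptions.
import os
import subprocess
import time

# Variables removed before every run so only the settings under test apply.
_CLEARED = ("KMP_HW_SUBSET", "KMP_AFFINITY", "OMP_NUM_THREADS")

def time_run(command, overrides):
    """Run command once with the given environment overrides; return seconds."""
    env = {k: v for k, v in os.environ.items() if k not in _CLEARED}
    env.update(overrides)
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start

def sweep(command, cores=24, max_threads_per_core=4,
          affinities=("compact", "scatter")):
    """Time a baseline run, then every threads-per-core/affinity combination."""
    results = {("baseline", None): time_run(command, {})}  # step 2
    for t in range(1, max_threads_per_core + 1):            # step 3
        for affinity in affinities:                         # step 4
            results[(t, affinity)] = time_run(command, {
                "KMP_HW_SUBSET": f"{cores}c,{t}t",
                "KMP_AFFINITY": affinity,
            })
    return results
```

For example, sweep(["./my_app"]) returns a dictionary keyed by ("baseline", None) and by (threads per core, affinity type), with elapsed seconds as values.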