Visible to Intel only — GUID: GUID-C056AE26-EF71-4819-BCE6-C1F66F566E1D
Control Binary Execution on Multiple CPU Cores
Environment Variables
The following environment variables control the placement of SYCL* or OpenMP* threads on multiple CPU cores during program execution. Use these variables if you are using the OpenCL™ runtime CPU device to offload to a CPU.
| Environment Variable | Description |
|---|---|
| DPCPP_CPU_CU_AFFINITY | Set thread affinity to CPU. The value is one of the following: close: threads are pinned to CPU cores successively through the available cores; spread: threads are spread across the available cores; master: threads are placed on the same cores as the master thread. This environment variable is similar to the OMP_PROC_BIND variable used by OpenMP. Default: Not set |
| DPCPP_CPU_SCHEDULE | Specify the algorithm the scheduler uses to schedule work-groups. Currently, the SYCL runtime uses Intel® oneAPI Threading Building Blocks (Intel® oneTBB) for scheduling, and the value selects the partitioner used by the Intel oneTBB scheduler: dynamic: Intel oneTBB auto_partitioner, which performs enough splitting to balance the load; affinity: Intel oneTBB affinity_partitioner, which improves on the cache affinity of auto_partitioner through its choice of mapping subranges to worker threads; static: Intel oneTBB static_partitioner, which distributes range iterations among worker threads as uniformly as possible. Default: dynamic |
| DPCPP_CPU_NUM_CUS | Set the number of threads used for kernel execution. To avoid oversubscription, the value should not exceed the number of hardware threads. If DPCPP_CPU_NUM_CUS is 1, all work-groups are executed sequentially by a single thread, which is useful for debugging. This environment variable is similar to the OMP_NUM_THREADS variable used by OpenMP. Default: Not set (determined by Intel oneTBB) |
| DPCPP_CPU_PLACES | Specify the places to which thread affinities are set. The value is one of sockets, numa_domains, cores, or threads. This environment variable is similar to the OMP_PLACES variable used by OpenMP. If the value is numa_domains, the Intel oneTBB NUMA API is used, analogous to OMP_PLACES=numa_domains in the OpenMP 5.1 specification: each Intel oneTBB task arena is bound to a NUMA node, and the SYCL nd_range is distributed uniformly across the task arenas. It is recommended to set DPCPP_CPU_PLACES together with DPCPP_CPU_CU_AFFINITY. Default: cores |
See the Intel oneAPI DPC++/C++ Compiler Developer Guide and Reference for more information about all supported environment variables.
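These variables are read from the environment of the process that runs the SYCL application, so they are normally exported from the shell before launch. The following is a minimal POSIX shell sketch, assuming a Linux system with GNU coreutils (for nproc); the helper name configure_sycl_cpu is hypothetical and not part of any Intel tool. It checks the values against the ones listed in the table above and caps DPCPP_CPU_NUM_CUS at the number of hardware threads, following the oversubscription advice for that variable.

```shell
#!/bin/sh
# Minimal sketch: validate and export the SYCL CPU threading variables.
# configure_sycl_cpu is a hypothetical helper, not part of any Intel tool.
# Usage: configure_sycl_cpu NUM_CUS PLACES AFFINITY
configure_sycl_cpu() {
  local num_cus="$1" places="$2" affinity="$3" hw_threads

  # NUM_CUS must be a positive integer without leading zeros.
  case $num_cus in
    ''|*[!0-9]*|0*) echo "invalid DPCPP_CPU_NUM_CUS: $num_cus" >&2; return 1 ;;
  esac
  case $places in
    sockets|numa_domains|cores|threads) ;;
    *) echo "invalid DPCPP_CPU_PLACES: $places" >&2; return 1 ;;
  esac
  case $affinity in
    close|spread|master) ;;
    *) echo "invalid DPCPP_CPU_CU_AFFINITY: $affinity" >&2; return 1 ;;
  esac

  # nproc prints the number of logical CPUs (hardware threads) available to
  # this process; never request more threads than that.
  hw_threads=$(nproc)
  if [ "$num_cus" -gt "$hw_threads" ]; then
    num_cus=$hw_threads
  fi

  # Export only after every value has been validated.
  export DPCPP_CPU_NUM_CUS="$num_cus"
  export DPCPP_CPU_PLACES="$places"
  export DPCPP_CPU_CU_AFFINITY="$affinity"
}
```

For example, run configure_sycl_cpu 16 numa_domains spread in the shell and then launch the application (./my_sycl_app is a hypothetical name). On a failed check, the function returns 1 and leaves the previously exported values unchanged.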
Example 1: Hyper-threading Enabled
Assume a machine with 2 sockets, 4 physical cores per socket, and 2 hyper-threads per physical core, giving 8 logical cores per socket.
S<num> denotes a socket number; the bracketed list after it shows the thread on each of its 8 logical cores
T<num> denotes the Intel oneTBB thread number
“-” denotes an unused logical core

    export DPCPP_CPU_NUM_CUS=16

    export DPCPP_CPU_PLACES=sockets
    DPCPP_CPU_CU_AFFINITY=close:  S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]
    DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 T2 T4 T6 T8 T10 T12 T14] S1:[T1 T3 T5 T7 T9 T11 T13 T15]
    DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]

    export DPCPP_CPU_PLACES=cores
    DPCPP_CPU_CU_AFFINITY=close:  S0:[T0 T8 T1 T9 T2 T10 T3 T11] S1:[T4 T12 T5 T13 T6 T14 T7 T15]
    DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 T8 T2 T10 T4 T12 T6 T14] S1:[T1 T9 T3 T11 T5 T13 T7 T15]
    DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]

    export DPCPP_CPU_PLACES=threads
    DPCPP_CPU_CU_AFFINITY=close:  S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]
    DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 T2 T4 T6 T8 T10 T12 T14] S1:[T1 T3 T5 T7 T9 T11 T13 T15]
    DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]

    export DPCPP_CPU_NUM_CUS=8
    DPCPP_CPU_PLACES=sockets, cores and threads have the same bindings:
    DPCPP_CPU_CU_AFFINITY=close:  S0:[T0 - T1 - T2 - T3 -] S1:[T4 - T5 - T6 - T7 -]
    DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 - T2 - T4 - T6 -] S1:[T1 - T3 - T5 - T7 -]
    DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[]
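On Linux, the bindings that were actually applied can be inspected while the application is running. The sketch below, assuming a Linux procfs and a POSIX shell, prints the Cpus_allowed_list field of /proc/PID/task/TID/status for every thread of a process; the helper name show_thread_affinity is hypothetical. The operating system numbers logical CPUs in its own order (for example, hyper-thread siblings are often numbered N and N plus the number of physical cores), so the IDs printed here need not match the slot order in the S<num> lists above; lscpu -e shows which socket and core each logical CPU belongs to.

```shell
#!/bin/sh
# Linux-only sketch: list the logical CPUs each thread of a process may run on.
# show_thread_affinity is a hypothetical helper name.
# Usage: show_thread_affinity PID
show_thread_affinity() {
  for status in /proc/"$1"/task/*/status; do
    # Skip threads that exited since the glob was expanded (or a bad PID,
    # in which case the unmatched pattern itself is not readable).
    [ -r "$status" ] || continue
    tid=${status%/status}   # drop the trailing "/status"
    tid=${tid##*/}          # keep only the thread ID
    # Cpus_allowed_list is the thread's affinity mask, e.g. "0-15" or "3".
    cpus=$(awk '/^Cpus_allowed_list:/ { print $2 }' "$status")
    printf '%s\t%s\n' "$tid" "$cpus"
  done
}
```

For example, start the application in the background with the desired settings (DPCPP_CPU_PLACES=cores DPCPP_CPU_CU_AFFINITY=close ./my_sycl_app &, where ./my_sycl_app is hypothetical) and then run show_thread_affinity $! while kernels are executing. A thread that has not been pinned reports the full CPU set of the process.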
Example 2: Hyper-threading Disabled
Assume the same machine with 2 sockets and 4 physical cores per socket, but with hyper-threading disabled, so each physical core runs a single hardware thread.
S<num> denotes a socket number; the bracketed list after it shows the thread on each of its 4 cores
T<num> denotes the Intel oneTBB thread number
“-” denotes an unused core

    export DPCPP_CPU_NUM_CUS=8
    DPCPP_CPU_PLACES=sockets, cores and threads have the same bindings:
    DPCPP_CPU_CU_AFFINITY=close:  S0:[T0 T1 T2 T3] S1:[T4 T5 T6 T7]
    DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 T2 T4 T6] S1:[T1 T3 T5 T7]
    DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3] S1:[T4 T5 T6 T7]

    export DPCPP_CPU_NUM_CUS=4
    DPCPP_CPU_PLACES=sockets, cores and threads have the same bindings:
    DPCPP_CPU_CU_AFFINITY=close:  S0:[T0 - T1 -] S1:[T2 - T3 -]
    DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 - T2 -] S1:[T1 - T3 -]
    DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3] S1:[- - - -]
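Which affinity policy works best depends on the kernel and the machine, so it can help to run the same workload once per policy and compare the resulting bindings or timings. A minimal POSIX shell sketch follows; the helper name sweep_cu_affinity is hypothetical. It sets only DPCPP_CPU_CU_AFFINITY, for the child process alone, and leaves DPCPP_CPU_NUM_CUS and DPCPP_CPU_PLACES as they are in the calling environment.

```shell
#!/bin/sh
# Sketch: run one command once per DPCPP_CPU_CU_AFFINITY policy.
# sweep_cu_affinity is a hypothetical helper name.
# Usage: sweep_cu_affinity COMMAND [ARG...]
sweep_cu_affinity() {
  for policy in close spread master; do
    echo "== DPCPP_CPU_CU_AFFINITY=$policy"
    # env sets the variable for this child process only;
    # stop at the first failing run and return its exit status.
    env DPCPP_CPU_CU_AFFINITY="$policy" "$@" || return
  done
}
```

For example, on a machine like the one in this example, export DPCPP_CPU_NUM_CUS=4 and DPCPP_CPU_PLACES=cores, then run sweep_cu_affinity ./my_sycl_app (a hypothetical binary) to exercise the three DPCPP_CPU_NUM_CUS=4 cases above in turn.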