Control Binary Execution on Multiple CPU Cores
Environment Variables
The following environment variables control the placement of SYCL* or OpenMP* threads on multiple CPU cores during program execution. Use these variables if you are using the OpenCL™ runtime CPU device to offload to a CPU.
Environment Variable | Description |
---|---|
DPCPP_CPU_CU_AFFINITY | Set thread affinity to CPU. The examples in this section use the values close, spread, and master. This environment variable is similar to the OMP_PROC_BIND variable used by OpenMP. Default: Not set |
DPCPP_CPU_SCHEDULE | Specify the algorithm that the scheduler uses to schedule work-groups. Currently, the SYCL runtime uses Intel® oneAPI Threading Building Blocks (Intel® oneTBB) for scheduling. The value selects the partitioner used by the Intel oneTBB scheduler. Default: Dynamic |
DPCPP_CPU_NUM_CUS | Set the number of threads used for kernel execution. To avoid oversubscription, the maximum value of DPCPP_CPU_NUM_CUS should be the number of hardware threads. If DPCPP_CPU_NUM_CUS is 1, all work-groups are executed sequentially by a single thread, which is useful for debugging. This environment variable is similar to the OMP_NUM_THREADS variable used by OpenMP. Default: Not set. Determined by Intel oneTBB. |
DPCPP_CPU_PLACES | Specify the places to which affinities are set. The value is { sockets \| numa_domains \| cores \| threads }. This environment variable is similar to the OMP_PLACES variable used by OpenMP. If the value is numa_domains, the Intel oneTBB NUMA API is used. This is analogous to OMP_PLACES=numa_domains in the OpenMP 5.1 Specification. Each Intel oneTBB task arena is bound to a NUMA node, and the SYCL nd-range is distributed uniformly across the task arenas. Using DPCPP_CPU_PLACES together with DPCPP_CPU_CU_AFFINITY is recommended. Default: cores |
See the Intel oneAPI DPC++/C++ Compiler Developer Guide and Reference for more information about all supported environment variables.
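As a minimal sketch of how the variables in the table might be combined before launching a program, the following shell fragment caps the thread count at the number of hardware threads and then sets places and affinity. The `REQUESTED_CUS` variable and the `./my_sycl_app` name are hypothetical placeholders, not part of the toolchain, and `getconf _NPROCESSORS_ONLN` is assumed to be available on the host.

```shell
# Number of hardware threads currently online (assumes getconf supports this key).
hw_threads=$(getconf _NPROCESSORS_ONLN)

# REQUESTED_CUS is a hypothetical helper variable; it defaults to 8 here.
requested=${REQUESTED_CUS:-8}

# Clamp the request so it never exceeds the hardware thread count,
# which avoids oversubscription.
if [ "$requested" -gt "$hw_threads" ]; then
  requested=$hw_threads
fi

export DPCPP_CPU_NUM_CUS="$requested"
export DPCPP_CPU_PLACES=cores
export DPCPP_CPU_CU_AFFINITY=close

# Launch the program afterwards; the binary name is a placeholder.
# ./my_sycl_app
```

Setting `REQUESTED_CUS=1` before running the fragment gives the single-threaded configuration that the table describes as useful for debugging.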
Allocating Host Memory
When using OpenMP, you can allocate host memory so it can be shared with the device by using this API:
EXTERN void *llvm_omp_target_alloc_host(size_t Size, int DeviceNum)
For more information on memory allocation, see the Level Zero Core Programming Guide.
Example 1: Hyper-threading Enabled
Assume a machine with 2 sockets, 4 physical cores per socket, and 2 hyper-threads per physical core.
S<num> denotes a socket by number; the bracketed list that follows shows its 8 logical cores
T<num> denotes the Intel® oneAPI Threading Building Blocks (Intel® oneTBB) thread number
“-” means unused core
export DPCPP_CPU_NUM_CUS=16
export DPCPP_CPU_PLACES=sockets
DPCPP_CPU_CU_AFFINITY=close: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]
DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 T2 T4 T6 T8 T10 T12 T14] S1:[T1 T3 T5 T7 T9 T11 T13 T15]
DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]
export DPCPP_CPU_PLACES=cores
DPCPP_CPU_CU_AFFINITY=close: S0:[T0 T8 T1 T9 T2 T10 T3 T11] S1:[T4 T12 T5 T13 T6 T14 T7 T15]
DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 T8 T2 T10 T4 T12 T6 T14] S1:[T1 T9 T3 T11 T5 T13 T7 T15]
DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]
export DPCPP_CPU_PLACES=threads
DPCPP_CPU_CU_AFFINITY=close: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]
DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 T2 T4 T6 T8 T10 T12 T14] S1:[T1 T3 T5 T7 T9 T11 T13 T15]
DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[T8 T9 T10 T11 T12 T13 T14 T15]
export DPCPP_CPU_NUM_CUS=8
DPCPP_CPU_PLACES=sockets, cores and threads have the same bindings:
DPCPP_CPU_CU_AFFINITY=close: S0:[T0 - T1 - T2 - T3 -] S1:[T4 - T5 - T6 - T7 -]
DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 - T2 - T4 - T6 -] S1:[T1 - T3 - T5 - T7 -]
DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3 T4 T5 T6 T7] S1:[]
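To compare the settings listed above on a real machine, one option is to sweep the combinations from a script. The sketch below is only an illustration: `run_case` is a hypothetical stand-in that prints the active settings, and it would be replaced by the actual program launch.

```shell
# run_case is a hypothetical stand-in for launching a real binary.
# Here it only prints the settings it would run with.
run_case() {
  echo "places=$DPCPP_CPU_PLACES affinity=$DPCPP_CPU_CU_AFFINITY cus=$DPCPP_CPU_NUM_CUS"
}

# Run every places/affinity combination from this example with 16 threads.
sweep() {
  for places in sockets cores threads; do
    for affinity in close spread master; do
      # A subshell keeps each combination from leaking into the next one.
      (
        export DPCPP_CPU_NUM_CUS=16
        export DPCPP_CPU_PLACES=$places
        export DPCPP_CPU_CU_AFFINITY=$affinity
        run_case
      )
    done
  done
}

sweep
```

Timing each run inside `run_case` would show which binding suits a given workload.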
Example 2: Hyper-threading Disabled
Assume a machine with 2 sockets and 4 physical cores per socket, with hyper-threading disabled, so each physical core provides 1 hardware thread.
S<num> denotes a socket by number; the bracketed list that follows shows its 4 cores
T<num> denotes the Intel oneTBB thread number
“-” means unused core
export DPCPP_CPU_NUM_CUS=8
DPCPP_CPU_PLACES=sockets, cores and threads have the same bindings:
DPCPP_CPU_CU_AFFINITY=close: S0:[T0 T1 T2 T3] S1:[T4 T5 T6 T7]
DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 T2 T4 T6] S1:[T1 T3 T5 T7]
DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3] S1:[T4 T5 T6 T7]
export DPCPP_CPU_NUM_CUS=4
DPCPP_CPU_PLACES=sockets, cores and threads have the same bindings:
DPCPP_CPU_CU_AFFINITY=close: S0:[T0 - T1 - ] S1:[T2 - T3 - ]
DPCPP_CPU_CU_AFFINITY=spread: S0:[T0 - T2 - ] S1:[T1 - T3 - ]
DPCPP_CPU_CU_AFFINITY=master: S0:[T0 T1 T2 T3] S1:[ - - - - ]
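The socket assignments in this example follow a simple pattern, which the sketch below reproduces. It is an illustrative model inferred from the listings above, not the runtime's placement algorithm. The `layout` function and its parameters are hypothetical names. The model reports only which threads land on each socket, so unused cores ("-") are not printed, and it does not cover the DPCPP_CPU_PLACES=cores ordering shown for the hyper-threading case.

```shell
# layout POLICY NUM_CUS SOCKETS SLOTS_PER_SOCKET
# Illustrative model only (hypothetical helper, not part of the runtime).
layout() {
  policy=$1; num_cus=$2; sockets=$3; slots=$4
  # Under "close", threads are split into equal consecutive blocks per socket.
  per_socket=$(( (num_cus + sockets - 1) / sockets ))
  out=""
  s=0
  while [ "$s" -lt "$sockets" ]; do
    members=""
    t=0
    while [ "$t" -lt "$num_cus" ]; do
      case $policy in
        close)  home=$(( t / per_socket )) ;;  # consecutive blocks
        spread) home=$(( t % sockets )) ;;     # round-robin over sockets
        master) home=$(( t / slots )) ;;       # fill the first socket first
        *) echo "unknown policy: $policy" >&2; return 1 ;;
      esac
      if [ "$home" -eq "$s" ]; then
        members="$members T$t"
      fi
      t=$(( t + 1 ))
    done
    out="$out S$s:[${members# }]"
    s=$(( s + 1 ))
  done
  echo "${out# }"
}

layout close 8 2 4
layout spread 8 2 4
layout master 4 2 4
```

For the 8-thread case the model prints the same socket membership as the close and spread listings above, and for 4 threads with master it places all four threads on S0.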