Visible to Intel only — GUID: GUID-EAFF959F-6620-48D4-838F-D99760A501AD
GPU Buffers Support
Short Description
This feature enables handling of device buffers in MPI functions such as MPI_Send, MPI_Recv, MPI_Bcast, MPI_Allreduce, and so on by using the Level Zero* library specified in the I_MPI_OFFLOAD_LEVEL_ZERO_LIBRARY variable.
To pass a pointer to an offloaded memory region to MPI, you may need to use specific compiler directives or obtain it from the corresponding acceleration runtime API. For example, use_device_ptr and use_device_addr are useful keywords for obtaining device pointers in the OpenMP environment, as shown in the following code sample:
```c
/* Copy data from host to device */
#pragma omp target data map(to: rank, values[0:num_values]) use_device_ptr(values)
{
    /* Compute something on device */
    #pragma omp target parallel for is_device_ptr(values)
    for (unsigned i = 0; i < num_values; ++i) {
        values[i] *= (rank + 1);
    }

    /* Send device buffer to another rank */
    MPI_Send(values, num_values, MPI_INT, dest_rank, tag, MPI_COMM_WORLD);
}
```
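A build-and-run sketch for the sample above (the source file name offload_sample.c and the binary name are placeholders; the compiler driver and offload flags depend on your toolchain):

```shell
# Compile the OpenMP offload sample with an Intel MPI compiler wrapper
# (flags shown are typical for OpenMP offload to Intel GPUs; adjust as needed)
mpiicx -fiopenmp -fopenmp-targets=spir64 offload_sample.c -o offload_sample

# Run two ranks; device buffers are passed directly to MPI_Send/MPI_Recv
mpirun -n 2 ./offload_sample
```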
To achieve the best performance, reuse the same GPU buffer across MPI communications where possible. This lets the Intel® MPI Library cache the structures needed to communicate with the device and reuse them in subsequent iterations.
If you do not pass device buffers to MPI primitives, set I_MPI_OFFLOAD=0 to disable this feature, since handling of device buffers can affect performance.
I_MPI_OFFLOAD_MEMCPY
Set this environment variable to select the GPU memcpy kind.
Syntax
I_MPI_OFFLOAD_MEMCPY=<value>
Arguments
| Value | Description |
|---|---|
| cached | Cache objects created for communication with the GPU so that they can be reused if the same device buffer is later passed to an MPI function. Default value. |
| blocked | Copy the device buffer to the host and wait for the copy to complete inside the MPI function. |
| nonblocked | Copy the device buffer to the host and do not wait for the copy to complete inside the MPI function. Wait for completion in MPI_Wait. |
Description
Set this environment variable to select the GPU memcpy kind. The best-performing option is chosen by default. Nonblocked memcpy can be combined with MPI non-blocking point-to-point operations to overlap data transfer with computation. Blocked memcpy can be used if the other modes are not stable.
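For example, to overlap the device-to-host copy with computation when the application uses non-blocking point-to-point calls (MPI_Isend/MPI_Irecv completed by MPI_Wait), you could select the nonblocked mode at launch (the binary name is a placeholder):

```shell
# Stage device buffers to the host asynchronously; completion happens in MPI_Wait
export I_MPI_OFFLOAD_MEMCPY=nonblocked
mpirun -n 2 ./offload_sample
```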
I_MPI_OFFLOAD_PIPELINE
Set this environment variable to enable the pipeline algorithm.
Syntax
I_MPI_OFFLOAD_PIPELINE=<value>
Arguments
| Value | Description |
|---|---|
| 0 | Disable the pipeline algorithm. |
| 1 | Enable the pipeline algorithm. Default value. |
Description
Set this environment variable to enable the pipeline algorithm, which can improve performance for large message sizes. The algorithm splits the user buffer into several segments, copies the segments to the host, and sends them to another rank.
I_MPI_OFFLOAD_PIPELINE_THRESHOLD
Set this environment variable to control the threshold for the pipeline algorithm.
Syntax
I_MPI_OFFLOAD_PIPELINE_THRESHOLD=<value>
Arguments
| Value | Description |
|---|---|
| <nbytes> | Threshold in bytes. The default value is 65536. |
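Putting the two pipeline variables together, a launch that keeps the pipeline enabled but raises the segmentation threshold to 1 MiB might look like this (the threshold value and binary name are illustrative):

```shell
export I_MPI_OFFLOAD_PIPELINE=1                  # pipeline is on by default
export I_MPI_OFFLOAD_PIPELINE_THRESHOLD=1048576  # pipeline messages larger than 1 MiB
mpirun -n 2 ./offload_sample
```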
I_MPI_OFFLOAD_RDMA
Set this environment variable to enable GPU RDMA.
Syntax
I_MPI_OFFLOAD_RDMA=<value>
Arguments
| Value | Description |
|---|---|
| 0 | Disable RDMA. Default value. |
| 1 | Enable RDMA. |
Description
Set this environment variable to enable direct GPU transfers using GPU RDMA. When the network supports this capability, enabling this variable allows data to be transferred directly between two GPUs.
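If your fabric supports GPU RDMA, you can opt in at launch time (the binary name is a placeholder):

```shell
# Allow direct GPU-to-GPU transfers over the network (disabled by default)
export I_MPI_OFFLOAD_RDMA=1
mpirun -n 2 -ppn 1 ./offload_sample   # one rank per node to exercise the internode path
```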
I_MPI_OFFLOAD_FAST_MEMCPY
Set this environment variable to enable/disable fast memcpy for GPU buffers.
Syntax
I_MPI_OFFLOAD_FAST_MEMCPY=<value>
Arguments
| Value | Description |
|---|---|
| 0 | Disable fast memcpy. |
| 1 | Enable fast memcpy. Default value. |
Description
Set this environment variable to enable/disable fast memcpy, which optimizes performance for small message sizes. Fast memcpy is not supported together with implicit scaling; to disable implicit scaling in the GPU driver, set the following driver environment variables:
NEOReadDebugKeys=1
EnableImplicitScaling=0
I_MPI_OFFLOAD_IPC
Set this environment variable to enable/disable GPU IPC.
Syntax
I_MPI_OFFLOAD_IPC=<value>
Arguments
| Value | Description |
|---|---|
| 0 | Disable the IPC path. |
| 1 | Enable the IPC path. Default value. |
Description
Set this environment variable to enable/disable GPU IPC. When this capability is supported by the system and devices, enabling this environment variable enables direct data transfer between two GPUs on the same node.
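IPC is enabled by default; turning it off can help isolate problems with intranode GPU-to-GPU transfers (the binary name is a placeholder):

```shell
# Fall back from the IPC path for intranode GPU transfers
export I_MPI_OFFLOAD_IPC=0
mpirun -n 2 ./offload_sample
```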