Visible to Intel only — GUID: GUID-A2B3C286-647B-4436-A624-D9581CEE9091
Device-Initiated Communications
Intel® MPI Library supports device-initiated one-sided communications, which let you perform communications directly from kernels executed on the GPU without interrupting the kernel.
This feature allows you to use the existing MPI primitives, with unchanged semantics, with SYCL*, OpenMP* offload, and Intel® Data Center GPU Max Series. To enable device-initiated communications, set I_MPI_OFFLOAD_ONESIDED_DEVICE_INITIATED=1 in addition to the I_MPI_OFFLOAD=1 environment variable.
The current version supports the following set of primitives on the device:
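For example, both variables can be set in the launch environment; `./my_app` below is a placeholder for your own application binary:

```shell
# Enable GPU buffer support in Intel MPI Library
export I_MPI_OFFLOAD=1
# Enable device-initiated one-sided communications
export I_MPI_OFFLOAD_ONESIDED_DEVICE_INITIATED=1

# Then launch as usual, for example:
# mpirun -n 2 ./my_app
```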
- Communication primitives:
- MPI_Put
- MPI_Get
- Passive-target synchronization primitives:
- MPI_Win_lock
- MPI_Win_lock_all
- MPI_Win_unlock
- MPI_Win_unlock_all
- MPI_Win_flush
- MPI_Win_flush_all
- Active-target synchronization primitives:
- MPI_Win_fence
- Window-query primitives and group management:
- MPI_Win_get_attr
- MPI_Win_shared_query
- MPI_Win_get_group
- MPI_Group_free
- MPI_Group_size
- Intel® MPI notified one-sided communication extension:
- MPIX_Win_set_notify
- MPIX_Win_get_notify
- MPIX_Get_notify
- MPIX_Put_notify
Language Support
| | Communication Primitives | Passive-Target Synchronization Primitives | Active-Target Synchronization Primitives | Window-Query Primitives and Group Management | Intel® MPI Notified One-Sided Communication Extension |
|---|---|---|---|---|---|
| SYCL* | Supported | Supported | Supported | Supported | Supported |
| C/OpenMP* | Supported | Supported | Supported | Supported | Supported |
| F77*/OpenMP* | Supported | Supported | Supported | N/A | N/A |
| F90*/OpenMP* | Supported | Supported | Supported | N/A | N/A |
Examples
You can modify your code to incorporate device-initiated communications:

```cpp
sycl::queue q;
MPI_Win win;
int peer_rank = X;

// Allocate device memory local to a process
int *buf = sycl::malloc_device<int>(size, q);

// Create an RMA window over the device memory
MPI_Win_create(buf, (MPI_Aint) size * sizeof(int), sizeof(int),
               MPI_INFO_NULL, MPI_COMM_WORLD, &win);

// SYCL automatically captures local variables
q.submit([&](sycl::handler &h) {
    h.single_task([=]() {
        int var = 0;
        // Perform computations updating "var"
        // ...

        // Initiate communication from the device
        MPI_Win_lock(MPI_LOCK_SHARED, peer_rank, 0, win);
        MPI_Put(&var, 1, MPI_INT, peer_rank, 0, 1, MPI_INT, win);
        MPI_Win_unlock(peer_rank, win);

        // Continue computations on GPU
    });
}).wait();
```
According to the MPI-4.0 standard, you can call MPI_Win_shared_query for an RMA window constructed using any available window creation method. Using MPI_Win_shared_query, an application may efficiently implement any communication pattern using direct access to the device memory of the peer process.
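As an illustration (not part of the original example), a process could query the base address of a peer's window region and then read it with plain loads inside a kernel. This is only a sketch: it assumes `q`, `win`, and `peer_rank` are set up as in the example above, and that the queried window covers memory that is directly accessible from the local device.

```cpp
// Sketch only: assumes "win" was created over device memory as above and
// that direct load/store access to the peer's window memory is possible.
MPI_Aint peer_size = 0;
int peer_disp_unit = 0;
int *peer_base = NULL;

// MPI-4.0 allows this call on windows from any creation method; on success
// it returns a local address through which the peer's memory can be accessed.
MPI_Win_shared_query(win, peer_rank, &peer_size, &peer_disp_unit,
                     (void **) &peer_base);

if (peer_base != NULL) {
    q.single_task([=]() {
        // Plain load from the peer's window instead of an MPI_Get call
        int value = peer_base[0];
        // ... use "value" in further on-device computations
        (void) value;
    }).wait();
}
```

Falling back to MPI_Put/MPI_Get when the query returns no usable address keeps the code correct for configurations where direct access is unavailable.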