Intel® MPI Library

Developer Guide for Linux* OS

ID 768728
Date 10/31/2024
Public

Notified One-Sided Communications

Intel® MPI Library supports a set of extension primitives for one-sided communications. These primitives provide an efficient way to notify the target process of a one-sided operation once the data transfer completes. They also enable communication patterns that are impossible or inefficient to express with the current MPI standard.

For example, when the origin process needs to let the target know that data transferred with MPI_Put has arrived, it must use a separate synchronization window and an explicit "flush" call to guarantee data transfer completion.

MPI_Put(&buf, count, MPI_INT, peer_rank, displacement, count, MPI_INT, win);
// To guarantee data is available on the target, call MPI_Win_flush
MPI_Win_flush(peer_rank, win); // Within this call, the origin process is blocked until data is available on the target process

// Add one to a notification variable on the target
int one = 1;
MPI_Accumulate(&one, 1, MPI_INT, peer_rank, 0, 1, MPI_INT, MPI_SUM, signal_win);

This approach requires a separate window (signal_win in the example above) to expose notification variables to all processes. In addition, the origin must complete the put, here with MPI_Win_flush, before it updates the notification on the target, so the origin stays blocked until the data transfer completes.

With the extension primitives, both steps can be fused into a single call. The MPI runtime internally handles the data transfer completion event and updates the notification on the target, without any extra logic on the application side:

MPIX_Put_notify(&buf, count, MPI_INT, peer_rank, displacement, count, MPI_INT, notification_index, win);

Each process may associate an arbitrary number of notification counters with a window. Each counter is independent: only the owning process can read it, but other processes may increment it through notified communication calls.

Notification Management

int MPIX_Win_create_notify(MPI_Win win, int notification_num);

Creates and exposes the given number of notification counters for the window object, where:

  • win is a valid MPI_Win object, created beforehand with any available window creation method.
  • notification_num is the number of notification counters that the calling process creates and exposes to other processes. It must be non-negative; 0 is allowed. Each process passes its own value, so the number of counters may differ between processes.

This is a collective call: all processes participating in the window must call it. Notification counters must be created before any notified communication calls are made on the window.

int MPIX_Win_free_notify(MPI_Win win);

Destroys the notification counters previously associated with the window. This is a collective call. As soon as the previous set is destroyed, a new set of notification counters can be associated with the window, which lets the application change the number of counters as needed.

Notified Communications

Intel® MPI Library provides notified variants of the put and get calls that are almost identical to the regular MPI_Put and MPI_Get operations. In addition to the regular arguments, these calls accept an integer notification index that identifies the counter to update on the target.

int MPIX_Put_notify( …, int notification_idx, MPI_Win win);
int MPIX_Get_notify( …, int notification_idx, MPI_Win win);
int MPIX_Put_notify_c( …, int notification_idx, MPI_Win win);
int MPIX_Get_notify_c( …, int notification_idx, MPI_Win win);

Notification Counter Access

Because notification counters are managed entirely by the MPI runtime, the library provides accessor functions to get and set a counter's value. This way, the library fully controls the atomicity and consistency of counter values.

int MPIX_Notify_get_value(MPI_Win win, int notification_idx, MPI_Count *value);
int MPIX_Notify_set_value(MPI_Win win, int notification_idx, MPI_Count value);

For example, an application can use these primitives to implement a polling loop that waits for a counter to reach a particular value:

MPI_Count notify_value = 0;
// Wait for "expected_value" data transfers to complete
while (notify_value < expected_value) {
    MPIX_Notify_get_value(win, 0, &notify_value);
}
// Reset notification value to 0
MPIX_Notify_set_value(win, 0, (MPI_Count)0);

In addition to the accessor functions, an MPI_Request object can be created and associated with a counter. The request completes when the associated notification counter reaches or exceeds the requested threshold (expected_value).

int MPIX_Win_get_notify_request(MPI_Win win, int notification_idx, MPI_Count expected_value, MPI_Request *request);

Requests returned by this call can be passed to regular request management operations, such as MPI_Wait or MPI_Test.