Code Change Guide
The example in this section shows one way to modify a legacy program so that it takes full advantage of the MPI_THREAD_SPLIT threading model.
In the original code (thread_split.cpp), the functions work_portion_1(), work_portion_2(), and work_portion_3() represent a CPU load that modifies the content of the memory pointed to by the in and out pointers. In this particular example, these functions perform correctness checking of the MPI_Allreduce() function.
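For reference, the following is a hypothetical sketch of a comparable single-threaded original, not the actual thread_split.cpp; the work_portion_* stubs and the COUNT constant are assumptions standing in for the CPU load and buffer size described above.

```cpp
// Hypothetical single-threaded baseline (assumed names: work_portion_*, COUNT).
#include <mpi.h>

enum { COUNT = 1024 };                              // assumed element count

static void work_portion_1(int* in, int n)        { for (int i = 0; i < n; ++i) in[i] = 1; }
static void work_portion_2(const int* out, int n) { (void)out; (void)n; /* verify result */ }

int main(int argc, char** argv) {
    int in[COUNT], out[COUNT];

    MPI_Init(&argc, &argv);                          // single-threaded initialization

    work_portion_1(in, COUNT);                       // prepare the input buffer
    MPI_Allreduce(in, out, COUNT, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    work_portion_2(out, COUNT);                      // check the reduced output
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```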
Changes Required to Use the OpenMP* Threading Model
- To run MPI functions in a multithreaded environment, call MPI_Init_thread() with the MPI_THREAD_MULTIPLE thread support level instead of MPI_Init(); see the sketch after this list.
- According to the MPI_THREAD_SPLIT model, each thread must execute MPI operations only over the communicator specific to that thread. So, in this example, MPI_COMM_WORLD must be duplicated several times so that each thread gets its own copy.
NOTE: The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
- The data to transfer must be split so that each thread handles its own portion of the input and output data.
- The barrier becomes two-stage: a barrier at the MPI level must be combined with a barrier at the OpenMP level.
- Check that the runtime sets up a reasonable affinity for OpenMP threads. Typically, the OpenMP runtime does this out of the box, but in some cases setting the OMP_PLACES=cores environment variable is necessary for optimal multithreaded MPI performance.
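A minimal sketch of these changes is shown below. It reuses the hypothetical work_portion_* helpers and COUNT from the sketch above, and assumes COUNT is divisible by the number of OpenMP threads; it is not the shipped thread_split.cpp.

```cpp
// Sketch of the OpenMP variant (assumed names: work_portion_*, COUNT).
#include <mpi.h>
#include <omp.h>
#include <vector>

enum { COUNT = 1024 };                              // assumed element count

static void work_portion_1(int* in, int n)        { for (int i = 0; i < n; ++i) in[i] = 1; }
static void work_portion_2(const int* out, int n) { (void)out; (void)n; /* verify result */ }

int main(int argc, char** argv) {
    int provided = 0;
    // 1. Request full thread support instead of calling MPI_Init().
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    int nthreads = omp_get_max_threads();
    std::vector<int> in(COUNT), out(COUNT);

    // 2. Give every thread its own duplicate of MPI_COMM_WORLD.
    std::vector<MPI_Comm> comms(nthreads);
    for (int i = 0; i < nthreads; ++i)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

    #pragma omp parallel
    {
        int tid   = omp_get_thread_num();
        int chunk = COUNT / nthreads;               // 3. each thread owns one data slice
        int off   = tid * chunk;

        work_portion_1(&in[off], chunk);
        MPI_Allreduce(&in[off], &out[off], chunk, MPI_INT, MPI_SUM, comms[tid]);
        work_portion_2(&out[off], chunk);

        // 4. Two-stage barrier: MPI barrier on the per-thread communicator,
        //    then an OpenMP barrier across the local threads.
        MPI_Barrier(comms[tid]);
        #pragma omp barrier
    }

    for (int i = 0; i < nthreads; ++i)
        MPI_Comm_free(&comms[i]);
    MPI_Finalize();
    return 0;
}
```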
Changes Required to Use the POSIX Threading Model
- To run MPI functions in a multithreaded environment, call MPI_Init_thread() with the MPI_THREAD_MULTIPLE thread support level instead of MPI_Init(); see the sketch after this list.
- Each thread must execute MPI collective operations over its own communicator. Therefore, MPI_COMM_WORLD must be duplicated so that each thread gets a dedicated communicator.
- The thread_id info key must be set appropriately on each of the duplicated communicators.
NOTE: The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
- The data to transfer must be split so that each thread handles its own portion of the input and output data.
- The barrier becomes two-stage: a barrier at the MPI level must be combined with a barrier at the POSIX level.
- The affinity of POSIX threads can be set explicitly to reach optimal multithreaded MPI performance.
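A minimal sketch of the POSIX-threads variant follows. It assumes hypothetical work_portion_* helpers, a fixed thread count NTHREADS, a COUNT divisible by NTHREADS, and a Linux/glibc target for the optional affinity call; it is not the shipped thread_split.cpp.

```cpp
// Sketch of the POSIX-threads variant (assumed names: work_portion_*, COUNT, NTHREADS).
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                                  // for CPU_SET()/pthread_setaffinity_np() (glibc)
#endif
#include <mpi.h>
#include <pthread.h>
#include <sched.h>
#include <cstdio>

enum { NTHREADS = 4, COUNT = 1024 };                 // assumed sizes; COUNT % NTHREADS == 0

static int in[COUNT], out[COUNT];
static MPI_Comm comms[NTHREADS];
static pthread_barrier_t node_barrier;               // POSIX-level stage of the two-stage barrier

static void work_portion_1(int* p, int n)        { for (int i = 0; i < n; ++i) p[i] = 1; }
static void work_portion_2(const int* p, int n)  { (void)p; (void)n; /* verify result */ }

static void* worker(void* arg) {
    int tid   = (int)(long)arg;
    int chunk = COUNT / NTHREADS;
    int off   = tid * chunk;

    // Optional: pin each thread explicitly for stable performance (Linux/glibc only).
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(tid, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    work_portion_1(&in[off], chunk);
    MPI_Allreduce(&in[off], &out[off], chunk, MPI_INT, MPI_SUM, comms[tid]);
    work_portion_2(&out[off], chunk);

    // Two-stage barrier: MPI barrier on the thread's communicator,
    // then the process-local POSIX barrier.
    MPI_Barrier(comms[tid]);
    pthread_barrier_wait(&node_barrier);
    return nullptr;
}

int main(int argc, char** argv) {
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    pthread_barrier_init(&node_barrier, nullptr, NTHREADS);

    for (int i = 0; i < NTHREADS; ++i) {
        // One communicator per thread, tagged with the thread_id info key.
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);
        MPI_Info info;
        char id[16];
        std::snprintf(id, sizeof(id), "%d", i);
        MPI_Info_create(&info);
        MPI_Info_set(info, "thread_id", id);
        MPI_Comm_set_info(comms[i], info);
        MPI_Info_free(&info);
    }

    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; ++i)
        pthread_create(&threads[i], nullptr, worker, (void*)i);
    for (int i = 0; i < NTHREADS; ++i)
        pthread_join(threads[i], nullptr);

    for (int i = 0; i < NTHREADS; ++i)
        MPI_Comm_free(&comms[i]);
    pthread_barrier_destroy(&node_barrier);
    MPI_Finalize();
    return 0;
}
```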