Problem: MPI Application Hangs
Problem
MPI application hangs without any output.
Case 1
Cause
The application uses MPI incorrectly.
Solution
Run your MPI application with the -check_mpi option to perform correctness checking. The correctness checker is designed to find MPI errors and is tightly integrated with the Intel® MPI Library. If a deadlock occurs, the checker applies a one-minute timeout and then shows the state of each rank.
For more information, refer to this page.
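As a minimal sketch of the step above, the wrapper below launches an application under the correctness checker. The function name run_checked, the rank count, and the application path are hypothetical placeholders. Only the -check_mpi and -n options of mpirun are taken as given, and the checker must be available in your installation.

```shell
# run_checked: hypothetical helper that launches an MPI application
# with correctness checking enabled.
#   $1 = number of ranks
#   $2 = path to the application (placeholder, adjust to your job)
run_checked() {
    nranks=$1
    app=$2
    mpirun -check_mpi -n "$nranks" "$app"
}
```

For example, run_checked 4 ./my_mpi_app starts four ranks of a hypothetical ./my_mpi_app binary with checking enabled.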
Case 2
Cause
The remote service (for example, SSH) is not running on all nodes or is not configured properly.
Solution
Check the state of the remote service on each node and verify that you can connect to all nodes.
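One way to run this check, assuming SSH is the remote service and that a plain-text host file lists one host name per line, is sketched below. The function name check_nodes and the host file layout are assumptions for illustration. BatchMode and ConnectTimeout are standard OpenSSH client options that make the check fail quickly instead of prompting for a password or waiting indefinitely.

```shell
# check_nodes: hypothetical helper that tries a non-interactive SSH login
# on every host listed in a host file (one host name per line, assumed).
# Prints OK or FAIL per host; returns 1 if any host failed.
check_nodes() {
    hostfile=$1
    failed=0
    while IFS= read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue    # skip empty lines
        # </dev/null keeps ssh from consuming the rest of the host file
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
            echo "OK $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}
```

A host reported as FAIL may be down, may not be running the SSH daemon, or may require a password or passphrase, so inspect it manually.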
Case 3
Cause
The Intel® MPI Library runtime scripts are not available on all nodes because the shared space cannot be reached.
Solution
Check that the shared path is available on all nodes.
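The check can be sketched as follows, assuming SSH access to the nodes and a host file with one host name per line. The function name check_shared_path is hypothetical, and the path is assumed to contain no single quotes. A typical path to test is the installation directory stored in I_MPI_ROOT after sourcing the Intel MPI environment script.

```shell
# check_shared_path: hypothetical helper that tests whether a path exists
# on every host listed in a host file (one host name per line, assumed).
# Prints OK or MISSING per host; returns 1 if the path was not found
# on some host or that host could not be reached.
check_shared_path() {
    hostfile=$1
    path=$2
    missing=0
    while IFS= read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue    # skip empty lines
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" "test -e '$path'" </dev/null 2>/dev/null; then
            echo "OK $host"
        else
            echo "MISSING $host"
            missing=1
        fi
    done < "$hostfile"
    return "$missing"
}
```

For example, check_shared_path ./hosts "$I_MPI_ROOT" reports the nodes on which the installation directory is not visible. MISSING also covers nodes that could not be reached at all.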
Case 4
Cause
Different CPU architectures are used in a single MPI run.
Solution
Set export I_MPI_PLATFORM=<arch>, where <arch> is the oldest platform you have, for example skx. Note that using different CPU architectures in a single MPI job negatively affects application performance, so it is recommended not to mix CPU architectures in a single MPI job.
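As a short illustration, the fragment below applies the setting in the launching shell. It assumes that skx is the oldest architecture in the job. The application name and rank count in the commented launch line are hypothetical placeholders.

```shell
# Tune for the oldest architecture in the job (skx assumed here).
export I_MPI_PLATFORM=skx
echo "I_MPI_PLATFORM=$I_MPI_PLATFORM"

# Then launch as usual, for example (hypothetical application and rank count):
# mpirun -n 4 ./my_mpi_app
```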