Error Message: Fatal Error
Case 1
Error Message
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(653)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(698): OFI addrinfo() failed (ofi_init.h:698:MPIDI_NM_mpi_init_hook:No data available)
Cause
The current provider cannot run on these nodes. The MPI application is running over the psm2 provider on a card that is not Intel® Omni-Path, or over the verbs provider on a card that supports none of InfiniBand*, iWARP, or RoCE.
Solution
- Change the provider, or run the MPI application on nodes whose hardware matches the provider. Use the fi_info utility to get information about the current provider.
- Check that the required fabric services are running on the nodes (opafm for Intel® Omni-Path and opensmd for InfiniBand).
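The two checks above can be sketched as a short script. This is a minimal sketch, not a definitive procedure: it assumes fi_info is on PATH and that the nodes use systemd, so that systemctl can report service state. The service_state helper is a hypothetical name introduced only for this example.

```shell
#!/bin/sh
# List the libfabric providers visible on this node.
if command -v fi_info >/dev/null 2>&1; then
    fi_info -l
else
    echo "fi_info not found in PATH"
fi

# Hypothetical helper: print the state of a service, or "unknown" when
# systemctl is missing or cannot report it. Assumes a systemd-based node.
service_state() {
    state=""
    if command -v systemctl >/dev/null 2>&1; then
        state=$(systemctl is-active "$1" 2>/dev/null)
    fi
    echo "${state:-unknown}"
}

echo "opafm:   $(service_state opafm)"     # Intel Omni-Path fabric manager
echo "opensmd: $(service_state opensmd)"   # InfiniBand subnet manager
```

If the provider you intend to use is missing from the fi_info listing, or the matching service is not reported as active, that node is a likely source of the error.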
Case 2
Error Message
Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
…
MPIDI_OFI_send_handler(704)............: OFI tagged inject failed (ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)
Cause
The OFI transport is using an IP interface that cannot reach the remote ranks.
Solution
Set FI_SOCKETS_IFACE if the sockets provider is used, or FI_TCP_IFACE or FI_VERBS_IFACE for the tcp and verbs providers, respectively. To retrieve the list of configured and active IP interfaces, use the ifconfig utility.
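As an illustration, the interface can be pinned for the tcp provider as follows. This is a sketch under assumptions: eth0 is a hypothetical interface name and ./my_app a hypothetical application. Replace eth0 with an interface that every node can reach.

```shell
#!/bin/sh
# List configured, active IP interfaces. ifconfig comes from net-tools and
# may be absent on newer distributions.
if command -v ifconfig >/dev/null 2>&1; then
    ifconfig
else
    echo "ifconfig not found in PATH"
fi

# Hypothetical interface name: substitute one that reaches every node.
IFACE=eth0

# Set only the variable that matches the provider in use.
export FI_TCP_IFACE="$IFACE"       # tcp provider
# export FI_VERBS_IFACE="$IFACE"   # verbs provider

echo "FI_TCP_IFACE=$FI_TCP_IFACE"
# mpirun -n 2 ./my_app   # hypothetical launch line, left commented out
```

Because the variable is exported, any mpirun started later from the same shell inherits it.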
Case 3
Error Message
Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
…
MPIDI_OFI_send_handler(704)............: OFI tagged inject failed (ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)
Cause
Ethernet is used as the interconnect.
Solution
Run FI_PROVIDER=sockets mpirun … to work around this problem.
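In a POSIX shell the assignment must be written without spaces around the equals sign; with spaces, the shell tries to run FI_PROVIDER as a command. The sketch below shows two ways to pass the variable. It uses sh -c as a stand-in for the mpirun command line, which is elided here, so it runs on a machine without MPI installed.

```shell
#!/bin/sh
# A prefix assignment applies to that one command only.
# `sh -c` stands in for the mpirun command line.
FI_PROVIDER=sockets sh -c 'echo "child sees FI_PROVIDER=$FI_PROVIDER"'

# Exporting makes the setting persist for every later launch in this shell.
export FI_PROVIDER=sockets
provider_seen=$(sh -c 'echo "$FI_PROVIDER"')
echo "after export: $provider_seen"
```

The prefix form suits a one-off run; the exported form suits a job script that launches several times.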