Developer Guide Windows

ID 768730
Date 10/31/2024
Public

Error Message: Fatal Error

Case 1

Error Message

Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack: 
MPIR_Init_thread(653)......: 
MPID_Init(860).............: 
MPIDI_NM_mpi_init_hook(698): OFI addrinfo() failed
(ofi_init.h:698:MPIDI_NM_mpi_init_hook:No data available)

Cause

The selected provider cannot run on these nodes. This happens when the MPI application runs over the psm2 provider on a node without an Intel® Omni-Path card, or over the verbs provider on a node without an InfiniBand*, iWARP, or RoCE card.

Solution

  1. Change the provider or run the MPI application on the right nodes. Use fi_info to get information about the current provider.
  2. Check that the required services are running on the nodes (opafm for Intel® Omni-Path and opensmd for InfiniBand).
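
The fi_info check in step 1 can be sketched as follows. This is a minimal example, not an official procedure: it assumes a POSIX shell and that the libfabric fi_info utility may or may not be on PATH. The variable name providers is illustrative only.

```shell
# Collect the list of libfabric providers if fi_info is installed;
# otherwise record that the utility is missing.
if command -v fi_info >/dev/null 2>&1; then
    # -l lists the available providers.
    providers=$(fi_info -l 2>&1) || providers="fi_info -l failed"
else
    providers="fi_info not found on PATH"
fi
echo "$providers"
```

To inspect a single provider, fi_info -p followed by the provider name (for example, verbs) narrows the output to that provider.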

Case 2

Error Message

Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: 
Other MPI error, error stack:
…
MPIDI_OFI_send_handler(704)............: OFI tagged inject failed 
(ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)

Cause

The OFI transport is using an IP interface that cannot reach the remote ranks.

Solution

Set FI_SOCKET_IFACE if the sockets provider is used, or FI_TCP_IFACE or FI_VERBS_IFACE for the TCP and verbs providers, respectively. To retrieve the list of configured and active IP interfaces, use the ifconfig utility (ipconfig on Windows).
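
As an illustration of the mapping between provider and variable, the sketch below sets the matching variable for one provider. It assumes a POSIX shell. The helper name set_ofi_iface and the interface name eth0 are hypothetical placeholders; substitute an interface that can reach all nodes.

```shell
# Hypothetical helper: export the interface variable that matches the provider.
set_ofi_iface() {
    provider=$1
    iface=$2
    case "$provider" in
        sockets) export FI_SOCKET_IFACE="$iface" ;;
        tcp)     export FI_TCP_IFACE="$iface" ;;
        verbs)   export FI_VERBS_IFACE="$iface" ;;
        *)       echo "unknown provider: $provider" >&2; return 1 ;;
    esac
}

# Example: bind the TCP provider to the placeholder interface eth0.
set_ofi_iface tcp eth0
echo "FI_TCP_IFACE=$FI_TCP_IFACE"
```

Only the variable for the provider actually in use needs to be set; the others are ignored by that provider.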

Case 3

Error Message

Abort(6337423) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: 
Other MPI error, error stack:
…
MPIDI_OFI_send_handler(704)............: OFI tagged inject failed 
(ofi_impl.h:704:MPIDI_OFI_send_handler:Transport endpoint is not connected)

Cause

Ethernet is used as the interconnect network.

Solution

Run FI_PROVIDER=sockets mpirun … to work around this problem. Do not put spaces around the equals sign.
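
The prefix form sets the variable for that one command only. The sketch below shows this scoping in a POSIX shell; sh -c stands in for the real mpirun command line, which is not reproduced here, and the variable name seen is illustrative.

```shell
# Prefix assignment: FI_PROVIDER is visible only to the child command.
# `sh -c` is a stand-in for the actual mpirun invocation.
seen=$(FI_PROVIDER=sockets sh -c 'echo "$FI_PROVIDER"')
echo "child process saw: $seen"
```

To apply the setting to every later command in the session instead, export the variable once (export FI_PROVIDER=sockets) before launching the application.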