Intel® MPI Library Developer Guide for Linux* OS

ID 768728
Date 12/16/2022
Public

Error Message: Bad File Descriptor

Error Message

[mpiexec@node00] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:353): write error (Bad file descriptor)
[mpiexec@node00] cmd_bcast_root (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:147): error sending cwd cmd to proxy
[mpiexec@node00] stdin_cb (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:324): unable to send response downstream
[mpiexec@node00] HYDI_dmx_poll_wait_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:79): callback returned error status
[mpiexec@node00] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:2064): error waiting for event

or:

[mpiexec@host1] wait_proxies_to_terminate (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:389): downstream from host host2 exited with status 255

Cause

The remote hydra_pmi_proxy process is unavailable for one of the following reasons:

  • the host rebooted;
  • the process received an unexpected signal;
  • the process was terminated because of an out-of-memory (OOM) condition;
  • the job scheduler (PBS Pro*, SLURM*) terminated the job because a resource limit was exceeded (for example, the walltime or CPU time limit).

Solution

  1. Check the system log files on the host where hydra_pmi_proxy was running.
  2. Determine why the hydra_pmi_proxy process terminated and fix the underlying issue.
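
The steps above can be partly automated. The following is a minimal sketch, not part of the Intel MPI Library: the function names failed_hosts and classify_log are hypothetical, and the log signatures in CAUSE_PATTERNS are assumptions based on typical Linux kernel, SLURM, and PBS messages, whose exact wording varies between versions. Only the PROXY_EXIT pattern is taken directly from the error message shown above. The sketch extracts the failing host from the mpiexec output so you know where to look, then flags log lines that suggest an OOM kill or a scheduler limit.

```python
import re

# Pattern taken from the second mpiexec error message shown above.
PROXY_EXIT = re.compile(r"downstream from host (\S+) exited with status (\d+)")

# Assumed log signatures; adjust them to match your kernel and scheduler.
CAUSE_PATTERNS = {
    "oom": re.compile(r"invoked oom-killer|Out of memory: Kill(ed)? process", re.I),
    "scheduler_limit": re.compile(r"DUE TO TIME LIMIT|walltime \d+ exceeded limit", re.I),
}


def failed_hosts(mpiexec_output):
    """Return (host, exit_status) pairs reported by mpiexec."""
    return [(m.group(1), int(m.group(2)))
            for m in PROXY_EXIT.finditer(mpiexec_output)]


def classify_log(lines):
    """Return (suspected_cause, line) for each log line matching a known signature."""
    hits = []
    for line in lines:
        for cause, pattern in CAUSE_PATTERNS.items():
            if pattern.search(line):
                hits.append((cause, line.rstrip()))
                break  # report each line once, under the first matching cause
    return hits


if __name__ == "__main__":
    sample = ("[mpiexec@host1] wait_proxies_to_terminate "
              "(../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:389): "
              "downstream from host host2 exited with status 255")
    print(failed_hosts(sample))  # prints [('host2', 255)]
```

To use it, pass the captured mpiexec stderr to failed_hosts, then feed the system or scheduler log from each reported host to classify_log line by line. A host reboot or an unexpected signal leaves no signature that this sketch recognizes, so an empty result does not rule those causes out.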