Error Message: Bad File Descriptor
Error Message
[mpiexec@node00] HYD_sock_write (../../../../../src/pm/i_hydra/libhydra/sock/hydra_sock_intel.c:353): write error (Bad file descriptor)
[mpiexec@node00] cmd_bcast_root (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:147): error sending cwd cmd to proxy
[mpiexec@node00] stdin_cb (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:324): unable to send response downstream
[mpiexec@node00] HYDI_dmx_poll_wait_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:79): callback returned error status
[mpiexec@node00] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:2064): error waiting for event
or:
[mpiexec@host1] wait_proxies_to_terminate (../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:389): downstream from host host2 exited with status 255
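When the second form of the message appears in a job log, the failing host and its exit status can be extracted mechanically. The following is a minimal sketch in Python that assumes the message format shown above. The function name find_failed_proxies is hypothetical and is not part of the Intel MPI Library.

```python
import re

# Pattern based on the message format shown above; other library
# versions may word the message differently (assumption).
_PROXY_EXIT = re.compile(
    r"downstream from host (?P<host>\S+) exited with status (?P<status>\d+)"
)


def find_failed_proxies(log_text):
    """Return (host, exit_status) pairs for every proxy-exit message found."""
    return [
        (m.group("host"), int(m.group("status")))
        for m in _PROXY_EXIT.finditer(log_text)
    ]


sample = (
    "[mpiexec@host1] wait_proxies_to_terminate "
    "(../../../../../src/pm/i_hydra/mpiexec/intel/i_mpiexec.c:389): "
    "downstream from host host2 exited with status 255"
)
print(find_failed_proxies(sample))  # [('host2', 255)]
```

The reported host (host2 in this sample) is the node whose system logs are worth inspecting first.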
Cause
The remote hydra_pmi_proxy process is unavailable due to one of the following:
- a host reboot;
- an unexpected signal;
- out-of-memory (OOM) manager errors;
- job termination by the job scheduler (PBS Pro*, SLURM*) because a resource limit was exceeded (for example, a walltime or CPU time limit).
Solution
- Check the system log files on the affected host.
- Identify why the hydra_pmi_proxy process terminated and fix the underlying issue.
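As a starting point for the log check, the sketch below scans system log text for lines that commonly accompany an OOM event or mention the proxy process. It is a minimal illustration in Python, not an Intel tool: the function name scan_log_lines, the keyword list, and the sample lines are assumptions, and the exact wording of kernel and scheduler messages varies between systems.

```python
import re

# Hypothetical keyword list; adjust it to the messages your system emits.
_SUSPECT = re.compile(
    r"out of memory|oom-killer|killed process|hydra_pmi_proxy",
    re.IGNORECASE,
)


def scan_log_lines(lines):
    """Return the lines that mention an OOM event or the proxy process."""
    return [line.rstrip("\n") for line in lines if _SUSPECT.search(line)]


# Illustrative input only; real entries would come from the system log
# of the failing host (for example, the output of dmesg or journalctl -k).
sample = [
    "kernel: Out of memory: Killed process 4242 (hydra_pmi_proxy)\n",
    "sshd[100]: Accepted publickey for user\n",
]
for hit in scan_log_lines(sample):
    print(hit)
```

A match only narrows the search. Reboots and scheduler-enforced limits are usually recorded elsewhere, such as in the scheduler's own job logs.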