Error Message: Bad Termination
NOTE: The values in the tables below may not reflect the exact node or MPI process where a failure can occur.
Case 1
Error Message
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 27494 RUNNING AT node1
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
or:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 27494 RUNNING AT node1
= KILLED BY SIGNAL: 8 (Floating point exception)
===================================================================================
Cause
One of the MPI processes was terminated by a signal (for example, Segmentation fault or Floating point exception) on node1.
Solution
Find the reason for the MPI process termination. For example, a Segmentation fault can be caused by an out-of-memory issue or another invalid memory access, and a Floating point exception can be caused by a division by zero.
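As a starting point for the diagnosis, the signal number in the message can be mapped to its symbolic name, which is easier to search for in debugger output and system documentation. The following Python sketch is illustrative only and is not part of the Intel MPI Library. The helper name parse_bad_termination and the regular expression are assumptions based on the message layout shown in this section.

```python
import re
import signal

# Pattern assumed from the BAD TERMINATION message layout shown in this
# section; it is not an official format specification.
BANNER_RE = re.compile(
    r"RANK\s+(?P<rank>\d+)\s+PID\s+(?P<pid>\d+)\s+RUNNING AT\s+(?P<host>\S+)"
    r".*?KILLED BY SIGNAL:\s+(?P<signum>\d+)",
    re.DOTALL,
)


def parse_bad_termination(text):
    """Return rank, PID, host, and signal name from a message, or None."""
    match = BANNER_RE.search(text)
    if match is None:
        return None
    signum = int(match.group("signum"))
    try:
        # Map the number to the name used on the machine running this script.
        signame = signal.Signals(signum).name
    except ValueError:
        signame = "UNKNOWN"
    return {
        "rank": int(match.group("rank")),
        "pid": int(match.group("pid")),
        "host": match.group("host"),
        "signal": signame,
    }
```

The signal number is resolved on the machine where the script runs, so run it on a system with the same signal numbering as the failing node. The reported rank, PID, and host can then be used to attach a debugger or to locate a core file for that process.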
Case 2
Error Message
================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 20066 RUNNING AT node01
= KILLED BY SIGNAL: 9 (Killed)
================================================================================
Cause
One of the MPI processes was terminated by a signal (for example, SIGTERM or SIGKILL) on node01 due to one of the following:
- a host reboot;
- an unexpected signal;
- out-of-memory (OOM) manager errors;
- termination by the process manager (if another process was terminated before the current process);
- job termination by the job scheduler (PBS Pro*, SLURM*) because of resource limits (for example, walltime or CPU time limits).
Solution
- Check the system log files.
- Find the reason for the MPI process termination and fix the issue.
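One way to check the system logs for the out-of-memory case is to search the kernel messages for the PID reported in the error message. The following Python sketch assumes a Linux kernel log line containing "Killed process <pid> (<name>)"; this wording varies between kernel versions, so the pattern may need adjusting. The function name find_oom_kills is hypothetical. How the log text is obtained (for example, from dmesg or journalctl -k) depends on the system and may require elevated privileges.

```python
import re

# Assumed format of a Linux OOM killer message; the wording differs
# between kernel versions, so adjust the pattern to match your logs.
OOM_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_kills(log_text, pid=None):
    """Return (pid, name) pairs for OOM kills found in log_text.

    If pid is given, only kills of that PID are returned.
    """
    kills = []
    for line in log_text.splitlines():
        match = OOM_RE.search(line)
        if match is None:
            continue
        killed_pid = int(match.group("pid"))
        if pid is None or killed_pid == pid:
            kills.append((killed_pid, match.group("name")))
    return kills
```

Calling find_oom_kills(log_text, pid=20066) with the kernel log from node01 would return a non-empty list only if the log records an OOM kill of the PID from the error message above. An empty result does not rule out the other causes listed, such as a scheduler limit or termination by the process manager.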