Intel® MPI Library Developer Guide for Linux* OS

ID 768728
Date 6/24/2024
Public

Error Message: Bad Termination

NOTE: The values in the error messages below may not reflect the exact node or MPI process where a failure occurs.

Case 1

Error Message

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 27494 RUNNING AT node1
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

or:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 27494 RUNNING AT node1
=   KILLED BY SIGNAL: 8 (Floating point exception)
===================================================================================

Cause

One of the MPI processes was terminated by a signal (for example, Segmentation fault or Floating point exception) on the node named in the message (node1 in the examples above).

Solution

Determine why the MPI process terminated. A Segmentation fault means the process made an invalid memory access, which can follow from an out-of-memory condition. A Floating point exception can be caused by division by zero.
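As a starting point for that investigation, the rank, PID, node, and signal can be pulled out of the message programmatically. The following is a minimal Python sketch, not part of the Intel MPI Library: the function name `parse_bad_termination` and the regular expression are assumptions based only on the message layout shown above, and the signal name is looked up with the standard `signal` module of the host where the script runs.

```python
import re
import signal

# Pattern derived from the banner layout shown above (an assumption, not an
# official format specification).
BANNER_RE = re.compile(
    r"RANK\s+(?P<rank>\d+)\s+PID\s+(?P<pid>\d+)\s+RUNNING AT\s+(?P<node>\S+)"
    r".*?KILLED BY SIGNAL:\s+(?P<signum>\d+)\s+\((?P<desc>[^)]*)\)",
    re.DOTALL,
)


def parse_bad_termination(text):
    """Return the fields of the first BAD TERMINATION banner, or None."""
    match = BANNER_RE.search(text)
    if match is None:
        return None
    signum = int(match.group("signum"))
    try:
        # Map the number to the symbolic name known to this platform.
        signame = signal.Signals(signum).name
    except ValueError:
        signame = "UNKNOWN"
    return {
        "rank": int(match.group("rank")),
        "pid": int(match.group("pid")),
        "node": match.group("node"),
        "signal_number": signum,
        "signal_name": signame,
        "description": match.group("desc"),
    }
```

Passing the captured job output to `parse_bad_termination` returns a dictionary with the failing rank and node, which tells you where to look first. On Linux, signal 11 corresponds to SIGSEGV and signal 8 to SIGFPE.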

Case 2

Error Message

================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 20066 RUNNING AT node01
= KILLED BY SIGNAL: 9 (Killed)
================================================================================

Cause

One of the MPI processes was terminated by a signal (for example, SIGTERM or SIGKILL) on node01. Possible reasons:

  • a host reboot;
  • an unexpected signal sent to the process;
  • termination by the out-of-memory (OOM) manager;
  • termination by the process manager because another process of the job terminated earlier;
  • job termination by the job scheduler (for example, PBS Pro* or SLURM*) when a resource limit such as walltime or cputime is exceeded.

Solution

  1. Check the system log files.
  2. Determine why the MPI process was terminated and fix the underlying issue.
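For the out-of-memory case, the first step can be partly automated by scanning kernel log text for the PID reported in the error message. The following Python sketch is illustrative only: the function name `find_oom_kills` is hypothetical, and the pattern assumes the kernel records an OOM kill with wording such as "Killed process <pid> (<name>)" or "Kill process <pid> (<name>)", which varies between kernel versions.

```python
import re

# Assumed wording of kernel OOM-killer entries; adjust for your kernel version.
OOM_RE = re.compile(r"Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def find_oom_kills(lines, pid=None):
    """Return (pid, process_name, line) for each matching log line.

    If pid is given, only entries for that PID are returned.
    """
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match is None:
            continue
        found_pid = int(match.group("pid"))
        if pid is None or found_pid == pid:
            hits.append((found_pid, match.group("name"), line.rstrip()))
    return hits
```

The `lines` argument can be any iterable of strings, for example an open log file or the saved output of `dmesg` or `journalctl -k` from the node named in the error message. An empty result does not rule out the other causes listed above, such as a scheduler limit or a host reboot.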