Debugger Integration
To stop in a debugger when an error is detected, manually set a breakpoint in the function MessageCheckingBreakpoint(). This function is called immediately after an error is reported on stderr, so the stack backtrace leads directly to the source code location of the MPI call where the error was detected. In addition to reading the printed error report, you can inspect the parameters of MessageCheckingBreakpoint(), which contain the same information. You can also inspect the actual MPI parameters with the debugger, because the initial layer of MPI wrappers in libVTmc is always compiled with debug information. This is useful if the application itself lacks debug information, or if it passes a complex expression or function call as an MPI parameter whose value is not immediately obvious.
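As an illustration of the breakpoint described above, the following is a minimal sketch for GDB. It writes a small GDB command file that sets the breakpoint and prints a backtrace each time it is hit. The file name mc_break.gdb is an arbitrary choice, and the sketch assumes that libVTmc may not yet be loaded when the breakpoint is set, which is why pending breakpoints are enabled. Other debuggers need their own equivalent.

```shell
# Write a GDB command file; the name mc_break.gdb is a hypothetical choice.
cat > mc_break.gdb <<'EOF'
# libVTmc may be loaded later as a shared library, so allow the
# breakpoint to stay pending until the symbol becomes available.
set breakpoint pending on
break MessageCheckingBreakpoint
# Commands to run automatically each time the breakpoint is hit.
commands
  backtrace
  info args
end
EOF
```

Inside a GDB session, the file can be loaded with `source mc_break.gdb` before the program is started or continued. Frames further up the backtrace belong to the libVTmc wrappers and the application's MPI call.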
The exact method for setting breakpoints depends on the debugger used. The following information describes how it works with specific debuggers. For additional information, or for other debuggers, refer to the debugger documentation.
The first two debuggers mentioned below can be started by Intel® MPI Library by adding the -tv or -gdb option to the mpirun command line. Allinea Distributed Debugging Tool* can be reconfigured to attach to MPI jobs that it starts.
Such debuggers cannot be combined with Valgrind*, because the debugger would try to debug Valgrind rather than the actual application. The Valgrind --db-attach option does not work out of the box either, because each process would try to read from the terminal. One solution that is known to work on some systems, at least for analyzing Valgrind reports, is to start each process in its own X terminal:
$ mpirun -check_mpi -l -n <numprocs> xterm -e bash -c 'valgrind --db-attach=yes --suppressions=$VT_LIB_DIR/impi.supp <app>; echo press return; read'
In that case, Intel® Trace Collector error handling still occurs outside the debugger, so those errors must be analyzed based on the printed reports.
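The nested quoting in the xterm command above can be awkward to edit. As a sketch only, the same steps can be placed in a small wrapper script. The script name vg_wrap.sh is hypothetical, the Valgrind options are copied unchanged from the command above, and the script is assumed to be reachable under the same path on every node that runs a process.

```shell
# Create a hypothetical wrapper script; the name vg_wrap.sh is an arbitrary choice.
cat > vg_wrap.sh <<'EOF'
#!/bin/bash
# Run the application (passed as arguments) under Valgrind, using the
# same options as the one-line command in the text.
valgrind --db-attach=yes --suppressions="$VT_LIB_DIR/impi.supp" "$@"
# Keep the X terminal open until the user has read the report.
echo "press return"
read
EOF
chmod +x vg_wrap.sh
# Intended use, one xterm per process:
#   mpirun -check_mpi -l -n <numprocs> xterm -e ./vg_wrap.sh <app>
```

Passing the application through "$@" keeps its arguments intact without another layer of quoting.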