Intel® Trace Analyzer and Collector User and Reference Guide

ID 767272
Date 10/31/2024
Public

Analyzing the Results

To debug interactively, start the application so that stderr is printed to a console window. You can then see errors as they are found while the application is running and begin analyzing them without waiting for the run to complete. If critical errors appear early, you can abort the run, fix the problem, and restart. This gives a much faster code-and-test cycle than post-mortem analysis.

The output varies by error type: only the information relevant to that error is printed, so you do not have to skip over irrelevant details. In general, Intel® Trace Collector starts with the error name and then continues with a description of the failure.

For each MPI call involved in the error, the MPI parameters are dumped. If PC tracing is enabled (see PCTRACE), Intel® Trace Collector also provides a backtrace of source code locations for each call. For entities such as requests, the involved calls include the places where the request was created or activated. This helps you track down errors whose cause is not at the place where they are detected.

Because multiple processes might print errors concurrently, each line is prefixed with a tag that includes the MPI_COMM_WORLD rank of the process reporting the problem. That rank is unique because MPI applications that use process spawning or attachment are not supported.
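
Because output from several ranks can be interleaved on stderr, it can help to regroup captured lines by rank before reading them. The following is a minimal Python sketch, not part of Intel® Trace Collector. It assumes the tag has the shape "[<rank>] " at the start of each line. That shape, the helper name group_by_rank, and the sample messages are illustrative assumptions, so adjust the pattern to match the tags in your own output.

```python
import re
from collections import defaultdict

# Assumption: each report line starts with a tag of the form "[<rank>] ".
# Adjust this pattern if the tags in your output look different.
RANK_TAG = re.compile(r"^\[(\d+)\]\s?(.*)$")


def group_by_rank(lines):
    """Group tagged lines by rank, preserving per-rank order.

    Lines without a rank tag (for example, ordinary application
    output) are ignored.
    """
    per_rank = defaultdict(list)
    for line in lines:
        match = RANK_TAG.match(line.rstrip("\n"))
        if match:
            per_rank[int(match.group(1))].append(match.group(2))
    return dict(per_rank)


# Hypothetical interleaved output from two ranks (placeholder messages).
sample = [
    "[1] ERROR: example error name",
    "[0] WARNING: example warning name",
    "[1] ERROR:    description line",
    "plain application output",
]

report = group_by_rank(sample)
for rank in sorted(report):
    print(f"--- rank {rank} ---")
    for text in report[rank]:
        print(text)
```

To use this on a real run, redirect stderr to a file and pass the open file object to group_by_rank in place of the sample list.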

When the application terminates, Intel® Trace Collector performs further error checks (for example, for unfreed resources and pending messages).

Notes:
  • If any process is killed without giving it a chance to clean up (that is, by sending it a SIGKILL), this final step is not possible.
  • Sending a SIGINT to mpiexec, either through kill or by pressing CTRL-C, causes Intel® MPI Library to abort all processes with such a hard SIGKILL.
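
The difference between a catchable signal and SIGKILL can be seen in a small, generic POSIX experiment. The sketch below is only an analogy and does not use MPI or Intel® Trace Collector: a child process registers a stand-in "final checks" function (a hypothetical name) that writes a marker file at exit. The marker appears when the child receives SIGTERM, which it can handle, but not when it receives SIGKILL.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child program: registers an exit hook that stands in for end-of-run
# checks, converts SIGTERM into a normal exit, then waits.
CHILD = r"""
import atexit, signal, sys, time

marker = sys.argv[1]

def final_checks():
    # Stand-in for work that can only happen during a clean shutdown.
    with open(marker, "w") as f:
        f.write("final checks ran\n")

atexit.register(final_checks)
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(1))
print("ready", flush=True)
time.sleep(60)
"""


def final_checks_ran(sig):
    """Start the child, send it `sig`, and report whether its exit hook ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "marker")
        child = subprocess.Popen(
            [sys.executable, "-c", CHILD, marker],
            stdout=subprocess.PIPE,
            text=True,
        )
        child.stdout.readline()  # wait until the handlers are installed
        child.send_signal(sig)
        child.wait()
        child.stdout.close()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("SIGTERM:", final_checks_ran(signal.SIGTERM))
    print("SIGKILL:", final_checks_ran(signal.SIGKILL))
```

SIGKILL cannot be caught or handled by the receiving process, so no exit hook of any kind gets a chance to run.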