Correctness Checking
Intel® Trace Collector provides correctness checking functionality that addresses the following tasks:
Finding programming mistakes in the application. These include potential portability problems and violations of the MPI standard that do not cause problems immediately, but might do so after a switch to different hardware or a different MPI implementation. For this task, it is recommended to perform correctness checking interactively on a smaller development cluster; you can also include it in automated regression testing.
Detecting errors in the execution environment. For this task, use the hardware and software stack of the system that is to be checked.
When doing correctness checking, distinguish between error detection, which tools perform automatically, and error analysis, which the user performs to determine the root cause of an error and eventually fix it.
Error detection in Intel Trace Collector is implemented in the libVTmc library and is performed at runtime. To address both of the above scenarios, Intel Trace Collector supports recording error reports for later analysis as well as interactive debugging at runtime.
The correctness checker prints errors to stderr as soon as they are found. You can perform interactive debugging with a traditional debugger: if the application is already running under debugger control, the debugger can stop a process when an error is found. To enable this, manually set a breakpoint in the function MessageCheckingBreakpoint(). This function and its debug information are contained in the Intel Trace Collector library, so you can set the breakpoint and inspect the function's parameters after a process is stopped. The parameters indicate which error occurred.
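The breakpoint mechanism described above can be illustrated with a minimal, self-contained C sketch. This is not Intel Trace Collector code: every name starting with demo_ or DEMO_ is hypothetical, and the real parameter list of MessageCheckingBreakpoint() is not reproduced here. The sketch only shows the general pattern, assuming a checker that first reports an error to stderr and then calls a deliberately trivial function whose sole purpose is to serve as a breakpoint target.

```c
/* Hypothetical sketch of a "breakpoint hook" pattern.
 * Not Intel Trace Collector code; all demo_* names are invented. */
#include <assert.h>
#include <stdio.h>

/* Hypothetical error codes passed to the hook. */
enum demo_error { DEMO_OK = 0, DEMO_NEGATIVE_COUNT = 1 };

/* Counts how often the hook was reached (used only by this demo). */
int demo_hook_calls = 0;

/* Deliberately trivial function that is kept out of line so that a
 * debugger breakpoint set on it stops the process exactly when an
 * error is reported. Its parameters describe the error. */
#if defined(__GNUC__)
__attribute__((noinline))
#endif
void demo_error_breakpoint(enum demo_error code, const char *text)
{
    assert(text != NULL);
    (void)code;            /* inspected in the debugger, not used here */
    demo_hook_calls++;
}

/* Hypothetical runtime check: print the report first, then call the
 * hook so that an attached debugger can stop the process. */
int demo_check_count(int count)
{
    if (count < 0) {
        fprintf(stderr, "demo checker: negative count %d\n", count);
        demo_error_breakpoint(DEMO_NEGATIVE_COUNT, "negative count");
        return DEMO_NEGATIVE_COUNT;
    }
    return DEMO_OK;
}

#ifdef HOOK_DEMO_MAIN
/* Optional driver, enabled with -DHOOK_DEMO_MAIN. */
int main(void)
{
    demo_check_count(3);   /* valid argument: hook is not called */
    demo_check_count(-1);  /* invalid argument: hook is called   */
    printf("hook was called %d time(s)\n", demo_hook_calls);
    return 0;
}
#endif
```

To try the sketch, compile it with debug information and the HOOK_DEMO_MAIN macro defined, start it under a debugger such as GDB, and set a breakpoint on demo_error_breakpoint before running. When the process stops, the backtrace shows where the error was detected, and the arguments of the hook show which error it was. With the real library, the breakpoint is set on MessageCheckingBreakpoint() in the same way.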