Instrumenting an Example with a Deadlock
To experiment with the deadlock example, copy the contents of the <install-dir>/itac/examples/checking/global/deadlock/hard/ directory to your working directory:
$ cp -r <install-dir>/itac_latest/examples/checking/global/deadlock/hard/ ~
$ cd ~/hard
Compile and run the example with the following commands:
$ mpiicc -g MPI_Recv.c -o MPI_Recv
$ mpirun -check_mpi -genv VT_CHECK_TRACING on -genv VT_DEADLOCK_TIMEOUT 20s -genv VT_DEADLOCK_WARNING 25s -genv VT_PCTRACE on -n 2 MPI_Recv
The command lines above use the following flags:
- -g – generate debugging information in the object file so that the source files can be analyzed
- -check_mpi – dynamically link the correctness checker library (VTmc.so)
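- -genv VT_CHECK_TRACING on – enable writing of the .stf trace file for analysis in Intel® Trace Analyzer (with VTmc.so, the trace file is not written by default)
- -genv VT_DEADLOCK_TIMEOUT 20s, -genv VT_DEADLOCK_WARNING 25s – set the time limits used for deadlock detection; with VT_DEADLOCK_TIMEOUT 20s, the application is aborted once no process has made progress for 20 seconds (see the documentation of these variables for details)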
- -genv VT_PCTRACE on – enable recording of source code locations to the trace file
The resulting output should look similar to the following (memory addresses and paths will differ on your system):
...
[0] ERROR: no progress observed in any process for over 0:20 minutes, aborting application
[0] WARNING: starting emergency trace file writing
[0] ERROR: GLOBAL:DEADLOCK:HARD: fatal error
[0] ERROR: Application aborted because no progress was observed for over 0:20 minutes,
[0] ERROR: check for real deadlock (cycle of processes waiting for data) or
[0] ERROR: potential deadlock (processes sending data to each other and getting blocked
[0] ERROR: because the MPI might wait for the corresponding receive).
[0] ERROR: [0] no progress observed for over 0:20 minutes, process is currently in MPI call:
[0] ERROR: MPI_Recv(*buf=0x7fff447cc494, count=1, datatype=MPI_CHAR, source=1, tag=100, comm=MPI_COMM_WORLD, *status=0x7fff447cc450)
[0] ERROR: main (/checking/global/deadlock/hard/MPI_Recv.c:53)
[0] ERROR: [1] no progress observed for over 0:20 minutes, process is currently in MPI call:
[0] ERROR: MPI_Recv(*buf=0x7fffaf31b9a4, count=1, datatype=MPI_CHAR, source=0, tag=100, comm=MPI_COMM_WORLD, *status=0x7fffaf31b960)
[0] ERROR: main (/checking/global/deadlock/hard/MPI_Recv.c:53)
[0] INFO: Writing tracefile MPI_Recv.stf in /checking/global/deadlock/hard
[0] INFO: GLOBAL:DEADLOCK:HARD: found 1 time (1 error + 0 warnings), 0 reports were suppressed
[0] INFO: Found 1 problem (1 error + 0 warnings), 0 reports were suppressed.
...
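The correctness checker reports a hard deadlock (GLOBAL:DEADLOCK:HARD): both processes are blocked in MPI_Recv at line 53 of MPI_Recv.c, with process 0 waiting for a message from process 1 (source=1) and process 1 waiting for a message from process 0 (source=0), so neither can proceed. This error needs to be fixed. To dig deeper into the problem, analyze the generated trace file MPI_Recv.stf in Intel® Trace Analyzer.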