Debug MPI Programs
There are multiple ways to debug MPI-spawned processes. This section covers two: gtool, which attaches the debugger to the specified processes, and xterm, which opens a terminal window for each process.
Kernels are serialized when debugging. For a good debugging experience, run only one rank per device. Distributing Intel GPU devices among MPI ranks is called GPU pinning. The following settings enable GPU pinning with the Level Zero backend:
$ export I_MPI_OFFLOAD=1
$ export I_MPI_OFFLOAD_TOPOLIB=level_zero
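Because these variables are easy to forget in a fresh shell, a small pre-launch check can be useful. The following is a minimal sketch: the function name check_gpu_pinning_env is hypothetical and not part of Intel MPI, and it tests only the two variables shown above.

```shell
# Hypothetical helper (not part of Intel MPI): verify that the GPU pinning
# variables from this section are set before launching a debug session.
check_gpu_pinning_env() {
    if [ "${I_MPI_OFFLOAD:-}" != "1" ]; then
        echo "I_MPI_OFFLOAD is not set to 1" >&2
        return 1
    fi
    if [ "${I_MPI_OFFLOAD_TOPOLIB:-}" != "level_zero" ]; then
        echo "I_MPI_OFFLOAD_TOPOLIB is not set to level_zero" >&2
        return 1
    fi
    echo "GPU pinning environment is set"
}

export I_MPI_OFFLOAD=1
export I_MPI_OFFLOAD_TOPOLIB=level_zero
check_gpu_pinning_env
```

The function returns a non-zero status when either variable is missing, so it can guard a launch command with &&.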
Using gtool to launch mpigdb
Using gtool provides a single user interface, making it convenient to debug a large number of parallel processes.
Below we set up a debug session on a host with two GPU devices, using the file mpi3_onesided_jacobian_gpu_sycl.cpp from the Distributed Jacobian Solver SYCL/MPI sample.
The following command directs MPI to spawn two ranks, attach a debugger to both, and provide an interface (mpigdb) to communicate with the debuggers:
$ mpirun -n 2 -gtool "gdb-oneapi:all=attach" \
src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl
Once MPI has spawned the processes and debuggers have attached to them, we see the (mpigdb) prompt:
[0,1] (mpigdb)
At this point, all MPI-spawned processes have reached MPI_Init. The prompt shows that mpigdb is waiting for commands, and will deliver them to processes 0 and 1.
Note that mpigdb distributes commands to all the processes listed at the front of the prompt. We can use the command z <process[es]> to select processes with which to communicate.
For example, we could set different breakpoints in different processes:
[0,1] (mpigdb) z 0
mpigdb: set active processes to 0
[0] (mpigdb) break mpi3_onesided_jacobian_gpu_sycl.cpp:141
[0] Breakpoint 1 at 0x40743c: file mpi3_onesided_jacobian_gpu_sycl.cpp, line 141.
[0] (mpigdb) z 1
mpigdb: set active processes to 1
[1] (mpigdb) break mpi3_onesided_jacobian_gpu_sycl.cpp:142
[1] Breakpoint 1 at 0x407459: file mpi3_onesided_jacobian_gpu_sycl.cpp, line 142.
[1] (mpigdb) z all
[0,1] (mpigdb) continue
[0,1] Continuing.
[1] [Switching to thread 3.404:0 (ZE 0.6.2.3 lane 0)]
[1]
[1] Thread 3.404 hit Breakpoint 1.2, with SIMD lanes [0-15], main::{lambda(auto:1&)#1}::operator() ...
[1] this=0xff00000100931690, index=...)
[1] at mpi3_onesided_jacobian_gpu_sycl.cpp:142
[1] 142 a_out[idx] = 0.25 * (a[idx - 1] + a[idx + 1]
[0] [Switching to thread 2.405:0 (ZE 0.6.2.3 lane 0)]
[0]
[0] Thread 2.405 hit Breakpoint 1.2, with SIMD lanes [0-15], main::{lambda(auto:1&)#1}::operator() ...
[0] this=0xff00000100931690, index=...)
[0] at mpi3_onesided_jacobian_gpu_sycl.cpp:141
[0] 141 idx = XY_2_IDX(column, my_subarray.y_size - 1, my_subarray);
[0,1] (mpigdb)
For a quick overview of how to configure your system for debugging an MPI application, refer to Debug an MPI Application with Intel Distribution for GDB*.
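The -gtool argument used in this section has the form tool:ranks=mode. Besides all, the Intel MPI gtool documentation describes rank sets such as a single rank, a comma-separated list, or a range, which can help when only some ranks need a debugger. The dry-run sketch below only prints the command line it builds; the helper name gtool_attach_cmd is hypothetical, and the rank set 0 is an illustrative choice rather than something taken from this guide.

```shell
# Hypothetical helper: print (do not run) an mpirun command that attaches
# gdb-oneapi to the given rank set.
# Usage: gtool_attach_cmd <number-of-ranks> <rank-set> <program>
gtool_attach_cmd() {
    nranks=$1
    rankset=$2
    program=$3
    printf 'mpirun -n %s -gtool "gdb-oneapi:%s=attach" %s\n' \
        "$nranks" "$rankset" "$program"
}

APP=src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl

# Attach to every rank, as in the example in this section.
gtool_attach_cmd 2 all "$APP"
# Assumed variation: attach to rank 0 only.
gtool_attach_cmd 2 0 "$APP"
```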
Using xterm to launch several instances
xterm can display each MPI process in its own terminal window. For jobs with a low process count, a separate window per MPI process can be quite handy.
The following command tells xterm to launch gdb-oneapi for each MPI-spawned process:
$ mpirun -n 2 xterm -e gdb-oneapi \
src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl
This creates two terminal windows, each running a gdb-oneapi instance that debugs a different MPI-spawned process. In each window, the user interacts directly with gdb-oneapi, without mpigdb in the middle. However, this approach lacks a single interface to communicate with all the debugger instances at once.
Multi-host debugging
To run MPI programs on multiple hosts, you must set up password-less login using SSH keys. It helps to generate and distribute the SSH keys for a user automatically. For instructions on setting up a password-less SSH connection, check here.
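One common way to do this uses the OpenSSH tools ssh-keygen and ssh-copy-id. The sketch below assumes a typical setup and is not the procedure from the linked instructions; the host names and key path are placeholders. By default it runs in dry-run mode and only prints the commands.

```shell
# DRY_RUN=1 (the default here) prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: setup_ssh_keys <private-key-path> <host>...
setup_ssh_keys() {
    key=$1
    shift
    # Create a key pair with an empty passphrase only if none exists yet.
    if [ ! -f "$key" ]; then
        run ssh-keygen -t ed25519 -N "" -f "$key"
    fi
    # Install the public key on every host; each host asks for the password once.
    for host in "$@"; do
        run ssh-copy-id -i "$key.pub" "$host"
    done
}

# Placeholder host names; replace them with the hosts used for debugging.
setup_ssh_keys "$HOME/.ssh/id_ed25519" node1.example.com node2.example.com
```

Set DRY_RUN=0 to execute the commands. An empty passphrase is convenient for batch launches but leaves the private key unprotected, so check whether that is acceptable under your site policy.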
The following examples show how to launch processes on two machines. <hostnameN> refers to the hostname of <machineN>, as returned by hostname -f.
The program can be executed on multiple hosts using mpirun, as shown below:
$ mpirun -n 2 -ppn 1 --hosts <hostname1>,<hostname2> \
src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl
This spawns a total of two processes with one process per node (-ppn), so each host runs one process. Instead of passing a comma-separated list of nodes to the --hosts option, you can provide the path to a host file that lists the cluster nodes. For the full list of available options, check this documentation.
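As a sketch of the host-file alternative: the file lists one host name per line and, in Intel MPI, is typically passed with -f (or -hostfile). The host names below are placeholders, and the last command only prints the launch line rather than running it.

```shell
# Write a host file with one (placeholder) host name per line.
hostfile=$(mktemp)
printf '%s\n' node1.example.com node2.example.com > "$hostfile"
cat "$hostfile"

# Dry run: print the launch command that would use the host file.
echo mpirun -n 2 -ppn 1 -f "$hostfile" \
    src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl
```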
You can launch a debug session on multiple hosts using the following command:
$ mpirun -n 2 -ppn 1 --hosts <hostname1>,<hostname2> \
-gtool "gdb-oneapi:all=attach" \
src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl
This attaches a debugger to each of the processes on both nodes. Here, the process on <machine1> gets ID 0, while the process on <machine2> gets ID 1.
Using a proxy script to launch multi-host debug sessions
To ensure proper environment settings in the remote debugging session, you can use a proxy script:
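#!/bin/bash
# Enable debugging of GPU programs through Level Zero.
export ZET_ENABLE_PROGRAM_DEBUGGING=1
# Put the debugger binaries (including gdbserver-ze) on PATH.
export PATH=/opt/intel/oneapi/debugger/latest/gdb/intel64/bin/:$PATH
# Forward all arguments unchanged; quoting "$@" preserves arguments that contain spaces.
gdb-oneapi "$@"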
Make sure the script adds gdbserver-ze to the default path. Give the script a name, e.g. custom-gdb-oneapi, and save it in a directory on the default path of every host used in multi-host debugging. Use the following mpigdb command to show the default path of the remote host:
[0] (mpigdb) show environment PATH
Finally, the script must have execution permission:
$ chmod +x custom-gdb-oneapi
Now you can launch a debug session using the proxy script:
$ mpirun -n 2 -ppn 1 --hosts <hostname1>,<hostname2> \
-gtool "custom-gdb-oneapi:all=attach" \
src/02_jacobian_device_mpi_one-sided_gpu_aware/mpi3_onesided_jacobian_gpu_sycl
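If the launch cannot find the proxy script, a quick check on each host is whether its name resolves on PATH. A minimal sketch follows; check_on_path is a hypothetical helper built on the standard command -v builtin, and custom-gdb-oneapi is the example script name used above.

```shell
# Hypothetical helper: report whether a command name resolves on PATH.
check_on_path() {
    if location=$(command -v "$1" 2>/dev/null); then
        echo "$1: found at $location"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Report on the proxy script; '|| true' keeps a miss from aborting the shell.
check_on_path custom-gdb-oneapi || true
```

Run the check on each host, for example through an ssh session to <hostnameN>, because a non-interactive SSH session may see a different PATH than your login shell.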