Multi-device Debugging
Tutorial: Debugging with Intel® Distribution for GDB*
Debugging applications on systems with multiple GPUs and/or subdevices (i.e. tiles) is supported. There are essentially three user scenarios in a multi-device setting.
An application submits workloads to multiple devices.
Multiple applications submit workloads to different (sub)devices.
Multiple applications submit workloads to the same subdevice.
The GPUs available on a system can be listed using the command sycl-ls. The output below shows that the system has two GPU cards, each of which can be used for offloading through either the OpenCL™ or the Intel® oneAPI Level Zero (Level Zero) backend.
$ sycl-ls
[opencl:gpu:0] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x0bd5] 3.0 [22.39.24347.8]
[opencl:gpu:1] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x0bd5] 3.0 [22.39.24347.8]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
The example below shows the output of the sycl-ls command with the SYCL_DEVICE_FILTER set to level_zero:
$ export SYCL_DEVICE_FILTER=level_zero
$ sycl-ls
Warning: SYCL_DEVICE_FILTER environment variable is set to level_zero.
To see the correct device id, please unset SYCL_DEVICE_FILTER.
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
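When the listing is consumed by a script, the Level Zero GPU indices can be extracted from it. The following is a minimal sketch, assuming the sycl-ls line format shown above. The function name level_zero_gpu_ids is hypothetical and is not part of any oneAPI tool.

```shell
# Hypothetical helper: read sycl-ls output on stdin and print the index of
# every GPU exposed through the Level Zero backend, one index per line.
# Assumption: matching lines start with "[ext_oneapi_level_zero:gpu:<N>]",
# as in the listing above; other releases may format the output differently.
level_zero_gpu_ids() {
    sed -n 's/^\[ext_oneapi_level_zero:gpu:\([0-9][0-9]*\)].*/\1/p'
}
```

A possible use is `sycl-ls | level_zero_gpu_ids`. Lines for other backends and the SYCL_DEVICE_FILTER warning do not match the pattern and are skipped.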
Scenario 1: An Application Uses Multiple Devices
The debugger supports debugging a program that offloads kernels to multiple GPU (sub)devices. Each sub-device (i.e. a tile) appears in the debugger as a separate inferior. The auto-attach feature initializes the devices for debugging and creates the corresponding inferiors.
A possible output is as follows:
$ gdb-oneapi -q --args ./multi-device
Reading symbols from ./multi-device...
(gdb) break get_transformed
Breakpoint 1 at 0x40431a: file multi-device.cpp, line 27.
(gdb) run
Starting program: /path/to/multi-device
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
intelgt: gdbserver-ze started for process 581849.
[New Thread 0x7fffe4645700 (LWP 581871)]
[Switching to Thread 1.97 lane 0]

Thread 2.97 hit Breakpoint 1, with SIMD lanes [0-15], get_transformed (data=1, device_idx=0) at multi-device.cpp:27
27          return data * 3 + 11 * (device_idx + 1);
We can check the devices’ inferiors using:
info inferiors
The output below lists five inferiors: the host process and four device inferiors, one for each sub-device. Devices are enumerated in the format [<pci-location>].<sub-device-id>.
  Num  Description         Connection                                   Executable
  1    process 581849      1 (native)                                   /path/to/multi-device
* 2    device [3a:00.0].0  2 (remote | gdbserver-ze --attach - 581849)
  3    device [3a:00.0].1  2 (remote | gdbserver-ze --attach - 581849)
  4    device [9a:00.0].0  2 (remote | gdbserver-ze --attach - 581849)
  5    device [9a:00.0].1  2 (remote | gdbserver-ze --attach - 581849)
Type "info devices" to see details of the devices.
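The [<pci-location>].<sub-device-id> naming can be taken apart mechanically, for instance when post-processing a debugger log. The sketch below is illustrative only. The function name split_device_id is hypothetical, and the "-" placeholder for a missing sub-device is an assumption of this sketch.

```shell
# Hypothetical helper: split a device description such as "[3a:00.0].1"
# into its PCI location and sub-device id, printed as "<pci> <sub-device>".
# A description without a sub-device part (e.g. "[3a:00.0]") prints "-"
# in the second field.
split_device_id() {
    local id=$1 pci sub
    pci=${id#\[}        # drop the leading "["
    pci=${pci%%\]*}     # keep everything before the closing "]"
    case $id in
        *\].*) sub=${id##*\].} ;;   # text after "]."
        *)     sub=- ;;
    esac
    printf '%s %s\n' "$pci" "$sub"
}
```

For example, `split_device_id '[9a:00.0].1'` yields the PCI location 9a:00.0 and the sub-device id 1.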
We can display further information using the command:
info devices
A possible output would be:
  Location   Sub-device  Vendor Id  Target Id  Cores  Device Name
* [3a:00.0]  0           0x8086     0x0bd5     512    Intel(R) Graphics [0x0bd5]
  [3a:00.0]  1           0x8086     0x0bd5     512    Intel(R) Graphics [0x0bd5]
  [9a:00.0]  0           0x8086     0x0bd5     512    Intel(R) Graphics [0x0bd5]
  [9a:00.0]  1           0x8086     0x0bd5     512    Intel(R) Graphics [0x0bd5]
The GPU (sub)devices available to the application can be limited using the ZE_AFFINITY_MASK environment variable. For example, the same debug session gives the output below when it is run with ZE_AFFINITY_MASK=0.0 set in the environment, which exposes only sub-device 0 of GPU 0:
(gdb) info inferiors
  Num  Description       Connection                                   Executable
  1    process 581966    1 (native)                                   /path/to/multi-device
* 2    device [3a:00.0]  2 (remote | gdbserver-ze --attach - 581966)
Type "info devices" to see details of the devices.
(gdb) info devices
  Location   Sub-device  Vendor Id  Target Id  Cores  Device Name
* [3a:00.0]  -           0x8086     0x0bd5     512    Intel(R) Graphics [0x0bd5]
Please see the oneAPI programming documentation for more details about the usage of ZE_AFFINITY_MASK.
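Before launching a debug session from a script, it can be useful to check that a mask value is well formed. The sketch below assumes that a mask is a comma-separated list of entries of the form <device> or <device>.<subdevice>, such as the 0.0 value used above. The function name is hypothetical, and the authoritative syntax is the one given in the Level Zero documentation.

```shell
# Hypothetical helper: succeed if $1 looks like a ZE_AFFINITY_MASK value,
# i.e. a comma-separated list of "<device>" or "<device>.<subdevice>"
# entries made of decimal indices. This checks the syntax only; it does not
# check that the named devices exist on the system.
is_valid_affinity_mask() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?(,[0-9]+(\.[0-9]+)?)*$'
}
```

A launcher script could then guard the session, for example with `is_valid_affinity_mask "$mask" && ZE_AFFINITY_MASK=$mask gdb-oneapi -q --args ./multi-device`.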
Scenario 2: Multiple Applications Use Different (Sub)Devices
Simultaneous debugging of multiple applications is supported when each application runs under a separate instance of the debugger. For example, the Array Transform application from the Basic Debugging section can be started so that it uses subdevice 0 of GPU 0 as follows:
$ ZE_AFFINITY_MASK=0.0 gdb-oneapi array-transform
...
(gdb) run gpu
...
While this first application is being debugged (e.g. GPU threads hit a breakpoint and their state is under investigation), another process of the same or a different user can freely utilize another subdevice/GPU, e.g. subdevice 1 of GPU 0:
$ ZE_AFFINITY_MASK=0.1 gdb-oneapi array-transform
...
(gdb) run gpu
...
As long as the applications use different subdevices, simultaneous debugging works.
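The per-subdevice launch lines used above follow a regular pattern, which the sketch below generates. It only prints the commands and does not start any debugger. The function name and its arguments are hypothetical, and the number of subdevices per GPU must be supplied by the caller.

```shell
# Hypothetical helper: print one debugger launch line per subdevice of a GPU.
#   $1 = GPU index, $2 = number of subdevices (tiles), $3 = application.
# The commands are only printed; each one is meant to be run in its own
# terminal so that every session has a separate debugger instance.
print_debug_sessions() {
    local gpu=$1 tiles=$2 app=$3 tile=0
    while [ "$tile" -lt "$tiles" ]; do
        printf 'ZE_AFFINITY_MASK=%s.%s gdb-oneapi %s\n' "$gpu" "$tile" "$app"
        tile=$((tile + 1))
    done
}
```

Calling `print_debug_sessions 0 2 array-transform` prints the two launch lines used in this scenario, one for subdevice 0.0 and one for subdevice 0.1.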
As an alternative to using ZE_AFFINITY_MASK as shown above, the applications may also select GPUs/subdevices programmatically.
Scenario 3: Multiple Applications Use the Same Subdevice
A restriction to multi-device debugging occurs when different applications utilize the same subdevice. In this case, the workload submitted by one of the applications occupies the subdevice during the debug session until the workload finishes. While the workload is being debugged, no other workload can be scheduled on that same subdevice. Hence, other applications submitting workloads to that subdevice may appear to be waiting indefinitely.
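Whether two sessions fall under this restriction or under Scenario 2 can be estimated from their ZE_AFFINITY_MASK values. The sketch below handles single-entry masks only and assumes that a mask without a subdevice part, such as 0, exposes every subdevice of that GPU. The function name is hypothetical, and applications that select devices programmatically are not covered by such a check.

```shell
# Hypothetical helper: succeed if two single-entry ZE_AFFINITY_MASK values
# ("<device>" or "<device>.<subdevice>") can reach the same subdevice.
masks_overlap() {
    local a=$1 b=$2
    # Different GPUs never conflict.
    [ "${a%%.*}" = "${b%%.*}" ] || return 1
    # Assumption: a mask without ".<subdevice>" covers the whole GPU.
    case $a in *.*) ;; *) return 0 ;; esac
    case $b in *.*) ;; *) return 0 ;; esac
    # Same GPU and both name a subdevice: conflict only if it is the same one.
    [ "${a#*.}" = "${b#*.}" ]
}
```

With this sketch, the masks 0.0 and 0.1 from Scenario 2 do not overlap, whereas two sessions that both use 0.0 do, so the second session may appear to wait until the first workload finishes.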