Multi-device Debugging

Debugging with Intel® Distribution for GDB* on Linux* OS Host

ID 766459

Date 12/16/2022

Debugging applications on systems with multiple GPUs and/or subdevices (i.e. tiles) is supported. There are three main usage scenarios in a multi-device setting:

An application submits workloads to multiple devices.
Multiple applications submit workloads to different (sub)devices.
Multiple applications submit workloads to the same subdevice.

The GPUs available on a system can be listed using the command sycl-ls. The output below shows that the system has two GPU cards, each of which can be used for offloading through either the OpenCL™ or the Intel® oneAPI Level Zero (Level Zero) backend.

$ sycl-ls
[opencl:gpu:0] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x0bd5] 3.0 [22.39.24347.8]
[opencl:gpu:1] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x0bd5] 3.0 [22.39.24347.8]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]

NOTE:

At the time of writing, only the Intel® oneAPI Level Zero (Level Zero) backend is supported for debugging. If desired, the SYCL_DEVICE_FILTER <https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md> environment variable can be used to limit the available backends.

The example below shows the output of the sycl-ls command with SYCL_DEVICE_FILTER set to level_zero:

$ export SYCL_DEVICE_FILTER=level_zero
$ sycl-ls
Warning: SYCL_DEVICE_FILTER environment variable is set to level_zero.
To see the correct device id, please unset SYCL_DEVICE_FILTER.

[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
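
To check how many GPUs are visible through the Level Zero backend without setting SYCL_DEVICE_FILTER, the sycl-ls output can also be filtered directly. The following is a minimal sketch: the function name count_level_zero_gpus is hypothetical, and the pattern assumes the line format shown in the outputs above, which may differ between toolkit versions.

```shell
# count_level_zero_gpus (hypothetical helper): read sycl-ls output on stdin
# and print the number of GPU entries listed under the Level Zero backend.
count_level_zero_gpus() {
    grep -c '^\[ext_oneapi_level_zero:gpu:[0-9][0-9]*]'
}

# Demo on the sample output from this section. With a real installation,
# run `sycl-ls | count_level_zero_gpus` instead.
count_level_zero_gpus <<'EOF'
[opencl:gpu:0] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x0bd5] 3.0 [22.39.24347.8]
[opencl:gpu:1] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x0bd5] 3.0 [22.39.24347.8]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Graphics [0x0bd5] 1.3 [1.3.24347]
EOF
```

For the sample input, the helper prints 2, matching the two Level Zero GPU entries.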

Scenario 1: An Application Uses Multiple Devices

The debugger supports debugging a program that offloads kernels to multiple GPU (sub)devices. Each sub-device (i.e. a tile) appears in the debugger as a separate inferior. The auto-attach feature initializes the devices for debugging and creates the corresponding inferiors.

A possible debug session is as follows:

$ gdb-oneapi -q --args ./multi-device
Reading symbols from ./multi-device...
(gdb) break get_transformed
Breakpoint 1 at 0x40431a: file multi-device.cpp, line 27.
(gdb) run
Starting program: /path/to/multi-device
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
intelgt: gdbserver-ze started for process 581849.
[New Thread 0x7fffe4645700 (LWP 581871)]
[Switching to Thread 2.97 lane 0]

Thread 2.97 hit Breakpoint 1, with SIMD lanes [0-15], get_transformed (data=1, device_idx=0) at multi-device.cpp:27
27        return data * 3 + 11 * (device_idx + 1);

We can check the devices’ inferiors using:

info inferiors

The output below shows five inferiors: one for the host process and four device inferiors, one for each sub-device. Devices are enumerated in the format [<pci-location>].<sub-device-id>.

  Num  Description              Connection                                  Executable
  1    process 581849           1 (native)                                  /path/to/multi-device
* 2    device [3a:00.0].0       2 (remote | gdbserver-ze --attach - 581849)
  3    device [3a:00.0].1       2 (remote | gdbserver-ze --attach - 581849)
  4    device [9a:00.0].0       2 (remote | gdbserver-ze --attach - 581849)
  5    device [9a:00.0].1       2 (remote | gdbserver-ze --attach - 581849)
Type "info devices" to see details of the devices.
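
When scripting around captured debugger output, it can help to split a device description into its two parts. The sketch below is illustrative only: parse_device is a hypothetical helper, not part of the debugger, and it assumes descriptions follow the [<pci-location>].<sub-device-id> format described above. When the sub-device suffix is absent, it prints "-" for the id.

```shell
# parse_device (hypothetical helper): split a device description such as
# "[3a:00.0].1" into its PCI location and sub-device id.
# Prints "<pci-location> <sub-device-id>"; the id is "-" when the
# description has no ".<id>" suffix.
parse_device() {
    desc=$1
    pci=${desc#\[}      # drop the leading "["
    pci=${pci%%\]*}     # keep everything before the first "]"
    sub=${desc##*\]}    # text after "]": ".1" or empty
    sub=${sub#.}        # drop the leading "."
    echo "$pci ${sub:--}"
}

parse_device "[3a:00.0].1"   # prints: 3a:00.0 1
parse_device "[9a:00.0]"     # prints: 9a:00.0 -
```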

We can display further information using the command:

info devices

A possible output would be:

  Location   Sub-device   Vendor Id   Target Id   Cores   Device Name
* [3a:00.0]  0            0x8086      0x0bd5      512     Intel(R) Graphics [0x0bd5]
  [3a:00.0]  1            0x8086      0x0bd5      512     Intel(R) Graphics [0x0bd5]
  [9a:00.0]  0            0x8086      0x0bd5      512     Intel(R) Graphics [0x0bd5]
  [9a:00.0]  1            0x8086      0x0bd5      512     Intel(R) Graphics [0x0bd5]

NOTE:

Switching between the inferiors and threads is the same as explained in the Basic Debugging section.

The GPU (sub)devices available to the application can be limited using the ZE_AFFINITY_MASK environment variable. For example, the same debug session above gives the output below when run with ZE_AFFINITY_MASK=0.0, which exposes only sub-device 0 of GPU 0:

(gdb) info inferiors
  Num  Description              Connection                                  Executable
  1    process 581966           1 (native)                                  /path/to/multi-device
* 2    device [3a:00.0]         2 (remote | gdbserver-ze --attach - 581966)
Type "info devices" to see details of the devices.
(gdb) info devices
  Location   Sub-device   Vendor Id   Target Id   Cores   Device Name
* [3a:00.0]  -            0x8086      0x0bd5      512     Intel(R) Graphics [0x0bd5]

Please see the oneAPI programming documentation for more details about the usage of ZE_AFFINITY_MASK.
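
The mask values used in this tutorial follow the pattern <device>.<sub-device>. As a small sketch, a hypothetical helper named affinity_mask can build such a value from indices, so that a launch script states its target explicitly. The device-only form (no sub-device index) is an assumption based on the Level Zero documentation and is not shown in this tutorial.

```shell
# affinity_mask DEVICE [SUBDEVICE] (hypothetical helper):
# print "DEVICE.SUBDEVICE", or just "DEVICE" when no sub-device is given.
affinity_mask() {
    if [ "$#" -ge 2 ]; then
        printf '%s.%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

affinity_mask 0 0   # prints: 0.0
affinity_mask 1     # prints: 1

# Possible use when starting the debugger (not executed here):
#   ZE_AFFINITY_MASK=$(affinity_mask 0 0) gdb-oneapi -q --args ./multi-device
```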

Scenario 2: Multiple Applications Use Different (Sub)Devices

Simultaneous debugging of applications is supported when each application runs under a separate instance of the debugger. For example, the Array Transform application from the Basic Debugging section can be started so that it uses subdevice 0 of GPU 0 as follows:

$ ZE_AFFINITY_MASK=0.0 gdb-oneapi array-transform
...
(gdb) run gpu
...

While this first application is being debugged (e.g. GPU threads hit a breakpoint and their state is under investigation), another process of the same or a different user can freely utilize another subdevice/GPU, e.g. subdevice 1 of GPU 0:

$ ZE_AFFINITY_MASK=0.1 gdb-oneapi array-transform
...
(gdb) run gpu
...

As long as the applications use different subdevices, simultaneous debugging works.

As an alternative to using ZE_AFFINITY_MASK as shown above, applications may also select GPUs/subdevices programmatically.

Scenario 3: Multiple Applications Use the Same Subdevice

A restriction on multi-device debugging applies when different applications use the same subdevice. In this case, the workload submitted by one of the applications occupies the subdevice during the debug session until the workload finishes. While that workload is being debugged, no other workload can be scheduled on the same subdevice. Hence, other applications submitting workloads to that subdevice may appear to wait indefinitely.
