Debug a SYCL* Application on a GPU

Debug with Intel® Distribution for GDB* on Linux* OS Host

Download PDF

ID 766459

Date 3/31/2025

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

Visible to Intel only — GUID: GUID-BF0E3A4E-4A2F-4DF3-A9C9-E939D2281007

View Details

Debug a SYCL* Application on a GPU

Use a simple SYCL application named Array Transform application to perform basic debugging operations, such as break, run, print, continue, info, disassemble, and next. This example describes how to interact with SIMD lanes, as additional thread elements. The application being debugged is instructed to run on a GPU by setting the ONEAPI_DEVICE_SELECTOR=level_zero:gpu environment variable.

The debug array transform application used in this example can be found in the Intel oneAPI sample repo or by way of the oneapi-cli sample browser tool. After you have installed and initialized the Intel oneAPI Base Toolkit (sourced setvars.sh), run oneapi-cli --help in your terminal command line. The sample includes a build script to create an application that can be debugged and run on either a CPU or a GPU (the compiler debug flags are set during the build).

Before you proceed, make sure you have completed all necessary setup steps described in the Get Started Guide.

Basic Debugging

NOTE:

For your convenience, all common Intel Distribution for GDB commands used in examples below are provided in the Cheat Sheet.

If you have not already done so, start the debugger.


gdb-oneapi array-transform

NOTE:

The default choice for offloading is a Level Zero GPU device. This example requires that you have a GPU device. If you do not have a GPU device, refer to the CPU device example.

You must set the following environment variables to ensure that the kernel is offloaded to the correct device and that GPU debugging is enabled:


set env ONEAPI_DEVICE_SELECTOR=level_zero:gpu
set env ZET_ENABLE_PROGRAM_DEBUGGING=1
run

Example output:


[SYCL] Using device: [Intel(R) Data Center GPU Max 1550] from
[Intel(R) Level-Zero]
success; result is correct.

NOTE:

If you encounter an “auto-loading has been declined” error message, see <https://sourceware.org/gdb/current/onlinedocs/gdb.html/Auto_002dloading-safe-path.html> for help.

Exit gdb-oneapi by typing: quit

Consider the array-transform.cpp example again:


52        h.parallel_for(data_range, [=](id<1> index) {
53            size_t id0 = GetDim(index, 0);
54            int element = in[index]; // breakpoint-here
55            int result = element + 50;
56            if (id0 % 2 == 0) {
57                result = result + 50; // then-branch
58            } else {
59                result = -1; // else-branch
60            }
61            out[index] = result;
62        });

The code processes elements of the input array depending on whether they are even or odd and produces an output array.

Start gdb-oneapi again and set the required environment variables:


gdb-oneapi array-transform
set env ONEAPI_DEVICE_SELECTOR=level_zero:gpu
set env ZET_ENABLE_PROGRAM_DEBUGGING=1

Set two breakpoints inside the kernel (one for each conditional branch) as follows:


break 57

Expected output:


Breakpoint 1 at 0x40583c: file /path/to/array-transform.cpp, line 57.


break 59

Expected output:


Breakpoint 2 at 0x40584a: file /path/to/array-transform.cpp, line 59.

NOTE:

Do not expect your output to exactly match that provided in the example. The output may vary due to the nature of parallelism and different machine properties. We use an ellipsis […] to denote output omitted for brevity.

To start the program, execute:

run

You should see the following output:


Starting program: /path/to/array-transform
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
intelgt: gdbserver-ze started for process 8194.
[New Thread 0x7fffed706700 (LWP 8213)]
[SYCL] Using device: [Intel(R) Data Center GPU Flex Series 140 [0x56c1]] from [Intel(R) Level-Zero]
[Switching to Thread 1.129 lane 1]

Thread 2.129 hit Breakpoint 2, with SIMD lanes [1 3 5 7], main::{lambda(auto:1&)#1}::operator()[...] at array-transform.cpp:59
59                result = -1;  // else-branch
(gdb)

The debugger has a mechanism called Auto-Attach that spawns an instance of gdbserver-ze to listen to and control the GPU for debug. In the example above, the auto-attach mechanism is triggered and the gdbserver-ze is added to the debugger as an inferior. An inferior in GDB represents the unit under debug. In our case, the host application process and the GPU device each correspond to an inferior.

Check the presence of gdbserver-ze as follows:


info inferiors

Expected output:


(gdb) info inferiors
  Num  Description                   Connection                                Executable
  1    process 8194                  1 (native)                                /path/to/array-transform
* 2    device [0000:37:00.0]         2 (extended-remote | gdbserver-ze --multi --once -)
Type "info devices" to see details of the devices.

Execute the info devices command to see the further details of the device.


info devices

Expected output:


  Num   Location        Sub-device   Vendor Id   Target Id   Cores   Device Name
* 1     [0000:37:00.0]  -            0x8086      0x56c1      128     Intel(R) Data Center GPU Flex Series 140 [0x56c1]

The location of a GPU device is extracted from the PCI properties and is encoded as follows:


<domain>:<bus>:<device>.<function>

NOTE:

The auto-attach feature sets schedule-multiple to on, which allows all threads of all inferiors to be resumed during the same session. For example, when you run the continue command, all inferiors will continue.

The breakpoint event is received from Inferior 2, which represents the GPU. The thread ID 2.129:1 points to the thread 129 of the inferior 2 and indicates that the first active SIMD lane is now in focus.

The breakpoint at line 59 is hit first. The order of branch execution is defined by the Intel® Graphics Compiler.

NOTE:

The behavior of the debugger may vary if the compiled code is optimized. For the best debugging experience, the compiler flags -g -O0 are recommended.

Check which SIMD lanes are currently active with the following command. The -stopped flag filters out GPU threads that are currently unavailable (e.g. not utilized by the program). We recommend using it to obtain a more concise output. We also recommend using the with print frame-arguments none -- prefix to reduce the overhead of the command, which can be noticeably large because of having to fetch the state of a large number of GPU threads.


with print frame-arguments none -- info threads -stopped

In the example, thread 2.129 has 4 active SIMD lanes: 1, 3, 5, and 7. The asterisk ‘*’ marks the current SIMD lane. See the expected output below.

NOTE:

SIMD lane enumeration starts from 0. (The GPU used in this example had eight SIMD lanes. If your GPU has a different number of lanes, your output will be different.)


(gdb) with print frame-arguments none -- info threads -stopped
  Id              Target Id                                          Frame
  1.1             Thread 0x7ffff598fb80 (LWP 8194) "array-transform" [...]
  1.2             Thread 0x7fffed706700 (LWP 8213) "array-transform" [...]
* 2.129:[*1 3 5 7] ZE 0.0.0.0                                        [...] at array-transform.cpp:59
  2.137:[1 3 5 7] ZE 0.0.1.0                                         [...] at array-transform.cpp:59
  2.145:[1 3 5 7] ZE 0.0.4.0                                         [...] at array-transform.cpp:59
  2.153:[1 3 5 7] ZE 0.0.5.0                                         [...] at array-transform.cpp:59
  ...

To switch the focus to a different SIMD lane, use the thread <thread_ID> command. Thread ID is specified by a triple: inferior.thread:lane. See examples of working with particular lanes:


thread 2.129:3

Example output:


[Switching to thread 2.129:3 (Thread 1.129 lane 3)]
#0  main::{lambda(auto:1&)#1}::operator()[...] at array-transform.cpp:59
59                result = -1;  // else-branch


print element

Example output:


$1 = 111

thread 2.129:5

Example output:


[Switching to thread 2.129:5 (Thread 1.129 lane 5)]
#0  main::{lambda(auto:1&)#1}::operator()[...] at array-transform.cpp:59
59                result = -1;  // else-branch


print element

Example output:


$2 = 113

NOTE:

To filter threads for a specific device or sub-device, do the following:

Obtain the corresponding inferior number via info inferiors command:


(gdb) info inferiors
     Num  Description              Connection                                 Executable
     1    process 25855            1 (native)                                 [...]
   * 2    device [0000:3a:00.0].0       2 (extended-remote | gdbserver-ze --multi --once -)
     3    device [0000:3a:00.0].1       2 (extended-remote | gdbserver-ze --multi --once -)
     Type "info devices" to see details of the devices.

Run the info threads command and supply the obtained inferior numbers, followed by a star-wildcard thread range .*:


(gdb) with print frame-arguments none -- info threads 2.* 3.*
     Id           Target Id         Frame
     2.1: [1 *3 5 7 9 11 13 15]    ZE 0.0.0.1        [...] at array-transform.cpp:53
     2.2: [1 *3 5 7 9 11 13 15]    ZE 0.0.0.2        [...] at array-transform.cpp:53
     ...

NOTE:

In the thread ID, the inferior number can be skipped. In this case, the current inferior ID is used. The thread number can also be skipped in case of switching to a lane in the current thread. Thus, the command below can be used to switch to the desired SIMD lane:


thread :7

Expected output:


[Switching to thread 2.129:7 (ZE 0.0.0.0 lane 7)]
#0  main::{lambda(auto:1&)#1}::operator()[...] at array-transform.cpp:59
59                result = -1;  // else-branch

As you are now inside the kernel running on the GPU, you can look into the assembly code and GPU registers, for example, to understand the cause of unexpected application behavior. Get the GPU assembly code to inspect generated instructions by executing the following command:


disassemble

See an example output below:


Dump of assembler code for function _ZZZ4mainENKUlRT_E_clIN4sycl3_V17handlerEEEDaS0_ENKUlNS4_2idILi1EEEE_clES7_:
   0xffff8000ffe87200 <+0>:     (W)     shr (1|M16)              a0.2<1>:ud    r126.7<0;1,0>:ud  0x4:ud              {F@1}
   0xffff8000ffe87210 <+16>:    (W)     add (1|M16)              r126.0<1>:ud  r125.2<0;1,0>:ud  0x0:ud
   0xffff8000ffe87220 <+32>:    (W)     send.ugm (1|M16)         null     r126    r125:1  a0.2        0x4200C504           {ExBSO,A@1,$0} // wr:1+1, rd:0; store.ugm.d32x8t.a32.ss[a0.2]
   0xffff8000ffe87230 <+48>:    (W)     mov (1|M16)              r125.3<1>:ud  r125.2<0;1,0>:ud                 {$0.src}
   0xffff8000ffe87240 <+64>:    (W)     add (1|M16)              r125.2<1>:ud  r125.2<0;1,0>:ud  0x180:ud
   0xffff8000ffe87250 <+80>:    (W)     add (1|M16)              r126.0<1>:ud  r125.3<0;1,0>:ud  0x40:ud              {I@2}
   0xffff8000ffe87260 <+96>:    (W)     send.ugm (1|M16)         null     r126    r60:4   a0.2        0x4200E504           {ExBSO,A@1,$1} // wr:1+4, rd:0; store.ugm.d32x32t.a32.ss[a0.2]
   0xffff8000ffe87270 <+112>:   (W)     add (1|M16)              r126.0<1>:ud  r125.3<0;1,0>:ud  0xC0:ud              {$1.src}
   0xffff8000ffe87280 <+128>:   (W)     send.ugm (1|M16)         null     r126    r64:4   a0.2        0x4200E504           {ExBSO,A@1,$2} // wr:1+4, rd:0; store.ugm.d32x32t.a32.ss[a0.2]

To learn more about GEN assembly and registers, refer to the “Introduction to GEN assembly” article.

To display a list of GPU registers, run the following command:


info registers

You can use registers to see the state of the application or inspect arithmetic instructions, such as which operands are used and where the result is located.

Additionally, you can inspect the channel execution mask ($ce register), which shows active lanes. To print the result in binary format, use the /t format flag as follows:


print/t $ce

Example output:


$3 = 10101010

Recall that you have stopped at line 59, the else-branch of the condition that checks evenness of the work-item index. Hence, every other SIMD lane is inactive, as indicated by the $ce bit pattern.

To move forward and stop at the then-branch, set the scheduler-locking mode to step and execute the next command. The set scheduler-locking step command keeps the other threads stopped while the current thread is stepping:


set scheduler-locking step


next

You should see the following output:


[Switching to SIMD lane 0]

Thread 2.129 hit Breakpoint 1, with SIMD lanes [0 2 4 6], main::{lambda(auto:1&)#1}::operator()[...] at array-transform.cpp:57
57                result = result + 50;  // then-branch

Due to the breakpoint event, the SIMD lane focus switches to the first active lane in the then-branch, which is SIMD lane 0. Other threads of inferior 2 stayed at the line 59:


with print frame-arguments none -- info threads -stopped

Example output:


  Id              Target Id                                          Frame
  1.1             Thread 0x7ffff598fb80 (LWP 8194) "array-transform" [...]
  1.2             Thread 0x7fffed706700 (LWP 8213) "array-transform" [...]
* 2.129:[*0 2 4 6] ZE 0.0.0.0                                        [...] at array-transform.cpp:57
  2.137:[1 3 5 7] ZE 0.0.1.0                                         [...] at array-transform.cpp:59
  2.145:[1 3 5 7] ZE 0.0.4.0                                         [...] at array-transform.cpp:59
  2.153:[1 3 5 7] ZE 0.0.5.0                                         [...] at array-transform.cpp:59
...

Since the thread is vectorized, you can also inspect the vector of a local variable:


x /8dw &result

Example output:


0xffffd556ab1627e0:     158     -1      160     -1
0xffffd556ab1627f0:     162     -1      164     -1

SIMD Lanes

To investigate the program state from the point of view of SIMD lanes without switching, use the thread apply command. You can specify a SIMD lane as a number:


thread apply 2.129:2 print element

Example output:


Thread 2.129:2 (ZE 0.0.0.0 lane 2):
$5 = 110

You can also specify a SIMD lane as a range. In this case, only active SIMD lanes from the range are considered:


thread apply 2.129:2-5 print element

Example output:


Thread 2.129:2 (ZE 0.0.0.0 lane 2):
$11 = 110
warning: SIMD lane 3 is inactive in thread 2.129

Thread 2.129:4 (ZE 0.0.0.0 lane 4):
$12 = 112
warning: SIMD lane 5 is inactive in thread 2.129

To denote all active SIMD lanes, use the wildcard:


thread apply 2.129:* print element

Example output:


Thread 2.129:0 (ZE 0.0.0.0 lane 0):
$13 = 108

Thread 2.129:2 (ZE 0.0.0.0 lane 2):
$14 = 110

Thread 2.129:4 (ZE 0.0.0.0 lane 4):
$15 = 112

Thread 2.129:6 (ZE 0.0.0.0 lane 6):
$16 = 114

To apply the command to all active SIMD lanes of all threads, use the all-lanes parameter:


thread apply all-lanes print element

Example output:


Thread 2.217:7 (ZE 0.0.0.0 lane 7):
$17 = 155

Thread 2.217:5 (ZE 0.0.0.0 lane 5):
$18 = 153


[...]
Thread 2.129:2 (ZE 0.0.1.0 lane 2):
$47 = 110

Thread 2.129:0 (ZE 0.0.1.0 lane 0):
$48 = 108

Thread 1.2 (Thread 0x7fffed706700 (LWP 8213) "array-transform"):
No symbol "element" in current context.

You can mix SIMD lane ranges with thread ranges and the thread wildcard. For example, to apply the command to all active lanes of all threads of inferior 2, you can use any of the following commands:


thread apply 2.127-129:*


thread apply 2.*:*

If the current inferior is 2, the inferior number can be skipped:


thread apply 127-129:*


thread apply *:*

If you need a formatted output for a set of threads, thread apply might be used together with the printf command, as in the following examples.

A more compact output in comparison to thread apply *:* print element:


(gdb) thread apply *:* -q printf "%d.%d:%d element=%d\n",$_inferior,$_thread,$_simd_lane,element
2.129:0 element=108
2.129:1 element=109
2.129:2 element=110
2.129:3 element=111
2.129:4 element=112
2.129:5 element=113
2.129:6 element=114
2.129:7 element=115
2.137:0 element=124
2.137:1 element=125
2.137:2 element=126
2.137:3 element=127
2.137:4 element=128
2.137:5 element=129
2.137:6 element=130
2.137:7 element=131
2.145:0 element=140
<...>

In the above command, -q flag is used to suspend the standard thread information, usually printed by thread apply. To print the thread context in a compact way, three convenience variables were used:

$_inferior to get the inferior number;
$_thread to get the thread number within the inferior;
$_simd_lane to get the SIMD lane.

To get a more hierarchical view, you can combine thread apply *
(which applies a command to all threads of the current inferior) with
the command thread apply :* <printf>. The latter applies the

<printf> command to every active SIMD lane of a thread, selected by thread apply *. The result might look as follows:


(gdb) thread apply * -s thread apply :* -q printf "dim0{%d}=%d \n",$_simd_lane,id0

Thread 2.129:0 (ZE 0.0.0.0 lane 0):
dim0{0}=8
dim0{1}=9
dim0{2}=10
dim0{3}=11
dim0{4}=12
dim0{5}=13
dim0{6}=14
dim0{7}=15

Thread 2.137:0 (ZE 0.0.1.0 lane 0):
dim0{0}=24
dim0{1}=25
dim0{2}=26
dim0{3}=27
dim0{4}=28
dim0{5}=29
dim0{6}=30
dim0{7}=31

Thread 2.145:0 (ZE 0.0.4.0 lane 0):
dim0{0}=40
<...>

Thread Filtering

To investigate a range of threads for a particular variable value or a location, use the thread filter command. In the context of SIMD lanes you can also specify a SIMD lane as a number in the thread range selection. All the information given in the previous SIMD Lanes section regarding the selection of the range of threads is also applicable for the thread filter command as well. Few use cases are given below regarding this.

To print only the selected active lane of each thread in the current inferior from an input list of threads, use:


thread filter 129 137
Filtered Threads:
$1 = "2.129:0 2.137:0"

To print all active lanes of each thread in the current inferior from an input list of threads, use:


thread filter 129:* 137:*
Filtered Threads:
$2 = "2.129:0 2.129:1 2.129:2 2.129:3 2.129:4 2.129:5 2.129:6 2.129:7
2.137:0 2.137:1 2.137:2 2.137:3 2.137:4 2.137:5 2.137:6 2.137:7"

To filter all threads with id0==8 only using the active selected lane of each thread from an input list of threads, use:


thread filter 129 137 id0==8
Filtered Threads:
$3 = "2.129:0"

To filter all threads with id0<24 from all active lanes of an input list of threads, use:


thread filter 129:* 137:* id0<24
Filtered Threads:
$4 = "2.129:0 2.129:1 2.129:2 2.129:3 2.129:4 2.129:5 2.129:6 2.129:7"

Similarly to filter from an input list of threads with the value of x coordinate of the Work-Item Global Id is less than 20, use:


thread filter 129 137 145 $_workitem_global_id[0]<20
Filtered Threads:
$5 = "2.129:0"

The thread filter command ends the execution in case of an error while evaluating an expression after printing the thread information where error is seen. So the -c flag can be used to print the error and continue processing the other threads or the -s option to process all threads without printing any warning or error message.

To filter from all active lanes, the all-lanes option can be used. The -c option is used here to ignore errors due to uninitialized convenience variable $_workitem_global_id in the host’s threads.


thread filter all-lanes -c $_workitem_global_id[0]<20

Thread 1.3 (Thread 0x7fffea976640 (LWP 29487) "array-transform"):
cannot subscript something of type `void'

Thread 1.1 (Thread 0x7ffff4d9bc40 (LWP 29459) "array-transform"):
cannot subscript something of type `void'
Filtered Threads:
$6 = "2.201:3 2.201:2 2.201:1 2.201:0 2.193:7 2.193:6 2.193:5 2.193:4
2.193:3 2.193:2 2.193:1 2.193:0 2.129:7 2.129:6 2.129:5 2.129:4 2.129:3
2.129:2 2.129:1 2.129:0"

To filter using a location from all active lanes of a thread, -location option can be used as well:


thread filter all-lanes -c -location array-transform.cpp:54 $_workitem_global_id[0]<20

Thread 1.3 (Thread 0x7fffea976640 (LWP 29487) "array-transform"):
cannot subscript something of type `void'

Thread 1.1 (Thread 0x7ffff4d9bc40 (LWP 29459) "array-transform"):
cannot subscript something of type `void'
Filtered Threads:
$7 = "2.201:3 2.201:2 2.201:1 2.201:0 2.193:7 2.193:6 2.193:5 2.193:4
2.193:3 2.193:2 2.193:1 2.193:0 2.129:7 2.129:6 2.129:5 2.129:4 2.129:3
2.129:2 2.129:1 2.129:0"

To filter from the default selected lane of all threads, the all option can be used:


thread filter all -c $_workitem_global_id[0]<20

Thread 1.3 (Thread 0x7fffea976640 (LWP 29487) "array-transform"):
cannot subscript something of type `void'

Thread 1.1 (Thread 0x7ffff4d9bc40 (LWP 29459) "array-transform"):
cannot subscript something of type `void'
Filtered Threads:
$8 = "2.201:0 2.193:0 2.129:0"

To filter using a location from the default selected lane of all threads, -location option can be used as well:


thread filter all -c -location array-transform.cpp:54 $_workitem_global_id[0]<20

Thread 1.3 (Thread 0x7fffea976640 (LWP 29487) "array-transform"):
cannot subscript something of type `void'

Thread 1.1 (Thread 0x7ffff4d9bc40 (LWP 29459) "array-transform"):
cannot subscript something of type `void'
Filtered Threads:
$9 = "2.201:0 2.193:0 2.129:0"

Work-Item Coordinates

The GPGPU execution model defines a work-item as parallel executions of a kernel function.

Use the convenience variables $_thread_workgroup, $_workitem_local_id, and $_workitem_global_id to get the coordinates of the work-item processed by the current context, defined by the current thread and its current lane.


(gdb) print $_thread_workgroup
$1 = {<x>: 0, <y>: 0, <z>: 0}
(gdb) print $_workitem_local_id
$2 = {<x>: 56, <y>: 0, <z>: 0}
(gdb) print $_workitem_global_id
$3 = {<x>: 56, <y>: 0, <z>: 0}

Please note that the above convenience variables show work-item coordinates using X-Y-Z notation, as per execution model of the device, while SYCL execution model defines coordinates in notation of dimensions 1-2-3.

SYCL defines linearization such that the rightmost index moves fastest. To achieve that linearization on Intel GPUs, the SYCL runtime transposes the iteration space and maps the rightmost dimension to x, the second rightmost dimension (if it exists) to y, and the third rightmost dimension (if it exists) to z.

NOTE:

The coordinates are available only for work-items that are currently being processed. If a work-item has not yet been started or was already finished, we cannot find a thread which has processed it.

Find a Specific Work-item

Using the convenience variables you can find a thread and its lane, which works on a specific work-item.

The first option to find the work-item is to define a conditional breakpoint. However, for a program with many threads, it could take time, till the breakpoint is hit. In the following example, we set the conditional breakpoint for the work-item with the global ID {37,0,0}:

NOTE:

Restart the application before proceeding. Be sure to set the environment variables again.


(gdb) break 54 if $_workitem_global_id=={37,0,0}
Breakpoint 1 at 0x407093: file /home/gta/sources/oneAPI-samples/Tools/ApplicationDebugger/array-transform/src/array-transform.cpp, line 54.
(gdb) run
Starting program: /home/gta/sources/oneAPI-samples/Tools/ApplicationDebugger/array-transform/build/array-transform
[...]
[SYCL] Using device: [Intel(R) Data Center GPU Flex 140] from [Intel(R) Level-Zero]
[Switching to Thread 1.209 lane 5]

Thread 2.209 hit Breakpoint 1.2, with SIMD lane 5, main::{lambda(auto:1&)#1}::operator()<sycl::_V1::handler>(sycl::_V1::handler&) const::{lambda(sycl::_V1::id<1>)#1}::operator()(sycl::_V1::id<1>) const (this=0xffffd556ab1e5810, index=...) at array-transform.cpp:54
54              int element = in[index];  // breakpoint-here
(gdb) print $_workitem_global_id
$1 = {<x>: 37, <y>: 0, <z>: 0}
(gdb)

The second option is to use the thread apply command and store the found thread ID and lane number into the convenience variables $thr and $lane. The $found variable shows whether the search was successful. In the following example, we search for a work-item with the global ID {47,0,0}, and then switch to the found thread and lane:


(gdb) thread apply *:* -q -s set $found=($_workitem_global_id == {47,0,0}) ? ($thr=$_thread) && ($lane = $_simd_lane) : $found
(gdb) print $found
$191 = true
(gdb) thread $thr:$lane
[Switching to thread 2.145:7 (ZE 0.0.5.0 lane 7)]
#0  main::[...]
54              int element = in[index];  // breakpoint-here
(gdb) print $_workitem_global_id
$192 = {<x>: 47, <y>: 0, <z>: 0}

The third option is to use the thread filter command with the expression option to get the thread id information for a work-item with the global ID {47,0,0}, and then switch to the found thread and lane:


(gdb) thread filter *:* $_workitem_global_id == {47,0,0}
$193 = "2.145:7"
(gdb) thread 2.145:7
[Switching to thread 2.145:7 (ZE 0.0.5.0 lane 7)]
#0  main::[...]
54              int element = in[index];  // breakpoint-here
(gdb) print $_workitem_global_id
$194 = {<x>: 47, <y>: 0, <z>: 0}

Filter Threads by a Work-group

By combining thread apply and eval, we can filter threads by a specific expression. In the following, we filter by $_thread_workgroup=={0,0,0}.

First, we construct a convenience variable $ids that holds a stringified list of qualified ids (<inferior num>.<thread num>), which belong to the work-group:


(gdb) set $ids=""
(gdb) thread apply * -s -q eval "set $ids=($_thread_workgroup == {0,0,0}) ? \"%s %d.%d\" : \"%s\"",$ids,$_inferior,$_thread, $ids
(gdb) print $ids
$2 = " 2.129 2.137 2.145 2.153 2.193 2.201 2.209 2.217"

Note that the variable $ids must be initialized with an empty string first. The eval GDB command is used here to append the list of already found ids to the newly found one, or leave it without change, if the condition does not hold.

Now the convenience variable $ids contains the list of filtered thread ids.

The second option to get a list of filtered thread ids using the thread filter command.


(gdb) thread filter *:* $_thread_workgroup=={0,0,0}
$3 = "2.129:0 2.129:1 2.129:2 2.129:3 2.129:4 2.129:5 2.129:6 2.129:7 2.137:0
2.137:1 2.137:2 2.137:3 2.137:4 2.137:5 2.137:6 2.137:7 2.145:0 2.145:1 2.145:2
2.145:3 2.145:4 2.145:5 2.145:6 2.145:7 2.153:0 2.153:1 2.153:2 2.153:3 2.153:4
2.153:5 2.153:6 2.153:7 2.193:0 2.193:1 2.193:2 2.193:3 2.193:4 2.193:5 2.193:6
2.193:7 2.201:0 2.201:1 2.201:2 2.201:3 2.201:4 2.201:5 2.201:6 2.201:7 2.209:0
2.209:1 2.209:2 2.209:3 2.209:4 2.209:5 2.209:6 2.209:7 2.217:0 2.217:1 2.217:2
2.217:3 2.217:4 2.217:5 2.217:6 2.217:7"

The history variable $3 contains the list of filtered thread ids and instead of $ids in the follow up example of the info threads command below, $3 can also be used instead.

To call info threads for these ids, we need to use eval again, since the info threads command cannot take a list of threads stored in a convenience variable:


(gdb) eval "info threads %s", $ids
  Id          Target Id         Frame
* 2.129:[1-7] ZE 0.0.0.0        main::[...] at array-transform.cpp:54
  2.137:[0-7] ZE 0.0.1.0      main::[...] at array-transform.cpp:54
  2.145:[0-7] ZE 0.0.4.0      main::[...] at array-transform.cpp:54
  2.153:[0-7] ZE 0.0.5.0      main::[...] at array-transform.cpp:54
  ...

Breakpoint Actions

You can define a set of actions for a breakpoint to be executed when the breakpoint is hit. By default, the actions are executed in the context of the SIMD lane selected after the hit.

Quit the current debugging session and start a new one:


quit
gdb-oneapi array-transform
set env ONEAPI_DEVICE_SELECTOR=level_zero:gpu
set env ZET_ENABLE_PROGRAM_DEBUGGING=1

Define two temporary breakpoints with actions for the if and else branches:
1. Set a temporary breakpoint:
```
tbreak 59
```
  Example output:
```
Temporary breakpoint 1 at 0x40584a: file /path/to/array-transform.cpp, line 59.
```
2. Define an action:
```
commands
```
  When you are asked to type commands, enter the following:
```
print element
end
```
  When you are done with each command, finish with the end keyword.
3. Set another temporary breakpoint:
```
tbreak 57
```
  Example output:
```
Temporary breakpoint 2 at 0x40583c: file /path/to/array-transform.cpp, line 57.
```
4. Define an action to be executed for all SIMD lines by adding the /a modifier:
```
commands /a
```
  When you are asked to type commands, enter the following:
```
print element
end
```

Start the program:

run

Example output:


[...]
Thread 2.129 hit Temporary breakpoint 1, with SIMD lanes [1 3 5 7], main::{lambda(auto:1&)#1}::operator()[...] at array-transform.cpp:59
59                result = -1;  // else-branch
$1 = 109

Continue to hit both breakpoints:
```
continue
```
Example output:
```
Continuing.
Thread 2.129 hit Temporary breakpoint 2, with SIMD lanes [0 2 4 6], main::{lambda(auto:1&)#1}::operator()[...] at array-transform.cpp:57
57                result = result + 50;  // then-branch
$2 = 108
$3 = 110
$4 = 112
$5 = 114
```
The action for the breakpoint at the else branch was executed for a single SIMD lane 1, while the action at the then branch was executed for all active SIMD lanes.

NOTE:
For conditional breakpoints, the actions are executed only for SIMD lanes that meet the condition.

Conditional Breakpoints

Quit the debugging session and start the program from the beginning:


quit

gdb-oneapi array-transform
ONEAPI_DEVICE_SELECTOR=level_zero:gpu
set env ZET_ENABLE_PROGRAM_DEBUGGING=1

This time set a breakpoint at line 57 with the condition element==106:


break 57 if element == 106

Example output:


Breakpoint 1 at 0x40583c: file /path/to/array-transform.cpp, line 57.

Run the program (execute the run command) and check if the output looks as follows:


Starting program: <path_to_array-transform>

[...]

[Switching to Thread 1.193 lane 6]

Thread 2.193 hit Breakpoint 1, with SIMD lane 6, main::{lambda(auto:1&)#1}::operator()[...] at array-transform.cpp:57
57                result = result + 50;  // then-branch
(gdb)

The condition is true for the lane 6 in thread 2.193.

NOTE:

A breakpoint condition is evaluated only for active SIMD lanes, meaning that (gdb) break 57 if element == 107 does not cause a stop, since element == 107 is true for the lane 7 in thread 2.193, which is inactive at line 57.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Debug with Intel® Distribution for GDB* on Linux* OS Host

Debug a SYCL* Application on a GPU

Basic Debugging

SIMD Lanes

Work-Item Coordinates

Breakpoint Actions