Debug a SYCL* Application on a CPU
Use the simple SYCL application named Array Transform to perform basic debugging operations, such as break, run, print, continue, info, and next. Setting the ONEAPI_DEVICE_SELECTOR=*:cpu environment variable restricts the application to the CPU, where it runs on multiple threads.
The Array Transform sample used in this example is available in the Intel oneAPI samples repository or through the oneapi-cli sample browser tool. After you have installed the Intel oneAPI Base Toolkit and initialized it (by sourcing setvars.sh), run oneapi-cli --help in a terminal. The sample includes a build script that creates an application that can be debugged and run on either a CPU or a GPU (the compiler debug flags are set during the build).
Before you proceed, make sure you have completed all the necessary setup steps and successfully completed the debug session in the Get Started Guide.
Basic Debugging
If you have not already done so, start the debugger:
gdb-oneapi array-transform
Make sure that the kernel is offloaded to the correct device:
set env ONEAPI_DEVICE_SELECTOR=*:cpu
run
Example output:
[SYCL] Using device: [Intel® Core™ i7-9750H CPU @ 2.60GHz] from [Intel® OpenCL]
success; result is correct.
Consider the Array Transform sample, which contains a simple kernel function that can be offloaded to different devices:
52 h.parallel_for(data_range, [=](id<1> index) {
53 size_t id0 = GetDim(index, 0);
54 int element = in[index]; // breakpoint-here
55 int result = element + 50;
56 if (id0 % 2 == 0) {
57 result = result + 50; // then-branch
58 } else {
59 result = -1; // else-branch
60 }
61 out[index] = result;
62 });
The code processes elements of the input array depending on whether they are even or odd, and produces an output array.
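The per-element logic of the kernel can be tried on the host without any SYCL runtime. The following is a minimal sketch, not the sample's code: the helper names transform_element and transform_all are hypothetical, and a plain loop stands in for parallel_for.

```cpp
#include <cstddef>
#include <vector>

// Host-only sketch of the kernel body shown above (no SYCL involved).
// transform_element is a hypothetical helper name, not part of the sample.
int transform_element(int element, std::size_t id0) {
    int result = element + 50;
    if (id0 % 2 == 0) {
        result = result + 50;  // then-branch: even index
    } else {
        result = -1;           // else-branch: odd index
    }
    return result;
}

// Applies the transform to every element; a sequential loop replaces
// the parallel_for of the real sample.
std::vector<int> transform_all(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = transform_element(in[i], i);
    }
    return out;
}
```

For example, transform_element(100, 0) takes the then-branch and yields 200, while transform_element(101, 1) takes the else-branch and yields -1.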
Define a breakpoint at line 54:
break 54
Expected output:
Breakpoint 1 at 0x405800: file /path/to/array-transform.cpp, line 54.
Run the program:
run
When the thread hits the breakpoint, you should see the following output:
Starting program: <path_to_array-transform>
Registering SYCL extensions for gdb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[...]
[New Thread 0x7ffff37dc700 (LWP 21540)]
[New Thread 0x7fffdba79700 (LWP 21605)]
[New Thread 0x7fffdb678700 (LWP 21606)]
[New Thread 0x7fffdb277700 (LWP 21607)]
[SYCL] Using device: [... Intel(R) CPU ...] from [Intel(R) OpenCL]
Thread 1 "array-transform" hit Breakpoint 1.2, main::$_1::operator()[...]
at array-transform.cpp:54
54 int element = in[index]; // breakpoint-here
Now you can issue the usual Intel Distribution for GDB commands to inspect the local variables, print a stack trace, and get information on threads. For your convenience, common Intel Distribution for GDB commands are provided in the Cheat Sheet.
Keep debugging and display the value of the index variable:
print index
Expected output:
$1 = sycl::id = 0
Continue program execution:
continue
You should see the next breakpoint hit event, which comes from another thread:
Continuing.
[Switching to thread 46 (Thread 0x7fff567fc640 (LWP 1148133))]
Thread 3 "array-transform" hit Breakpoint 1.2, main::$_1::operator()[...]
at array-transform.cpp:54
54 int element = in[index]; // breakpoint-here
If you print the value of the index variable now:
print index
The output differs from the previous one:
$2 = sycl::id = 12
To print data elements, use the bracket operator of the accessor:
print in[index]
Expected output:
$3 = 112
You can also print the accessor contents:
print in
Expected output:
$4 = sycl::accessor read range 64 = {100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120,
121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134,
135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148,
149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162,
163}
Note that the output is pretty-printed, because the compiler provides a pretty-printer for the sycl::accessor class. You can also print the accessor without pretty-printers using the following command:
print /r in
Expected output:
$5 = {[...], {MData = 0x7fffffffd450}}
To examine the data managed by the accessor yourself, use:
x /4dw in.MData
where the x command examines the memory contents at the given address and /4dw requests four items, each one word (4 bytes) long, displayed in decimal format.
Expected output:
0x7fffffffd450: 100 101 102 103
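To make the /4dw format concrete, the sketch below reads consecutive 4-byte words from a raw address and prints them as signed decimals, which is roughly what the x command displays after the address prefix. The helper name dump_words is hypothetical and exists only for illustration; it is not part of the debugger or the sample.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

// Formats `count` 32-bit words starting at `address` as signed decimals,
// separated by spaces. dump_words is a hypothetical helper name.
std::string dump_words(const void* address, std::size_t count) {
    std::ostringstream os;
    const unsigned char* bytes = static_cast<const unsigned char*>(address);
    for (std::size_t i = 0; i < count; ++i) {
        std::int32_t word;
        // memcpy avoids alignment and aliasing problems on raw memory.
        std::memcpy(&word, bytes + i * sizeof(word), sizeof(word));
        if (i != 0) {
            os << ' ';
        }
        os << word;
    }
    return os.str();
}
```

Called on a buffer that starts with the values 100, 101, 102, 103, dump_words(buffer, 4) returns "100 101 102 103".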
Single Stepping
A common debugging activity is single-stepping in the source. The step and next commands allow you to step through source lines, stepping into or over function calls.
To check the current thread data, run the following command:
thread
You should get the following output:
[Current thread is 3 (Thread 0x7fffdba79700 (LWP 21605))]
To check the data of a particular thread, run:
info thread 3
Example output:
Id Target Id Frame
* 3 Thread [...] main::$_1::operator()[...] at array-transform.cpp:54
To make Thread 3 move forward by one source line, run:
next
You should see the following output:
[Switching to thread 5 (Thread 0x7fffdb277700 (LWP 21607))]
Thread 5 "array-transform" hit Breakpoint 1.2, main::$_1::operator()[...]
at array-transform.cpp:54
54 int element = in[index]; // breakpoint-here
Stepping has not occurred. Instead, a breakpoint event from Thread 5 is received and the debugger switches the context to that thread. This happens because you are debugging a multi-threaded program, and events may be received from several threads. This is the default behavior, but you can configure it for more efficient debugging. To ensure that the current thread executes a single line without interference, configure the scheduler-locking setting. By default, scheduler-locking is “off” for continue/step commands, which means that all threads are resumed when executing one of those commands. For more information, refer to the documentation in the Intel® Distribution for GDB* User Manual (PDF).
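The reason every thread can report the same breakpoint is that all of them execute the same kernel body on different indices. The sketch below imitates this on the host with std::thread. It is only an illustration under stated assumptions: the function name run_on_threads is hypothetical, and the strided split of indices across threads is an assumption, since the document does not describe how the CPU runtime actually distributes work-items.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Host-only sketch: several threads run the same kernel body on
// different indices. run_on_threads is a hypothetical name, and the
// strided partitioning is an assumption made for illustration.
std::vector<int> run_on_threads(const std::vector<int>& in,
                                std::size_t num_threads) {
    if (num_threads == 0) {
        num_threads = 1;  // always use at least one worker
    }
    std::vector<int> out(in.size());
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&in, &out, t, num_threads] {
            // Thread t handles indices t, t + num_threads, ...
            for (std::size_t i = t; i < in.size(); i += num_threads) {
                int element = in[i];  // every worker passes through here
                int result = element + 50;
                if (i % 2 == 0) {
                    result = result + 50;
                } else {
                    result = -1;
                }
                out[i] = result;  // each index is written by one thread only
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return out;
}
```

A breakpoint placed on the line that reads in[i] would be reached by every worker, so with all threads resumed, the next stop can come from any of them. Locking the scheduler during stepping keeps the other workers paused.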
Now turn on scheduler-locking for stepping commands to keep the other threads stopped while the current thread is stepping:
set scheduler-locking step on
Run the next command again:
next
You should see the following output:
55 int result = element + 50;
Run the next command once more:
next
You should see the following output:
56 if (id0 % 2 == 0) {
To see the value of the index variable, run:
print index
You should see the following output:
$6 = sycl::id = 21
Run:
print in[index]
The expected output is shown below:
$7 = 121
Finally, run:
print result
You should see the following output:
$8 = 171
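The printed values can be checked by hand. The sketch below reproduces the state of the local variables when execution is stopped at line 56, before the branch runs. It assumes that in[i] equals 100 + i, which matches the accessor contents printed earlier; the names KernelState and state_before_branch are hypothetical.

```cpp
#include <cstddef>

// Local variables as they stand when stopped at line 56 (before the if).
// KernelState and state_before_branch are hypothetical names.
struct KernelState {
    int element;
    int result;
};

// Assumes in[i] == 100 + i, matching the accessor contents shown earlier.
KernelState state_before_branch(std::size_t index) {
    int element = 100 + static_cast<int>(index);  // line 54
    int result = element + 50;                    // line 55
    return {element, result};
}
```

For index 21 this gives element 121 and result 171, in agreement with the output of print in[index] and print result. Because 21 is odd, the else-branch on line 59 would set result to -1 once stepping continues.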