Troubleshooting
Template Operators Cannot Be Found
The compiler does not generate code for a member function of a class template unless that function is used in the program. This is standard C++ behavior (implicit instantiation) rather than a debugger defect, but it is inconvenient when you want to call the omitted function from the debugger. The issue shows up in SYCL* because the basic classes (range, id, nd_range, accessor, and others) are templates with several overloaded operators. Examples:
p index
Output:
$1 = sycl::_V1::id<1> = {32}
p index + 5
Output:
Could not find operator +.
As a solution, you can explicitly instantiate the class template in your source. The member functions of that instantiation are then available in the binary. You can surround the instantiations with #ifndef NDEBUG and #endif to avoid code bloat in release builds. Example:
#ifndef NDEBUG
template class sycl::id<1>;
template class sycl::id<2>;
template class sycl::id<3>;
template class sycl::range<1>;
template class sycl::nd_range<1>;
#endif // #ifndef NDEBUG
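The same effect can be reproduced without SYCL, which may make the mechanism easier to see. The following is a minimal sketch, not SYCL code: Index and first_component are hypothetical names invented for this illustration. The program itself never calls operator+, so the compiler would normally not emit it. The explicit instantiation in the debug build forces it into the binary.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SYCL-style index class; this is NOT sycl::id.
template <int Dims>
class Index {
public:
    explicit Index(std::size_t first = 0) : values_{} { values_[0] = first; }

    std::size_t get(int dim) const { return values_[dim]; }

    // Not called anywhere in this translation unit. With implicit
    // instantiation only, the compiler would typically emit no code for it.
    Index operator+(std::size_t offset) const {
        Index result(*this);
        for (auto &v : result.values_) v += offset;
        return result;
    }

private:
    std::array<std::size_t, Dims> values_;
};

#ifndef NDEBUG
// Explicit instantiation definition: instantiates every member function
// of Index<1>, including the otherwise unused operator+.
template class Index<1>;
#endif // #ifndef NDEBUG

// The only code path the program uses; it needs get() but not operator+.
std::size_t first_component(const Index<1> &index) { return index.get(0); }
```

In a debug build of such a program, an expression like `p index + 5` on an Index<1> object should then have a concrete operator+ to resolve against. In a build with NDEBUG defined, the instantiation is skipped and the operator may again be missing.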
Accessor Operator [] Cannot Be Resolved
During expression evaluation, elements of an accessor object cannot be accessed with the chained multi-dimensional subscript syntax. For example:
print anAccessor[5][3][4]
Example output:
Cannot resolve function operator[] to any overloaded instance
Instead, use an id object:
print workItemId
Example output:
$1 = sycl::_V1::id<3> = {5, 3, 4}
print anAccessor[workItemId]
Example output:
$2 = 1234
GDB Appears to Hang at Synchronization Points
Synchronization points (e.g., a semaphore) may be implicitly inserted by the compiler at kernel boundaries. For example, consider the following code:
 1 #include <omp.h>
 2
 3 int main(void)
 4 {
 5
 6   const int N = 8;
 7   float sum = 0.f;
 8   const float alpha = 2.f;
 9
10   #pragma omp parallel for simd reduction(+: sum)
11   for(size_t j = 0; j < N; j++)
12     sum += alpha;
13
14   return 0;
15 }
To debug this code on a supported GPU device, start it under gdb-oneapi:
OMP_TARGET_OFFLOAD=MANDATORY LIBOMPTARGET_DEVICETYPE=GPU LIBOMPTARGET_PLUGIN=LEVEL0 gdb-oneapi -q omp_test
A barrier is implicitly inserted by the compiler between lines 10 and 11. You can see this by inspecting the disassembly:
(gdb) info line 9
Line 9 of "main.cpp" starts at address 0xfffdf000 <main.extracted(void)> and ends at 0xfffdf460 <main.extracted(void)+1120>.
(gdb) info line 11
Line 11 of "main.cpp" starts at address 0xfffe4a70 <main.extracted(void)+23152> and ends at 0xfffe4bb0 <main.extracted(void)+23472>.
(gdb) disassemble /m 0xfffdf000,0xfffe4bb0
0x00000000fffe0980 <main.extracted(void)+6528>: (W) send.gtwy (1|M0) null r4 null 0x0 0x02000004 {@1,$11} // wr:1+0, rd:0; signal barrier
Setting a breakpoint at line 11 and then stepping into the for-loop with set scheduler-locking step followed by step may cause the debugger to appear to hang: only the current thread is resumed, while the other threads, which are expected to signal the barrier, stay stopped. To avoid this, set the scheduler-locking mode to replay or off at synchronization points.
Another workaround, which lets you step a single thread over a synchronization point, is to use non-stop mode and set a temporary breakpoint at the synchronization point. A temporary breakpoint is hit by only one thread and is then deleted. Combined with non-stop mode, this ensures that only the first thread to reach the breakpoint is stopped, while the other threads can continue to the synchronization point. You can then step the stopped thread over the synchronization point.
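The reasoning behind the apparent hang can be illustrated with a small single-threaded model. This is only a sketch: BarrierModel and step_over_barrier are hypothetical names invented for this illustration, and the code does not use GDB, OpenMP, or the GPU runtime. It assumes a barrier that opens only after every participating thread has signaled it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a barrier; not debugger or GPU runtime code.
class BarrierModel {
public:
    explicit BarrierModel(std::size_t participants)
        : arrived_(participants, false) {}

    // Record that the given thread reached the barrier and signaled it.
    void signal(std::size_t thread) { arrived_.at(thread) = true; }

    // The barrier opens only after every participant has signaled.
    bool released() const {
        for (bool a : arrived_) {
            if (!a) return false;
        }
        return true;
    }

private:
    std::vector<bool> arrived_;
};

// Resume either only the current thread (comparable to scheduler-locking
// "step") or all threads (comparable to scheduler-locking "off"), then
// report whether the barrier was released.
bool step_over_barrier(std::size_t num_threads, std::size_t current,
                       bool resume_all) {
    BarrierModel barrier(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        if (resume_all || t == current) {
            barrier.signal(t);
        }
    }
    return barrier.released();
}
```

In this model, step_over_barrier(8, 0, false) returns false because only thread 0 signals the barrier, which corresponds to the stepped thread waiting indefinitely. step_over_barrier(8, 0, true) returns true because all threads are resumed and reach the barrier.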
Kernel Stops Responding
If the kernel that is offloaded to a GPU stops responding:
Check whether there are any stray `gdbserver-ze` processes running in the background:
ps -u $USER | grep gdbserver-ze
Stop background `gdbserver-ze` processes, if there are any:
killall -9 gdbserver-ze
If breakpoints defined inside the kernel are not hit when running on a GPU and Virtualization Technology for Directed I/O (VT-d) is enabled, disable VT-d through the BIOS menu.