Using the SYCL* Exception Handler
As explained in the book Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL:
The C++ exception features are designed to cleanly separate the point in a program where an error is detected from the point where it may be handled, and this concept fits very well with both synchronous and asynchronous errors in SYCL.
Using the methods from this book, C++ exceptions can help terminate a program when an error is encountered instead of allowing the program to silently fail.
Note: The italicized text in this section is copied directly from Chapter 5, “Error Handling,” in the book Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL. In some places, text has been removed for brevity. See the book for full details.
Ignoring Error Handling
C++ and SYCL are designed to tell us that something went wrong even when we don’t handle errors explicitly. The default result of unhandled synchronous or asynchronous errors is abnormal program termination which an operating system should tell us about. The following two examples mimic the behavior that will occur if we do not handle a synchronous and an asynchronous error, respectively.
The figure below shows the result of an unhandled C++ exception, which could be an unhandled SYCL synchronous error, for example. We can use this code to test what a particular operating system will report in such a case.
Figure: Unhandled exception in C++
    #include <iostream>

    class something_went_wrong {};

    int main() {
      std::cout << "Hello\n";
      throw(something_went_wrong{});
    }

Example output in Linux:

    Hello
    terminate called after throwing an instance of 'something_went_wrong'
    Aborted (core dumped)
The next figure shows example output from std::terminate being called, which will be the result of an unhandled SYCL asynchronous error in our application. We can use this code to test what a particular operating system will report in such a case.
Although we probably should handle errors in our programs, since uncaught errors will be caught and the program terminated, we do not need to worry about a program silently failing!
Figure: std::terminate is called when a SYCL asynchronous exception isn’t handled
    #include <exception>  // declares std::terminate
    #include <iostream>

    int main() {
      std::cout << "Hello\n";
      std::terminate();
    }

Example output in Linux:

    Hello
    terminate called without an active exception
    Aborted (core dumped)
The book explains why synchronous errors can be handled with standard C++ exceptions, whereas handling asynchronous errors at controlled points in an application requires SYCL exceptions.
Synchronous errors defined by SYCL are of type sycl::exception, a class derived from std::exception, which allows us to catch SYCL errors specifically through a try-catch structure such as the one in the figure below.
Figure. Pattern to catch sycl::exception specifically
    try {
      // Do some SYCL work
    } catch (sycl::exception &e) {
      // Do something to output or handle the exception
      std::cout << "Caught sync SYCL exception: " << e.what() << "\n";
      return 1;
    }
On top of the C++ error handling mechanisms, SYCL adds a sycl::exception type for the exceptions thrown by the runtime. Everything else is standard C++ exception handling, so will be familiar to most developers. A slightly more complete example is provided in the figure below, where additional classes of exception are handled, as well as the program being ended by returning from main().
Figure. Pattern to catch exceptions from a block of code
    try {
      buffer<int> B{ range{16} };

      // ERROR: Create sub-buffer larger than size of parent buffer
      // An exception is thrown from within the buffer constructor
      buffer<int> B2(B, id{8}, range{16});

    } catch (sycl::exception &e) {
      // Do something to output or handle the exception
      std::cout << "Caught sync SYCL exception: " << e.what() << "\n";
      return 1;
    } catch (std::exception &e) {
      std::cout << "Caught std exception: " << e.what() << "\n";
      return 2;
    } catch (...) {
      std::cout << "Caught unknown exception\n";
      return 3;
    }

    return 0;

Example output:

    Caught sync SYCL exception: Requested sub-buffer size exceeds the size of the parent buffer -30 (CL_INVALID_VALUE)
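The order of the catch clauses in the pattern above matters: because sycl::exception derives from std::exception, the more specific clause has to come first, or the std::exception clause would catch SYCL errors too. The following is a minimal sketch of that ordering in standard C++ only, so it can be compiled without a SYCL toolchain. The names `sycl_exception_stub` and `classify` are hypothetical and exist only for illustration; the stub merely stands in for sycl::exception.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for sycl::exception: like the real type, it derives
// from std::exception, so a std::exception clause would also match it.
class sycl_exception_stub : public std::exception {
public:
  explicit sycl_exception_stub(std::string msg) : msg_(std::move(msg)) {}
  const char *what() const noexcept override { return msg_.c_str(); }

private:
  std::string msg_;
};

// Runs `work` and returns the same codes as the pattern above:
// 0 = no error, 1 = SYCL-like exception, 2 = other std::exception,
// 3 = anything else.
int classify(const std::function<void()> &work) {
  assert(work && "work must be a callable target");
  try {
    work();
  } catch (const sycl_exception_stub &) {
    return 1;  // most specific type is listed first
  } catch (const std::exception &) {
    return 2;  // would also match the stub if it were listed first
  } catch (...) {
    return 3;
  }
  return 0;
}
```

For example, `classify([] { throw std::runtime_error("other"); })` falls through to the second clause and returns 2, while a thrown `sycl_exception_stub` returns 1.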
Asynchronous Error Handling
Asynchronous errors are detected by the SYCL runtime (or an underlying backend), and the errors occur independently of execution of commands in the host program. The errors are stored in lists internal to the SYCL runtime and only released for processing at specific points that the programmer can control. There are two topics that we need to discuss to cover handling of asynchronous errors:
1. The asynchronous handler that is invoked when there are outstanding asynchronous errors to process
2. When the asynchronous handler is invoked
The Asynchronous Handler
The asynchronous handler is a function that the application defines, which is registered with SYCL contexts and/or queues. At the times defined by the next section, if there are any unprocessed asynchronous exceptions that are available to be handled, then the asynchronous handler is invoked by the SYCL runtime and passed a list of these exceptions. The asynchronous handler is passed to a context or queue constructor as a std::function and can be defined in ways such as a regular function, lambda, or functor, depending on our preference. The handler must accept a sycl::exception_list argument, such as in the example handler shown in the figure below.
Figure. Example asynchronous handler implementation defined as a lambda
    // Our simple asynchronous handler function
    auto handle_async_error = [](exception_list elist) {
      for (auto &e : elist) {
        try {
          std::rethrow_exception(e);
        } catch (sycl::exception &e) {
          std::cout << "ASYNC EXCEPTION!!\n";
          std::cout << e.what() << "\n";
        }
      }
    };
In the figure above, the std::rethrow_exception followed by catch of a specific exception type provides filtering of the type of exception, in this case to only sycl::exception. We can also use alternative filtering approaches in C++ or just choose to handle all exceptions regardless of the type.

The handler is associated with a queue or context (low-level detail covered more in Chapter 6) at construction time. For example, to register the handler defined in the figure above with a queue that we are creating, we could write

    queue my_queue{ gpu_selector{}, handle_async_error }

Likewise, to register the handler defined in the figure above with a context that we are creating, we could write

    context my_context{ handle_async_error }

Most applications do not need contexts to be explicitly created or managed (they are created behind the scenes for us automatically), so if an asynchronous handler is going to be used, most developers should associate such handlers with queues that are being constructed for specific devices (and not explicit contexts).
NOTE: In defining asynchronous handlers, most developers should define them on queues unless already explicitly managing contexts for other reasons.
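The rethrow-and-catch filtering used by the example handler can be tried out in standard C++ alone, since each element of an exception list is handled through std::rethrow_exception. The sketch below is a simulation, not SYCL code. It assumes `std::vector<std::exception_ptr>` as a stand-in for sycl::exception_list, and `error_list`, `device_error`, `collect_device_errors`, and `make_sample_errors` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for sycl::exception_list.
using error_list = std::vector<std::exception_ptr>;

// Hypothetical stand-in for sycl::exception.
struct device_error : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Rethrows each stored exception and keeps only the messages of those that
// are device_error; every other type is skipped by the catch-all clause.
std::vector<std::string> collect_device_errors(const error_list &errors) {
  std::vector<std::string> messages;
  for (const std::exception_ptr &ep : errors) {
    assert(ep && "rethrowing a null exception_ptr is undefined behavior");
    try {
      std::rethrow_exception(ep);
    } catch (const device_error &e) {
      messages.push_back(e.what());
    } catch (...) {
      // Not the type we filter for; ignored in this sketch.
    }
  }
  return messages;
}

// Builds a small list the way a runtime might: errors are captured as
// std::exception_ptr when they occur and handed to a handler later.
error_list make_sample_errors() {
  error_list errors;
  errors.push_back(std::make_exception_ptr(device_error("kernel failed")));
  errors.push_back(std::make_exception_ptr(std::logic_error("host bug")));
  errors.push_back(std::make_exception_ptr(device_error("copy failed")));
  return errors;
}
```

Calling `collect_device_errors(make_sample_errors())` yields the two `device_error` messages in order and drops the `std::logic_error`. Replacing the catch-all clause with a `std::exception` clause is one way to handle every standard exception regardless of type.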
If an asynchronous handler is not defined for a queue or the queue’s parent context and an asynchronous error occurs on that queue (or in the context) that must be processed, then the default asynchronous handler is invoked. The default handler operates as if it was coded as shown in the figure below.
Figure. Example of how the default asynchronous handler behaves
    // Our simple asynchronous handler function
    auto handle_async_error = [](exception_list elist) {
      for (auto &e : elist) {
        try {
          std::rethrow_exception(e);
        } catch (sycl::exception &e) {
          // Print information about the asynchronous exception
        }
      }

      // Terminate abnormally to make clear to user
      // that something unhandled happened
      std::terminate();
    };
The default handler should display some information to the user on any errors in the exception list and then will terminate the application abnormally, which should also cause the operating system to report that termination was abnormal.
What we put within an asynchronous handler is up to us. It can range from logging of an error to application termination to recovery of the error condition so that an application can continue executing normally.
The common case is to report any details of the error available by calling sycl::exception::what(), followed by termination of the application. Although it’s up to us to decide what an asynchronous handler does internally, a common mistake is to print an error message (that may be missed in the noise of other messages from the program), followed by completion of the handler function. Unless we have error management principles in place that allow us to recover known program state and to be confident that it’s safe to continue execution, we should consider terminating the application within our asynchronous handler function(s).
This reduces the chance that incorrect results will appear from a program where an error was detected, but where the application was inadvertently allowed to continue with execution regardless. In many programs, abnormal termination is the preferred result once we have experienced asynchronous exceptions.
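One way to avoid the print-and-continue mistake is to have the handler escalate after it has reported every error. The sketch below illustrates that policy in standard C++ only. It is an assumption-laden simulation: `report_and_escalate` is a hypothetical name, `std::vector<std::exception_ptr>` stands in for sycl::exception_list, and the function throws std::runtime_error where a production handler might call std::terminate(), purely so that the behavior can be exercised in a test.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical fail-fast handler: reports every stored error to `log`, then
// escalates so the caller cannot continue as if nothing had happened.
void report_and_escalate(const std::vector<std::exception_ptr> &errors,
                         std::ostream &log) {
  std::size_t reported = 0;
  for (const std::exception_ptr &ep : errors) {
    assert(ep && "rethrowing a null exception_ptr is undefined behavior");
    try {
      std::rethrow_exception(ep);
    } catch (const std::exception &e) {
      log << "ASYNC EXCEPTION: " << e.what() << "\n";
      ++reported;
    } catch (...) {
      log << "ASYNC EXCEPTION: unknown type\n";
      ++reported;
    }
  }
  // Returning normally here would be the mistake described above.
  if (reported > 0) {
    throw std::runtime_error("asynchronous errors detected");
  }
}
```

Passing a std::ostringstream as `log` makes the reported messages easy to inspect; with an empty list the function returns normally and writes nothing.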
Example: Zero Sized Object
The source code below shows how the SYCL handler will produce an error when a zero-sized object is passed.
    #include <cstdio>
    #include <CL/sycl.hpp>

    template <bool non_empty>
    static void fill(sycl::buffer<int> buf, sycl::queue & q) {
      q.submit([&](sycl::handler & h) {
        auto acc = sycl::accessor { buf, h, sycl::read_write };
        h.single_task([=]() {
          if constexpr(non_empty) {
            acc[0] = 1;
          }
        });
      });
      q.wait();
    }

    int main(int argc, char *argv[]) {
      sycl::queue q;
      sycl::buffer<int, 1> buf_zero ( 0 );
      fprintf(stderr, "buf_zero.count() = %zu\n", buf_zero.get_count());
      fill<false>(buf_zero, q);
      fprintf(stdout, "PASS\n");
      return 0;
    }
When the application encounters the zero-sized object at runtime, the program aborts and produces an error message:
    $ dpcpp zero.cpp
    $ ./a.out
    buf_zero.count() = 0
    submit...
    terminate called after throwing an instance of 'cl::sycl::invalid_object_error'
      what(): SYCL buffer size is zero. To create a device accessor, SYCL buffer size must be greater than zero. -30 (CL_INVALID_VALUE)
    Aborted (core dumped)
The programmer can then locate the programming error by catching the exception in the debugger and looking at the backtrace for the source line that triggered the error.
Example: Illegal Null Pointer
Consider code that does the following:
    deviceQueue.memset(mdlReal, 0, mdlXYZ * sizeof(XFLOAT));
    deviceQueue.memcpy(mdlImag, 0, mdlXYZ * sizeof(XFLOAT)); // coding error
The compiler will not flag the bad (null pointer) value specified in deviceQueue.memcpy. This error will not be caught until runtime.
    terminate called after throwing an instance of 'cl::sycl::runtime_error'
      what(): NULL pointer argument in memory copy operation. -30 (CL_INVALID_VALUE)
    Aborted (core dumped)
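Because the compiler accepts the literal 0 as a null pointer, one possible defensive measure is a small wrapper that rejects null arguments with a descriptive message before any copy is attempted. The sketch below shows the idea in standard C++ with std::memcpy so that it runs without a device. `checked_copy` is a hypothetical helper, not part of SYCL; the same checks could precede a queue memcpy call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical guard: rejects null arguments with a descriptive exception
// instead of passing them on to the copy routine.
void checked_copy(void *dst, const void *src, std::size_t bytes) {
  if (dst == nullptr) {
    throw std::invalid_argument("checked_copy: null destination pointer");
  }
  if (src == nullptr) {
    throw std::invalid_argument("checked_copy: null source pointer");
  }
  std::memcpy(dst, src, bytes);
}
```

With this helper, a call such as `checked_copy(buffer, 0, size)` throws std::invalid_argument at the call site, which points directly at the offending line.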
The example code that follows shows a way the user can control the format of the exception output when it is detected at runtime on a given queue, implemented in a standalone program that demonstrates the null pointer error.
    #include "stdlib.h"
    #include "stdio.h"
    #include <cmath>
    #include <signal.h>
    #include <fstream>
    #include <iostream>
    #include <vector>
    #include <CL/sycl.hpp>

    #define XFLOAT float
    #define mdlXYZ 1000
    #define MEM_ALIGN 64

    int main(int argc, char *argv[]) {
      XFLOAT *mdlReal, *mdlImag;

      cl::sycl::property_list propList =
          cl::sycl::property_list{cl::sycl::property::queue::enable_profiling()};

      cl::sycl::queue deviceQueue(
          cl::sycl::gpu_selector { },
          [&](cl::sycl::exception_list eL) {
            bool error = false;
            for (auto e : eL) {
              try {
                std::rethrow_exception(e);
              } catch (const cl::sycl::exception& e) {
                auto clError = e.get_cl_code();
                bool hascontext = e.has_context();
                std::cout << e.what() << "CL ERROR CODE : " << clError << std::endl;
                error = true;
                if (hascontext) {
                  std::cout << "We got a context with this exception" << std::endl;
                }
              }
            }
            if (error) {
              throw std::runtime_error("SYCL errors detected");
            }
          },
          propList);

      mdlReal = sycl::malloc_device<XFLOAT>(mdlXYZ, deviceQueue);
      mdlImag = sycl::malloc_device<XFLOAT>(mdlXYZ, deviceQueue);

      deviceQueue.memset(mdlReal, 0, mdlXYZ * sizeof(XFLOAT));
      deviceQueue.memcpy(mdlImag, 0, mdlXYZ * sizeof(XFLOAT)); // coding error

      deviceQueue.wait();
      exit(0);
    }
Resources
For a guided approach to debugging SYCL exceptions from incorrect use of the SYCL* API, see the Guided Matrix Multiplication Exception Sample.
To troubleshoot your applications that use OpenMP* or the SYCL* API with extensions to offload resources, see Troubleshoot Highly Parallel Applications.