Host Code Modification
This topic describes some checks and best-known methods you should consider when converting your OpenCL host program to SYCL*.
Device Selection
Your design can target the Intel® FPGA Emulation Platform for OpenCL™ software (FPGA emulator) for functional testing before targeting the FPGA hardware. To target the emulator device, you must make small changes in your queue-creation host code. You can address this by using preprocessor macros. The following table depicts the method for selecting between the FPGA emulator and hardware by using the FPGA_EMULATOR macro:
OpenCL | SYCL |
---|---|
|
|
With the above code in place, when compiling your OpenCL host code or your SYCL single-source file, add the -DFPGA_EMULATOR flag to your compile command to target the emulator. If you want to compile for the FPGA hardware target, add the -Xshardware flag. See FPGA Compilation Flags in the Intel® oneAPI Programming Guide for more information.
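As a minimal, dependency-free sketch of the macro pattern described above, the following standard C++ helper reports which target a given build selects. The name selected_target is hypothetical and is not part of any SYCL or OpenCL API. In a real host program, each branch would construct the matching device selector and pass it to the queue constructor instead of returning a string.

```cpp
#include <string>

// Hypothetical helper (not a SYCL API): names the target chosen at compile time.
inline std::string selected_target() {
#if defined(FPGA_EMULATOR)
  // Built with -DFPGA_EMULATOR: a SYCL host program would create its queue
  // with the FPGA emulator selector in this branch.
  return "emulator";
#else
  // Default build: a SYCL host program would create its queue with the
  // hardware selector in this branch, for example ext::intel::fpga_selector{}.
  return "hardware";
#endif
}
```

Because the choice is made by the preprocessor, the same source file serves both targets and only the compile command changes.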
Enable Queue Profiling
The following table shows how to enable queue profiling in both OpenCL and SYCL:
OpenCL | SYCL |
---|---|
|
|
Querying the profiling information from queue events is discussed in the Events and Synchronization section.
To enable profiling during design compilation and add profiling counters to the SYCL kernel pipeline, include the -Xsprofile flag in your icpx command. For additional details, see the Intel® FPGA Dynamic Profiler for DPC++ section in the FPGA Optimization Guide for Intel® oneAPI Toolkits.
Error Handling
In OpenCL, most runtime API functions return an error code, and you must check that code to determine whether the call succeeded. In SYCL, runtime errors are reported as exceptions, which you catch either in an error handler passed to the queue or in a try-catch block.
SYCL Example: Device Queue Creation in SYCL with an Error Handler to Catch Exceptions
The code example is combined with the queue profiling code from the previous section.
auto handler = [](exception_list e_list) {
  for (auto& e : e_list) {
    try {
      std::rethrow_exception(e);
    } catch (exception& e) {
      std::cout << "I have caught an exception!" << std::endl;
      std::cout << e.what() << std::endl;
    }
  }
};
auto prop_list = property_list{property::queue::enable_profiling};
queue device_queue(ext::intel::fpga_selector{}, handler, prop_list);
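You can exercise the rethrow-and-catch loop used inside the handler without a device. The following sketch is a standard C++ model, not SYCL code. It assumes that the exception list behaves like a sequence of std::exception_ptr objects, and it uses std::vector<std::exception_ptr> and std::runtime_error as stand-ins for exception_list and the SYCL exception type. The names ExceptionList, collect_messages, and make_list are hypothetical.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for an exception list: a sequence of exception pointers.
using ExceptionList = std::vector<std::exception_ptr>;

// Hypothetical helper: rethrows each stored exception and records its
// message, mirroring the loop in the handler shown above.
inline std::vector<std::string> collect_messages(const ExceptionList& e_list) {
  std::vector<std::string> messages;
  for (const std::exception_ptr& e : e_list) {
    try {
      std::rethrow_exception(e);
    } catch (const std::exception& ex) {
      messages.push_back(ex.what());
    }
  }
  return messages;
}

// Hypothetical helper: builds a list holding one captured error per message.
inline ExceptionList make_list(const std::vector<std::string>& texts) {
  ExceptionList list;
  for (const std::string& text : texts) {
    list.push_back(std::make_exception_ptr(std::runtime_error(text)));
  }
  return list;
}
```

Each error is handled in its own try-catch block, so one failing entry does not stop the handler from reporting the remaining entries.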
The following table shows how you would handle errors for submitting a single-task kernel to the device queue in OpenCL and SYCL. In SYCL, you can either construct your device queue with an error handler (as depicted in the previous code snippet) or wrap the command in a try-catch block.
OpenCL | SYCL |
---|---|
|
|
|
For brevity, the table above shows only wait_and_throw() handling errors for submitting a single-task kernel to the queue. However, the code is very similar for other queue operations, such as memory allocations, memory transfer operations, submitting NDRange kernels, and so on.
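The difference between the two error-handling styles can be sketched in standard C++. In the following example, enqueue_with_status and enqueue_or_throw are hypothetical stand-ins for an OpenCL-style call that returns a status code and a SYCL-style operation that throws. Neither function calls a real runtime, and the Status type is likewise an assumption made for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status codes standing in for OpenCL-style return values.
enum class Status { Success, Failure };

// Hypothetical stand-in for a call that reports failure through a return code.
inline Status enqueue_with_status(bool should_fail) {
  return should_fail ? Status::Failure : Status::Success;
}

// Hypothetical stand-in for an operation that reports failure by throwing.
inline void enqueue_or_throw(bool should_fail) {
  if (should_fail) throw std::runtime_error("enqueue failed");
}

// Return-code style: the caller inspects the status after the call.
inline std::string run_with_status_check(bool should_fail) {
  if (enqueue_with_status(should_fail) != Status::Success) {
    return "error: bad status";
  }
  return "ok";
}

// Exception style: the call is wrapped in a try-catch block.
inline std::string run_with_try_catch(bool should_fail) {
  try {
    enqueue_or_throw(should_fail);
  } catch (const std::exception& e) {
    return std::string("error: ") + e.what();
  }
  return "ok";
}
```

With return codes, an unchecked status is silently ignored. With exceptions, an uncaught error propagates up the call stack until something handles it.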
Events and Synchronization
In both OpenCL and SYCL, synchronization allows your host program to synchronize with the asynchronous operations running on or interacting with the device. The most basic form of synchronization is to wait for all events in the device queue to finish. The following table depicts how to synchronize in OpenCL and SYCL:
OpenCL | SYCL |
---|---|
|
|
|
The wait_and_throw() method passes any asynchronous exceptions to the error handler, while the wait() method does not.
In both OpenCL and SYCL, an event represents the status of an operation that the runtime executes. Events allow you to control the scheduling of queue operations explicitly and to query their progress status. The following table demonstrates how to capture the events of a few OpenCL and SYCL operations:
OpenCL | SYCL |
---|---|
|
|
|
|
|
|
Events provide finer-grained synchronization than waiting for all outstanding queue operations to finish. The following table depicts how to wait on an individual event:
OpenCL | SYCL |
---|---|
|
|
Additionally, you can use events to create dependencies and control the scheduling of operations. For example, the following table depicts how you would enqueue a single-task kernel to start after the some_event event finishes:
OpenCL | SYCL |
---|---|
|
|
Lastly, events are used to access profiling information for the operation they represent. If you enabled queue profiling, you can access profiling information using the event returned when the operation was enqueued. The following table depicts how to access the profiling information of an event in OpenCL and SYCL:
OpenCL | SYCL |
---|---|
|
|
|
|
|
|
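Event profiling queries in both OpenCL and SYCL report timestamps in nanoseconds, so host code typically converts the difference between the start and end timestamps to a more readable unit. The following sketch shows only that arithmetic in standard C++. The helper name elapsed_ms is hypothetical, and in a real program the two arguments would come from the event's profiling queries rather than from literals.

```cpp
#include <cstdint>

// Hypothetical helper: converts start and end timestamps (in nanoseconds)
// to an elapsed time in milliseconds.
inline double elapsed_ms(std::uint64_t start_ns, std::uint64_t end_ns) {
  // Guard against reversed timestamps so the unsigned subtraction cannot wrap.
  if (end_ns < start_ns) return 0.0;
  return static_cast<double>(end_ns - start_ns) / 1.0e6;
}
```

Subtracting the timestamps as integers before converting to double keeps full precision for large timestamp values.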
SYCL Buffers and Accessors
Like OpenCL buffers, SYCL buffers are shared memory of one, two, or three dimensions that you can use in a kernel. Unlike OpenCL buffers, SYCL buffers must be accessed using SYCL accessors. Using these accessors, the SYCL runtime analyzes the accesses of the buffers and creates a dependency graph of the host and device operations. This allows the runtime to schedule data movement and kernel events automatically.
For example, the following table depicts how to enqueue two kernels, KernelA and KernelB, that operate sequentially on the same buffer, buf. Assume that the device queue is properly set up and the data in buf is already transferred to the device.
OpenCL | SYCL |
---|---|
|
|
In the SYCL code, since both kernels can write to buf (via buf_acc), the runtime implicitly adds a dependency between the kernels, and KernelA runs and completes before KernelB starts. In OpenCL, you must add this dependency manually using the event_list argument of the clEnqueueTask function.
One of the benefits of using SYCL buffers and accessors is that the runtime can automatically schedule both kernel and data movement operations. For example, the following code snippet shows a basic design that copies input data to the device, enqueues a kernel that reads from the input buffer and writes to an output buffer, and finally copies the output data back from the device:
int in_data[N], out_data[N];
{
  buffer<int, 1> in_buf(in_data, N);
  buffer<int, 1> out_buf(out_data, N);
  device_queue.submit([&](handler &h) {
    accessor in(in_buf, h, read_only);
    accessor out(out_buf, h, write_only, no_init);
    h.single_task<Kernel>([=]() { });
  }).wait();
  // CAUTION: The kernel has finished, but the data has not been copied back to out_data yet!
}
// out_buf is out of scope, so the contents have been copied back to out_data
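The scope-based copy-back in the snippet above follows the common C++ pattern of doing work in a destructor. The following standard C++ model illustrates only that pattern. ScopedBuffer and double_in_scope are hypothetical names, the class is not the SYCL buffer, and the sketch makes no claim about how a SYCL runtime is implemented.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model: holds a working copy of host data and writes it back
// to the host array when the object is destroyed.
class ScopedBuffer {
 public:
  ScopedBuffer(int* host_data, std::size_t count)
      : host_data_(host_data), working_copy_(host_data, host_data + count) {}

  ScopedBuffer(const ScopedBuffer&) = delete;
  ScopedBuffer& operator=(const ScopedBuffer&) = delete;

  // The destructor copies the working copy back to the host array.
  ~ScopedBuffer() {
    for (std::size_t i = 0; i < working_copy_.size(); ++i) {
      host_data_[i] = working_copy_[i];
    }
  }

  // Stand-in for device-side access to the buffer contents.
  std::vector<int>& device_view() { return working_copy_; }

 private:
  int* host_data_;
  std::vector<int> working_copy_;
};

// Doubles every element through a ScopedBuffer and returns the value of
// data[0] observed on the host before the buffer goes out of scope.
// Assumes count is at least 1.
inline int double_in_scope(int* data, std::size_t count) {
  int seen_before_scope_exit = 0;
  {
    ScopedBuffer buf(data, count);
    for (int& value : buf.device_view()) value *= 2;
    seen_before_scope_exit = data[0];  // host memory is still unchanged here
  }
  // buf has been destroyed, so data now holds the doubled values.
  return seen_before_scope_exit;
}
```

In this model, reading the host array inside the scope returns the old values. This mirrors the caution in the snippet above: read the results only after the buffer has gone out of scope.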
For more details about buffer properties, accessors, dependency rules, constructors, and destructors, refer to the Buffers section of the Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems Using C++ and SYCL book.
SYCL buffers have convenience constructors that accept std::array and std::vector objects as arguments and infer the type and size of the buffer, an example of which is shown in the following code snippet:
std::array<int, N> my_std_array;
std::vector<int> my_std_vector;
{
  // Expands to:
  // buffer<int, 1> my_std_array_buf(my_std_array.data(), my_std_array.size());
  buffer my_std_array_buf(my_std_array);
  // Expands to:
  // buffer<int, 1> my_std_vector_buf(my_std_vector.data(), my_std_vector.size());
  buffer my_std_vector_buf(my_std_vector);
  // …
}