Histogram Design Example Walkthrough
Design Overview
To help you understand how to migrate an OpenCL FPGA design to SYCL*, refer to the following links, where an OpenCL sample is migrated to SYCL:
The purpose of this design is to demonstrate important differences between OpenCL and SYCL for FPGA targets. The Histogram design implements a simple histogram function. The input data is a one-dimensional array of randomly generated integers with values between 0 and 99. You can choose the number of input values by passing in a command-line argument. The output is a histogram of this data using 10 bins. The histogram is calculated by the kernel function, which is offloaded to run on the FPGA. The resulting histogram is verified against a reference version calculated by the host.
This simple design allows you to observe some similarities and differences between OpenCL and SYCL as listed in the following tables:
Source Code Organization, Compilation, and Execution
Aspect | OpenCL | SYCL |
---|---|---|
Organization of the Source Code | OpenCL programs consist of a C++ host program and kernel functions written in OpenCL C within .cl files. In the Histogram design example, these are the host.cpp and histogram.cl files. | SYCL programs can be single-sourced. You can write the host and device code in the SYCL language within a single file. In the Histogram design example, this is the main.cpp file. |
Compilation | You must compile the histogram.cl and host.cpp files separately, using the device_fpga and host_fpga targets in the Makefile, respectively. | You can compile a SYCL program to run on the FPGA device using a single command. You can see this icpx command in the fpga target of the Makefile. |
Execution of the Program | You must manually program the FPGA device with the .aocx file that the device_fpga target generates before running the host executable that the host_fpga target generates. You can do this either with a command-line operation before running the executable or by adding code to the host program. | Running a SYCL program on the FPGA device simply involves running the executable produced by the fpga target of the Makefile. This executable contains the .aocx file, and running it programs the FPGA device with the .aocx file automatically. |
Emulation | Compile for emulation using the -march=emulator flag as shown in the device_emu target of the Makefile. | Compile for emulation using the -DFPGA_EMULATOR flag in the icpx command as shown in the fpga_emu target of the Makefile. |
Optimization Report Generation | Generate the reports using the device_report target, where the -rtl flag stops compilation after generating the reports. | Generate the reports using the report target of the Makefile, where the -fsycl-link=early flag stops compilation after generating the reports. |
Compilation for an FPGA Hardware Device | Use the device_fpga and host_fpga targets in the Makefile to compile the kernel code and the host code, respectively, for an FPGA hardware device. | Use the fpga target in the Makefile to compile for an FPGA hardware device. Because SYCL programs can be single-sourced, changes to the host code may trigger a full recompilation of the kernel code, including the time-consuming generation of the FPGA bitstream by the Intel® Quartus® Prime software. To avoid expensive and unnecessary recompilation of the kernel code, the fpga target uses the -reuse-exe=main.fpga flag, which causes the icpx command to attempt to reuse the existing FPGA bitstream contained in the main.fpga executable if it can determine that the kernel code in main.cpp has not changed. Alternatively, use the Device Link method described in detail in the Separating Device and Host Code Compilation section of the Intel oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs. |
Host Code
Similar to OpenCL programs, SYCL programs have contexts, platforms, devices, queues, buffers, and kernels, as explained in the Modify Your Design chapter. However, as you can observe by comparing the SYCL program's main.cpp file to the OpenCL program's host.cpp file, choosing a device and launching a kernel is much simpler in the SYCL version of the example design.
Aspect | OpenCL | SYCL |
---|---|---|
Selecting the Device | You select the FPGA hardware or emulator device by creating an explicit context and a queue that runs on the selected device (see lines 84-94 of the host.cpp file). | The FPGA hardware or emulator device is selected using a device_selector (see lines 34-38 of the main.cpp file). You do not need to create a platform or context explicitly. A queue is created from the device_selector (see line 41), so kernels submitted to that queue run on the selected device. |
Passing Data To and From the Kernel | You create buffers for input and output data using the clCreateBuffer function, providing the context, the size of the buffer, and so on (see line 107 of the host.cpp file). | See lines 43-52 of the main.cpp file for how the buffers for input and output data are created. |
Accessing Buffers | Each kernel argument must be set explicitly to a buffer (or constant) using the clSetKernelArg function. | The kernel accesses the buffers through accessor objects. See lines 63-64 of the main.cpp file to understand how accessors are created from buffers. The kernel can then use these accessors as if they were pointers, for example, reading from the accessor named in (line 79) and writing to the accessor named bins (line 85). |
Copying Kernel Output Data | You must copy the output data of the kernel back to the host explicitly using the clEnqueueReadBuffer function (see line 130 of the host.cpp file). | If a host pointer was provided when the SYCL buffer was created, the SYCL runtime automatically copies the data back to the host array bins_h when the buffer is destroyed. |
Explicit Data Movement | Explicit data movement happens when you manually call the clEnqueueWriteBuffer and clEnqueueReadBuffer functions. See lines 107-132 of the host.cpp file. | The buffer and accessor approach for passing data to and from the kernel is designed to simplify SYCL programs because the SYCL runtime handles copying the data to and from the device for you. However, to achieve high performance for more complex FPGA designs, Intel recommends that you become familiar with explicit data movement. The SYCL Sample Code With Explicit Data Movement shows a third version of the histogram design that uses explicit data movement. In this case: |
Error Handling | The runtime APIs each have a return value indicating whether the operation was successful. | Runtime errors are reported by throwing an exception. For example, the buffers are created within a try-catch block, so if buffer creation fails by throwing an exception, the exception is caught, and an error message displays (see line 96 of the main.cpp file). |
Resource Cleanup | You must clean up runtime objects, including the cl_context and buffers (see lines 142-143 of the host.cpp file). | You need not explicitly release runtime objects such as buffers that are created as local variables rather than dynamically allocated. You can rely on each object's destructor to clean up its resources when the object goes out of scope. |
Kernel Code
Aspect | OpenCL | SYCL |
---|---|---|
Body of the Kernel Function | The kernel source code is in a separate .cl file (see histogram.cl). | A kernel is either a lambda function or a functor. See the main.cpp file, which contains the body of the kernel function. On line 61, observe the special syntax of a C++ lambda function. NOTE: The lambda capture [=] indicates that all variables used in the lambda body are captured by copy. Kernel lambda functions must capture by copy. |
Pragmas, Attributes, Directives, and Extensions | Most of the pragmas available in OpenCL kernel code have equivalent pragmas or attributes in SYCL, but some syntaxes differ. In the example code, the restrict keyword on the kernel arguments indicates that the input and output buffers do not overlap (shown on lines 3 and 4 of the histogram.cl file). You can also observe other pragmas and attributes in this file, such as #pragma unroll, #pragma ii 1, and __attribute__((register)). | Some pragmas used in the kernel code are the same, but in other cases the syntax is slightly different. For example, to indicate that the input and output buffers do not overlap (or alias), the kernel attribute [[intel::kernel_args_restrict]] is placed on the lambda function (see line 61 of the main.cpp file). You can also observe other pragmas and attributes in this file, such as #pragma unroll, [[intel::initiation_interval(1)]], and [[intel::fpga_register]]. For a detailed list of all flags, pragmas, and attributes, refer to Flags, Attributes, Directives, and Extensions. |