Overview
SYCL* is designed for data-parallel programming and heterogeneous computing. It is a high-level language that targets a broad range of devices, including GPUs, CPUs, and FPGAs. SYCL is based on C++, with some restrictions on what you can write in device code. If you have further questions about SYCL after reading this guide, consult the SYCL 2020 Specification. See also SYCL* 2020 Specification Features and DPC++ Language Extensions Supported in Intel® oneAPI DPC++/C++ Compiler.
SYCL provides a consistent programming language and interface across CPU, GPU, FPGA, and AI accelerators. You can program and use each architecture either in isolation or together. Learning the language once lets you program a variety of accelerators. Each accelerator class requires an appropriate formulation and tuning of the algorithms for best performance, but the language and programming model remain consistent regardless of the target device.
One of the primary motivations for SYCL is to provide a higher-level programming language than OpenCL™ code. If you are familiar with OpenCL™ programs, you will notice many similarities to OpenCL™ code, along with some differences. The two most significant differences are:
- SYCL* is based on C++
Moving to C++ requires only minor syntax changes. For instance, you might see familiar pragmas and attributes expressed with a new syntax. SYCL also provides a wide variety of C++ features that are not available in C. See Flags, Attributes, Directives, and Extensions, which lists all flags, attributes, directives, and extensions in OpenCL™ and SYCL.
- SYCL programs can be single-sourced
This means that you can keep the host and device code in the same file and compile them into a single executable. Recall that an OpenCL program consists of a .cl file containing kernels written in OpenCL and a separate host program.
Highlights of Programming With SYCL
- In SYCL, the FPGA programming image file (aocx) is hidden inside the executable. Running this executable runs the host code, programs the board with the aocx, and runs the kernels. However, you must learn new methods of predicting and controlling which kernels are contained in each aocx, and when programming of the board occurs. See FPGA BSPs and Boards in the Intel oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs.
- Since a SYCL program can be single-sourced, you can place the host and the device kernel code in the same .cpp file. Therefore, any change to that .cpp file triggers a recompile of the entire file, including the kernel code. The compile time for kernel code on FPGA can be many hours, so you must learn new strategies to avoid recompiling the kernel code when you change only the host code. See Separating Device and Host Code Compilation in the Intel® oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs for more information about these strategies.
- As with OpenCL™, you can emulate your SYCL program on a CPU to verify its functional correctness quickly. A SYCL program chooses which device to run on using a device_selector. The SYCL program can choose either the Intel® FPGA Emulation Platform for OpenCL™ software (also referred to as the emulator or the FPGA emulator) or the FPGA hardware device. For convenience, Intel® recommends using a macro to help switch between emulation and hardware devices. See Device Selectors for FPGA in the Intel® oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs for more information.
- The SYCL runtime is a higher-level abstraction than the OpenCL™ runtime. For example, it can automatically handle data transfer between host and kernels. This can significantly simplify your code. However, for high performance, a more manual approach to data transfer can sometimes be beneficial. See Memory Accesses and its subtopics in the Intel® oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs for more information about how you can create buffers in SYCL and how you can use Unified Shared Memory (USM) allocations for manually controlling data transfers.
- As in OpenCL™, you submit kernels to a queue. In SYCL, however, the queue is out-of-order by default, whereas an OpenCL™ queue is in-order by default. If you want in-order queue behavior, create the queue with the sycl::property::queue::in_order property.
- Like OpenCL™, SYCL allows you to have either single-task kernels or NDRange kernels. As in OpenCL™, Intel® recommends using single-task kernels for FPGA targets. See the Single Work-item Kernels section in the Intel® oneAPI DPC++/C++ Compiler Handbook for Intel FPGAs and sycl::parallel_for.