This document provides a summary of new and changed product features and includes notes about features and problems not described in the product documentation.
Where to Find the Release
Please follow the steps to download the toolkit from the Web Configurator, and follow the installation instructions to install.
2021.4.0 Release
- New OpenMP 5.0/5.1 support including #pragma omp prefetch support
- Performance improvements for Icelake
New Features in DPC++
- Added support for SYCL 2020 Features: Specialization constants, sub_group algorithms, USM features, interoperability API and others. Complete list of supported SYCL 2020 features in this release can be found here.
- Implemented support for Device UUID from Intel's Extensions for Device Information
- Numerous debuggability improvements for host and CPU device debugging
- Numerous ExplicitSIMD improvements and features.
- A new compiler driver, dpcpp-cl, was added on Windows to support Windows-style command-line syntax. Users who want Linux-style command-line syntax should continue to use the dpcpp driver.
- Added support for reporting of the following FIFO types in the System Viewer FPGA optimization report:
- FIFO created by feedback nodes. The FIFO depth and width are reported on the details panel of feedback nodes.
- FIFO created for capacity balancing in stallable regions. The FIFO depth and width are reported on the details panel of new capacity FIFO nodes.
- Added support for the -Xsrounding=<rounding_type> flag that helps modify the rounding mode of floating-point operations.
- Added support for Questa*-Intel® FPGA Edition and Questa*-Intel® FPGA Starter Edition simulators.
- Added support for FPGA simulation flow.
- Added support for the [[intel::use_stall_enable_clusters]] kernel attribute that enables the compiler to reduce kernel area and latency.
Bug Fixes
- Fixed an FPGA issue where the compiler ignored the ivdep attribute when it contained a pointer declared outside the scope where the attribute was applied.
- Fixed an FPGA issue where the compiler would error out when very long kernel names were declared.
- Fixed an FPGA issue where RHEL, CentOS, and SLES installations would encounter an error if the lsb_release package was not installed.
Known Issues and Limitations
- SYCL 2020 specialization constant support in the compiler has the following limitations:
- Building a program that uses specialization constants for both JIT and AOT targets at the same time could result in an exception with the following message: Native API failed. Native API returns: -49 (CL_INVALID_ARG_INDEX) -49 (CL_INVALID_ARG_INDEX).
- The sycl::link API could fail to JIT-compile your code if the input kernel bundles contain more than one device image and specialization constants are used. The same behavior can be observed with the -fno-sycl-early-optimizations compiler flag. The number of device images within a kernel bundle is controlled by the -fsycl-device-code-split flag. So, if you combine specialization constants, the sycl::link API, and the device code split feature, you could encounter a "symbol multiply defined" error from the JIT compiler.
The only known workaround is to avoid having more than one device image per kernel bundle, that is, to avoid using the device code split feature.
- An undefined-reference error for the sinpif and cospif functions, such as "Compilation from IR - skipping loading of FCL error: undefined reference to `sinpif'", can occur even when the application code does not use these functions. The error is caused by a compiler optimization phase. As a workaround, use the compiler flags -mllvm -enable-transform-sin-cos=0, which disable the faulty optimization.
- Sourcing oneAPI setvars.sh or setvars.bat will override the existing clang/clang++ driver. Please see article for workaround and fix.
- Using #pragma omp declare simd on a member template is currently not supported and can lead to the error "error: function declaration is expected after 'declare simd' directive". Non-template member functions and function templates that are not members of a class are not affected.
- Using Microsoft Visual Studio* as a host compiler for DPC++ with C++17 enabled causes the error C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\include\sycl\CL/sycl/ONEAPI/accessor_property_list.hpp(199): error C2686: cannot overload static and non-static member functions with the same parameter types. Refer to the article here on how to work around this issue.
- Using -Qlong-double on Windows is error-prone, because MSVC has never supported long double as an 80-bit FP type. The Microsoft math libraries, as well as formatted input and output, have no support for 80-bit long double.
In addition, the Microsoft C++ standard libraries (libcpmt.lib, libcpmtd.lib) define several symbols that conflict with standard double extended math function names, for example: frexpl, expl, logl, sinl, cosl, atanl (and others). These symbols have double precision (FP64) implementations in the Microsoft libraries, and double extended precision (FP80) implementations in the Intel math library (libmmt.lib). Because of this, correct behavior of FP80 math functions is not guaranteed when libcpmt.lib is linked first. Conversely, some Microsoft C++ standard library functions may malfunction when the Intel libmmt.lib is linked in first. Because symbols such as frexpl are defined in the same module as standard C++ functions, link errors are known to occur for simple C++ programs that are compiled with the -Qlong-double option; these can be avoided by linking in libcpmt.lib before libmmt.lib.
- USM support for implicit migrations of shared allocations between device and host is currently implemented in software using access-violation mechanisms (e.g., SIGSEGV) to identify access from the host. Undefined behavior may occur if applications rely on similar access-violation mechanisms, or if they use system calls to access shared-memory allocations before the allocations are migrated to the host by the GPU driver.
- icx compiler does not support linking library archives using the -l option for libraries that contain target offload code. More details and workaround for this issue can be found at Known Issue: Static Libraries and Target Offload.
- Attempting to use Link Time Optimization (LTO) causes a linker failure. To link successfully, make sure you have the recommended versions of binutils for your OS listed at Intel® oneAPI DPC++/C++ Compiler and Intel® oneAPI DPC++ Library System Requirements.
- User-defined functions with the same name and signature as an OpenCL C built-in function (exact match of arguments; the return type does not matter) can lead to undefined behavior. More details about this issue can be found at Known Issue: User-defined Functions with the Same Signature as OpenCL C built-in functions.
- A #pragma float_control directive that occurs at file scope does not take effect correctly for statement blocks that are nested within class definitions. The same issue exists for #pragma clang fp.
- When compiling for FPGA, if you declare kernel names locally, the kernel name is mangled in FPGA optimization reports. To work around this issue, declare kernel names globally.
- When debugging FPGA emulator code in Microsoft Visual Studio* on a Windows* system, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently.
- When compiling for FPGA and using a read-only accessor for a very wide struct, the compile times can be large. As a workaround to address long compile times, use a read-write accessor instead.
- When compiling for FPGA, you cannot use a system installed with Intel® FPGA PAC D5005 to compile a SYCL application that targets Intel® PAC with Intel® Arria® 10 FX FPGA. Compilation may succeed, but the compiled binary might fail at runtime. There is no workaround available for this issue currently.
- Compiles for the FPGA emulator might fail if you have also installed a GPU platform (oneAPI-specific). You might see the Error: Compiler Error: OpenCL kernel compile/link FAILED error message. To work around this error and to achieve a successful FPGA emulator compilation, perform one of the following solutions:
- Solution 1: Add the -Xsfast-emulator flag to every dpcpp command when targeting the emulation flow.
- Solution 2: Execute one of the following OS-specific commands:
- On Ubuntu 18.04: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
- On Ubuntu 20.04: export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6
- On SLES 15: export LD_PRELOAD=/usr/lib64/libstdc++.so.6
- On CentOS 8.x/RHEL 8.1: export LD_PRELOAD=/lib64/libstdc++.so.6
NOTE: Intel® recommends Solution 2 when working with code samples as it is inconvenient to modify the dpcpp command in the code sample CMake file.
- When you perform FPGA compile and link stages with a single dpcpp command (for example, dpcpp -fintelfpga <other arguments> -Xshardware src/kernel.cpp), if the source code is not located in the current directory, you might observe that the source code browser is missing in the generated FPGA optimization reports. To work around this issue, compile and link the executable in separate stages, as follows:
dpcpp -fintelfpga <other arguments> -Xshardware -c src/kernel.cpp -o kernel.o
dpcpp -fintelfpga <other arguments> -Xshardware kernel.o
- When compiling for FPGA, the debug support on Windows is not available when using device-side libraries. To avoid this issue, do not run a debugger on the emulator platform on Windows.
- In the FPGA optimization report, the Loop Viewer (Alpha) can currently handle only loops with 100 iterations or fewer. For designs with loops of more than 100 iterations, the optimization reports hang. There is no known workaround for this issue.
- The modulefiles-setup.sh script is not supported for FPGA in this release. As a workaround, use the setvars.sh script.
- FPGA optimization reports are not displayed correctly within Microsoft Visual Studio on Windows. To view the reports, open the report.html file generated in the project directory.
- On Windows, compiling FPGA designs in a directory with a long path name might fail and you might see the following error:
dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
NMAKE : fatal error U1077: '…\oneAPI\compiler\latest\windows\bin\dpcpp.EXE' : return code '0x1'
As a workaround, either compile the design in a directory with a short path name or reset the TMP and TEMP environment variables to point to a shorter path (for example, C:\temp).
- When compiling for FPGA, the Windows emulator flow that uses -c to create object files, links them into an archive file, and then generates an executable from that archive might result in an executable that fails to launch device kernels. As a workaround for this issue, add the -fsycl-device-code-split=none flag to the archive step, as shown in the following:
# generate .obj files
dpcpp /EHsc -fintelfpga -c host.cpp device.cpp device_adder.cpp -DFPGA_EMULATOR
# generate host.a
dpcpp -fintelfpga -fsycl-link=image -fsycl-device-code-split=none host.obj device.obj device_adder.obj
# generate .exe
dpcpp -fintelfpga host.a /link /wholearchive
# emulator executable
host.exe
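For the #pragma omp declare simd limitation listed above, the following minimal C++ sketch shows the two forms that the note describes as unaffected: a non-template member function and a function template that is not a class member. The names Scaler, axpy, and sum_scaled are hypothetical and chosen only for illustration. When OpenMP is not enabled, the pragmas are ignored and the code remains valid standard C++.

```cpp
#include <cstddef>

// Non-template member function: described as unaffected by the limitation.
struct Scaler {
  float factor;

#pragma omp declare simd
  float scale(float x) const { return factor * x; }
};

// Function template that is not a class member: also described as unaffected.
#pragma omp declare simd
template <typename T>
T axpy(T a, T x, T y) {
  return a * x + y;
}

// Hypothetical helper that calls both functions from a SIMD loop.
float sum_scaled(const Scaler& s, const float* in, std::size_t n) {
  float total = 0.0f;
#pragma omp simd reduction(+ : total)
  for (std::size_t i = 0; i < n; ++i) {
    total += axpy(1.0f, s.scale(in[i]), 0.0f);
  }
  return total;
}
```

If a member template needs a SIMD variant, one option under these assumptions is to forward from the member template to a free function template like axpy.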
2021.3.0 Release
- New attribute allow_cpu_features, which allows the use of intrinsic functions and architecture-specific functionality in the attributed function.
#include <immintrin.h> // contains all of the _FEATURE values.
__attribute__((allow_cpu_features(_FEATURE_AVX2)))
void my_function() {
  // Code that either uses AVX2 features, or we want to be optimized for AVX2.
}
New Features in DPC++
- Added support for the DPC++ extension SYCL_INTEL_local_memory to allocate static local memory in SYCL kernels.
- Added support to the DPC++ extension ExplicitSIMD for:
- Coexistence of ESIMD and regular SYCL kernels in the same source
- Indirect read and write methods in ESIMD class
- DPC++ reduction extension(ONEAPI::reduction) now has support for
- Operator +=, *=, |=, ^=, &= for custom type reducers
- Multiple reduction variables
- Added major support for SYCL 2020 reduction on par with ONEAPI::reduction extension:
- Reduction constructor functions accept a SYCL buffer or USM pointer instead of a SYCL accessor or USM pointer as in ONEAPI::reduction
- Reduction constructor functions accept an optional property sycl::property::reduction::initialize_to_identity used to specify whether the original reduction variable is discarded or added to the final reduction result
- Added support for SYCL 2020 features sycl::kernel_bundle and a new buffer constructor from shared_ptr<T[]>. A complete list of SYCL2020 features and DPC++ extensions supported can be found here.
- Added support for the group_local_memory_for_overwrite function in FPGA to allocate local memory that is accessible to and shared by all work items of a workgroup.
- In FPGA optimization reports, added support for viewing global memory in the System Viewer (previously known as the Graph Viewer).
- Added support for -qactypes and /Qactypes flags to link against AC type libraries in FPGA.
- Added support for the -Xsauto-pipeline flag to pipeline loops in non-task kernels in FPGA.
- Added support for the -Xsffp-contract=fast flag to reduce floating-point rounding operations in FPGA.
- Added support for the atomic_fence function in FPGA.
- Added support for integers with widths greater than 2048 bits in FPGA.
- Added support for the FPGA accessor property no_alias.
New Features for OpenMP offload
- OpenMP 5.0/5.1 user-defined mapper (declare mapper)
- OpenMP 5.1 dispatch variant support
- Improved OpenMP and DPC++ USM composability
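To illustrate the user-defined mapper feature listed above, here is a minimal sketch of the OpenMP declare mapper directive in C++. The Vec type and the scale_in_place function are hypothetical. The mapper asks the runtime to map the pointed-to array together with the struct whenever a Vec is mapped. When the code is compiled without OpenMP offload enabled, the pragmas are ignored and the loop runs on the host.

```cpp
#include <cstddef>

// Hypothetical aggregate that holds a pointer; mapping the struct alone would
// copy only the pointer value, not the array it points to.
struct Vec {
  std::size_t len;
  double* data;
};

// User-defined default mapper: mapping a Vec also maps data[0:len].
#pragma omp declare mapper(Vec v) map(v, v.data[0 : v.len])

// Doubles every element; in an offload build the target region uses the mapper.
void scale_in_place(Vec& v) {
#pragma omp target map(tofrom : v)
  for (std::size_t i = 0; i < v.len; ++i) {
    v.data[i] *= 2.0;
  }
}
```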
Bug Fixes
- Fixed an FPGA compilation error on the emulator platform that needed removing or renaming the libstdc++.so.6 file.
- Fixed an FPGA clang error that required installing the required compatibility library.
- Fixed an FPGA compilation failure on the Windows system due to the space in the default installation directory.
- Fixed an issue with FPGAs where the software stack for Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 could not be installed on the same system.
Known Issues and Limitations
- Sourcing oneAPI setvars.sh or setvars.bat will override the existing clang/clang++ driver. Please see article for workaround and fix.
- Using Microsoft Visual Studio* as a host compiler for DPC++ with C++17 enabled causes the error C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\include\sycl\CL/sycl/ONEAPI/accessor_property_list.hpp(199): error C2686: cannot overload static and non-static member functions with the same parameter types. Refer to the article here on how to work around this issue.
- Using -Qlong-double on Windows is error-prone, because MSVC has never supported long double as an 80-bit FP type. The Microsoft math libraries, as well as formatted input and output, have no support for 80-bit long double.
In addition, the Microsoft C++ standard libraries (libcpmt.lib, libcpmtd.lib) define several symbols that conflict with standard double extended math function names, for example: frexpl, expl, logl, sinl, cosl, atanl (and others). These symbols have double precision (FP64) implementations in the Microsoft libraries, and double extended precision (FP80) implementations in the Intel math library (libmmt.lib). Because of this, correct behavior of FP80 math functions is not guaranteed when libcpmt.lib is linked first. Conversely, some Microsoft C++ standard library functions may malfunction when the Intel libmmt.lib is linked in first. Because symbols such as frexpl are defined in the same module as standard C++ functions, link errors are known to occur for simple C++ programs that are compiled with the -Qlong-double option; these can be avoided by linking in libcpmt.lib before libmmt.lib.
- USM support for implicit migrations of shared allocations between device and host is currently implemented in software using access-violation mechanisms (e.g., SIGSEGV) to identify access from the host. Undefined behavior may occur if applications rely on similar access-violation mechanisms, or if they use system calls to access shared-memory allocations before the allocations are migrated to the host by the GPU driver.
- Specialization constants with a size less than 8 bytes are not supported on the level zero backend.
- Invoking GPU offload code from a global object destructor in DPC++ leads to undefined behavior.
- User-Defined Reduction(UDR) is not currently supported in SIMD and will be enabled in a future release.
- icx compiler does not support linking library archives using the -l option for libraries that contain target offload code. More details and workaround for this issue can be found at Known Issue: Static Libraries and Target Offload.
- Attempting to use Link Time Optimization (LTO) causes a linker failure. To link successfully, make sure you have the recommended versions of binutils for your OS listed at Intel® oneAPI DPC++/C++ Compiler and Intel® oneAPI DPC++ Library System Requirements.
- User-defined functions with the same name and signature as an OpenCL C built-in function (exact match of arguments; the return type does not matter) can lead to undefined behavior. More details about this issue can be found at Known Issue: User-defined Functions with the Same Signature as OpenCL C built-in functions.
- A #pragma float_control directive that occurs at file scope does not take effect correctly for statement blocks that are nested within class definitions. The same issue exists for #pragma clang fp.
- A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, because the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
- is_endian_little
- global_mem_size
- local_mem_size
- max_constant_buffer_size
- max_mem_alloc_size
- vendor
- name
- is_available
- When compiling for FPGA, if you declare kernel names locally, the kernel name is mangled in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer, Kernel Memory Viewer, and Schedule Viewer. To work around this issue, declare kernel names globally.
- When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently.
- When compiling for FPGA and using a read-only accessor for a very wide struct, the compile-time to RTL (prior to the FPGA hardware image creation stage) can be large. As a workaround to address this long compile time, use a read-write accessor instead.
- When compiling for FPGA, if you declare very long kernel names, the compiler errors out. As a workaround, keep your kernel names shorter than 260 characters.
- On Windows, the FPGA emulator can silently fail by running out of memory. As a workaround, to catch this error, write your kernel code using the try-catch syntax.
- When compiling for FPGA, the compiler may sometimes ignore the ivdep attribute when it contains a pointer that is declared outside the scope where the attribute is applied. For example:
int *p = ...
// enter new scope
{
  [[intel::ivdep(p)]]
  for (int i = 0; i < N; i++) {
    // accesses to p
  }
}
In this example, the compiler might still find dependences on accesses to p in the loop despite the application of the ivdep attribute. To work around this limitation, declare the pointer (intended to be used in the ivdep attribute) within the same scope where the attribute is applied, if possible. This limitation does not result in functional errors on correctly written code but may affect performance on the generated hardware.
- When compiling for FPGA, you cannot use a system installed with Intel® FPGA PAC D5005 to compile a SYCL application that targets Intel® PAC with Intel® Arria® 10 FX FPGA. Compilation may succeed but the compiled binary might fail at runtime. There is no workaround available for this issue currently.
- Compiles for the FPGA emulator might fail if you have also installed a GPU platform (oneAPI-specific). You might see the Error: Compiler Error: OpenCL kernel compile/link FAILED error message. To work around this error and to achieve a successful FPGA emulator compilation, perform one of the following solutions:
- Solution 1: Add the -Xsfast-emulator flag to every dpcpp command when targeting the emulation flow.
- Solution 2: Execute one of the following OS-specific commands:
- On Ubuntu 18.04: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
- On Ubuntu 20.04: export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6
- On SLES 15: export LD_PRELOAD=/usr/lib64/libstdc++.so.6
- On CentOS 8.x/RHEL 8.1: export LD_PRELOAD=/lib64/libstdc++.so.6
NOTE: Intel® recommends Solution 2 when working with code samples as it is inconvenient to modify the dpcpp command in the code sample CMake file.
- When you perform FPGA compile and link stages with a single dpcpp command (for example, dpcpp -fintelfpga <other arguments> -Xshardware src/kernel.cpp), if the source code is not located in the current directory, you might observe that the source code browser is missing in the generated FPGA optimization reports. To work around this issue, compile and link the executable in separate stages, as follows:
dpcpp -fintelfpga <other arguments> -Xshardware -c src/kernel.cpp -o kernel.o
dpcpp -fintelfpga <other arguments> -Xshardware kernel.o
- When compiling for FPGA, the debug support on Windows is not available when using device-side libraries. To avoid this issue, do not run a debugger on the emulator platform on Windows.
- In the FPGA optimization report, the Loop Viewer (Alpha) can currently handle only loops with 100 iterations or fewer. For designs with loops of more than 100 iterations, the optimization reports hang. There is no known workaround for this issue.
- The modulefiles-setup.sh script is not supported for FPGA in this release. As a workaround, use the setvars.sh script.
- FPGA optimization reports are not displayed correctly within Microsoft Visual Studio on Windows. To view the reports, open the report.html file generated in the project directory.
- RHEL, CentOS, and SLES installations might encounter the following error if the lsb_release package is not installed:
dpcpp: error: unable to execute command: Executable "aoc" doesn't exist!
dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
As a workaround for this issue, install the lsb_release package using one of the following OS-specific commands:
- RHEL/CentOS 7 and 8: sudo yum -y install redhat-lsb-core
- SLES 15 SP1: sudo zypper --non-interactive install lsb-release
- On Windows, compiling FPGA designs in a directory with a long path name might fail and you might see the following error:
dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
NMAKE : fatal error U1077: '…\oneAPI\compiler\latest\windows\bin\dpcpp.EXE' : return code '0x1'
As a workaround, compile the design in a directory with a short path name.
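For the -Qlong-double issue listed above, one quick way to see which long double format a given build actually uses is to inspect std::numeric_limits. This is only a sketch using the C++ standard library, and the function names are hypothetical. The x87 double extended (FP80) format has a 64-bit significand, while FP64 has a 53-bit significand.

```cpp
#include <limits>

// Number of significand bits in long double for the current build.
constexpr int long_double_mantissa_bits() {
  return std::numeric_limits<long double>::digits;
}

// True when long double has the 64-bit significand of the x87 FP80 format.
constexpr bool long_double_is_fp80() {
  return long_double_mantissa_bits() == 64;
}

// True when long double is no wider than double.
constexpr bool long_double_is_fp64() {
  return long_double_mantissa_bits() == std::numeric_limits<double>::digits;
}
```

Because the functions are constexpr, a project that depends on FP80 could place long_double_is_fp80() in a static_assert to fail the build early when the option is not in effect.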
2021.2.0 Release
- Support for the Alder Lake and Sapphire Rapids ISAs. The following compiler options were added:
- -mavxvnni
- -mcldemote
- -mhreset
- -mptwrite
- -mserialize
- -mwaitpkg
- -march=alderlake, -xalderlake
- -march=sapphirerapids, -xsapphirerapids
- CMake support for icx and icpx with the compiler ID IntelLLVM, starting with CMake version 3.20.0.
New Features in DPC++
- Added support for SYCL2020 features device_has(), aspects, math array, and global work offset in kernel enqueue. A complete list of SYCL2020 features and DPC++ extensions supported can be found here.
- Added support for DPC++ extension for pinned memory property
- Added support for Experimental Explicit SIMD (ESIMD) extension with Level Zero runtime. Also added support on Windows host.
- Partial support for #pragma vector aligned/unaligned
- Auto mode for device code split feature which will now be the default mode.
- Compiler IDE integration support for Microsoft* Visual Studio 16.9.
- Fast math is enabled by default (i.e., -fp-model=fast), which means the compiler can make various out-of-box optimizations for floating-point math (float or double). With this optimization enabled, you might observe different bitwise results when compared to results from the oneAPI 2021.1 release or GCC.
- Added support for Algorithmic C data types (ac_int, ac_fixed, ac_fixed_math, hls_float, hls_float_math, and ac_complex).
- Added support for targeting multiple homogeneous FPGA devices with the same or different device codes.
- Added support for viewing loop bottlenecks using the Bottlenecks viewer in the FPGA optimization report.
- Added support for [[intel::scheduler_target_fmax_mhz(N)]] kernel attribute.
- Added support for fp contract and fp reassociate pragmas to handle kernel’s arithmetic and floating-point operations at a finer granularity.
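The note above that fast math (-fp-model=fast) can change bitwise results follows from floating-point addition not being associative: once the compiler is allowed to reorder operations, a different sequence of roundings may be used. The sketch below uses hypothetical helper names and plain standard C++ to compare two groupings of the same sum. The intermediate value is stored in a volatile variable so that each helper evaluates in the order written.

```cpp
// Evaluates (a + b) + c in exactly this order.
double sum_left_to_right(double a, double b, double c) {
  volatile double t = a + b;
  return t + c;
}

// Evaluates a + (b + c) in exactly this order.
double sum_right_to_left(double a, double b, double c) {
  volatile double t = b + c;
  return a + t;
}

// True when the two groupings give different results for the given inputs.
bool grouping_changes_result(double a, double b, double c) {
  return sum_left_to_right(a, b, c) != sum_right_to_left(a, b, c);
}
```

With a = 1e16, b = -1e16, and c = 1.0, the first grouping yields 1.0 and the second yields 0.0 in IEEE 754 double arithmetic, because -1e16 + 1.0 rounds back to -1e16.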
New features for OpenMP offload
- Bug fixes and performance improvements in this release.
Changes to Existing Features
- The Intel compiler changed the oneMKL implicit link option to -qmkl to avoid a conflict with the LLVM option -mkl (see LLVM documentation).
Bug Fixes
- Fixed an issue with the FPGA-specific flag -reuse-exe=. It is now supported on both Windows and Linux systems.
- Fixed link warnings that were observed when compiling for FPGA and creating device code archive on Windows.
- Fixed issues in the Bottleneck Viewer in the FPGA optimization report.
- Fixed the aocl diagnose command error related to ICD diagnostics.
Known Issues and Limitations
- A bug in the YUM/DNF/APT/ZYPPER packages of the oneAPI 2021.1 Gold (initial) release prevents upgrades. More details on this can be found here.
- Using Microsoft Visual Studio* as a host compiler for DPC++ with C++17 enabled causes the error C:\Program Files (x86)\Intel\oneAPI\compiler\latest\windows\include\sycl\CL/sycl/ONEAPI/accessor_property_list.hpp(199): error C2686: cannot overload static and non-static member functions with the same parameter types. Refer to the article here on how to work around this issue.
- USM support for implicit migrations of shared allocations between device and host is currently implemented in software using access-violation mechanisms (e.g., SIGSEGV) to identify access from the host. Undefined behavior may occur if applications rely on similar access-violation mechanisms, or if they use system calls to access shared-memory allocations before the allocations are migrated to the host by the GPU driver.
- Specialization constants with a size less than 8 bytes are not supported on the level zero backend.
- Invoking GPU offload code from a global object destructor in DPC++ leads to undefined behavior.
- User-Defined Reduction(UDR) is not currently supported in SIMD and will be enabled in a future release.
- A #pragma float_control directive that occurs at file scope does not take effect correctly for statement blocks that are nested within class definitions. The same issue exists for #pragma clang fp.
- A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process, because the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
- is_endian_little
- global_mem_size
- local_mem_size
- max_constant_buffer_size
- max_mem_alloc_size
- vendor
- name
- is_available
- When compiling for FPGA, if you declare kernel names in an unnamed namespace, the kernel name does not display properly in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer, Kernel Memory Viewer, and Schedule Viewer. To work around this issue, declare kernel names globally.
- When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently.
- When compiling for FPGA and using a read-only accessor for a very wide struct, the compile-time to RTL (prior to the FPGA hardware image creation stage) can be large. As a workaround to address this long compile time, use a read-write accessor instead.
- When compiling for FPGA, if you declare very long kernel names, the compiler errors out. As a workaround, keep your kernel names shorter than 260 characters.
- On Windows, the FPGA emulator can silently fail by running out of memory. As a workaround, to catch this error, write your kernel code using the try-catch syntax.
- When compiling for FPGA, the compiler may sometimes ignore the ivdep attribute when it contains a pointer that is declared outside the scope where the attribute is applied. For example:
int *p = ...
// enter new scope
{
  [[intel::ivdep(p)]]
  for (int i = 0; i < N; i++) {
    // accesses to p
  }
}
In this example, the compiler might still find dependences on accesses to p in the loop despite the application of the ivdep attribute. To work around this limitation, declare the pointer (intended to be used in the ivdep attribute) within the same scope where the attribute is applied, if possible. This limitation does not result in functional errors on correctly written code but may affect performance on the generated hardware.
- When compiling for FPGA, you cannot use a system installed with Intel® FPGA PAC D5005 to compile a SYCL application that targets Intel® PAC with Intel® Arria® 10 FX FPGA. Compilation may succeed but the compiled binary might fail at runtime. There is no workaround available for this issue currently.
- The software stack for Intel® PAC with Intel Arria® 10 GX FPGA and that for Intel® FPGA PAC D5005 are not compatible with each other on the same machine. If you have installed one of them already on a system, you must first uninstall it by running the aocl uninstall command before installing the other.
- Compiles for the FPGA emulator might fail if you have also installed a GPU platform (oneAPI-specific). You might see the Error: Compiler Error: OpenCL kernel compile/link FAILED error message. To work around this error and to achieve a successful FPGA emulator compilation, perform one of the following solutions:
- Solution 1: Add the -Xsfast-emulator flag to every dpcpp command when targeting the emulation flow.
- Solution 2: Execute one of the following OS-specific commands:
- On Ubuntu 18.04: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
- On Ubuntu 20.04: export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6
- On SLES 15: export LD_PRELOAD=/usr/lib64/libstdc++.so.6
- On CentOS 8.x/RHEL 8.1: export LD_PRELOAD=/lib64/libstdc++.so.6
NOTE: Intel® recommends Solution 2 when working with code samples as it is inconvenient to modify the dpcpp command in the code sample CMake file.
- When you perform FPGA compile and link stages with a single dpcpp command (for example, dpcpp -fintelfpga -Xshardware src/kernel.cpp), if the source code is not located in the current directory, you might observe that the source code browser is missing in the generated FPGA optimization reports. To work around this issue, compile and link the executable in separate stages, as follows:
dpcpp -fintelfpga -Xshardware -c src/kernel.cpp -o kernel.o
dpcpp -fintelfpga -Xshardware kernel.o
- When compiling for FPGA, the debug support on Windows is not available when using device-side libraries. To avoid this issue, do not run a debugger on the emulator platform on Windows.
-
In the FPGA optimization report, the Loop Viewer (Alpha) can only handle loops with 100 iterations or less currently. For designs with loops greater than 100 iterations, the optimization reports hang. There is no known workaround for this issue.
-
When compiling for FPGA or GPU, you might see the error clang: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory. To work around this issue, you must install the required compatibility library by executing one of the following OS-specific commands:
-
On Ubuntu 20.04: sudo apt install -y libncurses5 libncurses5-dev libncursesw5-dev
-
On RHEL/CentOS 8: sudo yum install ncurses-compat-libs
-
On SUSE 15: sudo zypper install libncurses5 ncurses5-devel
-
When you perform FPGA compilation for the emulator platform on a Linux-based OS, you may encounter the following error:
Error: Compiler Error: OpenCL kernel compile/link FAILED
dpcpp: error: fpga compiler command failed with exit code 1 (use -v to see invocation)
To work around this issue, you must either remove or rename the libstdc++.so.6 file by using one of the following commands:
-
To remove the file:
rm -f /opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/linux64/lib/dspba/linux64/libstdc++.so.6
-
To rename the file:
mv /opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/linux64/lib/dspba/linux64/libstdc++.so.6 /opt/intel/oneapi/compiler/2021.2.0/linux/lib/oclfpga/linux64/lib/dspba/linux64/libstdc++.so.6.bak
-
NOTE: The libstdc++.so.6 file is not required by the Intel® oneAPI DPC++/C++ Compiler and it might be deleted even if you are not impacted by this error.
2021.1.2 Patch Release
- 2021.1.2 is a patch release. It is not a full compiler; it updates an existing one and is intended to be installed over an existing oneAPI Base Toolkit 2021.1.1 installation.
- This patch release fixes the known issue that causes ICX OpenMP offload to hang with the latest Level 0 driver. This patch is also recommended for DPCPP to work with the latest Level 0 driver. This patch is designed and tested to work with the following drivers:
- Windows: Select either the Win 10 DCH driver 27.20.100.9030 or the Xe MAX driver 27.20.100.9039. Please update to one of these drivers if you plan to use the DPCPP or ICX OpenMP offload compilers in this patch.
- Linux: This patch compiler is designed and tested to work with driver release 20201209. Please update to this driver if you use DPCPP or ICX OpenMP offload.
- When installing a patch release, install the latest patches for all the compilers that you use (Intel Fortran Compiler, Intel DPC++/C++ Compiler, Intel C++ Compiler Classic).
- Intel® CPU Runtime for OpenCL™ Applications must also be reinstalled. You can download the Intel® CPU Runtime for OpenCL™ Applications for Windows from here. For Linux, the package is distributed through APT and YUM. Follow the instructions in Installing Intel® oneAPI Toolkits via Linux* Package Managers to set up the repository and install the package "intel-oneapi-runtime-opencl".
2021.1.1 Release
Key Features in DPC++
- Compliance with DPC++ 1.0 specification
- Support of Ahead-Of-Time (AOT) compilation.
- Experimental Explicit SIMD programming support
- To align with SYCL standard evolution, CL_SYCL_LANGUAGE_VERSION is replaced with SYCL_LANGUAGE_VERSION.
- Integration with Visual Studio* 2017 & 2019, plus Eclipse* on Linux.
- Applications that use std::* math functions in kernel code must be compiled with the option -fsycl-device-lib=, which accepts the arguments libc, libm-fp32, libm-fp64, and all.
- Detailed information and available intrinsics can be found in the Interactive Intrinsics Guide.
- Added support for installing the Intel® FPGA Add-On for oneAPI Base Toolkit via Linux package managers (YUM, APT, and Zypper).
- Added support for targeting multiple FPGA platforms.
Key Features in OpenMP offload
- OpenMP 4.5 and OpenMP 5.1 subset support
- OpenMP offloading support for multiple GPUs
- OpenMP Offloading opt-report
- OpenMP and DPC++ composability
- Support for Intel USM allocation API extensions
- Support for Intel extensions of invoking MKL for GPU execution
- Inline v-ISA support in OpenMP Offloading Region
Known Issues and Limitations
- OpenMP offload may not work on Level 0 with the initial release of oneAPI and certain drivers. The application may hang when it reaches a TARGET directive. Use Ctrl+C to abort. To work around this issue, use the OpenCL driver for offload by setting this environment variable:
export LIBOMPTARGET_PLUGIN=OPENCL
The issue has been fixed in the oneAPI DPC++/C++ Compiler 2021.1.2 Patch Release.
- The credist.txt file for the DPC++/C++ compiler is available online only for the gold release and will be part of the compiler packages in a future release.
- When subgroup algorithms are used in a loop with a conditional statement ("if", for example), the results on CPU may be incorrect.
- Using scalbn() in OpenMP target code causes a runtime failure. The workaround is to replace scalbn() with ldexp(). The problem will be fixed in a future release.
- icx compiler does not support linking library archives using the -l option for libraries that contain target offload code. More details and workaround for this issue can be found at Known Issue: Static Libraries and Target Offload.
- Attempting to use Link Time Optimization (LTO) causes a linker failure. To link successfully, make sure you have the recommended versions of binutils for your OS listed at Intel® oneAPI DPC++/C++ Compiler and Intel® oneAPI DPC++ Library System Requirements.
- A user-defined function with the same name and signature as an OpenCL C built-in function (an exact match of arguments; the return type does not matter) can lead to undefined behavior. More details about this issue can be found at Known Issue: User-defined Functions with the Same Signature as OpenCL C built-in functions.
- The DPC++ runtime library follows the Semantic Versioning scheme: MAJOR.MINOR.PATCH. A change in the MAJOR version indicates a breaking change (version X is backward incompatible with version X-1). A change in the MINOR version indicates a non-breaking change. If a MAJOR version change breaks an application, the workaround is to rebuild the application.
- The format of the object files produced by the compiler can change between versions. The workaround is to rebuild the application.
- Using cl::sycl::program API to refer to a kernel defined in another translation unit leads to undefined behavior.
- Employing a read sampler for the image accessor may result in sporadic issues with the Level Zero plugin/backend.
- Printing internal defines is not supported on Windows.
- Group algorithms for MUL/AND/OR/XOR cannot be enabled for group scope due to SPIR-V limitations, and are not enabled for sub-group scope yet as the SPIR-V version is not automatically raised from 1.1 to 1.3
- Dead Argument Elimination for ESIMD cannot be run since the pointers to SPIR kernel functions are saved in !genx.kernels metadata.
- Devices returned by passing the same filters to the filter_selector may not compare equal.
- On Windows, DPC++ compiler enforces using dynamic C++ runtime for application linked with SYCL library by:
- linking with msvcrt[d].dll when -fsycl switch is used.
- emitting an error on attempts to compile a program with static C++ RT using -fsycl and /MT or /MTd.
That protects you from complicated runtime errors caused by C++ objects crossing sycl[d].dll boundary and not always handled properly by different versions of C++ RT used on app and sycl[d].dll sides.
- You might see a runtime exception like the following on Windows when the application is compiled in Debug mode. The workaround is to compile with /Od on the command line or add /Od to Linker > General > Pass additional options to device compilers in the IDE.
Unhandled exception at 0x00007FF930ED7247 (igc64.dll) in gamma-correction.exe: 0xC0000005: Access violation reading location 0x0000027455B6C000.
- Read the whitepaper on Challenges, tips, and known issues when debugging heterogeneous programs using DPC++ or OpenMP offload.
- A DPC++ system that has FPGAs installed does not support multi-process execution. Creating a context opens the device associated with the context and places a lock on it for that process. No other process may use that device. Some queries about the device through device.get_info<>() also open the device and lock it to that process because the runtime needs to query the actual device to obtain that information. The following are examples of queries that lock the device:
- is_endian_little
- global_mem_size
- local_mem_size
- max_constant_buffer_size
- max_mem_alloc_size
- vendor
- name
- is_available
- When compiling for FPGA, if you declare kernel names in an unnamed namespace, the kernel name does not display properly in FPGA optimization reports, such as Summary, FMAX II Report, Area Analysis of System, Graph Viewer, Kernel Memory Viewer, and Schedule Viewer. To work around this issue, declare kernel names globally.
- When debugging FPGA emulator code on Windows* in Microsoft Visual Studio*, the debugger does not stop at breakpoints set in kernel code. There is no workaround available for this issue currently.
- The FPGA command aocl diagnose might report the ICD diagnostics FAILED error. You can safely ignore this error because if you installed the Intel® oneAPI Base Toolkit as directed, it means that the ICD is also installed correctly. If you have installed the Intel® FPGA Add-on for oneAPI Base Toolkit package successfully, you should not observe any compile or FPGA hardware run failures due to this error.
- The Bottleneck Viewer in the FPGA optimization report appears blank without any data reported. To work around this issue, refer to the Loop Analysis report Details pane to identify bottlenecks.
- When compiling for FPGA, if you declare very long kernel names, the compiler errors out. As a workaround, keep your kernel names shorter than 260 characters.
- When compiling for FPGA and creating device code archive on Windows, you might see the following link warnings:
warning LNK4078: multiple '__CLANG_OFFLOAD_BUNDLE_SIZE__syc' sections found with different attributes (40100800)
warning LNK4078: multiple '__CLANG_OFFLOAD_BUNDLE__sycl-fpg' sections found with different attributes (40100800)
You can safely ignore these warnings.
-
The FPGA-specific flag -reuse-exe= is not supported on Windows. Refer to the fast_recompile FPGA tutorial for an example on how to separate host and device code to minimize compile time when you change only the host code.
-
When compiling for FPGA and using a read-only accessor for a very wide struct, the compile time to RTL (prior to the FPGA hardware image creation stage) can be long. As a workaround, use a read-write accessor instead.
-
When running a design compiled for the FPGA emulator, you might encounter the OpenCL API failed error message, where the OpenCL API returns the -5 (CL_OUT_OF_RESOURCES) error. To work around this issue, increase the amount of memory that the emulator runtime is permitted to allocate (the default value is 512 KB) using the following commands, where <size> is an integer followed by KB for kilobytes (for example, 1024 KB) or MB for megabytes (for example, 32 MB):
-
On Linux: export CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=<size>
-
On Windows: set CL_CONFIG_CPU_FORCE_PRIVATE_MEM_SIZE=<size>
-
When the FPGA emulator runs out of memory on Windows (as described in the previous issue), the OpenCL API failed error message is sometimes not generated, so the emulator run can fail silently. As a workaround, to ensure that this out-of-memory error is caught, write your kernel code using the try-catch syntax.
-
When compiling for FPGA, the compiler may sometimes ignore the ivdep attribute when it contains a pointer that is declared outside the scope where the attribute is applied. For example:
int *p = ...
// enter new scope
{
[[intel::ivdep(p)]]
for (int i = 0; i < N; i++) {
// accesses to p
}
}
In this example, the compiler might still find dependences on accesses to p in the loop despite the application of the ivdep attribute. To work around this limitation, declare the pointer (intended to be used in the ivdep attribute) within the same scope where the attribute is applied, if possible. This limitation does not result in functional errors on correctly written code but may affect performance on the generated hardware.
-
When compiling for FPGA, you cannot use a system installed with Intel® FPGA PAC D5005 to compile a SYCL application that targets Intel® PAC with Intel® Arria® 10 GX FPGA. Compilation may succeed but the compiled binary might fail at runtime. There is no workaround available for this issue currently.
-
The software stack for Intel® PAC with Intel Arria® 10 GX FPGA and that for Intel® FPGA PAC D5005 are not compatible with each other on the same machine. If you have installed one of them already on a system, you must first uninstall it by running the aocl uninstall command before installing the other.
-
Compiles for the FPGA emulator might fail if you have also installed a GPU platform (oneAPI-specific). You might see the Error: Compiler Error: OpenCL kernel compile/link FAILED error message. To work around this error and to achieve a successful FPGA emulator compilation, perform one of the following solutions:
-
Solution 1: Add the -Xsfast-emulator flag to every dpcpp command when targeting the emulation flow.
-
Solution 2: Execute one of the following OS-specific commands:
-
On Ubuntu 18.04: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
-
On Ubuntu 20.04: export LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6
-
On SLES 15: export LD_PRELOAD=/usr/lib64/libstdc++.so.6
-
On CentOS 8.x/RHEL 8.1: export LD_PRELOAD=/lib64/libstdc++.so.6
-
NOTE: Intel® recommends Solution 2 when working with code samples as it is inconvenient to modify the dpcpp command in the code sample CMake file.
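For the scalbn() issue listed above, the replacement can be sketched as follows. This is a minimal host-side illustration, not code from the product: the helper name scale_by_pow2 is hypothetical, and the sketch assumes a binary floating-point platform (FLT_RADIX equal to 2), where ldexp() and scalbn() compute the same value.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// ldexp() scales by a power of 2, scalbn() by a power of FLT_RADIX.
// The two agree only when the radix is 2, so check that at compile time.
static_assert(FLT_RADIX == 2, "ldexp() and scalbn() differ on this platform");

// Hypothetical helper (not part of any Intel API): computes x * 2^exp.
// It calls ldexp() so that it avoids the scalbn() call named in the
// known issue.
inline double scale_by_pow2(double x, int exp) {
    return std::ldexp(x, exp);  // previously: std::scalbn(x, exp)
}

// Host-side usage check: both functions give the same value.
inline void scale_by_pow2_example() {
    assert(scale_by_pow2(1.5, 4) == 24.0);
    assert(scale_by_pow2(1.5, 4) == std::scalbn(1.5, 4));
}
```

Wrapping the call in one helper keeps the substitution in a single place, so it is easy to revert once the problem is fixed.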
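The MAJOR.MINOR.PATCH rule for the DPC++ runtime library listed above can be turned into a simple rebuild check. The sketch below is an illustration only: SemVer, parse_semver, and needs_rebuild are hypothetical names, not part of the DPC++ runtime, and the check assumes that only a MAJOR version difference is breaking, as the versioning note states.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical type for illustration; not a DPC++ runtime API.
struct SemVer {
    int major_v;
    int minor_v;
    int patch_v;
};

// Parses "MAJOR.MINOR.PATCH". Returns false and leaves `out` untouched
// if the text is malformed, has trailing characters, or has a negative field.
inline bool parse_semver(const std::string& text, SemVer& out) {
    int a = 0, b = 0, c = 0;
    char trailing = '\0';
    // Exactly three conversions must succeed; a fourth means extra characters.
    int n = std::sscanf(text.c_str(), "%d.%d.%d%c", &a, &b, &c, &trailing);
    if (n != 3 || a < 0 || b < 0 || c < 0) {
        return false;
    }
    out = SemVer{a, b, c};
    return true;
}

// True when an application built against `built` should be rebuilt to run
// with `installed`. Assumption: only a MAJOR version difference is breaking.
inline bool needs_rebuild(const SemVer& built, const SemVer& installed) {
    return built.major_v != installed.major_v;
}

// Usage check with made-up version numbers.
inline void needs_rebuild_example() {
    SemVer built{}, installed{};
    assert(parse_semver("5.2.0", built));
    assert(parse_semver("6.0.0", installed));
    assert(needs_rebuild(built, installed));
}
```

The version strings in the usage check are invented for the example and do not correspond to actual runtime library versions.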
Support Deprecated
-mkl compiler option replaced with -qmkl
The Linux compiler option -mkl is deprecated and may be removed in a future release, where -qmkl will replace it. This compiler option tells the compiler to link to certain libraries in the Intel® oneAPI Math Kernel Library.
Support for Intel® Xeon Phi™ Processor x200 “Knights Landing (KNL)” and Intel® Xeon Phi™ Processors “Knights Mill (KNM)” is deprecated and will be removed in a future release.
Intel® Xeon Phi™ customers should continue to use compilers, libraries, and tools from the Intel® Parallel Studio XE 2020 and older PSXE releases, or compilers from the Intel® oneAPI Base Toolkit and Intel® oneAPI HPC Toolkit versions 2021.2 or 2021.1.
Additional Documentation
- Get Started with the Intel® oneAPI Toolkits for Linux*
- Get Started with the Intel® oneAPI Toolkits for Windows*
- oneAPI Versioning Schema based on Semantic Versioning
- Intel® oneAPI DPC++/C++ Compiler 2021.1 Developer Guide and Reference
Notices and Disclaimers
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
Intel technologies may require enabled hardware, software, or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.