This document summarizes new and changed product features and includes notes about features and problems not described in the product documentation.
Where to Find the Release
Download the Intel® oneAPI Base Toolkit from the Intel® oneAPI Base Toolkit Download page, then follow the installation instructions to install it.
The Intel® oneAPI DPC++/C++ Compiler’s integrated support for Altera FPGA has been removed as of the 2025.1 release. Altera® will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.
For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative.
oneAPI 2025.1.0, Compiler Release 2025.1.0
Major New Features and Enhancements
MemorySanitizer Support: Extended CPU MemorySanitizer support to device code, including GPUs, facilitating detection and troubleshooting of memory issues in both CPU and device code. This improves application reliability by providing comprehensive memory error checking across platforms.
ccache* Integration: The compiler now supports ccache* to significantly speed up build times for C++ and SYCL code. By caching previous compilations and reusing them, developers can iterate faster and work more efficiently.
Code Coverage Tool Enhancements: The compiler's code coverage tool now includes GPU support and enhanced CPU coverage for applications using C/C++, SYCL, and OpenMP. It offers detailed analysis and comprehensive HTML reports that identify tested and untested code sections, helping improve test coverage and code quality while integrating easily into existing workflows.
Floating Point Accuracy Controls: User control over the accuracy of floating-point operations and library calls now extends to device code.
SYCL Interoperability with Graphics APIs: Added initial support for SYCL interoperability with DirectX* 12 and Vulkan*, which enables developers to build efficient visual compute, media processing, and rendering applications on Intel® Graphics. For details on image formats and platform support, refer to SYCL Interoperability Limited Support.
New Features
SYCL Compiler:
- Implemented initial support for SYCL virtual functions, with the intent of gathering initial feedback from users. Refer to the Known Issues section for details on the current limitations of this feature.
- Dynamic linking of device code is now supported via the `-fsycl-allow-device-image-dependencies` command-line option. This feature allows device code to be exported via a Windows DLL and includes support for dynamic linking of AOT-compiled images for the OpenCL GPU backend.
- Enhancements to free function kernel support include the addition of structs as kernel arguments and the inclusion of work group memory as a kernel parameter.
- Device sanitizer now supports invalid kernel argument detection, and address sanitizer has been enhanced to detect null pointers.
- The new `-fsycl-allow-all-features-in-constexpr` option provides a mechanism to lift restrictions on SYCL device code in constant expressions.
SYCL Library:
- Enhanced SYCL Graph functionality with an implicit recording mechanism, dynamic command groups, and a new graph enqueue function, `execute_graph`, in accordance with the updated `sycl_ext_oneapi_graph` extension.
- Added support for Intel® Arc™ B-Series and Intel® Core™ Ultra Series device architectures.
- Extended Joint Matrix support to additional devices: Battlemage, Lunar Lake, and Arrow Lake H. Added more type and shape combinations for SYCL Matrix on PVC.
- New ESIMD features include a mask-compressed ESIMD load/store API, support for root group barriers, a `clamp` API for ESIMD, and support for the `ext::intel::experimental::esimd::frem` function.
- Implemented the following set of extensions:
- Added support for `sycl_ext_oneapi_enqueue_functions` to SYCL Graph.
- Implemented the `sycl_ext_oneapi_raw_kernel_arg` extension.
- Added initial support for the `sycl_ext_oneapi_atomic16` extension.
- Implemented the `sycl_ext_oneapi_get_kernel_info` extension.
- Implemented the `sycl_ext_oneapi_work_group_memory` extension.
- Implemented the `sycl_ext_oneapi_reduction_properties` extension.
Unified Runtime:
- To support NPU and GPU devices coexisting in the same application, 2025.1 adds support for the new Level Zero initialization function `zeInitDrivers`. This enables SYCL, OpenVINO™, and other NPU device libraries to coexist in the same application, using GPU and NPU functionality simultaneously.
- Updated the Mutable Command List support in the UR L0 adapter to use the Level Zero Specification's extension functionality instead of the driver's experimental extension.
- For improved performance, the UR L0 adapter uses immediate command lists by default on Linux for Intel® Arc™ Series GPUs and Intel® Core™ Ultra 200V Series processors. On Windows, immediate command lists are the default on Intel® Arc™ B-Series GPUs and Intel® Core™ Ultra 200V Series processors.
OpenMP:
- Added support for the OpenMP 6.0 `interchange` loop-transformation construct and the `permutation` clause.
- The compiler now emits opt-report remarks for loads and stores of variables listed in the `nontemporal` clause of the `simd` construct.
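Both OpenMP additions can be sketched in plain C++. `scale_matrix` and `copy_streaming` are hypothetical helpers; the build line assumes `-fiopenmp` plus `-qopt-report=3` to surface the `nontemporal` remarks, and any extra flag a given compiler version needs to enable OpenMP 6.0 constructs is not shown. A compiler built without OpenMP ignores the pragmas, so the loops compute the same values either way.

```cpp
// Build sketch (assumption): icpx -fiopenmp -qopt-report=3 omp_loops.cpp
#include <cstddef>
#include <vector>

// Row-major matrix stored in `m`. The source loop order walks columns in the
// outer loop; `interchange permutation(2, 1)` asks the compiler to swap the two
// loops so the inner loop runs over contiguous memory.
void scale_matrix(std::vector<double>& m, std::size_t rows, std::size_t cols, double s) {
#pragma omp interchange permutation(2, 1)
  for (std::size_t j = 0; j < cols; ++j)
    for (std::size_t i = 0; i < rows; ++i)
      m[i * cols + j] *= s;
}

// `nontemporal(dst)` declares that the stores through dst have low temporal
// locality; the 2025.1 opt-report describes how such loads/stores are handled.
void copy_streaming(double* dst, const double* src, std::size_t n) {
#pragma omp simd nontemporal(dst)
  for (std::size_t k = 0; k < n; ++k)
    dst[k] = src[k];
}
```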
Misc:
- Added several enhancements in sanitizer support:
- New Numerical Stability Sanitizer (NSAN) for C++ code, adopted from community contributions
- Memory Sanitizer extended to support SYCL and OpenMP C/C++ device code (USM device allocations only)
- Major improvements to Address Sanitizer for device code: invalid kernel argument detection, null-pointer detection, memory leak detection, and private memory support for OpenMP offload
- For C/C++ compilations on Linux, added support for the `-q[no-]unknown-option-as-warning` option, which reports unknown command-line options with a warning diagnostic instead of an error. The default behavior is to error on unknown options.
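The kind of bug the Numerical Stability Sanitizer targets can be shown with a catastrophic cancellation in single precision. In this sketch the build line assumes NSAN uses the upstream Clang spelling `-fsanitize=numerical`, and `-fp-model=precise` is added on the assumption that the compiler's default fast floating-point model could otherwise reassociate the expression away. The function name `cancel` is hypothetical.

```cpp
// Build sketch (assumption): icpx -fsanitize=numerical -fp-model=precise nsan_demo.cpp

// Adds and then subtracts a large constant in float. Near 1.0e8f the gap between
// adjacent floats is 8, so 1.0e8f + 1.5f rounds back to 1.0e8f and the function
// returns 0 instead of 1.5. NSAN shadows float values in higher precision and is
// expected to flag this kind of discrepancy.
float cancel(float x) {
  const float big = 1.0e8f;
  return (x + big) - big;
}
```

Calling `cancel(1.5f)` from a small driver is enough to exercise the check; the exact report text depends on the sanitizer runtime.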
Improvements
SYCL Compiler:
- Removed the need for the `SYCL_EXTERNAL` attribute in free function kernel definitions.
- Improved compilation time for ESIMD kernels.
- Disabled attribute propagation from SYCL 1.2.1 and removed the remaining SYCL 2017/1.2.1 compatibility elements, including `-Wsycl-strict` diagnostics.
- Ensured compiler-generated integration headers/footers are warning-free to prevent `-Werror` build failures, especially with third-party host compilers.
- Built basic functionality of the SYCL `joint_matrix` extension on the `SPV_KHR_cooperative_matrix` extension.
- Expanded supported aspects for the CPU AOT target.
- Added diagnostics for incorrect arguments with `-fsycl-device-obj`.
- Introduced a warning for applying kernel-only attributes to non-kernel functions.
- Fixed misleading diagnostics for non-external functions/variables when using attributes like `[[sycl_device]]` or `[[intel::device_indirectly_callable]]`.
- Updated `-fsycl-link=image` to package host objects like `-fsycl-link=early`, ensuring proper linking, especially on Windows.
- Added extra optimization passes in the Native CPU pipeline.
- Updated `-fsycl-host-compiler` to use only user-provided hints (e.g., `PATH`) for locating the specified compiler, avoiding incorrect binary usage.
- Deprecated `[[intel::reqd_sub_group_size]]`; use the SYCL 2020 spelling with the `sycl::` namespace.
- Disabled ITT annotations in device code by default to reduce code size.
- Enabled floating-point atomics via `atomicrmw` instructions for Native CPU.
- Enabled nonsemantic debug info by default to improve the debugging experience.
SYCL Library:
- Added binary caching support to the `kernel_compiler` extension.
- Enabled a check on Linux systems to inform users to use `SYCL_UR_TRACE` instead of `SYCL_PI_TRACE`.
- Improved GDB printers for SYCL types and values.
- Renamed `ur` to `ur.call` in XPTI traces.
- Refactored the XPTI framework to use 128-bit keys for collision elimination and added support for 64-bit universal IDs for backward compatibility.
- Made repeated calls to `command_graph::begin_recording` an error.
- Aligned the `sycl_ext_oneapi_address_cast` implementation with the specification.
- Optimized the `atomic_ref` constructor for the SPIR-V target.
- Enhanced handling of compile-time properties.
- Refined parsing of Device Sanitizer options via the `UR_LAYER_ASAN_OPTIONS` environment variable.
- Improved detection of conflicts between kernel properties related to work group size.
- Enhanced framework/app software layers to provide code locations for SYCL-generated XPTI events.
- Improved performance of the `rsqrt` ESIMD API.
- Added property validation to core SYCL object constructors.
- Deprecated `__SYCL_USE_VARIADIC_SPIRV_OCL_PRINTF__`.
- Enforced data type restrictions in `marray` and `vec`.
- Improved `sycl_ext_oneapi_address_cast` by changing "dynamic" behavior to "static" where allowed.
- Enhanced `sycl-ls` to report `ext::intel::info::device::device_id`.
- Added no-op implementations for runtime APIs for Native CPU, as programs are compiled offline.
- Updated the `local_accessor` GDB printer to display elements with a decorated pointer and address space qualifier.
- Improved ESIMD `copy_to()` and `copy_from()` to use `block_load`/`block_store` for better performance.
- The OpenCL adapter now uses the local work size set in program IL when it is not specified in `clEnqueueNDRangeKernel`.
- Improved the OpenCL adapter to support older ICD loaders.
- Repurposed `SYCL_CACHE_TRACE` for fine-grained tracing of all SYCL program caches.
- Enabled the Sysman API by default in the L0 adapter, removing the need to set `ZES_ENABLE_SYSMAN`.
- Allowed copy-construction of `device_global` without the `device_image_scope` property.
- Improved UR libraries to avoid unnecessary overhead if nothing is subscribed to the `ur.call` XPTI call stream.
- Refactored copy engine usage checks in the L0 adapter for better performance.
- Implemented tracing for in-memory kernel and program cache.
- Improved error handling in the SYCL RT command enqueue function to provide clearer exceptions.
- Added address sanitizer AOT libraries for various GPU/CPU targets and renamed the device sanitizer library to `libsycl-asan`.
- Undeprecated legacy `multi_ptr`, as it is no longer deprecated in the SYCL specification.
- Deprecated `info::device::atomic64`; use `sycl::aspect::atomic64` instead.
- Removed build options from the fast kernel cache key to reduce lookup overhead.
- Improved the OpenCL adapter to use the extension version of `clGetKernelSubGroupInfo` when necessary.
- Updated SYCL graph design documentation with a new command-list enqueue path.
- Enhanced `online_compiler::compile` to support the pre-C++11 ABI.
Misc:
- Support for the OpenCL `__attribute__((blocking))` has been removed. This makes it possible to support the `[[clang::nonblocking]]`, `[[clang::nonallocating]]`, `[[clang::blocking]]`, and `[[clang::allocating]]` function type attributes, as well as their GNU-style variants.
- For functions that return structs by value, the ABI requires passing a special implicit parameter that contains the address of the memory where the returned struct should be placed. Because this parameter is implicit, users don't see it and can't provide a vector specification for it. Such functions are now supported, and the compiler emits the vector-variants attribute for them.
Bug Fixes
SYCL:
- Resolved false positives in Device Sanitizer by unpoisoning local/private shadow memory before function return.
- Added the `ext_oneapi_ballot_group` aspect to the `spir64_x86_64` target, supported since OpenCL CPU 2024.2.
- Restored kernel instantiations on the host for debugger compatibility with SYCL code.
- Fixed local scope module variables for Native CPU.
- Corrected device libraries requirement mask for the SPIRV target to ensure proper linking.
- Suppressed system errors when loading adapters on Windows.
- Disabled internalization of kernels for dynamic linking to ensure visibility.
- Fixed a use-after-free bug in the `clang-linker-wrapper`.
- Enforced SYCL headers to be included with `#include <sycl/sycl.hpp>`.
- Fixed device module splitting for ESIMD related to using `assert` in user code.
- Correctly assigned architectures to their respective targets with `-fsycl-targets`.
- Fixed devicelib handling when linking multiple images.
- Matched `-device_options` with `-device` for AOT GPU.
- Stopped passing HEX values to `-device_options` due to IGC limitations.
- Fixed a crash with an empty `-fsycl-targets` option.
- Set the calling convention to `spir_func` for SPIRV function calls related to specialization constants and hierarchical parallelism.
- Added a workaround for SPIRV `AccessChain` usage in SYCL matrix operations.
- Addressed code splitting issues with FPGA archives.
- Fixed parsing of device values in backend target options.
- Limited Device Sanitizer to report only one error per kernel instance.
- Resolved issues with vector shuffle built-ins on the NativeCPU backend.
- Fixed incorrect symbolizer output for shared libraries in Device Sanitizer.
- Disabled Address Sanitizer on modules with ESIMD to prevent excessive kernel code size.
- Fixed iterator invalidation issue in the SYCL Joint Matrix pass on Windows debug builds.
- Corrected the integration footer for `device_global` with explicit template specialization.
OpenMP:
- Fixed a bug related to mapping of variable-length arrays where the size is known at compile time.
- Fixed a performance issue when an unroll construct is in a loop nest bound to an outer parallel for construct.
- Fixed potential unsafe vectorization of some loops that are bound to parallel for.
- Improved performance of some collapsed loops by choosing a more optimal data size for the collapsed loop IV.
- Improved offload performance of some target teams distribute parallel for reduction loops with constant trip count.
- Fixed intermittent failures caused by race conditions when using the dispatch construct with SYCL interop objects.
- Fixed a bug where the nogroup clause of a taskloop construct was not honored.
- Fixed a crash when running certain target nowait (asynchronous offload) kernels containing loops.
- Fixed an internal compiler error (ICE) in some cases where a tile construct is bound to the same loop as an outer for construct.
- Fixed an issue where the device clause was not honored for the dispatch construct.
- Improved performance of some low-trip-count loops bound to the loop construct.
- Fixed a bug where some for or simd loops with trip counts > MAX_INT were not being transformed correctly.
- GPU dispatch now supports “Battlemage” architecture parts that use the Xe2 microarchitecture, both integrated (Lunar Lake) and discrete (Intel® Arc™ B-Series graphics cards).
Known Issues & Limitations
SYCL:
- Following are the details on the limited support of SYCL interoperability:
- Platform Support: Intel® Arc™ B-Series Graphics (Battlemage), Intel® Iris® Xe Graphics (DG2), Intel® Core™ Ultra Processors (Lunar Lake and Meteor Lake).
- Image channels: 1, 2, and 4 channels
- Image formats: VK_FORMAT_R16G16_SFLOAT, VK_FORMAT_R32_SFLOAT, VK_FORMAT_R16G16B16A16_SFLOAT, VK_FORMAT_R32G32_SFLOAT, VK_FORMAT_R16_SFLOAT
- Known issues
- On Intel® Iris® Xe Graphics and Intel® Core™ Ultra Series 1 (Meteor Lake) processors, there is a known issue with compressed 2D and 3D images with 1, 2, or 4 channels that are larger than 64 KB: if users export such images from other APIs and import them into SYCL, the data no longer matches once SYCL performs computations on the images. This issue, found in GPU driver version 2507.12, will be addressed in an upcoming GPU driver release.
- On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts, because the release of the plugin DLLs races against the release of static global variables (like the default context).
- The Intel® Graphics Compiler's Vector Compute backend does not support -O0 code: such code is often miscompiled, producing wrong answers or crashes. This issue directly affects ESIMD code at -O0. As a temporary workaround, ESIMD code is optimized even in -O0 mode.
- C/C++ math built-ins (like `exp` or `tanh`) can return incorrect results on Windows for some edge-case inputs. The problems have been fixed in the SYCL implementation, and the remaining issues are thought to be in MSVC.
- [new] There are known issues and limitations in virtual functions functionality, such as:
- Optional kernel features handling implementation is not complete yet.
- AOT support is not complete yet.
- A virtual function definition and the definitions of all kernels using it must be in the same translation unit. Refer to `sycl/test-e2e/VirtualFunctions` to see the list of working and non-working examples.
- When running synthetic benchmarks with the new default of immediate command lists in the SYCL/Unified Runtime L0 adapter, performance on Intel® Data Center GPU Flex Series and Intel® Arc™ A-Series GPUs may be lower than previously measured. To regain the lost performance on those workloads, create the SYCL queue with the `no_immediate_command_list` queue property or set the environment variable `UR_L0_USE_IMMEDIATE_COMMANDLISTS=0`. Either option enforces command batching in the Unified Runtime L0 adapter, which may improve the performance of those workloads.
OpenMP:
- Offload code with reduction across teams may produce incorrect results or even hang on some platforms with integrated GPUs.
- ICX and ICPX ignore "#pragma omp flush" for spir64 offload compilation.
Other Known Issues:
- In the 2025.1 compiler, the switch from a static to a dynamic sanitizer runtime has led to runtime crashes on Windows because clang_rt.asan_dynamic-x86_64.dll cannot be found. The workaround is to add C:\Program Files (x86)\Intel\oneAPI\compiler\2025.1\lib\clang\20\lib\windows to the PATH environment variable.
API/ABI Breaking Changes
- Updated the experimental `sycl_ext_oneapi_bindless_images` extension documentation and implementation: interoperability structs/functions were renamed to use the `external` keyword instead of `interop`.
- Removed `sycl::ext::oneapi::experimental::is_property_key`.
- Removed some `OSUtil::*` functions from the ABI under `-fpreview-breaking-changes`; these are used internally in the DSO and don't need to be exposed outside it.
- Made the `ext_oneapi_cl_profile` implementation ABI-neutral.
- Fixed the SYCL Graph API to be ABI-neutral to avoid dual-ABI issues on Linux.
This patch release of the compiler consists of various bug fixes and quality improvements.
Deprecation Notice: The Intel® oneAPI DPC++/C++ Compiler integrated support for Altera FPGA is now deprecated and will be removed with the compiler's release in the first quarter of 2025. Altera* will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.
For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative.
This patch release consists of the following new features, improvements and bug fixes:
- Added functionality to compress device images during compilation and decompress them at runtime as needed. More details on this feature and case studies can be found at C++ with SYCL Device Image Compression.
- The Unified Runtime Level Zero adapter now enables Level Zero System Management functionality by default.
- Added the launch API to the SYCL Compat API library.
- An ABI-neutral version of `modifiable_command_graph::print_graph` has been enabled under the preview option and will be enabled by default in the next major release.
- Fixed a linkage error with the "-ipp" / "-qipp" switches.
- Added the following missing option values in the IDE for the -x, -ax, /arch, /Qx, and /Qax flags:
- Linux: [-x|-ax][SIERRAFOREST|GRANDRIDGE|GRANITERAPIDS|EMERALDRAPIDS|GRANITERAPIDS-D|ARROWLAKE|ARROWLAKE-S|LUNARLAKE|PANTHERLAKE|CLEARWATERFOREST]
- Windows: [/arch:|/Qx|/Qax][SIERRAFOREST|GRANDRIDGE|GRANITERAPIDS|EMERALDRAPIDS|GRANITERAPIDS-D|ARROWLAKE|ARROWLAKE-S|LUNARLAKE|PANTHERLAKE|CLEARWATERFOREST]
- SYCLcompat introduces a new experimental launch API which allows the user to pass kernel properties, launch properties, and required local memory size in a launch_policy struct. These requirements are passed down to the SYCL runtime to define how the kernel is launched.
- Other small usability improvements
Major Enhancements and New Features
New Features:
- Hardware Enablement: Optimized for new Intel hardware, including Emerald Rapids (EMR), Granite Rapids (GNR), Battlemage (BMG), and Lunar Lake (LNL), with features such as cache hints and new data types for AI applications, delivering improved efficiency and computing power.
- Bindless Textures Support: Implemented Bindless Textures for Intel GPUs (DG2, Arc), allowing dynamic texture usage at runtime without compile-time knowledge, enabling enhanced performance and scalability.
Performance Tuning and Enhancements:
- AI and HPC Optimization: Tuned performance for AI frameworks and HPC applications.
- OpenMP Enhancements: Early support for OpenMP 6.0 features, including the DEVICE_TYPE clause for the TARGET construct and mandatory offloading support. Also fixed the OpenMP loop rotation issue. Check out Advanced OpenMP* Device Offload with Intel® Compilers for more details.
- Compiler Reports: Enhanced opt-report for a better user experience, now providing detailed information on OpenMP offloading and integrating with the open-source optimization remark framework. Details on recent enhancements can be found at Develop Highly Optimized Applications Faster with Compiler Optimization Reports.
- Sanitizers for Device Code: Device code now supports LLVM sanitizers to help detect and resolve issues during development. It includes a compiler instrumentation module and runtime support, allowing it to detect issues such as out-of-bounds memory access on USM, SYCL buffers, local memory, and device globals, as well as bad-free, use-after-free, bad context, and more. In this release, PVC GPUs and CPUs are supported on Linux OS. More details on how and when to use sanitizers can be found at Find Bugs Quickly Using Sanitizers with the Intel® oneAPI DPC++/C++ Compiler.
- Comprehensive Performance Insights: Upgraded optimization reports now cover SYCL, OpenMP, and AOT compilation, offering developers deeper insights into application performance.
- Hardware Profile Guided Optimization (HWPGO): Key improvements include enhanced profile propagation for better accuracy, additional profile-driven optimizations to further boost performance, and early support for "pseudo probes" on Windows as an alternative to DWARF for profiling. Additionally, HWPGO has introduced selective function outlining, allowing for specific functions to be optimized based on profiling data, further enhancing runtime efficiency.
New Features
SYCL Compiler:
- SYCL Offload Model: Introduced a new SYCL offload driver mechanism with `--offload-new-driver` to improve infrastructure for better link times by reducing I/O and external processes.
- Range Rounding Control: Added the `-fsycl-range-rounding` option for managing range rounding, including forcing full rounding to reduce binary size. Additionally, the experimental `-fsycl-exp-range-rounding` option performs rounding across all dimensions.
- Double Type Emulation: Added the `-fsycl-fp64-conv-emu` option for partial emulation of double data types on Intel GPUs.
- Dynamic Linking: Initial support has been added for dynamic linking, though some features, like the `kernel_bundle` API and AOT mode, are not yet supported.
SYCL Library:
- Extensions: Implemented multiple extensions, including `sycl_ext_oneapi_prod`, `sycl_ext_oneapi_profiling_tag`, `sycl_ext_oneapi_forward_progress`, `sycl_ext_oneapi_private_alloca`, `sycl_ext_codeplay_enqueue_native_command`, and `sycl_ext_oneapi_enqueue_functions`.
- Group Load/Store: Added support for `sycl_ext_oneapi_group_load_store`, enabling native hardware block read/write capabilities where applicable.
- Free Function Kernels: Initial support for the `sycl_ext_oneapi_free_function_kernels` extension, with known limitations around argument types and diagnostics.
- Fused Multiply-Add (FMA): Added the experimental ESIMD function `fma`, which guarantees that a fused multiply-add operation is performed.
- Improved `sycl_ext_oneapi_group_sort` extension: Updated the implementation of the `sycl_ext_oneapi_group_sort` extension to match revision 2 of the specification. Revision 1 is no longer available, and some code changes may be required.
Improvements
SYCL Compiler
- Improved Compilation Flow: The process of generating integration footers has been optimized when no third-party host compiler is used, resulting in fewer temporary files and faster compilation times.
- Additional Math Function Support: New support for math functions like `truncf`, `sinpif`, `rsqrtf`, `exp10f`, `ceilf`, `copysignf`, `cospif`, `fmaxf`, and `fminf` in SYCL kernels has been added as part of the C-CXX-StandardLibrary extension. More Intel Math Functions (IMF), as well as `::rand` and `::srand` in device code on Intel devices, have also been integrated.
- Enhanced Error Messaging: Error messages have been improved for scenarios involving implicit `this` capture in kernels and missing architecture information when multiple targets are passed to the `-fsycl-targets` flag.
- Optimized Compilation Flow: The number of commands needed for generating dependencies using the `-MD` flag has been reduced, streamlining the build process.
- Security and Debugging: Security-related compilation flags for libraries and tools have been strengthened, and the debugging experience has been improved for both Linux and Windows environments.
SYCL Library
- Support for ESIMD functions: Added support for the `sqrt` and `rsqrt` functions for double data types in ESIMD.
- Cubemap and Sampled Image Arrays Support: Updated the `sycl_ext_oneapi_bindless_images` extension to support cubemap images and sampled image arrays.
- Named Barrier Allocation in ESIMD: Introduced an ESIMD API for dynamic allocation of named barriers.
- Executable Command Graph Update: Added support for whole graph updates using `executable_command_graph::update`.
- Deprecation Warning: A warning has been added for the use of the deprecated `<CL/sycl.hpp>` header.
- Accessor Improvements: `local_accessor::get_pointer` and `local_accessor::get_multi_ptr` now throw an invalid exception if called on the host.
- Queue Operations Detection: Extended detection of nested queue operations to support shortcut methods.
- Simplified ESIMD API Interface: Added overloads of various ESIMD APIs (e.g., `atomic_update`, `block_load`, `block_store`) that allow some template arguments to be omitted.
- Bfloat16 Math Functions: Updated `sycl_ext_oneapi_bfloat16_math_functions` to support vectors of `bfloat16` passed to math functions.
- Optimized `sycl::vec::as`: Improved the performance of `sycl::vec::as` by optimizing the implementation of `sycl::detail::memcpy`.
- SYCL 2020 Exception Updates: Updated the implementation to throw SYCL 2020 exceptions instead of legacy SYCL 1.2.1 exceptions across the board.
- `sycl::vec::convert` Support: Added support for `sycl::vec::convert` to and from `vec<bfloat16, N>`.
- Deprecations: `marray<bool, n>::operator++/--` and `accessor::get_multi_ptr` for non-device accessors have been deprecated.
- ESIMD Named Barriers: Moved ESIMD named barrier APIs out of the experimental namespace.
- SYCL Extensions and API Enhancements:
- Implemented the latest revision of `sycl_ext_oneapi_free_function_queries`.
- Extended `sycl-ls --verbose` to print detailed device information, including UUIDs and architecture.
- Introduced support for compile-time properties in the `copy_to` and `copy_from` ESIMD APIs.
- Non-Variadic `printf` Interface: Switched `experimental::printf` to a non-variadic interface to improve usability when printing float values.
- Enhanced ESIMD API Validation: Improved validation for the `rdregion` and `wrregion` APIs using static assertions on template arguments.
- SYCL 2020 Specification Alignment: Updated mutating swizzle operators and scalar conversions for `vec` to align with the SYCL 2020 specification.
- Miscellaneous ESIMD Improvements:
- Added support for 1- and 2-byte data types to ESIMD prefetch APIs.
- Enabled `ext_intel_matrix` support for Intel GNR devices.
- Introduced new overloads of the `load_2d`, `store_2d`, and `prefetch_2d` ESIMD APIs with compile-time properties.
- Added support for group shift algorithms (e.g., `shift_group_left`, `permute_group_by_xor`) for non-uniform groups.
- Lifted restrictions on the ESIMD `block_store` API and enhanced the `slm_atomic_update` API to support `fsub` and `fadd`.
- Graph and Semaphore Support:
- Added support for graph update functionality and external semaphore wait/signal operations with values in the bindless images extension.
- Introduced device-to-device copying of `image_device_handle`.
- Unified Runtime: Removed the Plugin Interface, replacing it with the Unified Runtime, which reduces the number and size of redistributable libraries.
- Performance Improvements: Reduced startup overhead of `libsycl.so` by outlining the SYCL JIT compiler into a standalone library that is dynamically loaded on first use.
Bug Fixes
SYCL Compiler
- Fixed a bug where using the `-fsycl-link-targets` flag would inadvertently trigger additional device code linking steps.
- Resolved an issue where AOT-compiling for Intel GPUs would pass PVC-specific flags even if the target device was not a PVC.
- Fixed a bug with incorrect file extensions being emitted in AOT compilation when using `--save-temps`.
- Fixed an issue where performing separate compilation and linking with `-fsycl-link` resulted in a "number of output files and targets should match in unbundling mode" error during the link step.
- Resolved an issue where passing pointers in the generic address space to certain built-in math functions could cause compilation failure.
- Fixed a bug where compiling kernels with different `reqd_work_group_size` attributes using `-fsycl-device-code-split=none` could result in a runtime exception about mismatching work-group sizes.
- Resolved a bug where using the `reqd_work_group_size` attribute with fewer than three arguments caused a crash.
- Addressed issues with the `shift_group_[right|left]`, `permute_by_xor`, and `select_from_group` algorithms returning invalid values when used with the `half` data type.
SYCL Library
- Fixed a situation where querying `sycl::ext::oneapi::experimental::info::device` could result in an exception instead of returning an empty vector.
- Corrected the `esimd::atan` implementation under the `-ffast-math` flag.
- Fixed an issue where component devices were not correctly identified as descendants of composite devices when creating a queue.
- Addressed an issue where querying for composite devices could return duplicate entries.
- Fixed bugs in the copy-constructor of the `config_2d_mem_access` ESIMD class, which led to compilation errors.
- Resolved an issue where the use of `atomic_ref<T*>` was not detected as using the `atomic64` aspect, leading to errors.
- Fixed bugs with `ctanh` and `cexp` returning incorrect values in edge cases.
- Fixed an issue where values passed to the `-Xs` option via `build_options` were not passed down to the device compiler.
- Fixed a compilation error when defining kernels as named functors while using `-fno-sycl-unnamed-lambda`.
- Corrected compilation issues with the `-fpreview-breaking-changes` flag caused by conflicts with macros in `windows.h`.
- Resolved strict aliasing violations in the implementation of `sycl::vec<sycl::half, N>::operator[]` that caused errors.
- Fixed bugs where barriers submitted to a command queue containing host tasks ignored those host tasks, and improved synchronization of host tasks with barriers.
- Fixed issues where the compiler could emit unsupported SPIR-V instructions for bit-reversal.
- Addressed a bug where default-constructed local_accessor arguments could cause runtime errors, especially on Windows and at the -O0 optimization level on Linux.
- Resolved a hang when invalid values were passed to the ONEAPI_DEVICE_SELECTOR environment variable.
- Fixed issues with persistent cache functionality where certain setups would prevent necessary directories from being created.
- Corrected a bug where querying a kernel by name from a kernel bundle could crash the program.
- Fixed an error handling bug where non-blocking pipe operations would mistakenly throw exceptions.
- Addressed compilation issues when using non-uniform group built-ins with marray and vec.
- Resolved a bug where memory attributes applied to a struct used as the type of a device_global variable were ignored.
- Added missing value_type and vector_t member type aliases to swizzles.
- Fixed shutdown sequence issues when the SYCL RT was used in applications or libraries with custom shutdown processes.
- Resolved a crash when calling event::get_backend() on a default-constructed event in environments with a malformed ONEAPI_DEVICE_SELECTOR.
- Fixed a bug where sycl-ls with --ignore-device-selectors would not properly ignore the environment variable.
- Corrected memory order capabilities returned by the Native CPU backend.
- Fixed the variadic constructor of sycl::ext::oneapi::experimental::properties to match the extension specification.
- Fixed build program failures when using ESIMD functions such as load_2d, store_2d, or prefetch_2d.
- Resolved a bug where querying free device memory on integrated Intel GPUs returned 0 instead of throwing an exception for the unsupported feature.
- Addressed a heap buffer overflow in the sycl_ext_oneapi_kernel_compiler_opencl extension implementation.
- Corrected a bug where the sycl_ext_oneapi_graph extension ignored the access mode of accessors, creating unnecessary graph edges.
- Fixed issues where graph submissions involving barriers could result in runtime errors or cause resource leaks.
- Addressed performance regressions when kernels without dependencies were submitted to in-order queues.
- Fixed profiling issues in the Level Zero backend where timestamps could be zero or incorrect for in-order queues.
- Resolved crashes when using multiple queues with the immediate command list properties immediate_command_list and no_immediate_command_list.
- Fixed a bug where info::kernel_device_specific::work_group_size would return the device-specific limit, ignoring the kernel, on the Level Zero backend.
Misc
SYCL Compiler
- Reverted changes previously made on Windows to support a separate compilation scenario where the compilation step was performed without the -fsycl flag, but the link step included the -fsycl flag. This scenario is now considered unsupported, as the compiler does not know which version of the standard library to link during the link step.
API/ABI Breaking Changes in 2025.0
This release is an ABI-breaking release, meaning that any applications built with older versions of the toolchain must be recompiled to run with newer versions of the SYCL runtime library.
- Bumped the major version of the SYCL runtime library to 8.
- Cleaned up the list of symbols exported from the SYCL runtime library by dropping some legacy symbols and hiding others that should not have been exported.
- Updated the ABI of several functions and methods to avoid using std::string and other objects in the library interface, allowing the SYCL RT to be used in applications built with the pre-C++11 ABI.
- Changed the ext_oneapi_copy API from the experimental sycl_ext_oneapi_bindless_images extension to accept const-qualified types for the Src parameter.
Several API breaking changes were made, including dropping support for previously deprecated APIs and switching implementations of some classes to a preview implementation. Code modification recommendations for some of these breaking changes can be found here.
- Removed the sycl::abs overload taking a floating-point argument.
- Removed sycl::host_ptr and sycl::device_ptr.
- Removed queue::discard_or_return.
- Removed sycl::make_unique_ptr.
- Removed the use_primary_context property and methods related to the previously removed host device.
- Removed SYCL 1.2.1 exception subclasses, including runtime_error, nd_range_error, invalid_parameter_error, device_error, and feature_not_supported.
- Removed the queue::mem_advice overload accepting pi_mem_advice.
. - Removed several deprecated ESIMD APIs.
- Removed the non-standard sycl::id -> sycl::range conversion operator.
- Removed deprecated APIs from the sycl_ext_oneapi_bindless_images extension implementation.
- Renamed the experimental destroy_external_semaphore API from the sycl_ext_oneapi_bindless_images extension to release_external_semaphore.
- Replaced the image_channel_order field of the image_descriptor struct with the number of channels in the experimental sycl_ext_oneapi_bindless_images extension.
- Enforced restrictions on the first argument of lambdas/functors passed to parallel_for(range) and parallel_for(nd_range).
- Switched the sycl::vec implementation to its preview version, which uses a different storage type to fix several strict aliasing rule violations.
- Restricted math operations available to vec<std::byte, N> to those applicable to std::byte.
- Switched the sycl::exception implementation to its preview version.
- Switched the math built-ins implementation to its preview version.
- Switched the bfloat16 implementation to its preview version.
- Switched the sycl::nd_item implementation to its preview version.
- Enforced a restriction that a buffer's element type must be device copyable.
- Restructured SYCL headers to exclude <cmath> and <complex>.
- Dropped support for the SYCL_DEVICE_FILTER environment variable.
- Updated the accessor::get_pointer interface to return global_ptr<value_type>, which can be const-qualified if the accessor data type is const-qualified or if the accessor is read-only.
- Removed deprecated APIs related to sycl_ext_oneapi_free_function_queries.
- Moved slm_allocator ESIMD APIs into the experimental namespace.
- Removed the deprecated usm_system_allocator aspect.
- Removed the get_child_group API from the experimental sycl_ext_oneapi_root_group extension.
- Simplified template arguments related to simd_view of many ESIMD APIs.
- Removed ESIMD atomic_op::predec.
- Dropped interfaces from revision 1 of the experimental sycl_ext_oneapi_group_sort extension.
- Changed the return type of command_graph::begin_recording and command_graph::end_recording from void to bool in the experimental sycl_ext_oneapi_graph extension.
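The vec<std::byte, N> restriction above mirrors std::byte itself, which (since C++17) defines only bitwise operators, shifts by an integer amount, and std::to_integer, with no arithmetic operators. The following is a standard C++ sketch of those scalar operations only; it does not use sycl::vec, and the helpers pack_nibbles and packed_value are hypothetical.

```cpp
#include <cstddef>  // std::byte, std::to_integer (C++17)

// Hypothetical helper: packs the low nibble of `lo` and the low nibble of `hi`
// into one byte using only operators that std::byte defines
// (bitwise AND/OR/XOR/NOT and shifts by an integer amount).
std::byte pack_nibbles(std::byte lo, std::byte hi) {
    return (lo & std::byte{0x0F}) | ((hi & std::byte{0x0F}) << 4);
}

// Hypothetical helper: returns the packed value as an integer.
// std::byte has no arithmetic operators, so an expression such as `lo + hi`
// would not compile; std::to_integer is the standard way to get a number out.
unsigned packed_value(std::byte lo, std::byte hi) {
    return std::to_integer<unsigned>(pack_nibbles(lo, hi));
}
```

For example, pack_nibbles(std::byte{0x3}, std::byte{0xA}) yields std::byte{0xA3}.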
Breaking changes were also made to compiler flags:
- Removed the deprecated -fsycl-link-huge-device-code, -fsycl-[add|link]-targets, -foffload-static-lib, -foffload-whole-static-lib, -fsycl-disable-range-rounding, and -sycl-std flags.
SYCL Known Issues
- On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is due to the release of the plugin DLLs racing against the release of static global variables, such as the default context.
- The Intel Graphics Compiler's Vector Compute backend does not support certain optimization levels and often produces incorrect results or crashes. This issue directly affects ESIMD code. As a temporary workaround, compile ESIMD code with optimizations enabled, even in builds where optimization would otherwise be turned off.
- When using the sycl_ext_oneapi_matrix extension, some devices require the program to be built with settings that correspond to the device that will run it, particularly for matrix operations that use the half data type.
- When using queue shortcut functions with in-order queues, dependencies between commands submitted to different queues may be ignored. For example, a second kernel that depends on a first kernel submitted to a different queue may start executing before the first kernel completes. A workaround is to explicitly call .wait(). This issue will be fixed in the next release.
- C/C++ math built-ins can return incorrect results for some edge-case inputs when called from SYCL kernels.
- To enhance performance on Intel® GPUs using the Unified Runtime Level Zero Adapter, support for driver-optimized in-order lists has been introduced in version 2025.0. However, when running workloads with sycl::property::queue::enable_profiling, some performance overhead from these lists is expected. If this overhead negatively impacts performance, it can be mitigated by disabling the driver in-order lists. To do so, set UR_L0_USE_DRIVER_INORDER_LISTS=0.
- The Intel® oneAPI DPC++ Compiler on Windows* requires OpenCL 3.0, so older versions of opencl.dll on your system can cause compatibility problems. If an outdated opencl.dll is present in system directories or takes precedence in the library path, it may lead to failures, including SYCL-related issues and crashes in tools like Intel® VTune™ and Intel® Advisor when specific OpenCL 3.0 features are used. The recommended solution is to replace the old opencl.dll with the one installed in the DPC++ package by copying the newer opencl.dll from $oneAPI_Install_Folder\compiler\latest\bin to your system folder. Be sure to back up the original opencl.dll in case it is needed by other applications.
- sycl_ext_oneapi_free_function_kernels has the following limitations:
  - Free function kernels are only supported if defined at file scope.
  - SYCL_EXTERNAL has to be used alongside SYCL_EXT_ONEAPI_FUNCTION_PROPERTY to define a free function kernel.
  - The compiler won't emit any diagnostics if some restrictions from the extension specification are violated.
  - Arguments of free function kernels cannot be composite data types such as structs, or SYCL classes such as accessor.
  - Using -fsycl-dead-args-optimization (ON by default) can lead to failures.
  - info::kernel::num_args won't return the right result for free function kernels.
New OpenMP Features
- Support for the -fopenmp-offload-mandatory compiler flag to omit creation of host-fallback code and emit a runtime error if OpenMP offload to the device fails.
- Improved optimization report support for OpenMP constructs.
- Enhanced conversion scheme of nested loop constructs to consider loop trip counts.
- Updates to the declare variant directive for a dispatch construct to include GPUs with the Xe2 architecture when the match clause specifies device={arch(gen)}.
- Support for the device_type(host|nohost|any) clause for the target construct.
- Inclusion of the if clause for the teams construct.
- Change of the map-type property to "default," allowing map-type modifiers to be specified without a map-type. For example, map(always : x) is equivalent to map(always, tofrom : x).
- Support for the Intel extension ompx_sub_group_size clause for the target construct to set the SIMD width of the kernel.
- Support for the Intel extension ompx_dyn_cgroup_mem clause for the target construct, allowing dynamic allocation in SLM for GPU offloading.
- Extension of the environment variables OMP_THREAD_LIMIT, OMP_TEAMS_THREAD_LIMIT, and OMP_NUM_THREADS to support abstract names. For example, OMP_THREAD_LIMIT=n_cores.
- Extension of the syntax of the environment variable OMP_PLACES to support bound and stride for abstract names. For example, OMP_PLACES=threads(4:2).
- Host runtime support for the environment variable OMP_AVAILABLE_DEVICES.
- Extension of the environment variable OMP_DEFAULT_DEVICE to support device selection by traits.
Notable OpenMP Fixes
- Fixed a bug where the dispatch construct’s device clause was not updating OpenMP’s default-device-var ICV.
- Resolved an internal compiler error when the declare variant directive for a dispatch construct did not specify an adjust_args clause.
- Fixed an optimization bug in OpenMP for and simd loops with large trip counts.
- Corrected a regression where enclosing task constructs inside a teams construct triggered a compiler error message.
- When thread_limit is specified for both the target and teams constructs, the compiler now correctly chooses the minimum of the two instead of always using the one specified for target.
- Fixed an internal compiler error related to the initialization of global variables allocated in the GPU's SLM.
- Addressed a problem in the offload runtime where the reference counts of variables mapped using declare mapper were not decremented correctly.
- Fixed a GPU offload performance issue related to the L1 cache being affected by temporary copies of reduction variables.
- Resolved a bug where user-defined reduction variables were not properly constructed or destructed.
OpenMP Known Issues
- Implicit barriers at the end of parallel regions do not act as synchronization points for the tasks associated with target nowait and dispatch nowait constructs. This may lead to incorrect results or crashes. As a workaround, add #pragma omp taskwait at the end of the parallel region so that the target/dispatch nowait regions are synchronized at the point where the region's implicit barrier would otherwise have done so.
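The taskwait workaround can be sketched as follows. The function scale_in_two_target_tasks is hypothetical: one thread creates two deferred target tasks over disjoint halves of an array, and the explicit taskwait at the end of the parallel region makes that thread wait for them. Built without OpenMP support, the pragmas are ignored and the loops run serially. With OpenMP enabled but no offload device available, the target regions fall back to the host.

```cpp
// Hypothetical example: multiplies a[0..n) by `factor` using two deferred
// (nowait) target tasks, then applies the documented workaround.
void scale_in_two_target_tasks(double* a, int n, double factor) {
    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            const int half = n / 2;

            // First deferred target task: scales a[0, half).
            #pragma omp target nowait map(tofrom: a[0:half])
            for (int i = 0; i < half; ++i) a[i] *= factor;

            // Second deferred target task: scales a[half, n).
            #pragma omp target nowait map(tofrom: a[half:n - half])
            for (int i = half; i < n; ++i) a[i] *= factor;
        }
        // Workaround: wait explicitly for child tasks instead of relying on
        // the implicit barrier at the end of the parallel region. Only the
        // thread that ran the single block has child tasks to wait for; for
        // the other threads this taskwait returns immediately.
        #pragma omp taskwait
    }
}
```

After the call returns, every element of the array has been scaled, because the generating thread cannot leave the parallel region before both target tasks complete.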
Other Known Issues and Limitations
- Visual Studio IDE Integration: Building a C++ project with 'Intel C++ Compiler 2025' for the Win32 platform fails with an error stating that the ICX compiler was not found. The Win32 platform is not supported with 'Intel C++ Compiler 2025'; compile projects for the x64 platform only.
Hardware Support:
- -march=lunarlake
- -march=graniterapids
Please check here for details about -march usage.
Toolchain Support to Intel Platforms
Toolchain | Granite Rapids | Granite Rapids-D | Lunar Lake
GCC | 13.1 | 13.1 | 14.1
Binutils | 2.40 | 2.41 | 2.42
Glibc | 2.37 | 2.37 | 2.39
LLVM | 16.0 | 17.0 | 18.0
ICX | 2023.1 | 2023.2 | 2024.0
C/C++ Standard
- Intel® oneAPI DPC++/C++ Compiler version 2025.0 supports the C/C++ standards through the Clang 19 front end.
- Initiated support for C++2c, the next release of C++ after C++23, and C2y, the next release of C after C23
- Finalized the implementation of “deducing this” (C++23)
- Relaxed some constexpr restrictions (C++23)
- Implemented the [[assume]] attribute (C++23)
- Completed support for Concepts (C++20)
- Added support for char8_t (C23)
- Implemented the constexpr keyword for object declarations (C23)
- Implemented #embed for embedding binary resources in source (C23)
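To illustrate one of the C++23 items above, the [[assume]] attribute lets the programmer state a condition that the optimizer may rely on. The following is a minimal sketch with a hypothetical function bucket_of, intended for a C++23 language mode (for example, -std=c++23). Compilers or language modes that predate the attribute typically warn about it and ignore it rather than reject the code.

```cpp
// Hypothetical example of the C++23 [[assume(expr)]] attribute.
// The expression is never evaluated; the compiler may optimize as if it were
// true, so calling this function with a negative value is undefined behavior.
int bucket_of(int value) {
    [[assume(value >= 0)]];
    // Knowing that value is non-negative, the compiler may lower this signed
    // division to a plain shift instead of a sign-correcting sequence.
    return value / 16;
}
```

For non-negative inputs the result is the same with or without the attribute. The attribute only removes the need for the compiler to handle the negative case.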
System Requirements
Additional Documentation
- Get Started with the Intel® oneAPI Toolkit for Linux*
- Get Started with the Intel® oneAPI Toolkit for Windows*
- OneAPI Versioning Schema based on Semantic Versioning
- Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference
- Intel® oneAPI Programming Guide
- SYCL* 2020 Specification Features and DPC++ Language Extensions Supported
- OpenMP* Features and Extensions Supported in Intel® oneAPI DPC++/C++ Compiler
Previous oneAPI Releases
- Intel® oneAPI DPC++/C++ Compiler 2024
- Intel® oneAPI DPC++/C++ Compiler 2023
- Intel® oneAPI DPC++/C++ Compiler 2022
- Intel® oneAPI DPC++/C++ Compiler 2021
Notices and Disclaimers
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
Intel technologies may require enabled hardware, software, or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.