Intel® oneAPI DPC++/C++ Compiler Release Notes

ID 768207
Updated 3/18/2025
Version 2025.1.0
Public

This document summarizes new and changed product features and includes notes about features and problems not described in the product documentation.

Where to Find the Release

Download the Intel® oneAPI Base Toolkit from the Intel® oneAPI Base Toolkit Download page and follow the installation instructions.

The Intel® oneAPI DPC++/C++ Compiler’s integrated support for Altera FPGA has been removed as of the 2025.1 release. Altera® will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.
For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative.

oneAPI 2025.1.0, Compiler Release 2025.1.0

Major New Features and Enhancements

MemorySanitizer Support: Extended CPU Memory Sanitizer support to the device side, including GPUs, facilitating detection and troubleshooting of memory issues in both CPU and device code. This improves application reliability by ensuring comprehensive memory error checking across platforms.

ccache* Integration: Compiler now supports ccache* to significantly speed up build times for C++ and SYCL codes. By caching previous compilations and reusing them, developers can experience faster iterations and more efficient workflows. 

Code Coverage Tool Enhancements: Compiler's code coverage tool now includes GPU support and enhanced CPU coverage for applications using C/C++, SYCL, and OpenMP. It offers you detailed analysis and comprehensive HTML reports to identify tested and untested code sections, ultimately improving test coverage and code quality while ensuring easy integration into workflows.

Floating Point Accuracy Controls: User control over the accuracy of floating-point operations and library calls now extends to device code.

SYCL Interoperability with Graphics APIs: Added initial support for SYCL interoperability with DirectX* 12 and Vulkan*, which enables developers to build efficient visual compute, media processing, and rendering applications on Intel® Graphics. For details on image formats and platform support, refer to SYCL Interoperability Limited Support.

New Features

SYCL Compiler:

  • Implemented initial support for SYCL Virtual Functions with the intent to gather initial feedback from users. Please refer to the Known Issues section for details on current limitations of this feature.
  • Dynamic linking of device code is now supported via -fsycl-allow-device-image-dependencies command line option. This feature allows device code to be exported via a Windows DLL and includes support for dynamic linking of AOT compiled images for the OpenCL GPU backend.
  • Enhancements to free function kernel support include the addition of structs as kernel arguments and the inclusion of work group memory as a kernel parameter.
  • Device sanitizer now supports invalid kernel argument detection, and address sanitizer has been enhanced to detect null pointers.
  • A mechanism has been implemented to lift restrictions on SYCL device code in constant expressions via the option -fsycl-allow-all-features-in-constexpr.

SYCL Library:

  • Enhanced SYCL Graph functionality with implicit recording mechanism and dynamic command-groups, and a new graph enqueue function, execute_graph, in accordance with the updated sycl_ext_oneapi_graph extension.
  • Added support for Intel® Arc™ B series and Intel® Core Ultra Series device architectures.
  • Added more devices with Joint Matrix support: Battlemage, Lunar Lake and Arrow Lake H. Added more types and shapes to PVC combinations for SYCL Matrix.
  • New ESIMD features include a mask compressed ESIMD load/store API, support for root group barriers, a clamp API for ESIMD, and support for the ext::intel::experimental::esimd::frem function.
  • Implemented the following set of extensions:

Unified Runtime:

  • To support NPU/GPU device coexistence in the same application, support for the new L0 initialization function zeInitDrivers has been added in 2025.1. This enables SYCL, OpenVINO™, and other NPU device libraries to coexist in the same application, using GPU and NPU functionality simultaneously.
  • Updated the Mutable Command List support in the UR L0 Adapter to utilize the Level Zero Specification’s extension functionality instead of the driver's experimental functionality.
  • For improved performance, usage of immediate command lists is the default behavior on Linux in the UR L0 adapter for Intel® Arc™ Series GPUs along with Intel® Core Ultra 200v Series.
    On Windows, usage of immediate command lists is the default behavior on Intel® Arc™ B Series GPUs along with Intel® Core Ultra 200v Series.

OpenMP:

  • Added support for the OpenMP 6.0 interchange loop-transformation construct and the permutation clause.
  • Added opt-report remarks for loads and stores of variables listed in the nontemporal clause of the simd construct.
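
As a rough illustration of the interchange construct, the sketch below sums a row-major matrix with a column-outer loop nest and asks the compiler to swap the two loops. The names sum_matrix and DEMO_USE_OMP_INTERCHANGE are hypothetical and not part of the product; the macro is only an opt-in switch so that compilers without OpenMP 6.0 support skip the pragma and run the loops as written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums a rows x cols matrix stored in row-major order.
// As written, the nest walks the matrix column by column, which strides
// through memory. Interchanging the two loops makes the inner loop contiguous.
double sum_matrix(const std::vector<double>& m, int rows, int cols) {
    assert(rows >= 0 && cols >= 0);
    assert(m.size() == static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    double total = 0.0;
#ifdef DEMO_USE_OMP_INTERCHANGE  // hypothetical opt-in macro for this sketch
    // permutation(2, 1) requests that the second loop become the outermost one.
    #pragma omp interchange permutation(2, 1)
#endif
    for (int j = 0; j < cols; ++j) {
        for (int i = 0; i < rows; ++i) {
            total += m[static_cast<std::size_t>(i) * cols + j];
        }
    }
    return total;
}
```

Because the transformation changes the order of the floating-point additions, results can differ in the last bits for general inputs; the values chosen when trying this out should make that acceptable.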

Misc:

  • Added several enhancements in sanitizer support:
    • New Numerical Stability Sanitizer (NSAN) for C++ code, adopted from community contributions
    • Memory Sanitizer extended to support SYCL and OpenMP C/C++ device code (only USM device allocations)
    • Major improvements to Address Sanitizer for device code: invalid kernel argument detection, null-pointer detection, memory leak detection, private memory support for OpenMP offload
  • For C/C++ compilations on Linux, added support for the -q[no-]unknown-option-as-warning option, which provides the ability to handle unknown options on the command line with a warning diagnostic. The default behavior is to error on unknown options.
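
To give a feel for the kind of problem a numerical stability sanitizer targets, the following host-only sketch evaluates the same expression in float and in double and reports the relative error between the two. This is a hand-written comparison under the assumption that the sanitizer flags large precision-dependent differences; it is not the sanitizer's implementation, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Computes (big + small) - big in single precision.
// The volatile intermediate forces the sum to be rounded to float.
float cancel_float(float big, float small) {
    volatile float sum = big + small;
    return sum - big;
}

// Same expression in double precision, used as the higher-precision reference.
double cancel_double(double big, double small) {
    volatile double sum = big + small;
    return sum - big;
}

// Relative error of the float result against the double reference.
double relative_error(float big, float small) {
    const double reference = cancel_double(big, small);
    assert(reference != 0.0);  // the sketch assumes a non-zero reference
    const double approx = cancel_float(big, small);
    return std::fabs(approx - reference) / std::fabs(reference);
}
```

With big = 1e8f and small = 1.0f, the float sum cannot represent the added 1, so the single-precision result loses the entire answer while the double-precision one keeps it.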

Improvements

SYCL Compiler:

  • Removed the need for the SYCL_EXTERNAL attribute in free function kernel definitions.
  • Improved compilation time for ESIMD kernels.
  • Disabled attribute propagation from SYCL 1.2.1 and removed remaining SYCL 2017/1.2.1 compatibility elements, including -Wsycl-strict diagnostics.
  • Ensured compiler-generated integration headers/footers are warning-free to prevent -Werror build failures, especially with third-party host compilers.
  • Built basic functionality of the SYCL joint_matrix extension on the SPV_KHR_cooperative_matrix extension.
  • Expanded supported aspects for the CPU AOT target.
  • Added diagnostics for incorrect arguments with -fsycl-device-obj.
  • Introduced a warning for applying kernel-only attributes to non-kernel functions.
  • Fixed misleading diagnostics for non-external functions/variables when using attributes like [[sycl_device]] or [[intel::device_indirectly_callable]].
  • Updated -fsycl-link=image to package host objects like -fsycl-link=early, ensuring proper linking, especially on Windows.
  • Added extra optimization passes in the Native CPU pipeline.
  • Updated -fsycl-host-compiler to use only user-provided hints (e.g., PATH) for locating the specified compiler, avoiding incorrect binary usage.
  • Deprecated [[intel::reqd_sub_group_size]]; use the SYCL 2020 spelling with the sycl:: namespace.
  • Disabled ITT annotations in device code by default to reduce code size.
  • Enabled floating-point atomics via atomicrmw instructions for Native CPU.
  • Enabled nonsemantic debug info by default to improve the debugging experience.

SYCL Library:

  • Added binary caching support to the kernel_compiler extension.
  • Enabled a check on Linux systems to inform users to use SYCL_UR_TRACE instead of SYCL_PI_TRACE.
  • Improved GDB printers for SYCL types and values.
  • Renamed ur to ur.call in XPTI traces.
  • Refactored the XPTI framework to use 128-bit keys for collision elimination and added support for 64-bit universal IDs for backward compatibility.
  • Made repeated calls to command_graph::begin_recording an error.
  • Aligned sycl_ext_oneapi_address_cast implementation with the specification.
  • Optimized the atomic_ref constructor for the SPIR-V target.
  • Enhanced handling of compile-time properties.
  • Refined parsing of Device Sanitizer options via the UR_LAYER_ASAN_OPTIONS environment variable.
  • Improved detection of conflicts between kernel properties related to work group size.
  • Enhanced framework/app software layers to provide code locations for SYCL-generated XPTI events.
  • Improved performance of the rsqrt ESIMD API.
  • Added property validation to core SYCL object constructors.
  • Deprecated __SYCL_USE_VARIADIC_SPIRV_OCL_PRINTF__.
  • Enforced data type restrictions in marray and vec.
  • Improved sycl_ext_oneapi_address_cast by changing "dynamic" behavior to "static" where allowed.
  • Enhanced sycl-ls to report ext::intel::info::device::device_id.
  • Added no-op implementations for runtime APIs for Native CPU, as programs are compiled offline.
  • Updated the local_accessor GDB printer to display elements with a decorated pointer and address space qualifier.
  • Improved ESIMD copy_to() and copy_from() to use block_load/block_store for better performance.
  • The OpenCL adapter now uses the local work size set in program IL when not specified in clEnqueueNDRangeKernel.
  • Improved OpenCL adapter to support older ICD loaders.
  • Repurposed SYCL_CACHE_TRACE for fine-grained tracing of all SYCL program caches.
  • Enabled Sysman API by default in the L0 adapter, removing the need to set ZES_ENABLE_SYSMAN.
  • Allowed copy-construction of device_global without the device_image_scope property.
  • Improved UR libraries to avoid unnecessary overhead if nothing is subscribed to the ur.call XPTI call stream.
  • Refactored copy engine usage checks in the L0 adapter for better performance.
  • Implemented tracing for in-memory kernel and program cache.
  • Improved error handling in the SYCL RT command enqueue function to provide clearer exceptions.
  • Added address sanitizer AOT libraries for various GPU/CPU targets and renamed the device sanitizer library to libsycl-asan.
  • Undeprecated legacy multi_ptr as it is no longer deprecated in the SYCL specification.
  • Deprecated info::device::atomic64; use sycl::aspect::atomic64 instead.
  • Removed build options from the fast kernel cache key to reduce lookup overhead.
  • Improved OpenCL adapter to use the extension version of clGetKernelSubGroupInfo when necessary.
  • Updated SYCL graph design documentation with a new command-list enqueue path.
  • Enhanced online_compiler::compile to support pre-C++11 ABI.

Misc:

  • Support for OpenCL __attribute__((blocking)) has been removed. This allows enabling support for the [[clang::nonblocking]], [[clang::nonallocating]], [[clang::blocking]] and [[clang::allocating]] function type attributes, as well as their GNU-style variants.
  • For functions that return structs by value, the ABI requires passing a special parameter that contains the address of the memory where the returned struct should be placed. This parameter is implicit: users do not see it and cannot provide a vector specification for it. Support for such functions, including emitting the vector-variants attribute for them, has been added.
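
The function type attributes named above can be sketched as follows. The macro DEMO_NONBLOCKING and both function names are hypothetical; the macro expands to [[clang::nonblocking]] only when the compiler reports that it knows the attribute, so the example still builds elsewhere. How the compiler diagnoses violations is not covered by these notes and is not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Expands to the Clang function effect attribute when the compiler knows it,
// and to nothing otherwise, so the sketch builds with other compilers.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::nonblocking)
#    define DEMO_NONBLOCKING [[clang::nonblocking]]
#  endif
#endif
#ifndef DEMO_NONBLOCKING
#  define DEMO_NONBLOCKING
#endif

// A function promised never to block: it only reads memory and does arithmetic.
// The attribute applies to the function type, so it follows the parameter list.
float peak_level(const float* samples, std::size_t count) DEMO_NONBLOCKING {
    float peak = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        const float magnitude = samples[i] < 0.0f ? -samples[i] : samples[i];
        if (magnitude > peak) peak = magnitude;
    }
    return peak;
}

// Ordinary caller without the attribute; it is free to do work that may block.
float checked_peak_level(const float* samples, std::size_t count) {
    assert(samples != nullptr);
    return peak_level(samples, count);
}
```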

Bug Fixes

SYCL:

  • Resolved false positives in Device Sanitizer by unpoisoning local/private shadow memory before function return.
  • Added ext_oneapi_ballot_group aspect to the spir64_x86_64 target, supported since OpenCL CPU 2024.2.
  • Restored kernel instantiations on the host for debugger compatibility with SYCL code.
  • Fixed local scope module variables for Native CPU.
  • Corrected device libraries requirement mask for the SPIRV target to ensure proper linking.
  • Suppressed system errors when loading adapters on Windows.
  • Disabled internalization of kernels for dynamic linking to ensure visibility.
  • Fixed a use-after-free bug in the clang-linker-wrapper.
  • Enforced SYCL headers to be included with #include <sycl/sycl.hpp>.
  • Fixed device module splitting for ESIMD related to using assert in user code.
  • Correctly assigned architectures to their respective targets with -fsycl-targets.
  • Fixed devicelib handling when linking multiple images.
  • Matched -device_options with -device for AOT GPU.
  • Stopped passing HEX values to -device_options due to IGC limitations.
  • Fixed crash with an empty -fsycl-targets option.
  • Set calling convention to spir_func for SPIRV function calls related to specialization constants and hierarchical parallelism.
  • Added a workaround for SPIRV AccessChain usage in SYCL matrix operations.
  • Addressed code splitting issues with FPGA archives.
  • Fixed parsing of device values in backend target options.
  • Limited Device Sanitizer to report only one error per kernel instance.
  • Resolved issues with vector shuffle built-ins on the NativeCPU backend.
  • Fixed incorrect symbolizer output for shared libraries in Device Sanitizer.
  • Disabled Address Sanitizer on modules with ESIMD to prevent excessive kernel code size.
  • Fixed iterator invalidation issue in the SYCL Joint Matrix pass on Windows debug builds.
  • Corrected integration footer for device_global with explicit template specialization.

OpenMP:

  • Fixed a bug related to mapping of variable-length arrays where the size is known at compile time. 
  • Fixed a performance issue when an unroll construct is in a loop nest bound to an outer parallel for construct.
  • Fixed potential unsafe vectorization of some loops that are bound to parallel for.
  • Improved performance of some collapsed loops by choosing a more optimal data size for the collapsed loop IV. 
  • Improved offload performance of some target teams distribute parallel for reduction loops with constant trip count. 
  • Fixed intermittent failures due to race conditions when using the dispatch construct with SYCL interop objects.
  • Fixed a bug where the nogroup clause of a taskloop construct was not honored.
  • Fixed a crash when running certain target nowait (asynchronous offload) kernels containing loops. 
  • Fixed an ICE in some cases where a tile construct is bound to the same loop bound to an outer for construct.
  • Fixed an issue where the device clause was not honored for the dispatch construct.
  • Improved performance of some low-trip-count loops bound to the loop construct.
  • Fixed a bug where some for or simd loops with trip counts greater than INT_MAX were not being transformed correctly.
  • GPU dispatch now supports “Battlemage” architecture parts that utilize the Xe2 microarchitecture, both integrated (Lunar Lake) and discrete (Intel® Arc™ B-Series graphics cards).
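
Two of the fixes above concern the induction variable (IV) of collapsed or very long loops. The sketch below shows, on the host and without OpenMP, why the width of that variable matters: collapsing two loops multiplies their trip counts, and the product can exceed the range of a 32-bit int even when each bound fits. The helper names are hypothetical, and the compiler's actual heuristic for choosing the IV size is not described in these notes.

```cpp
#include <cassert>
#include <cstdint>

// Trip count of a two-level loop nest after collapsing it into a single loop.
// The product is formed in 64 bits because it can exceed the largest 32-bit
// int even when each individual bound fits in one.
std::int64_t collapsed_trip_count(int outer, int inner) {
    assert(outer >= 0 && inner >= 0);
    return static_cast<std::int64_t>(outer) * inner;
}

struct LoopIndices {
    int i;  // index of the original outer loop
    int j;  // index of the original inner loop
};

// Recovers the original (i, j) pair from the collapsed induction variable.
LoopIndices decode(std::int64_t iv, int inner) {
    assert(inner > 0 && iv >= 0);
    return { static_cast<int>(iv / inner), static_cast<int>(iv % inner) };
}
```

A narrower IV is cheaper when the product is known to fit, which is presumably the trade-off behind the "more optimal data size" wording above.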

Known Issues & Limitations

SYCL:

  • Following are the details on the limited support of SYCL interoperability: 
    • Platform Support: Intel® Arc™ B series Graphics (Battlemage), Intel® Iris® Xe Graphics (DG2), Intel® Core™ Ultra Processors (Lunar Lake and Meteor Lake).
    • Image channels: 1, 2 and 4-channel
    • Image formats: VK_FORMAT_R16G16_SFLOAT, VK_FORMAT_R32_SFLOAT, VK_FORMAT_R16G16B16A16_SFLOAT, VK_FORMAT_R32G32_SFLOAT, VK_FORMAT_R16_SFLOAT
    • Known issues
      • On Intel® Iris® Xe Graphics and Intel® Core™ Ultra Series 1 (Meteor Lake) Processors, there is a known issue with compressed 2D and 3D images with 1, 2, or 4 channels that are larger than 64KB. If such images are exported from other APIs and imported into SYCL for manipulation, data mismatches occur once SYCL performs computations on them. This issue, found in GPU driver version 2507.12, will be addressed in an upcoming GPU driver release.
  • On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is because on Windows the release of the plugin DLLs races against the release of static global variables (like the default context).
  • Intel Graphics Compiler's Vector Compute backend does not support O0 code; such code is often miscompiled, producing wrong answers and crashes. This issue directly affects ESIMD code at O0. As a temporary workaround, ESIMD code is optimized even in O0 mode.
  • C/C++ math built-ins (like exp or tanh) can return incorrect results on Windows for some edge-case input. The problems have been fixed in the SYCL implementation, and the remaining issues are thought to be in MSVC.
  • [new] There are known issues and limitations in virtual functions functionality, such as:
    • Optional kernel features handling implementation is not complete yet.
    • AOT support is not complete yet.
    • A virtual function definition and definitions of all kernels using it must be in the same translation unit. Please refer to sycl/test-e2e/VirtualFunctions to see the list of working and non-working examples.
  • When running synthetic benchmarks, it is possible for performance on Intel's Flex and Arc A Series GPUs to be less than previously measured when running with the new defaults using Immediate Command Lists in SYCL/Unified Runtime L0 Adapter. To mitigate this issue on those workloads, one can regain the lost performance by creating the SYCL queue with the `no_immediate_command_list` queue property or by setting the environment variable UR_L0_USE_IMMEDIATE_COMMANDLISTS=0. These will enforce the usage of command batching in the Unified Runtime L0 adapter which may improve the performance of those workloads.

OpenMP:

  • Offload code with reduction across teams may result in incorrect results or even hangs on some platforms with integrated GPUs.
  • ICX and ICPX ignore "#pragma omp flush" for spir64 offload compilation.

Other Known Issues:

  • The switch from a static to a dynamic sanitizer runtime in the 2025.1 compiler has led to runtime crashes due to the missing clang_rt.asan_dynamic-x86_64.dll. The workaround is to add C:\Program Files (x86)\Intel\oneAPI\compiler\2025.1\lib\clang\20\lib\windows to the PATH environment variable.

API/ABI Breaking Changes

  • Updated the experimental sycl_ext_oneapi_bindless_images extension documentation and implementation: interoperability structs and functions were renamed to use the external keyword instead of interop.
  • Removed sycl::ext::oneapi::experimental::is_property_key.
  • Removed some OSUtil::* functions from the ABI under -fpreview-breaking-changes; these are used internally in the DSO and don't need to be exposed outside.
  • Made the ext_oneapi_cl_profile implementation ABI-neutral.
  • Fixed the SYCL Graph API to be ABI-neutral to avoid dual-ABI issues on Linux.

 

This patch release of the compiler consists of various bug fixes and quality improvements.

Deprecation Notice: The Intel® oneAPI DPC++/C++ Compiler integrated support for Altera FPGA is now deprecated and will be removed with the compiler's release in the first quarter of 2025. Altera* will continue to provide FPGA support through their dedicated FPGA software development tools. Existing customers can continue to use the Intel® oneAPI DPC++/C++ Compiler 2025.0 release which supports FPGA development and is available through Linux* package managers such as APT, YUM/DNF, or Zypper. Additionally, customers with an active support license can access the Intel® oneAPI DPC++/C++ Compiler 2025.0 via their customer support account.

For more information and assistance with transitioning to the Altera development tools, please contact your Altera representative. 

Major Enhancements and New Features

New Features:

  • Hardware Enablement: Optimized for new Intel hardware including EMR, GNR, BMG, and LNL, with features such as cache hints and new data types for AI applications, delivering improved efficiency and computing power.
  • Bindless Textures Support: Implemented Bindless Textures for Intel GPUs (DG2, Arc), allowing dynamic texture usage at runtime without compile-time knowledge, enabling enhanced performance and scalability.

Performance Tuning and Enhancements:

  • AI and HPC Optimization: Tuned performance for AI frameworks and HPC applications.
  • OpenMP Enhancements: Early support for OpenMP 6.0 features, including the DEVICE_TYPE clause for TARGET construct and mandatory offloading support. Also, fixed the OpenMP loop rotation issue. Check out Advanced OpenMP* Device Offload with Intel® Compilers for more details.
  • Compiler Reports: Enhanced opt-report for better user experience, now providing detailed information on OpenMP offloading and integrating with the open-source optimization remark framework. Details on recent enhancements can be found at Develop Highly Optimized Applications Faster with Compiler Optimization Reports.
  • Sanitizers for Device Code: Device code now supports LLVM sanitizers to help detect and resolve issues during development. It includes a compiler instrumentation module and runtime support, allowing it to detect issues such as out-of-bounds memory access on USM, SYCL buffers, local memory, and device globals, as well as bad-free, use-after-free, bad context, and more. In this release, PVC GPUs and CPUs are supported on Linux OS. More details on how and when to use sanitizers can be found at Find Bugs Quickly Using Sanitizers with the Intel® oneAPI DPC++/C++ Compiler.
  • Comprehensive Performance Insights: Upgraded optimization reports now cover SYCL, OpenMP, and AOT compilation, offering developers deeper insights into application performance.
  • Hardware Profile Guided Optimization (HWPGO): Key improvements include enhanced profile propagation for better accuracy, additional profile-driven optimizations to further boost performance, and early support for "pseudo probes" on Windows as an alternative to DWARF for profiling. Additionally, HWPGO has introduced selective function outlining, allowing for specific functions to be optimized based on profiling data, further enhancing runtime efficiency.

New Features

SYCL Compiler:

  • SYCL Offload Model: Introduced a new SYCL offload driver mechanism with --offload-new-driver to improve infrastructure for better link times by reducing I/O and external processes.
  • Range Rounding Control: Added -fsycl-range-rounding option for managing range rounding, including forcing full rounding to reduce binary size. Additionally, the experimental -fsycl-exp-range-rounding option performs rounding across all dimensions.
  • Double Type Emulation: Added -fsycl-fp64-conv-emu option for partial emulation of double data types on Intel GPUs.
  • Dynamic Linking: Initial support added for dynamic linking, though some features like kernel_bundle API and AOT mode are not yet supported.
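
The idea behind range rounding can be pictured with a small host-only emulation. It assumes that rounding means enlarging the launch range to a hardware-friendly multiple and guarding the extra work-items; the multiple of 32 used when trying it out is purely illustrative, and round_up and launch_rounded are hypothetical names rather than runtime functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds n up to the next multiple of `multiple`.
std::size_t round_up(std::size_t n, std::size_t multiple) {
    assert(multiple > 0);
    return ((n + multiple - 1) / multiple) * multiple;
}

// Emulates launching a kernel over a rounded range on the host: the launch
// covers `rounded` work-items, and a guard skips the ones past the user range.
// Returns how many work-items were launched.
std::size_t launch_rounded(std::vector<int>& data, std::size_t multiple) {
    const std::size_t user_range = data.size();
    const std::size_t rounded = round_up(user_range, multiple);
    for (std::size_t id = 0; id < rounded; ++id) {
        if (id >= user_range) continue;  // guard for the padding work-items
        data[id] += 1;                   // stands in for the user's kernel body
    }
    return rounded;
}
```

For a user range of 1000 and a multiple of 32, the emulated launch covers 1024 work-items, of which the last 24 do nothing.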

SYCL Library:

  • Extensions: Implemented multiple extensions, including sycl_ext_oneapi_prod, sycl_ext_oneapi_profiling_tag, sycl_ext_oneapi_forward_progress, sycl_ext_oneapi_private_alloca, sycl_ext_codeplay_enqueue_native_command, and sycl_ext_oneapi_enqueue_functions.
  • Group Load/Store: Added support for sycl_ext_oneapi_group_load_store, enabling native hardware block read/write capabilities where applicable.
  • Free Function Kernels: Initial support for sycl_ext_oneapi_free_function_kernels extension, with known limitations around argument types and diagnostics.
  • Fused Multiply-Add (FMA): Added experimental ESIMD function fma, which guarantees that a fused multiply-add operation is performed.
  • Improved sycl_ext_oneapi_group_sort extension: Updated implementation of the sycl_ext_oneapi_group_sort extension to match revision 2 of the specification. Revision 1 is no longer available, and some code changes may be required.
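
Why a guaranteed fused multiply-add matters can be shown on the host with the standard library's std::fma; this is not the ESIMD fma function itself, only a sketch of the numerical difference between one rounding and two. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Returns 2^exponent for a negative exponent, used to build test inputs.
double tiny_offset(int exponent) {
    assert(exponent < 0);
    return std::ldexp(1.0, exponent);
}

// Multiply and add as two separately rounded steps.
// The volatile product keeps the compiler from contracting them into an FMA.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60. The two-step version rounds that product to 1 and returns 0, while the fused version keeps the small term and returns -2^-60.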

Improvements

SYCL Compiler

  • Improved Compilation Flow: The process of generating integration footers has been optimized when no third-party host compiler is used, resulting in fewer temporary files and faster compilation times.

  • Additional Math Function Support: New support for math functions like truncf, sinpif, rsqrtf, exp10f, ceilf, copysignf, cospif, fmaxf, and fminf in SYCL kernels has been added as part of the C-CXX-StandardLibrary extension. Additional Intel Math Functions (IMF), as well as ::rand and ::srand in device code on Intel devices, have also been integrated.

  • Enhanced Error Messaging: Error messages have been improved for scenarios involving implicit this capture in kernels and missing architecture information when multiple targets are passed into the -fsycl-targets flag.

  • Optimized Compilation Flow: The number of commands needed for generating dependencies using the -MD flag has been reduced, streamlining the build process.

  • Security and Debugging: Security-related compilation flags for libraries and tools have been strengthened, and the debugging experience has been improved for both Linux and Windows environments.

SYCL Library

  • Support for ESIMD functions: Added support for sqrt and rsqrt functions for double data types in ESIMD.
  • Cubemap and Sampled Image Arrays Support: Updated sycl_ext_oneapi_bindless_images extension to support cubemap images and sampled image arrays.
  • Named Barrier Allocation in ESIMD: Introduced ESIMD API for dynamic allocation of named barriers.
  • Executable Command Graph Update: Added support for whole graph updates using executable_command_graph::update.
  • Deprecation Warning: A warning has been added for the use of the deprecated <CL/sycl.hpp> header.
  • Accessor Improvements: local_accessor::get_pointer and local_accessor::get_multi_ptr now throw an invalid exception if called on the host.
  • Queue Operations Detection: Extended detection of nested queue operations to support shortcut methods.
  • Simplified ESIMD API Interface: Added overloads of various ESIMD APIs (e.g., atomic_update, block_load, block_store) allowing omission of some template arguments.
  • Bfloat16 Math Functions: Updated sycl_ext_oneapi_bfloat16_math_functions to support vectors of bfloat16 passed to math functions.
  • Optimized sycl::vec::as: Improved the performance of sycl::vec::as by optimizing the implementation of sycl::detail::memcpy.
  • SYCL 2020 Exception Updates: Updated the implementation to throw SYCL 2020 exceptions instead of legacy SYCL 1.2.1 exceptions across the board.
  • sycl::vec::convert Support: Added support for sycl::vec::convert to and from vec<bfloat16, N>.
  • Deprecations: marray<bool, n>::operator++/-- and accessor::get_multi_ptr for non-device accessors have been deprecated.
  • ESIMD Named Barriers: Moved ESIMD named barrier APIs out of the experimental namespace.
  • SYCL Extensions and API Enhancements:
    • Implemented the latest revision of sycl_ext_oneapi_free_function_queries.
    • Extended sycl-ls --verbose to print detailed device information, including UUIDs and architecture.
    • Introduced support for compile-time properties in copy_to and copy_from ESIMD APIs.
  • Non-Variadic printf Interface: Switched experimental::printf to a non-variadic interface to improve usability when printing float values.
  • Enhanced ESIMD API Validation: Improved validation for rdregion and wrregion APIs using static assertions on template arguments.
  • SYCL 2020 Specification Alignment: Updated mutating swizzle operators and scalar conversions for vec to align with the SYCL 2020 specification.
  • Miscellaneous ESIMD Improvements:
    • Added support for 1- and 2-byte data types to ESIMD prefetch APIs.
    • Enabled ext_intel_matrix support for Intel GNR devices.
    • Introduced new overloads of load_2d, store_2d, and prefetch_2d ESIMD APIs with compile-time properties.
    • Added support for group shift algorithms (e.g., shift_group_left, permute_group_by_xor) for non-uniform groups.
    • Lifted restrictions on the ESIMD block_store API and enhanced the slm_atomic_update API to support fsub and fadd.
  • Graph and Semaphore Support:
    • Added support for graph update functionality and external semaphore wait/signal operations with values in the bindless images extension.
    • Introduced device-to-device copying of image_device_handle.
  • Unified Runtime: Removed the Plugin Interface, replacing it with the Unified Runtime, which reduces the number and size of redistributable libraries.
  • Performance Improvements: Reduced startup overhead of libsycl.so by outlining the SYCL JIT compiler into a standalone library, dynamically loaded on first use.
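
The group permute algorithms mentioned in the list above, such as permute_group_by_xor, can be pictured with a host-only emulation in which a vector stands in for the work-items of a group. The functions below are hypothetical helpers, not SYCL APIs; they assume the usual meaning that work-item id receives the value held by work-item id XOR mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side emulation of an XOR permutation across a group: lane `id`
// receives the value held by lane `id ^ mask`.
std::vector<int> permute_by_xor(const std::vector<int>& lanes, std::size_t mask) {
    std::vector<int> out(lanes.size());
    for (std::size_t id = 0; id < lanes.size(); ++id) {
        const std::size_t source = id ^ mask;
        assert(source < lanes.size());  // the mask must stay inside the group
        out[id] = lanes[source];
    }
    return out;
}

// Butterfly reduction built from XOR permutations: after log2(n) steps every
// lane holds the sum of all lanes. Assumes the lane count is a power of two.
std::vector<int> butterfly_sum(std::vector<int> lanes) {
    for (std::size_t mask = lanes.size() / 2; mask > 0; mask /= 2) {
        const std::vector<int> partner = permute_by_xor(lanes, mask);
        for (std::size_t id = 0; id < lanes.size(); ++id) {
            lanes[id] += partner[id];
        }
    }
    return lanes;
}
```

This exchange pattern is a common reason to reach for XOR permutations: each step pairs every lane with exactly one partner, so no lane sits idle.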

Bug Fixes

SYCL Compiler

  • Fixed a bug where using the -fsycl-link-targets flag would inadvertently trigger additional device code linking steps.
  • Resolved an issue where AOT-compiling for Intel GPUs would pass PVC-specific flags even if the target device was not a PVC.
  • Fixed a bug with incorrect file extensions being emitted in AOT compilation when using --save-temps.
  • Fixed an issue where performing separate compilation and linking with -fsycl-link resulted in a "number of output files and targets should match in unbundling mode" error during the link step.
  • Resolved an issue where passing pointers in the generic address space to certain built-in math functions could cause compilation failure.
  • Fixed a bug where compiling kernels with different reqd_work_group_size attributes using -fsycl-device-code-split=none could result in a runtime exception about mismatching work-group sizes.
  • Resolved a bug where using the reqd_work_group_size attribute with fewer than three arguments caused a crash.
  • Addressed issues with shift_group_[right|left], permute_by_xor, and select_from_group algorithms returning invalid values when used with the half data type.

SYCL Library

  • Fixed a situation where querying sycl::ext::oneapi::experimental::info::device could result in an exception instead of returning an empty vector.
  • Corrected the esimd::atan implementation under the -ffast-math flag.
  • Fixed an issue where component devices were not correctly identified as descendants of composite devices when creating a queue.
  • Addressed an issue where querying for composite devices could return duplicate entries.
  • Fixed bugs in the copy-constructor of the config_2d_mem_access ESIMD class, which led to compilation errors.
  • Resolved an issue where the use of atomic_ref<T*> was not detected as using the atomic64 aspect, leading to errors.
  • Fixed bugs with ctanh and cexp returning incorrect values in edge cases.
  • Fixed an issue where values passed to the -Xs option via build_options were not passed down to the device compiler.
  • Fixed a compilation error when defining kernels as named functors while using -fno-sycl-unnamed-lambda.
  • Corrected compilation issues with the -fpreview-breaking-changes flag caused by conflicts with macros in windows.h.
  • Resolved strict aliasing violations in the implementation of sycl::vec<sycl::half, N>::operator[] that caused errors.
  • Fixed bugs where barriers submitted to a command queue containing host tasks ignored those tasks, and improved synchronization of host tasks with barriers.
  • Fixed issues where the compiler could emit unsupported SPIR-V instructions for bit-reversal.
  • Addressed a bug where default-constructed local_accessor arguments could cause runtime errors, especially on Windows and under -O0 optimization on Linux.
  • Resolved a hang when invalid values were passed to the ONEAPI_DEVICE_SELECTOR.
  • Fixed issues with persistent cache functionality where certain setups would prevent necessary directories from being created.
  • Corrected a bug where querying a kernel by name from a kernel bundle could crash the program.
  • Fixed an error handling bug where non-blocking pipe operations would mistakenly throw exceptions.
  • Addressed compilation issues when using non-uniform group built-ins with marray and vec.
  • Resolved a bug where memory attributes applied to a struct used as a type of a device_global variable were ignored.
  • Added missing value_type and vector_t member type aliases to swizzles.
  • Fixed shutdown sequence issues when SYCL RT was used in applications or libraries with custom shutdown processes.
  • Resolved a crash when calling event::get_backend() on a default-constructed event in environments with malformed ONEAPI_DEVICE_SELECTOR.
  • Fixed a bug where sycl-ls with --ignore-device-selectors would not properly ignore the environment variable.
  • Corrected memory order capabilities returned by the Native CPU backend.
  • Fixed the variadic constructor of sycl::ext::oneapi::experimental::properties to match the extension specification.
  • Fixed build program failures when using ESIMD functions like load_2d, store_2d, or prefetch_2d.
  • Resolved a bug where querying free device memory on integrated Intel GPUs returned 0 instead of throwing an exception for unsupported features.
  • Addressed a heap buffer overflow in the sycl_ext_oneapi_kernel_compiler_opencl extension implementation.
  • Corrected a bug where the sycl_ext_oneapi_graph extension ignored the access mode of accessors, creating unnecessary graph edges.
  • Fixed issues where graph submissions involving barriers could result in runtime errors or cause resource leaks.
  • Addressed performance regressions when kernels without dependencies were submitted to in-order queues.
  • Fixed profiling issues in Level Zero backend where timestamps could be zeros or incorrect for in-order queues.
  • Resolved crashes when using multiple queues with immediate command list properties (immediate_command_list and no_immediate_command_list).
  • Fixed a bug on the Level Zero backend where info::kernel_device_specific::work_group_size returned the device-specific limit without taking the kernel into account.

Misc

SYCL Compiler

  • Reverted changes previously made on Windows to support a separate compilation scenario where the compilation step was performed without the -fsycl flag, but the link step included the -fsycl flag. This scenario is now considered unsupported, as the compiler does not know which version of the standard library to link during the link step.

API/ABI Breaking Changes in 2025.0

This release is an ABI-breaking release, meaning that any applications built with older versions of the toolchain must be recompiled to run with newer versions of the SYCL runtime library.

  • Bumped the major version of the SYCL runtime library to 8.
  • Cleaned up the list of symbols exported from the SYCL runtime library by dropping some legacy symbols and hiding others that should not have been exported.
  • Updated the ABI of several functions and methods to avoid using std::string and other objects in the library interface, allowing SYCL RT to be used in applications built with pre-C++11 ABI.
  • Changed the ext_oneapi_copy API from the experimental sycl_ext_oneapi_bindless_images extension to accept const-qualified types for the Src parameter.

Several API breaking changes were made, including dropping support for previously deprecated APIs and switching implementations of some classes to a preview implementation. Code modification recommendations for some of these breaking changes can be found here.

  • Removed the sycl::abs overload taking a floating-point argument.
  • Removed sycl::host_ptr and sycl::device_ptr.
  • Removed queue::discard_or_return.
  • Removed sycl::make_unique_ptr.
  • Removed the use_primary_context property and methods related to the previously removed host device.
  • Removed SYCL 1.2.1 exception subclasses, including runtime_error, nd_range_error, invalid_parameter_error, device_error, and feature_not_supported.
  • Removed queue::mem_advice overload accepting pi_mem_advice.
  • Removed several deprecated ESIMD APIs.
  • Removed the non-standard sycl::id -> sycl::range conversion operator.
  • Removed deprecated APIs from the sycl_ext_oneapi_bindless_images extension implementation.
  • Renamed the experimental destroy_external_semaphore API from the sycl_ext_oneapi_bindless_images extension to release_external_semaphore.
  • Replaced the image_channel_order field of the image_descriptor struct with the number of channels in the experimental sycl_ext_oneapi_bindless_images extension.
  • Enforced restrictions on the first argument of lambdas/functors passed to parallel_for(range) and parallel_for(nd_range).
  • Switched the sycl::vec implementation to its preview version, which uses a different storage type to fix several strict aliasing rule violations.
  • Restricted math operations available to vec<std::byte, N> to those applicable to std::byte.
  • Switched the sycl::exception implementation to its preview version.
  • Switched math built-ins implementation to use their preview version.
  • Switched bfloat16 implementation to use its preview version.
  • Switched sycl::nd_item implementation to use its preview version.
  • Enforced a restriction that a buffer's element type must be device copyable.
  • Restructured SYCL headers to exclude <cmath> and <complex>.
  • Dropped support for the SYCL_DEVICE_FILTER environment variable.
  • Updated the accessor::get_pointer interface to return global_ptr<value_type>, which can be const-qualified if the accessor data type is const-qualified or if the accessor is read-only.
  • Removed deprecated APIs related to sycl_ext_oneapi_free_function_queries.
  • Moved slm_allocator ESIMD APIs into the experimental namespace.
  • Removed the deprecated usm_system_allocator aspect.
  • Removed get_child_group API from the experimental sycl_ext_oneapi_root_group extension.
  • Simplified template arguments related to simd_view of many ESIMD APIs.
  • Removed ESIMD atomic_op::predec.
  • Dropped interfaces from revision 1 of the experimental sycl_ext_oneapi_group_sort extension.
  • Changed the return type of command_graph::begin_recording and command_graph::end_recording from void to bool in the experimental sycl_ext_oneapi_graph extension.
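
As a rough illustration of the header restructuring listed above: code that previously picked up <cmath> or <complex> transitively through the SYCL headers may now need to include them directly. The sketch below is plain C++ and deliberately includes no SYCL header. The helper names hypotenuse and magnitude are hypothetical and only show the explicit includes.

```cpp
// Include the standard math headers explicitly instead of relying on
// another header (for example a SYCL header) to pull them in.
#include <cmath>
#include <complex>

// Hypothetical host-side helper that needs <cmath>.
double hypotenuse(double a, double b) {
  return std::sqrt(a * a + b * b);
}

// Hypothetical host-side helper that needs <complex>.
double magnitude(const std::complex<double>& z) {
  return std::abs(z);
}
```

For instance, hypotenuse(3.0, 4.0) and magnitude({3.0, 4.0}) both evaluate to approximately 5.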

Breaking changes were also made to compiler flags:

  • Removed the deprecated -fsycl-link-huge-device-code, -fsycl-[add|link]-targets, -foffload-static-lib, -foffload-whole-static-lib, -fsycl-disable-range-rounding, and -sycl-std flags.

SYCL Known Issues

  • On Windows, the Unified Runtime's Level Zero leak check does not work correctly with the default contexts. This is due to the release of the plugin DLLs racing against the release of static global variables, such as the default context.
  • The Intel Graphics Compiler's Vector Compute backend does not support certain optimization levels and often produces incorrect results or crashes. This issue directly affects ESIMD code. As a temporary workaround, optimize ESIMD code even in the affected mode.
  • When using the sycl_ext_oneapi_matrix extension, it is important for some devices to use the appropriate settings corresponding to the device that will run the program, particularly for matrix operations using half data type.
  • When using queue shortcut functions with in-order queues, dependencies between commands submitted to different queues may be ignored. A workaround is to explicitly call .wait(). This issue will be fixed in the next release. In the example below, the second kernel will start execution before the first completes its execution:
      // q1: long-running task
      sycl::event e = q1.single_task([=]() { /* ... */ });
      // q2: task that depends on e
      q2.single_task(e, [=]() { /* ... */ });
  • C/C++ math built-ins can return incorrect results for some edge-case inputs when called from SYCL kernels.
  • To enhance performance on Intel® GPUs using the Unified Runtime Level Zero Adapter, support for driver-optimized in-order lists has been introduced in version 2025.0. However, when running workloads with sycl::property::queue::enable_profiling, some performance overhead from these lists is expected. If this overhead negatively impacts performance, it can be mitigated by disabling the driver in-order lists. To do so, set UR_L0_USE_DRIVER_INORDER_LISTS=0.
  • To ensure compatibility with the Intel® oneAPI DPC++ Compiler on Windows*, which requires OpenCL 3.0, it is essential to address potential issues caused by older versions of opencl.dll on your system. If an outdated opencl.dll is present in system directories or takes precedence in the library path, it may lead to failures, including SYCL-related issues and crashes in tools like Intel® VTune™ and Intel® Advisor when specific OpenCL 3.0 features are used. The recommended solution is to replace the old opencl.dll with the one installed in the DPC++ package. You can do this by copying the newer opencl.dll from $oneAPI_Install_Folder\compiler\latest\bin to your system folder. Be sure to back up the original opencl.dll in case it's needed for other applications.

  • sycl_ext_oneapi_free_function_kernels has limitations including:
    • free function kernels are only supported if defined at file scope
    • SYCL_EXTERNAL has to be used alongside SYCL_EXT_ONEAPI_FUNCTION_PROPERTY to define a free function kernel
    • compiler won't emit any diagnostics if some restrictions from the extension specification are violated
    • arguments of a free function kernel cannot be composite data types like structs or SYCL classes like accessor
    • using -fsycl-dead-args-optimization (ON by default) can lead to failures
    • info::kernel::num_args won't return the right result for free function kernels

New OpenMP Features

  • Support for the -fopenmp-offload-mandatory compiler flag to omit creation of host-fallback code and emit a runtime error if OpenMP offload to the device fails.
  • Improved optimization report support for OpenMP constructs.
  • Enhanced conversion scheme of nested loop constructs to consider loop trip counts.
  • Updates to the declare variant for a dispatch construct to include GPUs with the Xe2 architecture when the match clause specifies device={arch(gen)}.
  • Support for the device_type(host|nohost|any) clause for the target construct.
  • Inclusion of the if clause for the teams construct.
  • Change of the map-type property to "default," allowing map-type modifiers to be specified without a map-type. For example, map(always : x) is equivalent to map(always, tofrom : x).
  • Support for the Intel extension ompx_sub_group_size clause for the target construct to set the SIMD width of the kernel.
  • Support for the Intel extension ompx_dyn_cgroup_mem clause for the target construct, allowing dynamic allocation in SLM for GPU offloading.
  • Extension of environment variables OMP_THREAD_LIMIT, OMP_TEAMS_THREAD_LIMIT, and OMP_NUM_THREADS to support abstract names. For example, OMP_THREAD_LIMIT=n_cores.
  • Extension of the syntax of the environment variable OMP_PLACES to support bound and stride for abstract names. For example, OMP_PLACES=threads(4:2).
  • Host runtime support for the environment variable OMP_AVAILABLE_DEVICES.
  • Extension of the environment variable OMP_DEFAULT_DEVICE to support device selection by traits.
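
To make the map-type change above more concrete, here is a minimal sketch of where the always modifier matters. It uses the explicit form map(always, tofrom : x), which the note above describes as equivalent to map(always : x), so that compilers predating the change should also accept it. The function name refresh_and_increment is hypothetical. If OpenMP is disabled or no device is available, the code runs on the host and should produce the same value.

```cpp
// Hypothetical helper: returns host_value + 1, computed in a target region.
int refresh_and_increment(int host_value) {
  int x = 0;
  // x is mapped once for the whole data region.
  #pragma omp target data map(tofrom : x)
  {
    // Host-side update made after the initial transfer.
    x = host_value;

    // Because x is already present on the device, a plain map(tofrom : x)
    // would not copy it again. The always modifier forces the transfer to
    // the device on entry and back to the host on exit.
    #pragma omp target map(always, tofrom : x)
    {
      x += 1;
    }
  }
  return x;
}
```

Calling refresh_and_increment(41) is expected to return 42 under these assumptions.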

Notable OpenMP Fixes

  • Fixed a bug where the dispatch construct’s device clause was not updating OpenMP’s default-device-var ICV.
  • Resolved an internal compiler error when the declare variant for a dispatch construct did not specify an adjust_args clause.
  • Fixed an optimization bug in OpenMP for and simd loops with large trip counts.
  • Corrected a regression where enclosing task constructs inside a teams construct triggered a compiler error message.
  • When thread_limit is specified for both target and teams, the compiler now correctly chooses their minimum instead of always using the one specified for target.
  • Fixed an internal compiler error related to the initialization of global variables allocated in GPU’s SLM.
  • Addressed a problem in offload runtime where the reference counts of variables mapped using declare mapper were not decremented correctly.
  • Fixed a GPU offload performance issue related to L1 cache being affected by temporary copies of reduction variables.
  • Resolved a bug where user-defined reduction variables were not properly constructed or destructed.

OpenMP Known Issues

  • Implicit barriers at the end of parallel regions do not act as synchronization points for the tasks associated with target nowait and dispatch nowait constructs, which may lead to incorrect results or crashes. As a workaround, add #pragma omp taskwait at the end of the parallel region so that target nowait and dispatch nowait regions are synchronized at the point where the parallel region’s implicit barrier would otherwise have synchronized them.
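
The workaround described above might look like the following sketch. The function name offload_with_taskwait and the value 42 are illustrative only. A single thread launches the target nowait region, and an explicit taskwait at the end of the parallel region waits for it instead of relying on the implicit barrier.

```cpp
// Hypothetical helper showing the taskwait workaround.
int offload_with_taskwait() {
  int result = 0;
  #pragma omp parallel shared(result)
  {
    // Only one thread creates the deferred target task.
    #pragma omp single nowait
    {
      #pragma omp target nowait map(tofrom : result)
      {
        result = 42;
      }
    }
    // Workaround: each thread waits for the child tasks it generated,
    // including the target nowait task, before reaching the implicit
    // barrier that ends the parallel region.
    #pragma omp taskwait
  }
  return result;
}
```

The taskwait only waits for tasks generated by the thread that encounters it, which is why it is placed where every thread in the region executes it.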

Other Known Issues and Limitations

  • Visual Studio IDE Integration: Building a C++ project with 'Intel C++ Compiler 2025' for the Win32 platform fails. The Win32 platform is not supported by 'Intel C++ Compiler 2025', and projects must be built for the x64 platform only. If the Win32 platform is selected, the build reports an error that the ICX compiler was not found.

Hardware Support:

  • -march=lunarlake
  • -march=graniterapids

Please check here for details about -march usage. 

Toolchain Support for Intel Platforms

Component   Granite Rapids   Granite Rapids-D   Lunar Lake
GCC         13.1             13.1               14.1
Binutils    2.40             2.41               2.42
Glibc       2.37             2.37               2.39
LLVM        16.0             17.0               18.0
ICX         2023.1           2023.2             2024.0

C/C++ Standard

  • Intel® oneAPI DPC++/C++ Compiler version 2025.0 supports the C/C++ standards through the Clang 19 front end. 
  • Initiated support for C++2c, the next release of C++ after C++23, and C2y, the next release of C after C23
  • Finalized the implementation of “deducing this” (C++23)
  • Relaxed some constexpr restrictions (C++23)
  • Implemented the [[assume]] attribute (C++23)
  • Completed support for Concepts (C++20)
  • Added support for char8_t (C23)
  • Implemented the constexpr keyword for object declarations (C23)
  • Implemented #embed for embedding binary resources in source (C23)

System Requirements

Additional Documentation

Previous oneAPI Releases

Notices and Disclaimers

Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from a course of performance, course of dealing, or usage in trade.