Intel® oneAPI Math Kernel Library (oneMKL) Release Notes

ID 765830
Updated 11/20/2024
Version 2025.0.1
Public

Where to Find the Release

Intel® oneAPI Math Kernel Library

2025.0.1

System Requirements

This is a bugfix release.  

Fixed Issues 

  • Fixed issues in some BLAS functions on AMD hardware in Windows*.
  • Fixed accumulated execution time in subsequent OpenMP* offload calls.
  • The runme scripts for the Intel® Optimized LINPACK* Benchmark and Intel® Distribution for LINPACK* Benchmark have been replaced with documentation to resolve security issues. 

Optimization

  • Improved OpenMP* offload performance in LAPACK.

2025.0

System Requirements and Bug Fix Log

CET Support

Control-flow Enforcement Technology (CET) support has been enabled for all oneMKL domains. For more information about CET, refer to the article "A Technical Look at Intel’s Control-flow Enforcement Technology".

New Features and Optimizations

  • BLAS

    • Features
      • New out-of-place TRMM and TRSM variants are available for C and Fortran, including support for OpenMP* offload to Intel® GPUs. 
    • Optimizations 
      • Improved cblas_gemm_s8u8s32 and cblas_gemm_bf16bf16f32 performance for large problem sizes on the Intel® AMX (Advanced Matrix Extensions) architecture.
  • Sparse BLAS 

    • Features
      • New API for computing addition of two sparse matrices, oneapi::mkl::sparse::omatadd, is available for SYCL with support for CSR sparse matrix format. 
      • New out-of-place sparse matrix conversion API, oneapi::mkl::sparse::omatconvert, is available for SYCL with support for conversion between the CSR and COO sparse matrix formats. 
    • Optimizations 
      • Improved performance of the oneMKL Inspector-Executor Sparse BLAS C/Fortran APIs mkl_sparse_?_mv and mkl_sparse_?_mm for the BSR and CSR formats on the AVX512 ISA.
  • LAPACK 

    • Features 
      • Enabled C/Fortran OpenMP* offload support for least squares solver (?gels).
      • Enabled Fortran OpenMP* offload support for batched group LU factorization (?getrf_batch).
      • Added support for mkl_progress in Relatively Robust Representations eigensolver (?syevr).
      • Introduced support for least squares using QR/LQ with T matrix (?gelst) to LAPACK95 interfaces.
      • Updated LAPACK SYCL* USM APIs to be const correct for input arrays.
    • Optimizations 
      • Improved performance for double real/complex precision expert eigensolver (lapack::syevx, lapack::heevx) and generalized eigensolver (lapack::sygvx, lapack::hegvx) USM APIs on Intel® Data Center GPU Max Series as well as for C and Fortran OpenMP* offloading (dsyevx, zheevx, dsygvx, zhegvx).
      • Improved performance for batched group LU factorization (lapack::getrf_batch) on Intel® Data Center GPU Max Series, for multiple groups when all matrix sizes are <= 96.
      • Improved performance for batched group LU solve (lapack::getrs_batch) on Intel® GPUs, for multiple groups when all matrix sizes are <= 32.
      • Improved performance of eigensolver (?syev, ?heev) on CPU for very large matrices (n > 100K).
  • DFT  

    • Features 
      • Introduced new type-safe SYCL* DFT APIs.
  • Vector Math 

    • Features 
      • Improved the oneMKL VM exception reporting mechanism for certain functions that were not raising overflow exceptions (e.g., vAdd, vSub, vMul).
    • Optimizations 
      • Improved performance for six functions on Intel® Xeon® 6 Processors with Efficient-Cores (vdAcosh_EP, vsExpm1_HA, vsHypot_LA, vdRound_LA, vdRound_EP, vdRound_HA) by adjusting the unrolling factors.
  • Vector Statistics 

    • Features 
      • Introduced support for the Beta and Gamma distributions in the RNG Device API.
      • Introduced uint64_t type support for the uniform distribution in the RNG Device API.
      • Introduced (u)int8/int16 type support for the Bernoulli distribution in the RNG Device API.
  • Sparse Solvers 

    • Features 
      • Improved iterative refinement feature of oneMKL PARDISO and added optional printing of iterative refinement information.
    • Optimizations 
      • Improved performance of oneMKL PARDISO phase 1.
  • Library Engineering

    • Features 
      • A new macro __INTEL_MKL_PATCH__ and a new field PatchVersion in the MKLVersion structure have been introduced for the oneMKL patch version. In addition, the existing macro INTEL_MKL_VERSION now follows a new format, (__INTEL_MKL__ * 100 + __INTEL_MKL_UPDATE__) * 100 + __INTEL_MKL_PATCH__, which means that INTEL_MKL_VERSION will be 20250100 in oneMKL 2025.1.
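
The INTEL_MKL_VERSION format described in the Library Engineering item is plain integer arithmetic, so it can be checked by hand. The following is a minimal sketch in plain Python, independent of oneMKL. The function names encode_mkl_version and decode_mkl_version are hypothetical helpers introduced only for illustration, and their arguments stand in for the values of the __INTEL_MKL__, __INTEL_MKL_UPDATE__ and __INTEL_MKL_PATCH__ macros.

```python
def encode_mkl_version(major, update, patch):
    """Combine the three components using the format quoted in the release note:
    (major * 100 + update) * 100 + patch."""
    return (major * 100 + update) * 100 + patch


def decode_mkl_version(version):
    """Invert the encoding; assumes update and patch are each below 100."""
    rest, patch = divmod(version, 100)
    major, update = divmod(rest, 100)
    return major, update, patch


# The release note's own example: oneMKL 2025.1 with patch 0.
print(encode_mkl_version(2025, 1, 0))   # 20250100
print(decode_mkl_version(20250100))     # (2025, 1, 0)
```

Because each component occupies two decimal digits, encoded values compare in the same order as (major, update, patch) tuples, which is what makes a single integer convenient for version checks.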

Known Issues and Limitations  

  • oneMKL SYCL* functionality requires the OpenCL* runtime when the Level Zero SYCL* backend is used on Intel® GPUs.
  • BLAS gemm_batch_span may fail with complex double precision data on the Graphics for Intel® Core™ Ultra 200S series processor. 
  • BLAS gemm may produce wrong results for small matrices on the Graphics of Intel® Core™ Ultra Processors Series 2 if the beginning of the matrix data is not aligned to a 64-bit boundary. 
  • OpenMP* offload of some BLAS functions may hang or crash when using the OpenCL* backend on Intel® Arc™ A-Series Graphics. It is recommended to use the Level Zero backend in this case.
  • Performance regressions may be observed with this release compared to 2024.1 or older releases on AVX2 and older CPU architectures in C/Fortran Inspector-Executor Sparse BLAS oneMKL routines. 
  • oneMKL DFT SYCL* APIs that use SYCL* buffers for data input do not support SYCL* sub-buffer inputs for 1D complex FFTs with large power-of-two sizes in the range [2²¹, 2²⁶].
  • oneMKL FFT with a large prime factor (larger than 1024) may fail on Intel® Data Center GPU Max Series. 
  • Negative strides and distances are not supported with the oneMKL DFT SYCL* APIs. 
  • Some BLAS and FFT problems may crash on Intel® Arc™ B-Series Graphics with Linux when using the SYCL* buffer APIs. Use the SYCL* USM API instead on these platforms.
  • oneMKL FFT may crash on the Graphics for Intel® Core™ Ultra Processors Series 1 with Windows* when using the C or Fortran OpenMP* offload APIs with the OpenCL* runtime. Use the Level Zero runtime instead on these platforms.  
  • Inspector-Executor Sparse BLAS API mkl_sparse_?_mm() may give incorrect results when used with BSR format and column major blocks with block_size >= 6. 
  • Some C/Fortran OpenMP* offload examples are known to fail with oneMKL on Intel® Arc™ B-Series Graphics under Windows* when run in Debug mode due to a driver issue. Please use Release mode for this functionality on Intel® Arc™ B-Series Graphics under Windows*. 
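
Two of the FFT limitations above depend only on the transform length: power-of-two sizes in [2²¹, 2²⁶] (sub-buffer inputs for 1D complex FFT) and lengths with a large prime factor (Intel® Data Center GPU Max Series). As a rough pre-check, the sketch below flags such lengths in plain Python. The helper names are hypothetical, reading "large prime factor" as "any prime factor greater than 1024" is an assumption, and neither function queries the device, the backend, or the API in use, so a flagged length is only a hint that one of the limitations might apply.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n by trial division (1 when n == 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1 if p == 2 else 2  # after 2, try odd candidates only
    if n > 1:
        # Whatever remains is a prime larger than every factor found so far.
        largest = n
    return largest


def has_large_prime_factor(length, limit=1024):
    """True if the length has a prime factor above the limit (assumed reading)."""
    return largest_prime_factor(length) > limit


def is_pow2_in_limited_range(length, lo_exp=21, hi_exp=26):
    """True if the length is a power of two within [2**lo_exp, 2**hi_exp]."""
    is_pow2 = length > 0 and (length & (length - 1)) == 0
    return is_pow2 and (1 << lo_exp) <= length <= (1 << hi_exp)


# Print both flags for a few sample lengths.
for n in (1 << 20, 1 << 21, 2 * 1031, 3 * 1021):
    print(n, is_pow2_in_limited_range(n), has_large_prime_factor(n))
```

Trial division is enough here because typical FFT lengths are small enough to factor almost instantly; a sieve or a faster factorization method would only matter for much larger inputs.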

Deprecation 

  • oneMKL support for Cloudera Distribution Channel has been deprecated since 2025.0 and will be removed starting from 2026.0. 
  • The INPUT_STRIDES and OUTPUT_STRIDES configuration parameters have been deprecated for the oneMKL SYCL* DFT APIs since the 2024.1 release, and will be removed in the oneMKL 2026.0 release. Please use the FWD_STRIDES and BWD_STRIDES configuration parameters instead.  
  • The variadic set_value and get_value member functions of the oneapi::mkl::dft::descriptor class have been deprecated and will be removed in the oneMKL 2026.0 release. Use the new non-variadic functions instead.
  • The oneapi/mkl/dfti.hpp header file has been deprecated and will be removed in the oneMKL 2026.0 release. Use the newly introduced oneapi/mkl/dft.hpp header file instead. 
  • oneapi::mkl::dft::config_param::VERSION has been deprecated and will be removed in the oneMKL 2026.0 release. Use the MKL_Get_Version_String function instead.
  • The CONJUGATE_EVEN_STORAGE and PACKED_FORMAT values of the oneapi::mkl::dft::config_param enum class and the COMPLEX_REAL, CCE_FORMAT, PERM_FORMAT, PACK_FORMAT and CCS_FORMAT values of the oneapi::mkl::dft::config_value enum class have been deprecated. They will be removed in the oneMKL 2026.0 release.

Removal 

  • Support for the target variant dispatch construct in the Intel Extensions to OpenMP* has been removed. Use the dispatch construct defined in the OpenMP* specification instead.
  • The NUMBER_OF_USER_THREADS, TRANSPOSE, ORDERING and REAL_STORAGE values have been removed from the oneapi::mkl::dft::config_param enum class. The corresponding DFTI_ORDERED, DFTI_BACKWARD_SCRAMBLED, and DFTI_NONE have been removed from the set of possible configuration values for the SYCL* DFT APIs. 
  • The variants of oneapi::mkl::sparse::set_csr_data() and oneapi::mkl::sparse::release_matrix_handle() that do not take a sycl::queue argument, which were deprecated in the 2023.0 release, have been removed from Sparse BLAS in the 2025.0 release.
  • The undocumented LAPACK routines {S,D}COMBSSQ, which were removed from Netlib LAPACK 3.10.1 and deprecated in the oneMKL 2023.0 release, have been removed in the oneMKL 2025.0 release. 
  • Previously deprecated std::vector based constructors have been removed for gaussian_mv, multinomial and poisson_v Host API random number distributions. 

Previous oneAPI Releases

2024

Release Notes, System Requirements and Bug Fix Log

2023

Release Notes, System Requirements and Bug Fix Log

2022

Release Notes, System Requirements and Bug Fix Log

2021

Release Notes, System Requirements and Bug Fix Log

2017-2020

Release Notes, System Requirements and Bug Fix Log

Notices and Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.