Intel® oneAPI Base Toolkit delivers a unified language and libraries that offer full native code support across a range of hardware, including Intel® and compatible processors, Intel® Processor Graphics Gen9 and Gen11, Intel® Iris® Xe MAX graphics, and Intel® Arria® 10 and Intel® Stratix® 10 SX FPGAs. It supports both direct programming and API-based programming models, and it also contains analysis and debug tools for development and performance tuning.
Major Features Supported
Please visit Intel® oneAPI Toolkit and Component Versioning Schema for semantic versioning schema detail.
New in 2021.4 Product Release
Key Features at toolkit level
- The Intel® oneAPI DPC++/C++ Compiler and Intel® oneAPI DPC++ Library expanded their SYCL* 2020 feature set and conformance, improving programming productivity on various hardware accelerators.
- oneMKL adds GPU support, through DPC++ and OpenMP offload APIs, to the Random Number Generator (RNG) Multinomial, PoissonV, Hypergeometric, Negative Binomial, and Binomial distributions.
- Flame Graph for Hotspot Analysis in Intel® VTune™ Profiler allows visualization of hot code paths and time spent in each function and its callees. Intel® Advisor's GPU Roofline offers actionable recommendations to maximize GPU utilization for user code analysis.
- FPGA Simulation Flow allows oneAPI designs to run on industry-standard RTL simulators.
- Added new Diagnostics Utility for Intel® oneAPI Toolkits to diagnose the system status for using Intel® products. Learn more.
Intel® oneAPI DPC++/C++ Compiler 2021.4.0
- Implemented performance improvements for Ice Lake
- Implemented several new SYCL 2020 features including support for specialization constants, sub_group algorithms, USM features, and interoperability API among others.
- Added numerous ExplicitSIMD improvements and features.
- Added new OpenMP 5.0/5.1 support including #pragma omp prefetch support
- Added support for FPGA simulation flow.
Intel® oneAPI DPC++ Library (oneDPL) 2021.5.0
- Added new random number distributions: exponential_distribution, bernoulli_distribution, geometric_distribution, lognormal_distribution, weibull_distribution, cauchy_distribution, extreme_value_distribution.
- Added the serial-based versions of the following algorithms: all_of, any_of, none_of, count, count_if, for_each, find, find_if, find_if_not.
- Improved performance of the search and find_end algorithms on GPU devices.
Intel® DPC++ Compatibility Tool 2021.4.0
- Enabled partial migration of CUB* API, and improved Thrust* API migration
- Introduced three levels of helper header files customization (API, File, All) to make the header files more maintainable.
- Enabled support for CUDA 11.4 header files.
Intel® oneAPI Math Kernel Library (oneMKL) 2021.4.0
- Enabled oneMKL Random Number Generators (RNG) Multinomial, PoissonV, Hypergeometric, Negative Binomial and Binomial distribution on GPU (with DPC++ and OpenMP* offload APIs)
- Added new DPC++ API for sparse-sparse matrix multiply (sparse::matmat), column-major layout support in DPC++ sparse::gemm, and C/C++ OpenMP offload for Sparse BLAS Inspector-Executor (IE) trsv
- GPU-based implementations of single- and double-precision complex exp, log, and sqrt are now available
Intel® oneAPI Threading Building Blocks (oneTBB) 2021.4.0
- Added the collaborative_call_once algorithm to help boost the performance of application initialization routines by allowing them to be executed in parallel.
- Enabled full support of Address Sanitizer and Thread Sanitizer which allows developers to use Sanitizers in their applications.
Intel® Distribution for GDB* 2021.4.0
- Usability improvements and fixes in the Visual Studio integration
- Performance improvements when debugging large applications on GPU
- Improved variable-length string debugging for Fortran
Intel® Integrated Performance Primitives (Intel IPP) 2021.4.0
- Enabled 16-bit float Discrete Fourier Transform (DFT) and Fast Fourier Transform (FFT) functions within Intel® IPP, optimized for the next-generation Intel® Xeon® Scalable processor (code named Sapphire Rapids) and focused on the various lengths supported in 5G environments.
- Enabled Intel® IPP Cryptography Multi-buffer(MB) functions for 3rd Generation Intel® Xeon® Processor Scalable and 10th Gen Intel® Core™ Processors for the following:
- SM4 block cipher standard modes (OFB, CFB, ECB, CBC, CTR)
- ECDSA Ed25519 Verify API
- ECDH SM2 and ECDSA SM2 Sign universal and SSL APIs
- 1Kb, 2Kb, 3Kb and 4Kb modular exponentiations
- Extended Intel® IPP Image and Signal Processing functions for 3rd Generation Intel® Xeon® Processor Scalable and 10th Gen Intel® Core™ Processors for the following:
- Multi-rate FIR filtering
- Resize Supersampling (one-channel, floating-point)
- Resize Antialiasing (one-channel, double-precision floating-point)
Intel® oneAPI Collective Communications Library (oneCCL) 2021.4.0
- Optimized allreduce, broadcast and reduce for 2 ranks on ATS 2T.
- Added support for memory binding of worker threads.
- Added support for NIC filtering by name for OFI-based multi-NIC.
- Added IPv6 support for KVS.
- Bug fixes.
Intel® oneAPI Data Analytics Library (oneDAL) 2021.4.0
- Intel® Extension for Scikit-learn*: Enabled Global patching of full Scikit-learn application
- Intel® Extension for Scikit-learn*: Integration with dpctl for heterogeneous computing (support of dpctl.tensor.usm_ndarray for input and output)
- Improved accuracy and performance of Random Forest, Nu-SVMs, SVR and KMeans
Intel® oneAPI Deep Neural Networks Library (oneDNN) 2021.4.0
- Introduced initial performance optimizations for DG2 and ATS-M
- Improved primitive cache performance on GPUs
- Reduced library size by 7-29%, depending on platform and configuration. DLL and shared object sizes do not exceed 150 MB.
Intel® oneAPI Video Processing Library (oneVPL) 2021.6.0
- oneVPL 2021.6.0 has been updated to include functional and security updates. Users should update to the latest version.
- Updated dispatcher and CPU runtime to API 2.5
- Internal memory support added to dpcpp-blur sample
- Added option to build dispatcher as a static library
- Added ability to build dispatcher under MinGW
- HEVC 4:2:2 decode support added to CPU runtime
Intel® Distribution for Python* 2021.4.0
- Numba-dppy works as an extension to off-the-shelf Numba 0.54.0.
- Pandas.MultiIndex support added in SDC.
- Numba-dppy’s @dppy.kernel now supports __sycl_usm_array_interface through dpctl’s usm_ndarray.
Intel® Advisor 2021.4.0
- Get actionable Recommendations to maximize GPU utilization for user code analysis with GPU Roofline.
- Estimate the performance benefits of a future GPU for existing GPU code with Offload Advisor GPU-to-GPU performance modeling, available from both the command line and the GUI.
- Execute profiling and performance projection with a single command line via the Offload Modeling Collection Preset, and analyze the results of GPU Roofline profiling and offload modeling in a browser with an interactive HTML report.
Intel® VTune™ Profiler 2021.7.1
- Gain insights into system configuration, performance and behavior with Platform Profiler - now fully integrated into Intel® VTune™ Profiler.
- Analyze workloads at scale to identify outliers and pinpoint where they occurred using Intel® VTune™ Profiler's Application Performance Snapshot.
- New Platform Diagram in the Memory Access Analysis: Visualize socket-to-DRAM and socket-to-socket memory bandwidth utilization on a system topology to understand NUMA related performance problems.
- Occupancy Metrics in the GPU Compute/Media Hotspots Analysis: Identify and improve occupancy issues preventing your GPU offload code from efficient usage of available EU HW threads.
- Expanded command line for perf tool: Profile Linux targets by easily generating command line parameters for the native perf tool for all analysis types, including custom analyses. Collect perf trace on a target with Linux Perf tool and import to Intel® VTune™ Profiler UI to analyze and visualize results for fast insights.
- Flame Graph, added for Hotspot Analysis, visualizes hot code paths and the time spent in each function and its callees.
- The GPU Offload Analysis now presents a richer set of information about execution on the GPU, including data transfer data, to help identify inefficient code paths in your application.
Diagnostics Utility for Intel® oneAPI Toolkits
The Diagnostics Utility for Intel® oneAPI Toolkits is designed to diagnose the system status for using Intel® products. This is the first Preview release supported on Linux Ubuntu 20.04 LTS, SLES 15 SP2 and RHEL 8.2. With this utility, you can identify errors such as:
- Permission errors for the current user
- Missing driver or an incompatible version of a driver
- Incompatible version or configuration of the Operating System
More details on usage can be found in the Diagnostics Utility for Intel® oneAPI Toolkits User Guide.
Intel® FPGA Add-On for oneAPI Base Toolkit 2021.4.0 (Optional)
- Added support for the Intel® custom platforms with Intel® Quartus® Prime software version 21.2.
- Added support for Questa*-Intel® FPGA Edition and Questa*-Intel® FPGA Starter Edition simulators.
- Added support for FPGA simulation flow.
New in 2021.3 Product Release
- Support for the 3rd Gen Intel® Xeon® Scalable processors (code name Ice Lake Server).
- New distributions available via Spack, a package manager for HPC, and via Anaconda Cloud distribution's anaconda-defaults channels.
- New C APIs, preview C++ and Python APIs, and samples for the Intel oneAPI Video Processing Library
- Installation option --continue-with-optional-error, which was added by mistake in an earlier release, is now removed.
- Performance, stability, and security improvements
Intel® oneAPI DPC++/C++ Compiler 2021.3.0
- Enabled support for SYCL 2020 features sycl::kernel_bundle and reduction (partial support).
- Enabled support for the DPC++ extension to allocate static local memory in SYCL kernels
- DPC++ extension Explicit SIMD now supports
- Coexistence of ESIMD and regular SYCL kernels in the same source
- Indirect read and write methods in ESIMD class
- Enabled support for OpenMP 5.0/5.1 user defined mapper (declare mapper) and OpenMP 5.1 dispatch variant support.
- Now available via Spack, a package manager for HPC and via Anaconda Cloud distribution's anaconda-defaults channels.
Intel® oneAPI DPC++ Library (oneDPL) 2021.3.0 & 2021.4.0
- Added the range-based versions of the following algorithms: adjacent_find, all_of, any_of, count, copy_if, count_if, equal, move, none_of, remove, remove_copy, remove_copy_if, remove_if, replace, replace_if, rotate_copy, reverse, reverse_copy, swap_ranges, unique, unique_copy.
- Improved performance of discard_block_engine (including ranlux24, ranlux48, ranlux24_vec, ranlux48_vec predefined engines) and normal_distribution.
Intel® DPC++ Compatibility Tool 2021.3.0
- Enabled support for CUDA 11.2 and 11.3 header files
- More API migration coverage:
- More migration coverage of CUDA Driver API
- Partial migration of CUDA memory fence API
- More migration coverage for Thrust and cuRand
- Improved the migration of API, types, and macros to cover more scenarios
- Reduced the time of the migration: 28%-49% improvement
Intel® oneAPI Math Kernel Library (oneMKL) 2021.3.0
- New performance optimizations
- Added GPU support for additional RNG algorithms
- Introduced CMake config file support
- Enabled DPC++ dynamic libraries support for all DPC++ enabled functionality on Windows*
- Added debug versions of mkl_sycl and mkl_tbb_thread libraries on Windows*
- Now available via Spack, a package manager for HPC.
Intel® oneAPI Threading Building Blocks (oneTBB) 2021.3.0
- More C++20 support, which now allows enforcing requirements on argument types to ensure that a developer uses those argument types correctly.
- Preview of the following features:
- Extended the high-level task API to simplify migration from Intel® Threading Building Blocks (TBB) to Intel® oneAPI Threading Building Blocks (oneTBB).
- Added mutex and rw_mutex that are suitable for long critical sections and resistant to high contention.
- Added ability to customize the concurrent_hash_map mutex type.
- Added heterogeneous lookup, erase, and insert operations to concurrent_hash_map.
Intel® Distribution for GDB* 2021.3.0
- Enhanced multi-GPU debug capabilities
- Improved application debugging experience in MS Visual Studio* IDE.
- Added a new sample to demonstrate application debug
Intel® Integrated Performance Primitives (Intel IPP) 2021.3.0
- Image processing and decompression optimizations
- Now available via Spack, a package manager for HPC and via Anaconda Cloud distribution's anaconda-defaults channels.
Intel® oneAPI Collective Communications Library (oneCCL) 2021.3.0
- Added OFI-based multi-NIC support
- Added OFI/psm3 provider support
- Bug fixes
Intel® oneAPI Data Analytics Library (oneDAL) 2021.3.0
- Graphic performance optimizations
- Added support for Intel(R) Extension for Scikit-learn
- Added new SVM regression and classification algorithms
- Now available via Spack, a package manager for HPC.
Intel® oneAPI Deep Neural Networks Library (oneDNN) 2021.3.0
- Introduced support for DPC++ debug configuration on Windows
- Updated the minimum supported CMake version to 2.8.12 (was 2.8.11)
- Performance optimizations
- Extended batch normalization and layer normalization primitives API to take separate scale and shift arguments.
- Extended resampling primitive with post-ops support and mixed source and destination data types.
- Now available via Spack, a package manager for HPC.
Intel® oneAPI Video Processing Library (oneVPL) 2021.3.0
- C API implementation of oneVPL API 2.4
- OpenVINO Interop Samples for Linux
- C++ API and Samples (Preview)
- Python API and Samples (Preview)
- Samples demonstrating basic oneVPL, core API, and interop
- Added logging capability to Dispatcher
- Now available via Spack, a package manager for HPC.
Intel® Distribution for Python* 2021.3.0
- Python 3.8 is now supported
- Updated Numba to version 0.53
- numba-dppy now supports native floating point atomics
- Patches & security updates
- Documentation updates
Intel® Advisor 2021.3.0
- Get actionable recommendations to maximize GPU utilization using Offload Advisor Guidance.
- Improved GPU Roofline Analysis: Gain insights into memory bound codes to remove memory subsystems bottlenecks. Get instance breakdown of each kernel to compare performance characteristics of different workloads.
- Technical Preview Feature: Estimate performance benefits of a future GPU for existing GPU code with Offload Advisor GPU-to-GPU performance modeling.
Intel® VTune™ Profiler 2021.5.0
- Optimize GPU offload schema with improved data transfer analysis between CPU and GPU. Boost hottest compute kernels performance via automatic detection of reasons limiting the peak achievable GPU occupancy.
- Analyze workloads at scale to identify outliers using Intel® VTune™ Profiler's Application Performance Snapshot.
- Easily identify vectorization issues using enriched Performance Snapshot and HPC Performance Characterization
Intel® FPGA Add-On for oneAPI Base Toolkit 2021.3.0 (Optional)
- Added support for Intel® custom platforms with Intel® Quartus® Prime software version 21.1.
New in 2021.2 Product Release
- FPGA add-on component is integrated into Linux* online-installer
- CMake support for performance libraries on Windows* and Linux
Intel® oneAPI DPC++/C++ Compiler
- More SYCL 2020 features implemented, including "device_has", aspects, math array, global work offset in kernel enqueue
- Experimental Explicit SIMD (ESIMD) extension supports Level Zero runtime and available on Windows host
- Fast math is enabled by default (i.e., -fp-model=fast)
- FPGA support:
- Added support for targeting multiple homogeneous FPGA devices with the same or different device codes.
- Intel C++ Compiler:
- Vectorization improvements
Intel® oneAPI DPC++ Library (oneDPL)
- New experimental asynchronous interfaces to enable concurrent execution of STL-like algorithms and interoperability with SYCL event-based control flow, and support for the "set" algorithms, matching Thrust 1.11.0 and Boost.Compute asynchronous algorithms.
- New support for the following:
- Parallel, vector, and DPC++ execution policies for these algorithms: shift_left, shift_right.
- Range-based versions of these algorithms: sort, stable_sort, merge.
- A new macro, ONEDPL_USE_PREDEFINED_POLICIES, that can disable predefined policy objects and enable functions without arguments if needed.
- Other performance improvements.
Intel® DPC++ Compatibility Tool
- Enabled partial migration of cuFFT API calls to oneMKL API calls, added initial migration of CUDA Driver API calls, improved Thrust API calls migration coverage
- The tool can now merge the migrated source code from different branches of a conditional compilation into a single DPC++ source
- Better coverage for specific code cases, more hints for developers, bug fixes.
Intel® oneAPI Math Kernel Library (oneMKL)
- Introduced Poisson and Exponential distributions with DPC++ device APIs.
- Introduced Strided API support for DPC++ and C/Fortran OpenMP Offload.
- Enabled OpenMP offload support for an extended set of functions across different components.
Intel® oneAPI Threading Building Blocks (oneTBB)
- Added three-way comparison operators for concurrent ordered containers and concurrent_vector
- Preview: Extended task_arena constraints to support Intel Hybrid Technology and Intel Hyper-Threading Technology.
Intel® Distribution for GDB*
- New feature: Multi-GPU support
- Bug fixes
Intel® Integrated Performance Primitives (Intel IPP)
- Added support and optimized LZ4 1.9.3 version in Intel® IPP Data compression
- Image processing:
- Added floating-point shift support in ippiResizeSuper function.
- New precise bilateral filter for image smoothing based on iterative least square method.
Intel® oneAPI Collective Communications Library (oneCCL)
- Added float16 datatype support
- Added ip-port hint for customization of KVS creation
- Optimizations on communicator creation phase and multi-GPU collectives for single-node case
Intel® oneAPI Data Analytics Library (oneDAL)
- Performance optimizations for Random Forest, PCA, SVM algorithms for CPU
- Introduced bit-to-bit results reproducibility for CPU algorithms
- Implemented Multi-Node Multi-GPU PCA, Low Order Moments, KMeans algorithms
- Bug-fixes, performance improvements
Intel® oneAPI Deep Neural Networks Library (oneDNN)
- Introduced initial optimizations for bfloat16 functionality for future Intel Xeon Scalable processor with Intel AMX support (code name Sapphire Rapids).
- Introduced initial performance optimizations for future Intel Core processor with Intel AVX2 and Intel DL Boost instructions support (code name Alder Lake).
- Introduced binary post-op for (de)-convolution, pooling, eltwise, binary, inner product, matmul and reduction (GPU only) along with performance optimizations for CPUs and GPUs. Extended the number of supported post-ops for primitives to 20.
- Extended eltwise with support for logsigmoid, mish, hardswish, and clip_v2 algorithms, and binary primitive with support for comparison operators.
Intel® oneAPI Video Processing Library (oneVPL)
- Dispatcher and CPU implementation updated to align with oneVPL Specification 2.2
- Intel(R) Media SDK to oneVPL migration guide
- Windows* 32-bit support
Intel® Distribution for Python*
- GPU support in XGBoost
- numpy package update to v1.20.1
- Support Level Zero driver in numba-dppy
- Introduced bit-to-bit results reproducibility for Scikit-learn patches on CPU
- Bug-fixes, performance improvements, and improvements in documentation, user guides and examples
Intel® Advisor
- Advisor Vectorization and Roofline analysis support for Intel® microarchitecture processors code named Tiger Lake, Ice Lake, and Cooper Lake
- Modernized Source View analysis for Offload Modeling and GPU Roofline
- Detailed GPU Kernel Analytics view
- GPU Performance Projection now considers instruction latencies, which increases accuracy
Intel® VTune™ Profiler
- User Interface: a new main vertical toolbar to enhance user experience
- Hardware Support: Support for Intel Atom® Processor P Series code named Snow Ridge, including Hotspots, Microarchitecture Exploration, Memory Access, and Input and Output analyses.
- GPU Accelerators: Source-level analysis for DPC++ and OpenMP applications running on GPU over Level Zero
- Input and Output analysis:
- a new Platform Diagram; extended Intel® Data Direct I/O (Intel DDIO) utilization efficiency metrics.
- Support for non-root Linux perf-based data collection on 1st and 2nd Generation Intel® Xeon® Scalable processors on Linux kernel versions 5.10 and newer.
Intel® FPGA Add-On for oneAPI Base Toolkit (Optional)
- Added Quartus 20.4 support
New in 2021.1 Product Release
- Supports three platforms: Linux*, Windows*, and macOS*; the available products differ per platform.
- Please follow the Installation Guide of oneAPI Toolkit to install the latest GPU driver for your operating system.
- The default installation paths are listed below. We recommend uninstalling any previous beta releases before installing this product release.
- Linux or macOS: /opt/intel/oneapi
- Windows: C:\Program Files (x86)\Intel\oneAPI
- If you have been using the oneAPI Base Toolkit beta release, Intel® Parallel Studio, or Intel® System Studio for your application, rebuild your whole application with the oneAPI Base Toolkit product release when moving to it.
- For Linux and Windows, the Intel® oneAPI DPC++/C++ Compiler is included, which provides a C/C++ compiler (icx) and a DPC++ compiler (dpcpp).
- For device offload code, the Level Zero runtime is the default backend. Follow the instructions here to change the backend to OpenCL* if needed. Not all library APIs or products support both the Level Zero and OpenCL backends. Please read the product-level Release Notes and documentation for details.
- The Intel® oneAPI Base Toolkit supports co-existence, or side-by-side installation, with Intel® Parallel Studio XE or Intel® System Studio on Linux*, Windows*, and macOS*.
- Support for YUM and APT distribution of the oneAPI Toolkits, plus additional distribution channels for the performance libraries via Conda, PIP, and NuGet.
Intel® oneAPI DPC++/C++ Compiler
- Intel® oneAPI DPC++ Compiler
- Support for DPC++ 1.0 specification
- Support of Ahead-Of-Time (AOT) compilation
- Experimental Explicit SIMD programming support
- Integration with Visual Studio* 2017 & 2019, plus Eclipse* on Linux
- Support for targeting multiple FPGA platforms.
- Intel® C++ Compiler:
- Clang and LLVM based compiler with driver name "icx"
- OpenMP 4.5 and Subset of OpenMP 5.0 with offloading support
- Vectorization and Loop Optimizations
Intel® oneAPI DPC++ Library (oneDPL)
- Supports the oneDPL Specification v1.0, including parallel algorithms, DPC++ execution policies, special iterators, and other utilities.
- oneDPL algorithms can work with data in DPC++ buffers as well as in unified shared memory (USM).
- A subset of the standard C++ libraries is supported in DPC++ kernels, including "<array>", "<complex>", "<functional>", "<tuple>", "<utility>" and other standard library API
- Standard C++ random number generators and distributions for use in DPC++ kernels.
Intel® DPC++ Compatibility Tool
- Support for migration of CUDA* kernels, host and device API calls (for example, memory management, events, math, etc.) and library calls (cuBLAS, cuSPARSE, cuSolver, cuRand, Nvidia* Thrust*). Typically, 80%-90% of CUDA code is migrated to DPC++ code by the tool.
- Warning messages are emitted to command line output and inlined into the generated code, when the code requires manual work to help you finish the application.
- Integration with Visual Studio* 2017 and 2019 on Windows and Eclipse* on Linux provides enhanced migration usability.
Intel® oneAPI Math Kernel Library (oneMKL)
- With this release, the product previously known as the Intel® Math Kernel Library becomes the Intel® oneAPI Math Kernel Library (oneMKL).
- Added support for the following programming models: Data Parallel C++ (DPC++) APIs support programming for both the CPU and Intel GPUs, and C/Fortran OpenMP Offload interfaces program Intel GPUs.
- Introduced Unified Shared Memory (USM) support for Intel Processor Graphics and Xe architecture-based graphics.
Intel® oneAPI Threading Building Blocks (oneTBB)
- Changes affecting backward compatibility
- The code base was revamped to improve the usability and simplify the library, see TBB Revamp: Background, Changes, and Modernization. This version of the library is not backward compatible with any of the previous releases.
- New features:
- Concurrent ordered containers, task_arena interface extension for NUMA, flow graph API to support relative priorities for functional nodes and resumable tasks are fully supported now.
- Implemented task_arena interface extension to specify priority of the arena.
Intel® Distribution for GDB*
- Supports debugging kernels offloaded to the CPU, GPU and FPGA-emulation devices.
- Automatically attaches to the GPU device to listen to debug events.
- Automatically detects JIT-compiled, or dynamically loaded, kernel code for debugging.
- Supports DPC++, C++ OpenMP offload debugging, and OpenCL.
- Provides the ability to list active SIMD lanes and to switch the current SIMD lane context per thread
Intel® Integrated Performance Primitives (Intel IPP)
- CPU only support
- Extended optimization for Intel® IPP Cryptography cipher AES, RSA support on 10th Generation Intel® Core™ processor family.
- Added new universal CRC function to compute CRC8, CRC16, CRC24, CRC32 checksums
Intel® oneAPI Collective Communications Library (oneCCL)
- Enables efficient implementations of collectives used for deep learning training (allgatherv, allreduce, alltoall(v), broadcast, reduce, reduce_scatter)
- Provides C++ API and interoperability with DPC++
- Deep Learning Optimizations include:
- Asynchronous progress for compute communication overlap
- Dedication of cores to ensure optimal network use
- Message prioritization, persistence, and out-of-order execution
- Collectives in low-precision data types (int[8,16,32,64], fp[32,64], bf16)
- Linux* OS support only
Intel® oneAPI Data Analytics Library (oneDAL)
- Renamed the library from Intel® Data Analytics Acceleration Library to oneAPI Data Analytics Library and changed the package names to reflect this.
- Deprecated 32-bit version of the library.
- Introduced Intel GPU support for both OpenCL and Level Zero backends.
- Aligned the library with oneDAL Specification 1.0 for the following algorithms on both CPU/GPU:
- K-means, PCA, Random Forest Classification and Regression, kNN and SVM
- Introduced new Intel® DAAL and daal4py functionality on GPU:
- Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
- Online algorithms: Covariance, PCA, Linear Regression and Low-order moments
- Added Data Management functionality to support DPC++ APIs: a new table type for representation of SYCL-based numeric tables (SyclNumericTable) and an optimized CSV data source
- Added Technical Preview Features in Graph Analytics on CPU - Jaccard Similarity Coefficients
Intel® oneAPI Deep Neural Networks Library (oneDNN)
- Introduced SYCL* API extensions compliant with oneAPI specification v1.0.
- Introduced support for Intel(R) DPC++ Compiler and Level Zero runtime.
- Introduced Unified Shared Memory (USM) support for Intel Processor Graphics and Xe architecture-based graphics.
Intel® oneAPI Video Processing Library (oneVPL)
- AVC/H.264, HEVC/H.265, MJPEG, and AV1 software decode and encode
- Video processing (resize, color conversion, and crop)
- Frame memory management with user interface and internally allocated buffers
- DPC++ kernel integration
Intel® Distribution for Python*
- Machine Learning: XGBoost 1.2 with new CPU optimizations, and new Scikit-learn and daal4py optimizations including Random Forest Classification/Regression, kNN, sparse K-means, DBSCAN, SVM, SVC, Random Forest, Logistic Regression, and more.
- Initial GPU support: GPU-enabled Data Parallel NumPy* (dpnp); DPCTL, a new Python package for device, queue, and USM data management with initial support in dpnp, scikit-learn, daal4py, and numba; daal4py optimizations for GPU; and GPU support in scikit-learn for DBSCAN, K-Means, Linear Regression and Logistic Regression.
- Intel® Scalable Dataframe Compiler (Intel® SDC) Beta – Numba extension for accelerating Pandas*
Intel® Advisor
- Offload Advisor: Get your code ready for efficient GPU offload even before you have the hardware. Identify offload opportunities, quantify potential speedup, locate bottlenecks, estimate data transfer costs, and get guidance on how to optimize.
- Automated Roofline Analysis for GPUs: Visualize actual performance of GPU kernels against hardware-imposed performance limitations and get recommendations for effective memory vs. compute optimization.
- Memory-level Roofline Analysis: Pinpoint exact memory hierarchy bottlenecks (L1, L2, L3 or DRAM).
- Flow Graph Analyzer support for DPC++: Visualize asynchronous task graphs, diagnose performance issues, and get recommendations to fix them.
- Intuitive User Interface: New interface workflows and toolbars incorporate Roofline Analysis for GPUs and Offload Advisor.
- Intel® Iris® Xe MAX graphics support: Roofline analysis and Offload Advisor now support Intel® Iris® Xe MAX graphics.
Intel® VTune™ Profiler
- Find performance degrading memory transfers with offload cost profiling for both DPC++ and OpenMP.
- Debug throttling issues and tune flops/watt using power analysis.
- Find the module causing performance killing I/O writes using improved I/O analysis that identifies where slow MMIO writes are made.
- Less guessing is needed when optimizing FPGA software performance as developers can now get stall and data transfer data for each compute unit in the FPGA.
- A new Performance Snapshot is the first profiling step. It suggests the detailed analyses (memory, threading, etc.) that offer the most optimization opportunities.
Intel® FPGA Add-On for oneAPI Base Toolkit (Optional)
- Support for installing the Intel® FPGA Add-On for oneAPI Base Toolkit via Linux package managers (YUM, APT, and Zypper).
- Support three FPGA boards (including Intel® PAC with Intel® Arria® 10 GX, Intel® FPGA PAC D5005, and custom platform) with four add-on installers.
System Requirements
Please see Intel oneAPI Base Toolkit System Requirements
Installation Instructions
Please visit Installation Guide for Intel oneAPI Toolkits
How to Start Using the Tools
Please reference:
- Get Started with the Intel® oneAPI Toolkits for Linux*
- Get Started with the Intel® oneAPI Toolkits for Windows*
Known Issues, Limitations and Workarounds
- Known Issue - The compiler and compiler32 Environment Module scripts (aka modulefiles) require version 4.1 or greater of Environment Modules: the compiler and compiler32 modulefiles fail when used with versions 4.0 and 3.x of the Tcl Modulefiles application. These older versions of the Environment Modules application are usually found on older Linux distributions, such as CentOS 6.x and CentOS 7.x. Type module --version to display the version of Environment Modules that is installed on your system.
Workarounds: If you only need to configure your environment for use with the Intel C++ Compiler Classic (aka ICC), you can use the new icc and icc32 modulefiles. If you need to configure your environment for use with the Intel DPC++/C++ Compiler, you can source the compiler’s env/vars.sh script. For example, if your oneAPI installation is in the default sudo/root install location: source /opt/intel/oneapi/compiler/latest/env/vars.sh. You can confirm that the environment was configured by checking the value of the $CMPLR_ROOT environment variable. If you have more than one version of the compiler installed and wish to configure your environment to use an earlier version, change the reference to latest to match the name of the folder that corresponds to the compiler version you wish to use. For example: source /opt/intel/oneapi/compiler/2021.3.0/env/vars.sh.
- Known Issue - For users of the 2021.2 version of the Intel® AI Analytics Toolkit who wish to install the Intel® oneAPI Base Toolkit 2021.3: if you have the AI Toolkit 2021.2 installed but do not have the Base Toolkit 2021.2 installed, the installation of the Base Toolkit 2021.3 will fail.
Workaround: First install the Intel® oneAPI Base Toolkit 2021.2. Once the Base Toolkit 2021.2 is installed alongside your AI Toolkit 2021.2, you can install the Intel® oneAPI Base Toolkit 2021.3. This issue will be fixed in a future update.
- Please read the whitepaper on Challenges, tips, and known issues when debugging heterogeneous programs using DPC++ or OpenMP offload
- Limitations
- Running any GPU code on a Virtual Machine is not supported at this time.
- If you have chosen to download the Get Started Guide to use offline, viewing it in Chrome may cause the text to disappear when the browser window is resized. To fix this problem, resize your browser window again, or use a different browser.
- Eclipse* 4.12: the code sample project created by the IDE plugin from a Makefile will not build. This is a known issue with Eclipse 4.12. Please use Eclipse 4.9, 4.10, or 4.11.
- Known issue - During the silent installation of the Intel® FPGA Add-on package, all Intel® FPGA Add-on packages might get installed on your system, no matter which add-on package you select.
Workaround: Specify the intel.oneapi.lin.fpga.group component in the component list before the desired add-on component. For example: components=default:intel.oneapi.lin.fpga.group:intel.oneapi.lin.fpga.custom_platforms.quartus211
- Known issue - Linux repositories serving package managers
A bug was discovered in the 2021.1 Gold release Linux packages provided by the Linux repositories serving the YUM/DNF, APT, and Zypper package managers. You will not be able to use the package manager "UPGRADE" process for oneAPI packages. Please read this article.
The following platforms and distributions are not impacted:
- Windows and macOS
- IRC distributions
- Containers
- Known issue for FPGA and GPU regarding the libtinfo.so.5 library - clang: error while loading shared libraries: libtinfo.so.5: cannot open shared object file: No such file or directory
When compiling for FPGA or GPU, you might see this error. To work around this issue, install the required compatibility library by executing one of the following OS-specific commands:
- On Ubuntu 20.04: sudo apt install -y libncurses5 libncurses5-dev libncursesw5-dev
- On RHEL/CentOS 8: sudo yum install ncurses-compat-libs
- On SUSE 15: sudo zypper install libcurses5 ncurses5-devel
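The OS-specific commands above can be selected programmatically from the distribution ID in /etc/os-release. A minimal sketch, assuming the ID values `ubuntu`, `rhel`/`centos`, and `sles`/`opensuse-leap` (the function name is ours, and the command is printed rather than executed):

```shell
# Sketch (illustrative): map a distribution ID from /etc/os-release to the
# libtinfo.so.5 compatibility-library install command listed above.
ncurses_compat_cmd() {
    case "$1" in
        ubuntu)              echo "sudo apt install -y libncurses5 libncurses5-dev libncursesw5-dev" ;;
        rhel|centos)         echo "sudo yum install ncurses-compat-libs" ;;
        sles|opensuse-leap)  echo "sudo zypper install libcurses5 ncurses5-devel" ;;
        *)                   echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}

# On a real system: . /etc/os-release; ncurses_compat_cmd "$ID"
ncurses_compat_cmd ubuntu
```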
- Known issue - namespace "oneapi" conflicting with older compilers - error: reference to 'tbb' is ambiguous
This issue is only found with the following compilers:
- GNU* gcc 7.x or older
- LLVM* Clang 3.7 or older
- Intel® C++ Compiler 19.0 or older
- Visual Studio 2017 version 15.6 or older
If your code uses the namespace in the following manner and is built with one of the compilers above, you may get compilation errors like "error: reference to 'tbb' is ambiguous".
The "using namespace oneapi;" directive in a oneDPL, oneDNN, or oneTBB program may result in compilation errors with the compilers listed above.
test_tbb.cpp:
namespace tbb { int bar(); }
namespace oneapi { namespace tbb = ::tbb; }
using namespace oneapi;
int zoo0() { return tbb::bar(); }
Compiling this file produces:
test_tbb.cpp: In function 'int zoo0()':
test_tbb.cpp:5:21: error: reference to 'tbb' is ambiguous
int zoo0() { return tbb::bar(); }
Workarounds:
Instead of the directive "using namespace oneapi;", use fully qualified names or namespace aliases.
test_tbb_workaround.cpp:
namespace tbb { int bar(); }
namespace oneapi { namespace tbb = ::tbb; }
// using namespace oneapi;
int zoo0() { return tbb::bar(); }
Additional Notes:
The "using namespace oneapi;" directive is not recommended at this time, as it may result in compilation errors when oneMKL, oneDAL, or oneCCL is used together with other oneAPI libraries. There are two workarounds:
- Use fully qualified names as shown above
- Use a namespace alias for oneMKL, oneDAL, or oneCCL, e.g.:
namespace one[mkl|dal|ccl] = ::oneapi::[mkl|dal|ccl];
onemkl::blas::dgemm( … ); | onedal::train(); | oneccl::allgatherv();
- Known issue - installation error on Windows "LoadLibrary failed with error 126: the specified module could not be found" in certain environments only
Impacted environment: Windows systems with AMD* graphics cards
Details:
When a Windows system has AMD* graphics cards or AMD Radeon Vega* graphics units, the installer of oneAPI Toolkits may report the error "LoadLibrary failed with error 126: the specified module could not be found". This has been reported and is being investigated. Please use the workaround for this release.
Workaround:
Temporarily disable the Intel® HD Graphics adapter during the installation of oneAPI Toolkits with the steps below:
Open Device Manager > Display Adapters, right-click the listed display adapter (commonly the Intel integrated graphics adapter), and select Disable.
- Known issue - "Debug Error!" from Microsoft Visual C++ Runtime Library
- Impacted environment: Windows, "Debug" build only, mixed use of DPC++ & oneAPI libraries (except oneTBB)
- Details: This error may occur only when the DPC++ program is built in the "Debug" configuration and uses one of the oneAPI libraries that do not provide dynamic debug libraries, e.g. oneVPL. oneTBB is not impacted by this issue.
- Workaround:
- Use "Release" configuration to build the program for now.
- More limitations on Windows
- For users who have Visual Studio* 2017 or 2019 installed, the installation of the IDE integrations of the oneAPI Base Toolkit is very slow; it can sometimes take more than 30 minutes for the IDE integrations alone. Please be patient; they will eventually be installed.
- If you encounter a runtime error such as "... ... sycl.dll was not found. ... ..." when running your program within Visual Studio, follow the instructions below to update the project property "Debugging > Environment" in order to run the program:
- Open the project's "Debugging > Environment" property, click the right drop-down, and select Edit.
- Copy and paste the default PATH environment variable value from the lower section to the upper section. This step is very important because of how Visual Studio 2017 and newer handle additional directories in the "PATH" environment variable.
- Add any additional directories needed by the program for those DLL files to the path.
- Error when running a code sample program within Visual Studio: unable to start program 'xxx.exe'
Workaround: Open the Tools > Options dialog, select the Debugging tab, and select the "Automatically close the console when debugging stops" check-box.
Deprecation Notices
Support for Intel® Xeon Phi™ Processor x200 “Knights Landing (KNL)” and Intel® Xeon Phi™ Processors “Knights Mill (KNM)” is deprecated and will be removed in a future release.
Intel® Xeon Phi™ customers should continue to use compilers, libraries, and tools from the Intel® Parallel Studio XE 2020 and older PSXE releases, or compilers from the Intel® oneAPI Base Toolkit and Intel® oneAPI HPC Toolkit versions 2021.2 or 2021.1.
OS Deprecation Notice
OS version | Notice of Change | Final Support Release
---|---|---
SUSE Linux Enterprise Server* 12, SUSE Linux Enterprise Server* 15 SP1 | Starting with the oneAPI 2021.4 release, SLES 12 and SLES 15 SP1 will no longer be supported. | 2021.3 is the final release to support SLES 12 and SLES 15 SP1
Fedora 32, Fedora 33 | Starting with the oneAPI 2022.1 release, Fedora 32 and Fedora 33 will no longer be supported. Support of Fedora 34 begins with 2022.1. | 2021.4 is the final release to support Fedora 32 and Fedora 33
CentOS 8.x | Starting with the oneAPI 2022.1 release, CentOS 8.x will no longer be supported. CentOS 7.x will continue to be supported. | 2021.4 is the final release to support CentOS 8.x
Release Notes for All Tools included in Intel® oneAPI Base Toolkit
- Intel® oneAPI DPC++ Compiler Release Notes
- Intel® DPC++ Compatibility Tool Release Notes
- Intel® oneAPI DPC++ Library Release Notes
- Intel® FPGA Add-On for oneAPI Base Toolkit Release Notes
- Intel® Distribution for GDB* Release Notes
- Intel® oneAPI Math Kernel Library Release Notes
- Intel® oneAPI Threading Building Blocks Library Release Notes
- Intel® Integrated Performance Primitives Release Notes
- Intel® oneAPI Data Analytics Library Release Notes
- Intel® Distribution for Python* Release Notes
- Intel® VTune™ Profiler Release Notes
- Intel® Advisor Release Notes
- Intel® oneAPI Deep Neural Network Library Release Notes
- Intel® oneAPI Collective Communications Library Release Notes
- Intel® oneAPI Video Processing Library Release Notes
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.