Where to Find the Release
Download the toolkit from the Base Toolkit Download page, then follow the installation instructions.
Overview
The Intel® oneAPI DPC++ Library (oneDPL) accompanies the Intel® oneAPI DPC++/C++ Compiler and provides high-productivity APIs that aim to minimize the programming effort of C++ developers creating efficient heterogeneous applications.
2021.7.1
New Features
- Added possibility to construct a zip_iterator out of a std::tuple of iterators.
- Added 9 more serial-based versions of algorithms: is_heap, is_heap_until, make_heap, push_heap, pop_heap, is_sorted, is_sorted_until, partial_sort, partial_sort_copy. Please refer to Tested Standard C++ API Reference.
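For readers less familiar with the heap and sorting algorithms named above, the following is a minimal host-side sketch of what they do. It uses only the C++ standard library and does not show a DPC++ kernel; the helper names heap_round_trip and smallest are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a max-heap from the input, pushes one more value onto it, and
// returns the largest element after popping it off the heap.
inline int heap_round_trip(std::vector<int> values, int extra) {
    std::make_heap(values.begin(), values.end());
    assert(std::is_heap(values.begin(), values.end()));

    values.push_back(extra);
    std::push_heap(values.begin(), values.end());  // restores the heap property

    std::pop_heap(values.begin(), values.end());   // moves the maximum to the back
    return values.back();
}

// Returns the `count` smallest values in ascending order using partial_sort.
inline std::vector<int> smallest(std::vector<int> values, std::size_t count) {
    count = std::min(count, values.size());
    const auto middle = values.begin() + static_cast<std::ptrdiff_t>(count);
    std::partial_sort(values.begin(), middle, values.end());
    values.resize(count);
    return values;
}
```

For example, heap_round_trip({3, 1, 4, 1, 5}, 9) returns 9, and smallest({5, 2, 8, 1, 9}, 3) returns the values 1, 2, 5.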
Fixed Issues
- Added namespace alias dpl = oneapi::dpl.
- Fixed error in reduce_by_segment algorithm.
- Fixed errors when data size is 0 in upper_bound, lower_bound and binary_search algorithms.
- Fixed wrong results in algorithm calls that use a permutation iterator.
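The zero-size fix above concerns the bound-search algorithms. As a rough illustration of the expected behavior, here is a host-side sketch that assumes the oneDPL versions search a sorted range for each of several query values and report positions as indices. The helper lower_bound_indices is hypothetical and is built on std::lower_bound; it is not the library's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// For each query value, records the index of the first element in the sorted
// range `data` that is not less than the query. Written so that an empty
// `data` or an empty `queries` range is handled without special cases.
inline std::vector<std::size_t> lower_bound_indices(const std::vector<int>& data,
                                                    const std::vector<int>& queries) {
    std::vector<std::size_t> indices;
    indices.reserve(queries.size());
    for (int query : queries) {
        const auto pos = std::lower_bound(data.begin(), data.end(), query);
        indices.push_back(static_cast<std::size_t>(pos - data.begin()));
    }
    return indices;
}
```

With empty data every query maps to index 0, and with no queries the result is empty, which is the behavior one would reasonably expect from a correct zero-size case.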
Deprecation Notice
- None in this release.
Known Issues and Limitations
New in This Release
- None in this release.
Existing Issues
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
NOTE: See oneDPL Guide for other restrictions and known limitations.
2021.7.0
New Features
- No new features in this release.
Fixed Issues
- Fixed compilation errors with C++20.
- Fixed a kernel name definition error in range-based algorithms and reduce_by_segment used with a device_policy object that has no explicit kernel name.
- Fixed CL_OUT_OF_RESOURCES issue for Radix sort algorithm executed on CPU devices.
- Fixed crashes in exclusive_scan_by_segment, inclusive_scan_by_segment, reduce_by_segment algorithms applied to device-allocated USM.
Deprecation Notice
- Deprecated support of C++11 for Parallel API with host execution policies (seq, unseq, par, par_unseq). C++17 is the minimal required version going forward.
Known Issues and Limitations
New in This Release
- STL algorithm functions (such as std::for_each) used in DPC++ kernels do not compile with the debug version of the Microsoft* Visual C++ standard library.
Existing Issues
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
NOTE: See oneDPL Guide for other restrictions and known limitations.
2021.6.0
New Features
- Added new implementation for par and par_unseq execution policies based on OpenMP* 4.5 pragmas, which can be enabled with ONEDPL_USE_OPENMP_BACKEND macro. For more details, please see the Macros page in the Library Guide.
- Added the range-based version of the reduce_by_segment algorithm and improved performance of the iterator-based reduce_by_segment APIs. This reduce_by_segment algorithm requires C++17.
- Added the serial-based versions of the following algorithms to the Tested Standard C++ API Reference: for_each_n, copy, copy_backward, copy_if, copy_n, is_permutation, fill, fill_n, move, move_backward.
Fixed Issues
- Fixed param_type API of random number distributions to satisfy C++ standard requirements. The new definitions of param_type are not compatible with the incorrect definitions in previous library versions.
- Fixed hangs and errors when oneDPL is used together with oneMKL in DPC++ programs.
- Fixed possible data races in the following algorithms used with DPC++ execution policies: sort, stable_sort, partial_sort, nth_element.
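The param_type fix above refers to the C++ standard requirements for random number distributions. The release note does not say which parts changed, so the sketch below only shows what those requirements look like for a standard library distribution: a nested param_type that names its distribution, can construct and reconfigure the distribution, and can be passed for a single draw. The helpers param_round_trip and roll are hypothetical names for this illustration.

```cpp
#include <random>
#include <type_traits>

// Exercises the param_type interface that the C++ standard requires of a
// random number distribution, using std::uniform_int_distribution.
inline bool param_round_trip() {
    using dist_t = std::uniform_int_distribution<int>;

    // param_type must name the distribution it belongs to.
    static_assert(std::is_same<dist_t::param_type::distribution_type, dist_t>::value,
                  "param_type::distribution_type should be the distribution");

    const dist_t::param_type params(1, 6);
    dist_t dist(params);  // construct from a parameter set
    const bool constructed_ok =
        dist.param() == params && dist.a() == 1 && dist.b() == 6;

    dist.param(dist_t::param_type(10, 20));  // replace the parameters
    return constructed_ok && dist.a() == 10 && dist.b() == 20;
}

// Draws one value using a one-off parameter set without changing the
// parameters stored in the distribution object.
inline int roll(std::mt19937& engine, int low, int high) {
    using dist_t = std::uniform_int_distribution<int>;
    dist_t dist;
    return dist(engine, dist_t::param_type(low, high));
}
```

Code written against earlier, non-conforming param_type definitions may need to be adjusted to this shape.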
Known Issues and Limitations
New in This Release
- No new issues in this release.
Existing Issues
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- The oneapi::dpl::experimental::ranges::reverse algorithm is not available with -fno-sycl-unnamed-lambda option.
NOTE: See the Library Guide for other restrictions and known limitations.
2021.5.0
New Features
- Added new random number distributions: exponential_distribution, bernoulli_distribution, geometric_distribution, lognormal_distribution, weibull_distribution, cauchy_distribution, extreme_value_distribution.
- Added the serial-based versions of the following algorithms: all_of, any_of, none_of, count, count_if, for_each, find, find_if, find_if_not. For the detailed list, please refer to Tested Standard C++ API Reference.
- Improved performance of search and find_end algorithms on GPU devices.
Fixed Issues
- Fixed SYCL* 2020 features deprecation warnings.
- Fixed some corner cases of normal_distribution functionality.
- Fixed a floating point exception occurring on CPU devices when a program uses a lot of oneDPL algorithms and DPC++ kernels.
- Fixed possible hanging and data races of the following algorithms used with DPC++ execution policies: count, count_if, is_partitioned, lexicographical_compare, max_element, min_element, minmax_element, reduce, transform_reduce.
Known Issues and Limitations
New in This Release
- The definition of lambda functions used with parallel algorithms should not depend on preprocessor macros that make it different for the host and the device. Otherwise, the behavior is undefined.
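The limitation above can be easier to see with a concrete pattern. The sketch below shows, in a comment, a lambda whose body changes under __SYCL_DEVICE_ONLY__ (a macro the DPC++ compiler defines during device compilation), followed by one possible alternative that keeps a single definition and passes the varying value as a capture. The function scale_all is a hypothetical host-only helper; no oneDPL policy is used here.

```cpp
#include <algorithm>
#include <vector>

// Pattern to avoid (kept as a comment): the lambda body differs between the
// host and device compilation passes, so its definition depends on a macro.
//
//   auto scale = [](int x) {
//   #ifdef __SYCL_DEVICE_ONLY__
//       return x * 2;
//   #else
//       return x * 3;
//   #endif
//   };

// One alternative: a single lambda definition whose behavior is selected by a
// captured value rather than by the preprocessor.
inline std::vector<int> scale_all(std::vector<int> values, int factor) {
    auto scale = [factor](int x) { return x * factor; };
    std::transform(values.begin(), values.end(), values.begin(), scale);
    return values;
}
```

Here the factor is chosen at run time by the caller, so the lambda looks the same in every compilation pass.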
Existing Issues
- exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when building a program with GCC 10 and using -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and given a large number of elements to process.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in a oneDPL program code may result in compilation errors with some compilers including GCC 7 and earlier. Instead of this directive, explicitly use oneapi::dpl namespace, or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.
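One of the issues listed above recommends std::array::operator[] instead of std::array::at in kernels, because at may throw. Since operator[] performs no bounds check, the caller has to keep the index in range. The sketch below is one hypothetical way to do that on the host; the helper name clamped_get and the clamping policy are assumptions made for this example, not library behavior.

```cpp
#include <array>
#include <cstddef>

// Reads an element with operator[] (which never throws) and clamps an
// out-of-range index to the last valid position instead of relying on at().
template <typename T, std::size_t N>
inline T clamped_get(const std::array<T, N>& values, std::size_t index) {
    static_assert(N > 0, "clamped_get needs a non-empty array");
    return values[index < N ? index : N - 1];
}
```

Clamping is only one option; code that can guarantee valid indices by construction can call operator[] directly.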
2021.4.0
New Features
- Added the range-based versions of the following algorithms: any_of, adjacent_find, copy_if, none_of, remove_copy_if, remove_copy, replace_copy, replace_copy_if, reverse, reverse_copy, rotate_copy, swap_ranges, unique, unique_copy.
- Added new asynchronous algorithms: inclusive_scan_async, exclusive_scan_async, transform_inclusive_scan_async, transform_exclusive_scan_async.
- Added structured binding support for zip_iterator::value_type.
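The asynchronous scan algorithms added above compute the same kinds of results as their synchronous counterparts. As a reminder of how the inclusive, exclusive, and transform variants differ, here is a serial host-side sketch using the C++17 standard library scans; it does not use the oneDPL asynchronous API, and the wrapper function names are hypothetical.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Inclusive scan: element i of the result includes input element i.
inline std::vector<int> inclusive(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: element i of the result covers inputs before i, starting
// from the initial value `init`.
inline std::vector<int> exclusive(const std::vector<int>& in, int init) {
    std::vector<int> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), init);
    return out;
}

// Transform inclusive scan: each input is squared before being accumulated.
inline std::vector<int> inclusive_of_squares(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::transform_inclusive_scan(in.begin(), in.end(), out.begin(),
                                  std::plus<int>(), [](int x) { return x * x; });
    return out;
}
```

For the input 1, 2, 3, 4 the three functions produce 1, 3, 6, 10; then 0, 1, 3, 6 (with init 0); then 1, 5, 14, 30.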
Fixed Issues
- Fixed an issue with asynchronous algorithms returning future<ptr> with unified shared memory (USM).
Known Issues and Limitations
New in This Release
- With Intel® oneAPI DPC++/C++ Compiler, unseq and par_unseq execution policies do not use OpenMP SIMD pragmas due to compilation issues with the -fopenmp-simd option, possibly resulting in suboptimal performance.
- The oneapi::dpl::experimental::ranges::reverse algorithm does not compile with -fno-sycl-unnamed-lambda option.
Existing Issues
- exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when building a program with GCC 10 and using -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and given a large number of elements to process.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in a oneDPL program code may result in compilation errors with some compilers including GCC 7 and earlier. Instead of this directive, explicitly use oneapi::dpl namespace, or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.
2021.3.0
New Features
- Added the range-based versions of the following algorithms: all_of, any_of, count, count_if, equal, move, remove, remove_if, replace, replace_if.
- Added the following utility ranges (views): generate, fill, rotate.
Changes to Existing Features
- Improved performance of discard_block_engine (including ranlux24, ranlux48, ranlux24_vec, ranlux48_vec predefined engines) and normal_distribution.
- Added two constructors to transform_iterator: the default constructor and a constructor from an iterator without a transformation. transform_iterator constructed these ways uses transformation functor of type passed in template arguments.
- transform_iterator can now work on top of forward iterators.
Fixed Issues
- Fixed execution of swap_ranges algorithm with unseq, par execution policies.
- Fixed an issue causing memory corruption and double freeing in scan-based algorithms compiled with -O0 and -g options and run on CPU devices.
- Fixed incorrect behavior in the exclusive_scan algorithm that occurred when the input and output iterator ranges overlapped.
- Fixed error propagation for async runtime exceptions by consistently calling sycl::event::wait_and_throw internally.
- Fixed the warning: local variable will be copied despite being returned by name [-Wreturn-std-move].
Known Issues and Limitations
New in This Release
- No new issues in this release.
Existing Issues
- exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when building a program with GCC 10 and using -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and given a large number of elements to process.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in a oneDPL program code may result in compilation errors with some compilers including GCC 7 and earlier. Instead of this directive, explicitly use oneapi::dpl namespace, or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.
2021.2.0
New Features
- Added support of parallel, vector and DPC++ execution policies for the following algorithms: shift_left, shift_right.
- Added the range-based versions of the following algorithms: sort, stable_sort, merge.
- Added experimental asynchronous algorithms: copy_async, fill_async, for_each_async, reduce_async, sort_async, transform_async, transform_reduce_async. These algorithms are declared in oneapi::dpl::experimental namespace and implemented only for DPC++ policies. In order to make these algorithms available the <oneapi/dpl/async> header should be included. Use of the asynchronous API requires C++11.
- Utility function wait_for_all enables waiting for completion of an arbitrary number of events.
- Added the ONEDPL_USE_PREDEFINED_POLICIES macro, which enables predefined policy objects and make_device_policy, make_fpga_policy functions without arguments. It is turned on by default.
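For readers unfamiliar with shift_left, which the list above mentions alongside shift_right, the following serial sketch approximates its effect with std::move from <algorithm>. The name shift_left_sketch is hypothetical, and this is a plain host-side illustration rather than the oneDPL implementation or a parallel version.

```cpp
#include <algorithm>
#include <iterator>

// Moves every element `n` positions toward the beginning of [first, last) and
// returns the end of the shifted range. Elements after the returned iterator
// are left in a valid but unspecified state.
template <typename ForwardIt>
ForwardIt shift_left_sketch(ForwardIt first, ForwardIt last,
                            typename std::iterator_traits<ForwardIt>::difference_type n) {
    if (n <= 0) {
        return last;   // nothing to shift; the whole range is kept
    }
    if (n >= std::distance(first, last)) {
        return first;  // shifting by the full length or more keeps nothing
    }
    return std::move(std::next(first, n), last, first);
}
```

Shifting the sequence 1, 2, 3, 4, 5 left by two leaves 3, 4, 5 at the front, and the returned iterator points just past the 5.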
Changes to Existing Features
- Improved performance of the following algorithms: count, count_if, is_partitioned, lexicographical_compare, max_element, min_element, minmax_element, reduce, transform_reduce, and sort, stable_sort when using Radix sort.
Note: The sorting algorithms in oneDPL use Radix sort for arithmetic data types compared with std::less or std::greater, otherwise Merge sort.
- Improved performance of the linear_congruential_engine RNG engine (including minstd_rand, minstd_rand0, minstd_rand_vec, minstd_rand0_vec predefined engines).
Fixed Issues
- Fixed runtime errors occurring with find_end, search, search_n algorithms when a program is built with -O0 option and executed on CPU devices.
- Fixed the majority of unused parameter warnings.
Known Issues and Limitations
- exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when building a program with GCC 10 and using -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and given a large number of elements to process.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in a oneDPL program code may result in compilation errors with some compilers including GCC 7 and earlier. Instead of this directive, explicitly use oneapi::dpl namespace, or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.
2021.1.1
New Features
- Added new random number distributions: exponential_distribution, bernoulli_distribution, geometric_distribution, lognormal_distribution, weibull_distribution, cauchy_distribution, extreme_value_distribution.
- Added the serial-based versions of the following algorithms: all_of, any_of, none_of, count, count_if, for_each, find, find_if, find_if_not. For the detailed list, please refer to Tested Standard C++ API Reference.
- Improved performance of search and find_end algorithms on GPU devices.
Fixed Issues
- Fixed SYCL* 2020 features deprecation warnings.
- Fixed some corner cases of normal_distribution functionality.
- Fixed a floating point exception occurring on CPU devices when a program uses a lot of oneDPL algorithms and DPC++ kernels.
- Fixed possible hanging and data races of the following algorithms used with DPC++ execution policies: count, count_if, is_partitioned, lexicographical_compare, max_element, min_element, minmax_element, reduce, transform_reduce.
Known Issues and Limitations
New in This Release
- The definition of lambda functions used with parallel algorithms should not depend on preprocessor macros that make it different for the host and the device. Otherwise, the behavior is undefined.
Existing Issues
- exclusive_scan and transform_exclusive_scan algorithms may provide wrong results with vector execution policies when building a program with GCC 10 and using -O0 option.
- Some algorithms may hang when a program is built with the -O0 option, executed on GPU devices, and given a large number of elements to process.
- The use of oneDPL together with the GNU C++ standard library (libstdc++) version 9 or 10 may lead to compilation errors (caused by oneTBB API changes). To overcome these issues, include oneDPL header files before the standard C++ header files, or disable parallel algorithms support in the standard library. For more information, please see Intel® oneAPI Threading Building Blocks (oneTBB) Release Notes.
- The using namespace oneapi; directive in a oneDPL program code may result in compilation errors with some compilers including GCC 7 and earlier. Instead of this directive, explicitly use oneapi::dpl namespace, or create a namespace alias.
- The implementation does not yet provide namespace oneapi::std as defined in the oneDPL Specification.
- The use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).
- std::tuple, std::pair cannot be used with SYCL buffers to transfer data between host and device.
- When used within DPC++ kernels or transferred to/from a device, std::array can only hold objects whose type meets DPC++ requirements for use in kernels and for data transfer, respectively.
- std::array::at member function cannot be used in kernels because it may throw an exception; use std::array::operator[] instead.
- std::array cannot be swapped in DPC++ kernels with std::swap function or swap member function in the Microsoft* Visual C++ standard library.
- Due to specifics of Microsoft* Visual C++, some standard floating-point math functions (including std::ldexp, std::frexp, std::sqrt(std::complex<float>)) require device support for double precision.
Additional Documentation
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.