This page provides the current Release Notes for Intel® oneAPI Deep Neural Network Library. The notes are categorized by major version, from newest to oldest, with individual releases listed within each version section.
Where to Find the Release
oneDNN is distributed as part of the Intel® oneAPI Base Toolkit. Download the toolkit from the Intel® oneAPI Base Toolkit main portal and follow the installation instructions to install it.
What's New - 2025.0.1
- Updated the execution unit (EU) count detection logic for Intel® GPUs based on the Xe2 architecture to accommodate behavioral changes in Linux* drivers, along with other bug fixes.
Third Party Programs File
oneAPI Deep Neural Network Library (oneDNN) Third Party Programs File
This file contains the list of third party software ("third party programs")
contained in the Intel software and their required notices and/or license
terms. This third party software, even if included with the distribution of
the Intel software, may be governed by separate license terms, including
without limitation, third party license terms, other Intel software license
terms, and open source software license terms. These separate license terms
govern your use of the third party programs as set forth in the
"THIRD-PARTY-PROGRAMS" file.
Third party programs and their corresponding required notices and/or license
terms are listed below.
--------------------------------------------------------------------------------
1. XByak (src/cpu/xbyak/)
Copyright (c) 2007 MITSUNARI Shigeo
All rights reserved.
3-Clause BSD License
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
Neither the name of the copyright owner nor the names of its contributors may
be used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
2. Googletest (tests/gtests/gtest/)
Copyright 2005, Google Inc.
Copyright 2006, Google Inc.
Copyright 2007, Google Inc.
Copyright 2008, Google Inc.
Copyright 2015, Google Inc.
All rights reserved.
3-Clause BSD License
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
3. Instrumentation and Tracing Technology API (src/common/ittnotify/)
Copyright (c) 2011, Intel Corporation. All rights reserved.
Copyright (c) 2005-2014 Intel Corporation. All rights reserved.
3-Clause BSD License
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
4. CMake (cmake/FindOpenCL.cmake, cmake/FindBLAS.cmake, cmake/FindACL.cmake)
CMake - Cross Platform Makefile Generator
Copyright 2000-2020 Kitware, Inc. and Contributors
All rights reserved.
3-Clause BSD License
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Kitware, Inc. nor the names of Contributors
may be used to endorse or promote products derived from this
software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
------------------------------------------------------------------------------
The following individuals and institutions are among the Contributors:
* Aaron C. Meadows <cmake@shadowguarddev.com>
* Adriaan de Groot <groot@kde.org>
* Aleksey Avdeev <solo@altlinux.ru>
* Alexander Neundorf <neundorf@kde.org>
* Alexander Smorkalov <alexander.smorkalov@itseez.com>
* Alexey Sokolov <sokolov@google.com>
* Alex Merry <alex.merry@kde.org>
* Alex Turbov <i.zaufi@gmail.com>
* Andreas Pakulat <apaku@gmx.de>
* Andreas Schneider <asn@cryptomilk.org>
* André Rigland Brodtkorb <Andre.Brodtkorb@ifi.uio.no>
* Axel Huebl, Helmholtz-Zentrum Dresden - Rossendorf
* Benjamin Eikel
* Bjoern Ricks <bjoern.ricks@gmail.com>
* Brad Hards <bradh@kde.org>
* Christopher Harvey
* Christoph Grüninger <foss@grueninger.de>
* Clement Creusot <creusot@cs.york.ac.uk>
* Daniel Blezek <blezek@gmail.com>
* Daniel Pfeifer <daniel@pfeifer-mail.de>
* Enrico Scholz <enrico.scholz@informatik.tu-chemnitz.de>
* Eran Ifrah <eran.ifrah@gmail.com>
* Esben Mose Hansen, Ange Optimization ApS
* Geoffrey Viola <geoffrey.viola@asirobots.com>
* Google Inc
* Gregor Jasny
* Helio Chissini de Castro <helio@kde.org>
* Ilya Lavrenov <ilya.lavrenov@itseez.com>
* Insight Software Consortium <insightsoftwareconsortium.org>
* Jan Woetzel
* Julien Schueller
* Kelly Thompson <kgt@lanl.gov>
* Konstantin Podsvirov <konstantin@podsvirov.pro>
* Laurent Montel <montel@kde.org>
* Mario Bensi <mbensi@ipsquad.net>
* Martin Gräßlin <mgraesslin@kde.org>
* Mathieu Malaterre <mathieu.malaterre@gmail.com>
* Matthaeus G. Chajdas
* Matthias Kretz <kretz@kde.org>
* Matthias Maennich <matthias@maennich.net>
* Michael Hirsch, Ph.D. <www.scivision.co>
* Michael Stürmer
* Miguel A. Figueroa-Villanueva
* Mike Jackson
* Mike McQuaid <mike@mikemcquaid.com>
* Nicolas Bock <nicolasbock@gmail.com>
* Nicolas Despres <nicolas.despres@gmail.com>
* Nikita Krupen'ko <krnekit@gmail.com>
* NVIDIA Corporation <www.nvidia.com>
* OpenGamma Ltd. <opengamma.com>
* Patrick Stotko <stotko@cs.uni-bonn.de>
* Per Øyvind Karlsen <peroyvind@mandriva.org>
* Peter Collingbourne <peter@pcc.me.uk>
* Petr Gotthard <gotthard@honeywell.com>
* Philip Lowman <philip@yhbt.com>
* Philippe Proulx <pproulx@efficios.com>
* Raffi Enficiaud, Max Planck Society
* Raumfeld <raumfeld.com>
* Roger Leigh <rleigh@codelibre.net>
* Rolf Eike Beer <eike@sf-mail.de>
* Roman Donchenko <roman.donchenko@itseez.com>
* Roman Kharitonov <roman.kharitonov@itseez.com>
* Ruslan Baratov
* Sebastian Holtermann <sebholt@xwmw.org>
* Stephen Kelly <steveire@gmail.com>
* Sylvain Joubert <joubert.sy@gmail.com>
* The Qt Company Ltd.
* Thomas Sondergaard <ts@medical-insight.com>
* Tobias Hunger <tobias.hunger@qt.io>
* Todd Gamblin <tgamblin@llnl.gov>
* Tristan Carel
* University of Dundee
* Vadim Zhukov
* Will Dicharry <wdicharry@stellarscience.com>
See version control history for details of individual contributions.
The above copyright and license notice applies to distributions of
CMake in source and binary form. Third-party software packages supplied
with CMake under compatible licenses provide their own copyright notices
documented in corresponding subdirectories or source files.
------------------------------------------------------------------------------
CMake was initially developed by Kitware with the following sponsorship:
* National Library of Medicine at the National Institutes of Health
as part of the Insight Segmentation and Registration Toolkit (ITK).
* US National Labs (Los Alamos, Livermore, Sandia) ASC Parallel
Visualization Initiative.
* National Alliance for Medical Image Computing (NAMIC) is funded by the
National Institutes of Health through the NIH Roadmap for Medical Research,
Grant U54 EB005149.
* Kitware, Inc.
--------------------------------------------------------------------------------
5. Xbyak_aarch64 (src/cpu/aarch64/xbyak_aarch64/)
Copyright 2019-2020 FUJITSU LIMITED
Apache License, Version 2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
--------------------------------------------------------------------------------
6. Boost C++ Libraries (src/common/primitive_hashing.hpp, src/graph/backend/graph_compiler/core/src/util/hash_utils.hpp)
Copyright 2005-2014 Daniel James.
Boost Software License - Version 1.0 - August 17th, 2003
Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:
The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
7. Intel(R) Graphics Compute Runtime for oneAPI Level Zero and OpenCL(TM)
Driver (src/gpu/jit/ngen/npack/{elf_structs,hash}.hpp)
Copyright (c) 2018 Intel Corporation
Intel(R) Graphics Compiler (src/gpu/jit/ngen/npack/neo_structs.hpp)
Copyright (c) 2019 Intel Corporation
oneAPI Level Zero (src/sycl/level_zero)
Copyright (C) 2019-2021 Intel Corporation
Doxyrest toolkit (doc/doxyrest/*)
Copyright (c) 2016, Tibbo Technology Inc
Copyright (c) 2016, Vladimir Gladkov
Copyright (c) 2016, Doxyrest maintainers
MIT License
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
8. Sphinx (doc/sphinx/conf.py)
Copyright (c) 2007-2021 by the Sphinx team (see AUTHORS file).
All rights reserved.
2-Clause BSD License
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------------
9. Intel(R) Metrics Discovery Application Programming Interface (src/gpu/ocl/mdapi/metrics_discovery_api.h)
MIT License
Copyright (c) 2019, Intel Corporation
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
--------------------------------------------------------------------------------
10. LLVM (src/graph/backend/graph_compiler/core/src/util/array_ref.hpp)
==============================================================================
The LLVM Project is under the Apache License v2.0 with LLVM Exceptions:
==============================================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
---- LLVM Exceptions to the Apache 2.0 License ----
As an exception, if, as a result of your compiling your source code, portions
of this Software are embedded into an Object form of such source code, you
may redistribute such embedded portions in such Object form without complying
with the conditions of Sections 4(a), 4(b) and 4(d) of the License.
In addition, if you combine or link compiled forms of this Software with
software that is licensed under the GPLv2 ("Combined Software") and if a
court of competent jurisdiction determines that the patent provision (Section
3), the indemnity provision (Section 9) or other Section of the License
conflicts with the conditions of the GPLv2, you may retroactively and
prospectively choose to deem waived or otherwise exclude such Section(s) of
the License, but only in their entirety and only with respect to the Combined
Software.
Previous Releases
Performance Optimizations
Intel® Architecture Processors
- Improved performance for 4th generation Intel® Xeon® Scalable processors (formerly Sapphire Rapids).
- Improved performance for Intel® Xeon® 6 processors (formerly Granite Rapids).
- Improved performance of group normalization primitive.
- Improved bf16 matmul performance with int4 compressed weights on processors with Intel® AMX instruction set support.
- Improved performance of fp8 matmul, pooling, and eltwise primitives on processors with Intel® AMX instruction set support.
- Improved fp32 RNN primitive performance on processors with Intel® AVX2 instruction set support.
- Improved performance of the following subgraphs with Graph API:
- convolution and binary operation fusions with better layout selection in Graph API.
- fp8 convolution fused with unary or binary operations on processors with Intel® AMX instruction set support.
- Scaled Dot Product Attention (SDPA) without scale, Multi-Query Attention (MQA), and Grouped Query Attention (GQA) patterns.
- LayerNorm, GroupNorm, and SoftMax with int8 quantized output and zero-points.
Intel® Graphics Products
- Improved performance for the Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- Introduced broad production quality optimizations for Intel® Arc™ Graphics for Intel® Core™ Ultra Processors (Series 2) (formerly Lunar Lake).
- Introduced broad production quality optimizations for a future discrete GPU based on Intel® Xe2 architecture (code name Battlemage).
- Introduced support for Intel® Arc™ Graphics for future Intel® Core™ Ultra Processor (code name Arrow Lake-H).
- Improved performance of fp8_e5m2 primitives on Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- Improved matmul and inner product primitives performance for shapes relevant to large language models (LLMs) on GPUs with Intel® XMX support.
- Improved int8 convolution performance with weight zero points.
- Reduced primitive creation time for softmax, layer normalization, and concat primitives via kernel reuse.
- Improved performance of the following subgraphs with Graph API:
- SDPA without scale, MQA, and GQA patterns. float16 variants of these patterns significantly benefit from Intel® Xe Matrix Extensions (Intel® XMX) support.
- fp8 convolution fused with unary or binary operations on Intel® Data Center GPU Max Series.
- LayerNorm, GroupNorm, and SoftMax with int8 quantized output and zero-points.
Functionality
- Introduced generic GPU support. This implementation relies on portable SYCL* kernels and can be used as a starting point to enable new devices in oneDNN.
- Enabled support for int8 activations with grouped scales and int8 or int4 compressed weights in matmul primitive. This functionality is implemented on Intel® GPUs.
- Introduced support for stochastic rounding for the fp8 data type.
- [experimental] Extended microkernel API:
- Introduced int8 quantization support.
- Extended transform microkernel with transposition support and support for arbitrary strides.
- Introduced verbose diagnostics support.
- [experimental] Extended sparse API:
- Introduced support for sparse memory with coordinate (COO) storage format.
- Extended matmul primitive to work with sparse memory in COO format.
- Introduced int8 support in eltwise primitive with clip algorithm. This functionality is implemented on Intel® CPUs.
- Graph API:
- Introduced GroupNorm operation and fusions in Graph API.
- Introduced support for standalone StaticReshape and StaticTranspose operations.
Usability
- Added examples for SDPA, MQA, and GQA patterns implementation with Graph API.
- Added an example for deconvolution primitive.
- Added examples for Vanilla RNN and LBR GRU RNN cells.
- Introduced support for Intel® oneAPI DPC++/C++ Compiler 2025.0.
- Introduced interoperability with SYCL Graph record/replay mode.
- [experimental] Introduced logging mechanism based on spdlog library.
- Introduced support for ONEDNN_ENABLE_WORKLOAD build knob for Graph API.
- Improved performance of get_partitions() function in Graph API.
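The ONEDNN_ENABLE_WORKLOAD build knob mentioned above is set when configuring a from-source build. A configuration sketch, assuming an out-of-source CMake build of oneDNN (the INFERENCE value trims training-only kernels; TRAINING is the default):

```shell
# Configure an inference-focused oneDNN build with Graph API enabled.
# Assumes the current directory is an empty build directory inside a
# oneDNN source checkout.
cmake .. -DONEDNN_ENABLE_WORKLOAD=INFERENCE -DONEDNN_BUILD_GRAPH=ON
cmake --build . --parallel
```

Restricting the workload this way mainly reduces binary size; it does not change the API surface.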
Deprecated Functionality
- Experimental Graph Compiler is deprecated and will be removed in future releases.
Performance Optimizations
- Intel® Architecture Processors:
- Improved performance for 4th generation Intel® Xeon® Scalable processors (formerly Sapphire Rapids).
- Improved performance for the future Intel® Xeon® Scalable processors (code-named Sierra Forest and Granite Rapids).
- Improved performance of group normalization primitive.
- Improved performance of matmul primitive with sum post-op for batched cases on processors with Intel® AMX instruction set support.
- Improved performance of the following subgraphs with Graph API:
- Multi-Query Attention (MQA).
- Scaled Dot Product Attention (SDPA), including the variant with the select operation.
- LayerNorm + Multiply + Quantize produced by the SmoothQuant algorithm.
- Convolution + Sigmoid + Multiply with mixed precisions.
- Intel® Graphics Products:
- Improved performance for Processor Graphics based on Xe2 architecture.
- Improved performance for the Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- Improved performance for Intel® Arc™ graphics (formerly Alchemist and DG2) and the Intel® Data Center GPU Flex Series (formerly Arctic Sound).
- Improved RNN primitive performance for LSTM cell case.
- Improved performance of f8_e4m3 data type emulation on Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- AArch64-based Processors:
- Improved convolution forward propagation, matmul, and softmax performance for processors with SVE support.
- Improved bf16 matmul, convolution, and reorder primitives performance with Arm Compute Library (ACL).
- Improved eltwise primitive performance for the gelu_erf algorithm with ACL.
Functionality
- Introduced sum and binary post-ops support for layer normalization primitive. This functionality is currently implemented on CPUs only.
- Introduced support for int4 data type and extended quantization model with support for grouped scales and zero points.
- Introduced fp64 matmul support. This functionality is currently implemented on Intel® GPUs with hardware acceleration for fp64 math only.
- Extended floating point math mode API to support weight decompression scenarios. See the matmul weights decompression example to get started. The new floating point mode is supported in the following configurations:
- bfloat16 matmul with int8 weights on Intel® CPUs.
- float16 and bfloat16 matmul with int8 or int4 weights on Intel® GPUs.
- [experimental] Introduced microkernel API for Intel® Architecture Processors. This API exposes internal mechanisms used in matmul and convolution implementation to expert users.
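The weight decompression math mode above is configured through primitive attributes. An uncompiled sketch, assuming the oneDNN 3.x C++ API (the helper name is illustrative):

```cpp
// Illustrative sketch only; assumes oneDNN 3.x headers are available.
#include "dnnl.hpp"

dnnl::primitive_attr make_decompression_attr() {
    dnnl::primitive_attr attr;
    // Request bf16 math mode and allow it to apply to integer tensors,
    // which enables implicit int8/int4 weight decompression in matmul.
    attr.set_fpmath_mode(dnnl::fpmath_mode::bf16, /*apply_to_int=*/true);
    return attr;
}
```

The returned attribute object is then passed when constructing the matmul primitive descriptor, so decompression happens inside the primitive rather than as a separate reorder.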
Usability
- Extended error messages for engine and memory objects creation errors.
- Extended verbose mode diagnostics with information on dispatching decisions for all primitives.
- Introduced support for clang++ host compiler in SYCL builds.
- Introduced API for tensor serialization and deserialization.
- Extended verbose mode diagnostics for Graph API with information on pattern matcher decisions.
- Introduced OpenCL runtime support for Graph API.
- Added support for building oneDNN with installed Arm Compute Library (ACL).
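The dispatching diagnostics mentioned above are controlled at run time through the verbose mechanism. A sketch, assuming the standard ONEDNN_VERBOSE environment variable (./app is a placeholder for your oneDNN-based binary):

```shell
# Print why a given primitive implementation was or was not selected.
ONEDNN_VERBOSE=dispatch ./app

# Enable the full set of verbose diagnostics, including primitive
# creation and execution traces.
ONEDNN_VERBOSE=all ./app
```

Verbose output goes to stdout and can be post-processed with the verbose log converter shipped with oneDNN.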
Validation
- Extended benchdnn with support for tensor tags in RNN primitive validation.
Performance Optimizations
- Intel® Architecture Processors:
- Improved performance for 4th generation Intel® Xeon® Scalable processors (formerly Sapphire Rapids).
- Improved performance for the future Intel® Xeon® Scalable processors (code-named Sierra Forest and Granite Rapids). These optimizations are now included by default on compatible processors.
- Improved RNN primitive performance with LBR_GRU cell.
- Improved softmax performance on processors with Intel® AVX2 or Intel® AVX-512 instruction set support.
- Improved fp32 inner product performance on processors with Intel® AVX2 instruction set support.
- Improved fp32, fp16, bf16 matmul primitive performance on processors with Intel® AVX-512 and Intel® AMX instruction set support.
- Improved int8 matmul performance with transposed A tensor.
- Improved performance of resampling primitive on processors with Intel® AVX2 instruction set support.
- Improved performance of int8 convolution with post-ops.
- Optimized batch matmul with binary post-op and broadcast masks 1 and 14.
- Improved the Scaled Dot Product Attention (SDPA) subgraph performance with Graph API.
- Improved performance of subgraphs including matmul and add operations and mixed int8 and bfloat16 data types with Graph API.
- [experimental] Improved performance of reduction, softmax, and layernorm operations with the experimental Graph Compiler backend.
- [experimental] Improved performance for the llama2 MLP subgraph with the experimental Graph Compiler backend.
- Intel® Graphics Products:
- Introduced initial optimizations for Processor Graphics based on Xe2 architecture.
- Improved performance for the Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- Improved performance for Intel® Arc graphics (formerly Alchemist and DG2) and the Intel® Data Center GPU Flex Series (formerly Arctic Sound).
- Improved matmul performance for cases relevant to Large Language Models (LLMs) and Transformer-like models.
- Improved convolution performance for cases relevant to the Stable Diffusion model.
- Improved RNN primitive performance.
- Improved pooling forward propagation performance.
- Improved batched matmul performance for cases with 5 dimensions or more.
- AArch64-based Processors:
- Added an option to build oneDNN with macOS Accelerate library to improve performance on Apple silicon.
- Improved reorder primitive performance with Compute Library for the Arm architecture (ACL).
- Improved bf16 inner product primitive performance with ACL.
Functionality
- Introduced GPTQ support to improve Large Language Model (LLM) performance with compressed weights. An optimized implementation is available for Intel® Graphics Products and supports matmul with int8 weight compression.
- Introduced fp8 data type support in primitives and Graph API. Optimized implementation is available for Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- Introduced support for fp16 and bf16 scale and shift arguments for layer normalization. Optimized implementation is available for Intel Graphics Products.
- [experimental] Introduced unstructured sparsity support for processors with Intel® AMX support relying on VCOMPRESS/VPEXPAND instructions.
- Intel® Graphics Products
- Introduced support for Intel® Data Center GPU Max 1550VG.
- Introduced PReLU post-op support for inner product and matmul primitives.
Usability
- Added opt-in deterministic mode support. Deterministic mode guarantees that results are bitwise identical between runs in a fixed environment.
- Introduced accumulation mode control.
- Extended oneDNN verbose diagnostics with information on dispatching decisions in convolution and matmul implementations.
- Extended verbose diagnostics for Graph API with information for operation schema check results and pattern matching results.
- Reduced RNN primitive memory consumption on GPUs.
- Added examples demonstrating use of oneDNN Graph API in eager mode use cases.
- Extended tensor constructor in Graph API to support memory allocation and management by the library.
- Introduced new API and environment variable to manage Graph API constant tensor cache capacity.
- Improved the efficiency of pattern matching in Graph API by optimizing pattern registration, reducing pattern numbers, and skipping patterns more wisely.
- Changed default optimization flags for AArch64 builds to -mcpu=generic to improve portability.
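The deterministic and accumulation mode controls listed above map to primitive attributes. An uncompiled sketch, assuming the oneDNN 3.x C++ API (the helper name is illustrative):

```cpp
// Illustrative sketch only; assumes oneDNN 3.x headers are available.
#include "dnnl.hpp"

dnnl::primitive_attr make_reproducible_attr() {
    dnnl::primitive_attr attr;
    // Opt into deterministic mode: results are bitwise identical
    // between runs in a fixed environment, possibly at a performance cost.
    attr.set_deterministic(true);
    // Keep strict accumulation (the default) rather than relaxed,
    // which may trade accuracy for speed.
    attr.set_accumulation_mode(dnnl::accumulation_mode::strict);
    return attr;
}
```

Both controls are opt-in per primitive, so an application can restrict them to the operations where reproducibility matters.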
Validation
- Improved benchdnn performance by optimizing bottlenecks in validation code.
- Introduced the --num-streams knob in benchdnn to support benchmarking in multi-stream scenarios.
Known Limitations
- Intel® Data Center GPU Flex Series driver for Windows has an issue resulting in program hangs or crashes when oneDNN primitives are created concurrently.
- int8 concat primitive may produce incorrect results on integrated GPUs with current GPU driver.
- fp32 pooling primitive may produce incorrect results in rare conditions on Intel® Data Center GPU Max Series with current GPU driver.
- reorder primitive causes a segmentation fault for prime sizes exceeding 2^31 on Intel® CPUs.
- fp64 convolution and deconvolution produce incorrect results on integrated graphics in future Intel® Core processors (code name Arrow Lake).
- int8 matmul primitive creation with fp32 bias fails on Intel® GPU Flex Series and Intel® Arc Graphics.
Breaking Changes
- Updated minimal supported ACL version to 23.11 (was 23.02.1).
Performance Optimizations
- Intel Architecture Processors:
- Improved performance for 4th generation Intel® Xeon® Scalable processors (formerly Sapphire Rapids).
- Improved int8 convolution performance with zero points on processors with Intel® AMX instruction set support.
- Improved performance for the future Intel® Xeon® Scalable processors (code-named Sierra Forest and Granite Rapids). This functionality is disabled by default and can be enabled via CPU dispatcher control.
- Improved fp32 and int8 convolution performance for cases with small numbers of input channels for processors with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and/or Intel® AMX instruction set support.
- Improved s32 binary primitive performance.
- Improved fp16, fp32, and int8 convolution performance for processors with Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions support.
- Improved performance of subgraphs with convolution, matmul, avgpool, maxpool, and softmax operations followed by unary or binary operations with Graph API.
- Improved performance of convolution for depthwise cases with Graph API.
- [experimental] Improved performance of LLAMA2 MLP block with Graph Compiler.
- Intel Graphics Products:
- Improved performance for the Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and the Intel® Data Center GPU Flex Series (formerly Arctic Sound-M).
- Reduced RNN primitive initialization time on Intel GPUs.
- AArch64-based Processors:
- Improved fp32 to bf16 reorder performance.
- Improved max pooling performance with Arm Compute Library (ACL).
- Improved dilated convolution performance for depthwise cases with ACL.
Functionality
- Introduced group normalization primitive support. The functionality is currently available on CPUs.
- Intel CPUs:
- Introduced support for zero points in int8 convolution with groups and 3D spatial.
Usability
- Extended verbose mode output:
- Improved diagnostics on engine creation errors.
- Added information on Graph API calls.
- Added information on strides for non-dense memory objects.
- Added values of runtime dimensions.
- Added an indication that the primitive descriptor was created with the any memory format tag.
- Introduced examples for Graph API.
- Graph API constant tensor cache is now disabled by default and requires opt-in with a dnnl::graph::set_constant_tensor_cache() call.
- Reduced oneDNN Graph API memory consumption in certain scenarios.
Validation
- Extended benchdnn performance reporting with primitive creation time.
- Introduced cold cache mode in benchdnn.
Known Limitations
- Current GPU OpenCL runtime for Linux has an issue resulting in convolution producing incorrect results on integrated GPUs based on Xe architecture. SYCL configuration is not affected.
- Pooling, resampling, prelu, batch normalization, layer normalization, and eltwise primitives may sporadically produce incorrect results on Intel® Arc GPUs on Windows.
- Current GPU driver for Linux has an issue resulting in program hangs or crashes when oneDNN primitives are executed concurrently on Intel® Data Center GPU Max Series.
- Extensive use of RNN primitive on Intel GPUs with default primitive cache setting may lead to a device reboot. Workaround: consider reducing primitive cache size to 100.
- Int8 deconvolution with signed weights and activations may produce incorrect results on processors with Intel® AMX support.
Performance Optimizations
- Intel® Architecture Processors:
- Improved performance for 4th generation Intel® Xeon® Scalable Processor (formerly Sapphire Rapids).
- Improved performance for future Intel® Xeon® Scalable Processor (code-named Sierra Forest). The functionality is disabled by default and can be enabled via CPU dispatcher control.
- Improved fp32 inner product performance for processors with Intel® AVX-512 instructions support.
- Improved bf16 and int8 matmul performance with runtime dimensions for processors with Intel® AMX instruction set support.
- Intel® Graphics Products:
- Improved performance for Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- Improved performance for Intel® Arc Graphics (formerly Alchemist and DG2) and Intel® Data Center GPU Flex Series (formerly Arctic Sound-M).
- Reduced creation time for matmul, inner product, and RNN primitives.
- AArch64-based Processors:
- Improved convolution performance with post-ops on processors with SVE support.
- Improved fp32 and fp16 depth-wise convolution performance with Arm Compute Library (ACL).
- Improved fp32 deconvolution performance for math mode bf16 or any with ACL.
- IBM Z Platform:
- Improved int8 matmul, inner product, and RNN performance for s390 z15 systems.
Functionality
- [experimental] Introduced Graph Compiler backend for Graph API. Graph Compiler improves performance of composite operations like multi-head attention (MHA), multi-layer perceptron (MLP), and convolution residual blocks for processors with Intel® AVX-512 and Intel® AMX instruction set support.
- Extended Graph API with boolean data type, select, and pow operations.
- Introduced support for binary and eltwise post-ops in softmax primitives.
- Introduced reference SYCL implementations of batch normalization, layer normalization, local response normalization (LRN), binary, softmax, eltwise, pooling, PReLU, shuffle, and resampling primitives. These implementations address functional gaps on NVIDIA and AMD GPUs where support is missing in native libraries.
- Intel® Graphics Products:
- Introduced mixed precision support for binary primitives.
- NVIDIA GPUs:
- Introduced bfloat16 support for deconvolution and softmax primitives.
- AMD GPUs:
- Introduced support for inner product, convolution, deconvolution, batch normalization, and reorder primitives.
Usability
- Extended verbose mode with additional capabilities, including information about implementation dispatching decisions and reasons for primitive creation errors.
- Reduced stack consumption to less than 20 KB across implementations.
- [experimental] Introduced profiling API for SYCL and OpenCL applications.
Validation
- Introduced fast performance validation mode (--mode=F) in benchdnn. Testing speed is improved by initializing oneDNN objects in parallel and avoiding use of host memory when benchmarking GPU primitives.
- Reduced benchdnn memory consumption in performance validation mode.
- Introduced smoke test set for benchdnn. This test set provides basic validation for all primitives.
Known Limitations
- Intel® Architecture Processors:
- fp32 matmul with bfloat16 binary post-op may produce incorrect results on processors with Intel® AVX2 and Intel® DL Boost support.
- fp32 convolution forward propagation with strides has performance regression on processors with Intel® AVX-512 instructions support.
- Resampling primitive with binary post-op may produce incorrect results on CPUs.
- 3D fp32 matmul primitive with transposed source tag may produce incorrect results on CPUs with Intel® AVX-512 instructions support.
- Intel® Graphics Products:
- Convolution and deconvolution primitives on Intel® Arc GPU on Windows may lead to memory corruption under heavy repeated use.
- Extensive use of RNN primitive on Intel GPUs with default primitive cache setting may lead to a device reboot. Workaround: consider reducing primitive cache size to 100.
- bfloat16 matmul primitive may crash on Intel® Arc GPUs on Windows.
- bfloat16 matmul primitive has performance regression with shapes 14x128:128x200:14x200 and 200x128:128x200:200x200 on Intel® Data Center GPU Max Series.
- Pooling and resampling may sporadically produce incorrect results on Intel® Arc GPUs on Windows.
- Inner product weight gradient may produce incorrect results on Intel® Processor Graphics on Windows.
- oneDNN primitives may crash or produce incorrect results with tensors exceeding 4 Gb in size.
- softmax primitive with NHWC memory format may produce incorrect results on Intel® Data Center GPU Max Series.
- oneDNN Graph partitions containing ConvTransposeBackwardWeights or int8 matmul operations may produce incorrect results on Intel® Processor Graphics on Windows.
Performance Optimizations
- Intel® Architecture processors:
- Improved performance for 4th generation Intel® Xeon Scalable processor (formerly Sapphire Rapids).
- Introduced initial optimizations for future Intel® Xeon Scalable processors (code name Sierra Forest). The functionality is disabled by default and can be enabled via CPU dispatcher control.
- Intel® Processor Graphics and Xe architecture-based Graphics:
- Improved performance for Intel® Data Center GPU Max Series (formerly Ponte Vecchio).
- Improved performance for Intel® Arc graphics (formerly Alchemist and DG2) and Intel® Data Center GPU Flex Series (formerly Arctic Sound-M).
- Improved concat primitive performance with per-argument scales on Intel® GPUs.
Functionality
- Enabled Graph API as a production feature. Graph API is intended to simplify oneDNN integration into frameworks.
- Added an option to zero-out weight gradient in RNN primitive. See details in corresponding RFC.
- Added support for the non-zero alpha parameter in the batch normalization ReLU post-op on Intel® GPUs.
- Enabled the layer normalization primitive with f64 datatype support on Intel® GPUs.
Deprecated Functionality
- Legacy CPU-only configurations are deprecated and will be removed in oneDNN 2024 release.
What's New
- Delivered production-quality AI deep learning optimizations for 4th Gen Intel® Xeon® Scalable processors, Intel® Xeon® processor Max Series, Intel® Data Center GPU Flex Series, and Intel® Arc™ A-Series GPUs
- Added support for s8/s8 weights and activations, enabling greater input influence on the outcomes, on 4th Gen Intel® Xeon® Scalable processors with Intel® Advanced Matrix Extensions (Intel® AMX) instruction set acceleration
- Added support for wider operators: BF32 on 4th Gen Intel® Xeon® Scalable processors, and TF32 on Intel® Data Center GPU Flex Series and Intel® Max Series GPUs, for more accurate inferencing
- Enabled limited support for FP64 operators on Intel® Data Center GPU Max Series GPUs for high-precision model deployment
- Delivered experimental Graph API support (open source only) to simplify integration into frameworks and extend optimization capabilities
Performance Optimizations
- Intel® Architecture processors:
- Improved performance for 4th generation Intel® Xeon Scalable processor (formerly Sapphire Rapids).
- Introduced performance optimizations for bf16 floating point math mode on 4th generation Intel® Xeon Scalable processors (code name Sapphire Rapids). The bf16 math mode allows oneDNN to use bf16 arithmetic and Intel® AMX instructions in computations on fp32 data.
- Introduced FP16 support and initial optimizations for future Intel® Xeon Scalable processor (code name Granite Rapids).
- Intel® Processor Graphics and Xe architecture-based Graphics:
- Improved performance for Intel Data Center GPU Max Series (formerly Ponte Vecchio).
- Introduced performance optimizations for tf32 floating point math mode on future Xe Architecture graphics (code name Ponte Vecchio). The tf32 math mode allows oneDNN to use tf32 arithmetic in computations on fp32 data.
- Improved performance for Intel® Arc graphics (formerly Alchemist and DG2) and Intel® Data Center GPU Flex Series (formerly Arctic Sound-M).
Functionality
- Introduced runtime output scales support in all primitives.
- Introduced scales support in concat primitive.
- Extended floating point math mode API with tf32 data type option.
- Extended eltwise primitive with support for the hardsigmoid algorithm.
- Extended layer normalization primitive with support for mixed source and destination data types.
- Extended depthwise post-op with support for arbitrary padding size. The implementation is available only on Intel processors.
- Added limited fp64 data type support in convolution primitive. Optimized implementation is available for future Xe Architecture graphics (code name Ponte Vecchio).
- Extended int8 convolution and deconvolution implementations on GPUs with arbitrary destination data type support.
- Extended batch normalization primitive with dnnl_fuse_norm_add_relu flag that allows fusing sum and relu operations. The implementation is available for Intel GPUs.
- Extended GPU deconvolution primitive implementation with support for output scales and zero points.
- Introduced new quantization scheme. Major changes include support for per-argument runtime scales in all primitives and unquantized bias.
- Introduced support for Intel DPC++/C++ Compiler 2023.0, including new features from the SYCL 2020 standard.
- Extended persistent cache to cover GPU engine object. This improvement allows applications to further reduce oneDNN initialization time.
- Extended threadpool API with a function to indicate maximum available concurrency.
- Extended binary primitive implementation on GPU with bfloat16 source and int8 destination support.
Usability
- Added matmul_perf example that benchmarks matmul primitive for all supported data types.
- Introduced annotations for JIT kernels to allow profilers like Linux perf to correctly label JIT code.
- Extended verbose logs converter with RNN primitive support.
- Added verbose output for dnnl_*gemm* calls.
- Removed Level Zero headers from the list of build time dependencies.
- Extended the set of supported format tags to cover formats used in applications.
Deprecated Functionality
- Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.
- Static output scales are deprecated and will be removed in the next release.
- Convolution Winograd algorithm implementation for int8 data type is deprecated and will be removed in the next release.
Breaking Changes
- Changed formula for AUGRU RNN cell to align with TensorFlow. See the proposal for details.
- Removed deprecated APIs.
- Removed operation descriptor object and made memory descriptor object opaque. See details in operation and memory descriptors RFC.
- Removed creation time primitive scales support and primitive output scales support. See details in quantization scaling RFC.
- Removed support for Intel DPC++/C++ Compiler with SYCL 1.2.1 (aka SYCL 2017) standard.
- Removed Winograd convolution implementation for int8 data type.
There are no changes from the 2022.1 version to the 2022.2 version.
Performance Optimizations
- Intel® Processor Graphics and Xe architecture-based Graphics:
- Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
- Improved performance for future Arc graphics (code name Alchemist and DG2).
- Intel® Architecture processors
- Improved performance for future Intel® Xeon Scalable processors (code name Sapphire Rapids). The functionality is now enabled by default and requires Linux kernel 5.16 or later.
- Improved performance of matmul primitive for processors with Intel® AVX-512 support.
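Since the Sapphire Rapids optimizations above depend on the running kernel version, a quick preflight check can save debugging time. This is a sketch, not part of oneDNN; `version_ge` is a helper defined here:

```shell
# Minimal sketch: verify the kernel meets the 5.16 minimum required for the
# Sapphire Rapids (Intel AMX) functionality to be enabled by default.
# version_ge is a local helper, not a oneDNN tool.
version_ge() {
  # True when $1 >= $2 in version-sort order.
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}
if version_ge "$(uname -r)" 5.16; then
  echo "kernel is new enough for AMX-enabled code paths"
else
  echo "kernel older than 5.16; AMX code paths may be unavailable"
fi
```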
New Functionality
- Introduced bfloat16 destination support for int8 convolution, matmul, and inner product primitives for processors with Intel® AVX-512 support and for future Intel® Xeon® Scalable processors (code name Sapphire Rapids).
- Extended RNN primitive with support for AUGRU cell.
- Added support for non-zero negative slope in ReLU post-op for batch normalization primitive.
- Introduced support for mixed source and destination data types in softmax primitive.
- Introduced persistent cache API. This functionality allows applications to serialize and reuse JIT kernels.
Usability
- Reduced stack consumption in GEMM implementation.
Breaking Changes
- Removed performance optimizations for Intel® Xeon® Phi processors. oneDNN will continue to be functional on these processors using the Intel® AVX2 code path.
Deprecated Functionality
- Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.
Known issues and limitations
- See DPC++ limitations that impact the library as well.
Performance Optimizations
- Intel® Processor Graphics and Xe architecture-based Graphics:
- Introduced initial optimizations for future Xe Architecture graphics (code name Ponte Vecchio).
- Improved pooling and layer normalization primitives performance.
- Intel® Architecture processors
- Improved performance for future Intel® Xeon Scalable processors (code name Sapphire Rapids). The functionality is now enabled by default and requires Linux kernel 5.16 or later.
- Improved performance of matmul primitive for processors with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) support.
New Functionality
- Introduced support for compilers with SYCL 2020 standard support.
- Introduced support for the ICX/ICPX and DPCPP compiler drivers available in the Intel® oneAPI DPC++ Compiler.
Usability
- Added environment variables and build options with 'ONEDNN' prefix.
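The renamed controls can be sketched as follows; the ONEDNN_ prefix is the new spelling, and the legacy DNNL_ prefix remains recognized (the build option shown is illustrative, check your release's build documentation):

```shell
# New-style environment variable name (formerly DNNL_VERBOSE=1):
export ONEDNN_VERBOSE=1
# Build options follow the same convention, e.g. (illustrative):
#   cmake -DONEDNN_BUILD_EXAMPLES=OFF ..
```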
Breaking Changes
- The Intel MKL-DNN compatibility API is removed. See Transition from Intel® MKL-DNN to oneDNN page for instructions on moving to the new API.
Deprecated Functionality
- Support for Intel® Xeon Phi processors is deprecated and will be removed in the next release.
- Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.
Known issues and limitations
- See DPC++ limitations that impact the library as well.
Performance Optimizations
- Improved primitive cache performance for Intel Graphics products.
- Intel® Processor Graphics and Xe architecture-based Graphics:
- Introduced initial optimizations for future Intel® Arc™ Graphics codenamed Alchemist (ACM). That includes optimizations of compute-bound primitives (Convolution, GEMM) for s8/u8, f16 and bf16 datatypes via DPAS (Dot Product Systolic Accumulate) instructions.
- Improved performance of convolution and deconvolution after some OpenCL kernels were re-implemented using kernel code generator (jit:ir implementation as reported by DNNL_VERBOSE).
- Intel® Architecture processors
- Improved performance for future Intel® Xeon Scalable processors (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved binary primitive performance for cases when one of the tensors is broadcasted.
- Improved reorder primitive performance for memory formats with padding and/or zero points.
- Improved performance of reduction, reorder, and shuffle primitives.
- Improved performance of depthwise forward convolution primitive for processors with Intel® AVX-512 support.
- Improved performance of forward inner product primitive for shapes with minibatch equal to 1 for processors with Intel® AVX-512 support.
- Improved int8 GEMM performance for processors with Intel® AVX2 and Intel® DL Boost support.
New Functionality
- Introduced PReLU post-op support in convolution and matmul.
- Extended maximum allowed post-ops chain for compute primitives (convolution, deconvolution, inner product, and matmul) to 32.
- Introduced support for zero points in sum post-op for convolution and matmul. The functionality is implemented only for CPUs.
- Extended binary primitive with support for mixed data types for input tensors. The functionality is implemented only for CPUs.
- Extended sum post-op for convolution and matmul primitives with support for mixed data types. The functionality is implemented only for CPUs.
Usability
- Reduced overall library size by trimming down use of templates, OpenCL headers, and TBB headers. The configuration that benefited the most is the CPU-only configuration with TBB threading.
Deprecated Functionality
- Intel MKL-DNN compatibility API is deprecated and will be removed in the next update. See Transition from Intel MKL-DNN to oneDNN page for instructions on moving to new API.
- Support for Intel Xeon Phi processors is deprecated and will be removed in the next release.
Known issues and limitations
- See DPC++ limitations that impact the library as well.
Performance Optimizations
- Extended primitive cache to improve primitive descriptor creation performance.
- Improved primitive cache performance in multithreaded configurations.
- Intel® Processor Graphics and Xe architecture-based Graphics:
- Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
- Improved performance of reduction primitive.
- Improved performance of depthwise convolution primitive with NHWC activations for training cases.
- Intel® Architecture processors
- Introduced initial optimizations for bfloat16 compute functionality for future Intel® Xeon Scalable processors (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Introduced initial optimizations for bfloat16 functionality for future Intel® Xeon Scalable processor with Intel® AMX support (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved performance of int8 compute functionality for future Intel® Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Introduced initial performance optimizations for future Intel® Core processor with Intel® AVX2 and Intel® DL Boost instructions support (code name Alder Lake).
- Improved performance of int8 primitives for processors with Intel® SSE4.1 instruction set support.
- Improved performance of int8 and bfloat16 RNN and inner product primitives.
- Introduced CPU ISA hints environment variable and API. The new API is intended to dispatch function implementations using YMM registers to improve performance on processors with a single Intel® AVX-512 compute unit.
- Improved forward convolution performance for Intel® AVX-512 systems.
- Improved convolution and batch normalization performance with threadpool.
- Improved performance of bfloat16 shuffle primitive.
- Improved performance of `dnnl_gemm` and functionality relying on this implementation for cases with `n=1` on all supported processors.
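The CPU ISA hints control mentioned above can be sketched as follows; the variable name follows the DNNL_ convention of this timeframe (there is also a C++ API, dnnl::set_cpu_isa_hints), so verify the exact spelling against your release's documentation:

```shell
# Sketch: ask oneDNN to prefer 256-bit YMM registers, which can help on
# processors with a single AVX-512 compute unit.
export DNNL_CPU_ISA_HINTS=PREFER_YMM
# DNNL_CPU_ISA_HINTS=PREFER_YMM ./app   # hypothetical application binary
```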
New Functionality
- Extended batch normalization and layer normalization primitives API to take separate scale and shift arguments.
- Extended resampling primitive with post-ops support and mixed source and destination data types.
Usability
- Introduced support for DPC++ debug configuration on Windows.
Breaking Changes
- Updated minimal supported CMake version to 2.8.12 (was 2.8.11).
Known issues and limitations
- Backward inner product primitive may produce incorrect results for shapes where the number of output channels is not a multiple of 16 on future Intel Xeon Scalable processors (code name Sapphire Rapids).
- Convolution with binary post-op may produce incorrect results for formats with channel padding.
- Pooling and batch normalization primitives may hang on Windows GEN9 and DG1 in DPC++/L0 configuration.
- Pooling and batch normalization primitives with 4D double blocked memory formats may produce NaNs or hang on Linux DG1 platforms.
- See DPC++ limitations that impact the library as well.
Performance Optimizations
- Reduced overheads associated with primitive cache.
- Intel® Processor Graphics and Xe architecture-based Graphics:
- Improved performance of int8 primitives with NHWC activations format.
- Improved functionality performance for padded memory formats.
- Improved performance of reorder and shuffle primitives for multiple formats and all dimensions.
- Improved performance of fp16 pooling primitive.
- Improved performance of lnorm primitive for plain memory formats.
- Improved performance of resampling primitive for blocked memory formats.
- Improved performance of Winograd convolution.
- Intel® Architecture processors
- Introduced initial optimizations for bfloat16 functionality for future Intel® Xeon Scalable processor with Intel® AMX support (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Improved performance of int8 compute functionality for future Intel® Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via CPU dispatcher control.
- Introduced initial performance optimizations for future Intel® Core processor with Intel® AVX2 and Intel® DL Boost instructions support (code name Alder Lake).
- Improved performance of int8 primitives for processors with Intel® SSE4.1 instruction set support.
- Improved performance of int8 and bfloat16 RNN and inner product primitives.
- Introduced CPU ISA hints environment variable and API. The new API is intended to dispatch function implementations using YMM registers to improve performance on processors with a single Intel® AVX-512 compute unit.
- Improved forward convolution performance for Intel® AVX-512 systems.
- Improved convolution and batch normalization performance with threadpool.
- Improved performance of bfloat16 shuffle primitive.
- Improved performance of `dnnl_gemm` and functionality relying on this implementation for cases with `n=1` on all supported processors.
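The primitive cache whose overheads were reduced above can also be sized, or disabled, at run time. The variable name and default below come from oneDNN documentation for this timeframe; verify against your release:

```shell
# Sketch: resize the primitive cache (default is 1024 entries in recent
# releases; a capacity of 0 disables caching entirely).
export DNNL_PRIMITIVE_CACHE_CAPACITY=2048
```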
New Functionality
- Introduced binary post-op for (de)convolution, pooling, eltwise, binary, inner product, matmul, and reduction (GPU only) along with performance optimizations for CPUs and GPUs. Extended the number of supported post-ops for primitives to 20.
- Extended eltwise primitive with support for `logsigmoid`, `mish`, `hardswish`, and `clip_v2` algorithms.
- Introduced support for PReLU primitive.
- Introduced int8 support for LSTM primitive with projection for CPU.
- Introduced asymmetric quantization support for int8 deconvolution.
- Extended matmul implementation with support for per-output channel zero-points for quantization.
- Extended support for broadcasting in binary primitive to both inputs for CPU.
- Extended binary primitive with support for comparison operators.
- Introduced float16 support in reduction primitive for GPU.
- Introduced support for mixed input and output types in binary primitive for GPU.
- Introduced support for post-ops in GPU resampling implementation.
Usability
- Added API to enable displaying timestamps in oneDNN verbose mode. Timestamps make it possible to correlate oneDNN verbose output with profiling tools.
- Improved presentation of oneDNN primitives in Intel® VTune™ Profiler.
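The timestamp feature mentioned above can be sketched as follows; variable names follow the DNNL_ convention of this timeframe, so check your release's verbose-mode documentation for the exact spelling:

```shell
# Sketch: enable verbose tracing plus the timestamp column added by this
# release, so dnnl_verbose lines can be lined up with profiler timelines.
export DNNL_VERBOSE=1
export DNNL_VERBOSE_TIMESTAMP=1
```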
Validation
- Extended benchdnn to report operation bandwidth.
- Added ability to choose target GPU in benchdnn.
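A GPU-selecting benchdnn run can be sketched as below; the flags and the problem descriptor are illustrative assumptions (consult benchdnn's README in the oneDNN sources), and the command is printed rather than executed here:

```shell
# Illustrative benchdnn invocation: performance mode (--mode=P) for a
# convolution problem on the second GPU device (--engine=gpu:1).
cmd="./benchdnn --conv --mode=P --engine=gpu:1 ic16ih7oc16oh7kh3ph1"
echo "$cmd"   # run manually from the benchdnn build directory
```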
Known issues and limitations
- When using a driver older than 27.20.100.9316 for Intel® UHD Graphics for 9th Gen Intel® Processors on Windows, convolution/deconvolution functions may sporadically hang or produce incorrect results in the DPC++ configuration with Level Zero. Please upgrade your driver to fix the issue. An alternative solution is to use DPC++ with the OpenCL backend.
- Reorder, prelu, softmax, and pooling primitives on GPUs may be slower for zero-padded memory formats than in Intel oneDNN 2021.1.
- Reorder operation for 5D tensor with two dimensions equal to 16 and one uneven dimension can produce incorrect results on Intel® Iris® Xe Max Graphics.
- Eltwise primitive may produce incorrect results for oneDNN DPC++ configuration with the Level Zero runtime. To avoid this, use DPC++ with the OpenCL backend.
- Deconvolution primitive may segfault with int8 data for cases with non-trivial padding on processors with Intel AVX-512 support.
- Deconvolution primitive may segfault with int8 data when used with post-ops and per_oc broadcast on processors with Intel AVX2 support.
- Pooling, batch normalization, and binary primitives may segfault when executed on Xe architecture-based graphics. No workaround available.
- Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that a GPU device is non-Intel. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly.
- When running GPU kernels that take longer than a certain time (which depends on OS and system settings), you may encounter an apparent hang of the application. Driver or system settings can be configured to disable this timeout and avoid hangs of DPC++ or OpenCL programs, including oneDNN examples:
- On Linux* (See more details at OpenCL™ Driver for Intel® HD, Iris™, and Iris™ Pro Graphics for Linux):
$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
- On Windows* (See more details at Timeout Detection and Recovery (TDR) Registry Keys):
Increase TdrDelay and TdrDdiDelay values in registry
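The Windows registry change above can be applied from an elevated prompt, e.g. as follows; this is a Windows-only configuration fragment, and the 60-second values are examples, not recommendations:

```shell
# Raise the GPU Timeout Detection and Recovery limits (values in seconds).
reg add "HKLM\System\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 60 /f
reg add "HKLM\System\CurrentControlSet\Control\GraphicsDrivers" /v TdrDdiDelay /t REG_DWORD /d 60 /f
```

A reboot is typically required for TDR registry changes to take effect.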
- See DPC++ limitations that impact the library as well.
New Functionality
- Introduced SYCL API extensions compliant with oneAPI specification v1.0.
- Introduced support for Intel® oneAPI DPC++/C++ compiler.
- Introduced Unified Shared Memory (USM) support for Intel Processor Graphics and Xe architecture-based graphics.
Known issues and limitations
- Pooling, batch normalization, and binary primitives may segfault when executed on Xe architecture-based graphics. No workaround available.
- Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check that a GPU device is non-Intel. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly.
- When running GPU kernels that take longer than a certain time (which depends on OS and system settings), you may encounter an apparent hang of the application. Driver or system settings can be configured to disable this timeout and avoid hangs of DPC++ or OpenCL programs, including oneDNN examples:
- On Linux* (See more details at OpenCL™ Driver for Intel® HD, Iris™, and Iris™ Pro Graphics for Linux):
$ sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'
- On Windows* (See more details at Timeout Detection and Recovery (TDR) Registry Keys):
Increase TdrDelay and TdrDdiDelay values in registry
- See DPC++ limitations that impact the library as well.
Notices and Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.