This page provides the current Release Notes for Intel® Math Kernel Library (Intel® MKL). The notes are categorized by year, from newest to oldest, with individual releases listed within each year.
Each version summarizes the new features, changes, and known issues since the previous release. Each major release also links to important information such as prerequisites, software compatibility, and installation instructions.
To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.
2020
Installation Guide System Requirements Bug Fix Log
Update 4
- Graph:
- Improved performance of mxv with CSR matrices and dense matrices for large workloads without the optimization step.
- BLAS:
- Introduced Intel® Advanced Matrix Extensions (Intel® AMX) based GEMM, available with MKL_ENABLE_INSTRUCTIONS=AVX512_E4, for the following routines:
- gemm_s8u8s32() for Fortran
- cblas_gemm_s8u8s32() for C
- cblas_gemm_bf16bf16f32() for C
- LAPACK:
- Aligned Intel® MKL LAPACK functionality with Netlib LAPACK 3.9.0: Added QR-preconditioned SVD (?GESVDQ) and Householder reconstruction (?{OR,UN}HR_COL and ?{OR,UN}GTSQR) routines.
- Sparse:
- Improved performance of the solving phase of PARDISO direct sparse solver for complex matrices.
- Vector Math:
- Added support for VML in the Strided API VM implementation. The Strided API, introduced in Intel® MKL 2020 Update 2, allows the use of non-unit increments when traversing input and output arrays with vector mathematical functions.
- Performance improvement for several Vector Math functions:
- Real single precision: Cos, Div, MaxMag, MinMag, NextAfter, Powx, SinCos, Tanh by 5% to 8% for Intel® Advanced Vector Extensions 512 (Intel® AVX-512); Tgamma by 7% for Intel® Advanced Vector Extensions 2 (Intel® AVX2).
- Real double precision: Erfinv by 27% for Intel® AVX-512; ExpInt1 by 6% for Intel® Advanced Vector Extensions (Intel® AVX) and Intel® AVX2; Fdim by 8% for Intel® AVX.
- Complex single precision: Abs, Add, Arg, CIS, Cos, Div, Ln, Mul, MulByConj, Pow, Powx, Sin, Sub by 5% to 47% for Intel® AVX, Intel® AVX2, or Intel® AVX-512.
- Complex double precision: Add, CIS, Cos, Mul, MulByConj, Pow, Powx, Sin, Sub by 5% to 22% for Intel® AVX, Intel® AVX2, or Intel® AVX-512.
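The cblas_gemm_bf16bf16f32() name above refers to bfloat16 inputs with a float32 result. As a rough illustration of the data type only, and not of how Intel® MKL implements it, the Python sketch below treats bfloat16 as the upper 16 bits of an IEEE 754 float32 and converts with round-to-nearest-even. The helper names are hypothetical.

```python
import struct

def float_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep NaN a NaN: force a mantissa bit so truncation cannot yield infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the lowest kept bit, then drop the low 16 bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to a float by zero-filling the low bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x

one_bits = float_to_bf16_bits(1.0)
approx_pi = bf16_bits_to_float(float_to_bf16_bits(3.14159))
```

Because only 8 significand bits survive, 3.14159 comes back as 3.140625 in this sketch, which is why bfloat16 GEMM variants typically accumulate in a wider type.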
Known Limitations
- CHERK/ZHERK may return inaccurate results when alpha is zero and beta is one on non-Intel x86-compatible processors.
Deprecation Notices
- MPICH2 Windows support is deprecated to align with the mpich.org support matrix and will be removed in a future release. mpich.org recommends that users switch to MS MPI.
- Compaq Visual Fortran (CVF) interfaces for Windows* IA32 are deprecated and will be removed in a future release.
Product Content
Intel® MKL can be installed as a part of the following suites:
- Intel® Parallel Studio XE 2020 Composer, Professional, or Cluster Edition.
- Intel® System Studio 2020 Composer, Professional, or Ultimate Edition.
The Intel® MKL distribution consists of one package for both IA-32 and Intel® 64 architectures and, as an alternative, an online installer.
Technical Support
If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates, and upgrades for the duration of the support term.
For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.
Note: If your distributor provides technical support for this product, please contact them rather than Intel.
For technical information about Intel® MKL, including FAQs, tips and tricks, and other support information, please visit the Intel® MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel® MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/
For paid support with Intel® MKL, please purchase either Intel® Parallel Studio XE 2020 - https://software.intel.com/en-us/parallel-studio-xe or Intel® System Studio - /content/www/us/en/develop/tools/system-studio.html.
*Other names and brands may be claimed as the property of others.
Update 3
- Addressed the performance regressions introduced in Intel® MKL 2020 Update 2.
Update 2
- Graph:
- Significant API changes were made to enable better consistency and uniformity.
- Added new API for transposing graph data.
- Added support for sparse vectors and Compressed Sparse Column (CSC) matrix format.
- Added support for vector-times-matrix multiply (vxm).
- Added support for new semirings and extended support for descriptor flags, which can be used for Connected Components (CC), Triangle Count (TC), Betweenness Centrality (BC), and Breadth-First Search (BFS).
- Added PLUS accumulator for mxv which can be used for PageRank (PR).
- BLAS:
- Introduced {cblas_}?axpy_batch APIs.
- Introduced bfloat16 data type support for GEMM and pack-API.
- Fixed a parameter validation error of cblas_zgemmt to allow CblasConjTrans value.
- LAPACK:
- Improved performance of {D,S}GESDD for case jobz='N'.
- ScaLAPACK:
- Aligned Intel® MKL ScaLAPACK functionality with Netlib ScaLAPACK 2.1.0 and added robust ScaLAPACK routines for computing the QR factorization with column pivoting.
- Vector Math:
- Introduced the Strided API VM feature, which adds new functionality to Intel® MKL VM by allowing the use of non-unit increments when traversing input and output arrays with vector mathematical functions.
- Vector Statistics:
- Improved performance of the threaded version of the Sobol quasi-random number generator when user-defined parameters are registered.
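To make the "non-unit increments" of the Strided API concrete, here is a minimal Python sketch of a strided elementwise operation. The function name `strided_unary` and its parameters are hypothetical and do not mirror the Intel® MKL VM signatures; the sketch only shows what it means to read every `inca`-th input element and write every `incy`-th output element.

```python
import math

def strided_unary(func, n, a, inca, y, incy):
    """Apply func to n elements of a (step inca) and store them in y (step incy)."""
    if inca < 1 or incy < 1:
        raise ValueError("this sketch only handles positive increments")
    for i in range(n):
        y[i * incy] = func(a[i * inca])
    return y

# Input values of interest sit at indices 0, 2, 4; the 99.0 entries are skipped.
a = [0.0, 99.0, 1.0, 99.0, 4.0, 99.0]
y = [-1.0] * 3
strided_unary(math.sqrt, 3, a, 2, y, 1)

# Strided output: results land at indices 0, 2, 4 and the rest is left untouched.
y2 = [0] * 5
strided_unary(abs, 3, [-1, -2, -3], 1, y2, 2)
```

A strided interface of this kind avoids copying interleaved data into a contiguous temporary array before calling the math function.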
Known Limitations
- Issue: Performance regressions may occur on non-Intel x86-compatible processors. These regressions will be addressed in a future release.
- Running the xhpl_intel64_static binary requires the dynamic (shared) Intel® MPI runtime library. Intel® MPI 2019 and later versions depend on libfabric.so.1, which may result in incompatibilities. To work around this issue, use xhpl_intel64_dynamic (the dynamic version of the binary) with the runme_intel64_dynamic script, or build the xhpl binary using the build.sh script.
Deprecation Notices
- The Intel® MKL Windows IA-32 Compaq Visual Fortran (CVF) interface (mkl_intel_s.lib and mkl_intel_s_dll.lib) is deprecated and will be removed in a future version.
Product Content
Intel® MKL can be installed as a part of the following suites:
- Intel® Parallel Studio XE 2020 Composer, Professional, or Cluster Edition.
- Intel® System Studio 2020 Composer, Professional, or Ultimate Edition.
The Intel® MKL distribution consists of one package for both IA-32 and Intel® 64 architectures and, as an alternative, an online installer.
Update 1
- Graph:
- Introduced Graph functionality as a preview feature that supports the sparse linear algebra operations and semirings used in PageRank and Triangle Count algorithms. This functionality was inspired by the GraphBLAS C API specification.
- BLAS:
- Improved GEMM3M performance on Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architecture sets.
- LAPACK:
- Improved performance of {D,S}GESDD for case jobz='N'.
- ScaLAPACK:
- Introduced distributed nonsymmetric eigensolver functionality (P?GEEVX) for computing eigenvalues and optionally eigenvectors for a general nonsymmetric matrix.
- FFT:
- Improved performance for FFTs with the Intel® Threading Building Blocks (Intel® TBB) threading layer of Intel® MKL on CPUs.
- Vector Math:
- Fixed issue that caused a segmentation fault for very large arrays in complex functions on Intel® 64 architecture in the LP64 data model.
- Data Fitting:
- Fixed the integration routine for the multi-limits computation mode.
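As a rough illustration of the "sparse linear algebra operations and semirings" mentioned for the Graph preview feature, the Python sketch below computes a matrix-vector product (mxv) for a CSR matrix over a caller-supplied semiring. All names (`csr_mxv`, `add`, `mul`, `zero`) are hypothetical; this is not the Intel® MKL Graph API, only the underlying idea.

```python
def csr_mxv(row_ptr, col_idx, vals, x, add, mul, zero):
    """Return y = A x over the semiring (add, mul, zero), with A stored in CSR."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = zero
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc = add(acc, mul(vals[k], x[col_idx[k]]))
        y.append(acc)
    return y

# The 3x3 matrix [[0, 2, 0], [1, 0, 3], [0, 0, 4]] in CSR form.
row_ptr = [0, 1, 3, 4]
col_idx = [1, 0, 2, 2]
vals = [2, 1, 3, 4]

# Conventional (plus, times) semiring, the kind of product a PageRank step uses.
y_plus_times = csr_mxv(row_ptr, col_idx, vals, [1, 1, 1],
                       add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0)

# Boolean (or, and) semiring: which rows have an entry in a marked column?
marked = [False, False, True]
y_or_and = csr_mxv(row_ptr, col_idx, [True] * 4, marked,
                   add=lambda a, b: a or b, mul=lambda a, b: a and b, zero=False)
```

Swapping the semiring changes the graph question being answered while the traversal of the sparse structure stays the same.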
Known Limitations
- Issue: Intel® MKL FFT: complex-to-complex, in-place, 32-bit DFT may return incorrect results for sizes divisible by 7 on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures. Workaround: add a call to mkl_enable_instructions(MKL_ENABLE_AVX2) to the application if this particular configuration and size is encountered.
Deprecation Notices
- None.
Product Content
Intel® MKL can be installed as a part of the following suites:
- Intel® Parallel Studio XE 2020 Composer, Professional, or Cluster Edition. Download from https://software.intel.com/en-us/intel-parallel-studio-xe.
- Intel® System Studio 2020 Composer, Professional, or Ultimate Edition. Download from /content/www/us/en/develop/tools/system-studio.html.
The Intel® MKL distribution consists of one package for both IA-32 and Intel® 64 architectures and, as an alternative, an online installer.
Initial Release
- LAPACK:
- Improved performance of {D,S}GESDD for case jobz='N' and 'A'.
- ScaLAPACK:
- Introduced P{D,S}TREVC functions for computing some or all of the right and/or left eigenvectors of a real upper quasi-triangular matrix.
- Random Number Generators:
- Introduced an advanced SkipAhead method for parallel random number generation with the MRG32K3A, PHILOX4X32-10, and ARS-5 basic random number generators.
- Improved performance of the ARS-5 basic random number generator for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) systems.
- Summary Statistics:
- Improved performance of fast calculation method for raw/central moments/sums, variance-covariance/correlation/cross-product matrix on Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® AVX-512 architecture sets.
- Library Engineering:
- Introduced modulefile support.
- Deprecation Notices:
- Deep Neural Network (DNN) component has been removed in this release.
- The DFTI_COMPLEX_REAL storage scheme for 1D and 2D R2C FFTs is deprecated and will be removed in the next major release.
- The pardiso_getenv and pardiso_setenv functions are deprecated and will be removed in the next major release.
- Known Issues:
- A segmentation fault may occur when using the new code path for the two-level PARDISO factorization algorithm. Workaround: switch to the old code path.
- The mkl_sparse_qr_reorder method gives an incorrect initialization error in the case of a diagonal matrix.
- When Intel® MKL is used with the Intel® Threading Building Blocks (Intel® TBB) based threading layer, reducing the number of threads that Intel® TBB can use (in particular via tbb::global_control) may lead to high memory consumption.
- For the standalone version of Intel® MKL on Linux, if you plan to use Intel® TBB, install the standalone version of Intel® TBB as a workaround; otherwise, expect the examples to crash. For more information, see this article.
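The SkipAhead method listed under Random Number Generators lets each parallel worker jump directly to its own non-overlapping block of a single random stream. The Python sketch below illustrates the general idea with a plain linear congruential generator, which is a stand-in chosen for brevity and not one of the MRG32K3A, PHILOX4X32-10, or ARS-5 generators. The constants and function names are assumptions for illustration only.

```python
A, C, M = 1103515245, 12345, 2**31   # illustrative LCG parameters

def lcg_step(x):
    """Advance the generator state by one step."""
    return (A * x + C) % M

def lcg_skip_ahead(x, n):
    """Advance state x by n steps in O(log n) by repeatedly squaring the affine map."""
    acc_a, acc_c = 1, 0      # identity map: x -> 1*x + 0
    cur_a, cur_c = A, C      # map for 1 step, then 2, 4, 8, ... steps
    while n > 0:
        if n & 1:
            # Compose: apply the accumulated map, then the current power.
            acc_a, acc_c = (cur_a * acc_a) % M, (cur_a * acc_c + cur_c) % M
        # Square the current map (apply it twice).
        cur_a, cur_c = (cur_a * cur_a) % M, (cur_a * cur_c + cur_c) % M
        n >>= 1
    return (acc_a * x + acc_c) % M

def stream(x, count):
    """Generate count values starting from state x."""
    out = []
    for _ in range(count):
        x = lcg_step(x)
        out.append(x)
    return out

seed, block, workers = 2020, 4, 3
serial = stream(seed, block * workers)

# Each worker skips ahead to the start of its block and generates only that block.
parallel = []
for w in range(workers):
    start = lcg_skip_ahead(seed, w * block)
    parallel.extend(stream(start, block))
```

The concatenated per-worker blocks reproduce the serial sequence exactly, which is the property that makes skip-ahead useful for reproducible parallel simulations.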
Product Content
Intel MKL can be installed as a part of the following suites:
- Intel® Parallel Studio XE 2020 Composer, Professional, or Cluster Edition. Download from https://software.intel.com/en-us/intel-parallel-studio-xe.
- Intel® System Studio 2020 Composer, Professional, or Ultimate Edition. Download from /content/www/us/en/develop/tools/system-studio.html.
Intel MKL consists of either one package for both IA-32 and Intel® 64 architectures or alternatively an online installer.
2019
Installation Guide System Requirements Bug Fix Log
Update 5
- BLAS Features:
- Improved performance of small to medium size GEMM_S8U8S32 and GEMM_S16S16S32 when C-offset is non-zero on Intel® AVX2 and Intel® AVX-512 architecture sets.
- Improved SGEMM performance for tall-and-skinny case for small N.
- Enabled TBB threading for GEMM_S8U8S32 and GEMM_S16S16S32.
- Added GEMM_S8U8S32 and GEMM_S16S16S32 optimizations for Intel® AVX and SSE4.2 architecture sets.
- Addressed GEMM_S8U8S32 and GEMM_S16S16S32 accuracy issues on Intel® AVX2 and Intel® AVX-512 architectures when the alpha scale factor is not an integer value.
- LAPACK:
- Improved performance of ?GEQR for tall-and-skinny matrices on Intel® AVX-512 architecture sets.
- Improved performance of the ?TRTRI, ?POTRI, ?GETRI and ?TFTRI inverse routines.
- ScaLAPACK:
- Significantly reduced the memory footprint of the P{D,S}HSEQR eigensolvers.
- Sparse Solver:
- Added new ILU smoother support.
- Known Limitations:
- GEMM_S8U8S32 and GEMM_S16S16S32 may return wrong results when using Intel® TBB threading if the offsetc parameter is given as a lowercase letter.
- GEMM_S8U8S32_COMPUTE and GEMM_S16S16S32_COMPUTE may return wrong results when using Intel® TBB threading if neither the A nor the B matrix is packed into the internal format and the offsetc parameter is given as a lowercase letter. As a workaround, use only uppercase letters for the offsetc parameter in the above situations, that is, 'F', 'C', or 'R'.
- The complex general sparse eigenvalue function may cause a segmentation fault, depending on the input matrices.
- When Intel® MKL is used with the Intel® TBB-based threading layer, reducing the number of threads that Intel® TBB can use (in particular via tbb::global_control) may lead to high memory consumption.
- For the standalone version of Intel® MKL on Linux, if you plan to use Intel® TBB, install the standalone version of Intel® TBB as a workaround; otherwise, expect the examples to crash. For more information, see this article.
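The integer GEMM notes above refer to matrix offsets and to the offsetc parameter. The Python sketch below is a reference-style illustration under stated assumptions: it assumes the operation has the form C = alpha*(A + ao)*(B + bo) + beta*C + co, and that 'F', 'C', and 'R' select a single fixed offset, a per-row-index column vector, and a per-column-index row vector respectively. Consult the Developer Reference for the exact definition; the function name and argument layout here are hypothetical.

```python
def gemm_int_ref(A, B, C, alpha=1.0, beta=0.0, ao=0, bo=0, offsetc="F", co=(0,)):
    """Reference integer GEMM with offsets (illustrative semantics, see lead-in)."""
    m, k, n = len(A), len(B), len(B[0])
    # Normalize to uppercase, mirroring the workaround of passing 'F', 'C', or 'R'.
    mode = offsetc.upper()
    if mode not in ("F", "C", "R"):
        raise ValueError("offsetc must be 'F', 'C', or 'R'")
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum((A[i][p] + ao) * (B[p][j] + bo) for p in range(k))
            if mode == "F":
                off = co[0]      # one offset for every element (assumed meaning)
            elif mode == "C":
                off = co[i]      # column vector of length m (assumed meaning)
            else:
                off = co[j]      # row vector of length n (assumed meaning)
            row.append(int(round(alpha * acc + beta * C[i][j])) + off)
        out.append(row)
    return out

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Z = [[0, 0], [0, 0]]

fixed = gemm_int_ref(A, B, Z, offsetc="F", co=[10])
by_row_vec = gemm_int_ref(A, B, Z, offsetc="R", co=[1, 2])
by_col_vec = gemm_int_ref(A, B, Z, offsetc="C", co=[1, 2])
shifted = gemm_int_ref(A, B, Z, ao=1, bo=-1)
```

A small reference like this can be handy for checking results from an optimized routine on tiny inputs when a wrong-result issue is suspected.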
Product Content
Intel MKL can be installed as a part of the following suite:
- Intel® Parallel Studio XE 2019 Composer, Professional, or Cluster Edition
Download from https://software.intel.com/en-us/intel-parallel-studio-xe.
Intel MKL consists of one package for both IA-32 and Intel® 64 architectures and, as an alternative, an online installer.
Technical Support
If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates, and upgrades for the duration of the support term.
For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.
Note: If your distributor provides technical support for this product, please contact them rather than Intel.
For technical information about Intel MKL, including FAQs, tips and tricks, and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/
For paid support with Intel MKL, please purchase either Intel® Parallel Studio XE 2019 - https://software.intel.com/en-us/parallel-studio-xe or Intel® System Studio - /content/www/us/en/develop/tools/system-studio.html.
*Other names and brands may be claimed as the property of others.
Update 4
- BLAS Features:
- Improved API consistency with the Netlib CBLAS header file.
- Changed ?GEMM_PACK_GET_SIZE and GEMM_*_PACK_GET_SIZE return types to size_t to avoid overflows when using the LP64 interface.
- Addressed a potential {CBLAS_}{S,D}GEMM OpenMP threading deadlock on Intel® Advanced Vector Extensions 2 (Intel® AVX2) and later architectures when OMP_DYNAMIC is TRUE.
- Addressed a FORTRAN GEMM_S8U8S32 and GEMM_S16S16S32 API issue when there are row offsets.
- LAPACK:
- Extended the Direct Call feature with non-pivoting LU factorization, providing a significant performance boost for small matrices (N < 8).
- Improved performance of the triangular matrix inverse ?TRTRI routines for Intel® Advanced Vector Extensions (Intel® AVX) and higher with OpenMP threading.
- ScaLAPACK:
- Improved performance of P?POTRF for Intel® AVX and higher.
- Improved performance of P?{SY,HE}EVD for Intel® AVX and higher.
- FFT:
- Improved performance of C2C and R2C FFT functions for several sizes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) systems.
- Sparse Solver:
- Parallel Direct Sparse Solver for Clusters: added the ability to get matrices from the matrix factorization via the following set of functions: CLUSTER_SPARSE_SOLVER_GET_CSR_SIZE, CLUSTER_SPARSE_SOLVER_SET_CSR_PTRS, CLUSTER_SPARSE_SOLVER_SET_PTR, CLUSTER_SPARSE_SOLVER_EXPORT.
- Known Issues:
- MKL_VERBOSE mode might not work correctly for Intel MKL ScaLAPACK functions with the Xcode Clang compiler on macOS. Workaround: use the Intel® C/C++ compiler.
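The change of the ?GEMM_PACK_GET_SIZE return type to size_t matters because a byte count can exceed what a signed 32-bit integer holds even when each matrix dimension fits comfortably. The Python sketch below emulates both widths. It uses the raw size rows*cols*element_size as a stand-in, since the real packed size is implementation-defined, and it assumes a 64-bit size_t. The helper names are hypothetical.

```python
def wrap_int32(v):
    """Emulate storing v in a signed 32-bit integer (two's complement wraparound)."""
    v &= 0xFFFFFFFF
    return v - 2**32 if v >= 2**31 else v

def as_size_t64(v):
    """Emulate storing v in an unsigned 64-bit size_t (assumed width)."""
    return v & (2**64 - 1)

rows, cols, elem_size = 30000, 30000, 8   # a 30000 x 30000 double-precision matrix
raw_bytes = rows * cols * elem_size       # 7,200,000,000 bytes

as_int32 = wrap_int32(raw_bytes)          # wraps to a negative value
as_size_t = as_size_t64(raw_bytes)        # preserved exactly
```

Both dimensions are valid 32-bit integers here, yet the byte count is not, which is the overflow scenario a size_t return type avoids.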
Product Content
Intel MKL can be installed as a part of the following suite:
- Intel® Parallel Studio XE 2019 Composer, Professional, or Cluster Edition
Download from https://software.intel.com/en-us/intel-parallel-studio-xe.
Intel MKL consists of one package for both IA-32 and Intel® 64 architectures and, as an alternative, an online installer.
Update 3
- BLAS Features:
- Just-in-time (JIT) compiled SGEMM/DGEMM now supports larger matrices (m, n, k ≥ 16), with increased performance for some existing matrix sizes.
- Added support for JIT SGEMM/DGEMM on the Intel® Advanced Vector Extensions (Intel® AVX) architecture.
- Added JIT generation of CGEMM/ZGEMM kernels, accelerating small complex matrix multiplications on Intel® AVX, Intel® AVX2, and Intel® AVX-512.
- Added strict CNR mode, providing bitwise reproducible results for ?gemm, ?trsm, ?symm, ?hemm independent of the number of threads, on Intel® AVX2 and higher.
- Improved performance of GEMM_S8U8S32 and GEMM_S16S16S32 for nonzero offsets.
- Improved threaded performance of SGEMM_COMPUTE for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) systems.
- Improved ?GEMM performance with Intel® TBB.
- Improved the threaded vs. sequential ?GEMV threshold.
- Improved single threaded performance of ?GEMM_COMPUTE for M ? 7 or N ?7.
- ScaLAPACK:
- Introduced P{C,Z}GEBAL functions for balancing a general complex matrix.
- Introduced P?GEBAK functions for reversing balancing of a general matrix.
- Introduced MKL_VERBOSE support for the following functions: P?POTRF, P?TRTRI, PDSYEV{D,R,X} and PZHEEV{D,R,X}. All MPI ranks will print MKL_VERBOSE output.
- Vector Mathematics:
- Introduced overflow status reporting for V?POW functions.
- Improved performance for ERF/ERFINV/ERFCINV/EXPINT1/CDFNORINV on Intel® Xeon® Processor Scalable Family platforms.
- Eliminated superfluous INVALID exceptions in VSPOW, when the first argument contains zeros.
- Random Number Generators:
- Added a Chi-Square continuous distribution random number generator.
- Improved performance of Philox4x32-10 basic random number generator for Intel® AVX-512 instruction sets.
- Known Issues:
- Error LNK2005: _powf already defined in libmmt.lib
- Customer Impact: Customers using 32-bit Windows will get a link error when math.h is included and when linking with libmmt.lib or libm.lib
- Workaround: Uncomment the following definitions in math.h: //# define ldexpf _MS_ldexpf and //#define powf _MS_powf
- Cluster_Sparse_Solver can spontaneously crash when linking with Intel® MPI 2019 and later versions.
- Workaround: Use Intel® MPI 2018.
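The strict CNR mode listed under BLAS Features promises bitwise identical results regardless of the number of threads. The Python sketch below is not Intel® MKL's implementation; it only illustrates why thread count can change floating-point results (addition is not associative) and one generic way to remove the dependence, namely fixing the block boundaries and the combination order independently of the worker count. All names are hypothetical.

```python
def add_all(xs):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for x in xs:
        total += x
    return total

def chunked_sum(values, num_chunks):
    """Split into num_chunks contiguous chunks (one per 'thread'), then combine."""
    n = len(values)
    bounds = [n * i // num_chunks for i in range(num_chunks + 1)]
    partials = [add_all(values[bounds[i]:bounds[i + 1]]) for i in range(num_chunks)]
    return add_all(partials)

def fixed_block_sum(values, num_workers, block=2):
    """Block boundaries are fixed; num_workers only decides who computes which block."""
    blocks = [values[i:i + block] for i in range(0, len(values), block)]
    partials = [0.0] * len(blocks)
    for w in range(num_workers):
        for b in range(w, len(blocks), num_workers):   # round-robin assignment
            partials[b] = add_all(blocks[b])
    return add_all(partials)                            # always combined in block order

values = [1e16, 0.5, -1e16, 0.5]
one_chunk = chunked_sum(values, 1)
two_chunks = chunked_sum(values, 2)
reproducible = [fixed_block_sum(values, w) for w in (1, 2, 3)]
```

Note that the fixed-block result is reproducible but not necessarily more accurate; reproducibility and accuracy are separate properties.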
Product Content
Intel MKL can be installed as a part of the following suite:
- Intel® Parallel Studio XE 2019 Composer, Professional, or Cluster Edition
Download from https://software.intel.com/en-us/intel-parallel-studio-xe.
Intel MKL consists of one package for both IA-32 and Intel® 64 architectures and, as an alternative, an online installer.
Update 2
Intel® Math Kernel Library (Intel® MKL) Update 2 includes functional and security updates. Users should update to the latest version.
Update 1
- BLAS Features:
- Introduced new packed integer interfaces for matrix multiplication: [CBLAS_]GEMM_{S8U8S32,S16S16S32}_PACK, [CBLAS_]GEMM_{S8U8S32,S16S16S32}_COMPUTE, [CBLAS_]GEMM_{S8U8S32,S16S16S32}_PACK_GET_SIZE.
- Introduced new interfaces to get the number of bytes needed to store a packed matrix for single and double precisions: [CBLAS_]{D,S}GEMM_PACK_GET_SIZE.
- Improved GEMM_S8U8S32 and GEMM_S16S16S32 performance for large problem sizes on Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
- Introduced Intel® Threading Building Blocks (Intel® TBB) support for GEMM_BATCH.
- Improved GEMM_BATCH performance for large problem sizes.
- Improved MKL_?GEMM_COMPACT performance on Intel® AVX-512.
- LAPACK:
- Improved performance of complex precision general eigensolver functions (CGEEV and ZGEEV) for OpenMP and Intel® TBB threading.
- Improved performance of ?PBTRS function with single right-hand side.
- FFT:
- Improved performance for non-power of 2 sizes on Intel® AVX-512.
- Library Engineering:
- Introduced support for Universal Windows Driver* (UWD).
Product Content
Intel MKL can be installed as a part of the following suite:
- Intel® Parallel Studio XE 2019 Composer, Professional, or Cluster Edition
Download from https://software.intel.com/en-us/intel-parallel-studio-xe.
Intel MKL consists of one package for both IA-32 and Intel® 64 architectures and, as an alternative, an online installer.
Known Limitations
- For STDCALL convention, gemm_s8u8s32_compute and gemm_s16s16s32_compute signatures in mkl_blas_stdcall.h must be modified from alpha to *alpha. The alpha parameter should be a pointer.
- Issue: Intel® MKL Vector Math real functions can report overflow while the corresponding complex functions do not (or vice-versa). Impact: You may see inconsistent error reporting when you compare real and complex Intel® MKL Vector Math functions.
- Issue: Complicated OpenMP tasks with dependencies used in matrix inversion routines (?TRTRI, ?GETRI, and ?POTRI) may require more stack space than is available by default. Impact: These routines can show a significant performance regression compared to Intel® MKL 2018 Update 3 for large sizes (>17000) with OpenMP threading. Workaround: Increase the OpenMP stack size to 16M or more for larger problem sizes (for example, by setting the OMP_STACKSIZE environment variable).
- Issue: Excessive memory allocation with many right-hand sides (RHS) in Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters. Impact: During PARDISO computation, the allocated memory may grow almost linearly with the number of threads when there are many RHS and two-level factorization is used. Workaround: Switch to one-level factorization if memory is insufficient.
- Issue: PARDISO produces incorrect output on Win32 systems. Impact: The Schur complement cannot be computed with the VBSR format using PARDISO with iparm[23]=1, iparm[35]>0, and iparm[36]<0.
- Issue: A few Parallel Direct Sparse Solver for Clusters tests crash on Win32 with Intel® Message Passing Interface (Intel® MPI) version 2019. Impact: Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters may fail on Win32 with Intel® MPI 2019. Workaround: Use Intel® MPI version 2018.
Deprecation Notices
- The [CBLAS_]{D,S}GEMM_ALLOC and [CBLAS_]{D,S}GEMM_FREE interfaces for single and double precisions have been deprecated and will be removed in a future release of Intel MKL. Please use the [CBLAS_]{D,S}GEMM_PACK_GET_SIZE function to get the number of bytes needed to store the packed matrix. You can then use MKL_MALLOC and MKL_FREE to allocate aligned memory and free it respectively.
Initial Release
- BLAS Features:
- Introduced automatic S/DGEMM JIT capability for small matrix sizes (m,n,k <=16) to improve S/DGEMM performance for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) when compiling with MKL_DIRECT_CALL_JIT (threaded usage) or MKL_DIRECT_CALL_SEQ_JIT (sequential usage).
- Introduced new functions to JIT (create) optimized S/DGEMM-like matrix multiply kernels for small matrix sizes (m,n,k <=16) for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512), execute the optimized kernel created using matrices with matching dimensions, and to remove (destroy) the JIT kernel.
- Sparse BLAS:
- Introduced SYPR and Sp2M functionality for triple matrix multiply ABA^t and matrix multiply AB (and their transposes).
- Improved performance of Inspector-Executor Sparse BLAS routines for Intel® TBB and sequential threading layers.
- Improved performance of SpMV, MKL_SPARSE_[S,D,C,Z]_SYMGS, and MKL_SPARSE_[S,D,C,Z]_TRSV routines for Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
- DNN:
- Deep Neural Network (DNN) component is deprecated and will be removed in the next Intel MKL release. We will continue to provide optimized functions for deep neural networks in Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN).
- LAPACK:
- Aligned MKL LAPACK functionality with Netlib LAPACK 3.7.1 and 3.8.0: added routines for symmetric indefinite matrix factorization using a 2-stage Aasen’s algorithm.
- Improved performance of ?GETRF for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and other micro architectures with OpenMP* threading.
- Improved performance of ?GETRF and ?POTRF with TBB* threading.
- ScaLAPACK:
- Improved performance and significantly reduced memory footprint of ScaLAPACK Eigensolvers P?[SY|HE]EV[D|X|R] routines.
- FFT:
- Improved performance of 1D real-to-complex FFT.
- Improved performance of C2C 1D and 2D FFT for Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
- Sparse Solvers:
- Introduced SparseQR functionality.
- Introduced Extreme{EVD/SVD} functionality to calculate a set of the most positive or most negative eigenvalues or singular values of a symmetric (Hermitian) matrix.
- Introduced support of partial inversion of sparse matrices (compute diagonal of inverse) in Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters.
- Random Generators:
- Introduced Multinomial Random Number Generators.
Product Content
Intel MKL can be installed as a part of the following suite:
- Intel® Parallel Studio XE 2019 Composer, Professional, or Cluster Edition
Download from https://software.intel.com/en-us/intel-parallel-studio-xe.
Intel MKL consists of one package for both IA-32 and Intel® 64 architectures and an online installer.
Deprecation Notices
- Deep Neural Network (DNN) component is deprecated and will be removed in the next Intel MKL release. We will continue to provide optimized functions for deep neural networks in Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN).
- Removed support for 32 bit applications on macOS*.
- Xcode 10 no longer supports compiling 32-bit applications. Hence, 32-bit support will also be removed from MKL 2019 on macOS* in Update 1.
- If users require 32-bit support on macOS*, they should use MKL 2018 or earlier versions.
Known Issues
- -nocompchk flag:
- Problem Statement: Intel® Message Passing Interface (Intel® MPI) 2019 no longer handles the -nocompchk flag with mpiexec (it has been officially removed from support), and using it returns an error. Previously, mpiexec ran compchk.sh scripts by default, and the -nocompchk flag turned this behavior off. Now you must explicitly tell mpiexec to run compchk.sh scripts with the -compchk flag.
- Customer Impact: The file “examples/pblasc/make_lnx.inc” for the Intel® MKL 2019 Gold release still has this flag on lines 195, 196, 220, and 221, so the pblasc example will fail if run with Intel® MPI 2019.
- Workaround: If using Intel® MPI 2019 to run this pblasc example, you must remove the -nocompchk flags from the “examples/pblasc/make_lnx.inc” file for it to succeed.
- Input stream redirection on Windows:
- Problem Statement: Intel® MPI 2019 doesn’t support input stream redirection on Windows.
- Customer Impact: The examples from “examples/cdftf” for the Intel® MKL 2019 Gold release use input stream redirection to get data from a data file and, as a result, hang on Windows when Intel® MPI 2019 is used.
- Workaround: Remove stream redirection from “examples/cdftf/makefile” (.exe < data \$*.dat), and update the source code to load data from the data file using OPEN/READ/CLOSE functions.
- Schur test failure:
- Problem Statement: Several Intel® Math Kernel Library Parallel Direct Sparse Solver for Clusters Schur tests fail in MKL 2019 after the performance improvement of the solving step with many right-hand sides.
- Customer Impact: If a customer uses the Schur complement in the new Pardiso branch, they may get an error at the factorization step.
- Workaround: Switch to the old Pardiso branch (classic one-level factorization; set iparm[24-1]=0 in C or iparm(24)=0 in Fortran) or use MKL 2018.3.
- Custom fatal error handler usage limitation:
- A custom fatal error handler must stop the Intel® MKL 2019 Gold computation.
- If a fatal error occurs because Intel® MKL cannot load a dynamic library or encounters an unsupported CPU type, and the custom fatal error handler does not force Intel® MKL to stop the computation, Intel® MKL will fail with a segmentation fault.
- Call the exit function (for C applications) or use exceptions (for C++ applications) in the custom fatal error handler implementation.
Technical Support
If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates, and upgrades for the duration of the support term.
For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.
Note: If your distributor provides technical support for this product, please contact them rather than Intel.
For technical information about Intel MKL, including FAQs, tips and tricks, and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/.
For paid support with Intel MKL, please purchase either Intel® Parallel Studio XE 2019 - https://software.intel.com/en-us/parallel-studio-xe or Intel® System Studio - /content/www/us/en/develop/tools/system-studio.html.
2018
Update 4
What’s New in Intel® Math Kernel Library (Intel® MKL) version 2018 Update 4:
- BLAS
- Improved ?COPY functions performance by introducing threading for the incx=0 case.
- Improved GEMM_S8U8S32 and GEMM_S16S16S32 functions performance for small N cases on Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures.
- Introduced consistent NaN handling for the i?amin and i?amax functions. If the vector contains NaN values, the index of the first NaN is returned.
- Addressed a conditional numerical reproducibility (CNR) CROT function issue for Intel® AVX-512 architecture.
- Addressed a GEMM_S16S16S32 function accuracy issue for beta = 0 cases for Intel® AVX2 and Intel® AVX-512 architectures.
- SparseBLAS:
- Improved performance of sequential calls of mkl_sparse_?_mv, mkl_sparse_?_trsv and mkl_sparse_?_symgs functions for Intel® AVX-512 architecture.
- Improved mkl_sparse_?_trsv performance.
- FFT:
- Improved 2D FFT performance in the case of Complex-to-Complex transformations.
- Sparse Solvers:
- Improved Pardiso performance at the solving step in case of many right-hand sides.
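The NaN rule stated above for i?amin and i?amax (the index of the first NaN is returned) can be illustrated with a plain C reference loop. This is a minimal sketch and not Intel MKL code: the name ref_idamax_nan_first is hypothetical, only double precision with unit stride is handled, and the returned index is zero-based.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Reference sketch (not Intel MKL code): returns the zero-based index of
 * the element with the largest absolute value, except that the index of
 * the first NaN is returned as soon as a NaN is found. */
size_t ref_idamax_nan_first(size_t n, const double *x)
{
    assert(n > 0 && x != NULL);
    size_t best = 0;
    for (size_t i = 0; i < n; ++i) {
        if (isnan(x[i]))
            return i;                 /* the first NaN wins */
        if (fabs(x[i]) > fabs(x[best]))
            best = i;                 /* strict '>' keeps the first maximum */
    }
    return best;
}
```

Without the NaN check, the comparison against a NaN is always false, so the result would depend on where the NaN sits relative to the current maximum. Returning the first NaN index makes the outcome independent of that ordering.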
Update 3
What’s New in Intel® Math Kernel Library (Intel® MKL) version 2018 Update 3:
- BLAS
- Addressed ?TRMM NaN propagation issues on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) for 32-bit architectures.
- Improved performance on small sizes of multithreaded {S,D}SYRK and {C,Z}HERK for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
- LAPACK:
- Added ?POTRF and ?GEQRF optimizations for Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction sets.
- Improved the performance of ?GESVD for very small square matrices (N<6).
- Improved performance of inverse routines ?TRTRI, ?GETRI and ?POTRI.
- SparseBLAS:
- Improved the performance of SPARSE_OPTIMIZE, SPARSE_SV and SPARSE_SYPR routines for Intel® TBB threading.
- Added support of BSR format for the SPARSE_SYPR routine.
- Library Engineering:
- Added functionality to write the output of MKL_VERBOSE to a file specified by the user.
- Enabled optimizations for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set with support of Vector Neural Network Instructions via MKL_ENABLE_INSTRUCTIONS.
Known Limitations:
When the leading dimension of matrix A is not equal to the number of rows or columns, the MKL_?GEMM_COMPACT functions can return incorrect results when executed on a processor that does not support Intel® AVX2 or Intel® AVX-512 instructions.
Update 2
What’s New in Intel® Math Kernel Library (Intel® MKL) version 2018 Update 2:
- BLAS
- Improved {S,D}GEMM performance for small sizes on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set.
- Improved i{C,S,Z,D}A{MIN,MAX} performance on Intel® AVX-512 instruction set.
- Improved CSROT performance on Intel® AVX-512 32-bit architectures.
- Improved parallel and serial performance of BLAS Level 3 routines on Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel AVX-512 32-bit architectures.
- Improved GEMM_BATCH performance for groups with tall and skinny matrices.
- DNN:
- Improved initialization phase performance for convolutions.
- Sparse BLAS
- Introduced triple product functionality A*B*A^T with sparse and dense matrices B.
- Introduced sparse matrix product functionality A*B that can transpose both matrices and split their multiplication into phases similar to MKL_CSRMULTCSR.
- Introduced note about deprecation and replacement of the Sparse BLAS API
- Introduced Intel® Threading Building Blocks (Intel® TBB) support for triangular solvers and converters routines.
- Improved performance of matrix vector and matrix product for CSR and BSR formats.
- Improved performance of matrix product for CSC format.
- LAPACK
- Improved parallel performance of ?SYTRF/?SYTRI/?SYTRI2.
- Improved performance of numerous LAPACK functions for matrix sizes ≤ 30.
- Improved parallel performance of (S|D)SYEVX Eigensolver.
- ScaLAPACK:
- Improved performance of P?(DSY|ZHE)EVD and P?(DSY|ZHE)EVX symmetrical Eigensolver drivers. Observed speed-up is up to 4x depending on the matrix size and the grid configuration.
- FFT
- Improved 1D and 3D FFT performance for the processors supporting Intel® AVX512 and Intel® AVX2 Instruction sets.
- Sparse Solvers:
- Introduced an out-of-core (OOC) version of Parallel Direct Sparse Solver for Clusters.
- Introduced support for Schur complement (dense), partial solving, and customer reordering in Parallel Direct Sparse Solver for Clusters.
- Introduced support for Sparse Schur complement in Intel MKL PARDISO functionality.
- Removed restriction of simultaneous use of VBSR format/Schur complement/OOC algorithm in Intel MKL PARDISO for two-level factorization branch.
- Improved performance of Intel MKL PARDISO for two-level factorization branch.
- The Parallel Direct Sparse Solver for Clusters interface now returns the same main information as the Intel MKL PARDISO interface: memory peaks on different phases after reordering, inertia, and number of pivots after factorization.
- Vector Mathematics
- Improved performance on processors supporting the Intel® AVX2 instruction set for 64-bit implementations of vsErfc_HA/LA/EP, vdSqrt_HA, vsCbrt_HA, vsInvCbrt_HA/LA.
- Improved performance on processors supporting the Intel® AVX2 instruction set for 64-bit implementations of vdAtan_LA, vdTanpi_HA, vdTand_HA, vdTan_HA, vsAtan2pi_LA, vdTand_EP, vsTanh_HA, vdExpInt1_EP, vzDiv_HA, vzDiv_LA, vdTand_LA, vcArg_LA, vdTanh_LA, vdTanh_EP, vsLog10_LA, vsAtan2_LA, vsAtan2pi_EP.
- Data Fitting and Vector Statistics
- Improved performance of VS SS Summary Statistics Quantiles for Intel® Xeon® processors supporting Intel® AVX-512 (codename Skylake Server) and Intel® Xeon Phi™ processor 72** (formerly Knights Landing) in the OpenMP threading layer for dimensions n > 10^5.
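The Sparse BLAS triple product listed above computes C = A*B*A^T. The following minimal sketch shows only the arithmetic, using small dense row-major arrays for both A and B. It is not the Sparse BLAS routine itself; the name ref_triple_product and the dense storage are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sketch (not Intel MKL code): C := A * B * A^T with dense,
 * row-major storage. A is m x n, B is n x n, C is m x m. */
void ref_triple_product(size_t m, size_t n,
                        const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < m; ++j) {
            double sum = 0.0;
            for (size_t p = 0; p < n; ++p) {
                double t = 0.0;                  /* t = (A*B)[i][p] */
                for (size_t q = 0; q < n; ++q)
                    t += a[i * n + q] * b[q * n + p];
                sum += t * a[j * n + p];         /* A^T[p][j] == A[j][p] */
            }
            c[i * m + j] = sum;
        }
    }
}
```

When B is symmetric, the result C is symmetric as well, so an optimized implementation only needs to compute one triangle of C.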
Update 1
What’s New in Intel® Math Kernel Library (Intel® MKL) version 2018 Update 1
- BLAS
- Improved single precision and single precision complex Level 3 BLAS performance for Intel® Xeon Phi™ processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with support for AVX512_4FMAPS instructions
- Improved irregular-shaped SGEMM performance on Intel® Xeon Phi™ processor x200
- Added stack unwind support to internal Intel64 assembly kernels on Windows OS
- Improved MKL_DIRECT_CALL DGEMM performance on Intel® Advanced Vector Extensions 2 (AVX2) for Intel and GNU C/C++ compilers
- Sparse BLAS
- Improved performance of Inspector-Executor mode of SpMV for CSR format
- Improved performance of SpMM routine for CSR format
- Improved performance of Inspector-Executor mode of SpMV for BSR format in Intel TBB threading layer
- LAPACK
- Improved performance of ?(OR|UN)GQR, ?GEQR and ?GEMQR routines in Intel® TBB threading layer
- Introduced LAPACKE_set_nancheck routine for disabling/enabling NaN checks in LAPACKE functions
- ScaLAPACK:
- Added optimizations (2-stage band reduction algorithm) for pdsyevr/pzheevr for JOBZ=’N|V’ and for RANGE=’A’. The new algorithm is enabled for N>=4000 and for appropriate process grids; otherwise the traditional algorithm is used. The best speed-up is expected for larger matrices.
- FFT
- Improved performance for batched real-to-complex 3D for Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server)
- Improved performance with and without scaling factor across all domains.
- Sparse Solvers
- Improved Intel MKL Pardiso performance for small matrices with iparm(24)=10
- Vector Mathematics
- The default behavior has changed for unmasked exception handling. By default, all floating-point exceptions are now masked before any internal MKL VM computation, whereas until now exceptions unmasked by the user applied to internal computations as well. As a new feature, the user can employ four newly added modes (VML_TRAP_INVALID, VML_TRAP_DIVBYZERO, VML_TRAP_OVERFLOW, and VML_TRAP_UNDERFLOW) to trap on unmasked exceptions raised during internal computation of vector math functions.
- Data Fitting and Vector Statistics
- Introduced TBB-threading layer in MKL Data Fitting and Vector Statistics components
- Library Engineering
- Added pkg-config files to simplify compilation of applications and libraries with MKL.
Known limitation:
Data Fitting and Vector Statistics: Oversubscribed mode is not supported in this release. Do not set the number of TBB threads higher than the number of logical cores.
Initial Release
What’s New in Intel® Math Kernel Library (Intel® MKL) version 2018
- BLAS Features:
- Introduced compact GEMM and TRSM functions (mkl_{s,d,c,z}gemm_compact and mkl_{s,d,c,z}trsm_compact) to work on groups of matrices in compact format and service functions to support the new format
- Introduced optimized integer matrix-matrix multiplication routines GEMM_S8U8S32 and GEMM_S16S16S32 to work with quantized matrices for all architectures.
- BLAS Optimizations:
- Optimized SGEMM and SGEMM packed for Intel® Xeon Phi™ processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instructions
- Optimized GEMM_S8U8S32 and GEMM_S16S16S32 for AVX2, AVX512 and Intel® Xeon Phi™ processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) with support of AVX512_4FMAPS and AVX512_4VNNIW instruction groups
- Deep Neural Network:
- Added support for non-square pooling kernels
- Improved performance of large non-square kernels on Intel® Xeon Phi™ processors
- Optimized conversions between plain (nchw, nhwc) and internal data layouts
- LAPACK:
- Added the following improvements and optimizations for small matrices (N<16):
- Direct Call feature extended with Cholesky and QR factorizations providing significant performance boost
- Introduced LU and Inverse routines without pivoting with significantly better performance: mkl_?getrfnp and mkl_?getrinp
- Introduced Compact routines for much faster solving of multiple matrices packed together: mkl_?getr[f|i]np_compact, mkl_?potrf_compact and mkl_?geqrf_compact
- Added ?gesvd, ?geqr/?gemqr, ?gelq/?gemlq optimizations for tall-and-skinny/short-and-wide matrices
- Added optimizations for ?pbtrs routine
- Added optimizations for ?potrf routine for Intel® Threading Building Blocks layer
- Added optimizations for CS decomposition routines: ?dorcsd and ?orcsd2by1
- Introduced factorization and solve routines based on Aasen's algorithm: ?sytrf_aa/?hetrf_aa, ?sytrs_aa/?hetrs_aa
- Introduced new (faster) _rk routines for symmetric indefinite (or Hermitian indefinite) factorization with bounded Bunch-Kaufman (rook) pivoting algorithm
- ScaLAPACK:
- Added optimizations (2-stage band reduction) for p?syevr/p?heevr routines for JOBZ=’N’ (eigenvalues only) case
- FFT:
- Introduced Verbose support for FFT domain, which enables users to capture the FFT descriptor information for Intel MKL
- Improved performance for 2D real-to-complex and complex-to-real for Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
- Improved performance for 3D complex-to-complex for Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
- Intel® Optimized High Performance Conjugate Gradient Benchmark:
- New version of benchmark with Intel® MKL API
- Sparse BLAS:
- Introduced Symmetric Gauss-Seidel preconditioner
- Introduced Symmetric Gauss-Seidel preconditioner with ddot calculation of resulting and initial arrays
- Introduced Sparse Matvec routine with ddot calculation of resulting and initial arrays
- Introduced Sparse Syrk routine with both OpenMP and Intel® Threading Building Blocks support
- Improved performance of Sparse MM and MV functionality for Intel® AVX-512 Instruction Set
- Direct Sparse Solver for Cluster:
- Added support for the transpose solver
- Vector Mathematics:
- Added 24 new functions: v?Fmod, v?Remainder, v?Powr, v?Exp2, v?Exp10, v?Log2, v?Logb, v?Cospi, v?Sinpi, v?Tanpi, v?Acospi, v?Asinpi, v?Atanpi, v?Atan2pi, v?Cosd, v?Sind, v?Tand, v?CopySign, v?NextAfter, v?Fdim, v?Fmax, v?Fmin, v?MaxMag, and v?MinMag, including optimizations for processors based on Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
- Data Fitting:
- Cubic spline-based interpolation in the ILP64 interface was optimized by up to 8x on Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server) and 2.5x on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
- Documentation:
- Starting with this version of Intel® MKL, most of the documentation for Parallel Studio XE is only available online at https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation. You can also download it from the Intel Registration Center > Product List > Intel® Parallel Studio XE Documentation
- Intel continually evaluates the markets for our products in order to provide the best possible solutions to our customer’s challenges. As part of this on-going evaluation process Intel has decided to not offer Intel® Xeon Phi™ 7200 Coprocessor (codenamed Knights Landing Coprocessor) products to the market.
- Given the rapid adoption of Intel® Xeon Phi™ 7200 processors, Intel has decided to not deploy the Knights Landing Coprocessor to the general market.
- Intel® Xeon Phi™ Processors remain a key element of our solution portfolio for providing customers the most compelling and competitive solutions possible.
- Support for the Intel® Xeon Phi™ x100 product family coprocessor (formerly code name Knights Corner) is removed in this release. The Intel® Xeon Phi™ x100 product family coprocessor (formerly code name Knights Corner) was officially announced end of life in January 2017. As part of the end of life process, the support for this family will only be available in the Intel® Parallel Studio XE 2017 version. Intel® Parallel Studio XE 2017 will be supported for a period of 3 years ending in January 2020 for the Intel® Xeon Phi™ x100 product family. Support will be provided for those customers with active support.
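The LAPACK item above introduces LU and inverse routines without pivoting (mkl_?getrfnp and mkl_?getrinp). The following minimal sketch shows what factorizing without pivoting means in practice. It is not Intel MKL code: the name ref_lu_nopiv, the row-major layout, and the return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sketch (not Intel MKL code): in-place LU factorization
 * without pivoting of an n x n row-major matrix with row stride lda.
 * On return the strict lower triangle holds L (unit diagonal implied)
 * and the upper triangle holds U. Returns 0 on success, or k+1 if the
 * pivot in row k (zero-based) is exactly zero and the factorization
 * cannot continue. */
int ref_lu_nopiv(size_t n, double *a, size_t lda)
{
    assert(a != NULL && lda >= n);
    for (size_t k = 0; k < n; ++k) {
        double pivot = a[k * lda + k];
        if (pivot == 0.0)
            return (int)(k + 1);
        for (size_t i = k + 1; i < n; ++i) {
            double l = a[i * lda + k] / pivot;   /* multiplier L[i][k] */
            a[i * lda + k] = l;
            for (size_t j = k + 1; j < n; ++j)
                a[i * lda + j] -= l * a[k * lda + j];
        }
    }
    return 0;
}
```

Skipping the row search and row swaps removes work and data movement, which matters most for very small matrices. The trade-off is that the factorization breaks down on a zero pivot and can lose accuracy on small pivots, so it suits matrices known to be safe without pivoting, such as diagonally dominant ones.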
Product Content
Intel MKL can be installed as a part of the following suite:
- Intel® Parallel Studio XE 2018 Composer, Professional, or Cluster Edition
Download from https://software.intel.com/en-us/intel-parallel-studio-xe.
Intel MKL consists of one package for both IA-32 and Intel® 64 architectures and an online installer.
Known Issues
- Convolution primitives for forward pass may return incorrect results or crash when input spatial dimensions are smaller than kernel spatial dimensions on Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
- Intel® MKL FFT – complex-to-complex in-place batched 1D FFT with transposed output returns incorrect output
- Intel® ScaLAPACK may fail with OpenMPI* 1.6.1 and later releases due to known OpenMPI* issue: https://github.com/open-mpi/ompi/issues/3937. As a workaround, please avoid using OpenMPI
- Intel® VML functions may raise spurious FP exceptions even if the (default) VML_ERRMODE_EXCEPT is not set. Recommendation: do not unmask FP exceptions before calling VML functions.
- When an application uses Vector Math functions with the single dynamic library (SDL) interface combined with TBB threading layer, the application may generate runtime error “Intel MKL FATAL ERROR: Error on loading function mkl_vml_serv_threader_c_1i_2o.”
2017
Update 4
What's New in Intel MKL 2017 Update 4
- BLAS:
- Addressed an early release buffer issue in *GEMV threaded routines
- Improved Intel® Threading Building Blocks *GEMM performance for small m, n and large k cases
- Fixed irregular division by zero and invalid floating point exceptions in {C/Z}TRSM for Intel® Xeon Phi™ processor x200 (aka KNL) and Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) code path
- Improved {s/d}GEMV threaded performance on Intel64 architecture
- Addressed incorrect SSYRK calculation on Intel® Xeon Phi™ processor x200 with Intel® TBB threading occurring if the value of K is very large
- Addressed a GEMM multithreading issue, which may cause segfaults for large matrices (M, N >= ~30,000, K >= ~5000) on Intel® Xeon Phi™ processor x200 (aka KNL)
- Deep Neural Networks:
- Added support for non-square pooling kernels
- Sparse BLAS
- Improved SpMV and SpMM performance for processors supporting the Intel® AVX512 instruction set
- Improved SpMV performance for processors supporting the Intel® AVX2 instruction set
- Added Intel® TBB support for SparseSyrk and SpMM routines
- Intel MKL Pardiso
- Significantly improved factorization and solving steps for “small” matrices
- Introduced low rank approach suitable for solving set of systems with small changes in elements
- Parallel Direct Sparse Solver for Cluster:
- Added Iterative support
- Improved performance when the number of processes is not a power of 2
- LAPACK:
- Improved performance of ?(OR|UN)GQR, ?GEQR and ?GEMQR routines in Intel® TBB threading layer.
- Introduced LAPACKE_set_nancheck routine for disabling/enabling NaN checks in LAPACKE functions.
- FFT:
- Improved 2D and 3D FFT performance for the processors supporting Intel® AVX512 and Intel® AVX2 Instruction sets.
- Improved FFT performance with and without scaling factor across all domains.
- Introduced MKL_VERBOSE mode support for FFT domain.
Update 3
What's New in Intel MKL 2017 Update 3
- BLAS:
- Optimized SGEMM for Intel® Xeon Phi™ processor x*** (codename Knights Mill)
- Improved performance for ?GEMM for medium problem sizes on Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server)
- Improved performance for SGEMM/DGEMM for small problem sizes on Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server)
- Improved performance for ?GEMM_BATCH on all architectures
- Improved performance for SSYMV/DSYMV on Intel® Advanced Vector Extensions 2 (Intel® AVX2) and later architectures
- Improved performance for DGEMM Automatic Offload (AO) for square sizes (3000 < m=n=k < …) on … Xeon … processor (formerly Knights …)
- Improved performance for general BLAS functions on the 32-bit Intel® Advanced Vector Extensions 512 (Intel® AVX512) architecture
- Fixed ?AXPBY to propagate NaNs in the y vector when beta = 0 on 64-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) and later architectures
- FFT:
- Improved performance of 3D FFT complex-to-real and real-to-complex problems on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
- Improved performance of 2D FFT complex-to-complex problems with scale on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
- High Performance Conjugate Gradients (HPCG):
- Add support of Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server)
- Deep Neural Networks:
- Added initial convolution and inner product optimizations for the next generation of Intel Xeon Phi processor (code name Knights Mill)
- Improved parallel performance of convolutions on Intel Xeon Phi processor (code name Knights Landing)
- Average pooling has an option to include padding into mean values computation
- LAPACK:
- Optimized ?GELQ and ?GEMLQ performance for short-and-wide matrices
- Optimized performance of ?ORCSD2BY1 and ?DORCSD routines
- Fixed LU performance degradation for medium sizes on 6 threads
- Vector Statistics:
- Fixed failure of VSL RNG MT19937 on big vector lengths on Intel® Xeon Phi™ Coprocessor x100 Product Family.
- Improved performance of Outlier Detection (BACON) algorithm for single and double precisions for processors supporting Intel® AVX2 and Intel® AVX512 instruction sets
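The ?AXPBY fix listed under BLAS above concerns NaN propagation when beta = 0. The routine computes y := alpha*x + beta*y, and under IEEE 754 arithmetic 0 * NaN is NaN, so a NaN already present in y should survive. The following minimal sketch shows that behavior with a plain loop. It is not Intel MKL code; the name ref_daxpby and the unit-stride, double-precision signature are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sketch (not Intel MKL code): y := alpha*x + beta*y with unit
 * strides. beta * y[i] is always evaluated, so a NaN in y[i] stays NaN
 * even when beta == 0. */
void ref_daxpby(size_t n, double alpha, const double *x,
                double beta, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + beta * y[i];
}
```

An implementation that special-cases beta == 0 by overwriting y with alpha*x, without reading y, would silently drop such NaNs. The release note describes the fixed behavior as propagating them.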
Update 2
What's New in Intel MKL 2017 Update 2
- Library Engineering:
- Intel® AVX-512 code is dispatched by default on Intel® Xeon processors
- BLAS:
- Improved performance of dgemv non-transpose when the number of threads is large (typically on Intel® Xeon Phi™ processor x200 (formerly Knights Landing)). For example: a factor of 2 speedup when M=K=10000 with 68 threads on Intel® Xeon Phi™ processor x200
- Improved performance for dgemm, TN and NN cases, with very small N on Intel® Xeon Phi™ processor x200 and 6th Generation Intel® Core™ processor (also known as Skylake)
- Introduced MKL_NUM_STRIPES environment variable and accompanying Intel MKL support functions to control the 2D partitioning of multithreaded *GEMM on all Intel architectures except Intel® Xeon Phi™ Coprocessor x100 Product Family. Please see the related section in Intel MKL Developer Guide for details.
- Improved the {s,d}gemm_compute performance on Intel64 architectures supporting Intel® AVX2 instruction set.
- Improved ?gemm_batch performance when N==1.
- Sparse BLAS
- Improved performance of BCSMV functionality with 3-10, 14 and 18 problem sizes for processors supporting Intel® AVX2 and Intel® AVX512 instruction sets
- Improved performance of CSRMV functionality for processors supporting Intel® AVX2 and Intel® AVX512 instruction sets
- Added Intel® Threading Building Blocks (Intel® TBB) threading support for CSRMV functionality with symmetric matrices
- Intel MKL Pardiso
- Added Intel TBB threading support for Intel MKL Pardiso at the solving step
- Deep Neural Networks:
- Improved performance on Intel Xeon processors with Intel® AVX2 and Intel® AVX512 instruction set support
- Improved performance on the second generation of Intel® Xeon Phi™ processor x200
- Introduced support for rectangular convolution kernels
- Significantly improved reference convolution code performance
- Added unsymmetric padding support in convolution and pooling
- Introduced extended Batch Normalization API that allows access to mean, variance, scale and shift parameters
- LAPACK:
- Added ?GEQR, ?GEMQR and ?GETSLS functions with performance optimized for tall-and-skinny matrices.
- Improved LAPACK performance for very small sizes (N<16) in LP64 layer by reducing internal LP64/ILP64 conversion overhead.
- Improved ?[SY|HE]EVD scalability to 32 threads and beyond on Intel® Xeon and Intel® Xeon Phi™ processor x200
- Significantly improved ?LANGE (‘Frobenius’ norm) performance
- ScaLAPACK:
- Added MKL_PROGRESS() support in P?GETRF
- Improved P?TRSM/P?SYRK performance
- Optimized ?GE(SD|RV|BS|BR)2D routines in BLACS
- Fixed failure in P?GEMM (‘N’, ‘N’ case)
- Vector Mathematics:
- Added Intel TBB threading support for all mathematical functions.
- Vector Statistics:
- Improved C interfaces of vsl*SSEdit*() functions
Known Limitations:
- For Intel® Xeon Phi™ processor x200 leverage boot mode without Hyper Threading, MKL oversubscribes threads for versions prior to MPSS 4.3.2 because COI occupies 4 cores. This affects the performance of MKL substantially. As a workaround, the customer can add ‘norespect’ to the MIC_KMP_AFFINITY environment variable.
- ?GETRF functionality can give incorrect results for some matrices of 5x5 size when MKL_DIRECT_CALL is enabled. The patch fixing the issue is posted on MKL Forum.
- Recently added TS QR functionality (?GEQR and ?GEMQR) may demonstrate very slow performance when the number of threads is less than 30.
- On SKX, DGEMM does not scale C by beta when transa == N, transb == N, K==0 and N==2. A workaround is to set transa == T or transb == T, since with K==0 the transpose is not relevant.
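The DGEMM limitation above is easier to see against the expected result. DGEMM computes C := alpha*op(A)*op(B) + beta*C, and with K==0 the product term is empty, so the correct outcome is C := beta*C regardless of transposition. The following minimal sketch shows that expectation. It is not Intel MKL code; the name ref_dgemm_nn and the restriction to the non-transposed, column-major case are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Reference sketch (not Intel MKL code): C := alpha*A*B + beta*C for
 * column-major, non-transposed A (m x k), B (k x n) and C (m x n).
 * When k == 0 the inner sum is empty, so the result is beta*C. */
void ref_dgemm_nn(size_t m, size_t n, size_t k, double alpha,
                  const double *a, size_t lda,
                  const double *b, size_t ldb,
                  double beta, double *c, size_t ldc)
{
    assert(c != NULL && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[p * lda + i] * b[j * ldb + p];
            c[j * ldc + i] = alpha * sum + beta * c[j * ldc + i];
        }
    }
}
```

Calling this sketch with k = 0, n = 2 and beta = 2 doubles every entry of C, which is the result the affected DGEMM call should produce. Since A and B are never read when K==0, switching transa or transb to T, as the workaround suggests, cannot change the mathematical result.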
Update 1
What’s New in Intel MKL 2017 Update 1
- Added support of Intel® Xeon Phi™ processor x200 leverage boot mode on Windows* OS.
- BLAS:
- The Intel Optimized MP LINPACK Benchmark supports various MPI implementations in addition to Intel® Message Passing Interface (Intel® MPI), and the contents of the mp_linpack directory have changed.
- Improved single thread SGEMM/DGEMM performance on Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and Intel® Xeon® for Intel® Many Integrated Core Architecture.
- Deep Neural Networks (DNN) primitives:
- Introduced additional optimizations for Intel® Xeon® processor E3-xxxx V5 (formerly Skylake).
- Added support for non-square convolution kernels.
- Sparse BLAS:
- Improved Sparse BLAS matrix vector functionality in block compressed sparse row (BSR) format for block size equal to 6,10,14, or 18 on Intel AVX2.
- Improved Inspector-executor Sparse BLAS matrix-vector and matrix-matrix functionality for symmetric matrices.
- LAPACK:
- Improved performance of ?GETRF, ?GETRS and ?GETRI for very small matrices via MKL_DIRECT_CALL.
- Improved performance of ?ORGQR and SVD functionality for tall-and-skinny matrices.
- Parallelized ?ORGQR in Intel® Threading Building Blocks (Intel® TBB) threading layer.
- Vector Math:
- Introduced the exponential integral function E1 with three accuracy levels HA, LA, and EP, for single precision and double precision real data types.
- ScaLAPACK:
- Improved performance of PZGETRF.
- Known Limitations for MKL 2017 Update 1:
- Intel MKL (in Intel® Parallel Studio XE) integration with Microsoft Visual Studio in IA-32 architecture environments is limited. This issue does not affect the Intel® 64 architecture target environment. Intel MKL (in Intel® System Studio) integration with Microsoft Visual Studio is limited in both IA-32 and Intel 64 architecture environments.
- Workaround: set up include, library folders, and required libraries manually (Step 3 in How to Build an Intel® MKL Application with Intel® Visual Fortran Compiler).
- 1D complex-to-complex FFT may return incorrect results on systems with Intel AVX-512 support if the number of threads is different at DFT descriptor commit time and DFT execution.
- The AVX512 code path works when the MKL_ENABLE_INSTRUCTIONS=AVX512 environment variable is set, but mkl_enable_instructions(MKL_ENABLE_AVX512) function call does not.
- Building the Intel Optimized MP LINPACK Benchmark for a customized MPI implementation on Windows* is not supported for Microsoft Visual Studio 2015 and later.
Workaround: Use an earlier version of Microsoft Visual Studio.
Issue Description: If the user tries to use MSVS 2015 with our provided build.bat script to build their own xhpl.exe executable, they will see a number of unresolved external symbol errors like:
libhpl_intel64.lib(HPL_pdmatgen.obj) : error LNK2001: unresolved external symbol __iob_func
An older version of MSVS was used to build the libhpl_intel64.lib library we provide to link against when building the MP LINPACK benchmark for a customized MPI implementation. It appears that these functions are now inlined in MSVS2015.
Initial Release
What's New in Intel MKL 2017
- Introduced optimizations for the Intel® Xeon Phi™ processor x200 (formerly Knights Landing) self-boot platform for Windows* OS.
- Enabled Automatic Offload (AO) and Compiler Assisted Offload (CAO) modes for the second generation of Intel Xeon Phi coprocessor on Linux* OS
- Introduced Deep Neural Networks (DNN) primitives including convolution, normalization, activation, and pooling functions intended to accelerate convolutional neural networks (CNNs) and deep neural networks on Intel® Architecture.
- Optimized for Intel® Xeon® processor E5-xxxx v3 (formerly Haswell), Intel Xeon processor E5-xxxx v4 (formerly Broadwell), and Intel Xeon Phi processor x200 self-boot platform.
- Introduced inner product primitive to support fully connected layers.
- Introduced batch normalization, sum, split, and concat primitives to provide full support for GoogLeNet and ResidualNet topologies.
- BLAS:
- Introduced new packed matrix multiplication interfaces (?gemm_alloc, ?gemm_pack, ?gemm_compute, ?gemm_free) for single and double precision.
- Improved performance over standard S/DGEMM on Intel Xeon processor E5-xxxx v3 and later processors.
- The Intel Optimized MP LINPACK Benchmark pre-built binaries using Intel® Message Passing Interface (Intel® MPI) were moved to the mp_linpack root folder. Support for multiple MPI implementations was also added. The benchmark source code in the mp_linpack directory was removed except for HPL_main.c, which can be used to create an Intel Optimized MP LINPACK benchmark binary for a specific MPI implementation.
- Sparse BLAS:
- Improved performance of parallel BSRMV functionality on processors supporting the Intel® Advanced Vector Extensions 2 (Intel® AVX2) instruction set.
- Improved performance of sparse matrix functionality on the Intel Xeon Phi processor x200.
- Intel MKL PARDISO:
- Improved performance of parallel solving step for matrices with fewer than 300000 elements.
- Added support for mkl_progress in Parallel Direct Sparse Solver for Clusters.
- Added fully distributed reordering step to Parallel Direct Sparse Solver for Clusters.
- Fourier Transforms:
- Improved performance of batched 1D FFT with large batch size on processors supporting the Intel® Advanced Vector Extensions (Intel® AVX), Intel AVX2, Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and Intel AVX-512 Intel® Many Integrated Core Instructions (Intel® MIC Instructions) instruction sets.
- Improved performance for small size batched 2D FFT on the Intel Xeon Phi processor x200 self-boot platform, Intel Xeon processor E5-xxxx v3, and Intel Xeon processor E5-xxxx v4.
- Improved performance for 3D FFT on the Intel Xeon Phi processor x200 self-boot platform.
- LAPACK:
- Included the latest LAPACK v3.6 enhancements. New features introduced are:
- SVD by Jacobi ([CZ]GESVJ) and preconditioned Jacobi ([CZ]GEJSV)
- SVD via EVD allowing computation of a subset of singular values and vectors (?GESVDX)
- Level 3 BLAS-based versions of generalized Schur (?GGES3), generalized EVD (?GGEV3), generalized SVD (?GGSVD3), and reduction to generalized upper Hessenberg form (?GGHD3)
- Multiplication of a general matrix by a unitary or orthogonal matrix that possesses a 2x2 block structure ([DS]ORM22/[CZ]UNM22)
- Improved performance for large size QR (?GEQRF) on processors supporting the Intel AVX2 instruction set.
- Improved LU factorization, solve, and inverse (?GETR?) performance for very small sizes (<16).
- Improved General Eigensolver (?GEEV and ?GEEVD) performance for the case when eigenvectors are needed.
- Improved ?GETRF, ?POTRF and ?GEQRF, linear solver (?GETRS) and SMP LINPACK performance on the Intel Xeon Phi processor x200 self-boot platform.
- ScaLAPACK:
- Improved performance for hybrid (MPI + OpenMP*) mode of ScaLAPACK and PBLAS.
- Improved performance of P?GEMM and P?TRSM, resulting in better scalability of the Qbox First-Principles Molecular Dynamics code.
- Data Fitting:
- Introduced two new storage formats for interpolation results (DF_MATRIX_STORAGE_SITES_FUNCS_DERS and DF_MATRIX_STORAGE_SITES_DERS_FUNCS).
- Added Hyman monotonic cubic spline.
- Improved performance of Data Fitting functionality on the Intel Xeon Phi processor x200.
- Modified callback APIs to allow users to pass information about integration limits.
- Vector Mathematics:
- Introduced optimizations for the Intel Xeon Phi processor x200.
- Improved performance for Intel Xeon processor E5-xxxx v3 and Intel Xeon processor E5-xxxx v4.
- Vector Statistics:
- Introduced additional optimization of the SkipAhead method for MT19937 and SFMT19937.
- Improved performance of Vector Statistics functionality, including Random Number Generators and Summary Statistics, on the Intel Xeon Phi processor x200.
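As background on the SkipAhead method mentioned above: skip-ahead advances a generator by n steps without producing the intermediate values, so one random stream can be split into non-overlapping blocks for parallel use. The sketch below illustrates the idea in plain Python with a toy multiplicative congruential generator, not MT19937 or SFMT19937 and not Intel MKL code. All names (next_state, skip_ahead, stream_block) and the generator constants are assumptions chosen for this example.

```python
# Toy multiplicative congruential generator: x_{n+1} = A * x_n mod M.
# Chosen only because its skip-ahead has a one-line closed form;
# this is NOT the MT19937/SFMT19937 algorithm.
M = 2**31 - 1
A = 48271


def next_state(state):
    """Advance the generator by one step."""
    return (A * state) % M


def skip_ahead(state, n):
    """Advance the generator by n steps in O(log n) time.

    Uses x_n = A^n * x_0 mod M with modular exponentiation.
    """
    return (pow(A, n, M) * state) % M


def stream_block(seed, block_index, block_size):
    """Return the block_index-th block of block_size values from the stream."""
    state = skip_ahead(seed, block_index * block_size)
    values = []
    for _ in range(block_size):
        state = next_state(state)
        values.append(state)
    return values


if __name__ == "__main__":
    seed = 12345
    sequential = stream_block(seed, 0, 12)
    blocks = [stream_block(seed, i, 4) for i in range(3)]
    # Three independent workers reproduce the sequential stream exactly.
    print(sequential == blocks[0] + blocks[1] + blocks[2])
```

Each worker needs only the seed and its block index, which is why a faster skip-ahead directly reduces the setup cost of splitting a stream across threads.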
Deprecation Notices:
- Removed pre-compiled BLACS library for MPICH v1; MPICH users can still build the BLACS library with MPICH support via Intel MKL MPI wrappers.
- The SP2DP interface library is removed.
- The PGI* compiler on IA-32 architecture is no longer supported.
- Installation on IA-32 architecture hosts is no longer supported, and the Intel MKL packages for Intel® 64 architecture hosts include both 64-bit and 32-bit Intel MKL libraries.
- Support for Red Hat Enterprise Linux* 5.0 is dropped.
Known Limitations:
- cblas_?gemm_alloc is not supported on Windows* OS for the IA-32 architecture with single dynamic library linking.
- Intel MKL (in Intel® Parallel Studio XE) integration with Microsoft Visual Studio in IA-32 architecture environments is limited. This issue does not affect the Intel® 64 architecture target environment. Intel MKL (in Intel® System Studio) integration with Microsoft Visual Studio is limited in both IA-32 and Intel 64 architecture environments.
- Workaround: set up include, library folders, and required libraries manually (Step 3 in How to Build an Intel® MKL Application with Intel® Visual Fortran Compiler).
- 1D complex-to-complex FFT may return incorrect results on systems with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) support if the number of threads is different at DFT descriptor commit time and DFT execution.
- {s,d}gemm_compute may leak memory if only one of the matrices is packed in sequential Intel MKL for Intel AVX2 and above.
- Workaround: Use multi-threaded Intel MKL and set MKL_NUM_THREADS to 1 instead of using sequential Intel MKL.
- nodeperf.c, which comes with the MP LINPACK Benchmark for Clusters package, may fail to run on Windows*.
Workaround: Use the Intel Optimized MP LINPACK Benchmark for benchmarking individual nodes on a Windows* cluster. Alternatively, uncomment line 551 and comment out line 552 in nodeperf.c, to use malloc instead of mkl_malloc.
Note: nodeperf.c will be removed in MKL 2017 Update 1. We recommend using the MP LINPACK benchmark directly for measuring cluster performance.
Product Contents
Intel MKL is now delivered as a single package for both IA-32 and Intel® 64 architectures, and is also available through an online installer.
Technical Support
If you did not register your Intel software product during installation, please do so now at the Intel® Software Development Products Registration Center. Registration entitles you to free technical support, product updates, and upgrades for the duration of the support term.
For general information about Intel technical support, product updates, user forums, FAQs, tips and tricks and other support questions, please visit http://www.intel.com/software/products/support/.
Note: If your distributor provides technical support for this product, please contact them rather than Intel.
For technical information about Intel MKL, including FAQs, tips and tricks, and other support information, please visit the Intel MKL forum: http://software.intel.com/en-us/forums/intel-math-kernel-library/ and browse the Intel MKL knowledge base: http://software.intel.com/en-us/articles/intel-mkl-kb/all/.