Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 6/24/2024
Public

Reproducibility Conditions

To get reproducible results from run to run, ensure that the number of threads stays fixed. Specifically:

  • If you are running your program with OpenMP* parallelization on different processors, explicitly specify the number of threads.
  • To ensure that your application has deterministic behavior with OpenMP* parallelization and does not adjust the number of threads dynamically at run time, set MKL_DYNAMIC and OMP_DYNAMIC to FALSE. This is especially needed if you are running your program on different systems.
  • If you are running your program with Intel® Threading Building Blocks parallelization, numerical reproducibility is not guaranteed.
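
The OpenMP* settings above can be applied from the shell before the program is launched. The following is a minimal sketch for a POSIX shell. The thread count of 4 is an arbitrary value chosen for illustration, and OMP_NUM_THREADS and MKL_NUM_THREADS are the standard OpenMP* and oneMKL thread-count variables, which this page does not name explicitly.

```shell
# Fix the thread count so that it is identical on every run and every system.
# The value 4 is an assumption for illustration; pick one that suits your hardware.
export OMP_NUM_THREADS=4
export MKL_NUM_THREADS=4

# Prevent the run time from adjusting the number of threads dynamically.
export MKL_DYNAMIC=FALSE
export OMP_DYNAMIC=FALSE
```

Any program started from the same shell session afterwards inherits these values.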

OpenMP* Offload

Starting in version 2024.1, numerical reproducibility is supported when OpenMP* offload is used to execute BLAS level-3 routines and batched extensions on the GPU. Conditional Numerical Reproducibility (CNR) is enabled on the GPU whenever any CNR code branch is enabled, that is, for any setting other than MKL_CBWR_OFF or MKL_CBWR_BRANCH_OFF. For more information on CNR support for GPU, see the oneMKL Developer Guide.
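
One way to enable a CNR code branch without changing the program is the MKL_CBWR environment variable. The sketch below assumes that variable and its AUTO value as described in the oneMKL Developer Guide; neither is spelled out on this page, so check the guide for the values your version accepts.

```shell
# Select a CNR code branch for the whole process.
# AUTO is assumed to correspond to the MKL_CBWR_AUTO setting; any value other
# than the OFF settings named above should likewise leave CNR enabled.
export MKL_CBWR=AUTO
```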

Strict CNR Mode

In strict CNR mode, oneAPI Math Kernel Library provides bitwise reproducible results for a limited set of functions and code branches even when the number of threads changes. These routines and branches support strict CNR mode (64-bit libraries only):

  • ?gemm, ?symm, ?hemm, ?trsm, and their CBLAS equivalents (cblas_?gemm, cblas_?symm, cblas_?hemm, and cblas_?trsm).
  • Intel® Advanced Vector Extensions 2 (Intel® AVX2) or Intel® Advanced Vector Extensions 512 (Intel® AVX-512).

When using other routines or CNR branches, oneAPI Math Kernel Library operates in standard (non-strict) CNR mode, subject to the restrictions described above. Enabling strict CNR mode can reduce performance.
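
Strict CNR mode can be requested together with a supported branch through the MKL_CBWR environment variable. This is a minimal sketch that assumes the "branch,STRICT" value syntax described in the oneMKL Developer Guide; the choice of the AVX2 branch is only an example.

```shell
# Request the Intel AVX2 code branch in strict CNR mode.
# The "AVX2,STRICT" value format is an assumption taken from the Developer Guide,
# not from this page; AVX512 is the other branch listed as supporting strict mode.
export MKL_CBWR="AVX2,STRICT"
```

Because strict mode can reduce performance, it is worth setting only for runs where results must match across different thread counts.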

NOTE:
  • As usual, align your data, even in CNR mode, to obtain the best possible performance. CNR mode fully supports unaligned input and output data, but using unaligned data might reduce the performance of some oneAPI Math Kernel Library functions on earlier Intel processors. To ensure proper alignment of arrays, allocate memory for them using mkl_malloc/mkl_calloc.

  • Conditional Numerical Reproducibility does not ensure that bitwise-identical NaN values are generated when the input data contains NaN values.

  • If dynamic memory allocation fails on one run but succeeds on another, the results of these two runs might not be reproducible.
