Code Examples
The following simple programs show how to obtain run-to-run reproducible results from Intel® oneAPI Math Kernel Library (oneMKL) functions by using Conditional Numerical Reproducibility (CNR). See the Intel® oneAPI Math Kernel Library (oneMKL) Developer Reference for more examples.
C Example of CNR
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    int my_cbwr_branch;
    /* Align all input/output data on 64-byte boundaries */
    /* for best performance of Intel® oneAPI Math Kernel Library (oneMKL) */
    void *darray;
    int darray_size = 1000;
    /* Set alignment value in bytes */
    int alignment = 64;
    /* Allocate aligned array */
    darray = mkl_malloc(sizeof(double) * darray_size, alignment);
    /* Find the available MKL_CBWR_BRANCH automatically */
    my_cbwr_branch = mkl_cbwr_get_auto_branch();
    /* User code without oneMKL calls */
    /* Piece of the code where CNR of oneMKL is needed */
    /* The performance of oneMKL functions might be reduced for CNR mode */
    /* If the "IF" statement below is commented out, oneMKL will run in a regular mode, */
    /* and data alignment will allow you to get best performance */
    if (mkl_cbwr_set(my_cbwr_branch)) {
        printf("Error in setting MKL_CBWR_BRANCH! Aborting…\n");
        /* Release the array before exiting with a nonzero status */
        mkl_free(darray);
        return 1;
    }
    /* CNR calls to oneMKL + any other code */
    /* Free the allocated aligned array */
    mkl_free(darray);
    return 0;
}
Fortran Example of CNR
      PROGRAM MAIN
      INCLUDE 'mkl.fi'
      INTEGER*4 MY_CBWR_BRANCH
! Align all input/output data on 64-byte boundaries
! for best performance of Intel® oneAPI Math Kernel Library (oneMKL)
! Declare oneMKL memory allocation routine
      DOUBLE PRECISION DARRAY
      POINTER (P_DARRAY,DARRAY(1))
      INTEGER DARRAY_SIZE
      PARAMETER (DARRAY_SIZE=1000)
! Set alignment value in bytes
      INTEGER ALIGNMENT
      PARAMETER (ALIGNMENT=64)
! Allocate aligned array
      INTEGER*8 ALLOC_SIZE
      ALLOC_SIZE = 8*DARRAY_SIZE
      P_DARRAY = MKL_MALLOC (ALLOC_SIZE, ALIGNMENT)
! Find the available MKL_CBWR_BRANCH automatically
      MY_CBWR_BRANCH = MKL_CBWR_GET_AUTO_BRANCH()
! User code without oneMKL calls
! Piece of the code where CNR of oneMKL is needed
! The performance of oneMKL functions might be reduced for CNR mode
! If the "IF" statement below is commented out, oneMKL will run in a
! regular mode, and data alignment will allow you to get best performance
      IF (MKL_CBWR_SET (MY_CBWR_BRANCH) .NE. MKL_CBWR_SUCCESS) THEN
          PRINT *, 'Error in setting MKL_CBWR_BRANCH! Aborting…'
! Stop with a nonzero code to signal the failure
          STOP 1
      ENDIF
! CNR calls to oneMKL + any other code
! Free the allocated aligned array
      CALL MKL_FREE(P_DARRAY)
      END
Use of CNR with Unaligned Data in C
#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>

int main(void)
{
    int my_cbwr_branch;
    /* If it is not possible to align all input/output data on 64-byte boundaries */
    /* to achieve performance, use unaligned IO data with possible performance */
    /* penalty */
    /* Using unaligned IO data */
    double *darray;
    int darray_size = 1000;
    /* Allocate array, malloc aligns data on 8/16-byte boundary only */
    darray = (double *)malloc(sizeof(double) * darray_size);
    /* Find the available MKL_CBWR_BRANCH automatically */
    my_cbwr_branch = mkl_cbwr_get_auto_branch();
    /* User code without oneMKL calls */
    /* Piece of the code where CNR of oneMKL is needed */
    /* The performance of oneMKL functions might be reduced for CNR mode */
    /* If the "IF" statement below is commented out, oneMKL will run in a regular mode, */
    /* and you will NOT get best performance without data alignment */
    if (mkl_cbwr_set(my_cbwr_branch)) {
        printf("Error in setting MKL_CBWR_BRANCH! Aborting…\n");
        /* Release the array before exiting with a nonzero status */
        free(darray);
        return 1;
    }
    /* CNR calls to oneMKL + any other code */
    /* Free the allocated array */
    free(darray);
    return 0;
}
Use of CNR with Unaligned Data in Fortran
      PROGRAM MAIN
      INCLUDE 'mkl.fi'
      INTEGER*4 MY_CBWR_BRANCH
! If it is not possible to align all input/output data on 64-byte boundaries
! to achieve performance, use unaligned IO data with possible performance
! penalty
      DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: DARRAY
      INTEGER DARRAY_SIZE, STATUS
      PARAMETER (DARRAY_SIZE=1000)
! Allocate array with undefined alignment
      ALLOCATE(DARRAY(DARRAY_SIZE))
! Find the available MKL_CBWR_BRANCH automatically
      MY_CBWR_BRANCH = MKL_CBWR_GET_AUTO_BRANCH()
! User code without oneMKL calls
! Piece of the code where CNR of oneMKL is needed
! The performance of oneMKL functions might be reduced for CNR mode
! If the "IF" statement below is commented out, oneMKL will run in a regular mode,
! and you will NOT get best performance without data alignment
      IF (MKL_CBWR_SET(MY_CBWR_BRANCH) .NE. MKL_CBWR_SUCCESS) THEN
          PRINT *, 'Error in setting MKL_CBWR_BRANCH! Aborting…'
! RETURN is not valid in a main program; stop with a nonzero code instead
          STOP 1
      ENDIF
! CNR calls to oneMKL + any other code
! Free the allocated array
      DEALLOCATE(DARRAY)
      END