Using oneMKL Verbose Mode
When building applications that call Intel® oneAPI Math Kernel Library (oneMKL) functions, it may be useful to determine:
- which computational functions are called
- what parameters are passed to them
- how much time is spent to execute the functions
- (for GPU applications) which GPU device the kernel is executed on
You can get an application to print this information to a standard output device by enabling Intel® oneAPI Math Kernel Library (oneMKL) Verbose. Functions that can print this information are referred to as verbose-enabled functions.
When Verbose mode is active in an Intel® oneAPI Math Kernel Library (oneMKL) domain, every call of a verbose-enabled function ends by printing a human-readable line that describes the call. If your application terminates during the function call, no information is printed for that call. The first call to a verbose-enabled function also prints a version information line.
For GPU applications, the first call to a verbose-enabled function also prints one or more GPU information lines. These follow the version information line, which describes the host CPU. If more than one GPU is detected, each GPU device is printed on a separate line.
Verbose is implemented differently for CPU applications and for GPU applications. With CPU applications, oneMKL Verbose has two modes: disabled (default) and enabled. With GPU applications, it has three modes: disabled (default), enabled without timing, and enabled with synchronous timing.
To change the verbose mode, either set the environment variable MKL_VERBOSE:
Action | CPU application | GPU application |
---|---|---|
Set MKL_VERBOSE to 0 | to disable Verbose | to disable Verbose |
Set MKL_VERBOSE to 1 | to enable Verbose | to enable Verbose without timing |
Set MKL_VERBOSE to 2 | to enable Verbose | to enable Verbose with synchronous timing |
or call the support function mkl_verbose(int mode):
Action | CPU application | GPU application |
---|---|---|
Call mkl_verbose(0) | to disable Verbose | to disable Verbose |
Call mkl_verbose(1) | to enable Verbose | to enable Verbose without timing |
Call mkl_verbose(2) | to enable Verbose | to enable Verbose with synchronous timing |
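The two tables can be restated in code. The following is a minimal sketch, not part of oneMKL: `describe_mkl_verbose`, `current_mkl_verbose`, and the `target_kind` enum are hypothetical names introduced here for illustration. The sketch only encodes the table rows; the tables do not say how other values are handled, so the helper reports them as "unknown" rather than guessing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type, for illustration only: which table column applies. */
typedef enum { TARGET_CPU, TARGET_GPU } target_kind;

/* Hypothetical helper: maps an MKL_VERBOSE value to the mode listed in
   the tables above. A NULL value (variable not set) is treated as the
   documented default, which is disabled. */
const char *describe_mkl_verbose(const char *value, target_kind target)
{
    if (value == NULL || strcmp(value, "0") == 0)
        return "disabled";
    if (strcmp(value, "1") == 0)
        return target == TARGET_GPU ? "enabled without timing" : "enabled";
    if (strcmp(value, "2") == 0)
        return target == TARGET_GPU ? "enabled with synchronous timing"
                                    : "enabled";
    /* Values outside the tables are not described in this section. */
    return "unknown";
}

/* Convenience wrapper that reads the environment variable itself. */
const char *current_mkl_verbose(target_kind target)
{
    return describe_mkl_verbose(getenv("MKL_VERBOSE"), target);
}
```

For example, `describe_mkl_verbose("2", TARGET_CPU)` yields "enabled", while the same value for `TARGET_GPU` yields "enabled with synchronous timing". Only GPU applications distinguish between values 1 and 2.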
Verbose with CPU Applications
For CPU applications, Verbose output consists of a version information line and call description lines.
For CPU applications, you can enable Intel® oneAPI Math Kernel Library (oneMKL) Verbose mode in these domains:
- BLAS (and BLAS-like extensions)
- LAPACK
- ScaLAPACK (selected functionality)
- FFT
Verbose with GPU Applications
The Verbose feature is available for GPU applications that use the DPC++ API or the C/Fortran API with OpenMP offload. For GPU applications, the Verbose mode also determines whether execution time is measured. Timing is taken synchronously, so if Verbose is enabled with timing, kernel executions become synchronous (each kernel blocks the kernels that follow it).
For GPU applications, Verbose output consists of a version information line, GPU information lines, and call description lines.
For GPU applications, you can enable Intel® oneAPI Math Kernel Library (oneMKL) Verbose mode in these domains:
- BLAS (and BLAS-like extensions)
- LAPACK
- FFT
For Both CPU and GPU Verbose
Enabling or disabling the Verbose mode with the function call takes precedence over the environment setting. For a full description of the mkl_verbose function, see either the Intel® oneAPI Math Kernel Library Developer Reference for C or the Intel® oneAPI Math Kernel Library Developer Reference for Fortran. Both references are available in the Intel® Software Documentation Library.
Intel® oneAPI Math Kernel Library (oneMKL) Verbose mode is a global state, not a thread-local one. In other words, if an application changes the mode from multiple threads, the result is undefined.
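Because the function call overrides the environment setting, an application can limit Verbose to one region of code whatever the value of MKL_VERBOSE. The following is a minimal sketch of that pattern, not an official oneMKL utility. `with_verbose_mode`, `stub_verbose`, and `record_mode` are hypothetical names. The sketch assumes that the setter returns the previous mode on success and a negative value on failure; check this against the Developer Reference before relying on it. The setter is passed as a function pointer so that the sketch compiles without oneMKL; in a real application you would include `mkl.h` and pass `mkl_verbose`.

```c
#include <stddef.h>

/* Function type matching mkl_verbose(int mode). */
typedef int (*verbose_setter)(int mode);

/* Hypothetical helper: switch to `mode`, run `work`, then restore the
   mode that was active before. Returns 0 on success, -1 if the setter
   rejected the requested mode (assumption: negative return = failure). */
int with_verbose_mode(verbose_setter set_mode, int mode, void (*work)(void))
{
    int previous = set_mode(mode);
    if (previous < 0)
        return -1;          /* mode not changed, nothing to restore */
    work();
    set_mode(previous);     /* put the earlier mode back */
    return 0;
}

/* Stand-in for mkl_verbose so the sketch runs without oneMKL.
   It mimics the assumed contract: return the previous mode, or -1
   for a value outside 0..2. */
int stub_mode = 0;
int stub_verbose(int mode)
{
    int previous = stub_mode;
    if (mode < 0 || mode > 2)
        return -1;
    stub_mode = mode;
    return previous;
}

/* Example workload: records which mode was active while it ran. */
int observed_mode = -1;
void record_mode(void)
{
    observed_mode = stub_mode;
}
```

A call such as `with_verbose_mode(stub_verbose, 1, record_mode)` runs the workload with Verbose enabled and leaves the mode as it was before. Since the Verbose mode is global, the helper should be called from one thread only.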
The performance of an application may degrade with the Verbose mode enabled, especially when the number of calls to verbose-enabled functions is large, because every call to a verbose-enabled function requires an output operation.