Developer Guide for Intel® oneAPI Math Kernel Library Linux*

ID 766690
Date 3/22/2024
Public

Call Description Line

Call Description Line for CPU

In Intel® oneAPI Math Kernel Library (oneMKL) Verbose mode, each verbose-enabled function called from your application prints a call description line. The line begins with the MKL_VERBOSE character string and uses spaces as delimiters. The format of the rest of the line is subject to change in a future release.

The following table lists the information contained in a call description line for CPU applications in Verbose mode and provides links to related topics:

Information

Description

Related Links

The name of the function.

Although the name printed may differ from the name used in the source code of the application (for example, the cblas_ prefix of CBLAS functions is not printed), you can easily recognize the function by the printed name.

 

Values of the arguments.

  • The values are listed in the order of the formal argument list. The list directly follows the function name, and it is parenthesized and comma-separated.
  • Arrays are printed as addresses (to see the alignment of the data).
  • Integer scalar parameters passed by reference are printed by value. Zero values are printed for NULL references.
  • Character values are printed without quotes.
  • For all parameters passed by reference, the values printed are the values returned by the function. For example, the printed value of the info parameter of a LAPACK function is its value after the function execution.
  • For verbose-enabled functions in the ScaLAPACK domain, in addition to the standard input parameters, information about blocking factors, MPI rank, and process grid is also printed.
 

Time taken by the function.

  • The time is printed in convenient units (seconds, milliseconds, and so on), which are explicitly indicated.

  • The time may fluctuate from run to run.

  • The time printed may occasionally be larger than the time actually taken by the function call, especially for small problem sizes and multi-socket machines. To reduce this effect, bind threads that call Intel® oneAPI Math Kernel Library (oneMKL) to CPU cores by setting an affinity mask.

See Managing Multi-core Performance for options to set an affinity mask.

Value of the MKL_CBWR environment variable.

The value printed is prefixed with CNR:

Getting Started with Conditional Numerical Reproducibility

Value of the MKL_DYNAMIC environment variable.

The value printed is prefixed with Dyn:

MKL_DYNAMIC

Status of the Intel® oneAPI Math Kernel Library (oneMKL) memory manager.

The value printed is prefixed with FastMM:

See Avoiding Memory Leaks in oneMKL for a description of the Intel® oneAPI Math Kernel Library (oneMKL) memory manager.

OpenMP* thread number of the calling thread.

The value printed is prefixed with TID:

 

Values of Intel® oneAPI Math Kernel Library (oneMKL) environment variables defining the general and domain-specific numbers of threads, separated by a comma.

The first value printed is prefixed with NThr:

oneMKL-specific Environment Variables for Threading Control

The following is an example of a call description line (with OpenMP threading):

MKL_VERBOSE DGEMM(n,n,1000,1000,240,0x7ffff708bb30,0x7ff2aea4c000,1000,0x7ff28e92b000,240,0x7ffff708bb38,0x7ff28e08d000,1000) 1.66ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:16

The following is an example of a call description line (with TBB threading):

MKL_VERBOSE DGEMM(n,n,1000,1000,240,0x7ffff708bb30,0x7ff2aea4c000,1000,0x7ff28e92b000,240,0x7ffff708bb38,0x7ff28e08d000,1000) 1.66ms CNR:OFF Dyn:1 FastMM:1

NOTE:
For more information about selected threading, refer to Version Information Line.
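
When verbose output is collected in a log, the fields described above can be separated with a small script. The following is a minimal Python sketch and not part of oneMKL: the function name parse_call_line and the regular expression are assumptions based only on the two example lines above. Because the format of the line may change in a future release, the pattern may need to be adjusted.

```python
import re

# Assumed layout, taken from the example lines in this section:
#   MKL_VERBOSE NAME(arg,arg,...) TIME KEY:VALUE KEY:VALUE ...
CALL_LINE = re.compile(
    r"^MKL_VERBOSE\s+(?P<name>\w+)\((?P<args>.*)\)\s+(?P<time>\S+)(?P<rest>.*)$"
)


def parse_call_line(line):
    """Split one call description line into its parts.

    Returns None for lines that do not look like a call description line.
    """
    match = CALL_LINE.match(line.strip())
    if match is None:
        return None

    # Trailing tokens such as CNR:OFF or NThr:16 become dictionary entries.
    fields = {}
    for token in match.group("rest").split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value

    args = match.group("args")
    return {
        "function": match.group("name"),
        "args": args.split(",") if args else [],
        "time": match.group("time"),  # kept as printed, unit included
        "fields": fields,
    }
```

For the OpenMP example line, this sketch yields DGEMM as the function name, 13 argument values, and TID and NThr entries in the fields dictionary. For the TBB example line, those two entries are absent.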

The following information is not printed because of limitations of Intel® oneAPI Math Kernel Library (oneMKL) Verbose mode:

  • Input values of parameters passed by reference if the values were changed by the function.

    For example, if a LAPACK function is called with a workspace query, that is, the value of the lwork parameter equals -1 on input, the call description line prints the result of the query and not -1.

  • Return values of functions.

    For example, the value returned by the function ilaenv is not printed.

  • Floating-point scalars passed by reference.
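
Because arrays are printed as addresses, the alignment of each array can be read directly from a call description line. The sketch below is illustrative Python and not a oneMKL facility; the helper names and the 64-byte boundary used as the default are assumptions chosen for the example.

```python
def is_aligned(address, boundary=64):
    """Return True if a printed address (for example "0x7ff2aea4c000")
    is a multiple of `boundary` bytes. The default of 64 is an assumption."""
    return int(address, 16) % boundary == 0


def largest_alignment(address):
    """Return the largest power of two that divides the printed address.

    Returns 0 for a zero address.
    """
    value = int(address, 16)
    # value & -value isolates the lowest set bit of the address.
    return value & -value
```

Applied to the DGEMM example above, the addresses 0x7ff2aea4c000, 0x7ff28e92b000, and 0x7ff28e08d000 are multiples of 64, while 0x7ffff708bb30 is a multiple of 16 but not of 64.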

Call Description Line for GPU

In Intel® oneAPI Math Kernel Library (oneMKL) Verbose mode, each verbose-enabled function called from your application prints a call description line. The line begins with the MKL_VERBOSE character string and uses spaces as delimiters. The format of the rest of the line may change in a future release.

The following table lists the information contained in a call description line for GPU applications in Verbose mode.

Information Description
The name of the function
Although the name printed may differ from the name used in the source code of the application, you can easily recognize the function by the printed name.
The values of the arguments
  • The values are listed in the order of the formal argument list. The list directly follows the function name, and it is parenthesized and comma-separated.
  • Arrays are printed as addresses (to show the alignment of the data).
  • Integer scalar parameters passed by reference are printed by value. Zero values are printed for NULL references.
  • Character values are printed without quotation marks.
  • For all parameters passed by reference, the values printed are the values returned by the function.
Time taken by the function
  • If Verbose is enabled with timing for GPU applications, kernel executions become synchronous (each kernel blocks later kernels), and the measured time may include data transfers and data copies on the host and devices.
  • If Verbose is enabled without timing for GPU applications, the time is printed as 0.
  • The time is printed in convenient units (seconds, milliseconds, and so on), which are explicitly indicated.
  • The time may fluctuate from run to run.
  • The time printed may occasionally be larger than the time actually taken by the function call, especially for small problem sizes.
Device index

The index of the GPU device on which the kernel is executed is printed after the character string "GPU" (for example, GPU0, GPU1, or GPU2). Use the index to look up the specific device in the GPU information lines.

If the kernel is executed on the host CPU, this field will be empty.

The following is an example of a call description line:

MKL_VERBOSE FFT(dcfi64) 224.30us GPU0

Some Limitations:

For GPU applications, the call description lines may be printed out of order, that is, the order of the lines in the verbose output may differ from the order in which the functions submit the kernels. This can happen in the following two cases:

  • Verbose is enabled without timing and the kernel executions stay asynchronous.
  • The kernel is not executed on one of the GPU devices, but on the host CPU (the device index will not be printed in this case).
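
The GPU fields described above can be extracted from a log in a similar way. The following Python sketch is an illustration only: the name parse_gpu_line, the regular expression, and the unit table are assumptions. Only the ms and us suffixes appear in the examples in this section; s and ns are assumed, and a time printed without a unit is treated as seconds. A line with no device field is reported with a device of None, which corresponds to execution on the host CPU.

```python
import re

# Assumed layout, taken from the GPU example line in this section:
#   MKL_VERBOSE NAME(args) TIME[UNIT] [GPU<index>]
GPU_LINE = re.compile(
    r"^MKL_VERBOSE\s+(?P<name>\w+)\((?P<args>.*)\)\s+"
    r"(?P<time>[0-9.]+)(?P<unit>[a-z]*)(?:\s+GPU(?P<device>\d+))?\s*$"
)

# Unit suffixes are partly assumed; see the note above.
UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}


def parse_gpu_line(line):
    """Return function name, arguments, time in seconds, and device index.

    Returns None if the line does not match the assumed layout.
    Raises KeyError for a unit suffix that is not in UNIT_SECONDS.
    """
    match = GPU_LINE.match(line.strip())
    if match is None:
        return None

    unit = match.group("unit")
    factor = UNIT_SECONDS[unit] if unit else 1.0
    device = match.group("device")
    args = match.group("args")
    return {
        "function": match.group("name"),
        "args": args.split(",") if args else [],
        "seconds": float(match.group("time")) * factor,
        # None means no device index was printed (host CPU execution).
        "device": int(device) if device is not None else None,
    }
```

Because call description lines may be printed out of order in the cases listed above, a script built on this sketch should not treat the line order in the log as the kernel submission order.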