Compute Modes
BLAS level-3 routines and extensions support alternate compute modes, which can provide increased performance in exchange for different numerical properties or reduced accuracy.
A list of one or more allowed modes can be specified either at compile time, on a per-call or per-source-file basis, or at runtime, using the MKL_BLAS_COMPUTE_MODE environment variable. oneMKL will automatically select an appropriate implementation from this list, taking into account routine parameters and hardware characteristics. In case none of the allowed alternate modes are supported by the given routine on the selected device, or if none of the allowed alternate modes are expected to improve performance, oneMKL will automatically fall back to a standard implementation.
By default, oneMKL does not enable any alternate compute modes. The MKL_BLAS_COMPUTE_MODE environment variable is intended for quickly evaluating whether alternate compute modes provide performance benefits and acceptable accuracy for an application. After initial testing, alternate mode settings can be permanently applied within the application using the per-call or per-source-file APIs.
When running on GPU, oneMKL’s verbose output indicates which mode is used for each call, whether the standard mode or one of the alternate modes. See Checking Which Mode Is Used for more details.
Mode Settings
Available alternate modes are described in the table below. Multiple modes can be combined, in which case oneMKL chooses whichever allowed mode is expected to provide the best performance.
For per-call or per-source-file mode, alternate compute modes are selected by OR’ing together one or more oneapi::mkl::blas::compute_mode values from the following table, e.g.:
using namespace oneapi::mkl::blas;
auto mode_settings = compute_mode::float_to_bf16x2 | compute_mode::float_to_tf32; /* allow either of these two modes */
When making global changes with the MKL_BLAS_COMPUTE_MODE environment variable, multiple modes are combined with commas (Windows command prompt syntax shown):
set MKL_BLAS_COMPUTE_MODE=FLOAT_TO_BF16X2,FLOAT_TO_TF32
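To illustrate how a comma-separated setting corresponds to OR'ing flags together, here is a minimal sketch in standard C++. The `demo_mode` enum, its bit values, and `parse_mode_list` are hypothetical names invented for this illustration; they are not oneMKL's definitions, and how oneMKL itself handles unrecognized names is not covered here (this sketch simply throws).

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical bit flags, one per alternate mode. The numeric values are
// illustrative only and are NOT oneMKL's actual enum values.
enum demo_mode : std::uint32_t {
    demo_float_to_bf16   = 1u << 0,
    demo_float_to_bf16x2 = 1u << 1,
    demo_float_to_bf16x3 = 1u << 2,
    demo_float_to_tf32   = 1u << 3,
    demo_complex_3m      = 1u << 4
};

// Hypothetical helper: turn "A,B,C" into the bitwise OR of the matching flags.
// Throwing on unknown names is an assumption made for this sketch.
inline std::uint32_t parse_mode_list(const std::string& list) {
    std::uint32_t mask = 0;
    std::istringstream stream(list);
    std::string token;
    while (std::getline(stream, token, ',')) {
        if (token == "FLOAT_TO_BF16")        mask |= demo_float_to_bf16;
        else if (token == "FLOAT_TO_BF16X2") mask |= demo_float_to_bf16x2;
        else if (token == "FLOAT_TO_BF16X3") mask |= demo_float_to_bf16x3;
        else if (token == "FLOAT_TO_TF32")   mask |= demo_float_to_tf32;
        else if (token == "COMPLEX_3M")      mask |= demo_complex_3m;
        else throw std::invalid_argument("unknown mode: " + token);
    }
    return mask;
}
```

With these stand-in flags, `parse_mode_list("FLOAT_TO_BF16X2,FLOAT_TO_TF32")` yields the same mask as `demo_float_to_bf16x2 | demo_float_to_tf32`, mirroring the equivalence between the environment variable and the OR'ed enum expression.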
compute_mode enum value | Environment variable setting | Description |
---|---|---|
compute_mode::float_to_bf16 | FLOAT_TO_BF16 | Convert single-precision inputs to bfloat16 format internally; output is accumulated in single precision. |
compute_mode::float_to_bf16x2 | FLOAT_TO_BF16X2 | Convert each single-precision input value to a sum of two bfloat16 values internally; output is accumulated in single precision. |
compute_mode::float_to_bf16x3 | FLOAT_TO_BF16X3 | Convert each single-precision input value to a sum of three bfloat16 values internally; output is accumulated in single precision. |
compute_mode::float_to_tf32 | FLOAT_TO_TF32 | Convert each single-precision input value to tf32 format internally; output is accumulated in single precision. |
compute_mode::complex_3m | COMPLEX_3M | Reduce the four real multiplications in a standard complex multiplication to three real multiplications. |
compute_mode::any | ANY | Allow any alternate compute mode. |
compute_mode::standard | STANDARD | Do not allow any alternate compute modes. |
compute_mode::prefer_alternate | PREFER_ALTERNATE | Used in conjunction with one or more alternate modes. |
compute_mode::force_alternate | FORCE_ALTERNATE | Used in conjunction with one or more alternate modes. |
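To make "a sum of two (or three) bfloat16 values" more concrete, the following host-side sketch emulates the input decomposition in standard C++. It is an illustration under stated assumptions, not oneMKL's implementation: bfloat16 is modeled as the upper 16 bits of an IEEE single-precision value, conversion uses truncation for simplicity (actual conversions may round to nearest), and the helper names `truncate_to_bf16` and `bf16_sum_approx` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: keep only the upper 16 bits of a float (sign, 8 exponent
// bits, 7 mantissa bits), giving a value that is exactly representable in
// bfloat16. Truncation is an assumption of this sketch.
inline float truncate_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF0000u;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Hypothetical helper: approximate x by the sum of `terms` bfloat16-representable
// values. Each term captures the leading bits of what the previous terms missed.
inline float bf16_sum_approx(float x, int terms) {
    assert(terms >= 1 && terms <= 3);
    float sum = 0.0f;
    float residual = x;
    for (int i = 0; i < terms; ++i) {
        const float piece = truncate_to_bf16(residual);
        sum += piece;
        residual -= piece;
    }
    return sum;
}
```

For example, with `x = 3.14159265f` a single term gives 3.140625, and each additional term shrinks the representation error. This is the intuition behind the accuracy ordering of float_to_bf16, float_to_bf16x2, and float_to_bf16x3; the sketch covers only how inputs are represented, not how the products are computed or accumulated.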
Per-Call Mode Settings
BLAS level-3 routines and extensions accept an optional oneapi::mkl::blas::compute_mode argument at the end of the parameter list to specify the desired mode setting. For USM APIs, the compute_mode argument goes before the list of input dependencies, if any; either argument may be omitted. For example:
using namespace oneapi::mkl::blas;
sycl::buffer<float, 1> a_buffer, c_buffer;
float *a_ptr, *c_ptr;
/* ... */
// Buffer API
syrk(my_queue, n, k, uplo, trans, alpha, a_buffer, lda, beta, c_buffer, ldc, compute_mode::float_to_bf16);
// Buffer API, forcing float_to_bf16 mode
syrk(my_queue, n, k, uplo, trans, alpha, a_buffer, lda, beta, c_buffer, ldc, compute_mode::float_to_bf16 | compute_mode::force_alternate);
// USM API, without dependencies
syrk(my_queue, n, k, uplo, trans, alpha, a_ptr, lda, beta, c_ptr, ldc, compute_mode::float_to_bf16);
// USM API, with dependencies
syrk(my_queue, n, k, uplo, trans, alpha, a_ptr, lda, beta, c_ptr, ldc, compute_mode::float_to_bf16, {event1, event2});
// USM API, dependencies but no special compute_mode settings
syrk(my_queue, n, k, uplo, trans, alpha, a_ptr, lda, beta, c_ptr, ldc, {event1, event2});
Per-Source-File Mode Settings
You can provide default mode settings for all calls within a source file by defining the MKL_BLAS_COMPUTE_MODE macro before including any oneMKL header files. This macro must be set to an expression of type oneapi::mkl::blas::compute_mode.
#define MKL_BLAS_COMPUTE_MODE oneapi::mkl::blas::compute_mode::complex_3m
#include <oneapi/mkl.hpp>
void my_function() {
/* ... */
// 3M mode will be allowed by default:
gemm(my_queue, m, n, k, trans_a, trans_b, alpha, a, lda, b, ldb, beta, c, ldc);
}
compute_mode parameters passed to a oneMKL routine take precedence over the default setting:
#define MKL_BLAS_COMPUTE_MODE oneapi::mkl::blas::compute_mode::complex_3m
#include <oneapi/mkl.hpp>
void my_function() {
/* ... */
// Provided compute_mode overrides the default 3M mode.
gemm(my_queue, m, n, k, trans_a, trans_b, alpha, a, lda, b, ldb, beta, c, ldc, compute_mode::standard);
}
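For readers unfamiliar with the 3M technique behind the complex_3m mode, the scalar sketch below shows the general idea of trading one multiplication for extra additions. It uses only the C++ standard library; the function names `multiply_4m` and `multiply_3m` are hypothetical, and this is one common textbook formulation rather than a description of how oneMKL implements the mode internally.

```cpp
#include <complex>

// Standard complex product (a + bi)(c + di): four real multiplications.
inline std::complex<float> multiply_4m(std::complex<float> x, std::complex<float> y) {
    const float a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
    return {a * c - b * d, a * d + b * c};
}

// 3M formulation: three real multiplications and a few more additions.
// The imaginary part is recovered as (a + b)(c + d) - ac - bd = ad + bc.
inline std::complex<float> multiply_3m(std::complex<float> x, std::complex<float> y) {
    const float a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
    const float t1 = a * c;
    const float t2 = b * d;
    const float t3 = (a + b) * (c + d);
    return {t1 - t2, t3 - t1 - t2};
}
```

Both functions agree in exact arithmetic, but in floating point the imaginary part of the 3M result is formed through additional subtractions and can round differently, which is one example of the "different numerical properties" an alternate mode may have. When the operands are matrices rather than scalars, the same identity replaces four real matrix products with three.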
Runtime Mode Settings
The MKL_BLAS_COMPUTE_MODE environment variable allows you to set default application-wide alternate compute mode settings, as a quick method for evaluating alternate compute modes at runtime. For example, with bash or a similar shell:
// my_application.cpp
#include <oneapi/mkl.hpp>
int main() {
/* ... */
// Call to gemm without a compute_mode argument:
oneapi::mkl::blas::gemm(my_queue, /* ... */);
}
export MKL_BLAS_COMPUTE_MODE=FLOAT_TO_BF16X3
my_application # gemm may use bf16x3 arithmetic.
Any per-call or per-source-file mode settings take precedence over the MKL_BLAS_COMPUTE_MODE environment variable. One result of this is that compiling an application with -DMKL_BLAS_COMPUTE_MODE=oneapi::mkl::blas::compute_mode::standard effectively disables the environment variable.
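The precedence order described in this document (per-call argument first, then the per-source-file macro, then the environment variable, and finally the default of no alternate modes) can be summarized in a small sketch. The function `effective_mode` and its string-based interface are hypothetical and exist only to illustrate the ordering; oneMKL resolves the setting internally.

```cpp
#include <string>

// Hypothetical helper: pick the setting that applies to one call.
// A null pointer means "not specified at this level".
inline std::string effective_mode(const char* per_call,
                                  const char* per_source_file,
                                  const char* environment) {
    if (per_call)        return per_call;         // explicit compute_mode argument
    if (per_source_file) return per_source_file;  // MKL_BLAS_COMPUTE_MODE macro
    if (environment)     return environment;      // MKL_BLAS_COMPUTE_MODE variable
    return "standard";                            // default: no alternate modes
}
```

In this model, `effective_mode(nullptr, "standard", "FLOAT_TO_BF16X3")` returns `"standard"`, which corresponds to the case above where compiling with the macro set to compute_mode::standard effectively disables the environment variable.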
Checking Which Mode Is Used
On GPU, when oneMKL’s verbose mode is enabled, information on the compute mode(s) enabled and used for each call is provided in the verbose log. For example, in the following call, float_to_bf16 and float_to_tf32 were both enabled, and float_to_bf16 was selected (some output omitted for clarity):
MKL_VERBOSE oneapi::mkl::blas::column_major::gemm[float](0x7ffd39046350,...,float_to_bf16|float_to_tf32) mode:float_to_bf16 host:nan device:nan GPU0
Verbose mode can be enabled by setting the MKL_VERBOSE environment variable to 1, or via the mkl_verbose API. For more information, see “Using oneMKL Verbose Mode” in the Intel(R) oneAPI Math Kernel Library Developer Guide, available in the Intel Software Documentation Library.
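When evaluating many calls, it can be convenient to pull the selected mode out of each verbose line programmatically. The sketch below assumes the line layout matches the sample shown above, with the selected mode appearing after a `mode:` label and ending at the next space; the exact format may differ between oneMKL versions, and `extract_mode` is a hypothetical helper, not a oneMKL API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return the text following " mode:" up to the next space,
// or an empty string if the label is not present in the line.
inline std::string extract_mode(const std::string& line) {
    const std::string key = " mode:";
    const std::size_t start = line.find(key);
    if (start == std::string::npos) return "";
    const std::size_t begin = start + key.size();
    const std::size_t end = line.find(' ', begin);
    if (end == std::string::npos) return line.substr(begin);
    return line.substr(begin, end - begin);
}
```

Applied to the sample line above, the helper returns `float_to_bf16`, the mode that was actually selected, rather than the list of allowed modes that appears inside the parentheses.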