Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 3/22/2024
Public


gemm_*

Computes a matrix-matrix product with general integer matrices.

Syntax

call gemm_s8u8s32(transa, transb, offsetc, m, n, k, alpha, a, lda, oa, b, ldb, ob, beta, c, ldc, oc)

call gemm_s16s16s32(transa, transb, offsetc, m, n, k, alpha, a, lda, oa, b, ldb, ob, beta, c, ldc, oc)

Include Files

  • mkl.fi

Description

The gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product. To get the final result, a vector is added to each row or column of the output matrix. The operation is defined as:

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset

where:

  • op(X) is either op(X) = X or op(X) = X^T,
  • A_offset is an m-by-k matrix with every element equal to the value oa,
  • B_offset is a k-by-n matrix with every element equal to the value ob,
  • C_offset is an m-by-n matrix defined by the oc array as described in the description of the offsetc parameter,
  • alpha and beta are scalars,
  • A is a matrix such that op(A) is m-by-k,
  • B is a matrix such that op(B) is k-by-n,
  • and C is an m-by-n matrix.
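The definition above can be sketched as a plain-loop reference model. This is only an illustration of the formula for the 16-bit variant, not the oneMKL implementation: the module name gemm_int_ref, the routine name ref_gemm_s16s16s32, and the coff argument (C_offset passed as a full m-by-n matrix) are hypothetical, and the sketch assumes that C_offset is added before the single final rounding and that no overflow occurs.

```fortran
module gemm_int_ref
  use iso_fortran_env, only: int16, int32, real32, real64
  implicit none
contains
  ! Hypothetical reference routine (not part of oneMKL). Evaluates
  !   C := alpha*(op(A) + oa)*(op(B) + ob) + beta*C + C_offset
  ! with plain loops. Overflow handling is deliberately left out.
  subroutine ref_gemm_s16s16s32(transa, transb, m, n, k, alpha, a, oa, &
                                b, ob, beta, c, coff)
    character, intent(in)         :: transa, transb
    integer, intent(in)           :: m, n, k
    real(real32), intent(in)      :: alpha, beta
    integer(int16), intent(in)    :: a(:,:), b(:,:)
    integer(int16), intent(in)    :: oa, ob
    integer(int32), intent(inout) :: c(:,:)
    integer(int32), intent(in)    :: coff(:,:)  ! assumed: full m-by-n C_offset
    integer        :: i, j, l
    integer(int32) :: acc, aval, bval
    real(real64)   :: r

    do j = 1, n
      do i = 1, m
        acc = 0
        do l = 1, k
          ! op(A) is m-by-k: read A(i,l) or, when transposed, A(l,i)
          if (transa == 'N' .or. transa == 'n') then
            aval = int(a(i, l), int32)
          else
            aval = int(a(l, i), int32)
          end if
          ! op(B) is k-by-n: read B(l,j) or, when transposed, B(j,l)
          if (transb == 'N' .or. transb == 'n') then
            bval = int(b(l, j), int32)
          else
            bval = int(b(j, l), int32)
          end if
          acc = acc + (aval + int(oa, int32)) * (bval + int(ob, int32))
        end do
        ! Scale in double precision; c is not read when beta is zero
        r = real(alpha, real64) * real(acc, real64)
        if (beta /= 0.0_real32) then
          r = r + real(beta, real64) * real(c(i, j), real64)
        end if
        r = r + real(coff(i, j), real64)
        ! Assumption: nint rounds halfway cases away from zero, which may
        ! differ from the rounding mode used by the library
        c(i, j) = nint(r, int32)
      end do
    end do
  end subroutine ref_gemm_s16s16s32
end module gemm_int_ref
```

A model like this is mainly useful for checking small cases by hand, for example to confirm how the offsets oa and ob enter the product before alpha is applied.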

Input Parameters

transa

CHARACTER*1. Specifies the form of op(A) used in the matrix multiplication:

if transa = 'N' or 'n', then op(A) = A;

if transa = 'T' or 't', then op(A) = A^T.

transb

CHARACTER*1. Specifies the form of op(B) used in the matrix multiplication:

if transb = 'N' or 'n', then op(B) = B;

if transb = 'T' or 't', then op(B) = B^T.

offsetc

CHARACTER*1. Specifies the form of C_offset used in the matrix multiplication.

  • offsetc = 'F' or 'f': oc has a single element and every element of C_offset is equal to this element.
  • offsetc = 'C' or 'c': oc has a size of m and every column of C_offset is equal to oc.
  • offsetc = 'R' or 'r': oc has a size of n and every row of C_offset is equal to oc.
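The three offsetc modes can be pictured by expanding oc into the full C_offset matrix. The following is a minimal sketch; build_c_offset and its module are hypothetical names, and the INTEGER*4 element type for oc is an assumption made here to match the type of c.

```fortran
module c_offset_sketch
  use iso_fortran_env, only: int32
  implicit none
contains
  ! Hypothetical helper: expands oc into the m-by-n matrix C_offset.
  ! Assumption: oc holds 32-bit integers, like the c array.
  function build_c_offset(offsetc, m, n, oc) result(coff)
    character, intent(in)      :: offsetc
    integer, intent(in)        :: m, n
    integer(int32), intent(in) :: oc(:)
    integer(int32)             :: coff(m, n)
    integer :: i, j

    select case (offsetc)
    case ('F', 'f')            ! one value used for every element
      coff = oc(1)
    case ('C', 'c')            ! every column of C_offset equals oc(1:m)
      do j = 1, n
        coff(:, j) = oc(1:m)
      end do
    case ('R', 'r')            ! every row of C_offset equals oc(1:n)
      do i = 1, m
        coff(i, :) = oc(1:n)
      end do
    case default
      error stop 'offsetc must be F, C, or R'
    end select
  end function build_c_offset
end module c_offset_sketch
```

With m = 2 and n = 3, offsetc = 'C' and oc = (10, 20) give a matrix whose first row is all 10 and whose second row is all 20, while offsetc = 'R' and oc = (1, 2, 3) give two identical rows (1, 2, 3).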

m

INTEGER. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

INTEGER. Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

SINGLE PRECISION. Specifies the scalar alpha.

a

INTEGER*1 for gemm_s8u8s32.

INTEGER*2 for gemm_s16s16s32.

Array, size lda by ka, where ka is k when transa = 'N' or 'n', and is m otherwise. Before entry with transa = 'N' or 'n', the leading m-by-k part of the array a must contain the matrix A, otherwise the leading k-by-m part of the array a must contain the matrix A.

lda

INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program.

When transa = 'N' or 'n', then lda must be at least max(1, m), otherwise lda must be at least max(1, k).

oa

INTEGER*1 for gemm_s8u8s32.

INTEGER*2 for gemm_s16s16s32.

Specifies the scalar offset value for matrix A.

b

INTEGER*1 for gemm_s8u8s32.

INTEGER*2 for gemm_s16s16s32.

Array, size ldb by kb, where kb is n when transb = 'N' or 'n', and is k otherwise. Before entry with transb = 'N' or 'n', the leading k-by-n part of the array b must contain the matrix B, otherwise the leading n-by-k part of the array b must contain the matrix B.

ldb

INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program.

When transb = 'N' or 'n', then ldb must be at least max(1, k), otherwise ldb must be at least max(1, n).

ob

INTEGER*1 for gemm_s8u8s32.

INTEGER*2 for gemm_s16s16s32.

Specifies the scalar offset value for matrix B.

beta

SINGLE PRECISION. Specifies the scalar beta. When beta is equal to zero, then c need not be set on input.

c

INTEGER*4

Array, size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc

INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program.

The value of ldc must be at least max(1, m).

oc

Array, size len. Specifies the offset values for matrix C.

  • If offsetc = 'F' or 'f': len must be at least 1.
  • If offsetc = 'C' or 'c': len must be at least max(1, m).
  • If offsetc = 'R' or 'r': len must be at least max(1, n).
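The size constraints stated for m, n, k, lda, ldb, ldc, and len can be collected into one pre-call check. The helper below is a hypothetical sketch, not a oneMKL routine; gemm_dims_ok and lenoc (the number of elements in oc) are names introduced here for illustration.

```fortran
module gemm_arg_check
  implicit none
contains
  ! Hypothetical helper: returns .true. when the dimension arguments meet
  ! the minimums listed in the parameter descriptions above.
  function gemm_dims_ok(transa, transb, offsetc, m, n, k, &
                        lda, ldb, ldc, lenoc) result(ok)
    character, intent(in) :: transa, transb, offsetc
    integer, intent(in)   :: m, n, k, lda, ldb, ldc, lenoc
    logical :: ok
    integer :: min_lda, min_ldb, min_len

    ok = (m >= 0) .and. (n >= 0) .and. (k >= 0)

    ! lda: at least max(1, m) for 'N', otherwise at least max(1, k)
    if (transa == 'N' .or. transa == 'n') then
      min_lda = max(1, m)
    else
      min_lda = max(1, k)
    end if

    ! ldb: at least max(1, k) for 'N', otherwise at least max(1, n)
    if (transb == 'N' .or. transb == 'n') then
      min_ldb = max(1, k)
    else
      min_ldb = max(1, n)
    end if

    ! len: depends on the offsetc mode
    select case (offsetc)
    case ('F', 'f')
      min_len = 1
    case ('C', 'c')
      min_len = max(1, m)
    case ('R', 'r')
      min_len = max(1, n)
    case default
      min_len = 1
      ok = .false.
    end select

    ok = ok .and. (lda >= min_lda) .and. (ldb >= min_ldb) &
            .and. (ldc >= max(1, m)) .and. (lenoc >= min_len)
  end function gemm_dims_ok
end module gemm_arg_check
```

For example, with m = 3, n = 4, k = 5, transa = 'N', transb = 'T', and offsetc = 'R', the check passes for lda = 3, ldb = 4, ldc = 3, and lenoc = 4, and fails if ldb is reduced to 3.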

Output Parameters

c

Overwritten by alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.

Application Notes

The matrix-matrix product can be expanded:

(op(A) + A_offset)*(op(B) + B_offset)

= op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset

After these four multiplication terms are computed separately, they are summed from left to right. The results from the matrix-matrix product and the C matrix are scaled with the floating-point values alpha and beta, respectively, using double-precision arithmetic. Before the results are stored to the output c array, the floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results depend on the architecture: they are either unsaturated (wrapped) or saturated to the maximum or minimum representable integer values for the data type of the output matrix.
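The four-term expansion can be checked numerically on small matrices. The sketch below assumes no transposes and inputs already widened to 32-bit integers; direct_product and expanded_product are hypothetical names. Because A_offset and B_offset are constant matrices, op(A)*B_offset reduces to ob times a row sum of A, A_offset*op(B) to oa times a column sum of B, and A_offset*B_offset to k*oa*ob.

```fortran
module gemm_expand_sketch
  use iso_fortran_env, only: int32
  implicit none
contains
  ! Direct evaluation of (A + A_offset)*(B + B_offset), no transposes.
  function direct_product(a, b, oa, ob) result(p)
    integer(int32), intent(in) :: a(:,:), b(:,:), oa, ob
    integer(int32) :: p(size(a, 1), size(b, 2))
    p = matmul(a + oa, b + ob)
  end function direct_product

  ! The same product built from the four expanded terms, summed left to right.
  function expanded_product(a, b, oa, ob) result(p)
    integer(int32), intent(in) :: a(:,:), b(:,:), oa, ob
    integer(int32) :: p(size(a, 1), size(b, 2))
    integer :: i, j, k

    k = size(a, 2)
    p = matmul(a, b)                               ! op(A)*op(B)
    do j = 1, size(b, 2)
      do i = 1, size(a, 1)
        p(i, j) = p(i, j) + ob * sum(a(i, :))      ! op(A)*B_offset
        p(i, j) = p(i, j) + oa * sum(b(:, j))      ! A_offset*op(B)
        p(i, j) = p(i, j) + k * oa * ob            ! A_offset*B_offset
      end do
    end do
  end function expanded_product
end module gemm_expand_sketch
```

For A with rows (1, 2) and (3, 4), B with rows (5, 6) and (7, 8), oa = 1, and ob = -1, both functions return the matrix with rows (26, 31) and (46, 55).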

When using cblas_gemm_s8u8s32 with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.

Intermediate integer computations in gemm_s8u8s32 on 64-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures without Vector Neural Network Instructions (VNNI) extensions can saturate because only 16 bits are available for the accumulation of intermediate results. You can avoid integer saturation by keeping all integer elements of either the A or the B matrix representable in fewer than 8 bits.
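A little arithmetic illustrates why limiting one operand to fewer than 8 bits avoids the problem. The sketch below assumes, purely for illustration, that two signed-by-unsigned 8-bit products are summed into one saturating signed 16-bit value; this reference does not describe the actual accumulation scheme, and sat16 and pair_acc16 are hypothetical names.

```fortran
module sat16_sketch
  use iso_fortran_env, only: int32
  implicit none
contains
  ! Clamp a 32-bit value to the signed 16-bit range [-32768, 32767].
  pure function sat16(x) result(y)
    integer(int32), intent(in) :: x
    integer(int32) :: y
    y = max(-32768_int32, min(32767_int32, x))
  end function sat16

  ! Assumed model (not taken from the reference): two products of a signed
  ! 8-bit value (a1, a2 in -128..127) and an unsigned 8-bit value
  ! (b1, b2 in 0..255) are added into one saturating 16-bit accumulator.
  pure function pair_acc16(a1, b1, a2, b2) result(y)
    integer(int32), intent(in) :: a1, b1, a2, b2
    integer(int32) :: y
    y = sat16(a1 * b1 + a2 * b2)
  end function pair_acc16
end module sat16_sketch
```

Under this model, pair_acc16(127, 255, 127, 255) returns 32767 because the exact sum 64770 does not fit in 16 bits. If the unsigned operand is limited to 7 bits (0..127), the extremes are 2*127*127 = 32258 and 2*(-128)*127 = -32512, both of which fit without saturation.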