gemm_*
Computes a matrix-matrix product with general integer matrices.
Syntax
call gemm_s8u8s32(transa, transb, offsetc, m, n, k, alpha, a, lda, oa, b, ldb, ob, beta, c, ldc, oc)
call gemm_s16s16s32(transa, transb, offsetc, m, n, k, alpha, a, lda, oa, b, ldb, ob, beta, c, ldc, oc)
Include Files
- mkl.fi
Description
The gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product. To get the final result, a vector is added to each row or column of the output matrix. The operation is defined as:
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset
where:
- op(X) is either op(X) = X or op(X) = X^T,
- A_offset is an m-by-k matrix with every element equal to the value oa,
- B_offset is a k-by-n matrix with every element equal to the value ob,
- C_offset is an m-by-n matrix defined by the oc array as described in the description of the offsetc parameter,
- alpha and beta are scalars,
- A is a matrix such that op(A) is m-by-k,
- B is a matrix such that op(B) is k-by-n,
- and C is an m-by-n matrix.
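The definition above can be modelled in a few lines of plain Python. This is only a reference sketch for checking small cases by hand, not the library's implementation: the names `op`, `c_offset`, and `gemm_reference` are hypothetical, matrices are passed as nested lists in their logical shape (no leading dimensions), and Python's `round()` stands in for the rounding step, so its tie-breaking rule is an assumption.

```python
def op(x, trans):
    """Return x for 'N'/'n' or its transpose for 'T'/'t' (x is a list of rows)."""
    if trans in ("N", "n"):
        return [list(row) for row in x]
    if trans in ("T", "t"):
        return [list(col) for col in zip(*x)]
    raise ValueError("trans must be 'N', 'n', 'T' or 't'")


def c_offset(offsetc, m, n, oc):
    """Expand oc into the m-by-n matrix C_offset selected by offsetc."""
    if offsetc in ("F", "f"):   # one value used everywhere
        return [[oc[0]] * n for _ in range(m)]
    if offsetc in ("C", "c"):   # every column equals oc (length m)
        return [[oc[i]] * n for i in range(m)]
    if offsetc in ("R", "r"):   # every row equals oc (length n)
        return [[oc[j] for j in range(n)] for _ in range(m)]
    raise ValueError("offsetc must be 'F', 'C' or 'R'")


def gemm_reference(transa, transb, offsetc, m, n, k,
                   alpha, a, oa, b, ob, beta, c, oc):
    """Return alpha*(op(A)+oa)*(op(B)+ob) + beta*C + C_offset as nested lists."""
    opa, opb = op(a, transa), op(b, transb)
    off = c_offset(offsetc, m, n, oc)
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum((opa[i][p] + oa) * (opb[p][j] + ob) for p in range(k))
            # c is not read when beta is zero, mirroring the documented rule.
            scaled_c = beta * c[i][j] if beta != 0 else 0.0
            row.append(round(alpha * acc + scaled_c + off[i][j]))
        result.append(row)
    return result


example = gemm_reference("N", "N", "R", 2, 2, 2,
                         1.0, [[1, 2], [3, 4]], 1,
                         [[5, 6], [7, 8]], -1,
                         2.0, [[1, 1], [1, 1]], [10, 20])
print(example)  # [[38, 53], [58, 77]]
```

In the example, oa = 1 and ob = -1 shift the inputs to [[2, 3], [4, 5]] and [[4, 5], [6, 7]] before the product is taken, and the row offset [10, 20] is added to every row of the result.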
Input Parameters
- transa
-
CHARACTER*1. Specifies the form of op(A) used in the matrix multiplication:
if transa = 'N' or 'n', then op(A) = A;
if transa = 'T' or 't', then op(A) = A^T.
- transb
-
CHARACTER*1. Specifies the form of op(B) used in the matrix multiplication:
if transb = 'N' or 'n', then op(B) = B;
if transb = 'T' or 't', then op(B) = B^T.
- offsetc
-
CHARACTER*1. Specifies the form of C_offset used in the matrix multiplication.
- offsetc = 'F' or 'f': oc has a single element and every element of C_offset is equal to this element.
- offsetc = 'C' or 'c': oc has a size of m and every column of C_offset is equal to oc.
- offsetc = 'R' or 'r': oc has a size of n and every row of C_offset is equal to oc.
- m
-
INTEGER. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.
- n
-
INTEGER. Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.
- k
-
INTEGER. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.
- alpha
-
SINGLE PRECISION. Specifies the scalar alpha.
- a
-
INTEGER*1 for gemm_s8u8s32.
INTEGER*2 for gemm_s16s16s32.
Array, size lda by ka, where ka is k when transa = 'N' or 'n', and is m otherwise. Before entry with transa = 'N' or 'n', the leading m-by-k part of the array a must contain the matrix A, otherwise the leading k-by-m part of the array a must contain the matrix A.
- lda
-
INTEGER. Specifies the leading dimension of a as declared in the calling (sub)program.
When transa = 'N' or 'n', then lda must be at least max(1, m), otherwise lda must be at least max(1, k).
- oa
-
INTEGER*1 for gemm_s8u8s32.
INTEGER*2 for gemm_s16s16s32.
Specifies the scalar offset value for matrix A.
- b
-
INTEGER*1 for gemm_s8u8s32.
INTEGER*2 for gemm_s16s16s32.
Array, size ldb by kb, where kb is n when transb = 'N' or 'n', and is k otherwise. Before entry with transb = 'N' or 'n', the leading k-by-n part of the array b must contain the matrix B, otherwise the leading n-by-k part of the array b must contain the matrix B.
- ldb
-
INTEGER. Specifies the leading dimension of b as declared in the calling (sub)program.
When transb = 'N' or 'n', then ldb must be at least max(1, k), otherwise ldb must be at least max(1, n).
- ob
-
INTEGER*1 for gemm_s8u8s32.
INTEGER*2 for gemm_s16s16s32.
Specifies the scalar offset value for matrix B.
- beta
-
SINGLE PRECISION. Specifies the scalar beta. When beta is equal to zero, then c need not be set on input.
- c
-
INTEGER*4
Array, size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.
- ldc
-
INTEGER. Specifies the leading dimension of c as declared in the calling (sub)program.
The value of ldc must be at least max(1, m).
- oc
-
INTEGER*4
Array, size len. Specifies the offset values for matrix C.
- If offsetc = 'F' or 'f': len must be at least 1.
- If offsetc = 'C' or 'c': len must be at least max(1, m).
- If offsetc = 'R' or 'r': len must be at least max(1, n).
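The size constraints listed for m, n, k, lda, ldb, ldc, and the oc array can be collected into one pre-call check. The sketch below is a hypothetical helper (`check_gemm_args` is not part of the library) that encodes only the minimum values stated in the parameter descriptions above; it says nothing about how the routine itself reports invalid arguments.

```python
def check_gemm_args(transa, transb, offsetc, m, n, k, lda, ldb, ldc, oc_len):
    """Return a list of violated constraints; an empty list means all minima hold."""
    errors = []
    if m < 0 or n < 0 or k < 0:
        errors.append("m, n and k must be at least zero")

    # a is lda-by-k when op(A) = A, and lda-by-m when op(A) = A^T.
    min_lda = max(1, m) if transa in ("N", "n") else max(1, k)
    if lda < min_lda:
        errors.append(f"lda must be at least {min_lda}")

    # b is ldb-by-n when op(B) = B, and ldb-by-k when op(B) = B^T.
    min_ldb = max(1, k) if transb in ("N", "n") else max(1, n)
    if ldb < min_ldb:
        errors.append(f"ldb must be at least {min_ldb}")

    if ldc < max(1, m):
        errors.append(f"ldc must be at least {max(1, m)}")

    # Minimum length of oc depends on the offsetc mode.
    if offsetc in ("F", "f"):
        min_len = 1
    elif offsetc in ("C", "c"):
        min_len = max(1, m)
    elif offsetc in ("R", "r"):
        min_len = max(1, n)
    else:
        errors.append("offsetc must be 'F', 'C' or 'R'")
        min_len = 0
    if oc_len < min_len:
        errors.append(f"oc must hold at least {min_len} elements")
    return errors


print(check_gemm_args("N", "T", "R", 3, 4, 5, lda=3, ldb=4, ldc=3, oc_len=4))  # []
print(check_gemm_args("T", "N", "C", 3, 4, 5, lda=3, ldb=5, ldc=3, oc_len=3))
# ['lda must be at least 5']
```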
Output Parameters
- c
-
Overwritten by alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.
Application Notes
The matrix-matrix product can be expanded:
(op(A) + A_offset)*(op(B) + B_offset)
= op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
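Because A_offset and B_offset are constant matrices, three of the four terms reduce to row sums, column sums, and a single constant: element (i, j) of op(A)*B_offset is ob times the sum of row i of op(A), element (i, j) of A_offset*op(B) is oa times the sum of column j of op(B), and every element of A_offset*B_offset equals k*oa*ob. The sketch below checks this identity on a small case; the function names are hypothetical, and nothing here claims that the library evaluates the terms this way.

```python
def matmul(x, y):
    """Plain integer matrix product of two nested lists."""
    return [[sum(row[p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for row in x]


def direct_product(opa, opb, oa, ob):
    """(op(A) + A_offset) * (op(B) + B_offset), computed literally."""
    shifted_a = [[v + oa for v in row] for row in opa]
    shifted_b = [[v + ob for v in row] for row in opb]
    return matmul(shifted_a, shifted_b)


def expanded_product(opa, opb, oa, ob):
    """The same product as the sum of the four expanded terms."""
    m, k, n = len(opa), len(opb), len(opb[0])
    base = matmul(opa, opb)                               # op(A)*op(B)
    row_sums = [sum(row) for row in opa]                  # for op(A)*B_offset
    col_sums = [sum(opb[p][j] for p in range(k)) for j in range(n)]  # for A_offset*op(B)
    const = k * oa * ob                                   # A_offset*B_offset
    return [[base[i][j] + ob * row_sums[i] + oa * col_sums[j] + const
             for j in range(n)] for i in range(m)]


opa = [[1, 2, 3], [4, 5, 6]]
opb = [[7, 8], [9, 10], [11, 12]]
print(direct_product(opa, opb, -2, 3))    # [[4, 4], [112, 121]]
print(expanded_product(opa, opb, -2, 3))  # [[4, 4], [112, 121]]
```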
After computing these four multiplication terms separately, they are summed from left to right. The results from the matrix-matrix product and the C matrix are scaled by the floating-point values alpha and beta, respectively, using double-precision arithmetic. Before storing the results to the output c array, the floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results depend on the architecture. The results are either unsaturated (wrapped) or saturated to the maximum or minimum representable integer values for the data type of the output matrix.
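The difference between the two overflow behaviours can be illustrated for the 32-bit output type. The following is a minimal sketch with hypothetical helper names; Python floats are double precision, which matches the scaling step, but `round()` resolves exact ties to the even integer, and whether the library does the same is an assumption.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def scale_and_round(alpha, product, beta, c_value):
    """Scale in double precision, then round to an integer (ties to even here)."""
    return round(alpha * product + beta * c_value)


def saturate_int32(value):
    """Clamp to the representable range of a signed 32-bit integer."""
    return max(INT32_MIN, min(INT32_MAX, value))


def wrap_int32(value):
    """Reduce modulo 2**32 into the signed 32-bit range (two's-complement wrap)."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


too_large = scale_and_round(2.0, 2**30, 0.0, 0)   # 2**31, one past INT32_MAX
print(saturate_int32(too_large))  # 2147483647
print(wrap_int32(too_large))      # -2147483648
```

Values that already fit in 32 bits pass through both helpers unchanged, so the two behaviours differ only once the rounded result leaves the representable range.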
When using cblas_gemm_s8u8s32 with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
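One plausible way to read this requirement, which the text above does not spell out, is through the usual layout identity: a row-major buffer holding an m-by-n matrix is the same memory as a column-major buffer holding its n-by-m transpose, so a row-major product C = A*B corresponds to the column-major product C^T = B^T*A^T, in which the two operands change places. The sketch below demonstrates only that identity with plain lists; the helper names are hypothetical, and the link to the data-type swap is an interpretation, not a documented fact.

```python
def read_row_major(buf, rows, cols):
    """Interpret a flat buffer as a rows-by-cols matrix stored row by row."""
    return [[buf[i * cols + j] for j in range(cols)] for i in range(rows)]


def read_col_major(buf, rows, cols):
    """Interpret a flat buffer as a rows-by-cols matrix stored column by column."""
    return [[buf[j * rows + i] for j in range(cols)] for i in range(rows)]


def matmul(x, y):
    """Plain integer matrix product of two nested lists."""
    return [[sum(row[p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for row in x]


a_buf = [1, 2, 3, 4, 5, 6]    # A is 2-by-3 in row-major order
b_buf = [1, 0, 2, -1, 3, 1]   # B is 3-by-2 in row-major order

c_row_major = matmul(read_row_major(a_buf, 2, 3), read_row_major(b_buf, 3, 2))

# The same buffers read as column-major hold A^T (3-by-2) and B^T (2-by-3).
a_t = read_col_major(a_buf, 3, 2)
b_t = read_col_major(b_buf, 2, 3)
c_transposed = matmul(b_t, a_t)   # B^T * A^T = (A*B)^T, operands swapped

print(c_row_major)   # [[14, 1], [32, 1]]
print(c_transposed)  # [[14, 32], [1, 1]]
```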
Intermediate integer computations in gemm_s8u8s32 on 64-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures without Vector Neural Network Instructions (VNNI) extensions can saturate. This is because only 16 bits are available for the accumulation of intermediate results. You can avoid integer saturation by restricting all elements of either the A matrix or the B matrix to values that fit in fewer than 8 bits.
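The effect of the 16-bit limit can be illustrated with a small model. The sketch assumes that two adjacent signed-by-unsigned 8-bit products are added in a saturating signed 16-bit lane, which is how the AVX2 `vpmaddubsw` instruction behaves; the text above states only that 16 bits are available, so the exact accumulation scheme used by the library is an assumption, and the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def pair_accumulate(s0, u0, s1, u1):
    """Assumed model: sum of two signed*unsigned products, saturated to int16."""
    total = s0 * u0 + s1 * u1
    return max(INT16_MIN, min(INT16_MAX, total))


def can_saturate(signed_values, unsigned_values):
    """True if some pair sum of products falls outside the int16 range.

    The product is bilinear, so its extremes occur at the range endpoints,
    and the worst pair sum is twice the extreme product.
    """
    corners = [s * u
               for s in (min(signed_values), max(signed_values))
               for u in (min(unsigned_values), max(unsigned_values))]
    return 2 * max(corners) > INT16_MAX or 2 * min(corners) < INT16_MIN


full_s8, full_u8 = range(-128, 128), range(0, 256)
seven_bit_signed, seven_bit_unsigned = range(-64, 64), range(0, 128)

print(pair_accumulate(127, 255, 127, 255))        # 32767 (exact sum is 64770)
print(can_saturate(full_s8, full_u8))             # True
print(can_saturate(seven_bit_signed, full_u8))    # False
print(can_saturate(full_s8, seven_bit_unsigned))  # False
```

Under this model, limiting either operand to 7 bits keeps every pair sum inside the 16-bit range, which is consistent with the advice above.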