cblas_gemm_*_compute
Computes a matrix-matrix product with general integer matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.
void cblas_gemm_s8u8s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa, const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
void cblas_gemm_s16s16s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda, const MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);
Include Files: mkl.h
The cblas_gemm_*_compute routine is one of a set of related routines that enable use of an internal packed storage format. After calling cblas_gemm_*_pack, call cblas_gemm_*_compute to compute
C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset,
where:
- op(X) is either op(X) = X or op(X) = X^T
- alpha and beta are scalars
- A, B, and C are matrices:
- op(A) is an m-by-k matrix,
- op(B) is a k-by-n matrix,
- C is an m-by-n matrix.
- A_offset is an m-by-k matrix with every element equal to the value oa.
- B_offset is a k-by-n matrix with every element equal to the value ob.
- C_offset is an m-by-n matrix defined by the oc array as described in the description of the offsetc parameter.
You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls.
For best performance, use the same number of threads for packing and for computing.
If you are packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.
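For orientation, the following is a minimal sketch of the pack-then-compute sequence, not taken from the Intel examples: it assumes column-major storage, no transposition, only the B matrix packed ahead of time, zero oa/ob offsets, and a fixed C offset; the wrapper function name is hypothetical. The companion cblas_gemm_s8u8s32_pack_get_size and cblas_gemm_s8u8s32_pack routines are documented on their own reference pages.

/* Sketch: pack B once, then reuse the packed buffer in a compute call.
 * Error checking is omitted. */
#include <stddef.h>
#include <mkl.h>

void gemm_with_packed_b(MKL_INT m, MKL_INT n, MKL_INT k,
                        const MKL_INT8 *a, const MKL_UINT8 *b,
                        MKL_INT32 *c, const MKL_INT32 *oc)
{
    /* Query the size of the internal packed format for B and allocate it. */
    size_t bp_size = cblas_gemm_s8u8s32_pack_get_size(CblasBMatrix, m, n, k);
    void *bp = mkl_malloc(bp_size, 64);

    /* Pack B (k-by-n, leading dimension k) into the internal format. */
    cblas_gemm_s8u8s32_pack(CblasColMajor, CblasBMatrix, CblasNoTrans,
                            m, n, k, b, k, bp);

    /* C := 1.0*(A + 0)*(B + 0) + 0.0*C + C_offset; transb = CblasPacked tells
     * the routine that bp holds the packed representation, and ldb is ignored. */
    cblas_gemm_s8u8s32_compute(CblasColMajor, (MKL_INT)CblasNoTrans, (MKL_INT)CblasPacked,
                               CblasFixOffset, m, n, k, 1.0f,
                               a, m, 0, bp, k, 0, 0.0f, c, m, oc);

    mkl_free(bp);
}

Packing A instead of (or in addition to) B follows the same pattern with CblasAMatrix and transa = CblasPacked.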
Input Parameters

Layout
    CBLAS_LAYOUT. Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa
    MKL_INT. Specifies the form of op(A) used in the packing: if transa = CblasNoTrans, op(A) = A; if transa = CblasTrans, op(A) = A^T; if transa = CblasPacked, the matrix in array a is packed into a format internal to Intel® oneAPI Math Kernel Library and lda is ignored.

transb
    MKL_INT. Specifies the form of op(B) used in the packing: if transb = CblasNoTrans, op(B) = B; if transb = CblasTrans, op(B) = B^T; if transb = CblasPacked, the matrix in array b is packed into a format internal to Intel® oneAPI Math Kernel Library and ldb is ignored.

offsetc
    CBLAS_OFFSET. Specifies the form of C_offset used in the matrix multiplication: if offsetc = CblasFixOffset, oc has a single element and every element of C_offset is equal to it; if offsetc = CblasColOffset, oc has a size of m and every column of C_offset is equal to the oc array; if offsetc = CblasRowOffset, oc has a size of n and every row of C_offset is equal to the oc array.

m
    MKL_INT. Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n
    MKL_INT. Specifies the number of columns of the matrix op(B) and of the matrix C. The value of n must be at least zero.

k
    MKL_INT. Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha
    float. Specifies the scalar alpha.

a
    void* for cblas_gemm_s8u8s32_compute; MKL_INT16* for cblas_gemm_s16s16s32_compute. Array containing the matrix A, or its internal packed representation when transa = CblasPacked.

lda
    MKL_INT. Specifies the leading dimension of a as declared in the calling (sub)program.

oa
    MKL_INT8 for cblas_gemm_s8u8s32_compute; MKL_INT16 for cblas_gemm_s16s16s32_compute. Specifies the scalar offset value for the matrix A.

b
    void* for cblas_gemm_s8u8s32_compute; MKL_INT16* for cblas_gemm_s16s16s32_compute. Array containing the matrix B, or its internal packed representation when transb = CblasPacked.

ldb
    MKL_INT. Specifies the leading dimension of b as declared in the calling (sub)program.

ob
    MKL_INT8 for cblas_gemm_s8u8s32_compute; MKL_INT16 for cblas_gemm_s16s16s32_compute. Specifies the scalar offset value for the matrix B.

beta
    float. Specifies the scalar beta.

c
    MKL_INT32*. Array containing the matrix C.

ldc
    MKL_INT. Specifies the leading dimension of c as declared in the calling (sub)program.

oc
    MKL_INT32*. Array, size len. Specifies the offset values for the matrix C: if offsetc = CblasFixOffset, len must be at least 1; if offsetc = CblasColOffset, len must be at least max(1, m); if offsetc = CblasRowOffset, len must be at least max(1, n).

Output Parameters

c
    MKL_INT32*. Overwritten by the matrix alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.
See the following examples in the MKL installation directory to understand the use of these routines:
cblas_gemm_s8u8s32_compute: examples\cblas\source\cblas_gemm_s8u8s32_computex.c
cblas_gemm_s16s16s32_compute: examples\cblas\source\cblas_gemm_s16s16s32_computex.c
You can expand the matrix-matrix product in this manner:
(op(A) + A_offset)*(op(B) + B_offset) = op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
The four multiplication terms are computed separately and then summed from left to right. The results of the matrix-matrix product and the C matrix are scaled with the alpha and beta floating-point values, respectively, using double-precision arithmetic. Before the results are stored to the output c array, the floating-point values are rounded to the nearest integers.
In the event of overflow or underflow, the results depend on the architecture. The results are either unsaturated (wrapped) or saturated to maximum or minimum representable integer values for the data type of the output matrix.
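As an illustration only, the per-element arithmetic described above can be modeled in scalar C roughly as follows. This is a hypothetical reference loop (the function names are assumptions), not the routine's implementation; it assumes the s8u8s32 flavor, column-major storage, no transposition, and saturating conversion, which is one of the two architecture-dependent behaviors.

/* Hypothetical per-element reference model for
 * C := alpha*(A + oa)*(B + ob) + beta*C + C_offset (column-major, no transposition).
 * The library expands the product into four terms and sums them left to right;
 * this model forms (a + oa)*(b + ob) directly, which is mathematically equivalent. */
#include <limits.h>
#include <math.h>
#include <mkl.h>

static MKL_INT32 c_offset_at(CBLAS_OFFSET offsetc, const MKL_INT32 *oc,
                             MKL_INT i, MKL_INT j)
{
    if (offsetc == CblasFixOffset) return oc[0];   /* one value for all of C */
    if (offsetc == CblasColOffset) return oc[i];   /* one value per row of C */
    return oc[j];                                  /* CblasRowOffset: one value per column */
}

static void ref_gemm_s8u8s32(MKL_INT m, MKL_INT n, MKL_INT k, float alpha,
                             const MKL_INT8 *a, MKL_INT lda, MKL_INT8 oa,
                             const MKL_UINT8 *b, MKL_INT ldb, MKL_INT8 ob,
                             float beta, MKL_INT32 *c, MKL_INT ldc,
                             CBLAS_OFFSET offsetc, const MKL_INT32 *oc)
{
    for (MKL_INT j = 0; j < n; ++j) {
        for (MKL_INT i = 0; i < m; ++i) {
            double dot = 0.0;
            for (MKL_INT p = 0; p < k; ++p)
                dot += ((double)a[i + p*lda] + oa) * ((double)b[p + j*ldb] + ob);
            /* Scale with alpha and beta in double precision, add C_offset,
             * round to the nearest integer, then saturate (architecture dependent). */
            double r = alpha*dot + beta*(double)c[i + j*ldc]
                       + c_offset_at(offsetc, oc, i, j);
            r = nearbyint(r);
            if (r > INT_MAX) r = INT_MAX;
            if (r < INT_MIN) r = INT_MIN;
            c[i + j*ldc] = (MKL_INT32)r;
        }
    }
}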
When using cblas_gemm_s8u8s32_compute with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
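A hypothetical wrapper illustrating the row-major case (the function name, the no-transposition setup, and the fixed C offset are assumptions for the sketch; the element types are the point):

/* Row-major cblas_gemm_s8u8s32_compute: the a array holds unsigned 8-bit data
 * and the b array holds signed 8-bit data, the reverse of the column-major case. */
#include <mkl.h>

void row_major_call(MKL_INT m, MKL_INT n, MKL_INT k,
                    const MKL_UINT8 *a, const MKL_INT8 *b,
                    MKL_INT32 *c, const MKL_INT32 *oc)
{
    cblas_gemm_s8u8s32_compute(CblasRowMajor,
                               (MKL_INT)CblasNoTrans, (MKL_INT)CblasNoTrans,
                               CblasFixOffset, m, n, k,
                               1.0f, a, k, 0, b, n, 0,
                               0.0f, c, n, oc);
}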