Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/22/2024
Public


cblas_gemm_*

Computes a matrix-matrix product with general integer matrices.

Syntax

void cblas_gemm_s8u8s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa, const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);

void cblas_gemm_s16s16s32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda, const MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);

Include Files

  • mkl.h

Description

The cblas_gemm_* routines compute a scalar-matrix-matrix product and add the result to a scalar-matrix product. To get the final result, an offset C_offset is also added to the output matrix: either a single value added to every element, or a vector added to each row or column. The operation is defined as:

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset

where:

  • op(X) is either op(X) = X or op(X) = X^T,
  • A_offset is an m-by-k matrix with every element equal to the value oa,
  • B_offset is a k-by-n matrix with every element equal to the value ob,
  • C_offset is an m-by-n matrix defined by the oc array, as specified by the offsetc parameter,
  • alpha and beta are scalars,
  • A is a matrix such that op(A) is m-by-k,
  • B is a matrix such that op(B) is k-by-n,
  • and C is an m-by-n matrix.
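A minimal, unoptimized reference sketch of this definition follows. It is not oneMKL code and rests on several assumptions: the name ref_gemm_s8u8s32_colmajor is hypothetical; only the column-major layout with op(A) = A, op(B) = B, and a single C offset value is handled; C99 fixed-width types stand in for MKL_INT8 and MKL_INT32; and the double-precision scaling, rounding, and saturation follow one reading of the Application Notes (tie handling, saturation, and the point at which the offset is added are choices made here, not documented behavior).

```c
#include <stdint.h>

/* Round to nearest and saturate to the int32 range. Ties are rounded away
 * from zero here; that tie rule and the saturation are assumptions. */
static int32_t round_sat_i32(double r)
{
    if (r > 2147483647.0) r = 2147483647.0;
    if (r < -2147483648.0) r = -2147483648.0;
    /* Adding +/-0.5 and truncating toward zero rounds to nearest. */
    return (int32_t)(int64_t)(r < 0.0 ? r - 0.5 : r + 0.5);
}

/* Hypothetical reference (not oneMKL code) for the column-major case with
 * op(A) = A, op(B) = B and a single C offset value oc[0]:
 *   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
 * A is m-by-k (signed 8-bit), B is k-by-n (unsigned 8-bit), C is m-by-n. */
static void ref_gemm_s8u8s32_colmajor(int m, int n, int k, float alpha,
                                      const int8_t *a, int lda, int8_t oa,
                                      const uint8_t *b, int ldb, int8_t ob,
                                      float beta, int32_t *c, int ldc,
                                      const int32_t *oc)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* Exact integer dot product of row i of (A + A_offset) and
             * column j of (B + B_offset). */
            int64_t acc = 0;
            for (int p = 0; p < k; ++p) {
                int64_t av = (int64_t)a[i + (int64_t)p * lda] + oa;
                int64_t bv = (int64_t)b[p + (int64_t)j * ldb] + ob;
                acc += av * bv;
            }
            /* Scale in double precision; c is not read when beta == 0. */
            double r = (double)alpha * (double)acc;
            if (beta != 0.0f)
                r += (double)beta * (double)c[i + (int64_t)j * ldc];
            /* The offset is added before rounding (an assumption). */
            c[i + (int64_t)j * ldc] = round_sat_i32(r + (double)oc[0]);
        }
    }
}
```

For example, with A = [1 2; 3 4], B = [5 6; 7 8], oa = 1, ob = -1, alpha = 2, beta = 1, every element of C initially 1, and oc[0] = 10, the sketch stores 63, 103, 73, 121 in column-major order.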

Input Parameters

Layout

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa

Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A^T.

transb

Specifies the form of op(B) used in the matrix multiplication:

if transb=CblasNoTrans, then op(B) = B;

if transb=CblasTrans, then op(B) = B^T.

offsetc

Specifies the form of C_offset used in the matrix multiplication.

  • offsetc = CblasFixOffset: oc has a single element and every element of C_offset is equal to this element.
  • offsetc = CblasColOffset: oc has a size of m and every column of C_offset is equal to oc.
  • offsetc = CblasRowOffset: oc has a size of n and every row of C_offset is equal to oc.
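The three offsetc cases amount to a lookup of C_offset(i, j) in oc. The hedged sketch below illustrates this with 0-based indices; the enum offset_kind and its values are hypothetical stand-ins for CBLAS_OFFSET, and oc_min_len restates the length requirements listed for the oc parameter.

```c
#include <stdint.h>

/* Hypothetical stand-ins for CblasFixOffset, CblasColOffset, CblasRowOffset. */
typedef enum { FIX_OFFSET, COL_OFFSET, ROW_OFFSET } offset_kind;

/* Value of C_offset(i, j) for 0 <= i < m, 0 <= j < n. */
static int32_t c_offset_at(offset_kind kind, const int32_t *oc, int i, int j)
{
    switch (kind) {
    case COL_OFFSET: return oc[i]; /* every column equals oc (length m) */
    case ROW_OFFSET: return oc[j]; /* every row equals oc (length n) */
    default:         return oc[0]; /* one value for every element */
    }
}

/* Minimum number of elements oc must hold for an m-by-n C. */
static int oc_min_len(offset_kind kind, int m, int n)
{
    int len = (kind == COL_OFFSET) ? m : (kind == ROW_OFFSET) ? n : 1;
    return len > 1 ? len : 1;
}
```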

m

Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

Specifies the scalar alpha.

a

Array containing the matrix A. Its size and contents depend on Layout and transa:

  • Layout = CblasColMajor, transa = CblasNoTrans: array of size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.
  • Layout = CblasColMajor, transa = CblasTrans: array of size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  • Layout = CblasRowMajor, transa = CblasNoTrans: array of size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
  • Layout = CblasRowMajor, transa = CblasTrans: array of size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.

For cblas_gemm_s8u8s32, the elements of A are 8-bit signed integers when Layout = CblasColMajor and 8-bit unsigned integers when Layout = CblasRowMajor. For cblas_gemm_s16s16s32, they are 16-bit signed integers in both layouts.
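To make the four storage cases concrete, the sketch below computes where element (i, p) of op(A) lives in the array a. The function op_a_index and its boolean parameters are hypothetical; it assumes 0-based indices and the usual column-major (element (r, c) at r + c*ld) and row-major (element (r, c) at r*ld + c) conventions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Offset of op(A)(i, p), 0 <= i < m, 0 <= p < k, within the array a.
 * row_major mirrors Layout == CblasRowMajor; trans mirrors transa == CblasTrans. */
static size_t op_a_index(bool row_major, bool trans, size_t lda,
                         size_t i, size_t p)
{
    if (!row_major && !trans) return i + p * lda;  /* A is m-by-k, column-major */
    if (!row_major && trans)  return p + i * lda;  /* A is k-by-m, column-major */
    if (row_major && !trans)  return i * lda + p;  /* A is m-by-k, row-major */
    return p * lda + i;                            /* A is k-by-m, row-major */
}
```

The same mapping, called with ldb and the indices (p, j) of op(B), locates elements of b.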

lda

Specifies the leading dimension of a as declared in the calling (sub)program.

  • Layout = CblasColMajor, transa = CblasNoTrans: lda must be at least max(1, m).
  • Layout = CblasColMajor, transa = CblasTrans: lda must be at least max(1, k).
  • Layout = CblasRowMajor, transa = CblasNoTrans: lda must be at least max(1, k).
  • Layout = CblasRowMajor, transa = CblasTrans: lda must be at least max(1, m).
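The lda rules follow one pattern: the leading dimension must cover the number of rows of the stored array in column-major layout, or its number of columns in row-major layout. A minimal sketch of that check follows; the helper names min_leading_dim and lda_is_valid are hypothetical.

```c
#include <stdbool.h>

/* Smallest valid leading dimension for a stored matrix with `rows` rows and
 * `cols` columns (the matrix as stored, before op() is applied). */
static long min_leading_dim(bool row_major, long rows, long cols)
{
    long d = row_major ? cols : rows;
    return d > 1 ? d : 1;
}

/* Checks lda for the array a, where op(A) is m-by-k. */
static bool lda_is_valid(bool row_major, bool trans, long m, long k, long lda)
{
    long rows = trans ? k : m;  /* A itself is k-by-m when transposed */
    long cols = trans ? m : k;
    return lda >= min_leading_dim(row_major, rows, cols);
}
```

The same min_leading_dim call reproduces the ldb bounds (the stored B is k-by-n, or n-by-k when transb = CblasTrans) and the ldc bounds (C is m-by-n).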

oa

Specifies the scalar offset value for matrix A.

b

Array containing the matrix B. Its size and contents depend on Layout and transb:

  • Layout = CblasColMajor, transb = CblasNoTrans: array of size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.
  • Layout = CblasColMajor, transb = CblasTrans: array of size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  • Layout = CblasRowMajor, transb = CblasNoTrans: array of size ldb*k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
  • Layout = CblasRowMajor, transb = CblasTrans: array of size ldb*n. Before entry, the leading k-by-n part of the array b must contain the matrix B.

For cblas_gemm_s8u8s32, the elements of B are 8-bit unsigned integers when Layout = CblasColMajor and 8-bit signed integers when Layout = CblasRowMajor. For cblas_gemm_s16s16s32, they are 16-bit signed integers in both layouts.

ldb

Specifies the leading dimension of b as declared in the calling (sub)program.

  • Layout = CblasColMajor, transb = CblasNoTrans: ldb must be at least max(1, k).
  • Layout = CblasColMajor, transb = CblasTrans: ldb must be at least max(1, n).
  • Layout = CblasRowMajor, transb = CblasNoTrans: ldb must be at least max(1, n).
  • Layout = CblasRowMajor, transb = CblasTrans: ldb must be at least max(1, k).

ob

Specifies the scalar offset value for matrix B.

beta

Specifies the scalar beta. When beta is equal to zero, then c need not be set on input.

c

Array containing the matrix C:

  • Layout = CblasColMajor: array of size ldc*n. Before entry, the leading m-by-n part of the array c must contain the matrix C.
  • Layout = CblasRowMajor: array of size ldc*m. Before entry, the leading n-by-m part of the array c must contain the matrix C.

When beta is equal to zero, c need not be set on entry.

ldc

Specifies the leading dimension of c as declared in the calling (sub)program.

  • Layout = CblasColMajor: ldc must be at least max(1, m).
  • Layout = CblasRowMajor: ldc must be at least max(1, n).

oc

Array, size len. Specifies the offset values for matrix C.

  • If offsetc = CblasFixOffset: len must be at least 1.
  • If offsetc = CblasColOffset: len must be at least max(1, m).
  • If offsetc = CblasRowOffset: len must be at least max(1, n).

Output Parameters

c

Overwritten by alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.

Example

For examples of routine usage, see the code in the following files in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory:

  • cblas_gemm_s8u8s32: examples\cblas\source\cblas_gemm_s8u8s32x.c

  • cblas_gemm_s16s16s32: examples\cblas\source\cblas_gemm_s16s16s32x.c

Application Notes

The matrix-matrix product can be expanded:

(op(A) + A_offset)*(op(B) + B_offset)

= op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset
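Because every element of B_offset equals ob and every element of A_offset equals oa, the last three terms reduce to ob times a row sum of op(A), oa times a column sum of op(B), and k*oa*ob. The sketch below evaluates the expansion that way with exact 64-bit integers; offset_product_expanded is a hypothetical helper, handles only column-major storage without transposition, and omits the alpha/beta scaling.

```c
#include <stdint.h>

/* out(i,j) = sum_p (A(i,p) + oa) * (B(p,j) + ob), computed through the four
 * expanded terms. A is m-by-k, B is k-by-n, out is m-by-n; all column-major
 * with leading dimensions m, k and m. Illustrative helper, not oneMKL code. */
static void offset_product_expanded(int m, int n, int k,
                                    const int8_t *a, int32_t oa,
                                    const uint8_t *b, int32_t ob,
                                    int64_t *out)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            int64_t ab = 0, row_sum_a = 0, col_sum_b = 0;
            for (int p = 0; p < k; ++p) {
                ab        += (int64_t)a[i + p * m] * b[p + j * k];
                row_sum_a += a[i + p * m];
                col_sum_b += b[p + j * k];
            }
            /* op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset,
             * summed from left to right. */
            out[i + j * m] = ab + row_sum_a * ob + oa * col_sum_b
                             + (int64_t)k * oa * ob;
        }
}
```

With A = [1 2; 3 4], B = [5 6; 7 8], oa = 1, and ob = -1, this gives (A + A_offset)*(B + B_offset) = [26 31; 46 55], the same as multiplying [2 3; 4 5] by [4 5; 6 7] directly.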

These four multiplication terms are computed separately and then summed from left to right. The result of the matrix-matrix product and the matrix C are scaled by the floating-point values alpha and beta, respectively, using double-precision arithmetic. Before the results are stored to the output array c, the floating-point values are rounded to the nearest integers. In the event of overflow or underflow, the results depend on the architecture: they are either unsaturated (wrapped) or saturated to the maximum or minimum representable integer value for the data type of the output matrix.
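The two overflow behaviors can be illustrated on a value that has already been rounded to an integer but does not fit in 32 bits. This sketch uses hypothetical function names and says nothing about which behavior a particular architecture uses; the wrapped version avoids implementation-defined signed conversions by reducing modulo 2^32 through an unsigned type.

```c
#include <stdint.h>

/* Saturated store: clamp to the representable int32 range. */
static int32_t store_saturated(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Wrapped (unsaturated) store: keep only the low 32 bits, two's complement. */
static int32_t store_wrapped(int64_t v)
{
    uint32_t u = (uint32_t)v;  /* well-defined: reduces v modulo 2^32 */
    /* Map [2^31, 2^32) to [-2^31, 0) without an out-of-range cast. */
    return u <= INT32_MAX ? (int32_t)u
                          : (int32_t)(u - 2147483648u) + INT32_MIN;
}
```

For instance, 2^31 saturates to INT32_MAX but wraps to INT32_MIN.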

When using cblas_gemm_s8u8s32 with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
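One way to see why the roles swap (a reading of this rule, not a statement of how oneMKL is implemented): a row-major array holding an m-by-n matrix C has the same memory layout as a column-major array holding C^T, and C^T = B^T * A^T. A column-major kernel that expects a signed first operand therefore receives B in that slot and A in the unsigned slot. The sketch below demonstrates the identity with hypothetical helpers colmajor_s8u8 and rowmajor_via_colmajor, using small values so that the 32-bit accumulator cannot overflow.

```c
#include <stdint.h>

/* Plain column-major product C = X*Y with X m-by-k (signed bytes) and
 * Y k-by-n (unsigned bytes), mirroring the s8u8s32 operand roles. */
static void colmajor_s8u8(int m, int n, int k, const int8_t *x, int ldx,
                          const uint8_t *y, int ldy, int32_t *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            int32_t acc = 0;
            for (int p = 0; p < k; ++p)
                acc += x[i + p * ldx] * y[p + j * ldy];
            c[i + j * ldc] = acc;
        }
}

/* Row-major C = A*B (A m-by-k unsigned, B k-by-n signed), computed by the
 * column-major kernel as C^T = B^T * A^T: B lands in the signed slot. */
static void rowmajor_via_colmajor(int m, int n, int k,
                                  const uint8_t *a, int lda,
                                  const int8_t *b, int ldb,
                                  int32_t *c, int ldc)
{
    colmajor_s8u8(n, m, k, b, ldb, a, lda, c, ldc);
}
```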

Intermediate integer computations in cblas_gemm_s8u8s32 on 64-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) architectures without Vector Neural Network Instructions (VNNI) extensions can saturate, because only 16 bits are available for accumulating intermediate results. You can avoid this saturation by keeping all integer elements of either the A or the B matrix under 8 bits.
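The size of the problem can be estimated with a small model. The sketch below assumes, as an illustration only, that each 16-bit intermediate step adds two unsigned-by-signed byte products and saturates the sum to the int16 range, which is how the AVX2 VPMADDUBSW instruction behaves; the reference above does not say which instructions are used. The function name pair_sum_sat16 is hypothetical.

```c
#include <stdint.h>

/* Model of one 16-bit intermediate step: two unsigned-by-signed byte
 * products are added, and the sum is saturated to the int16 range. */
static int16_t pair_sum_sat16(uint8_t u0, int8_t s0, uint8_t u1, int8_t s1)
{
    int32_t sum = (int32_t)u0 * s0 + (int32_t)u1 * s1;  /* exact sum */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}
```

Under this model, 255*127 + 255*127 = 64770 saturates to 32767. With the unsigned operands limited to 7 bits (at most 127), the largest pair sum is 2*127*127 = 32258. With the signed operands limited to 7 bits (-64 to 63), the pair sums range from -32640 to 32130. Both cases fit in 16 bits.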