cblas_gemm_*

Developer Reference for Intel® oneAPI Math Kernel Library for C


ID 766684

Date 7/13/2023



cblas_gemm_*_compute

Computes a matrix-matrix product with general integer matrices (where one or both input matrices are stored in a packed data structure) and adds the result to a scalar-matrix product.

Syntax

void cblas_gemm_s8u8s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const void *a, const MKL_INT lda, const MKL_INT8 oa, const void *b, const MKL_INT ldb, const MKL_INT8 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);

void cblas_gemm_s16s16s32_compute(const CBLAS_LAYOUT Layout, const MKL_INT transa, const MKL_INT transb, const CBLAS_OFFSET offsetc, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_INT16 *a, const MKL_INT lda, const MKL_INT16 oa, const MKL_INT16 *b, const MKL_INT ldb, const MKL_INT16 ob, const float beta, MKL_INT32 *c, const MKL_INT ldc, const MKL_INT32 *oc);

Include Files

mkl.h

Description

The cblas_gemm_*_compute routine is one of a set of related routines that enable the use of an internal packed storage format. After calling cblas_gemm_*_pack, call cblas_gemm_*_compute to compute

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset,

where:

op(X) is either op(X) = X or op(X) = X^T
alpha and beta are scalars
A, B, and C are matrices:
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
A_offset is an m-by-k matrix with every element equal to the value oa.
B_offset is a k-by-n matrix with every element equal to the value ob.
C_offset is an m-by-n matrix defined by the oc array, as described under the offsetc parameter.
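The formula can be checked against a naive loop. The following is a minimal sketch, not oneMKL code, that assumes column-major storage, op(A) = A, op(B) = B, and a fixed C offset (the CblasFixOffset case). The function name ref_gemm_s8u8s32_colmajor is hypothetical, and adding C_offset before rounding is an assumption, because this reference does not say at which stage C_offset is added.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference (not oneMKL code) for
 *   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
 * restricted to column-major storage, op(A) = A, op(B) = B, and a fixed
 * C offset (the CblasFixOffset case). A is int8_t and B is uint8_t, the
 * column-major element types of cblas_gemm_s8u8s32_compute. */
static void ref_gemm_s8u8s32_colmajor(int m, int n, int k, float alpha,
                                      const int8_t *a, int lda, int8_t oa,
                                      const uint8_t *b, int ldb, int8_t ob,
                                      float beta, int32_t *c, int ldc,
                                      int32_t oc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    assert(lda >= (m > 1 ? m : 1) && ldb >= (k > 1 ? k : 1));
    assert(ldc >= (m > 1 ? m : 1));
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            /* Exact integer dot product of row i of (A + A_offset)
             * with column j of (B + B_offset). */
            int64_t acc = 0;
            for (int p = 0; p < k; ++p)
                acc += ((int64_t)a[i + p * lda] + oa) *
                       ((int64_t)b[p + j * ldb] + ob);
            /* Scale in double precision; C is not read when beta == 0. */
            double v = (double)alpha * (double)acc + (double)oc;
            if (beta != 0.0f)
                v += (double)beta * (double)c[i + j * ldc];
            /* Round to nearest, ties away from zero (assumed tie rule);
             * no overflow handling. */
            c[i + j * ldc] = (int32_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
        }
    }
}
```

For example, with A = [[1, 3], [2, 4]] stored as {1, 2, 3, 4}, oa = 1, B equal to the 2-by-2 identity, ob = 0, alpha = 2, beta = 1, every element of C initially 1, and oc = 10, the sketch stores {15, 17, 19, 21}.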

NOTE:

You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls.

For best performance, use the same number of threads for packing and for computing.

If you are packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.

Input Parameters

Layout

CBLAS_LAYOUT

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa

MKL_INT Specifies the form of op(A) used in the matrix multiplication:

If transa = CblasNoTrans, op(A) = A.

If transa = CblasTrans, op(A) = A^T.

If transa = CblasPacked, the matrix in array a is packed into a format internal to Intel® oneAPI Math Kernel Library (oneMKL) and lda is ignored.

transb

MKL_INT Specifies the form of op(B) used in the matrix multiplication:

If transb = CblasNoTrans, op(B) = B.

If transb = CblasTrans, op(B) = B^T.

If transb = CblasPacked, the matrix in array b is packed into a format internal to Intel® oneAPI Math Kernel Library (oneMKL) and ldb is ignored.

offsetc

CBLAS_OFFSET Specifies the form of C_offset used in the matrix multiplication.

If offsetc = CblasFixOffset: oc has a single element and every element of C_offset is equal to this element.

If offsetc = CblasColOffset: oc has a size of m and every column of C_offset is equal to oc.

If offsetc = CblasRowOffset: oc has a size of n and every row of C_offset is equal to oc.
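The three offsetc cases differ only in which entry of oc supplies element (i, j) of C_offset. The sketch below, which is not oneMKL code, restates that rule together with the minimum length of oc. The enum offset_mode and the functions c_offset_at and oc_min_len are hypothetical stand-ins; the real CBLAS_OFFSET constants from mkl.h are deliberately not used, and indices are 0-based.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local enum mirroring the three CBLAS_OFFSET cases; the real
 * CBLAS_OFFSET values come from mkl.h and are not reproduced here. */
typedef enum { FIX_OFFSET, COL_OFFSET, ROW_OFFSET } offset_mode;

/* Element (i, j) of the m-by-n matrix C_offset, built from the oc array. */
static int32_t c_offset_at(offset_mode mode, const int32_t *oc, int i, int j)
{
    switch (mode) {
    case FIX_OFFSET: return oc[0]; /* one value for every element */
    case COL_OFFSET: return oc[i]; /* oc has m entries; each column equals oc */
    case ROW_OFFSET: return oc[j]; /* oc has n entries; each row equals oc */
    }
    assert(0 && "unknown offset mode");
    return 0;
}

/* Minimum length of oc for each mode, as listed for the oc parameter. */
static int oc_min_len(offset_mode mode, int m, int n)
{
    int len = 1;
    if (mode == COL_OFFSET)
        len = m;
    else if (mode == ROW_OFFSET)
        len = n;
    return len > 1 ? len : 1;
}
```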

m

MKL_INT Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

MKL_INT Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

MKL_INT Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

float Specifies the scalar alpha.

a

void* for cblas_gemm_s8u8s32_compute

MKL_INT16* for cblas_gemm_s16s16s32_compute

Layout = CblasColMajor

transa = CblasNoTrans

Array, size lda*k.

Before entry, the leading m-by-k part of the array a must contain the matrix A.

For cblas_gemm_s8u8s32_compute, each element of the a array must be an 8-bit signed integer.

transa = CblasTrans

Array, size lda*m.

Before entry, the leading k-by-m part of the array a must contain the matrix A.

For cblas_gemm_s8u8s32_compute, each element of the a array must be an 8-bit signed integer.

transa = CblasPacked

Array of the size returned by cblas_gemm_*_pack_get_size, initialized using cblas_gemm_*_pack.

Layout = CblasRowMajor

transa = CblasNoTrans

Array, size lda*m.

Before entry, the leading k-by-m part of the array a must contain the matrix A.

For cblas_gemm_s8u8s32_compute, each element of the a array must be an 8-bit unsigned integer.

transa = CblasTrans

Array, size lda*k.

Before entry, the leading m-by-k part of the array a must contain the matrix A.

For cblas_gemm_s8u8s32_compute, each element of the a array must be an 8-bit unsigned integer.

transa = CblasPacked

Array of the size returned by cblas_gemm_*_pack_get_size, initialized using cblas_gemm_*_pack.

lda

MKL_INT Specifies the leading dimension of a as declared in the calling (sub)program.

	transa = CblasNoTrans	transa = CblasTrans
Layout = CblasColMajor	lda must be at least `max(1, m)`.	lda must be at least `max(1, k)`.
Layout = CblasRowMajor	lda must be at least `max(1, k)`.	lda must be at least `max(1, m)`.
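The lda table and the storage rules for a reduce to one pattern: in the ColMajor/NoTrans and RowMajor/Trans cases, consecutive elements of a column of op(A) are adjacent in memory, and in the other two cases consecutive elements of a row of op(A) are adjacent. A minimal sketch of that pattern follows; min_lda and op_a_index are hypothetical helpers, the flags col_major and trans are plain 0/1 ints rather than the CBLAS enums, and indices are 0-based.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not oneMKL API). col_major and trans are 0/1 flags;
 * op(A) is m-by-k in every case and indices are 0-based. */

static int max1(int x) { return x > 1 ? x : 1; }

/* Minimum lda, restating the table above. */
static int min_lda(int col_major, int trans, int m, int k)
{
    /* ColMajor/NoTrans and RowMajor/Trans: a column of op(A) (m elements)
     * is contiguous, so lda >= max(1, m). Otherwise a row of op(A)
     * (k elements) is contiguous, so lda >= max(1, k). */
    return (col_major != trans) ? max1(m) : max1(k);
}

/* Position of op(A)(i, p) in the array a. */
static size_t op_a_index(int col_major, int trans, int i, int p, int lda)
{
    assert(i >= 0 && p >= 0 && lda >= 1);
    if (col_major != trans)
        return (size_t)i + (size_t)p * (size_t)lda; /* column contiguous */
    return (size_t)p + (size_t)i * (size_t)lda;     /* row contiguous */
}
```

The ldb table follows the same pattern with k and n in place of m and k.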

oa

MKL_INT8 for cblas_gemm_s8u8s32_compute

MKL_INT16 for cblas_gemm_s16s16s32_compute

Specifies the scalar offset value for the matrix A.

b

void* for cblas_gemm_s8u8s32_compute

MKL_INT16* for cblas_gemm_s16s16s32_compute

Layout = CblasColMajor

transb = CblasNoTrans

Array, size ldb*n.

Before entry, the leading k-by-n part of the array b must contain the matrix B.

For cblas_gemm_s8u8s32_compute, each element of the b array must be an 8-bit unsigned integer.

transb = CblasTrans

Array, size ldb*k.

Before entry, the leading n-by-k part of the array b must contain the matrix B.

For cblas_gemm_s8u8s32_compute, each element of the b array must be an 8-bit unsigned integer.

transb = CblasPacked

Array of the size returned by cblas_gemm_*_pack_get_size, initialized using cblas_gemm_*_pack.

Layout = CblasRowMajor

transb = CblasNoTrans

Array, size ldb*k.

Before entry, the leading n-by-k part of the array b must contain the matrix B.

For cblas_gemm_s8u8s32_compute, each element of the b array must be an 8-bit signed integer.

transb = CblasTrans

Array, size ldb*n.

Before entry, the leading k-by-n part of the array b must contain the matrix B.

For cblas_gemm_s8u8s32_compute, each element of the b array must be an 8-bit signed integer.

transb = CblasPacked

Array of the size returned by cblas_gemm_*_pack_get_size, initialized using cblas_gemm_*_pack.

ldb

MKL_INT Specifies the leading dimension of b as declared in the calling (sub)program.

	transb = CblasNoTrans	transb = CblasTrans
Layout = CblasColMajor	ldb must be at least `max(1, k)`.	ldb must be at least `max(1, n)`.
Layout = CblasRowMajor	ldb must be at least `max(1, n)`.	ldb must be at least `max(1, k)`.

ob

MKL_INT8 for cblas_gemm_s8u8s32_compute

MKL_INT16 for cblas_gemm_s16s16s32_compute

Specifies the scalar offset value for the matrix B.

beta

float

Specifies the scalar beta.

c

MKL_INT32*

Layout = CblasColMajor

Array, size ldc*n.

Before entry, the leading m-by-n part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

Layout = CblasRowMajor

Array, size ldc*m.

Before entry, the leading n-by-m part of the array c must contain the matrix C, except when beta is equal to zero, in which case c need not be set on entry.

ldc

MKL_INT Specifies the leading dimension of c as declared in the calling (sub)program.

Layout = CblasColMajor	ldc must be at least `max(1, m)`
Layout = CblasRowMajor	ldc must be at least `max(1, n)`

oc

MKL_INT32*

Array, size len. Specifies the offset values for the matrix C.

If offsetc = CblasFixOffset, len must be at least 1.

If offsetc = CblasColOffset, len must be at least max(1, m).

If offsetc = CblasRowOffset, len must be at least max(1, n).

Output Parameters

c	MKL_INT32* Overwritten by the matrix alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset.

Example

See the following examples in the MKL installation directory to understand the use of these routines:

cblas_gemm_s8u8s32_compute: examples\cblas\source\cblas_gemm_s8u8s32_computex.c

cblas_gemm_s16s16s32_compute: examples\cblas\source\cblas_gemm_s16s16s32_computex.c

Application Notes

You can expand the matrix-matrix product in this manner:

(op(A) + A_offset)*(op(B) + B_offset) = op(A)*op(B) + op(A)*B_offset + A_offset*op(B) + A_offset*B_offset

The four multiplication terms are computed separately and then summed from left to right. The resulting matrix-matrix product is scaled by alpha and the C matrix by beta, using double-precision floating-point arithmetic. Before the results are stored in the output c array, the floating-point values are rounded to the nearest integers.
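The paragraph above describes the arithmetic in three stages: integer terms, double-precision scaling, and rounding. The following sketch models one element of the result under stated assumptions: column-major storage with no transposition, ties rounded away from zero (the reference does not give the tie rule), and C_offset left out because its position relative to rounding is not specified. expanded_dot and scale_and_round are hypothetical names, and the sketch is not a description of oneMKL internals.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: the four expansion terms for element (i, j), with
 * column-major A (lda) and B (ldb) and op(X) = X. Each term is accumulated
 * exactly in 64 bits, then the terms are summed left to right. */
static int64_t expanded_dot(const int8_t *a, int lda, int8_t oa,
                            const uint8_t *b, int ldb, int8_t ob,
                            int k, int i, int j)
{
    assert(k >= 0);
    int64_t ab = 0, a_ob = 0, oa_b = 0, oa_ob = 0;
    for (int p = 0; p < k; ++p) {
        int64_t aip = a[i + p * lda];  /* op(A)(i, p) */
        int64_t bpj = b[p + j * ldb];  /* op(B)(p, j) */
        ab    += aip * bpj;            /* op(A)*op(B)       */
        a_ob  += aip * ob;             /* op(A)*B_offset    */
        oa_b  += oa * bpj;             /* A_offset*op(B)    */
        oa_ob += (int64_t)oa * ob;     /* A_offset*B_offset */
    }
    return ab + a_ob + oa_b + oa_ob;
}

/* Scale the product by alpha and the old C value by beta in double
 * precision, then round to nearest with ties away from zero (assumed).
 * Out-of-range results are not handled. */
static int32_t scale_and_round(float alpha, int64_t prod, float beta,
                               int32_t c_old)
{
    double v = (double)alpha * (double)prod + (double)beta * (double)c_old;
    return (int32_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
}
```

With op(A) = [3, -2], oa = -1, op(B) = [4, 5]^T, and ob = 2, the four terms are 2, 2, -9, and -4, whose sum -9 matches the direct product (3 - 1)*(4 + 2) + (-2 - 1)*(5 + 2).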

In the event of overflow or underflow, the results depend on the architecture. The results are either unsaturated (wrapped) or saturated to maximum or minimum representable integer values for the data type of the output matrix.
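The two overflow behaviors can be illustrated on a 64-bit intermediate value. This is a simplified, hypothetical sketch: the reference says only that results are wrapped or saturated depending on the architecture, not how the intermediate is represented, so saturate_i32 and wrap_i32 merely show what each outcome means for an int32 output element.

```c
#include <assert.h>
#include <stdint.h>

/* Saturated result: clamp to the int32 range. */
static int32_t saturate_i32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Wrapped (unsaturated) result: keep the low 32 bits and reinterpret them
 * as a two's-complement value, avoiding implementation-defined casts. */
static int32_t wrap_i32(int64_t v)
{
    uint32_t low = (uint32_t)((uint64_t)v & 0xFFFFFFFFu);
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)(low - 0x80000000u) + INT32_MIN;
}
```

For a value one past INT32_MAX, saturation yields INT32_MAX while wrapping yields INT32_MIN.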

When using cblas_gemm_s8u8s32_compute with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
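One way to see why the operand types swap for row-major layout: a row-major buffer read as column-major holds the transpose of the matrix, and C^T = B^T*A^T, so a column-major signed-times-unsigned kernel would receive the B buffer as its first (signed) operand and the A buffer as its second (unsigned) operand. This reference states the rule but not its reason, so treat the sketch below as an illustration of that identity, not as a description of how oneMKL is implemented. colmajor_s8u8 and rowmajor_via_swap are hypothetical, and offsets, alpha, and beta are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Column-major C(m x n) = X(m x k) * Y(k x n) with the column-major
 * s8u8s32 element types: X signed, Y unsigned. Offsets, alpha, beta, and
 * overflow handling are omitted; int32 accumulation suffices for small inputs. */
static void colmajor_s8u8(int m, int n, int k, const int8_t *x, int ldx,
                          const uint8_t *y, int ldy, int32_t *c, int ldc)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            int32_t acc = 0;
            for (int p = 0; p < k; ++p)
                acc += (int32_t)x[i + p * ldx] * (int32_t)y[p + j * ldy];
            c[i + j * ldc] = acc;
        }
}

/* Row-major C(m x n) = A(m x k) * B(k x n), A unsigned and B signed.
 * Read as column-major, the buffers hold C^T (n x m), A^T (k x m), and
 * B^T (n x k); since C^T = B^T * A^T, B goes first and A second. */
static void rowmajor_via_swap(int m, int n, int k, const uint8_t *a, int lda,
                              const int8_t *b, int ldb, int32_t *c, int ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n); /* row-major, no transpose */
    colmajor_s8u8(n, m, k, b, ldb, a, lda, c, ldc);
}
```

For A = [[1, 2], [3, 4]] and B = [[1, -1], [0, 2]], both stored row-major, the row-major result buffer holds {1, 3, 3, 5}, which is A*B.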

