cblas_gemm_*

Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684

Date 12/16/2022

cblas_gemm_*_pack

Pack the matrix into the buffer allocated previously.

Syntax

void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void *src, const MKL_INT ld, void *dest);

void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);

void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);

Include Files

mkl.h

Description

The cblas_gemm_*_pack routine is one of a set of related routines that enable the use of an internal packed storage format. Call cblas_gemm_*_pack after allocating a buffer whose size is returned by cblas_gemm_*_pack_get_size. The routine packs the matrix selected by identifier into that buffer.

The cblas_gemm_*_pack routine performs this operation:

dest := op(src) as part of the computation C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset for integer types.

dest := op(src) as part of the computation C := alpha*op(A)*op(B) + beta*C for the bfloat16 type.

where:

op(X) is one of the operations op(X) = X or op(X) = X^T,
alpha and beta are scalars,
src is a matrix,
A, A_offset, B, B_offset, C, and C_offset are matrices,
op(src) is an m-by-k matrix if identifier = CblasAMatrix,
op(src) is a k-by-n matrix if identifier = CblasBMatrix,
dest is the buffer previously allocated to store the matrix packed into an internal format,
A_offset is an m-by-k matrix,
B_offset is a k-by-n matrix,
C_offset is an m-by-n matrix.
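
To make the role of the offset matrices in the integer formula concrete, the following plain-C sketch evaluates the same expression directly for column-major, non-transposed operands. It does not call oneMKL: ref_gemm_s8u8s32 is a hypothetical name, the scalars ao and bo stand for A_offset and B_offset matrices whose elements all share one value (an assumption made for brevity), and the rounding rule is likewise an assumption of the sketch rather than documented library behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference routine, not part of oneMKL. Evaluates
 *   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
 * for column-major storage with op(A) = A and op(B) = B.
 * ao and bo model offset matrices filled with a single value.
 * c_offset is a dense m-by-n column-major matrix (leading dimension m). */
static void ref_gemm_s8u8s32(size_t m, size_t n, size_t k, float alpha,
                             const int8_t *a, size_t lda, int32_t ao,
                             const uint8_t *b, size_t ldb, int32_t bo,
                             float beta, int32_t *c, size_t ldc,
                             const int32_t *c_offset)
{
    /* Same lower bounds as the column-major rule: ld >= max(1, rows). */
    assert(lda >= (m ? m : 1));
    assert(ldb >= (k ? k : 1));
    assert(ldc >= (m ? m : 1));

    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            int32_t acc = 0;
            for (size_t p = 0; p < k; ++p) {
                /* The offsets are added before the multiplication. */
                acc += (a[i + p * lda] + ao) * (b[p + j * ldb] + bo);
            }
            double v = (double)alpha * acc
                     + (double)beta * c[i + j * ldc]
                     + c_offset[i + j * m];
            /* Assumption: round half away from zero when storing. */
            c[i + j * ldc] = (int32_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
        }
    }
}
```

A reference of this kind can be handy for checking the result of a packed computation on small inputs, since the packed buffer itself is opaque.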

NOTE:

You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls.

For best performance, use the same number of threads for packing and for computing.

If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.

Input Parameters

Layout

CBLAS_LAYOUT

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

identifier

CBLAS_IDENTIFIER

Specifies which matrix is to be packed:

If identifier = CblasAMatrix, the A matrix is packed.

If identifier = CblasBMatrix, the B matrix is packed.

trans

CBLAS_TRANSPOSE

Specifies the form of op(src) used in the packing:

If trans = CblasNoTrans, op(src) = src.

If trans = CblasTrans, op(src) = src^T.

m

MKL_INT

Specifies the number of rows of matrix op(A) and of the matrix C. The value of m must be at least zero.

n

MKL_INT

Specifies the number of columns of matrix op(B) and the number of columns of matrix C. The value of n must be at least zero.

k

MKL_INT

Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). The value of k must be at least zero.

src

MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack and MKL_INT16* for cblas_gemm_s16s16s32_pack

	identifier = CblasAMatrix		identifier = CblasBMatrix
	trans = CblasNoTrans	trans = CblasTrans	trans = CblasNoTrans	trans = CblasTrans
Layout = CblasColMajor	Size `ld*k`. Before entry, the leading m-by-k part of the array src must contain the matrix `A`. For `cblas_gemm_s8u8s32_pack` the element in src array must be an 8-bit signed integer.	Size `ld*m`. Before entry, the leading k-by-m part of the array src must contain the matrix `A`. For `cblas_gemm_s8u8s32_pack` the element in src array must be an 8-bit signed integer.	Size `ld*n`. Before entry, the leading k-by-n part of the array src must contain the matrix `B`. For `cblas_gemm_s8u8s32_pack` the element in src array must be an 8-bit unsigned integer.	Size `ld*k`. Before entry, the leading n-by-k part of the array src must contain the matrix `B`. For `cblas_gemm_s8u8s32_pack` the element in src array must be an 8-bit unsigned integer.
Layout = CblasRowMajor	Size `ld*m`. Before entry, the leading k-by-m part of the array src must contain the matrix `A`. For `cblas_gemm_s8u8s32_pack` the element in src array must be an 8-bit unsigned integer.	Size `ld*k`. Before entry, the leading m-by-k part of the array src must contain the matrix `A`. For `cblas_gemm_s8u8s32_pack` the element in src array must be an 8-bit unsigned integer.	Size `ld*k`. Before entry, the leading n-by-k part of the array src must contain the matrix `B`. For `cblas_gemm_s8u8s32_pack` the element in src array must be an 8-bit signed integer.	Size `ld*n`. Before entry, the leading k-by-n part of the array src must contain the matrix `B`. For `cblas_gemm_s8u8s32_pack` the element in src array must be an 8-bit signed integer.

ld

MKL_INT

Specifies the leading dimension of src as declared in the calling (sub)program.

	identifier = CblasAMatrix		identifier = CblasBMatrix
	trans = CblasNoTrans	trans = CblasTrans	trans = CblasNoTrans	trans = CblasTrans
Layout = CblasColMajor	ld must be at least `max(1, m)`.	ld must be at least `max(1, k)`.	ld must be at least `max(1, k)`.	ld must be at least `max(1, n)`.
Layout = CblasRowMajor	ld must be at least `max(1, k)`.	ld must be at least `max(1, m)`.	ld must be at least `max(1, n)`.	ld must be at least `max(1, k)`.
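
The size and ld rules for src follow one pattern: transposing src and switching to row-major storage each swap which dimension of op(src) is contiguous in memory, and applying both cancels out. The sketch below encodes that pattern so the eight table cells can be checked in code. It is illustrative only: the enums are hypothetical stand-ins for CBLAS_LAYOUT, CBLAS_IDENTIFIER, and CBLAS_TRANSPOSE so that the block compiles without mkl.h, and src_shape and src_elements are not oneMKL functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the oneMKL enums (assumption of this sketch). */
typedef enum { LAYOUT_ROW_MAJOR, LAYOUT_COL_MAJOR } sketch_layout;
typedef enum { IDENT_A, IDENT_B } sketch_ident;
typedef enum { OP_NO_TRANS, OP_TRANS } sketch_trans;

typedef struct {
    size_t min_ld;   /* smallest legal ld: max(1, contiguous dimension) */
    size_t vectors;  /* src holds this many ld-long columns (or rows) */
} src_shape_t;

/* Returns the ld lower bound and the vector count for src. */
static src_shape_t src_shape(sketch_layout layout, sketch_ident id,
                             sketch_trans trans,
                             size_t m, size_t n, size_t k)
{
    /* op(src) is rows-by-cols: m-by-k for A, k-by-n for B. */
    size_t rows = (id == IDENT_A) ? m : k;
    size_t cols = (id == IDENT_A) ? k : n;

    /* Exactly one of "transposed" and "row-major" swaps the roles. */
    int swap = (trans == OP_TRANS) != (layout == LAYOUT_ROW_MAJOR);
    size_t contiguous = swap ? cols : rows;
    size_t strided = swap ? rows : cols;

    src_shape_t s = { contiguous > 1 ? contiguous : 1, strided };
    return s;
}

/* Number of elements the src array must hold for a given ld. */
static size_t src_elements(src_shape_t s, size_t ld)
{
    assert(ld >= s.min_ld);
    return ld * s.vectors;
}
```

For example, with m = 3, n = 4, k = 5, a column-major, non-transposed A yields min_ld = 3 and five columns, matching the `max(1, m)` and `ld*k` entries in the tables.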

dest

MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack or MKL_INT16* for cblas_gemm_s16s16s32_pack

Buffer for the packed matrix.

Output Parameters

dest

MKL_BF16* for cblas_gemm_bf16bf16f32_pack, void* for cblas_gemm_s8u8s32_pack or MKL_INT16* for cblas_gemm_s16s16s32_pack

Overwritten by the matrix op(src) stored in a format internal to Intel® oneAPI Math Kernel Library.

Example

See the following examples in the MKL installation directory to understand the use of these routines:

cblas_gemm_s8u8s32_pack: examples\cblas\source\cblas_gemm_s8u8s32_computex.c

cblas_gemm_s16s16s32_pack: examples\cblas\source\cblas_gemm_s16s16s32_computex.c

cblas_gemm_bf16bf16f32_pack: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c

Application Notes

When using cblas_gemm_s8u8s32_pack with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
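
The signedness rule above can be written as a single predicate, which may be useful as a guard before casting a buffer to void* for cblas_gemm_s8u8s32_pack. This is a sketch only: the enums are hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_IDENTIFIER so that the block compiles without mkl.h, and s8u8_src_is_signed is not a oneMKL function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the oneMKL enums (assumption of this sketch). */
typedef enum { GEMM_ROW_MAJOR, GEMM_COL_MAJOR } gemm_layout;
typedef enum { PACK_A, PACK_B } pack_ident;

/* Returns true when src must hold 8-bit signed elements,
 * false when it must hold 8-bit unsigned elements. */
static bool s8u8_src_is_signed(gemm_layout layout, pack_ident id)
{
    assert(layout == GEMM_ROW_MAJOR || layout == GEMM_COL_MAJOR);
    assert(id == PACK_A || id == PACK_B);
    /* Column-major: A is signed, B is unsigned. Row-major swaps them. */
    return (layout == GEMM_COL_MAJOR) == (id == PACK_A);
}
```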
