Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/22/2024
Public

cblas_gemm_*_pack

Packs the matrix into a previously allocated buffer.

Syntax

void cblas_gemm_s8u8s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void *src, const MKL_INT ld, void *dest);

void cblas_gemm_s16s16s32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_INT16 *src, const MKL_INT ld, MKL_INT16 *dest);

void cblas_gemm_bf16bf16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_BF16 *src, const MKL_INT ld, MKL_BF16 *dest);

void cblas_gemm_f16f16f32_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_F16 *src, const MKL_INT ld, MKL_F16 *dest);

Include Files

  • mkl.h

Description

The cblas_gemm_*_pack routine is one of a set of related routines that enable the use of an internal packed storage format. Call cblas_gemm_*_pack after you allocate a buffer whose size is given by cblas_gemm_*_pack_get_size. The routine then packs the identified matrix into that buffer.

The cblas_gemm_*_pack routine performs this operation:

dest := op(src)

as part of the computation

C := alpha*(op(A) + A_offset)*(op(B) + B_offset) + beta*C + C_offset for integer types,

C := alpha*op(A)*op(B) + beta*C for bfloat16 and half-precision types.

where:

  • op(X) is one of the operations op(X) = X or op(X) = X^T,
  • alpha and beta are scalars,
  • src is a matrix,
  • A, A_offset, B, B_offset, C, and C_offset are matrices,
  • op(src) is an m-by-k matrix if identifier = CblasAMatrix,
  • op(src) is a k-by-n matrix if identifier = CblasBMatrix,
  • dest is the buffer previously allocated to store the matrix packed into an internal format,
  • A_offset is an m-by-k matrix,
  • B_offset is a k-by-n matrix,
  • C_offset is an m-by-n matrix.
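
To make the integer formula concrete, the following is a minimal reference sketch of C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset for column-major storage with op(A) = A and op(B) = B. It is not an MKL routine and does not model the packed format. The function name, the dense column-major layout of the offset arrays (leading dimensions m, k, and m), and the truncating conversion of the result to a 32-bit integer are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unoptimized reference for
 *   C := alpha*(A + A_offset)*(B + B_offset) + beta*C + C_offset
 * Column-major, no transposition. Hypothetical helper, not part of MKL. */
void ref_gemm_int_colmajor(int m, int n, int k, float alpha,
                           const int8_t *a, int lda, const int32_t *a_off,
                           const uint8_t *b, int ldb, const int32_t *b_off,
                           float beta, int32_t *c, int ldc,
                           const int32_t *c_off)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int64_t acc = 0;
            for (int p = 0; p < k; ++p) {
                /* Offsets are assumed dense: a_off is m-by-k, b_off is k-by-n. */
                int32_t av = a[i + (size_t)p * lda] + a_off[i + (size_t)p * m];
                int32_t bv = b[p + (size_t)j * ldb] + b_off[p + (size_t)j * k];
                acc += (int64_t)av * bv;
            }
            double r = (double)alpha * (double)acc
                     + (double)beta * (double)c[i + (size_t)j * ldc]
                     + (double)c_off[i + (size_t)j * m];
            /* Truncating cast: an assumption, the rounding rule is not stated here. */
            c[i + (size_t)j * ldc] = (int32_t)r;
        }
    }
}
```

With alpha = 1, beta = 0, and all offsets zero, the sketch reduces to a plain integer matrix product, which is a convenient baseline for checking results obtained through the packed routines.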

NOTE:

You must use the same value of the Layout parameter for the entire sequence of related cblas_gemm_*_pack and cblas_gemm_*_compute calls.

For best performance, use the same number of threads for packing and for computing.

If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.

Input Parameters

Layout
CBLAS_LAYOUT

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

identifier
CBLAS_IDENTIFIER

Specifies which matrix is to be packed:

If identifier = CblasAMatrix, the A matrix is packed.

If identifier = CblasBMatrix, the B matrix is packed.

trans
CBLAS_TRANSPOSE

Specifies the form of op(src) used in the packing:

If trans = CblasNoTrans, op(src) = src.

If trans = CblasTrans, op(src) = src^T.

m
MKL_INT

Specifies the number of rows of matrix op(A) and of the matrix C. The value of m must be at least zero.

n
MKL_INT

Specifies the number of columns of matrix op(B) and the number of columns of matrix C. The value of n must be at least zero.

k
MKL_INT

Specifies the number of columns of matrix op(A) and the number of rows of matrix op(B). The value of k must be at least zero.

src
MKL_BF16* for cblas_gemm_bf16bf16f32_pack, MKL_F16* for cblas_gemm_f16f16f32_pack, void* for cblas_gemm_s8u8s32_pack and MKL_INT16* for cblas_gemm_s16s16s32_pack

Layout = CblasColMajor

  • identifier = CblasAMatrix, trans = CblasNoTrans: Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack the element in src array must be an 8-bit signed integer.
  • identifier = CblasAMatrix, trans = CblasTrans: Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack the element in src array must be an 8-bit signed integer.
  • identifier = CblasBMatrix, trans = CblasNoTrans: Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack the element in src array must be an 8-bit unsigned integer.
  • identifier = CblasBMatrix, trans = CblasTrans: Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack the element in src array must be an 8-bit unsigned integer.

Layout = CblasRowMajor

  • identifier = CblasAMatrix, trans = CblasNoTrans: Size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack the element in src array must be an 8-bit unsigned integer.
  • identifier = CblasAMatrix, trans = CblasTrans: Size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A. For cblas_gemm_s8u8s32_pack the element in src array must be an 8-bit unsigned integer.
  • identifier = CblasBMatrix, trans = CblasNoTrans: Size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack the element in src array must be an 8-bit signed integer.
  • identifier = CblasBMatrix, trans = CblasTrans: Size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B. For cblas_gemm_s8u8s32_pack the element in src array must be an 8-bit signed integer.

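ld
MKL_INT

Specifies the leading dimension of src as declared in the calling (sub)program.

Layout = CblasColMajor

  • identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, m).
  • identifier = CblasAMatrix, trans = CblasTrans: ld must be at least max(1, k).
  • identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
  • identifier = CblasBMatrix, trans = CblasTrans: ld must be at least max(1, n).

Layout = CblasRowMajor

  • identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
  • identifier = CblasAMatrix, trans = CblasTrans: ld must be at least max(1, m).
  • identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, n).
  • identifier = CblasBMatrix, trans = CblasTrans: ld must be at least max(1, k).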
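
The size and leading-dimension rules for src follow one pattern, which the sketch below encodes: take the shape of src as stored (m-by-k for A, k-by-n for B, swapped when trans = CblasTrans), then use the row count as the leading dimension for column-major storage and the column count for row-major storage. The enum and function names are hypothetical stand-ins so that the code builds without mkl.h; they are not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CBLAS enums (not the MKL definitions). */
typedef enum { DEMO_ROW_MAJOR, DEMO_COL_MAJOR } demo_layout;
typedef enum { DEMO_A_MATRIX, DEMO_B_MATRIX } demo_identifier;
typedef enum { DEMO_NO_TRANS, DEMO_TRANS } demo_trans;

/* Shape of src as stored in memory, before op() is applied. */
void demo_src_shape(demo_identifier id, demo_trans trans,
                    long m, long n, long k, long *rows, long *cols)
{
    long r = (id == DEMO_A_MATRIX) ? m : k;
    long c = (id == DEMO_A_MATRIX) ? k : n;
    *rows = (trans == DEMO_TRANS) ? c : r;
    *cols = (trans == DEMO_TRANS) ? r : c;
}

/* Minimum leading dimension of src for the given combination. */
long demo_min_ld(demo_layout layout, demo_identifier id, demo_trans trans,
                 long m, long n, long k)
{
    long rows, cols;
    assert(m >= 0 && n >= 0 && k >= 0);
    demo_src_shape(id, trans, m, n, k, &rows, &cols);
    long lead = (layout == DEMO_COL_MAJOR) ? rows : cols;
    return lead > 1 ? lead : 1;
}

/* Number of elements src must hold for a chosen ld. */
size_t demo_src_elems(demo_layout layout, demo_identifier id, demo_trans trans,
                      long m, long n, long k, long ld)
{
    long rows, cols;
    assert(ld >= demo_min_ld(layout, id, trans, m, n, k));
    demo_src_shape(id, trans, m, n, k, &rows, &cols);
    long other = (layout == DEMO_COL_MAJOR) ? cols : rows;
    return (size_t)ld * (size_t)other;
}
```

A helper of this kind can be used to validate ld and to size the src allocation before calling the pack routine, for example by checking ld against demo_min_ld and allocating demo_src_elems elements.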

dest
MKL_BF16* for cblas_gemm_bf16bf16f32_pack, MKL_F16* for cblas_gemm_f16f16f32_pack, void* for cblas_gemm_s8u8s32_pack or MKL_INT16* for cblas_gemm_s16s16s32_pack

Buffer for the packed matrix.

Output Parameters

dest

MKL_BF16* for cblas_gemm_bf16bf16f32_pack, MKL_F16* for cblas_gemm_f16f16f32_pack, void* for cblas_gemm_s8u8s32_pack or MKL_INT16* for cblas_gemm_s16s16s32_pack

Overwritten by the matrix op(src) stored in a format internal to Intel® oneAPI Math Kernel Library (oneMKL).

Example

See the following examples in the oneMKL installation directory to understand the use of these routines:

cblas_gemm_s8u8s32_pack: examples\cblas\source\cblas_gemm_s8u8s32_computex.c

cblas_gemm_s16s16s32_pack: examples\cblas\source\cblas_gemm_s16s16s32_computex.c

cblas_gemm_bf16bf16f32_pack: examples\cblas\source\cblas_gemm_bf16bf16f32_computex.c

cblas_gemm_f16f16f32_pack: examples\cblas\source\cblas_gemm_f16f16f32_computex.c

Application Notes

When using cblas_gemm_s8u8s32_pack with row-major layout, the data types of A and B must be swapped. That is, you must provide an 8-bit unsigned integer array for matrix A and an 8-bit signed integer array for matrix B.
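
One way to read this type swap, offered as an interpretation and not as a description of the library internals: a row-major m-by-k array occupies the same memory as a column-major k-by-m array holding the transpose, so a row-major product C = A*B can be evaluated as the column-major product C^T = B^T * A^T, in which B takes the first (signed) operand position and A the second (unsigned) one. The sketch below demonstrates that equivalence with plain reference loops; both function names are hypothetical and no MKL call is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Column-major reference product C := A*B with a signed first operand and
 * an unsigned second operand (the s8u8 pattern). Illustrative only. */
void demo_s8u8_colmajor(int m, int n, int k,
                        const int8_t *a, int lda,
                        const uint8_t *b, int ldb,
                        int32_t *c, int ldc)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            int32_t acc = 0;
            for (int p = 0; p < k; ++p)
                acc += (int32_t)a[i + (size_t)p * lda]
                     * (int32_t)b[p + (size_t)j * ldb];
            c[i + (size_t)j * ldc] = acc;
        }
    }
}

/* Row-major C := A*B, with A m-by-k (unsigned) and B k-by-n (signed).
 * Read as column-major, the same buffers hold A^T, B^T, and C^T, so the
 * call computes C^T = B^T * A^T with the operands exchanged. */
void demo_rowmajor_product(int m, int n, int k,
                           const uint8_t *a, int lda,
                           const int8_t *b, int ldb,
                           int32_t *c, int ldc)
{
    demo_s8u8_colmajor(n, m, k, b, ldb, a, lda, c, ldc);
}
```

In this sketch the row-major caller declares A as uint8_t and B as int8_t, matching the requirement stated above.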

See Also