Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/22/2024
Public


cblas_?gemm_pack

Performs scaling and packing of the matrix into the previously allocated buffer.

Syntax

void cblas_hgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const MKL_F16 alpha, const MKL_F16 *src, const MKL_INT ld, MKL_F16 *dest);

void cblas_sgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const float *src, const MKL_INT ld, float *dest);

void cblas_dgemm_pack (const CBLAS_LAYOUT Layout, const CBLAS_IDENTIFIER identifier, const CBLAS_TRANSPOSE trans, const MKL_INT m, const MKL_INT n, const MKL_INT k, const double alpha, const double *src, const MKL_INT ld, double *dest);

Include Files

  • mkl.h

Description

The cblas_?gemm_pack routine is one of a set of related routines that enable the use of internal packed storage. Call cblas_?gemm_pack after you allocate a buffer whose size is given by cblas_?gemm_pack_get_size. The cblas_?gemm_pack routine scales the identified matrix by alpha and packs it into that buffer.

NOTE:

Do not copy the packed matrix to a different address because the internal implementation depends on the alignment of internally-stored metadata.

The cblas_?gemm_pack routine performs this operation:

dest := alpha*op(src) as part of the computation C := alpha*op(A)*op(B) + beta*C

where:

  • op(X) is one of the operations op(X) = X, op(X) = X^T, or op(X) = X^H,
  • alpha and beta are scalars,
  • src is a matrix,
  • A, B, and C are matrices,
  • op(src) is an m-by-k matrix if identifier = CblasAMatrix,
  • op(src) is a k-by-n matrix if identifier = CblasBMatrix,
  • dest is an internal packed storage buffer.
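To make the definition of dest concrete, the following sketch computes the logical value alpha*op(src) in ordinary dense form. It is a plain-C reference written for illustration, not a call into oneMKL: layout_t and trans_t are hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE, and reference_scaled_op is a hypothetical name. The real contents of dest use an internal packed format that this sketch does not reproduce. Because this routine only supports real types (MKL_F16, float, double), CblasConjTrans behaves like CblasTrans here.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE; not oneMKL types. */
typedef enum { COL_MAJOR, ROW_MAJOR } layout_t;
typedef enum { NO_TRANS, TRANS, CONJ_TRANS } trans_t;

/* Element (r, c) of the matrix stored in src with leading dimension ld. */
static double stored_at(layout_t layout, const double *src, size_t ld,
                        size_t r, size_t c)
{
    return layout == COL_MAJOR ? src[r + c * ld] : src[r * ld + c];
}

/* Writes alpha*op(src) into out as a dense rows-by-cols column-major matrix,
 * where rows-by-cols is the shape of op(src) (m-by-k for A, k-by-n for B).
 * For real data, CONJ_TRANS gives the same result as TRANS. */
void reference_scaled_op(layout_t layout, trans_t trans, size_t rows,
                         size_t cols, double alpha, const double *src,
                         size_t ld, double *out)
{
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows; ++i) {
            /* op(src)(i, j) is src(i, j), or src(j, i) when transposed. */
            double v = (trans == NO_TRANS) ? stored_at(layout, src, ld, i, j)
                                           : stored_at(layout, src, ld, j, i);
            out[i + j * rows] = alpha * v;
        }
    }
}
```

For example, a 2-by-3 row-major A stored as {1, 2, 3, 4, 5, 6} with ld = 3, trans = NO_TRANS, and alpha = 2 yields the column-major result {2, 8, 4, 10, 6, 12}.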

NOTE:

You must use the same value of the Layout parameter for the entire sequence of related cblas_?gemm_pack and cblas_?gemm_compute calls.

For best performance, use the same number of threads for packing and for computing.

If packing for both A and B matrices, you must use the same number of threads for packing A as for packing B.

Input Parameters

Layout

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

identifier

Specifies which matrix is to be packed:

If identifier = CblasAMatrix, the routine scales and packs matrix A.

If identifier = CblasBMatrix, the routine scales and packs matrix B.

trans

Specifies the form of op(src) used in the packing:

If trans = CblasNoTrans, op(src) = src.

If trans = CblasTrans, op(src) = src^T.

If trans = CblasConjTrans, op(src) = src^H.

m

Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

Specifies the scalar alpha.

src

Array. Its required size and contents depend on Layout, identifier, and trans:

Layout = CblasColMajor

  • identifier = CblasAMatrix, trans = CblasNoTrans: size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A.
  • identifier = CblasAMatrix, trans = CblasTrans or CblasConjTrans: size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A.
  • identifier = CblasBMatrix, trans = CblasNoTrans: size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B.
  • identifier = CblasBMatrix, trans = CblasTrans or CblasConjTrans: size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B.

Layout = CblasRowMajor

  • identifier = CblasAMatrix, trans = CblasNoTrans: size ld*m. Before entry, the leading k-by-m part of the array src must contain the matrix A.
  • identifier = CblasAMatrix, trans = CblasTrans or CblasConjTrans: size ld*k. Before entry, the leading m-by-k part of the array src must contain the matrix A.
  • identifier = CblasBMatrix, trans = CblasNoTrans: size ld*k. Before entry, the leading n-by-k part of the array src must contain the matrix B.
  • identifier = CblasBMatrix, trans = CblasTrans or CblasConjTrans: size ld*n. Before entry, the leading k-by-n part of the array src must contain the matrix B.
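All eight cases of the src table follow one rule: op(src) is m-by-k for A and k-by-n for B, src holds the transpose of that shape when trans is not CblasNoTrans, and the required size is ld times the number of stored columns (column-major) or stored rows (row-major). The sketch below encodes that rule as a plain-C helper for checking buffer sizes. The enums and the function name src_required_elements are hypothetical, not part of oneMKL, and the result is a count of elements, not bytes.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT, CBLAS_IDENTIFIER, CBLAS_TRANSPOSE. */
typedef enum { COL_MAJOR, ROW_MAJOR } layout_t;
typedef enum { MATRIX_A, MATRIX_B } ident_t;
typedef enum { NO_TRANS, TRANS, CONJ_TRANS } trans_t;

/* Minimum number of elements the array src must hold, per the src table. */
size_t src_required_elements(layout_t layout, ident_t id, trans_t trans,
                             size_t m, size_t n, size_t k, size_t ld)
{
    /* Shape of op(src): m-by-k for A, k-by-n for B. */
    size_t op_rows = (id == MATRIX_A) ? m : k;
    size_t op_cols = (id == MATRIX_A) ? k : n;

    /* Shape of the matrix actually stored in src (transposed unless NO_TRANS). */
    size_t rows = (trans == NO_TRANS) ? op_rows : op_cols;
    size_t cols = (trans == NO_TRANS) ? op_cols : op_rows;

    /* Column-major: cols columns of stride ld; row-major: rows rows of stride ld. */
    return ld * (layout == COL_MAJOR ? cols : rows);
}
```

With m = 2, n = 3, k = 4, and ld = 10, the helper returns 40 for column-major A with CblasNoTrans (ld*k) and 20 for row-major A with CblasNoTrans (ld*m), matching the table.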

ld

Specifies the leading dimension of src as declared in the calling (sub)program.

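The minimum value of ld depends on Layout, identifier, and trans:

Layout = CblasColMajor

  • identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, m).
  • identifier = CblasAMatrix, trans = CblasTrans or CblasConjTrans: ld must be at least max(1, k).
  • identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
  • identifier = CblasBMatrix, trans = CblasTrans or CblasConjTrans: ld must be at least max(1, n).

Layout = CblasRowMajor

  • identifier = CblasAMatrix, trans = CblasNoTrans: ld must be at least max(1, k).
  • identifier = CblasAMatrix, trans = CblasTrans or CblasConjTrans: ld must be at least max(1, m).
  • identifier = CblasBMatrix, trans = CblasNoTrans: ld must be at least max(1, n).
  • identifier = CblasBMatrix, trans = CblasTrans or CblasConjTrans: ld must be at least max(1, k).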
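The ld table reduces to a similar rule: ld must cover one stored column in column-major layout, or one stored row in row-major layout, and never be less than 1. The sketch below expresses that as a plain-C helper a caller could use to validate ld before calling cblas_?gemm_pack. It is illustrative only; the enums and the name src_min_ld are hypothetical and not part of oneMKL.

```c
#include <stddef.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT, CBLAS_IDENTIFIER, CBLAS_TRANSPOSE. */
typedef enum { COL_MAJOR, ROW_MAJOR } layout_t;
typedef enum { MATRIX_A, MATRIX_B } ident_t;
typedef enum { NO_TRANS, TRANS, CONJ_TRANS } trans_t;

static size_t max1(size_t x) { return x > 1 ? x : 1; }

/* Smallest leading dimension of src allowed by the ld table. */
size_t src_min_ld(layout_t layout, ident_t id, trans_t trans,
                  size_t m, size_t n, size_t k)
{
    /* Shape of op(src): m-by-k for A, k-by-n for B. */
    size_t op_rows = (id == MATRIX_A) ? m : k;
    size_t op_cols = (id == MATRIX_A) ? k : n;

    /* Shape of the matrix actually stored in src (transposed unless NO_TRANS). */
    size_t rows = (trans == NO_TRANS) ? op_rows : op_cols;
    size_t cols = (trans == NO_TRANS) ? op_cols : op_rows;

    /* Column-major: ld spans a stored column; row-major: ld spans a stored row. */
    return max1(layout == COL_MAJOR ? rows : cols);
}
```

With m = 2, n = 3, and k = 4, column-major A with CblasNoTrans needs ld >= 2, while row-major A with CblasNoTrans needs ld >= 4.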

dest

Previously allocated buffer, whose size is given by cblas_?gemm_pack_get_size, that receives the scaled and packed matrix.

Output Parameters

dest

Overwritten by the matrix alpha*op(src).

See Also