cblas_gemm_bf16bf16f32

Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684

Date 10/31/2024

Computes a matrix-matrix product with general bfloat16 matrices.

Syntax

void cblas_gemm_bf16bf16f32 (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const float alpha, const MKL_BF16 *a, const MKL_INT lda, const MKL_BF16 *b, const MKL_INT ldb, const float beta, float *c, const MKL_INT ldc);

Include Files

mkl.h

Description

The cblas_gemm_bf16bf16f32 routine computes a scalar-matrix-matrix product and adds the result to a scalar-matrix product. The operation is defined as:

C := alpha*op(A)*op(B) + beta*C

where:

op(X) is one of op(X) = X or op(X) = X^T,
alpha and beta are scalars,
A, B, and C are matrices (A and B store bfloat16 elements, C stores single-precision elements),
op(A) is an m-by-k matrix,
op(B) is a k-by-n matrix,
C is an m-by-n matrix.
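
The definition above can be written out as a plain triple loop. The following is a minimal reference sketch, not the oneMKL implementation. The function name `ref_gemm_bf16bf16f32`, the integer flags `row_major`, `trans_a`, and `trans_b`, and the helpers `bf16_to_f32` and `elem` are hypothetical, and the code assumes that a bfloat16 value is stored as the upper 16 bits of an IEEE 754 single-precision number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a bfloat16 value is the upper 16 bits of a binary32 float. */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Offset of element (i, j) of a stored matrix with leading dimension ld. */
static size_t elem(int row_major, size_t i, size_t j, size_t ld)
{
    return row_major ? i * ld + j : i + j * ld;
}

/* Hypothetical reference for C := alpha*op(A)*op(B) + beta*C. */
static void ref_gemm_bf16bf16f32(int row_major, int trans_a, int trans_b,
                                 size_t m, size_t n, size_t k, float alpha,
                                 const uint16_t *a, size_t lda,
                                 const uint16_t *b, size_t ldb,
                                 float beta, float *c, size_t ldc)
{
    assert(lda >= 1 && ldb >= 1 && ldc >= 1);
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p) {
                /* op(A)(i,p) is A(i,p), or A(p,i) when A is transposed. */
                size_t ia = trans_a ? elem(row_major, p, i, lda)
                                    : elem(row_major, i, p, lda);
                /* op(B)(p,j) is B(p,j), or B(j,p) when B is transposed. */
                size_t ib = trans_b ? elem(row_major, j, p, ldb)
                                    : elem(row_major, p, j, ldb);
                acc += bf16_to_f32(a[ia]) * bf16_to_f32(b[ib]);
            }
            size_t ic = elem(row_major, i, j, ldc);
            /* With beta == 0 the old contents of c are never read. */
            c[ic] = (beta == 0.0f) ? alpha * acc
                                   : alpha * acc + beta * c[ic];
        }
    }
}
```

A loop of this kind is useful mainly as a correctness check on small inputs; it makes no attempt to match the performance or the exact accumulation order of the library routine.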

Input Parameters

Layout

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).

transa

Specifies the form of op(A) used in the matrix multiplication:

if transa=CblasNoTrans, then op(A) = A;

if transa=CblasTrans, then op(A) = A^T.

transb

Specifies the form of op(B) used in the matrix multiplication:

if transb=CblasNoTrans, then op(B) = B;

if transb=CblasTrans, then op(B) = B^T.

m

Specifies the number of rows of the matrix op(A) and of the matrix C. The value of m must be at least zero.

n

Specifies the number of columns of the matrix op(B) and the number of columns of the matrix C. The value of n must be at least zero.

k

Specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B). The value of k must be at least zero.

alpha

Specifies the scalar alpha.

a

	transa=CblasNoTrans	transa=CblasTrans
Layout = CblasColMajor	Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.	Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.
Layout = CblasRowMajor	Array, size lda*m. Before entry, the leading k-by-m part of the array a must contain the matrix A.	Array, size lda*k. Before entry, the leading m-by-k part of the array a must contain the matrix A.

lda

Specifies the leading dimension of a as declared in the calling (sub)program.

	transa=CblasNoTrans	transa=CblasTrans
Layout = CblasColMajor	lda must be at least `max(1, m)`.	lda must be at least `max(1, k)`.
Layout = CblasRowMajor	lda must be at least `max(1, k)`.	lda must be at least `max(1, m)`.

b

	transb=CblasNoTrans	transb=CblasTrans
Layout = CblasColMajor	Array, size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B.	Array, size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B.
Layout = CblasRowMajor	Array, size ldb by k. Before entry, the leading n-by-k part of the array b must contain the matrix B.	Array, size ldb by n. Before entry, the leading k-by-n part of the array b must contain the matrix B.

ldb

Specifies the leading dimension of b as declared in the calling (sub)program.

	transb=CblasNoTrans	transb=CblasTrans
Layout = CblasColMajor	ldb must be at least `max(1, k)`.	ldb must be at least `max(1, n)`.
Layout = CblasRowMajor	ldb must be at least `max(1, n)`.	ldb must be at least `max(1, k)`.

beta

Specifies the scalar beta. When beta is equal to zero, c need not be set on input.

c

Layout = CblasColMajor	Array, size ldc by n. Before entry, the leading m-by-n part of the array c must contain the matrix `C`, except when beta is equal to zero, in which case c need not be set on entry.
Layout = CblasRowMajor	Array, size ldc by m. Before entry, the leading n-by-m part of the array c must contain the matrix `C`, except when beta is equal to zero, in which case c need not be set on entry.

ldc

Specifies the leading dimension of c as declared in the calling (sub)program.

Layout = CblasColMajor	ldc must be at least `max(1, m)`.
Layout = CblasRowMajor	ldc must be at least `max(1, n)`.
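
The leading-dimension tables for lda, ldb, and ldc follow a single rule: for column-major storage the leading dimension must cover the rows of the matrix as stored, and for row-major storage it must cover the columns. The array sizes given for a, b, and c are then the leading dimension multiplied by the other stored extent. The sketch below encodes that rule; the helper names `min_leading_dim` and `array_len` and their integer flags are hypothetical and not part of oneMKL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest valid leading dimension for an operand
   whose op(X) is op_rows-by-op_cols. For c, pass trans = 0. */
static size_t min_leading_dim(int row_major, int trans,
                              size_t op_rows, size_t op_cols)
{
    /* The stored matrix is op(X) itself, or its transpose when trans is set. */
    size_t stored_rows = trans ? op_cols : op_rows;
    size_t stored_cols = trans ? op_rows : op_cols;
    size_t extent = row_major ? stored_cols : stored_rows;
    return extent > 1 ? extent : 1;   /* max(1, extent) */
}

/* Hypothetical helper: number of elements the array must hold
   when it is declared with leading dimension ld. */
static size_t array_len(int row_major, int trans,
                        size_t op_rows, size_t op_cols, size_t ld)
{
    size_t stored_rows = trans ? op_cols : op_rows;
    size_t stored_cols = trans ? op_rows : op_cols;
    assert(ld >= min_leading_dim(row_major, trans, op_rows, op_cols));
    return ld * (row_major ? stored_rows : stored_cols);
}
```

For a, call the helpers with op_rows = m and op_cols = k; for b, with op_rows = k and op_cols = n; for c, with op_rows = m, op_cols = n, and trans = 0.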

Output Parameters

c	Overwritten by `alpha*op(A)*op(B) + beta*C`.

Example

For examples of routine usage, see these code examples in the Intel® oneAPI Math Kernel Library (oneMKL) installation directory:

cblas_gemm_bf16bf16f32: examples\cblas\source\cblas_gemm_bf16bf16f32x.c
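
Independently of the installed example, a call for a small row-major product might look like the sketch below. It is illustrative only. The function `gemm_bf16_example`, the helper `to_bf16`, and the macro `SKETCH_HAVE_MKL` are hypothetical. The stand-in typedefs and the plain-loop fallback exist only so that the sketch also compiles where mkl.h is not installed, and the `__has_include` check assumes a compiler that supports it. The conversion assumes that MKL_BF16 holds the upper 16 bits of a single-precision value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__has_include)
#  if __has_include(<mkl.h>)
#    include <mkl.h>
#    define SKETCH_HAVE_MKL 1
#  endif
#endif

#ifndef SKETCH_HAVE_MKL
/* Stand-ins so the sketch compiles without oneMKL (assumed definitions). */
typedef uint16_t MKL_BF16;
typedef int MKL_INT;
#endif

/* Assumption: MKL_BF16 holds the upper 16 bits of a binary32 float.
   Truncation is exact for the small integers used here. */
static MKL_BF16 to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (MKL_BF16)(bits >> 16);
}

/* Computes C := A*B for a row-major 2-by-3 A and 3-by-2 B into c[4]. */
static void gemm_bf16_example(float c[4])
{
    const MKL_INT m = 2, n = 2, k = 3;
    const float a_f32[6] = {1, 2, 3,
                            4, 5, 6};
    const float b_f32[6] = {1, 0,
                            0, 1,
                            1, 1};
    MKL_BF16 a[6], b[6];
    assert(c != NULL);
    for (int i = 0; i < 6; ++i) {
        a[i] = to_bf16(a_f32[i]);
        b[i] = to_bf16(b_f32[i]);
    }
#ifdef SKETCH_HAVE_MKL
    /* Row-major, no transposes: lda = k, ldb = n, ldc = n.
       beta = 0, so c need not be set on entry. */
    cblas_gemm_bf16bf16f32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                           m, n, k, 1.0f, a, k, b, n, 0.0f, c, n);
#else
    /* Fallback without oneMKL: the same product with plain loops. */
    (void)a;
    (void)b;
    for (MKL_INT i = 0; i < m; ++i) {
        for (MKL_INT j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (MKL_INT p = 0; p < k; ++p)
                acc += a_f32[i * k + p] * b_f32[p * n + j];
            c[i * n + j] = acc;
        }
    }
#endif
}
```

With oneMKL available, the program must also be linked against the library; the exact link line depends on the installation and is not covered here.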

Application Notes

On architectures without native bfloat16 hardware instructions, matrices A and B are upconverted to single precision and SGEMM is called to compute the matrix multiplication.
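
The upconversion described above is cheap because bfloat16 keeps the sign and exponent fields of single precision and only shortens the mantissa. The sketch below shows one way to convert in both directions. It is not the oneMKL code: the function names are hypothetical, the storage layout (upper 16 bits of an IEEE 754 single-precision value) is an assumption, and round-to-nearest-even in the downconversion is a common choice rather than something this reference specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Upconvert: place the 16 stored bits in the upper half of a binary32. */
static float bf16_to_f32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Downconvert with round-to-nearest-even (assumed rounding mode). */
static uint16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        /* NaN: force a mantissa bit so truncation cannot yield infinity. */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }
    /* Adding 0x7FFF plus the lowest kept bit rounds ties to even. */
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)(bits >> 16);
}

/* Upconvert a whole buffer, as a fallback path might do before a
   single-precision matrix multiplication. */
static void bf16_buffer_to_f32(const uint16_t *src, float *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i)
        dst[i] = bf16_to_f32(src[i]);
}
```

Upconversion is exact, since every bfloat16 value is representable in single precision; only the downconversion loses information.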

Parent topic: BLAS-like Extensions
