Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/22/2024



cblas_?gemm_batch

Computes scalar-matrix-matrix products and adds the results to scalar matrix products for groups of general matrices.


void cblas_sgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const float* alpha_array, const float **a_array, const MKL_INT* lda_array, const float **b_array, const MKL_INT* ldb_array, const float* beta_array, float **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);

void cblas_dgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const double* alpha_array, const double **a_array, const MKL_INT* lda_array, const double **b_array, const MKL_INT* ldb_array, const double* beta_array, double **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);

void cblas_cgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);

void cblas_zgemm_batch (const CBLAS_LAYOUT Layout, const CBLAS_TRANSPOSE* transa_array, const CBLAS_TRANSPOSE* transb_array, const MKL_INT* m_array, const MKL_INT* n_array, const MKL_INT* k_array, const void *alpha_array, const void **a_array, const MKL_INT* lda_array, const void **b_array, const MKL_INT* ldb_array, const void *beta_array, void **c_array, const MKL_INT* ldc_array, const MKL_INT group_count, const MKL_INT* group_size);

Include Files

  • mkl.h


The ?gemm_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to their ?gemm counterparts, but they operate on groups of matrices and process a number of groups in a single call. All matrices within a group share the same parameters.

The operation is defined as

idx = 0
for i = 0..group_count - 1
     alpha and beta in alpha_array[i] and beta_array[i]
     for j = 0..group_size[i] - 1
          A, B, and C matrix in a_array[idx], b_array[idx], and c_array[idx]
          C := alpha*op(A)*op(B) + beta*C,
          idx = idx + 1
     end for
end for


op(X) is one of op(X) = X, op(X) = X^T, or op(X) = X^H,

alpha and beta are scalar elements of alpha_array and beta_array,

A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:

op(A) is an m-by-k matrix,

op(B) is a k-by-n matrix,

C is an m-by-n matrix.

A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array, respectively. Each of a_array, b_array, and c_array has total_batch_count entries, where total_batch_count is the sum of all group_size entries.
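The loop in the operation definition can be sketched in plain C. This is a minimal reference sketch, not the oneMKL implementation: it assumes column-major storage and real double-precision data, and the names ref_trans, op_elem, and ref_dgemm_batch are hypothetical stand-ins chosen so that the code builds without mkl.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CBLAS_TRANSPOSE (real data, so no conjugate case). */
typedef enum { REF_NO_TRANS, REF_TRANS } ref_trans;

/* Element (r, c) of op(X); X is column-major with leading dimension ld. */
static double op_elem(const double *x, int ld, ref_trans t, int r, int c)
{
    return (t == REF_NO_TRANS) ? x[r + (size_t)c * ld] : x[c + (size_t)r * ld];
}

/* Naive grouped batch loop: C := alpha*op(A)*op(B) + beta*C for every matrix. */
void ref_dgemm_batch(const ref_trans *transa_array, const ref_trans *transb_array,
                     const int *m_array, const int *n_array, const int *k_array,
                     const double *alpha_array, const double **a_array,
                     const int *lda_array, const double **b_array,
                     const int *ldb_array, const double *beta_array,
                     double **c_array, const int *ldc_array,
                     int group_count, const int *group_size)
{
    assert(group_count >= 0);
    size_t idx = 0; /* running index into a_array, b_array, and c_array */
    for (int i = 0; i < group_count; ++i) {
        /* Parameters shared by every matrix in group i. */
        const int m = m_array[i], n = n_array[i], k = k_array[i];
        const double alpha = alpha_array[i], beta = beta_array[i];
        for (int j = 0; j < group_size[i]; ++j, ++idx) {
            const double *A = a_array[idx];
            const double *B = b_array[idx];
            double *C = c_array[idx];
            for (int col = 0; col < n; ++col) {
                for (int row = 0; row < m; ++row) {
                    double sum = 0.0;
                    for (int p = 0; p < k; ++p)
                        sum += op_elem(A, lda_array[i], transa_array[i], row, p)
                             * op_elem(B, ldb_array[i], transb_array[i], p, col);
                    double *cij = &C[row + (size_t)col * ldc_array[i]];
                    /* With beta == 0 the old value of C is never read. */
                    *cij = alpha * sum + (beta == 0.0 ? 0.0 : beta * *cij);
                }
            }
        }
    }
}
```

The sketch is only meant to make the indexing explicit: idx advances across all groups, while the per-group parameters are read once per group. In real code the same per-group arrays and pointer arrays would be passed to cblas_dgemm_batch with Layout = CblasColMajor.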

See also ?gemm for a detailed description of multiplication for general matrices, and ?gemm3m_batch, a BLAS-like extension routine for similar matrix-matrix operations.


Error checking is not performed for oneMKL Windows* single dynamic libraries for the ?gemm_batch routines.

Input Parameters


Layout

Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).


transa_array

Array of size group_count. For the group i, transa_i = transa_array[i] specifies the form of op(A) used in the matrix multiplication:

if transa_i = CblasNoTrans, then op(A) = A;

if transa_i = CblasTrans, then op(A) = A^T;

if transa_i = CblasConjTrans, then op(A) = A^H.


transb_array

Array of size group_count. For the group i, transb_i = transb_array[i] specifies the form of op(B) used in the matrix multiplication:

if transb_i = CblasNoTrans, then op(B) = B;

if transb_i = CblasTrans, then op(B) = B^T;

if transb_i = CblasConjTrans, then op(B) = B^H.


m_array

Array of size group_count. For the group i, m_i = m_array[i] specifies the number of rows of the matrix op(A) and of the matrix C.

The value of each element of m_array must be at least zero.


n_array

Array of size group_count. For the group i, n_i = n_array[i] specifies the number of columns of the matrix op(B) and of the matrix C.

The value of each element of n_array must be at least zero.


k_array

Array of size group_count. For the group i, k_i = k_array[i] specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

The value of each element of k_array must be at least zero.


alpha_array

Array of size group_count. For the group i, alpha_array[i] specifies the scalar alpha_i.


a_array

Array of size total_batch_count, containing pointers to the arrays used to store the A matrices.
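When the matrices of a batch live in one contiguous buffer, the pointer array can be filled with a short loop. The following is a minimal sketch under that assumption; the helper name fill_matrix_pointers and the stride parameter (number of elements between consecutive matrices) are hypothetical and not part of oneMKL.

```c
#include <stddef.h>

/* Point ptrs[i] at the i-th matrix inside one contiguous float buffer.
   stride is the element distance between consecutive matrices; contiguous
   storage is an assumption of this sketch, not a requirement of the routine. */
void fill_matrix_pointers(const float *base, size_t stride, size_t count,
                          const float **ptrs)
{
    for (size_t i = 0; i < count; ++i)
        ptrs[i] = base + i * stride;
}
```

The resulting array has the const float ** type that cblas_sgemm_batch expects for a_array and b_array. The matrices do not have to be contiguous; any set of valid addresses can be stored in the pointer array.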


lda_array

Array of size group_count. For the group i, lda_i = lda_array[i] specifies the leading dimension of the array storing matrix A as declared in the calling (sub)program.

When Layout = CblasColMajor, lda_i must be at least max(1, m_i) if transa_i = CblasNoTrans, and at least max(1, k_i) if transa_i = CblasTrans or transa_i = CblasConjTrans.

When Layout = CblasRowMajor, lda_i must be at least max(1, k_i) if transa_i = CblasNoTrans, and at least max(1, m_i) if transa_i = CblasTrans or transa_i = CblasConjTrans.


b_array

Array of size total_batch_count, containing pointers to the arrays used to store the B matrices.


ldb_array

Array of size group_count. For the group i, ldb_i = ldb_array[i] specifies the leading dimension of the array storing matrix B as declared in the calling (sub)program.

When Layout = CblasColMajor, ldb_i must be at least max(1, k_i) if transb_i = CblasNoTrans, and at least max(1, n_i) if transb_i = CblasTrans or transb_i = CblasConjTrans.

When Layout = CblasRowMajor, ldb_i must be at least max(1, n_i) if transb_i = CblasNoTrans, and at least max(1, k_i) if transb_i = CblasTrans or transb_i = CblasConjTrans.


beta_array

Array of size group_count. For the group i, beta_array[i] specifies the scalar beta_i.

When beta_i is equal to zero, the C matrices in group i need not be set on input.


c_array

Array of size total_batch_count, containing pointers to the arrays used to store the C matrices.


ldc_array

Array of size group_count. For the group i, ldc_i = ldc_array[i] specifies the leading dimension of all arrays storing matrix C in group i as declared in the calling (sub)program.

When Layout = CblasColMajor, ldc_i must be at least max(1, m_i).

When Layout = CblasRowMajor, ldc_i must be at least max(1, n_i).
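The leading-dimension rules for lda_array, ldb_array, and ldc_array all follow one pattern: the leading dimension must cover the stored row count in column-major layout and the stored column count in row-major layout. The helpers below are a minimal sketch of that rule. The enums ld_layout and ld_trans and the function names are hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE, used so that the code builds without mkl.h.

```c
/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE. */
typedef enum { LD_ROW_MAJOR, LD_COL_MAJOR } ld_layout;
typedef enum { LD_NO_TRANS, LD_TRANS, LD_CONJ_TRANS } ld_trans;

static int max_int(int a, int b) { return a > b ? a : b; }

/* Minimum leading dimension for a matrix stored as rows-by-cols. */
static int min_ld(ld_layout layout, int rows, int cols)
{
    return max_int(1, layout == LD_COL_MAJOR ? rows : cols);
}

/* op(A) is m-by-k, so A is stored m-by-k if not transposed, else k-by-m. */
int min_lda(ld_layout layout, ld_trans transa, int m, int k)
{
    return (transa == LD_NO_TRANS) ? min_ld(layout, m, k) : min_ld(layout, k, m);
}

/* op(B) is k-by-n, so B is stored k-by-n if not transposed, else n-by-k. */
int min_ldb(ld_layout layout, ld_trans transb, int k, int n)
{
    return (transb == LD_NO_TRANS) ? min_ld(layout, k, n) : min_ld(layout, n, k);
}

/* C is always stored m-by-n. */
int min_ldc(ld_layout layout, int m, int n)
{
    return min_ld(layout, m, n);
}
```

A caller could compare each lda_array[i], ldb_array[i], and ldc_array[i] against these minimums before invoking the batch routine, which may be useful where the library itself performs no error checking.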


group_count

Specifies the number of groups. Must be at least 0.


group_size

Array of size group_count. The element group_size[i] specifies the number of matrices in group i. Each element in group_size must be at least 0.
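Because the pointer arrays are indexed by a single running index across all groups, it can help to compute total_batch_count together with the index of each group's first matrix. The helper below is a minimal sketch; its name and the optional group_start output are hypothetical and not part of oneMKL.

```c
#include <stddef.h>

/* Returns the sum of all group_size entries. If group_start is non-NULL,
   group_start[i] receives the index of group i's first matrix in
   a_array, b_array, and c_array. */
long total_batch_count(int group_count, const int *group_size, long *group_start)
{
    long total = 0;
    for (int i = 0; i < group_count; ++i) {
        if (group_start != NULL)
            group_start[i] = total;
        total += group_size[i];
    }
    return total;
}
```

A group with group_size[i] equal to 0 contributes no matrices, so it shares its start index with the next group.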

Output Parameters


c_array

Output buffer, overwritten by the results of the total_batch_count matrix multiply operations of the form alpha*op(A)*op(B) + beta*C.