cblas_?gemm3m_batch_strided
Computes groups of matrix-matrix products with general matrices.
Syntax
void cblas_cgemm3m_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
void cblas_zgemm3m_batch_strided (const CBLAS_LAYOUT layout, const CBLAS_TRANSPOSE transa, const CBLAS_TRANSPOSE transb, const MKL_INT m, const MKL_INT n, const MKL_INT k, const void *alpha, const void *a, const MKL_INT lda, const MKL_INT stridea, const void *b, const MKL_INT ldb, const MKL_INT strideb, const void *beta, void *c, const MKL_INT ldc, const MKL_INT stridec, const MKL_INT batch_size);
Include Files
- mkl.h
Description
The cblas_?gemm3m_batch_strided routines perform a series of matrix-matrix operations with general matrices. They are similar to their cblas_?gemm counterparts, but they operate on a group of matrices that all share the same parameters.
All a matrices (respectively, b or c matrices) have the same parameters (size, leading dimension, transpose operation, alpha and beta scaling). They are stored at a constant stride, stridea (respectively, strideb or stridec), from each other. The operation is defined as
For i = 0 … batch_size - 1
    Ai, Bi and Ci are the matrices at offset i * stridea, i * strideb and i * stridec in a, b and c
    Ci = alpha * op(Ai) * op(Bi) + beta * Ci
end for
where op(Ai) and op(Bi) are selected by transa and transb.
The cblas_?gemm3m_batch_strided routines use fewer matrix multiplications than the cblas_?gemm routines, as described in the Application Notes below.
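The loop in the operation definition above can be sketched as a plain C reference implementation. It is useful for checking results or understanding index arithmetic. This is a minimal sketch under stated assumptions, not the MKL implementation:

- It uses C99 double complex in place of the const void * arguments.
- layout_t, trans_t, ref_zgemm_batch_strided and its helpers are hypothetical names chosen so the example compiles without mkl.h.
- It computes each product the conventional way rather than with the 3M method.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE (not the MKL types). */
typedef enum { ROW_MAJOR, COL_MAJOR } layout_t;
typedef enum { NO_TRANS, TRANS, CONJ_TRANS } trans_t;

/* Offset of element (r, c) in a matrix stored with leading dimension ld. */
static size_t idx(layout_t layout, long ld, long r, long c)
{
    return layout == COL_MAJOR ? (size_t)(r + c * ld) : (size_t)(r * ld + c);
}

/* Element (r, c) of op(X), where op(X) is X, X^T or X^H. */
static double complex op_elem(layout_t layout, trans_t trans,
                              const double complex *x, long ld, long r, long c)
{
    if (trans == NO_TRANS)
        return x[idx(layout, ld, r, c)];
    double complex v = x[idx(layout, ld, c, r)];  /* transpose: swap indices */
    return trans == CONJ_TRANS ? conj(v) : v;
}

/* Hypothetical reference: for i in [0, batch_size),
 *   Ci = alpha * op(Ai) * op(Bi) + beta * Ci,
 * where Ai, Bi, Ci start at a + i*stridea, b + i*strideb, c + i*stridec.
 * Uses the conventional complex product, not the 3M method. */
void ref_zgemm_batch_strided(layout_t layout, trans_t transa, trans_t transb,
                             long m, long n, long k, double complex alpha,
                             const double complex *a, long lda, long stridea,
                             const double complex *b, long ldb, long strideb,
                             double complex beta,
                             double complex *c, long ldc, long stridec,
                             long batch_size)
{
    assert(m >= 0 && n >= 0 && k >= 0 && batch_size >= 0);
    for (long i = 0; i < batch_size; ++i) {
        const double complex *ai = a + i * stridea;
        const double complex *bi = b + i * strideb;
        double complex *ci = c + i * stridec;
        for (long row = 0; row < m; ++row) {
            for (long col = 0; col < n; ++col) {
                double complex sum = 0.0;
                for (long p = 0; p < k; ++p)
                    sum += op_elem(layout, transa, ai, lda, row, p)
                         * op_elem(layout, transb, bi, ldb, p, col);
                double complex *cij = &ci[idx(layout, ldc, row, col)];
                *cij = alpha * sum + beta * *cij;
            }
        }
    }
}
```

For example, with layout = COL_MAJOR, transa = CONJ_TRANS, m = n = k = 1 and stridea = 2, the second A matrix is read from a[2], skipping the padding element a[1].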
Input Parameters
- layout
-
Specifies whether two-dimensional array storage is row-major (CblasRowMajor) or column-major (CblasColMajor).
- transa
-
Specifies op(A), the transposition operation applied to the A matrices:
if transa = CblasNoTrans, then op(A) = A;
if transa = CblasTrans, then op(A) = A^T;
if transa = CblasConjTrans, then op(A) = A^H.
- transb
-
Specifies op(B), the transposition operation applied to the B matrices:
if transb = CblasNoTrans, then op(B) = B;
if transb = CblasTrans, then op(B) = B^T;
if transb = CblasConjTrans, then op(B) = B^H.
- m
-
Number of rows of the op(A) and C matrices. Must be at least 0.
- n
-
Number of columns of the op(B) and C matrices. Must be at least 0.
- k
-
Number of columns of the op(A) matrix and number of rows of the op(B) matrix. Must be at least 0.
- alpha
-
Specifies the scalar alpha.
- a
-
Array of size at least stridea*batch_size holding the a matrices.
If layout = CblasColMajor: before entry, the leading m-by-k part (transa = CblasNoTrans) or k-by-m part (transa = CblasTrans or CblasConjTrans) of the array a + i * stridea must contain the matrix Ai.
If layout = CblasRowMajor: before entry, the leading k-by-m part (transa = CblasNoTrans) or m-by-k part (transa = CblasTrans or CblasConjTrans) of the array a + i * stridea must contain the matrix Ai.
- lda
-
Specifies the leading dimension of the a matrices.
If layout = CblasColMajor, lda must be at least max(1,m) when transa = CblasNoTrans, or at least max(1,k) when transa = CblasTrans or CblasConjTrans.
If layout = CblasRowMajor, lda must be at least max(1,k) when transa = CblasNoTrans, or at least max(1,m) when transa = CblasTrans or CblasConjTrans.
- stridea
-
Stride between two consecutive a matrices.
If layout = CblasColMajor, stridea must be at least lda*k when transa = CblasNoTrans, or at least lda*m when transa = CblasTrans or CblasConjTrans.
If layout = CblasRowMajor, stridea must be at least lda*m when transa = CblasNoTrans, or at least lda*k when transa = CblasTrans or CblasConjTrans.
- b
-
Array of size at least strideb*batch_size holding the b matrices.
If layout = CblasColMajor: before entry, the leading k-by-n part (transb = CblasNoTrans) or n-by-k part (transb = CblasTrans or CblasConjTrans) of the array b + i * strideb must contain the matrix Bi.
If layout = CblasRowMajor: before entry, the leading n-by-k part (transb = CblasNoTrans) or k-by-n part (transb = CblasTrans or CblasConjTrans) of the array b + i * strideb must contain the matrix Bi.
- ldb
-
Specifies the leading dimension of the b matrices.
If layout = CblasColMajor, ldb must be at least max(1,k) when transb = CblasNoTrans, or at least max(1,n) when transb = CblasTrans or CblasConjTrans.
If layout = CblasRowMajor, ldb must be at least max(1,n) when transb = CblasNoTrans, or at least max(1,k) when transb = CblasTrans or CblasConjTrans.
- strideb
-
Stride between two consecutive b matrices.
If layout = CblasColMajor, strideb must be at least ldb*n when transb = CblasNoTrans, or at least ldb*k when transb = CblasTrans or CblasConjTrans.
If layout = CblasRowMajor, strideb must be at least ldb*k when transb = CblasNoTrans, or at least ldb*n when transb = CblasTrans or CblasConjTrans.
- beta
-
Specifies the scalar beta.
- c
-
Array of size at least stridec*batch_size holding the c matrices.
If layout=CblasColMajor, before entry, the leading m-by-n part of the array c + i * stridec must contain the matrix Ci.
If layout=CblasRowMajor, before entry, the leading n-by-m part of the array c + i * stridec must contain the matrix Ci.
- ldc
-
Specifies the leading dimension of the c matrices.
Must be at least max(1,m) if layout=CblasColMajor or max(1,n) if layout=CblasRowMajor.
- stridec
-
Specifies the stride between two consecutive c matrices.
Must be at least ldc*n if layout=CblasColMajor or ldc*m if layout=CblasRowMajor.
- batch_size
-
Number of gemm computations to perform, which is also the number of a, b and c matrices. Must be at least 0.
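The lda/stridea and ldb/strideb constraints above follow one rule per operand. Let op() of the operand be rows-by-cols: m-by-k for A, k-by-n for B, and m-by-n for C (which is never transposed).

- Define R and C as follows. If the operand is column-major and untransposed, or row-major and transposed, then R = rows and C = cols. Otherwise the two are swapped.
- The leading dimension must then be at least max(1, R), and the stride at least ld*C.

The sketch below checks all three operands that way. It was read off the lists in this section, not taken from MKL. The function names, the enums and the nonzero return codes are hypothetical, and this is not how MKL itself reports argument errors.

```c
#include <assert.h>

/* Hypothetical stand-ins for CBLAS_LAYOUT and CBLAS_TRANSPOSE (not the MKL types). */
typedef enum { ROW_MAJOR, COL_MAJOR } layout_t;
typedef enum { NO_TRANS, TRANS, CONJ_TRANS } trans_t;

static long max1(long x) { return x > 1 ? x : 1; }

/* One operand whose op() is rows-by-cols: the array holds a leading R-by-C part,
 * so ld must be >= max(1, R) and stride >= ld * C. */
static int operand_ok(layout_t layout, trans_t trans, long rows, long cols,
                      long ld, long stride)
{
    int same = (layout == COL_MAJOR) == (trans == NO_TRANS);
    long r = same ? rows : cols;
    long c = same ? cols : rows;
    return ld >= max1(r) && stride >= ld * c;
}

/* Returns 0 if all size, leading-dimension and stride constraints hold;
 * otherwise a hypothetical code: 1 = negative size, 2 = a, 3 = b, 4 = c. */
int check_batch_strided_args(layout_t layout, trans_t transa, trans_t transb,
                             long m, long n, long k,
                             long lda, long stridea, long ldb, long strideb,
                             long ldc, long stridec, long batch_size)
{
    assert(layout == ROW_MAJOR || layout == COL_MAJOR);
    if (m < 0 || n < 0 || k < 0 || batch_size < 0)
        return 1;
    if (!operand_ok(layout, transa, m, k, lda, stridea))
        return 2;
    if (!operand_ok(layout, transb, k, n, ldb, strideb))
        return 3;
    /* c is never transposed: ldc >= max(1,m) and stridec >= ldc*n for column-major,
     * ldc >= max(1,n) and stridec >= ldc*m for row-major. */
    if (!operand_ok(layout, NO_TRANS, m, n, ldc, stridec))
        return 4;
    return 0;
}
```

For example, with layout = COL_MAJOR, no transposes, m = 3, n = 4 and k = 5, the tightest valid arguments are lda = 3, stridea = 15, ldb = 5, strideb = 20, ldc = 3 and stridec = 12.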
Output Parameters
- c
-
Array holding the batch_size updated c matrices.
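Each of the a, b and c arrays must hold at least stride*batch_size elements. As a sizing sketch, the code below allocates one batch with the tightest layout the constraints allow for column-major, untransposed matrices: ld = max(1, rows) and stride = ld*cols.

- Under those assumptions, a, b and c would use rows-by-cols shapes of m-by-k, k-by-n and m-by-n.
- The resulting data, ld and stride fields play the roles of the a, lda and stridea arguments (and their b and c equivalents).
- The struct and function names are hypothetical.
- A larger, padded stride is equally valid.

```c
#include <assert.h>
#include <complex.h>
#include <stdlib.h>

/* Hypothetical helper (not part of MKL): a strided batch of column-major,
 * untransposed rows-by-cols matrices with ld = max(1, rows), stride = ld * cols. */
typedef struct {
    double complex *data;
    long ld, stride, batch_size;
} strided_batch;

/* Allocates stride * batch_size zeroed elements; returns 0 on success, -1 on failure. */
int strided_batch_alloc(strided_batch *s, long rows, long cols, long batch_size)
{
    assert(rows >= 0 && cols >= 0 && batch_size >= 0);
    s->ld = rows > 1 ? rows : 1;
    s->stride = s->ld * cols;
    s->batch_size = batch_size;
    size_t count = (size_t)s->stride * (size_t)batch_size;
    /* Request at least one element so that calloc never sees a zero count. */
    s->data = calloc(count > 0 ? count : 1, sizeof *s->data);
    return s->data != NULL ? 0 : -1;
}

/* Pointer to matrix i of the batch, i.e. data + i * stride. */
double complex *strided_batch_get(const strided_batch *s, long i)
{
    assert(i >= 0 && i < s->batch_size);
    return s->data + i * s->stride;
}
```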
Application Notes
These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This takes three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. For large matrices, the cost of the multiplications dominates that of the additions. Using three real matrix multiplications instead of four therefore reduces the multiplication work, and roughly the time spent on it, by 25%, which can give significant savings in compute time.
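The three-multiplication scheme described above can be sketched on n-by-n real column-major matrices. Write A = A1 + iA2 and B = B1 + iB2. The real part is T1 - T2 and the imaginary part is T3 - T1 - T2, where T1 = A1*B1, T2 = A2*B2 and T3 = (A1 + A2)*(B1 + B2). This is one common arrangement, matching the count of three multiplications and five additions; MKL's internal formulation may differ. The names complex_gemm_3m and real_gemm are hypothetical, and real_gemm is a naive triple loop standing in for an optimized real GEMM.

```c
#include <assert.h>
#include <stdlib.h>

/* C = A * B for n-by-n real column-major matrices (naive stand-in for a real GEMM). */
static void real_gemm(int n, const double *A, const double *B, double *C)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int p = 0; p < n; ++p)
                s += A[i + p * n] * B[p + j * n];
            C[i + j * n] = s;
        }
}

/* (A1 + iA2)(B1 + iB2) = C1 + iC2 using three real matrix multiplications:
 *   T1 = A1*B1,  T2 = A2*B2,  T3 = (A1 + A2)*(B1 + B2)
 *   C1 = T1 - T2,  C2 = T3 - T1 - T2
 * Outputs must not alias inputs. Returns 0 on success, -1 if allocation fails. */
int complex_gemm_3m(int n, const double *A1, const double *A2,
                    const double *B1, const double *B2, double *C1, double *C2)
{
    assert(n >= 0);
    if (n == 0)
        return 0;
    size_t nn = (size_t)n * n;
    double *sa = malloc(nn * sizeof *sa), *sb = malloc(nn * sizeof *sb);
    double *t1 = malloc(nn * sizeof *t1), *t2 = malloc(nn * sizeof *t2);
    int rc = -1;
    if (sa && sb && t1 && t2) {
        for (size_t e = 0; e < nn; ++e) {
            sa[e] = A1[e] + A2[e];          /* addition 1 */
            sb[e] = B1[e] + B2[e];          /* addition 2 */
        }
        real_gemm(n, A1, B1, t1);           /* multiplication 1: T1 */
        real_gemm(n, A2, B2, t2);           /* multiplication 2: T2 */
        real_gemm(n, sa, sb, C2);           /* multiplication 3: T3, kept in C2 */
        for (size_t e = 0; e < nn; ++e) {
            C1[e] = t1[e] - t2[e];          /* addition 3 */
            C2[e] = C2[e] - t1[e] - t2[e];  /* additions 4 and 5 */
        }
        rc = 0;
    }
    free(sa); free(sb); free(t1); free(t2);
    return rc;
}
```

For example, with 1-by-1 inputs (1 + 2i)(3 + 4i), the sketch forms T1 = 3, T2 = 8 and T3 = 3 * 7 = 21. This gives C1 = -5 and C2 = 10, the same result as the conventional formula.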
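If the errors in the floating-point calculations satisfy the following conditions:

$$\mathrm{fl}(x \mathbin{\mathrm{op}} y) = (x \mathbin{\mathrm{op}} y)(1+\delta), \quad |\delta| \le u, \quad \mathrm{op} \in \{\times, /\},$$
$$\mathrm{fl}(x \pm y) = x(1+\alpha) \pm y(1+\beta), \quad |\alpha|, |\beta| \le u,$$

then for an n-by-n matrix $\hat{C} = \mathrm{fl}(C_1 + iC_2) = \mathrm{fl}\big((A_1 + iA_2)(B_1 + iB_2)\big) = \hat{C}_1 + i\hat{C}_2$, the following bounds are satisfied:

$$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$
$$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$

where $\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty)$ and $\|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty)$.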
Thus the corresponding matrix multiplications are stable.
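To get a feel for the size of these bounds, the sketch below evaluates their first-order terms and drops the O(u^2) parts. It makes the following assumptions:

- Arithmetic is IEEE double precision with round-to-nearest, so u = DBL_EPSILON / 2.
- The infinity norm is the usual maximum absolute row sum.
- norm_inf, gemm3m_error_bounds and UNIT_ROUNDOFF_DOUBLE are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <float.h>

/* Unit roundoff for IEEE double with round-to-nearest: 2^-53. */
#define UNIT_ROUNDOFF_DOUBLE (DBL_EPSILON / 2.0)

/* Infinity norm of an n-by-n real column-major matrix: maximum absolute row sum. */
double norm_inf(int n, const double *X)
{
    assert(n >= 0);
    double best = 0.0;
    for (int i = 0; i < n; ++i) {
        double row = 0.0;
        for (int j = 0; j < n; ++j) {
            double v = X[i + j * n];
            row += v < 0.0 ? -v : v;
        }
        if (row > best)
            best = row;
    }
    return best;
}

/* First-order terms of the two bounds (the O(u^2) terms are dropped), where
 * norm_a = max(||A1||inf, ||A2||inf) and norm_b = max(||B1||inf, ||B2||inf). */
void gemm3m_error_bounds(int n, double u, double norm_a, double norm_b,
                         double *bound_real, double *bound_imag)
{
    assert(n >= 0 && u >= 0.0);
    *bound_real = 2.0 * (n + 1) * u * norm_a * norm_b;
    *bound_imag = 4.0 * (n + 4) * u * norm_a * norm_b;
}
```

For example, with n = 2, norm_a = 7 and norm_b = 2, the first-order bounds are 84u for the real part and 336u for the imaginary part. The larger constant for the imaginary part reflects the extra additions in T3 - T1 - T2.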