?gemm3m_batch_strided
Computes groups of matrix-matrix products with general matrices.
Syntax
call cgemm3m_batch_strided(transa, transb, m, n, k, alpha, a, lda, stridea, b, ldb, strideb, beta, c, ldc, stridec, batch_size)
call zgemm3m_batch_strided(transa, transb, m, n, k, alpha, a, lda, stridea, b, ldb, strideb, beta, c, ldc, stridec, batch_size)
Include Files
- mkl.fi
Description
The ?gemm3m_batch_strided routines perform a series of matrix-matrix operations with general matrices. They are similar to the ?gemm routine counterparts, but the ?gemm3m_batch_strided routines perform matrix-matrix operations with groups of matrices. The groups contain matrices with the same parameters.
All a matrices (respectively, b or c matrices) share the same parameters (size, leading dimension, transpose operation, and the alpha and beta scaling factors) and are stored at a constant stride stridea (respectively, strideb or stridec) from each other. The operation is defined as:
for i = 0 ... batch_size - 1
    Ai, Bi and Ci are matrices at offset i * stridea, i * strideb and i * stridec in a, b and c
    Ci = alpha * op(Ai) * op(Bi) + beta * Ci
end for
The ?gemm3m_batch_strided routines use fewer matrix multiplications than the ?gemm routines, as described in the Application Notes below.
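The batch loop defined above can be modeled with a short reference sketch. It is written in plain Python only to illustrate the indexing semantics and does not call Intel MKL. The names `op`, `load`, and `gemm_batch_strided_ref` are hypothetical helpers, and the sketch assumes column-major storage (the Fortran convention) with zero-based offsets into flat lists.

```python
def op(mat, trans):
    """Apply 'N', 'T', or 'C' to a matrix stored as a list of rows."""
    t = trans.upper()
    if t == "N":
        return [row[:] for row in mat]
    cols = [list(col) for col in zip(*mat)]  # plain transpose
    if t == "T":
        return cols
    if t == "C":
        return [[x.conjugate() for x in row] for row in cols]
    raise ValueError("trans must be 'N', 'T', or 'C'")


def load(buf, offset, ld, rows, cols):
    """Read a rows-by-cols column-major matrix with leading dimension ld."""
    return [[buf[offset + r + c * ld] for c in range(cols)] for r in range(rows)]


def gemm_batch_strided_ref(transa, transb, m, n, k, alpha, a, lda, stridea,
                           b, ldb, strideb, beta, c, ldc, stridec, batch_size):
    """Model of C_i = alpha*op(A_i)*op(B_i) + beta*C_i; updates c in place."""
    # Stored shapes before op() is applied (assumed column-major convention).
    a_rows, a_cols = (m, k) if transa.upper() == "N" else (k, m)
    b_rows, b_cols = (k, n) if transb.upper() == "N" else (n, k)
    for i in range(batch_size):
        A = op(load(a, i * stridea, lda, a_rows, a_cols), transa)  # m-by-k
        B = op(load(b, i * strideb, ldb, b_rows, b_cols), transb)  # k-by-n
        for col in range(n):
            for row in range(m):
                acc = sum(A[row][p] * B[p][col] for p in range(k))
                idx = i * stridec + row + col * ldc
                c[idx] = alpha * acc + beta * c[idx]


# Two tightly packed 2-by-2 products: lda = ldb = ldc = 2, all strides = 4.
a = [1 + 1j, 0, 0, 1 + 1j,   2, 0, 0, 2]   # A_0 = (1+i)*I, A_1 = 2*I
b = [1, 2, 3, 4,             1j, 0, 0, 1j]  # B_0 = [[1,3],[2,4]], B_1 = i*I
c = [0j] * 8
gemm_batch_strided_ref("N", "N", 2, 2, 2, 1, a, 2, 4, b, 2, 4, 0, c, 2, 4, 2)
```

After the call, the first four entries of `c` hold (1+i)*B_0 and the last four hold 2i*I, both in column-major order. A stride larger than the minimum simply leaves unused gaps between consecutive matrices.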
Input Parameters
- transa
-
CHARACTER*1.
Specifies op(A), the transposition operation applied to the matrices A.
if transa = 'N' or 'n', then op(A) = A;
if transa = 'T' or 't', then op(A) = A^T;
if transa = 'C' or 'c', then op(A) = A^H.
- transb
-
CHARACTER*1.
Specifies op(B), the transposition operation applied to the matrices B.
if transb = 'N' or 'n', then op(B) = B;
if transb = 'T' or 't', then op(B) = B^T;
if transb = 'C' or 'c', then op(B) = B^H.
- m
-
INTEGER. Number of rows of the op(A) and C matrices. Must be at least 0.
- n
-
INTEGER. Number of columns of the op(B) and C matrices. Must be at least 0.
- k
-
INTEGER. Number of columns of the op(A) matrix and number of rows of the op(B) matrix. Must be at least 0.
- alpha
-
COMPLEX for cgemm3m_batch_strided
DOUBLE COMPLEX for zgemm3m_batch_strided
Specifies the scalar alpha.
- a
-
COMPLEX for cgemm3m_batch_strided
DOUBLE COMPLEX for zgemm3m_batch_strided
Array of size at least stridea*batch_size holding the a matrices.
transa='N' or 'n': each a matrix is m-by-k.
transa='T' or 't' or 'C' or 'c': each a matrix is k-by-m.
- lda
-
INTEGER. Specifies the leading dimension of the a matrices.
transa='N' or 'n': lda must be at least max(1,m).
transa='T' or 't' or 'C' or 'c': lda must be at least max(1,k).
- stridea
-
INTEGER. Stride between two consecutive a matrices.
transa='N' or 'n': stridea must be at least lda*k.
transa='T' or 't' or 'C' or 'c': stridea must be at least lda*m.
- b
-
COMPLEX for cgemm3m_batch_strided
DOUBLE COMPLEX for zgemm3m_batch_strided
Array of size at least strideb*batch_size holding the b matrices.
transb='N' or 'n': each b matrix is k-by-n.
transb='T' or 't' or 'C' or 'c': each b matrix is n-by-k.
- ldb
-
INTEGER. Specifies the leading dimension of the b matrices.
transb='N' or 'n': ldb must be at least max(1,k).
transb='T' or 't' or 'C' or 'c': ldb must be at least max(1,n).
- strideb
-
INTEGER. Stride between two consecutive b matrices.
transb='N' or 'n': strideb must be at least ldb*n.
transb='T' or 't' or 'C' or 'c': strideb must be at least ldb*k.
- beta
-
COMPLEX for cgemm3m_batch_strided
DOUBLE COMPLEX for zgemm3m_batch_strided
Specifies the scalar beta.
- c
-
COMPLEX for cgemm3m_batch_strided
DOUBLE COMPLEX for zgemm3m_batch_strided
Array of size at least stridec*batch_size holding the c matrices.
- ldc
-
INTEGER.
Specifies the leading dimension of the c matrices.
Must be at least max(1,m).
- stridec
-
INTEGER.
Specifies the stride between two consecutive c matrices.
Must be at least ldc*n.
- batch_size
-
INTEGER.
Number of gemm computations to perform, which is also the number of a, b, and c matrices. Must be at least 0.
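The size constraints on the input parameters can be checked before a call with a small helper such as the sketch below. The function name `check_strided_batch_args` and its message format are hypothetical. The limits for ldc and stridec are the ones stated above; the limits for a and b are an assumption that follows the usual column-major BLAS convention, where the leading dimension covers the stored rows and the stride covers the leading dimension times the stored columns.

```python
def check_strided_batch_args(transa, transb, m, n, k, lda, stridea,
                             ldb, strideb, ldc, stridec, batch_size):
    """Return a list of violated constraints (empty if none). Hypothetical helper."""
    errors = []
    for name, value in (("m", m), ("n", n), ("k", k), ("batch_size", batch_size)):
        if value < 0:
            errors.append(f"{name} must be at least 0")
    for name, value in (("transa", transa), ("transb", transb)):
        if value.upper() not in ("N", "T", "C"):
            errors.append(f"{name} must be 'N', 'T', or 'C'")
    if errors:
        return errors

    # Stored shape of each a and b matrix before op() is applied
    # (assumption: column-major storage, as in Fortran BLAS).
    a_rows, a_cols = (m, k) if transa.upper() == "N" else (k, m)
    b_rows, b_cols = (k, n) if transb.upper() == "N" else (n, k)

    # (suffix, leading dimension, stored rows, stride, stored columns)
    for name, ld, rows, stride, cols in (("a", lda, a_rows, stridea, a_cols),
                                         ("b", ldb, b_rows, strideb, b_cols),
                                         ("c", ldc, m, stridec, n)):
        if ld < max(1, rows):
            errors.append(f"ld{name} must be at least {max(1, rows)}")
        if stride < ld * cols:
            errors.append(f"stride{name} must be at least {ld * cols}")
    return errors


# Tightly packed 2-by-2 batch of size 2: every constraint holds.
ok = check_strided_batch_args("N", "N", 2, 2, 2, 2, 4, 2, 4, 2, 4, 2)
# m = 3 but ldc = 2: the c matrices do not fit their leading dimension.
bad = check_strided_batch_args("N", "N", 3, 2, 2, 3, 6, 2, 4, 2, 4, 1)
```

Here `ok` is an empty list and `bad` reports the ldc violation. The arrays themselves must additionally hold at least stride*batch_size elements each.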
Output Parameters
- c
-
Array holding the batch_size updated c matrices.
Application Notes
These routines perform a complex matrix multiplication by operating separately on the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. Using three real matrix multiplications instead of four reduces the time spent in matrix multiplications by 25%, resulting in significant savings in compute time for large matrices.
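The operation count above matches the well-known "3M" formulation of a complex product. With A = A1 + iA2 and B = B1 + iB2, form T1 = A1*B1, T2 = A2*B2, and T3 = (A1 + A2)*(B1 + B2); then the real part is C1 = T1 - T2 and the imaginary part is C2 = T3 - T1 - T2. The sketch below illustrates that scheme in plain Python. It is not taken from the library's implementation, and the helper names `matmul`, `add`, and `complex_matmul_3m` are hypothetical.

```python
def matmul(x, y):
    """Naive matrix product of nested lists (rows of x times columns of y)."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def add(x, y, sign=1):
    """Elementwise x + sign*y for two matrices of equal shape."""
    return [[u + sign * v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def complex_matmul_3m(a, b):
    """Complex product using 3 real multiplications and 5 real additions."""
    a1 = [[z.real for z in row] for row in a]
    a2 = [[z.imag for z in row] for row in a]
    b1 = [[z.real for z in row] for row in b]
    b2 = [[z.imag for z in row] for row in b]

    t1 = matmul(a1, b1)                      # multiplication 1
    t2 = matmul(a2, b2)                      # multiplication 2
    t3 = matmul(add(a1, a2), add(b1, b2))    # additions 1, 2; multiplication 3

    c1 = add(t1, t2, -1)                     # addition 3: real part
    c2 = add(add(t3, t1, -1), t2, -1)        # additions 4, 5: imaginary part
    return [[complex(re, im) for re, im in zip(r1, r2)]
            for r1, r2 in zip(c1, c2)]


a = [[1 + 2j, 3 - 1j], [0 + 1j, 2 + 0j]]
b = [[2 - 1j, 0 + 1j], [1 + 1j, 4 + 0j]]
c_3m = complex_matmul_3m(a, b)
c_direct = matmul(a, b)  # conventional complex arithmetic, for comparison
```

For these small integer-valued inputs the two results agree exactly. In general floating-point use, the imaginary part computed this way can carry a somewhat larger rounding error than the conventional product.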
If the errors in the floating-point calculations satisfy the following conditions:
fl(x op y) = (x op y)(1+δ), |δ| ≤ u, op = ×, /
fl(x ± y) = x(1+α) ± y(1+β), |α|, |β| ≤ u
then for an n-by-n matrix Ĉ = fl(C1 + iC2) = fl((A1 + iA2)(B1 + iB2)) = Ĉ1 + iĈ2, the following bounds are satisfied:
‖Ĉ1 - C1‖ ≤ 2(n+1)u‖A‖∞‖B‖∞ + O(u^2),
‖Ĉ2 - C2‖ ≤ 4(n+4)u‖A‖∞‖B‖∞ + O(u^2),
where ‖A‖∞ = max(‖A1‖∞, ‖A2‖∞), and ‖B‖∞ = max(‖B1‖∞, ‖B2‖∞).
Thus the corresponding matrix multiplications are stable.