Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 10/31/2024
Public
?gemm3m_batch_strided

Computes groups of matrix-matrix products with general matrices.

Syntax

call cgemm3m_batch_strided(transa, transb, m, n, k, alpha, a, lda, stridea, b, ldb, strideb, beta, c, ldc, stridec, batch_size)

call zgemm3m_batch_strided(transa, transb, m, n, k, alpha, a, lda, stridea, b, ldb, strideb, beta, c, ldc, stridec, batch_size)

Include Files

  • mkl.fi

Description

The ?gemm3m_batch_strided routines perform a series of matrix-matrix operations with general matrices. They are similar to the ?gemm routine counterparts, but the ?gemm3m_batch_strided routines perform matrix-matrix operations with groups of matrices. The groups contain matrices with the same parameters.

All a matrices (respectively, b or c matrices) have the same parameters (size, leading dimension, transpose operation, alpha and beta scaling) and are stored at a constant stride stridea (respectively, strideb or stridec) from each other. The operation is defined as:

For i = 0 … batch_size – 1
    Ai, Bi and Ci are matrices at offset i * stridea, i * strideb and i * stridec in a, b and c
    Ci = alpha * op(Ai) * op(Bi) + beta * Ci
end for
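The loop above can be sketched in plain Fortran without calling the library. This is only a reference model of the semantics, not the library's implementation: it assumes transa = transb = 'N', tightly packed column-major storage (lda = m, ldb = k, ldc = m), and a hypothetical helper named ref_gemm_nn that uses the intrinsic matmul.

```fortran
program strided_batch_reference
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 2, n = 2, k = 3, batch_size = 2
  ! Assumption: no transposition and tightly packed matrices.
  integer, parameter :: lda = m, ldb = k, ldc = m
  integer, parameter :: stridea = lda*k, strideb = ldb*n, stridec = ldc*n
  complex(dp) :: a(stridea*batch_size), b(strideb*batch_size)
  complex(dp) :: c(stridec*batch_size)
  complex(dp) :: alpha, beta
  integer :: i, j

  alpha = (2.0_dp, 0.0_dp)
  beta  = (1.0_dp, 0.0_dp)
  a = [(cmplx(j, 0, kind=dp), j = 1, size(a))]
  b = (0.0_dp, 1.0_dp)
  c = (1.0_dp, 0.0_dp)

  ! Offsets in the definition are zero-based; Fortran arrays are one-based,
  ! so matrix i starts at element i*stride + 1 of the flat array.
  do i = 0, batch_size - 1
     call ref_gemm_nn(alpha, a(i*stridea + 1), b(i*strideb + 1), &
                      beta, c(i*stridec + 1))
  end do

  ! One complex value (real part, imaginary part) per output line.
  print '(2(f8.2,1x))', c

contains

  ! Hypothetical helper: Ci = alpha*Ai*Bi + beta*Ci for one batch entry.
  subroutine ref_gemm_nn(alpha, ai, bi, beta, ci)
    complex(dp), intent(in)    :: alpha, beta
    complex(dp), intent(in)    :: ai(lda,*), bi(ldb,*)
    complex(dp), intent(inout) :: ci(ldc,*)
    ci(1:m,1:n) = alpha*matmul(ai(1:m,1:k), bi(1:k,1:n)) + beta*ci(1:m,1:n)
  end subroutine ref_gemm_nn

end program strided_batch_reference
```

Passing the array element a(i*stridea + 1) to an assumed-size dummy argument relies on Fortran sequence association, which is the same mechanism that lets a single flat array hold every matrix of the batch.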

The ?gemm3m_batch_strided routines use fewer matrix multiplications than the ?gemm routines, as described in the Application Notes below.

Input Parameters

transa

CHARACTER*1.

Specifies op(A), the transposition operation applied to the A matrices.

if transa = 'N' or 'n' , then op(A) = A;

if transa = 'T' or 't' , then op(A) = A^T;

if transa = 'C' or 'c' , then op(A) = A^H.

transb

CHARACTER*1.

Specifies op(B), the transposition operation applied to the B matrices.

if transb = 'N' or 'n' , then op(B) = B;

if transb = 'T' or 't' , then op(B) = B^T;

if transb = 'C' or 'c' , then op(B) = B^H.

m

INTEGER. Number of rows of the op(A) and C matrices. Must be at least 0.

n

INTEGER. Number of columns of the op(B) and C matrices. Must be at least 0.

k

INTEGER. Number of columns of the op(A) matrix and number of rows of the op(B) matrix. Must be at least 0.

alpha

COMPLEX for cgemm3m_batch_strided

DOUBLE COMPLEX for zgemm3m_batch_strided

Specifies the scalar alpha.

a

COMPLEX for cgemm3m_batch_strided

DOUBLE COMPLEX for zgemm3m_batch_strided

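Array of size at least stridea*batch_size holding the a matrices.

transa='N' or 'n': each a matrix is stored in an lda-by-k array whose leading m-by-k part holds the matrix.

transa='T' or 't' or 'C' or 'c': each a matrix is stored in an lda-by-m array whose leading k-by-m part holds the matrix.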

lda

INTEGER. Specifies the leading dimension of the a matrices.

transa='N' or 'n': Must be at least max(1,m).

transa='T' or 't' or 'C' or 'c': Must be at least max(1,k).

stridea

INTEGER. Stride between two consecutive a matrices.

transa='N' or 'n': Must be at least lda*k.

transa='T' or 't' or 'C' or 'c': Must be at least lda*m.

b

COMPLEX for cgemm3m_batch_strided

DOUBLE COMPLEX for zgemm3m_batch_strided

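Array of size at least strideb*batch_size holding the b matrices.

transb='N' or 'n': each b matrix is stored in an ldb-by-n array whose leading k-by-n part holds the matrix.

transb='T' or 't' or 'C' or 'c': each b matrix is stored in an ldb-by-k array whose leading n-by-k part holds the matrix.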

ldb

INTEGER. Specifies the leading dimension of the b matrices.

transb='N' or 'n': Must be at least max(1,k).

transb='T' or 't' or 'C' or 'c': Must be at least max(1,n).

strideb

INTEGER. Stride between two consecutive b matrices.

transb='N' or 'n': Must be at least ldb*n.

transb='T' or 't' or 'C' or 'c': Must be at least ldb*k.

beta

COMPLEX for cgemm3m_batch_strided

DOUBLE COMPLEX for zgemm3m_batch_strided

Specifies the scalar beta.

c

COMPLEX for cgemm3m_batch_strided

DOUBLE COMPLEX for zgemm3m_batch_strided

Array of size at least stridec*batch_size holding the c matrices.

ldc

INTEGER.

Specifies the leading dimension of the c matrices.

Must be at least max(1,m) .

stridec

INTEGER.

Specifies the stride between two consecutive c matrices.

Must be at least ldc*n .

batch_size

INTEGER.

Number of gemm computations to perform, which is also the number of a, b, and c matrices. Must be at least 0.
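The smallest valid leading dimension and stride for the a and b arrays depend on the transposition flag. The sketch below computes them under the standard column-major BLAS convention, which is assumed here rather than quoted from this page; min_ld_stride is a hypothetical helper, and rows and cols denote the dimensions of op(X), that is (m, k) for a and (k, n) for b.

```fortran
program min_layout
  implicit none
  integer :: ld, stride

  ! op(A) is m-by-k with m = 4, k = 3.
  call min_ld_stride('N', 4, 3, ld, stride)
  print *, 'transa = N: lda >=', ld, ' stridea >=', stride
  call min_ld_stride('T', 4, 3, ld, stride)
  print *, 'transa = T: lda >=', ld, ' stridea >=', stride

contains

  ! Hypothetical helper. Assumes column-major storage: the stored matrix
  ! is rows-by-cols when not transposed and cols-by-rows otherwise.
  subroutine min_ld_stride(trans, rows, cols, ld, stride)
    character(len=1), intent(in) :: trans
    integer, intent(in)  :: rows, cols
    integer, intent(out) :: ld, stride
    if (trans == 'N' .or. trans == 'n') then
       ld = max(1, rows)
       stride = ld*cols
    else
       ld = max(1, cols)
       stride = ld*rows
    end if
  end subroutine min_ld_stride

end program min_layout
```

The stride returned here corresponds to tightly packed matrices; a larger leading dimension or stride is also valid and leaves padding between columns or between consecutive matrices.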

Output Parameters

c

Array holding the batch_size updated c matrices.
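As an illustration of how the parameters fit together, the following sketch calls zgemm3m_batch_strided with the argument order given in the Syntax section and compares the result with a matmul-based reference. It assumes that mkl.fi is on the include path, that the program is linked against oneMKL with the default (LP64) integer interface, and that all matrices are tightly packed with no transposition; the sizes and fill values are arbitrary.

```fortran
program call_zgemm3m_batch_strided
  implicit none
  include 'mkl.fi'
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 4, n = 3, k = 5, batch_size = 8
  ! Assumption: transa = transb = 'N' and tightly packed storage.
  integer, parameter :: lda = m, ldb = k, ldc = m
  integer, parameter :: stridea = lda*k, strideb = ldb*n, stridec = ldc*n
  complex(dp) :: a(stridea*batch_size), b(strideb*batch_size)
  complex(dp) :: c(stridec*batch_size), cref(stridec*batch_size)
  complex(dp) :: alpha, beta
  integer :: i, j, oa, ob, oc

  alpha = (1.0_dp, 0.5_dp)
  beta  = (0.5_dp, 0.0_dp)
  a = [(cmplx(mod(j, 7), mod(j, 5), kind=dp), j = 1, size(a))]
  b = [(cmplx(mod(j, 3), -mod(j, 4), kind=dp), j = 1, size(b))]
  c = (1.0_dp, 1.0_dp)

  ! Reference result, computed one batch entry at a time with matmul.
  do i = 0, batch_size - 1
     oa = i*stridea
     ob = i*strideb
     oc = i*stridec
     cref(oc+1:oc+stridec) = reshape( &
          alpha*matmul(reshape(a(oa+1:oa+stridea), [m, k]),  &
                       reshape(b(ob+1:ob+strideb), [k, n]))  &
          + beta*reshape(c(oc+1:oc+stridec), [m, n]), [stridec])
  end do

  ! One call processes the whole batch; c is overwritten with the results.
  call zgemm3m_batch_strided('N', 'N', m, n, k, alpha, a, lda, stridea, &
                             b, ldb, strideb, beta, c, ldc, stridec,    &
                             batch_size)

  print *, 'max |c - cref| =', maxval(abs(c - cref))
end program call_zgemm3m_batch_strided
```

Because the routine uses a different sequence of floating-point operations than a conventional product, results for general data may differ from the reference by rounding error, so a tolerance-based comparison is more appropriate than an exact one.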

Application Notes

These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. Replacing four real matrix multiplications with three cuts the multiplication work by 25%, which yields significant savings in compute time for large matrices, where multiplications dominate the cost.
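The count of three multiplications and five additions matches the textbook 3M formulation, sketched below with the intrinsic matmul on real arrays. This illustrates the technique only; it is not taken from the library's source, and the names t1, t2, and t3 are chosen here for the intermediate products. With A = A1 + iA2 and B = B1 + iB2, the real part of the product is A1*B1 - A2*B2 and the imaginary part is (A1 + A2)*(B1 + B2) - A1*B1 - A2*B2.

```fortran
program three_m_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3
  real(dp) :: a1(n,n), a2(n,n), b1(n,n), b2(n,n)
  real(dp) :: t1(n,n), t2(n,n), t3(n,n), c1(n,n), c2(n,n)
  complex(dp) :: cref(n,n)

  call random_number(a1)
  call random_number(a2)
  call random_number(b1)
  call random_number(b2)

  ! Three real matrix multiplications.
  t1 = matmul(a1, b1)
  t2 = matmul(a2, b2)
  t3 = matmul(a1 + a2, b1 + b2)   ! additions 1 and 2

  ! Remaining real matrix additions.
  c1 = t1 - t2                    ! addition 3: real part
  c2 = t3 - t1 - t2               ! additions 4 and 5: imaginary part

  ! Conventional complex product for comparison.
  cref = matmul(cmplx(a1, a2, kind=dp), cmplx(b1, b2, kind=dp))
  print *, 'max difference:', maxval(abs(cmplx(c1, c2, kind=dp) - cref))
end program three_m_sketch
```

The printed difference is expected to be on the order of rounding error rather than exactly zero, because the imaginary part is formed by subtraction from t3 instead of by direct summation.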

If the errors in the floating point calculations satisfy the following conditions:

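$$\mathrm{fl}(x \,\mathrm{op}\, y) = (x \,\mathrm{op}\, y)(1+\delta), \quad |\delta| \le u, \quad \mathrm{op} = \times, /,$$
$$\mathrm{fl}(x \pm y) = x(1+\alpha) \pm y(1+\beta), \quad |\alpha|, |\beta| \le u,$$

then for an n-by-n matrix $\hat{C} = \mathrm{fl}(C_1 + iC_2) = \mathrm{fl}((A_1 + iA_2)(B_1 + iB_2)) = \hat{C}_1 + i\hat{C}_2$, the following bounds are satisfied:

$$\|\hat{C}_1 - C_1\| \le 2(n+1)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$
$$\|\hat{C}_2 - C_2\| \le 4(n+4)\,u\,\|A\|_\infty \|B\|_\infty + O(u^2),$$

where $\|A\|_\infty = \max(\|A_1\|_\infty, \|A_2\|_\infty)$ and $\|B\|_\infty = \max(\|B_1\|_\infty, \|B_2\|_\infty)$.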

Thus the corresponding matrix multiplications are stable.