?gemm3m_batch

Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686

Date 7/13/2023

Version

Public

Computes scalar-matrix-matrix products and adds the results to scalar-matrix products for groups of general matrices.

Syntax

call cgemm3m_batch(transa_array, transb_array, m_array, n_array, k_array, alpha_array, a_array, lda_array, b_array, ldb_array, beta_array, c_array, ldc_array, group_count, group_size)

call zgemm3m_batch(transa_array, transb_array, m_array, n_array, k_array, alpha_array, a_array, lda_array, b_array, ldb_array, beta_array, c_array, ldc_array, group_count, group_size)

call cgemm3m_batch(a_array, b_array, c_array, m_array, n_array, k_array, group_size [,transa_array][,transb_array] [,alpha_array][,beta_array])

call zgemm3m_batch(a_array, b_array, c_array, m_array, n_array, k_array, group_size [,transa_array][,transb_array] [,alpha_array][,beta_array])

Include Files

mkl.fi, blas.f90

Description

The ?gemm3m_batch routines perform a series of matrix-matrix operations with general matrices. They are similar to the ?gemm3m routine counterparts, but the ?gemm3m_batch routines perform matrix-matrix operations with groups of matrices, processing a number of groups at once. The groups contain matrices with the same parameters. The ?gemm3m_batch routines use fewer matrix multiplications than the ?gemm_batch routines, as described in the Application Notes.

The operation is defined as

idx = 1
for i = 1..group_count 
     alpha and beta in alpha_array(i) and beta_array(i)
     for j = 1..group_size(i) 
          A, B, and C matrix in a_array(idx), b_array(idx), and c_array(idx)
          C := alpha*op(A)*op(B) + beta*C,
          idx = idx + 1
     end for
 end for

where:

op(X) is one of op(X) = X, or op(X) = X^T, or op(X) = X^H,

alpha and beta are scalar elements of alpha_array and beta_array,

A, B and C are matrices such that for m, n, and k which are elements of m_array, n_array, and k_array:

op(A) is an m-by-k matrix,

op(B) is a k-by-n matrix,

C is an m-by-n matrix.

A, B, and C represent matrices stored at addresses pointed to by a_array, b_array, and c_array, respectively. The number of entries in a_array, b_array, and c_array is total_batch_count = the sum of all the group_size entries.
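The grouped loop above can be modeled in a few lines of plain Python. This is only a sketch of the semantics, not the oneMKL call: the names op and gemm_batch_reference are hypothetical, matrices are nested lists of Python complex numbers rather than pointers to column-major arrays, and indexing is 0-based.

```python
# Reference model of the grouped operation (hypothetical helper names,
# not part of oneMKL). Matrices are lists of rows of complex numbers.

def op(x, trans):
    """Apply 'N' (identity), 'T' (transpose) or 'C' (conjugate transpose)."""
    t = trans.upper()
    if t == "N":
        return [row[:] for row in x]
    cols = [list(col) for col in zip(*x)]
    if t == "T":
        return cols
    if t == "C":
        return [[v.conjugate() for v in row] for row in cols]
    raise ValueError("trans must be 'N', 'T' or 'C'")


def gemm_batch_reference(transa, transb, m, n, k, alpha, a, b, beta, c, group_size):
    """Update every C in place: C := alpha*op(A)*op(B) + beta*C."""
    idx = 0  # 0-based here; the definition above counts from 1
    for g in range(len(group_size)):
        # All matrices of group g share transa, transb, m, n, k, alpha, beta.
        for _ in range(group_size[g]):
            oa, ob = op(a[idx], transa[g]), op(b[idx], transb[g])
            for r in range(m[g]):
                for col in range(n[g]):
                    acc = sum(oa[r][p] * ob[p][col] for p in range(k[g]))
                    c[idx][r][col] = alpha[g] * acc + beta[g] * c[idx][r][col]
            idx += 1
    return c
```

Note that a, b, and c are flat lists of total_batch_count matrices, while every other argument has one entry per group; idx is what ties the two numbering schemes together.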

See also gemm for a detailed description of multiplication for general matrices, and gemm_batch, the BLAS-like extension routines that perform similar batched matrix-matrix operations.

NOTE:

The ?gemm3m_batch routines perform no error checking in the Intel® oneAPI Math Kernel Library (oneMKL) Windows* single dynamic libraries.

Input Parameters

transa_array

CHARACTER*1. Array of size group_count. For the group i, transa_i = transa_array(i) specifies the form of op(A) used in the matrix multiplication:

if transa_i = 'N' or 'n', then op(A) = A;

if transa_i = 'T' or 't', then op(A) = A^T;

if transa_i = 'C' or 'c', then op(A) = A^H.

transb_array

CHARACTER*1. Array of size group_count. For the group i, transb_i = transb_array(i) specifies the form of op(B) used in the matrix multiplication:

if transb_i = 'N' or 'n', then op(B) = B;

if transb_i = 'T' or 't', then op(B) = B^T;

if transb_i = 'C' or 'c', then op(B) = B^H.

m_array

INTEGER. Array of size group_count. For the group i, m_i = m_array(i) specifies the number of rows of the matrix op(A) and of the matrix C.

The value of each element of m_array must be at least zero.

n_array

INTEGER. Array of size group_count. For the group i, n_i = n_array(i) specifies the number of columns of the matrix op(B) and the number of columns of the matrix C.

The value of each element of n_array must be at least zero.

k_array

INTEGER. Array of size group_count. For the group i, k_i = k_array(i) specifies the number of columns of the matrix op(A) and the number of rows of the matrix op(B).

The value of each element of k_array must be at least zero.

alpha_array

COMPLEX for cgemm3m_batch

DOUBLE COMPLEX for zgemm3m_batch

Array of size group_count. For the group i, alpha_array(i) specifies the scalar alpha_i.

a_array

INTEGER*8 for Intel® 64 architecture

INTEGER*4 for IA-32 architecture

Array, size total_batch_count, of pointers to arrays used to store A matrices.

lda_array

INTEGER. Array of size group_count. For the group i, lda_i = lda_array(i) specifies the leading dimension of the array storing matrix A as declared in the calling (sub)program.

When transa_i = 'N' or 'n', lda_i must be at least max(1, m_i); otherwise lda_i must be at least max(1, k_i).

b_array

INTEGER*8 for Intel® 64 architecture

INTEGER*4 for IA-32 architecture

Array, size total_batch_count, of pointers to arrays used to store B matrices.

ldb_array

INTEGER.

Array of size group_count. For the group i, ldb_i = ldb_array(i) specifies the leading dimension of the array storing matrix B as declared in the calling (sub)program.

When transb_i = 'N' or 'n', ldb_i must be at least max(1, k_i); otherwise ldb_i must be at least max(1, n_i).

beta_array

COMPLEX for cgemm3m_batch

DOUBLE COMPLEX for zgemm3m_batch

Array of size group_count. For the group i, beta_array(i) specifies the scalar beta_i.

When beta_i is equal to zero, the C matrices in group i need not be set on input.

c_array

INTEGER*8 for Intel® 64 architecture

INTEGER*4 for IA-32 architecture

Array, size total_batch_count, of pointers to arrays used to store C matrices.

ldc_array

INTEGER.

Array of size group_count. For the group i, ldc_i = ldc_array(i) specifies the leading dimension of all arrays storing matrix C in group i as declared in the calling (sub)program.

ldc_i must be at least max(1, m_i).

group_count

INTEGER.

Specifies the number of groups. Must be at least 0.

group_size

INTEGER.

Array of size group_count. The element group_size(i) specifies the number of matrices in group i. Each element in group_size must be at least 0.
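Because some library builds skip error checking for these routines, it can be worth validating the per-group arguments before the call. The following Python sketch restates the size and leading-dimension rules listed above; check_batch_args is a hypothetical helper, not a oneMKL function, and it reports groups with the 1-based numbering used in this section.

```python
def check_batch_args(transa, transb, m, n, k, lda, ldb, ldc, group_size):
    """Return total_batch_count, or raise ValueError on the first broken rule."""
    total = 0
    for g, size in enumerate(group_size):
        if min(m[g], n[g], k[g], size) < 0:
            raise ValueError(f"group {g + 1}: m, n, k and group_size must be >= 0")
        # Rows of the stored (untransposed) A and B decide the minimum
        # leading dimension; C is always stored as m-by-n.
        rows_a = m[g] if transa[g].upper() == "N" else k[g]
        rows_b = k[g] if transb[g].upper() == "N" else n[g]
        checks = (("lda", lda[g], rows_a), ("ldb", ldb[g], rows_b), ("ldc", ldc[g], m[g]))
        for name, ld, rows in checks:
            if ld < max(1, rows):
                raise ValueError(f"group {g + 1}: {name} = {ld} is below max(1, {rows})")
        total += size
    return total
```

The return value is the number of entries that a_array, b_array, and c_array must each hold.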

Output Parameters

c_array: The matrices addressed by the pointers in c_array are overwritten. Each C matrix in group i is replaced by the m_i-by-n_i matrix (alpha_i*op(A)*op(B) + beta_i*C).

BLAS 95 Interface Notes

Routines in the Fortran 95 interface have fewer arguments in the calling sequence than their FORTRAN 77 counterparts. For general conventions applied to skip redundant or reconstructible arguments, see BLAS 95 Interface Conventions.

Specific details for the gemm3m_batch interface are as follows:

a_array

Holds pointers to arrays containing matrices A of size (ma,ka) where

ka = k if transa='N',

ka = m otherwise,

ma = m if transa='N',

ma = k otherwise.

b_array

Holds pointers to arrays containing matrices B of size (mb,kb) where

kb = n if transb='N',

kb = k otherwise,

mb = k if transb='N',

mb = n otherwise.

c_array

Holds pointers to arrays containing matrices C of size (m,n).

m_array

Array indicating number of rows of matrices op(A) and C for each group.

n_array

Array indicating number of columns of matrices op(B) and C for each group.

k_array

Array indicating number of columns of matrices op(A) and number of rows of matrices op(B) for each group.

group_size

Array indicating number of matrices for each group. Each element in group_size must be at least 0.

transa_array

Array with each element set to one of 'N', 'C', or 'T'.

The default values are 'N'.

transb_array

Array with each element set to one of 'N', 'C', or 'T'.

The default values are 'N'.

alpha_array

Array of alpha values; the default value is 1.

beta_array

Array of beta values; the default value is 0.
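The size rules for the stored A, B, and C matrices can be summarized as a small function. This is a Python sketch with a hypothetical name (stored_shapes); its defaults mirror the 'N' defaults listed above for the optional transposition arguments.

```python
def stored_shapes(m, n, k, transa="N", transb="N"):
    """Return the (rows, columns) of A, B and C as stored, for one group."""
    a_shape = (m, k) if transa.upper() == "N" else (k, m)  # (ma, ka)
    b_shape = (k, n) if transb.upper() == "N" else (n, k)  # (mb, kb)
    return a_shape, b_shape, (m, n)
```

For example, with m = 2, n = 3, k = 4 and both operands untransposed, A is stored as 2-by-4 and B as 4-by-3; with transa = 'T' the stored A becomes 4-by-2.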

Application Notes

These routines perform a complex matrix multiplication by forming the real and imaginary parts of the input matrices. This uses three real matrix multiplications and five real matrix additions instead of the conventional four real matrix multiplications and two real matrix additions. The use of three real matrix multiplications reduces the time spent in matrix operations by 25%, resulting in significant savings in compute time for large matrices.
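One common way to reach three real multiplications and five real additions is shown below. This is a Python sketch of that textbook formulation, and the source does not state that oneMKL uses exactly this arrangement. The helper names are hypothetical, and A = A1 + i*A2 and B = B1 + i*B2 are given as separate real matrices (lists of rows).

```python
def matmul(x, y):
    """Plain real matrix product of two lists of rows."""
    return [[sum(x[i][p] * y[p][j] for p in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def add(x, y, sign=1):
    """Elementwise x + sign*y."""
    return [[a + sign * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def gemm3m(a1, a2, b1, b2):
    """Return (C1, C2) with C1 + i*C2 = (A1 + i*A2)(B1 + i*B2)."""
    t1 = matmul(a1, b1)                    # multiplication 1
    t2 = matmul(a2, b2)                    # multiplication 2
    t3 = matmul(add(a1, a2), add(b1, b2))  # multiplication 3, additions 1-2
    c1 = add(t1, t2, -1)                   # addition 3: real part
    c2 = add(add(t3, t1, -1), t2, -1)      # additions 4-5: imaginary part
    return c1, c2
```

The conventional method would instead compute C1 = A1*B1 - A2*B2 and C2 = A1*B2 + A2*B1, which takes four multiplications and two additions. The 3M form trades one multiplication for three extra additions, which pays off because multiplying large matrices costs far more than adding them.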

If the errors in the floating point calculations satisfy the following conditions:

fl(x op y) = (x op y)(1 + δ), |δ| ≤ u, op = ×, /,

fl(x ± y) = x(1 + α) ± y(1 + β), |α|, |β| ≤ u,

then for an n-by-n matrix Ĉ = fl(C₁ + iC₂) = fl((A₁ + iA₂)(B₁ + iB₂)) = Ĉ₁ + iĈ₂, the following bounds are satisfied:

║Ĉ₁ - C₁║ ≤ 2(n + 1)u║A║_∞║B║_∞ + O(u²),

║Ĉ₂ - C₂║ ≤ 4(n + 4)u║A║_∞║B║_∞ + O(u²),

where ║A║_∞ = max(║A₁║_∞, ║A₂║_∞), and ║B║_∞ = max(║B₁║_∞, ║B₂║_∞).

Thus the corresponding matrix multiplications are stable.
