Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/22/2024
Public

mkl_?omatadd_batch_strided

Computes a group of out-of-place scaled matrix additions using general matrices.

void mkl_somatadd_batch_strided(char ordering, char transa, char transb, size_t rows, size_t cols, float alpha, const float * A, size_t lda, size_t stridea, float beta, const float * B, size_t ldb, size_t strideb, float * C, size_t ldc, size_t stridec, size_t batch_size);

void mkl_domatadd_batch_strided(char ordering, char transa, char transb, size_t rows, size_t cols, double alpha, const double * A, size_t lda, size_t stridea, double beta, const double * B, size_t ldb, size_t strideb, double * C, size_t ldc, size_t stridec, size_t batch_size);

void mkl_comatadd_batch_strided(char ordering, char transa, char transb, size_t rows, size_t cols, MKL_Complex8 alpha, const MKL_Complex8 * A, size_t lda, size_t stridea, MKL_Complex8 beta, const MKL_Complex8 * B, size_t ldb, size_t strideb, MKL_Complex8 * C, size_t ldc, size_t stridec, size_t batch_size);

void mkl_zomatadd_batch_strided(char ordering, char transa, char transb, size_t rows, size_t cols, MKL_Complex16 alpha, const MKL_Complex16 * A, size_t lda, size_t stridea, MKL_Complex16 beta, const MKL_Complex16 * B, size_t ldb, size_t strideb, MKL_Complex16 * C, size_t ldc, size_t stridec, size_t batch_size);

Description

The mkl_?omatadd_batch_strided routines perform a series of scaled matrix additions. They are similar to the mkl_?omatadd routines, but operate on a group of matrices rather than a single pair of inputs.

The matrices A, B, and C are stored at a constant stride from each other in memory, given by the parameters stridea, strideb, and stridec. The operation is defined as:

for i = 0 … batch_size – 1
    A is a matrix at offset i * stridea in the array a
    B is a matrix at offset i * strideb in the array b
    C is a matrix at offset i * stridec in the array c
    C = alpha * op(A) + beta * op(B)
end for

where:

  • op(X) is one of op(X) = X, op(X) = X', op(X) = conjg(X) or op(X) = conjg(X').
  • alpha and beta are scalars.
  • A, B, and C are matrices.
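
The four forms of op(X) can be illustrated with a small element accessor. This is only a sketch: op_elem is a hypothetical helper, not part of the library, and it assumes column-major storage and C99 double complex data instead of the MKL_Complex16 type used in the prototypes.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper (not an MKL function): returns element (i, j) of op(X)
 * for a column-major matrix X with leading dimension ldx. The trans codes
 * mirror those accepted by transa and transb. */
static double complex op_elem(char trans, const double complex *x,
                              size_t ldx, size_t i, size_t j)
{
    switch (trans) {
    case 'N': case 'n': return x[i + j * ldx];        /* op(X) = X         */
    case 'T': case 't': return x[j + i * ldx];        /* op(X) = X'        */
    case 'R': case 'r': return conj(x[i + j * ldx]);  /* op(X) = conjg(X)  */
    case 'C': case 'c': return conj(x[j + i * ldx]);  /* op(X) = conjg(X') */
    default:
        assert(!"unknown trans code");
        return 0.0;
    }
}
```

For the transposed forms, the stored matrix has the dimensions of C swapped, so the roles of i and j are exchanged when indexing.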

The input arrays a and b contain all the input matrices, and the single output array c contains all the output matrices. The locations of the individual matrices within the array are given by stride lengths, while the number of matrices is given by the batch_size parameter.
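
The loop described above can be written out as a plain C reference. The function name ref_omatadd_batch_strided is hypothetical, and the sketch makes several simplifying assumptions: double precision, column-major storage, and only the 'N' and 'T' operations. It is meant to show how the strides and leading dimensions locate each matrix, not how the library computes the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference loop, not the library implementation.
 * Assumes column-major storage and real data, so only 'N' and 'T' apply. */
static void ref_omatadd_batch_strided(char transa, char transb,
                                      size_t rows, size_t cols,
                                      double alpha, const double *a,
                                      size_t lda, size_t stridea,
                                      double beta, const double *b,
                                      size_t ldb, size_t strideb,
                                      double *c, size_t ldc, size_t stridec,
                                      size_t batch_size)
{
    assert(transa == 'N' || transa == 'T');
    assert(transb == 'N' || transb == 'T');

    for (size_t k = 0; k < batch_size; ++k) {
        /* Matrix k of each array starts k strides past the array base. */
        const double *ak = a + k * stridea;
        const double *bk = b + k * strideb;
        double *ck = c + k * stridec;

        for (size_t j = 0; j < cols; ++j) {
            for (size_t i = 0; i < rows; ++i) {
                /* A transposed operand is stored as cols-by-rows. */
                double av = (transa == 'N') ? ak[i + j * lda] : ak[j + i * lda];
                double bv = (transb == 'N') ? bk[i + j * ldb] : bk[j + i * ldb];
                ck[i + j * ldc] = alpha * av + beta * bv;
            }
        }
    }
}
```

Calling it with rows = 2, cols = 3, transa = 'N' (lda = 2), transb = 'T' (ldb = 3), and all three strides equal to 6 processes two tightly packed matrices per array.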

In general, the a, b, and c arrays must not overlap in memory, with the exception of the following in-place operations:

  • a and c can point to the same memory if transa is non-transpose and all the A matrices within a have the same parameters as all the respective C matrices within c.

  • b and c can point to the same memory if transb is non-transpose and all the B matrices within b have the same parameters as all the respective C matrices within c.
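
The first in-place case can be sketched as follows. The helper ref_omatadd_inplace_a is hypothetical and assumes double precision, column-major storage, and a non-transposed B. It also reads "the same parameters" as meaning that A and C share one leading dimension and one stride, which is an interpretation rather than a statement from the library documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not an MKL routine: the array ac serves as both the
 * input a and the output c. Because transa is non-transpose and A and C share
 * ld and stride, element (i, j) of C is computed only from the value stored
 * at the same address, so overwriting it is safe. */
static void ref_omatadd_inplace_a(size_t rows, size_t cols,
                                  double alpha, double *ac,
                                  size_t ld, size_t stride,
                                  double beta, const double *b,
                                  size_t ldb, size_t strideb,
                                  size_t batch_size)
{
    assert(ld >= rows && ldb >= rows);

    for (size_t k = 0; k < batch_size; ++k) {
        double *ack = ac + k * stride;
        const double *bk = b + k * strideb;

        for (size_t j = 0; j < cols; ++j) {
            for (size_t i = 0; i < rows; ++i) {
                size_t idx = i + j * ld;
                ack[idx] = alpha * ack[idx] + beta * bk[i + j * ldb];
            }
        }
    }
}
```

With a transposed A, element (i, j) of C would depend on element (j, i) of A, which an earlier write may already have overwritten. This is consistent with the restriction to non-transpose operations.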

Input Parameters

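ordering
Specifies whether two-dimensional array storage is row-major or column-major. The prototypes above declare this parameter as char ordering, so a character code is expected rather than a CBLAS_LAYOUT enumerator such as CblasRowMajor or CblasColMajor. In the related mkl_?omatadd routines, 'R' or 'r' selects row-major and 'C' or 'c' selects column-major.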
transa
Specifies op(A), the transposition operation applied to the matrices A. 'N' or 'n' indicates no operation, 'T' or 't' is transposition, 'R' or 'r' is complex conjugation without transposition, and 'C' or 'c' is conjugate transposition.
transb
Specifies op(B), the transposition operation applied to the matrices B. Accepts the same values as transa.
rows
Number of rows for the result matrix C. Must be at least zero.
cols
Number of columns for the result matrix C. Must be at least zero.
alpha
Scaling factor for the matrices A.
a
Array holding the input matrices A. Must have size at least stridea*batch_size.
lda
Leading dimension of the A matrices. If matrices are stored using column major layout, lda must be at least rows if A is not transposed or cols if A is transposed. If matrices are stored using row major layout, lda must be at least cols if A is not transposed or at least rows if A is transposed. Must be positive.
stridea
Stride between the different A matrices. If matrices are stored using column major layout, stridea must be at least lda*cols if A is not transposed or at least lda*rows if A is transposed. If matrices are stored using row major layout, stridea must be at least lda*rows if A is not transposed or at least lda*cols if A is transposed.
beta
Scaling factor for the matrices B.
b
Array holding the input matrices B. Must have size at least strideb*batch_size.
ldb
Leading dimension of the B matrices. If matrices are stored using column major layout, ldb must be at least rows if B is not transposed or cols if B is transposed. If matrices are stored using row major layout, ldb must be at least cols if B is not transposed or at least rows if B is transposed. Must be positive.
strideb
Stride between the different B matrices. If matrices are stored using column major layout, strideb must be at least ldb*cols if B is not transposed or at least ldb*rows if B is transposed. If matrices are stored using row major layout, strideb must be at least ldb*rows if B is not transposed or at least ldb*cols if B is transposed.
c
Output array, overwritten by batch_size matrix addition operations of the form alpha*op(A) + beta*op(B). Must have size at least stridec*batch_size.
ldc
Leading dimension of the C matrices. If matrices are stored using column major layout, ldc must be at least rows. If matrices are stored using row major layout, ldc must be at least cols. Must be positive.
stridec
Stride between the different C matrices. If matrices are stored using column major layout, stridec must be at least ldc*cols. If matrices are stored using row major layout, stridec must be at least ldc*rows.
batch_size
Specifies the number of input and output matrices to add.
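
The minimum leading dimensions and strides can be computed with a few helpers. The names is_transposed, min_ld, and min_stride are hypothetical and not part of the library. The sketch follows the rules given for the B matrices, which apply equally to A; for C, pass 'N' as the trans code because the output is never transposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers, not MKL functions. */

/* 'T' and 'C' swap the stored dimensions; 'N' and 'R' do not. */
static bool is_transposed(char trans)
{
    return trans == 'T' || trans == 't' || trans == 'C' || trans == 'c';
}

/* Smallest valid leading dimension for a matrix whose op() result is
 * rows-by-cols. Leading dimensions must be positive, hence the floor of 1. */
static size_t min_ld(bool col_major, char trans, size_t rows, size_t cols)
{
    bool t = is_transposed(trans);
    size_t ld = col_major ? (t ? cols : rows) : (t ? rows : cols);
    return ld > 0 ? ld : 1;
}

/* Smallest valid stride between consecutive matrices: the leading dimension
 * times the extent of the other stored dimension. */
static size_t min_stride(bool col_major, char trans,
                         size_t rows, size_t cols, size_t ld)
{
    assert(ld > 0);
    bool t = is_transposed(trans);
    size_t other = col_major ? (t ? rows : cols) : (t ? cols : rows);
    return ld * other;
}
```

Multiplying the chosen stride by batch_size gives the minimum number of elements each array must hold.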

Output Parameters

c
Array holding the updated matrices C.