omatadd_batch

oneMKL - Data Parallel C++ Developer Reference

Download PDF

ID 772045

Date 3/31/2021

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

Visible to Intel only — GUID: GUID-12F42D1B-025E-4829-AB57-EC5C84868E03

View Details

omatadd_batch

Computes a group of out-of-place scaled matrix additions using general matrices.

Description

The omatadd_batch routines perform a series of out-of-place scaled matrix additions. They are similar to the omatadd routines, but the omatadd_batch routines perform matrix operations with a group of matrices.

omatadd_batch supports the following precisions:

T
`float`
`double`
`std::complex<float>`
`std::complex<double>`

omatadd_batch (Buffer Version)

Strided API

The matrices are always in a strided format for omatadd_batch. The operation is defined as:

for i = 0 … batch_size – 1
    A is a matrix at offset i * stride_a in a
    B is a matrix at offset i * stride_b in b
    C is a matrix at offset i * stride_c in c
    C = alpha * op(A) + beta * op(B)
end for

where:

op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
alpha and beta are scalars
A, B and C are matrices

The input buffers a and b contain all the input matrices, and the single output buffer c contains all the output matrices. The locations of the individual matrices within the buffer or array are given by stride lengths, while the number of matrices is given by the batch_size parameter.

In general, the a, b, and c buffers must not overlap in memory, with the exception of the following in-place operations:

a and c may point to the same memory if op(A) is non-transpose and all the A matrices within a have the same parameters as all the respective C matrices within c;
b and c may point to the same memory if op(B) is non-transpose and all the B matrices within b have the same parameters as all the respective C matrices within c.

Syntax

namespace oneapi::mkl::blas::column_major {
    void omatadd_batch(sycl::queue &queue,
                       transpose transa,
                       transpose transb,
                       std::int64_t m,
                       std::int64_t n,
                       T alpha,
                       sycl::buffer<T, 1> &a,
                       std::int64_t lda,
                       std::int64_t stride_a,
                       T beta,
                       sycl::buffer<T, 1> &b,
                       std::int64_t ldb,
                       std::int64_t stride_b,
                       sycl::buffer<T, 1> &c,
                       std::int64_t ldc,
                       std::int64_t stride_c,
                       std::int64_t batch_size);
}

namespace oneapi::mkl::blas::row_major {
    void omatadd_batch(sycl::queue &queue,
                       transpose transa,
                       transpose transb,
                       std::int64_t m,
                       std::int64_t n,
                       T alpha,
                       sycl::buffer<T, 1> &a,
                       std::int64_t lda,
                       std::int64_t stride_a,
                       T beta,
                       sycl::buffer<T, 1> &b,
                       std::int64_t ldb,
                       std::int64_t stride_b,
                       sycl::buffer<T, 1> &c,
                       std::int64_t ldc,
                       std::int64_t stride_c,
                       std::int64_t batch_size);
}

Input Parameters

queue

The queue where the routine should be executed.

transa

Specifies op(A), the transposition operation applied to the matrices A.

transb

Specifies op(B), the transposition operation applied to the matrices B.

m

Number of rows for the result matrix C. Must be at least zero.

n

Number of columns for the result matrix C. Must be at least zero.

alpha

Scaling factor for the matrices A.

a

Buffer holding the input matrices A. Must have size at least stride_a*batch_size.

lda

Leading dimension of the A matrices. Must be positive and satisfy:

	`transa` = `transpose::nontrans`	`transa` = `transpose::trans` or `transa` = `transpose::conjtrans`
Column major	Must be at least `m`	Must be at least `n`
Row major	Must be at least `n`	Must be at least `m`

stride_a

Stride between the different A matrices in the buffer a. Must be positive and satisfy:

	`transa` = `transpose::nontrans`	`transa` = `transpose::trans` or `transa` = `transpose::conjtrans`
Column major	Must be at least `lda*n`	Must be at least `lda*m`
Row major	Must be at least `lda*m`	Must be at least `lda*n`

beta

Scaling factor for the matrices B.

b

Buffer holding the input matrices B. Must have size at least stride_b*batch_size.

ldb

Leading dimension of the B matrices. Must be positive and satisfy:

	`transb` = `transpose::nontrans`	`transb` = `transpose::trans` or `transb` = `transpose::conjtrans`
Column major	Must be at least `m`	Must be at least `n`
Row major	Must be at least `n`	Must be at least `m`

stride_b

Stride between the different B matrices in the buffer b. Must be positive and satisfy:

	`transb` = `transpose::nontrans`	`transb` = `transpose::trans` or `transb` = `transpose::conjtrans`
Column major	Must be at least `ldb*n`	Must be at least `ldb*m`
Row major	Must be at least `ldb*m`	Must be at least `ldb*n`

ldc

Leading dimension of the C matrices. If matrices are stored using column major layout, ldc must be at least m. If matrices are stored using row major layout, ldc must be at least n. Must be positive.

stride_c

Stride between the different C matrices. If matrices are stored using column major layout, stride_c must be at least ldc*n. If matrices are stored using row major layout, stride_c must be at least ldc*m.

batch_size

Specifies the number of input and output matrices to add. Must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

c: Output buffer, overwritten by batch_size matrix addition operations of the form alpha*op(A) + beta*op(B). Must have size at least stride_c*batch_size.

omatadd_batch (USM Version)

Strided API

The matrices are always in a strided format for omatadd. The operation is defined as:

for i = 0 … batch_size – 1
    A is a matrix at offset i * stride_a in a
    B is a matrix at offset i * stride_b in b
    C is a matrix at offset i * stride_c in c
    C = alpha * op(A) + beta * op(B)
end for

where:

op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
alpha and beta are scalars
A, B and C are matrices

In general, the a, b, and c buffers must not overlap in memory, with the exception of the following in-place operations:

a and c may point to the same memory if op(A) is non-transpose and all the A matrices within a have the same parameters as all the respective C matrices within c;
b and c may point to the same memory if op(B) is non-transpose and all the B matrices within b have the same parameters as all the respective C matrices within c.

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event omatadd_batch(sycl::queue &queue,
                              transpose transa,
                              transpose transb,
                              std::int64_t m,
                              std::int64_t n,
                              T alpha,
                              const T *a,
                              std::int64_t lda,
                              std::int64_t stride_a,
                              T beta,
                              const T *b,
                              std::int64_t ldb,
                              std::int64_t stride_b,
                              T *c,
                              std::int64_t ldc,
                              std::int64_t stride_c,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {});
}

namespace oneapi::mkl::blas::row_major {
    sycl::event omatadd_batch(sycl::queue &queue,
                              transpose transa,
                              transpose transb,
                              std::int64_t m,
                              std::int64_t n,
                              T alpha,
                              const T *a,
                              std::int64_t lda,
                              std::int64_t stride_a,
                              T beta,
                              const T *b,
                              std::int64_t ldb,
                              std::int64_t stride_b,
                              T *c,
                              std::int64_t ldc,
                              std::int64_t stride_c,
                              std::int64_t batch_size,
                              const std::vector<sycl::event> &dependencies = {});
}

Input Parameters

queue

The queue where the routine should be executed.

transa

Specifies op(A), the transposition operation applied to the matrices A.

transb

Specifies op(B), the transposition operation applied to the matrices B.

m

Number of rows for the result matrix C. Must be at least zero.

n

Number of columns for the result matrix C. Must be at least zero.

alpha

Scaling factor for the matrices A.

a

Array holding the input matrices A. Must have size at least stride_a*batch_size.

lda

Leading dimension of the A matrices. Must be positive and satisfy:

	`transa` = `transpose::nontrans`	`transa` = `transpose::trans` or `transa` = `transpose::conjtrans`
Column major	Must be at least `m`	Must be at least `n`
Row major	Must be at least `n`	Must be at least `m`

stride_a

Stride between the different A matrices in the array a. Must be positive and satisfy:

	`transa` = `transpose::nontrans`	`transa` = `transpose::trans` or `transa` = `transpose::conjtrans`
Column major	Must be at least `lda*n`	Must be at least `lda*m`
Row major	Must be at least `lda*m`	Must be at least `lda*n`

beta

Scaling factor for the matrices B.

b

Array holding the input matrices B. Must have size at least stride_b*batch_size.

ldb

Leading dimension of the B matrices. Must be positive and satisfy:

	`transb` = `transpose::nontrans`	`transb` = `transpose::trans` or `transb` = `transpose::conjtrans`
Column major	Must be at least `m`	Must be at least `n`
Row major	Must be at least `n`	Must be at least `m`

stride_b

Stride between the different B matrices in the array b. Must be positive and satisfy:

	`transb` = `transpose::nontrans`	`transb` = `transpose::trans` or `transb` = `transpose::conjtrans`
Column major	Must be at least `ldb*n`	Must be at least `ldb*m`
Row major	Must be at least `ldb*m`	Must be at least `ldb*n`

ldc

stride_c

batch_size

Specifies the number of input and output matrices to add. Must be at least zero.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

Output Parameters

c: Output array, overwritten by batch_size matrix addition operations of the form alpha*op(A) + beta*op(B). Must have size at least stride_c*batch_size.

Return Values

Output event to wait on to ensure computation is complete.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

oneMKL - Data Parallel C++ Developer Reference

omatadd_batch

Description