Visible to Intel only — GUID: GUID-9F5CC279-96B4-4E60-B09F-40693E9916EE
Visible to Intel only — GUID: GUID-9F5CC279-96B4-4E60-B09F-40693E9916EE
omatadd_batch
Computes a group of out-of-place scaled matrix additions using general matrices.
Description
The omatadd_batch routines perform a series of out-of-place scaled matrix additions. They are similar to the omatadd routines, but the omatadd_batch routines perform matrix operations with a group of matrices.
omatadd_batch supports the following precisions:
T |
---|
float |
double |
std::complex<float> |
std::complex<double> |
omatadd_batch (Buffer Version)
Strided API
The matrices are always in a strided format for omatadd_batch. The operation is defined as:
for i = 0 … batch_size – 1
A is a matrix at offset i * stride_a in a
B is a matrix at offset i * stride_b in b
C is a matrix at offset i * stride_c in c
C = alpha * op(A) + beta * op(B)
end for
where:
op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
alpha and beta are scalars
A, B and C are matrices
The input buffers a and b contain all the input matrices, and the single output buffer c contains all the output matrices. The locations of the individual matrices within the buffer or array are given by stride lengths, while the number of matrices is given by the batch_size parameter.
In general, the a, b, and c buffers must not overlap in memory, with the exception of the following in-place operations:
a and c may point to the same memory if op(A) is non-transpose and all the A matrices within a have the same parameters as all the respective C matrices within c;
b and c may point to the same memory if op(B) is non-transpose and all the B matrices within b have the same parameters as all the respective C matrices within c.
Syntax
namespace oneapi::mkl::blas::column_major {
void omatadd_batch(sycl::queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
T alpha,
sycl::buffer<T, 1> &a,
std::int64_t lda,
std::int64_t stride_a,
T beta,
sycl::buffer<T, 1> &b,
std::int64_t ldb,
std::int64_t stride_b,
sycl::buffer<T, 1> &c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size);
}
namespace oneapi::mkl::blas::row_major {
void omatadd_batch(sycl::queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
T alpha,
sycl::buffer<T, 1> &a,
std::int64_t lda,
std::int64_t stride_a,
T beta,
sycl::buffer<T, 1> &b,
std::int64_t ldb,
std::int64_t stride_b,
sycl::buffer<T, 1> &c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size);
}
Input Parameters
- queue
-
The queue where the routine should be executed.
- transa
-
Specifies op(A), the transposition operation applied to the matrices A.
- transb
-
Specifies op(B), the transposition operation applied to the matrices B.
- m
-
Number of rows for the result matrix C. Must be at least zero.
- n
-
Number of columns for the result matrix C. Must be at least zero.
- alpha
-
Scaling factor for the matrices A.
- a
-
Buffer holding the input matrices A. If alpha is zero, a is never accessed. Otherwise it must have size at least stride_a*batch_size.
- lda
-
Leading dimension of the A matrices. Must be positive and satisfy:
transa = transpose::nontrans
transa = transpose::trans or transa = transpose::conjtrans
Column major
Must be at least m
Must be at least n
Row major
Must be at least n
Must be at least m
- stride_a
-
Stride between the different A matrices in the buffer a. Must be positive and satisfy:
transa = transpose::nontrans
transa = transpose::trans or transa = transpose::conjtrans
Column major
Must be at least lda*n
Must be at least lda*m
Row major
Must be at least lda*m
Must be at least lda*n
- beta
-
Scaling factor for the matrices B.
- b
-
Buffer holding the input matrices B. If beta is zero, b is never accessed. Otherwise it must have size at least stride_b*batch_size.
- ldb
-
Leading dimension of the B matrices. Must be positive and satisfy:
transb = transpose::nontrans
transb = transpose::trans or transb = transpose::conjtrans
Column major
Must be at least m
Must be at least n
Row major
Must be at least n
Must be at least m
- stride_b
-
Stride between the different B matrices in the buffer b. Must be positive and satisfy:
transb = transpose::nontrans
transb = transpose::trans or transb = transpose::conjtrans
Column major
Must be at least ldb*n
Must be at least ldb*m
Row major
Must be at least ldb*m
Must be at least ldb*n
- ldc
-
Leading dimension of the C matrices. If matrices are stored using column major layout, ldc must be at least m. If matrices are stored using row major layout, ldc must be at least n. Must be positive.
- stride_c
-
Stride between the different C matrices. If matrices are stored using column major layout, stride_c must be at least ldc*n. If matrices are stored using row major layout, stride_c must be at least ldc*m.
- batch_size
-
Specifies the number of input and output matrices to add. Must be at least zero.
- dependencies
-
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters
- c
-
Output buffer, overwritten by batch_size matrix addition operations of the form alpha*op(A) + beta*op(B). Must have size at least stride_c*batch_size.
omatadd_batch (USM Version)
Strided API
The matrices are always in a strided format for omatadd. The operation is defined as:
for i = 0 … batch_size – 1
A is a matrix at offset i * stride_a in a
B is a matrix at offset i * stride_b in b
C is a matrix at offset i * stride_c in c
C = alpha * op(A) + beta * op(B)
end for
where:
op(X) is one of op(X) = X, op(X) = X', or op(X) = conjg(X')
alpha and beta are scalars
A, B and C are matrices
The input buffers a and b contain all the input matrices, and the single output buffer c contains all the output matrices. The locations of the individual matrices within the buffer or array are given by stride lengths, while the number of matrices is given by the batch_size parameter.
In general, the a, b, and c buffers must not overlap in memory, with the exception of the following in-place operations:
a and c may point to the same memory if op(A) is non-transpose and all the A matrices within a have the same parameters as all the respective C matrices within c;
b and c may point to the same memory if op(B) is non-transpose and all the B matrices within b have the same parameters as all the respective C matrices within c.
Syntax
namespace oneapi::mkl::blas::column_major {
sycl::event omatadd_batch(sycl::queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
oneapi::mkl::value_or_pointer<T> alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
oneapi::mkl::value_or_pointer<T> beta,
const T *b,
std::int64_t ldb,
std::int64_t stride_b,
T *c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size,
const std::vector<sycl::event> &dependencies = {});
}
namespace oneapi::mkl::blas::row_major {
sycl::event omatadd_batch(sycl::queue &queue,
transpose transa,
transpose transb,
std::int64_t m,
std::int64_t n,
oneapi::mkl::value_or_pointer<T> alpha,
const T *a,
std::int64_t lda,
std::int64_t stride_a,
oneapi::mkl::value_or_pointer<T> beta,
const T *b,
std::int64_t ldb,
std::int64_t stride_b,
T *c,
std::int64_t ldc,
std::int64_t stride_c,
std::int64_t batch_size,
const std::vector<sycl::event> &dependencies = {});
}
Input Parameters
- queue
-
The queue where the routine should be executed.
- transa
-
Specifies op(A), the transposition operation applied to the matrices A.
- transb
-
Specifies op(B), the transposition operation applied to the matrices B.
- m
-
Number of rows for the result matrix C. Must be at least zero.
- n
-
Number of columns for the result matrix C. Must be at least zero.
- alpha
-
Scaling factor for the matrices A. See Scalar Arguments for more information on the value_or_pointer data type.
- a
-
Array holding the input matrices A. If alpha is zero, a is never accessed and may be a null pointer. Otherwise it must have size at least stride_a*batch_size.
- lda
-
Leading dimension of the A matrices. Must be positive and satisfy:
transa = transpose::nontrans
transa = transpose::trans or transa = transpose::conjtrans
Column major
Must be at least m
Must be at least n
Row major
Must be at least n
Must be at least m
- stride_a
-
Stride between the different A matrices in the array a. Must be positive and satisfy:
transa = transpose::nontrans
transa = transpose::trans or transa = transpose::conjtrans
Column major
Must be at least lda*n
Must be at least lda*m
Row major
Must be at least lda*m
Must be at least lda*n
- beta
-
Scaling factor for the matrices B. See Scalar Arguments for more information on the value_or_pointer data type.
- b
-
Array holding the input matrices B. If beta is zero, b is never accessed and may be a null pointer. Otherwise it must have size at least stride_b*batch_size.
- ldb
-
Leading dimension of the B matrices. Must be positive and satisfy:
transb = transpose::nontrans
transb = transpose::trans or transb = transpose::conjtrans
Column major
Must be at least m
Must be at least n
Row major
Must be at least n
Must be at least m
- stride_b
-
Stride between the different B matrices in the array b. Must be positive and satisfy:
transb = transpose::nontrans
transb = transpose::trans or transb = transpose::conjtrans
Column major
Must be at least ldb*n
Must be at least ldb*m
Row major
Must be at least ldb*m
Must be at least ldb*n
- ldc
-
Leading dimension of the C matrices. If matrices are stored using column major layout, ldc must be at least m. If matrices are stored using row major layout, ldc must be at least n. Must be positive.
- stride_c
-
Stride between the different C matrices. If matrices are stored using column major layout, stride_c must be at least ldc*n. If matrices are stored using row major layout, stride_c must be at least ldc*m.
- batch_size
-
Specifies the number of input and output matrices to add. Must be at least zero.
- dependencies
-
List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
Output Parameters
- c
-
Output array, overwritten by batch_size matrix addition operations of the form alpha*op(A) + beta*op(B). Must have size at least stride_c*batch_size.
Return Values
Output event to wait on to ensure computation is complete.