trsm_batch

oneMKL - Data Parallel C++ Developer Reference

ID 772045

Date 3/31/2021

Computes a group of trsm operations.

Description

The trsm_batch routines are batched versions of trsm, performing multiple trsm operations in a single call. Each trsm solves an equation of the form op(A) * X = alpha * B or X * op(A) = alpha * B.

trsm_batch supports the following precisions:

T
`float`
`double`
`std::complex<float>`
`std::complex<double>`

trsm_batch (Buffer Version)

The buffer version of trsm_batch supports only the strided API.

Strided API

Strided API operation is defined as:

for i = 0 … batch_size – 1
    A and B are matrices at offset i * stridea and i * strideb in a and b.
    if (left_right == side::left) then
        compute X such that op(A) * X = alpha * B
    else
        compute X such that X * op(A) = alpha * B
    end if
    B = X
end for

where:

op(A) is one of op(A) = A, or op(A) = A^T, or op(A) = A^H
alpha is a scalar
A is an m x m triangular matrix if left_right = side::left, or an n x n triangular matrix if left_right = side::right
B and X are m x n general matrices

On return, matrix B is overwritten by solution matrix X.

For the strided API, the a and b buffers contain all the input matrices. The stride between consecutive matrices is given by the stride parameters, and the total number of matrices in each buffer is given by the batch_size parameter.
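The strided loop above can be illustrated with a plain C++ reference sketch that does not call oneMKL. It covers only one case, and that case is an assumption chosen for brevity: column-major storage, side::left, lower-triangular A, no transposition, and a non-unit diagonal. The name `trsm_batch_strided_ref` is hypothetical and std::vector stands in for sycl::buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference sketch, not the oneMKL implementation.
// Assumes: column-major layout, side::left, lower-triangular A,
// no transposition, non-unit diagonal.
// For each i, solves A_i * X_i = alpha * B_i and overwrites B_i with X_i.
void trsm_batch_strided_ref(std::int64_t m, std::int64_t n, double alpha,
                            const std::vector<double> &a, std::int64_t lda,
                            std::int64_t stridea,
                            std::vector<double> &b, std::int64_t ldb,
                            std::int64_t strideb, std::int64_t batch_size) {
    assert(lda >= m && ldb >= m);  // leading-dimension rules for this case
    for (std::int64_t i = 0; i < batch_size; ++i) {
        const double *A = a.data() + i * stridea;  // matrix i in a
        double *B = b.data() + i * strideb;        // matrix i in b
        for (std::int64_t col = 0; col < n; ++col) {
            // Forward substitution on one column of B.
            for (std::int64_t row = 0; row < m; ++row) {
                double sum = alpha * B[row + col * ldb];
                for (std::int64_t k = 0; k < row; ++k) {
                    // B[k] already holds X[k] for k < row.
                    sum -= A[row + k * lda] * B[k + col * ldb];
                }
                B[row + col * ldb] = sum / A[row + row * lda];
            }
        }
    }
}
```

With two 2 x 2 systems stored back to back (stridea = 4, strideb = 2), a single call solves both and leaves the solutions in b.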

Syntax

namespace oneapi::mkl::blas::column_major {
    void trsm_batch(sycl::queue &queue,
                    oneapi::mkl::side left_right,
                    oneapi::mkl::uplo upper_lower,
                    oneapi::mkl::transpose trans,
                    oneapi::mkl::diag unit_diag,
                    std::int64_t m,
                    std::int64_t n,
                    T alpha,
                    sycl::buffer<T,1> &a,
                    std::int64_t lda,
                    std::int64_t stridea,
                    sycl::buffer<T,1> &b,
                    std::int64_t ldb,
                    std::int64_t strideb,
                    std::int64_t batch_size,
                    compute_mode mode = compute_mode::unset)
}

namespace oneapi::mkl::blas::row_major {
    void trsm_batch(sycl::queue &queue,
                    oneapi::mkl::side left_right,
                    oneapi::mkl::uplo upper_lower,
                    oneapi::mkl::transpose trans,
                    oneapi::mkl::diag unit_diag,
                    std::int64_t m,
                    std::int64_t n,
                    T alpha,
                    sycl::buffer<T,1> &a,
                    std::int64_t lda,
                    std::int64_t stridea,
                    sycl::buffer<T,1> &b,
                    std::int64_t ldb,
                    std::int64_t strideb,
                    std::int64_t batch_size,
                    compute_mode mode = compute_mode::unset)
}

Input Parameters

queue: The queue where the routine should be executed.
left_right: Specifies whether matrices A are on the left side or right side of the multiplication. See Data Types for more details.
upper_lower: Specifies whether matrices A are upper or lower triangular. See Data Types for more details.
trans: Specifies op(A), transposition operation applied to matrices A. See Data Types for more details.
unit_diag: Specifies whether matrices A are unit triangular or not. See Data Types for more details.
m: Number of rows of matrices B. Must be at least zero.
n: Number of columns of matrices B. Must be at least zero.
alpha: Scaling factor for the solution.
a: Buffer holding input matrices A. Size of the buffer must be at least stridea * batch_size.
lda: Leading dimension of matrices A. Must be at least m if left_right = side::left or at least n if left_right = side::right. Must be positive.
stridea: Stride between two consecutive A matrices.
b: Buffer holding input/output matrices B. Size of the buffer must be at least strideb * batch_size.
ldb: Leading dimension of matrices B. Must be at least m if column major layout or at least n if row major layout is used. Must be positive.
strideb: Stride between two consecutive B matrices.
batch_size: Specifies number of triangular linear systems to solve.
mode: Optional. Compute mode settings. See Compute Modes for more details.
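The constraints in the parameter list above can be checked before a call. The following is a minimal sketch that encodes only the rules stated in this section; the enums `side_ref` and `layout_ref` and the function `trsm_batch_strided_args_ok` are hypothetical stand-ins, not part of oneMKL.

```cpp
#include <cassert>  // handy for callers that assert on the result
#include <cstdint>

// Hypothetical stand-ins for oneapi::mkl::side and for the
// column_major / row_major namespaces.
enum class side_ref { left, right };
enum class layout_ref { column_major, row_major };

// Returns true if the arguments satisfy the constraints documented above.
bool trsm_batch_strided_args_ok(layout_ref layout, side_ref left_right,
                                std::int64_t m, std::int64_t n,
                                std::int64_t lda, std::int64_t ldb,
                                std::int64_t stridea, std::int64_t strideb,
                                std::int64_t batch_size,
                                std::int64_t a_size, std::int64_t b_size) {
    if (m < 0 || n < 0) return false;        // m and n must be at least zero
    if (lda <= 0 || ldb <= 0) return false;  // leading dimensions must be positive
    // A is m x m for side::left and n x n for side::right.
    const std::int64_t k = (left_right == side_ref::left) ? m : n;
    if (lda < k) return false;
    // ldb is bounded by m (column major) or n (row major).
    const std::int64_t min_ldb = (layout == layout_ref::column_major) ? m : n;
    if (ldb < min_ldb) return false;
    // Buffers must hold at least stride * batch_size elements.
    if (a_size < stridea * batch_size) return false;
    if (b_size < strideb * batch_size) return false;
    return true;
}
```

A check like this is cheap compared with the solve itself and turns a layout mistake into an early, readable failure.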

Output Parameters

b: Output buffer overwritten by batch_size solution matrices X.

NOTE:

If alpha = 0, matrices B are set to zero, and A and B do not need to be initialized before calling trsm_batch.

trsm_batch (USM Version)

The USM version of trsm_batch supports both the group API and the strided API.

Group API

Group API operation is defined as:

idx = 0
for i = 0 … group_count – 1
    for j = 0 … group_size – 1
        A and B are matrices in a[idx] and b[idx]
        if (left_right == side::left) then
            compute X such that op(A) * X = alpha[i] * B
        else
            compute X such that X * op(A) = alpha[i] * B
        end if
        B = X
        idx = idx + 1
    end for
end for

where:

op(A) is one of op(A) = A, or op(A) = A^T, or op(A) = A^H
alpha is a scalar
A is an m x m triangular matrix if left_right = side::left, or an n x n triangular matrix if left_right = side::right
B and X are m x n general matrices

On return, matrix B is overwritten by solution matrix X.

For the group API, the a and b arrays contain the pointers for all the input matrices. The total number of matrices in a and b is given by:

total_batch_count = group_size[0] + group_size[1] + … + group_size[group_count – 1]
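The idx counter in the group pseudocode advances once per matrix, so the a and b arrays need one pointer for every matrix across all groups. A short sketch of that bookkeeping follows; both function names are hypothetical, and std::vector is used in place of raw arrays for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Number of pointers that a and b must hold: the sum of all group sizes.
std::int64_t total_batch_count(const std::vector<std::int64_t> &group_size) {
    return std::accumulate(group_size.begin(), group_size.end(),
                           std::int64_t{0});
}

// group_of[idx] is the group i whose parameters (m[i], n[i], alpha[i], ...)
// apply to the matrices a[idx] and b[idx].
std::vector<std::int64_t>
group_of_each_matrix(const std::vector<std::int64_t> &group_size) {
    std::vector<std::int64_t> group_of;
    for (std::size_t i = 0; i < group_size.size(); ++i) {
        for (std::int64_t j = 0; j < group_size[i]; ++j) {
            group_of.push_back(static_cast<std::int64_t>(i));
        }
    }
    return group_of;
}
```

An empty group contributes no pointers, which is consistent with group sizes being allowed to be zero.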

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event trsm_batch(sycl::queue &queue,
                           const oneapi::mkl::side *left_right,
                           const oneapi::mkl::uplo *upper_lower,
                           const oneapi::mkl::transpose *trans,
                           const oneapi::mkl::diag *unit_diag,
                           const std::int64_t *m,
                           const std::int64_t *n,
                           const T *alpha,
                           const T **a,
                           const std::int64_t *lda,
                           T **b,
                           const std::int64_t *ldb,
                           std::int64_t group_count,
                           const std::int64_t *group_size,
                           compute_mode mode = compute_mode::unset,
                           const std::vector<sycl::event> &dependencies = {})
}

namespace oneapi::mkl::blas::row_major {
    sycl::event trsm_batch(sycl::queue &queue,
                           const oneapi::mkl::side *left_right,
                           const oneapi::mkl::uplo *upper_lower,
                           const oneapi::mkl::transpose *trans,
                           const oneapi::mkl::diag *unit_diag,
                           const std::int64_t *m,
                           const std::int64_t *n,
                           const T *alpha,
                           const T **a,
                           const std::int64_t *lda,
                           T **b,
                           const std::int64_t *ldb,
                           std::int64_t group_count,
                           const std::int64_t *group_size,
                           compute_mode mode = compute_mode::unset,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

left_right

Array of group_count oneapi::mkl::side values. left_right[i] specifies whether matrices A are on the left side or right side of the multiplication in group i. See Data Types for more details.

upper_lower

Array of group_count oneapi::mkl::uplo values. upper_lower[i] specifies whether matrices A are upper or lower triangular in group i. See Data Types for more details.

trans

Array of group_count oneapi::mkl::transpose values. trans[i] specifies op(A), the transposition operation applied to matrices A in group i. See Data Types for more details.

unit_diag

Array of group_count oneapi::mkl::diag values. unit_diag[i] specifies whether matrices A are unit triangular or not in group i. See Data Types for more details.

m

Array of group_count integers. m[i] specifies number of rows of matrices B in group i. All entries must be at least zero.

n

Array of group_count integers. n[i] specifies number of columns of matrices B in group i. All entries must be at least zero.

alpha

Array of group_count scalar elements. alpha[i] specifies scaling factors for the solutions in group i.

a

Array of total_batch_count pointers for input matrices A. See Matrix Storage for more details.

lda

Array of group_count integers. lda[i] specifies leading dimension of matrices A in group i. Must be at least m[i] if left_right[i] = side::left or at least n[i] if left_right[i] = side::right. All entries must be positive.

b

Array of total_batch_count pointers for input/output matrices B. See Matrix Storage for more details.

ldb

Array of group_count integers. ldb[i] specifies leading dimension of matrices B in group i. Must be at least m[i] if column major layout or at least n[i] if row major layout is used. All entries must be positive.

group_count

Number of groups. Must be at least zero.

group_size

Array of group_count integers. group_size[i] specifies the number of trsm operations in group i. Each element in group_size must be at least zero.

mode

Optional. Compute mode settings. See Compute Modes for more details.

dependencies

Optional. List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

mode and dependencies may be omitted independently; it is not necessary to specify mode in order to provide dependencies.
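One way to populate the a and b pointer arrays is to carve per-matrix pointers out of a single contiguous allocation. The sketch below shows only that pointer arithmetic in plain C++; the helper `make_pointer_array` and its `elems_per_matrix` argument are hypothetical. In a real USM call, the matrices and the pointer arrays themselves would presumably need to live in memory accessible to the device (for example, memory obtained from sycl::malloc_shared), which this sketch does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds total_batch_count pointers into one allocation.
// elems_per_matrix[i] is the element count reserved for every matrix in
// group i (for a column-major A this could be lda[i] * k, where k is m[i]
// for side::left or n[i] for side::right).
template <typename T>
std::vector<T *> make_pointer_array(
        T *base,
        const std::vector<std::int64_t> &group_size,
        const std::vector<std::int64_t> &elems_per_matrix) {
    assert(group_size.size() == elems_per_matrix.size());
    std::vector<T *> ptrs;
    std::int64_t offset = 0;
    for (std::size_t i = 0; i < group_size.size(); ++i) {
        for (std::int64_t j = 0; j < group_size[i]; ++j) {
            ptrs.push_back(base + offset);  // pointer for matrix idx
            offset += elems_per_matrix[i];  // next matrix starts here
        }
    }
    return ptrs;
}
```

The resulting order matches the group pseudocode: all matrices of group 0 first, then group 1, and so on.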

Output Parameters

b: Array of pointers to output matrices B overwritten by total_batch_count solution matrices X.

NOTE:

If alpha = 0, matrices B are set to zero, and A and B do not need to be initialized before calling trsm_batch.

Return Values

Output event to wait on to ensure computation is complete.

Strided API

Strided API operation is defined as:

for i = 0 … batch_size – 1
    A and B are matrices at offset i * stridea and i * strideb in a and b.
    if (left_right == side::left) then
        compute X such that op(A) * X = alpha * B
    else
        compute X such that X * op(A) = alpha * B
    end if
    B = X
end for

where:

op(A) is one of op(A) = A, or op(A) = A^T, or op(A) = A^H
alpha is a scalar
A is an m x m triangular matrix if left_right = side::left, or an n x n triangular matrix if left_right = side::right
B and X are m x n general matrices

On return, matrix B is overwritten by solution matrix X.

For the strided API, the a and b arrays contain all the input matrices. The stride between consecutive matrices is given by the stride parameters, and the total number of matrices in each array is given by the batch_size parameter.
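The offsets used by the strided layout can be written out explicitly. The sketch below assumes the usual BLAS meaning of a leading dimension (the distance between consecutive columns in column-major storage, or between consecutive rows in row-major storage); both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Offset of element (row, col) of matrix i in a strided, column-major batch.
std::int64_t elem_offset_column_major(std::int64_t i, std::int64_t stride,
                                      std::int64_t ld, std::int64_t row,
                                      std::int64_t col) {
    assert(ld > 0);
    return i * stride + row + col * ld;
}

// Offset of element (row, col) of matrix i in a strided, row-major batch.
std::int64_t elem_offset_row_major(std::int64_t i, std::int64_t stride,
                                   std::int64_t ld, std::int64_t row,
                                   std::int64_t col) {
    assert(ld > 0);
    return i * stride + row * ld + col;
}
```

For example, element (1, 2) of matrix 2 with a stride of 10 and a leading dimension of 3 sits at offset 27 in column-major storage and at offset 25 in row-major storage.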

Syntax

namespace oneapi::mkl::blas::column_major {
    sycl::event trsm_batch(sycl::queue &queue,
                           oneapi::mkl::side left_right,
                           oneapi::mkl::uplo upper_lower,
                           oneapi::mkl::transpose trans,
                           oneapi::mkl::diag unit_diag,
                           std::int64_t m,
                           std::int64_t n,
                           T alpha,
                           const T *a,
                           std::int64_t lda,
                           std::int64_t stridea,
                           T *b,
                           std::int64_t ldb,
                           std::int64_t strideb,
                           std::int64_t batch_size,
                           compute_mode mode = compute_mode::unset,
                           const std::vector<sycl::event> &dependencies = {})
}

namespace oneapi::mkl::blas::row_major {
    sycl::event trsm_batch(sycl::queue &queue,
                           oneapi::mkl::side left_right,
                           oneapi::mkl::uplo upper_lower,
                           oneapi::mkl::transpose trans,
                           oneapi::mkl::diag unit_diag,
                           std::int64_t m,
                           std::int64_t n,
                           T alpha,
                           const T *a,
                           std::int64_t lda,
                           std::int64_t stridea,
                           T *b,
                           std::int64_t ldb,
                           std::int64_t strideb,
                           std::int64_t batch_size,
                           compute_mode mode = compute_mode::unset,
                           const std::vector<sycl::event> &dependencies = {})
}

Input Parameters

queue

The queue where the routine should be executed.

left_right

Specifies whether matrices A are on the left side or right side of the multiplication. See Data Types for more details.

upper_lower

Specifies whether matrices A are upper or lower triangular. See Data Types for more details.

trans

Specifies op(A), transposition operation applied to matrices A. See Data Types for more details.

unit_diag

Specifies whether matrices A are unit triangular or not. See Data Types for more details.

m

Number of rows of matrices B. Must be at least zero.

n

Number of columns of matrices B. Must be at least zero.

alpha

Scaling factor for the solution.

a

Pointer to input matrices A. Size of the array must be at least stridea * batch_size.

lda

Leading dimension of matrices A. Must be at least m if left_right = side::left or at least n if left_right = side::right. Must be positive.

stridea

Stride between two consecutive A matrices.

b

Pointer to input/output matrices B. Size of the array must be at least strideb * batch_size.

ldb

Leading dimension of matrices B. Must be at least m if column major layout or at least n if row major layout is used. Must be positive.

strideb

Stride between two consecutive B matrices.

batch_size

Specifies number of triangular linear systems to solve.

mode

Optional. Compute mode settings. See Compute Modes for more details.

dependencies

Optional. List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.

mode and dependencies may be omitted independently; it is not necessary to specify mode in order to provide dependencies.

Output Parameters

b: Pointer to output matrices B, overwritten by batch_size solution matrices X.

NOTE:

If alpha = 0, matrices B are set to zero, and A and B do not need to be initialized before calling trsm_batch.

Return Values

Output event to wait on to ensure computation is complete.
