Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 3/22/2024

p?pbtrsv

Solves a single triangular linear system via frontsolve or backsolve where the triangular matrix is a factor of a banded matrix computed by p?pbtrf.

Syntax

call pspbtrsv(uplo, trans, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work, lwork, info)

call pdpbtrsv(uplo, trans, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work, lwork, info)

call pcpbtrsv(uplo, trans, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work, lwork, info)

call pzpbtrsv(uplo, trans, n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work, lwork, info)

Description

The p?pbtrsv routine solves a banded triangular system of linear equations

A(1:n, ja:ja+n-1)*X = B(ib:ib+n-1, 1:nrhs)

or

A(1:n, ja:ja+n-1)^T*X = B(ib:ib+n-1, 1:nrhs) for real flavors,

A(1:n, ja:ja+n-1)^H*X = B(ib:ib+n-1, 1:nrhs) for complex flavors,

where A(1:n, ja:ja+n-1) is a banded triangular matrix factor produced by the Cholesky factorization code p?pbtrf and is stored in A(1:n, ja:ja+n-1) and af. The matrix stored in A(1:n, ja:ja+n-1) is either upper or lower triangular according to uplo.

The routine p?pbtrf must be called first.
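
For illustration, a minimal double precision sketch of the factor-then-solve sequence might look as follows. It assumes uplo = 'L' (so the factorization is A = L*L^T) and that the distributed band matrix a, the right-hand sides b, the descriptors desca and descb, and the workspaces af and work have already been set up, with laf and lwork at least as large as the minimums required by both routines; all variable names here are placeholders.

! Factor A(1:n, ja:ja+n-1) = L*L^T; the factor overwrites a, fill-in is stored in af
call pdpbtrf('L', n, bw, a, ja, desca, af, laf, work, lwork, info)

! Front solve L*Y = B; Y overwrites b
call pdpbtrsv('L', 'N', n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work, lwork, info)

! Back solve L^T*X = Y; the solution X overwrites b
call pdpbtrsv('L', 'T', n, bw, nrhs, a, ja, desca, b, ib, descb, af, laf, work, lwork, info)

The value of info should be checked after each call (see info below).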

Input Parameters

uplo

(global) CHARACTER. Must be 'U' or 'L'.

If uplo = 'U', upper triangle of A(1:n, ja:ja+n-1) is stored;

If uplo = 'L', lower triangle of A(1:n, ja:ja+n-1) is stored.

trans

(global) CHARACTER. Must be 'N' or 'T' or 'C'.

If trans = 'N', solve with A(1:n, ja:ja+n-1);

If trans = 'T' or 'C' for real flavors, solve with A(1:n, ja:ja+n-1)^T.

If trans = 'C' for complex flavors, solve with the conjugate transpose A(1:n, ja:ja+n-1)^H.

n

(global) INTEGER.

The number of rows and columns to be operated on, that is, the order of the distributed submatrix A(1:n, ja:ja+n-1). n ≥ 0.

bw

(global) INTEGER.

The number of subdiagonals in 'L' or 'U'. 0 ≤ bw ≤ n-1.

nrhs

(global) INTEGER.

The number of right hand sides; the number of columns of the distributed submatrix B(ib:ib+n-1, 1:nrhs); nrhs ≥ 0.

a

(local)

REAL for pspbtrsv

DOUBLE PRECISION for pdpbtrsv

COMPLEX for pcpbtrsv

COMPLEX*16 for pzpbtrsv.

Pointer into the local memory to an array with first size lld_a ≥ (bw+1), stored in desca.

On entry, this array contains the local pieces of the n-by-n symmetric banded distributed Cholesky factor L or L^T of A(1:n, ja:ja+n-1).

This local portion is stored in the packed banded format used in LAPACK. See the Application Notes below and the ScaLAPACK manual for more detail on the format of distributed matrices.

ja

(global) INTEGER. The index in the global matrix A that points to the start of the matrix to be operated on (which may be either all of A or a submatrix of A).

desca

(global and local) INTEGER array of size dlen_. The array descriptor for the distributed matrix A.

If 1D type (dtype_a = 501), then dlen_ ≥ 7;

If 2D type (dtype_a = 1), then dlen_ ≥ 9.

Contains information on mapping of A to memory. (See ScaLAPACK manual for full description and options.)
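
For the 1D type, for example, the descriptor might be filled as in the following sketch; the entry layout shown follows the ScaLAPACK convention for dtype_a = 501 descriptors, and ictxt, nb, and lld_a are assumed to come from the caller's BLACS grid and storage setup.

integer desca(7)

desca(1) = 501      ! descriptor type: 1D block-column distribution
desca(2) = ictxt    ! BLACS context handle
desca(3) = n        ! number of columns in the global matrix A
desca(4) = nb       ! column block size
desca(5) = 0        ! process column owning the first block of columns
desca(6) = lld_a    ! local leading dimension, lld_a >= bw+1
desca(7) = 0        ! reserved

The descriptor descb for B is analogous, with dtype_b = 502 (block-row distribution), the global row count, the row block size, the source process row, and lld_b ≥ nb in the corresponding positions.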

b

(local)

REAL for pspbtrsv

DOUBLE PRECISION for pdpbtrsv

COMPLEX for pcpbtrsv

COMPLEX*16 for pzpbtrsv.

Pointer into the local memory to an array of local lead size lld_b ≥ nb.

On entry, this array contains the local pieces of the right hand sides B(ib:ib+n-1, 1:nrhs).

ib

(global) INTEGER. The row index in the global matrix B that points to the first row of the matrix to be operated on (which may be either all of B or a submatrix of B).

descb

(global and local) INTEGER array of size dlen_. The array descriptor for the distributed matrix B.

If 1D type (dtype_b = 502), then dlen_ ≥ 7;

If 2D type (dtype_b = 1), then dlen_ ≥ 9.

Contains information on mapping of B to memory. (See ScaLAPACK manual for full description and options.)

laf

(local)

INTEGER. The size of user-input auxiliary fill-in space af. Must be laf ≥ (nb+2*bw)*bw. If laf is not large enough, an error code will be returned and the minimum acceptable size will be returned in af(1).

work

(local)

REAL for pspbtrsv

DOUBLE PRECISION for pdpbtrsv

COMPLEX for pcpbtrsv

COMPLEX*16 for pzpbtrsv.

The array work is a temporary workspace array of size lwork. This space may be overwritten in between calls to routines.

lwork

(local or global) INTEGER. The size of the user-input workspace work. Must be lwork ≥ bw*nrhs. If lwork is too small, the minimal acceptable size will be returned in work(1) and an error code is returned.
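
Taken together with laf above, the minimum sizes can be computed and the arrays allocated up front, for instance as in this double precision sketch (nb is the block size recorded in desca):

double precision, allocatable :: af(:), work(:)
integer laf, lwork

laf = (nb + 2*bw)*bw    ! minimum fill-in space for af
lwork = bw*nrhs         ! minimum workspace for the solve
allocate (af(laf), work(lwork))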

Output Parameters

af

(local)

REAL for pspbtrsv

DOUBLE PRECISION for pdpbtrsv

COMPLEX for pcpbtrsv

COMPLEX*16 for pzpbtrsv.

The array af is of size laf. It contains auxiliary fill-in space. The fill-in space is created in a call to the factorization routine p?pbtrf and is stored in af. If a linear system is to be solved using p?pbtrs after the factorization routine, af must not be altered after the factorization.

b

On exit, this array contains the local piece of the solution distributed matrix X.

work(1)

On exit, work(1) contains the minimum value of lwork.

info

(local) INTEGER.

= 0: successful exit

< 0: if the i-th argument is an array and the j-th entry had an illegal value,

then info = -(i*100+j);

if the i-th argument is a scalar and had an illegal value,

then info = -i.

Application Notes

If the factorization routine and the solve routine are to be called separately to solve various sets of right-hand sides using the same coefficient matrix, the auxiliary space af must not be altered between calls to the factorization routine and the solve routine.

The best algorithm for solving banded and tridiagonal linear systems depends on a variety of parameters, especially the bandwidth. Currently, only algorithms designed for the case N/P>>bw are implemented. These algorithms go by many names, including Divide and Conquer, Partitioning, domain decomposition-type, etc.


The Divide and Conquer algorithm assumes the matrix is narrowly banded compared with the number of equations. In this situation, it is best to distribute the input matrix A one-dimensionally, with columns atomic and rows divided amongst the processes. The basic algorithm divides the banded matrix up into P pieces with one stored on each processor, and then proceeds in 2 phases for the factorization or 3 for the solution of a linear system.

  1. Local Phase: The individual pieces are factored independently and in parallel. These factors are applied to the matrix creating fill-in, which is stored in a non-inspectable way in auxiliary space af. Mathematically, this is equivalent to reordering the matrix A as P*A*P^T and then factoring the principal leading submatrix of size equal to the sum of the sizes of the matrices factored on each processor. The factors of these submatrices overwrite the corresponding parts of A in memory.

  2. Reduced System Phase: A small (bw*(P-1)) system is formed representing interaction of the larger blocks and is stored (as are its factors) in the space af. A parallel Block Cyclic Reduction algorithm is used. For a linear system, a parallel front solve followed by an analogous backsolve, both using the structure of the factored matrix, are performed.

  3. Back Substitution Phase: For a linear system, a local back substitution is performed on each processor in parallel.
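
As a rough illustration of the sizes involved in these phases, for an n-by-n band matrix with bandwidth bw distributed over p processes (a sketch; nb denotes the per-process block size):

nb = (n + p - 1)/p    ! columns of the band stored on each process (local phase)
nred = bw*(p - 1)     ! order of the reduced system handled in phase 2

The implemented algorithms target the case n/p >> bw, that is, nb much larger than bw.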

See Also