Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 10/31/2024
Public

p?laqr1

Computes the Schur decomposition and/or eigenvalues of a matrix already in upper Hessenberg form (auxiliary routine).

Syntax

call pslaqr1( wantt, wantz, n, ilo, ihi, a, desca, wr, wi, iloz, ihiz, z, descz, work, lwork, iwork, ilwork, info )

call pdlaqr1( wantt, wantz, n, ilo, ihi, a, desca, wr, wi, iloz, ihiz, z, descz, work, lwork, iwork, ilwork, info )

Description

p?laqr1 is an auxiliary routine used to find the Schur decomposition and/or eigenvalues of a matrix already in Hessenberg form from columns ilo to ihi.

This is a modified version of p?lahqr from ScaLAPACK version 1.7.3. The following modifications were made:

  • Workspace query functionality was added.

  • Aggressive early deflation is implemented.

  • Aggressive deflation (looking for two consecutive small subdiagonal elements by PSLACONSB) is abandoned.

  • The returned Schur form is now in canonical form, i.e., the returned 2-by-2 blocks really correspond to complex conjugate pairs of eigenvalues.

  • The original version of p?lahqr sometimes did not read out the converged eigenvalues correctly. This is now fixed.


Input Parameters

wantt

(global) LOGICAL

= .TRUE. : the full Schur form T is required;

= .FALSE.: only eigenvalues are required.

wantz

(global) LOGICAL

= .TRUE. : the matrix of Schur vectors Z is required;

= .FALSE.: Schur vectors are not required.

n

(global) INTEGER

The order of the Hessenberg matrix A (and Z if wantz is .TRUE.). n ≥ 0.

ilo, ihi

(global) INTEGER

It is assumed that the matrix A is already upper quasi-triangular in rows and columns ihi+1:n, and that A(ilo,ilo-1) = 0 (unless ilo = 1). p?laqr1 works primarily with the Hessenberg submatrix in rows and columns ilo to ihi, but applies transformations to all of A if wantt is .TRUE..

1 ≤ ilo ≤ max(1,ihi); ihi ≤ n.

a

REAL for pslaqr1

DOUBLE PRECISION for pdlaqr1

(global) array of size (lld_a,LOCc(n))

On entry, the upper Hessenberg matrix A.

desca

(global and local) INTEGER array of size dlen_.

The array descriptor for the distributed matrix A.

iloz, ihiz

(global) INTEGER

Specify the rows of the matrix Z to which transformations must be applied if wantz is .TRUE..

1 ≤ iloz ≤ ilo; ihi ≤ ihiz ≤ n.
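The range constraints on n, ilo, ihi, iloz and ihiz can be checked by the caller before invoking the routine. The following is a minimal sketch; the module laqr1_index_check and the function laqr1_ranges_ok are hypothetical names, not part of oneMKL. The check only mirrors the inequalities stated above; the routine itself reports invalid arguments through info < 0.

```fortran
! Hypothetical helper (not part of oneMKL): mirrors the documented
! index ranges of p?laqr1 so they can be validated before the call.
module laqr1_index_check
  implicit none
contains
  ! Returns .TRUE. when
  !   n >= 0,
  !   1 <= ilo <= max(1,ihi),  ihi <= n,
  !   1 <= iloz <= ilo,        ihi <= ihiz <= n.
  logical function laqr1_ranges_ok(n, ilo, ihi, iloz, ihiz) result(ok)
    integer, intent(in) :: n, ilo, ihi, iloz, ihiz

    ok = n >= 0
    ! Active Hessenberg block ilo:ihi.
    ok = ok .and. 1 <= ilo .and. ilo <= max(1, ihi) .and. ihi <= n
    ! Rows of Z that receive the transformations.
    ok = ok .and. 1 <= iloz .and. iloz <= ilo
    ok = ok .and. ihi <= ihiz .and. ihiz <= n
  end function laqr1_ranges_ok
end module laqr1_index_check
```

For example, laqr1_ranges_ok(10, 3, 7, 2, 8) is .TRUE., while laqr1_ranges_ok(10, 3, 7, 4, 10) is .FALSE. because iloz exceeds ilo.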

z

REAL for pslaqr1

DOUBLE PRECISION for pdlaqr1

(global) array of size (lld_z,LOCc(n)).

If wantz is .TRUE., on entry z must contain the current matrix Z of transformations accumulated by p?hseqr.

If wantz is .FALSE., z is not referenced.

descz

(global and local) INTEGER array of size dlen_.

The array descriptor for the distributed matrix Z.

work

REAL for pslaqr1

DOUBLE PRECISION for pdlaqr1

(local output) array of size lwork

lwork

(local) INTEGER

The size of the work array (lwork ≥ 1).

If lwork = -1, a workspace query is assumed; the optimal lwork is returned in work(1).

iwork

(global and local) INTEGER array of size ilwork

This holds some of the IBLK integer arrays.

ilwork

(local) INTEGER

The size of the iwork array (ilwork ≥ 3).

Output Parameters

a

If wantt is .TRUE., the matrix A is upper quasi-triangular in rows and columns ilo:ihi, with any 2-by-2 or larger diagonal blocks not yet in standard form. If wantt is .FALSE., the contents of a are unspecified on exit.

wr, wi

REAL for pslaqr1

DOUBLE PRECISION for pdlaqr1

(global replicated) array of size n

The real and imaginary parts, respectively, of the computed eigenvalues ilo to ihi are stored in the corresponding elements of wr and wi. If two eigenvalues are computed as a complex conjugate pair, they are stored in consecutive elements of wr and wi, say the i-th and (i+1)-th, with wi(i) > 0 and wi(i+1) < 0. If wantt is .TRUE., the eigenvalues are stored in the same order as on the diagonal of the Schur form returned in a. Note that a may be returned with larger diagonal blocks until the next release.
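Because a complex conjugate pair always occupies consecutive entries with wi(i) > 0 and wi(i+1) < 0, the eigenvalues can be rebuilt as complex numbers in a single left-to-right scan. The sketch below assumes double precision output (pdlaqr1); since wr and wi are global replicated, every process already holds full copies. The module laqr1_eigs and the subroutine eigs_from_wr_wi are hypothetical. An entry with wi(i) = 0 is treated as a real eigenvalue, and any ordering that breaks the documented sign convention is flagged.

```fortran
! Hypothetical helper (not part of oneMKL): converts the wr/wi output
! of pdlaqr1 into complex eigenvalues using the documented convention
! wi(i) > 0, wi(i+1) < 0 for a conjugate pair.
module laqr1_eigs
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Fills lambda(ilo:ihi); arrays are assumed to be indexed from 1.
  ! ok is set to .FALSE. if the sign convention is violated.
  subroutine eigs_from_wr_wi(ilo, ihi, wr, wi, lambda, ok)
    integer, intent(in) :: ilo, ihi
    real(dp), intent(in) :: wr(:), wi(:)
    complex(dp), intent(inout) :: lambda(:)
    logical, intent(out) :: ok
    integer :: i

    ok = .true.
    i = ilo
    do while (i <= ihi)
      lambda(i) = cmplx(wr(i), wi(i), kind=dp)
      if (wi(i) > 0.0_dp) then
        ! First member of a pair: the partner must sit at i+1.
        if (i + 1 > ihi) then
          ok = .false.
          return
        end if
        if (wi(i + 1) >= 0.0_dp) then
          ok = .false.
          return
        end if
        lambda(i + 1) = cmplx(wr(i + 1), wi(i + 1), kind=dp)
        i = i + 2
      else if (wi(i) < 0.0_dp) then
        ! Negative imaginary part without a preceding positive partner.
        ok = .false.
        return
      else
        ! Real eigenvalue.
        i = i + 1
      end if
    end do
  end subroutine eigs_from_wr_wi
end module laqr1_eigs
```

With wr = [1, 2, 2, -3] and wi = [0, 0.5, -0.5, 0], the scan yields one real eigenvalue, the pair 2 ± 0.5i, and another real eigenvalue.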

z

On exit, z is updated; transformations are applied only to the submatrix Z(iloz:ihiz,ilo:ihi).

If wantz is .FALSE., z is not referenced.

work(1)

On exit, if info = 0, work(1) returns the optimal lwork.

info

(global) INTEGER

< 0: parameter number -info incorrect or inconsistent

= 0: successful exit

> 0: p?laqr1 failed to compute all the eigenvalues ilo to ihi in a total of 30*(ihi-ilo+1) iterations; if info = i, elements i+1:ihi of wr and wi contain those eigenvalues which have been successfully computed.
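When info > 0, only part of wr and wi is meaningful. The following sketch turns info into the index range of eigenvalues that were actually computed and evaluates the iteration budget 30*(ihi-ilo+1) stated above. The module laqr1_info and its procedures are hypothetical names used only for illustration.

```fortran
! Hypothetical helper (not part of oneMKL): interprets info on return
! from p?laqr1.
module laqr1_info
  implicit none
contains
  ! Returns in first:last the entries of wr/wi holding computed
  ! eigenvalues:
  !   info = 0 : ilo:ihi
  !   info > 0 : info+1:ihi
  !   info < 0 : argument error, empty range (first > last)
  subroutine converged_range(info, ilo, ihi, first, last)
    integer, intent(in) :: info, ilo, ihi
    integer, intent(out) :: first, last

    if (info == 0) then
      first = ilo
      last = ihi
    else if (info > 0) then
      first = info + 1
      last = ihi
    else
      first = 1
      last = 0
    end if
  end subroutine converged_range

  ! Total iteration budget after which the routine reports info > 0.
  integer function max_iterations(ilo, ihi)
    integer, intent(in) :: ilo, ihi
    max_iterations = 30 * (ihi - ilo + 1)
  end function max_iterations
end module laqr1_info
```

For instance, with ilo = 2, ihi = 9 and info = 5 on return, only entries 6:9 of wr and wi should be used.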

Application Notes

This algorithm is very similar to p?lahqr. Unlike p?lahqr, instead of sending one double shift through the largest unreduced submatrix, this algorithm sends multiple double shifts and spaces them apart so that there can be parallelism across several processor rows and columns. Another critical difference is that this algorithm aggregates multiple transforms together in order to apply them in a block fashion.
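As background, each double shift (a pair of shifts σ1 and σ2, either both real or a complex conjugate pair) is usually started from the first column of (H − σ1 I)(H − σ2 I), where H is the active upper Hessenberg block. This is the standard Francis double-shift arithmetic, shown here only for orientation and not taken from the p?laqr1 source. Because H is upper Hessenberg, that column has at most three nonzero entries:

```latex
x = (H - \sigma_1 I)(H - \sigma_2 I)\, e_1
  = \begin{pmatrix}
      (h_{11} - \sigma_1)(h_{11} - \sigma_2) + h_{12}\, h_{21} \\
      h_{21}\,(h_{11} + h_{22} - \sigma_1 - \sigma_2) \\
      h_{21}\, h_{32} \\
      0 \\ \vdots \\ 0
    \end{pmatrix}
```

When σ2 is the complex conjugate of σ1, every entry of x is real. Only the direction of x matters, so any scalar multiple of it can be used; a reflector mapping x to a multiple of e1 creates the small bulge that is then chased down the diagonal.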

Current Notes and/or Restrictions:

  • This code requires the distributed block size to be square and at least six (6); unlike simpler codes such as LU, this algorithm is extremely sensitive to block size. Choosing too small a block size can lead to poor performance.

  • This code requires a and z to be distributed identically and to have identical BLACS contexts.

  • This release currently does not have a routine for resolving the Schur blocks into regular 2x2 form after this code is completed. Because of this, performance can suffer significantly, since the deflation is sometimes performed by a single column of processors.

  • This code does not currently block the initial transforms, so none of the rows or columns for any bulge are completed until all bulges have been started. To offset the pipeline start-up cost, it is recommended that at least 2*LCM(NPROW,NPCOL) bulges be used (if possible).

  • The maximum number of bulges currently supported is fixed at 32. In future versions this will be limited only by the incoming work array.

  • The matrix A must be in upper Hessenberg form. If elements below the subdiagonal are nonzero, the resulting transforms may be nonsimilar. This is also true with the LAPACK routine.

  • For this release, it is assumed that rsrc_ = csrc_ = 0.

  • Currently, all the eigenvalues are distributed to all the nodes. Future releases will probably distribute the eigenvalues by the column partitioning.

  • The internals of this routine are subject to change.
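Several of the restrictions above reduce to simple integer arithmetic. The sketch below is illustrative only: the module laqr1_tuning and its functions are hypothetical, it assumes the caller has already read the row and column block sizes (mb_, nb_) and source process coordinates (rsrc_, csrc_) from the array descriptor, and it does not assume any way of passing a bulge count to p?laqr1. It only evaluates the recommended 2*LCM(NPROW,NPCOL) figure, capped at the currently supported maximum of 32.

```fortran
! Hypothetical helper (not part of oneMKL): checks the distribution
! restrictions listed above and evaluates the bulge recommendation.
module laqr1_tuning
  implicit none
  integer, parameter :: max_bulges = 32   ! current limit stated above
contains
  ! Greatest common divisor (Euclid); assumes positive arguments.
  integer function grid_gcd(a, b)
    integer, intent(in) :: a, b
    integer :: x, y, t
    x = abs(a)
    y = abs(b)
    do while (y /= 0)
      t = mod(x, y)
      x = y
      y = t
    end do
    grid_gcd = x
  end function grid_gcd

  ! Least common multiple; assumes positive arguments.
  integer function grid_lcm(a, b)
    integer, intent(in) :: a, b
    grid_lcm = (a / grid_gcd(a, b)) * b
  end function grid_lcm

  ! Recommended minimum of 2*LCM(nprow,npcol) bulges, capped at 32.
  integer function suggested_bulges(nprow, npcol)
    integer, intent(in) :: nprow, npcol
    suggested_bulges = min(2 * grid_lcm(nprow, npcol), max_bulges)
  end function suggested_bulges

  ! Square blocks of size at least 6, with rsrc_ = csrc_ = 0.
  logical function layout_ok(mb, nb, rsrc, csrc)
    integer, intent(in) :: mb, nb, rsrc, csrc
    layout_ok = mb == nb .and. nb >= 6 .and. rsrc == 0 .and. csrc == 0
  end function layout_ok
end module laqr1_tuning
```

On a 2-by-3 process grid the recommendation is 2*LCM(2,3) = 12 bulges; on an 8-by-6 grid it would be 48, which exceeds the current limit, so the helper returns 32.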

See Also