Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/22/2024
Public



p?gesvd

Computes the singular value decomposition of a general matrix, optionally computing the left and/or right singular vectors.

Syntax

void psgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , float *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *s , float *u , MKL_INT *iu , MKL_INT *ju , MKL_INT *descu , float *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , float *work , MKL_INT *lwork , float *rwork , MKL_INT *info );

void pdgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , double *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *s , double *u , MKL_INT *iu , MKL_INT *ju , MKL_INT *descu , double *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , double *work , MKL_INT *lwork , double *rwork , MKL_INT *info );

void pcgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , MKL_Complex8 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , float *s , MKL_Complex8 *u , MKL_INT *iu , MKL_INT *ju , MKL_INT *descu , MKL_Complex8 *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , MKL_Complex8 *work , MKL_INT *lwork , float *rwork , MKL_INT *info );

void pzgesvd (char *jobu , char *jobvt , MKL_INT *m , MKL_INT *n , MKL_Complex16 *a , MKL_INT *ia , MKL_INT *ja , MKL_INT *desca , double *s , MKL_Complex16 *u , MKL_INT *iu , MKL_INT *ju , MKL_INT *descu , MKL_Complex16 *vt , MKL_INT *ivt , MKL_INT *jvt , MKL_INT *descvt , MKL_Complex16 *work , MKL_INT *lwork , double *rwork , MKL_INT *info );

Include Files

  • mkl_scalapack.h

Description

The p?gesvd function computes the singular value decomposition (SVD) of an m-by-n matrix A, optionally computing the left and/or right singular vectors. The SVD is written

A = U*Σ*V^T,

where Σ is an m-by-n matrix that is zero except for its min(m, n) diagonal elements, U is an m-by-m orthogonal matrix, and V is an n-by-n orthogonal matrix (for pcgesvd and pzgesvd, U and V are unitary and VT denotes the conjugate transpose of V). The diagonal elements of Σ are the singular values of A, and the columns of U and V are the corresponding left and right singular vectors, respectively. The singular values are returned in array s in decreasing order, and only the first min(m,n) columns of U and the first min(m,n) rows of VT (stored in vt) are computed.
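Because Σ is zero outside its leading min(m, n) diagonal entries, only the first k = min(m, n) columns of U and the first k rows of V^T enter the product, so this truncated ("thin") form still reproduces A exactly. For the real routines, with U_k the first k columns of U and V_k^T the first k rows of V^T:

```latex
A = U_k \, \Sigma_k \, V_k^{T}, \qquad k = \min(m, n), \quad
U_k \in \mathbb{R}^{m \times k}, \;
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k), \;
V_k^{T} \in \mathbb{R}^{k \times n}, \quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_k \ge 0
```

For pcgesvd and pzgesvd the factors are complex and V_k^T is replaced by the conjugate transpose V_k^H.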


NOTE:

The distributed submatrix sub(A) must satisfy certain alignment properties. The following conditions must hold:

  • mb_a = nb_a = nb
  • iroffa = icoffa
where mb_a and nb_a are the row and column block sizes stored in desca, and:
  • iroffa = mod(ia-1, nb)
  • icoffa = mod(ja-1, nb)
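The two alignment conditions can be checked before calling p?gesvd. The following is a minimal sketch, not part of MKL: the helper name gesvd_alignment_ok is hypothetical, it assumes the standard 9-entry dense descriptor (dlen_ = 9) whose 1-based entry X_ is read in C as desc[X_ - 1], and it assumes the LP64 interface, where MKL_INT is a 32-bit int.

```c
#include <assert.h>
#include <stdbool.h>

/* 1-based positions of the standard dense ScaLAPACK descriptor entries.
   In C, entry X_ is read as desc[X_ - 1]. */
enum { dtype_ = 1, ctxt_ = 2, m_ = 3, n_ = 4, mb_ = 5, nb_ = 6,
       rsrc_ = 7, csrc_ = 8, lld_ = 9 };

/* Hypothetical helper: returns true if the submatrix starting at (ia, ja)
   of the matrix described by desca meets the p?gesvd alignment rules.
   Uses int in place of MKL_INT (LP64 assumption). */
static bool gesvd_alignment_ok(int ia, int ja, const int desca[9]) {
    int mb = desca[mb_ - 1];          /* row block size mb_a */
    int nb = desca[nb_ - 1];          /* column block size nb_a */
    if (mb != nb) return false;       /* rule 1: mb_a = nb_a = nb */
    assert(nb > 0);
    int iroffa = (ia - 1) % nb;       /* row offset of ia inside its block */
    int icoffa = (ja - 1) % nb;       /* column offset of ja inside its block */
    return iroffa == icoffa;          /* rule 2: iroffa = icoffa */
}
```

For example, with 16-by-16 blocks, sub(A) starting at (1, 1) or (17, 1) is aligned, while a start of (3, 5) is not, because the row offset 2 differs from the column offset 4.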

Input Parameters

mp = number of local rows in A and U

nq = number of local columns in A and VT

size = min(m, n)

sizeq = number of local columns in U

sizep = number of local rows in VT

jobu

(global) Specifies options for computing all or part of the matrix U.

If jobu = 'V', the first size columns of U (the left singular vectors) are returned in the array u;

If jobu = 'N', no columns of U (no left singular vectors) are computed.

jobvt

(global)

Specifies options for computing all or part of the matrix VT.

If jobvt = 'V', the first size rows of VT (the right singular vectors) are returned in the array vt;

If jobvt = 'N', no rows of VT (no right singular vectors) are computed.

m

(global) The number of rows of the matrix A (m ≥ 0).

n

(global) The number of columns of the matrix A (n ≥ 0).

a

(local).

Block cyclic array, global size (m, n), local size (mp, nq).

ia, ja

(global) The row and column indices in the global matrix A indicating the first row and the first column of the submatrix A, respectively.

desca

(global and local) array of size dlen_. The array descriptor for the distributed matrix A.

iu, ju

(global) The row and column indices in the global matrix U indicating the first row and the first column of the submatrix U, respectively.

descu

(global and local) array of size dlen_. The array descriptor for the distributed matrix U.

ivt, jvt

(global) The row and column indices in the global matrix VT indicating the first row and the first column of the submatrix VT, respectively.

descvt

(global and local) array of size dlen_. The array descriptor for the distributed matrix VT.

work

(local).

Workspace array of size lwork

lwork

(local) The size of the array work;

lwork > 2 + 6*sizeb + max(watobd, wbdtosvd),

where sizeb = max(m, n), and watobd and wbdtosvd are, respectively, the workspace required to bidiagonalize the matrix A and the workspace required to go from the bidiagonal matrix to the singular value decomposition U*Σ*VT.

For watobd, the following holds:

watobd = max(max(wp?lange,wp?gebrd), max(wp?lared2d, wp?lared1d)),

where wp?lange, wp?lared1d, wp?lared2d, wp?gebrd are the workspaces required respectively for the subprograms p?lange, p?lared1d, p?lared2d, p?gebrd. Using the standard notation

mp = numroc(m, mb, MYROW, desca[rsrc_ - 1], NPROW),

nq = numroc(n, nb, MYCOL, desca[csrc_ - 1], NPCOL),
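numroc counts how many rows (or columns) of a block-cyclically distributed dimension land on a given process; its fourth argument is the process row or column that holds the first block. The sketch below reproduces the standard numroc arithmetic so that mp, nq, mp0, and nq0 can be reasoned about by hand. It is named numroc_sketch (hypothetical) to avoid clashing with the library routine, and it uses int in place of MKL_INT (LP64 assumption).

```c
#include <assert.h>

/* Sketch of the standard NUMROC computation.
   n: global length, nb: block size, iproc: this process's coordinate,
   isrcproc: coordinate holding the first block, nprocs: processes in
   this grid dimension. Returns the local length on iproc. */
static int numroc_sketch(int n, int nb, int iproc, int isrcproc, int nprocs) {
    assert(n >= 0 && nb > 0 && nprocs > 0);
    /* distance of this process from the source process */
    int mydist = (nprocs + iproc - isrcproc) % nprocs;
    int nblocks = n / nb;                  /* number of complete blocks */
    int num = (nblocks / nprocs) * nb;     /* full rounds every process gets */
    int extrablks = nblocks % nprocs;      /* leftover complete blocks */
    if (mydist < extrablks)
        num += nb;                         /* one more complete block */
    else if (mydist == extrablks)
        num += n % nb;                     /* the trailing partial block */
    return num;
}
```

For instance, 9 rows in blocks of 2 over 2 process rows split as 5 rows on the source process and 4 on the other. mp0 and nq0 in the formulas below correspond to calling this with iproc = 0.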

the workspaces required for the above subprograms are

wp?lange = mp,

wp?lared1d = nq0,

wp?lared2d = mp0,

wp?gebrd = nb*(mp + nq + 1) + nq,

where nq0 and mp0 refer, respectively, to the values obtained at MYCOL = 0 and MYROW = 0. In general, an upper limit for this workspace is the workspace required on process (0,0):

watobd ≤ nb*(mp0 + nq0 + 1) + nq0.

For a homogeneous process grid, this upper limit can be used as an estimate of the minimum workspace for every process.
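The watobd formulas above can be evaluated directly. This is a minimal sketch, assuming mp, nq, mp0, and nq0 have already been obtained with numroc and that MKL_INT is int (LP64); the function names are hypothetical and not part of MKL.

```c
#include <assert.h>

static int imax_w(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper: watobd = max(max(wp?lange, wp?gebrd),
   max(wp?lared2d, wp?lared1d)) on the calling process. */
static int watobd_sketch(int mp, int nq, int mp0, int nq0, int nb) {
    assert(nb > 0);
    int w_lange   = mp;                     /* wp?lange   */
    int w_lared1d = nq0;                    /* wp?lared1d */
    int w_lared2d = mp0;                    /* wp?lared2d */
    int w_gebrd   = nb * (mp + nq + 1) + nq; /* wp?gebrd  */
    return imax_w(imax_w(w_lange, w_gebrd), imax_w(w_lared2d, w_lared1d));
}

/* The upper limit quoted above: the value on process (0,0). */
static int watobd_bound_sketch(int mp0, int nq0, int nb) {
    return nb * (mp0 + nq0 + 1) + nq0;
}
```

On process (0,0), where mp = mp0 and nq = nq0, the wp?gebrd term dominates, so watobd_sketch returns exactly the bound; on other processes it can only be smaller or equal when the grid is homogeneous.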

For wbdtosvd, the following holds:

wbdtosvd = size*(wantu*nru + wantvt*ncvt) + max(w?bdsqr, max(wantu*wp?ormbrqln, wantvt*wp?ormbrprt)),

where

wantu (wantvt) = 1 if left (right) singular vectors are wanted, and 0 otherwise. w?bdsqr, wp?ormbrqln, and wp?ormbrprt are, respectively, the workspaces required by the subprograms ?bdsqr, p?ormbr(qln), and p?ormbr(prt), where qln and prt denote the values of the arguments vect, side, and trans in the call to p?ormbr. nru is the local number of rows of the matrix U when it is distributed across a one-dimensional "column" of processes. Analogously, ncvt is the local number of columns of the matrix VT when it is distributed across a one-dimensional "row" of processes. Calling the LAPACK function ?bdsqr requires

w?bdsqr = max(1, 2*size + (2*size - 4)* max(wantu, wantvt))

on every processor. Finally,

wp?ormbrqln = max((nb*(nb-1))/2, (sizeq+mp)*nb)+nb*nb,

wp?ormbrprt = max((mb*(mb-1))/2, (sizep+nq)*mb)+mb*mb.
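Putting the wbdtosvd pieces together with the lwork expression given earlier yields a hand estimate of the workspace. This is only a sketch under stated assumptions: the helper names are hypothetical, int stands in for MKL_INT (LP64), and nru, ncvt, sizeq, and sizep are assumed to be precomputed. A workspace query with lwork = -1 remains the reliable way to size work.

```c
#include <assert.h>

static int max2(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper: wbdtosvd from the formulas above.
   wantu / wantvt are 1 if left / right singular vectors are requested. */
static int wbdtosvd_sketch(int size, int wantu, int wantvt,
                           int nru, int ncvt,
                           int mb, int nb, int mp, int nq,
                           int sizeq, int sizep) {
    assert(size >= 0 && mb > 0 && nb > 0);
    int w_bdsqr    = max2(1, 2 * size + (2 * size - 4) * max2(wantu, wantvt));
    int w_ormbrqln = max2((nb * (nb - 1)) / 2, (sizeq + mp) * nb) + nb * nb;
    int w_ormbrprt = max2((mb * (mb - 1)) / 2, (sizep + nq) * mb) + mb * mb;
    return size * (wantu * nru + wantvt * ncvt)
         + max2(w_bdsqr, max2(wantu * w_ormbrqln, wantvt * w_ormbrprt));
}

/* The expression 2 + 6*sizeb + max(watobd, wbdtosvd) from the lwork entry. */
static int lwork_expr_sketch(int m, int n, int watobd, int wbdtosvd) {
    int sizeb = max2(m, n);
    return 2 + 6 * sizeb + max2(watobd, wbdtosvd);
}
```

When neither jobu nor jobvt is 'V', the wantu and wantvt factors zero out the p?ormbr terms, so only the ?bdsqr workspace remains.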

If lwork = -1, then lwork is global input and a workspace query is assumed; the function only calculates the minimum size for the work array. The required workspace is returned as the first element of work and no error message is issued by pxerbla.

rwork

Workspace array of size 1 + 4*sizeb. Not used for psgesvd and pdgesvd.

Output Parameters

a

On exit, the contents of a are destroyed.

s

(global).

Array of size size.

Contains the singular values of A, sorted so that s(i) ≥ s(i+1).

u

(local).

local size (mp, sizeq), global size (m, size)

If jobu = 'V', u contains the first min(m, n) columns of U.

If jobu = 'N' or 'O', u is not referenced.

vt

(local).

local size (sizep, nq), global size (size, n)

If jobvt = 'V', vt contains the first size rows of VT. If jobvt = 'N', vt is not referenced.

work

On exit, if info = 0, then work[0] returns the required minimal size of lwork.

rwork

On exit, if info = 0, then rwork[0] returns the required size of rwork.

info

(global)

If info = 0, the execution is successful.

If info < 0 and info = -i, the i-th parameter had an illegal value.

If info > 0, then ?bdsqr did not converge, except in the case info = min(m,n) + 1 described next.

If info = min(m,n) + 1, then p?gesvd has detected heterogeneity by finding that the computed singular values were not identical across the process grid. In this case, the accuracy of the results from p?gesvd cannot be guaranteed.
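Callers usually branch on info after the call. The following minimal sketch classifies the value according to the rules above; the enum and function names are hypothetical, and int stands in for MKL_INT (LP64 assumption).

```c
#include <assert.h>

/* Hypothetical classification of the info value returned by p?gesvd. */
typedef enum {
    GESVD_OK,              /* info == 0 */
    GESVD_BAD_ARGUMENT,    /* info == -i: argument i had an illegal value */
    GESVD_NO_CONVERGENCE,  /* info > 0: ?bdsqr did not converge */
    GESVD_HETEROGENEITY    /* info == min(m,n) + 1: results not guaranteed */
} gesvd_status;

static gesvd_status classify_gesvd_info(int info, int m, int n) {
    assert(m >= 0 && n >= 0);
    int size = m < n ? m : n;
    if (info == 0) return GESVD_OK;
    if (info < 0) return GESVD_BAD_ARGUMENT;   /* -info is the argument index */
    if (info == size + 1) return GESVD_HETEROGENEITY;
    return GESVD_NO_CONVERGENCE;
}
```

Checking for the heterogeneity case before the generic positive case matters, because min(m,n) + 1 is itself a positive value.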
