Visible to Intel only — GUID: GUID-35A4D7F4-D6DF-4057-967D-406B5073354B
cluster_sparse_solver iparm Parameter
The following table describes all individual components of the Parallel Direct Sparse Solver for Clusters Interface iparm parameter. Components which are not used must be initialized with 0. Default values are denoted with an asterisk (*).
Component | Description | |
---|---|---|
iparm(1) input |
Use default values. |
|
0 | iparm(2) - iparm(64) are filled with default values. | |
!=0 | You must supply all values in components iparm(2) - iparm(64). | |
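The one-based component numbering used in this table can be mapped onto a zero-based array as in the following minimal Python sketch. The helper name `make_iparm` and the override values are hypothetical and purely illustrative; the sketch does not call oneMKL.

```python
IPARM_SIZE = 64  # the table describes components iparm(1) .. iparm(64)


def make_iparm(overrides=None):
    """Build a 64-entry iparm list with unused components set to 0.

    `overrides` maps the one-based component numbers used in the table
    (for example 2 for iparm(2)) to the values you want to supply.
    """
    iparm = [0] * IPARM_SIZE
    for component, value in (overrides or {}).items():
        if not 1 <= component <= IPARM_SIZE:
            raise ValueError(f"iparm({component}) does not exist")
        iparm[component - 1] = value  # one-based table number -> zero-based slot
    return iparm


# iparm(1) = 0: ask the solver to fill iparm(2) - iparm(64) with defaults.
defaults_requested = make_iparm()

# iparm(1) != 0: every component you rely on must be supplied explicitly.
# The values below are illustrative, not a recommended configuration.
custom = make_iparm({1: 1, 2: 2, 8: 2, 35: 1})
```

Keeping the one-based numbers in the dictionary keys lets the code be read side by side with the table.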
iparm(2) input |
Fill-in reducing ordering for the input matrix. |
|
2* | The nested dissection algorithm from the METIS package [Karypis98]. | |
3 | The parallel version of the nested dissection algorithm. It can decrease the time of computations on multi-core computers, especially when Phase 1 takes significant time. | |
10 | The MPI version of the nested dissection and symbolic factorization algorithms for the matrix in distributed assembled matrix input format (iparm(40) > 0). The input matrix for the reordering must be distributed among the MPI processes without any overlap, and every MPI rank must hold at least one row of the input matrix. Use iparm(41) and iparm(42) to set the bounds of the domain. During all of Phase 1, the entire matrix is not gathered on any one process, which can decrease computation time (especially when Phase 1 takes significant time) and decrease memory usage for each MPI process on the cluster.
NOTE:
Distributed reordering does not work if any of the following is turned on: matching (iparm(13) = 1), scaling (iparm(11) = 1), BSR format (iparm(37) > 1), Schur complement matrix computation control (iparm(36) > 0), or partial solve (iparm(31) > 0). It also does not work if the rows of the distributed input matrix overlap across MPI processes.
NOTE:
If you set iparm(2) = 10 and comm = -1 (MPI communicator), and there is only one MPI process, the solver proceeds with the optimized, fully parallel OpenMP version of the nested dissection and symbolic factorization algorithms. This can decrease computation time on multi-core computers. In this case, set iparm(41) = 1 and iparm(42) = n for one-based indexing, or to 0 and n - 1, respectively, for zero-based indexing. |
|
iparm(3) | Reserved. Set to zero. |
|
iparm(4) | Reserved. Set to zero. |
|
iparm(5) input |
User permutation. This parameter controls whether a user-supplied fill-in reducing permutation is used instead of the integrated multiple-minimum degree or nested dissection algorithms. It can also be used to obtain the fill-in reducing permutation vector calculated during the reordering stage of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO. This option is useful for testing reordering algorithms, adapting the code to special application problems (for instance, to move zero diagonal elements to the end of P*A*P^T), or for using the permutation vector more than once for matrices with identical sparsity structures. For the definition of the permutation, see the description of the perm parameter. |
|
0 | User permutation in the perm array is ignored. | |
1 | Intel® oneAPI Math Kernel Library (oneMKL) PARDISO uses the user-supplied fill-in reducing permutation from the perm array. iparm(2) is ignored. | |
2 | Intel® oneAPI Math Kernel Library (oneMKL) PARDISO returns the permutation vector computed at phase 1 in the perm array. | |
iparm(6) input |
Write solution on x.
NOTE:
The array x is always used. |
|
0* | The array x contains the solution; right-hand side vector b is kept unchanged. |
|
1 | The solver stores the solution on the right-hand side b. |
|
iparm(7) output |
Number of iterative refinement steps performed. Reports the number of iterative refinement steps that were actually performed during the solve step. |
|
iparm(8) input |
Iterative refinement step. On entry to the solve and iterative refinement step, iparm(8) must be set to the maximum number of iterative refinement steps that the solver performs. |
|
0* | The solver automatically performs two steps of iterative refinement when perturbed pivots are obtained during the numerical factorization. |
|
>0 | Maximum number of iterative refinement steps that the solver performs. The solver performs not more than the absolute value of iparm(8) steps of iterative refinement. The solver might stop the process before the maximum number of steps if
The number of executed iterations is reported in iparm(7). |
|
<0 | Same as above, but the accumulation of the residuum uses extended precision real and complex data types. Perturbed pivots result in iterative refinement (independent of iparm(8)=0) and the number of executed iterations is reported in iparm(7). |
|
iparm(9) | Reserved. Set to zero. |
|
iparm(10) input |
Pivoting perturbation. This parameter instructs Parallel Direct Sparse Solver for Clusters Interface how to handle small pivots or zero pivots for nonsymmetric matrices (mtype = 11 or mtype = 13) and symmetric matrices (mtype = -2, mtype = -4, or mtype = 6). For these matrices the solver uses a complete supernode pivoting approach. When the factorization algorithm reaches a point where it cannot factor the supernodes with this pivoting strategy, it uses a pivoting perturbation strategy similar to [Li99], [Schenk04]. The magnitude of the potential pivot is tested against a constant threshold of alpha = eps*||A2||_inf, where eps = 10^(-iparm(10)), A2 = P*P_MPS*D_r*A*D_c*P, and ||A2||_inf is the infinity norm of the scaled and permuted matrix A. Any tiny pivots encountered during elimination are set to sign(l_II)*eps*||A2||_inf, which trades off some numerical stability for the ability to keep pivots from getting too small. Small pivots are therefore perturbed with eps = 10^(-iparm(10)). |
|
13* | The default value for nonsymmetric matrices (mtype = 11, mtype = 13), eps = 10^(-13). |
|
8* | The default value for symmetric indefinite matrices (mtype = -2, mtype = -4, mtype = 6), eps = 10^(-8). |
|
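The threshold test described for iparm(10) can be sketched in plain Python as follows. This is only an illustration of the formulas alpha = eps*||A2||_inf and sign(l_II)*alpha, not the solver's internal code. The function names are hypothetical, the matrix is a small dense stand-in for the scaled and permuted matrix A2, and treating an exactly zero pivot as positive is an assumption.

```python
def perturbation_threshold(iparm10, a2):
    """Return alpha = eps * ||A2||_inf with eps = 10^(-iparm(10)).

    `a2` is a dense list of rows standing in for the scaled, permuted matrix.
    """
    eps = 10.0 ** (-iparm10)
    # Infinity norm: the largest absolute row sum.
    inf_norm = max(sum(abs(value) for value in row) for row in a2)
    return eps * inf_norm


def perturb_pivot(pivot, alpha):
    """Replace a pivot smaller in magnitude than alpha by sign(pivot) * alpha."""
    if abs(pivot) >= alpha:
        return pivot
    sign = -1.0 if pivot < 0 else 1.0  # assumption: zero counts as positive
    return sign * alpha


a2 = [[4.0, -1.0], [2.0, 3.0]]          # both absolute row sums are 5
alpha = perturbation_threshold(13, a2)  # default for nonsymmetric matrices
```

A larger iparm(10) gives a smaller eps, so fewer pivots are perturbed but those that remain can be closer to zero.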
iparm(11) input |
Scaling vectors. Parallel Direct Sparse Solver for Clusters Interface uses a maximum weight matching algorithm to permute large elements on the diagonal and to scale. Use iparm(11) = 1 (scaling) and iparm(13) = 1 (matching) for highly indefinite symmetric matrices, for example, from interior point optimizations or saddle point problems. Note that in the analysis phase (phase=11) you must provide the numerical values of the matrix A in array a in case of scaling and symmetric weighted matching. |
|
0* | Disable scaling. Default for symmetric indefinite matrices. |
|
1* | Enable scaling. Default for nonsymmetric matrices. Scale the matrix so that the diagonal elements are equal to 1 and the absolute values of the off-diagonal entries are less than or equal to 1. This scaling method is applied to nonsymmetric matrices (mtype = 11, mtype = 13). The scaling can also be used for symmetric indefinite matrices (mtype = -2, mtype = -4, mtype = 6) when the symmetric weighted matchings are applied (iparm(13) = 1). Note that in the analysis phase (phase=11) you must provide the numerical values of the matrix A in case of scaling. |
|
iparm(12) | Solve with transposed or conjugate transposed matrix A.
NOTE:
For real matrices, the terms transposed and conjugate transposed are equivalent. |
|
0* | Solve a linear system AX = B. |
|
1 | Solve a conjugate transposed system A^H X = B based on the factorization of the matrix A. |
|
2 | Solve a transposed system A^T X = B based on the factorization of the matrix A. |
|
iparm(13) input |
Improved accuracy using (non-) symmetric weighted matching. Parallel Direct Sparse Solver for Clusters Interface can use a maximum weighted matching algorithm to permute large elements close to the diagonal. This strategy adds an additional level of reliability to the factorization methods and complements the alternative of using more complete pivoting techniques during the numerical factorization.
|
|
0* | Disable matching. Default for symmetric indefinite matrices. |
|
1* | Enable matching. Default for nonsymmetric matrices. Maximum weighted matching algorithm to permute large elements close to the diagonal. It is recommended to use iparm(11) = 1 (scaling) and iparm(13)= 1 (matching) for highly indefinite symmetric matrices, for example from interior point optimizations or saddle point problems. Note that in the analysis phase (phase=11) you must provide the numerical values of the matrix A in case of symmetric weighted matching. |
|
iparm(14) output |
Number of perturbed pivots. After factorization, contains the number of perturbed pivots for the matrix types: 1, 3, 11, 13, -2, -4 and 6. |
|
iparm(15) output |
Peak memory on symbolic factorization. The total peak memory in kilobytes that the solver needs during the analysis and symbolic factorization phase. This value is only computed in phase 1. |
|
iparm(16) output |
Permanent memory on symbolic factorization. Permanent memory from the analysis and symbolic factorization phase in kilobytes that the solver needs in the factorization and solve phases. This value is only computed in phase 1. |
|
iparm(17) output |
Size of factors/Peak memory on numerical factorization and solution. This parameter provides the size in kilobytes of the total memory consumed by in-core Intel® oneAPI Math Kernel Library (oneMKL) PARDISO for internal floating point arrays. This parameter is computed in phase 1. See iparm(63) for the OOC mode. The total peak memory consumed by Intel® oneAPI Math Kernel Library (oneMKL) PARDISO is max(iparm(15), iparm(16)+iparm(17)). |
|
iparm(18) input/output |
Report the number of non-zero elements in the factors. |
|
<0 | Enable reporting if iparm(18) < 0 on entry. The default value is -1. |
|
>=0 | Disable reporting. |
|
iparm(19) - iparm(20) | Reserved. Set to zero. |
|
iparm(21) input |
Pivoting for symmetric indefinite matrices. |
|
0 | Apply 1x1 diagonal pivoting during the factorization process. |
|
1* | Apply 1x1 and 2x2 Bunch-Kaufman pivoting during the factorization process. Bunch-Kaufman pivoting is available for matrices of mtype=-2, mtype=-4, or mtype=6. |
|
iparm(22) output |
Inertia: number of positive eigenvalues. Intel® oneAPI Math Kernel Library (oneMKL) PARDISO reports the number of positive eigenvalues for symmetric indefinite matrices. |
|
iparm(23) output |
Inertia: number of negative eigenvalues. Intel® oneAPI Math Kernel Library (oneMKL) PARDISO reports the number of negative eigenvalues for symmetric indefinite matrices. |
|
iparm(24) - iparm(26) | Reserved. Set to zero. |
|
iparm(27) input |
Matrix checker. |
|
0* | Do not check the sparse matrix representation for errors. |
|
1 | Check integer arrays ia and ja. In particular, check whether the column indices are sorted in increasing order within each row. |
|
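The ordering condition that iparm(27) = 1 checks can be illustrated with the plain Python sketch below. It is not the solver's checker; the function name is hypothetical, and treating "increasing" as strictly increasing (so duplicate column indices in a row are rejected) is an assumption.

```python
def columns_sorted_in_each_row(ia, ja, one_based=True):
    """Return True if the column indices in ja increase within every CSR row.

    `ia` holds the row pointers and `ja` the column indices. `one_based`
    selects Fortran-style (1) or C-style (0) indexing of the row pointers.
    """
    base = 1 if one_based else 0
    for row in range(len(ia) - 1):
        start, end = ia[row] - base, ia[row + 1] - base
        cols = ja[start:end]
        # Assumption: duplicates count as an ordering error.
        if any(left >= right for left, right in zip(cols, cols[1:])):
            return False
    return True


# 3x3 example, one-based: row 1 has columns 1 and 3, row 2 has column 2,
# row 3 has columns 1 and 3.
ia = [1, 3, 4, 6]
good_ja = [1, 3, 2, 1, 3]
bad_ja = [3, 1, 2, 1, 3]  # row 1 is out of order
```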
iparm(28) input |
Single or double precision Parallel Direct Sparse Solver for Clusters Interface. See iparm(8) for information on controlling the precision of the refinement steps. |
|
0* | Input arrays (a, x and b) and all internal arrays must be presented in double precision. |
|
1 | Input arrays (a, x and b) must be presented in single precision. In this case all internal computations are performed in single precision. |
|
iparm(29) | Reserved. Set to zero. |
|
iparm(30) output |
Number of zero or negative pivots. If Intel® oneAPI Math Kernel Library (oneMKL) PARDISO detects a zero or negative pivot for mtype=2 or mtype=4 matrix types, the factorization is stopped. Intel® oneAPI Math Kernel Library (oneMKL) PARDISO returns immediately with an error = -4, and iparm(30) reports the number of the equation where the zero or negative pivot is detected. Note: The returned value can be different for the parallel and sequential version in case of several zero/negative pivots. |
|
iparm(31) input |
Partial solve and computing selected components of the solution vectors. This parameter controls the solve step of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO. It can be used if only a few components of the solution vectors are needed or if you want to reduce the computation cost at the solve step by utilizing the sparsity of the right-hand sides. To use this option, define the input permutation vector perm so that perm(i) = 1 means that either the i-th component in the right-hand sides is nonzero, or the i-th component in the solution vectors is computed, or both, depending on the value of iparm(31). The permutation vector perm must be present in all phases of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO software. At the reordering step, the software overwrites the input vector perm with a permutation vector used by the software at the factorization and solver step. If m is the number of components such that perm(i) = 1, then the last m components of the output vector perm are a set of the indices i satisfying the condition perm(i) = 1 on input.
NOTE:
Turning on this option often increases the time used by Intel® oneAPI Math Kernel Library (oneMKL) PARDISO for factorization and reordering steps, but it can reduce the time required for the solver step. |
|
0* | Disables this option. |
|
1 | It is assumed that the right-hand sides have only a few non-zero components* and the input permutation vector perm is defined so that perm(i) = 1 means that the i-th component in the right-hand sides is nonzero. In this case Intel® oneAPI Math Kernel Library (oneMKL) PARDISO only uses the non-zero components of the right-hand side vectors and computes only the corresponding components in the solution vectors. That means the i-th component in the solution vectors is only computed if perm(i) = 1. |
|
2 | It is assumed that the right-hand sides have only a few non-zero components* and the input permutation vector perm is defined so that perm(i) = 1 means that the i-th component in the right-hand sides is nonzero. Unlike for iparm(31)=1, all components of the solution vector are computed for this setting and all components of the right-hand sides are used. Because all components are used, for iparm(31)=2 you must set the i-th component of the right-hand sides to zero explicitly if perm(i) is not equal to 1. |
|
3 | Selected components of the solution vectors are computed. The perm array is not related to the right-hand sides and it only indicates which components of the solution vectors should be computed. In this case perm(i) = 1 means that the i-th component in the solution vectors is computed. |
|
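Preparing the perm input for a partial solve can be sketched in plain Python as follows. The helper names are hypothetical, and using 0 for unselected entries is an assumption: the description above only says that perm(i) = 1 marks a selected component.

```python
def perm_from_rhs(b):
    """Mark the nonzero right-hand-side components with perm(i) = 1.

    Assumption: unselected entries are set to 0.
    """
    return [1 if value != 0 else 0 for value in b]


def zero_unselected_rhs(b, perm):
    """For iparm(31) = 2: explicitly zero every b(i) whose perm(i) is not 1."""
    return [value if flag == 1 else 0.0 for value, flag in zip(b, perm)]


b = [0.0, 2.5, 0.0, -1.0]
perm = perm_from_rhs(b)

# A right-hand side with stale data in an unselected slot: because
# iparm(31) = 2 uses all components, the stale entry must be cleared.
cleaned = zero_unselected_rhs([9.0, 2.5, 0.0, -1.0], perm)
```

For iparm(31) = 3 the mask would instead mark the solution components you want computed, independent of the right-hand sides.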
iparm(32) - iparm(33) | Reserved. Set to zero. |
|
iparm(35) input |
One- or zero-based indexing of columns and rows. |
|
0* | One-based indexing: columns and rows indexing in arrays ia, ja, and perm starts from 1 (Fortran-style indexing). |
|
1 | Zero-based indexing: columns and rows indexing in arrays ia, ja, and perm starts from 0 (C-style indexing). |
|
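If a matrix is stored with one-based ia and ja arrays and you want to pass it with iparm(35) = 1, the indices have to be shifted. A minimal Python sketch (the helper name is hypothetical):

```python
def to_zero_based(ia, ja):
    """Shift one-based CSR row pointers and column indices to zero-based."""
    return [i - 1 for i in ia], [j - 1 for j in ja]


ia0, ja0 = to_zero_based([1, 3, 4, 6], [1, 3, 2, 1, 3])
```

The values array is unaffected; only the index arrays (and perm, if used) change.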
iparm(36) input |
Schur complement matrix computation control. To calculate this matrix, you must set the input permutation vector perm to a set of indexes such that when perm(i) = 1, the i-th element of the initial matrix is an element of the Schur matrix. |
|
0* | Do not compute Schur complement. |
|
1 | Compute Schur complement matrix as part of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO factorization step and return it in the solution vector.
NOTE:
This option only computes the Schur complement matrix, and does not calculate factorization arrays. |
|
2 | Compute Schur complement matrix as part of Intel® oneAPI Math Kernel Library (oneMKL) PARDISO factorization step and return it in the solution vector. Since this option calculates factorization arrays you can use it to launch partial or full solution of the entire problem after the factorization step. |
|
iparm(37) input |
Format for matrix storage. |
|
0* | Use CSR format (see Three Array Variation of CSR Format) for matrix storage. |
|
1 | Use CSR format (see Three Array Variation of BSR Format) for matrix storage. |
|
< 0 | Convert supplied matrix to variable BSR (VBSR) format (see Sparse Data Storage) for matrix storage. Intel® oneAPI Math Kernel Library (oneMKL) PARDISO analyzes the matrix provided in CSR3 format and converts it to an internal VBSR format. Set iparm(37) = -t, 0 < t ≤ 100. |
|
iparm(38) - iparm(39) | Reserved. Set to zero. |
|
iparm(40) input |
Matrix input format.
NOTE:
Performance of the reordering step of the Parallel Direct Sparse Solver for Clusters Interface is slightly better for assembled format (CSR, iparm(40) = 0) than for distributed format (DCSR, iparm(40) > 0) for the same matrices, so if the matrix is assembled on one node do not distribute it before calling cluster_sparse_solver. |
|
0* | Provide the matrix in the usual centralized input format: the master MPI process (rank=0) stores all data from matrix A. |
|
1 | Provide the matrix in distributed assembled matrix input format. In this case, each MPI process stores only a part (or domain) of the matrix A data. Set the bounds of the domain using iparm(41) and iparm(42). The solution vector is placed on the master process. |
|
2 | Provide the matrix in distributed assembled matrix input format. In this case, each MPI process stores only a part (or domain) of the matrix A data. Set the bounds of the domain using iparm(41) and iparm(42). The solution vector, A, and RHS elements are distributed between processes in the same manner. |
|
3 | Provide the matrix in distributed assembled matrix input format. In this case, each MPI process stores only a part (or domain) of the matrix A data. Set the bounds of the domain using iparm(41) and iparm(42). The A and RHS elements are distributed between processes in the same manner and the solution vector is the same on each process. |
|
iparm(41) input |
Beginning of input domain. The number of the matrix A row, RHS element, and, for iparm(40)=2, solution vector that begins the input domain belonging to this MPI process. Only applicable to the distributed assembled matrix input format (iparm(40)> 0). See Sparse Matrix Storage Formats for more details. |
|
iparm(42) input |
End of input domain. The number of the matrix A row, RHS element, and, for iparm(40)=2, solution vector that ends the input domain belonging to this MPI process. Only applicable to the distributed assembled matrix input format (iparm(40)> 0). See Sparse Matrix Storage Formats for more details. |
|
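One way to choose non-overlapping domains for iparm(41) and iparm(42) is a contiguous block split of the rows, sketched below in plain Python. The function name and the even-split policy are assumptions; any distribution works as long as the requirements of the chosen mode are met (for example, iparm(2) = 10 needs non-overlapping domains with at least one row per rank).

```python
def domain_bounds(n, num_ranks, rank, zero_based=False):
    """Return (iparm(41), iparm(42)) for `rank` in a contiguous block split.

    Rows are divided as evenly as possible; the first n % num_ranks ranks
    receive one extra row. Bounds are inclusive.
    """
    if num_ranks > n:
        raise ValueError("every MPI rank must own at least one row")
    base_rows, extra = divmod(n, num_ranks)
    first = rank * base_rows + min(rank, extra)  # zero-based first row
    count = base_rows + (1 if rank < extra else 0)
    last = first + count - 1
    offset = 0 if zero_based else 1              # iparm(35) selects the base
    return first + offset, last + offset


# Ten rows over three ranks with one-based indexing.
bounds = [domain_bounds(10, 3, r) for r in range(3)]
```

With a single rank the function returns (1, n) for one-based and (0, n - 1) for zero-based indexing, matching the single-process note under iparm(2) = 10.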
iparm(43) - iparm(59) input |
Reserved. Set to zero. |
|
iparm(60) input |
cluster_sparse_solver mode. iparm(60) switches between the in-core (IC) and out-of-core (OOC) modes of cluster_sparse_solver. OOC can solve very large problems by holding the matrix factors in files on the disk, which requires less main memory than IC. Unless you are operating in sequential mode, you can switch between IC and OOC modes after the reordering phase. However, you can get better cluster_sparse_solver performance by setting iparm(60) before the reordering phase. The amount of memory used in OOC mode depends on the number of OpenMP threads.
WARNING:
Do not increase the number of OpenMP threads used for cluster_sparse_solver between the first call and the factorization or solution phase. Because the minimum amount of memory required for out-of-core execution depends on the number of OpenMP threads, increasing it after the initial call can cause incorrect results. |
|
0* | IC mode. |
|
1 | IC mode is used if the total amount of RAM (in megabytes) needed for storing the matrix factors is less than the sum of the values of two environment variables: MKL_PARDISO_OOC_MAX_CORE_SIZE (default value 2000 MB) and MKL_PARDISO_OOC_MAX_SWAP_SIZE (default value 0 MB); otherwise OOC mode is used. In this case the amount of RAM used by OOC mode cannot exceed the value of MKL_PARDISO_OOC_MAX_CORE_SIZE. If the total peak memory needed for storing the local arrays is more than MKL_PARDISO_OOC_MAX_CORE_SIZE, increase MKL_PARDISO_OOC_MAX_CORE_SIZE if possible.
NOTE:
Conditional numerical reproducibility (CNR) is not supported for this mode. |
|
2 | OOC mode. The OOC mode can solve very large problems by holding the matrix factors in files on the disk. Hence the amount of RAM required by OOC mode is significantly reduced compared to IC mode. If the total peak memory needed for storing the local arrays is more than MKL_PARDISO_OOC_MAX_CORE_SIZE, increase MKL_PARDISO_OOC_MAX_CORE_SIZE if possible. To obtain better cluster_sparse_solver performance, during the numerical factorization phase you can provide the maximum number of right-hand sides, which can be used further during the solving phase.
NOTE:
To use OOC mode, you must disable scaling (iparm(11) = 0) and matching (iparm(13) = 0).
|
|
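The selection rule described for iparm(60) = 1 can be sketched in plain Python as follows. This only mirrors the comparison stated above; the function name is hypothetical, and reading the two variables as integer megabyte values is an assumption about their format.

```python
def selects_in_core(factor_size_mb, env):
    """Return True if the iparm(60) = 1 rule would pick in-core mode.

    `env` is any mapping of environment variables, for example os.environ.
    Defaults follow the table: 2000 MB core size and 0 MB swap size.
    """
    core_mb = int(env.get("MKL_PARDISO_OOC_MAX_CORE_SIZE", 2000))
    swap_mb = int(env.get("MKL_PARDISO_OOC_MAX_SWAP_SIZE", 0))
    return factor_size_mb < core_mb + swap_mb


fits_by_default = selects_in_core(1500, {})
too_large_by_default = selects_in_core(2500, {})
fits_after_raising_limit = selects_in_core(
    2500, {"MKL_PARDISO_OOC_MAX_CORE_SIZE": "3000"}
)
```

Passing the environment as an argument keeps the check easy to test; in a real run you would pass `os.environ`.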
iparm(61) - iparm(62) input |
Reserved. Set to zero. |
|
iparm(63) output |
Size of the minimum OOC memory for numerical factorization and solution. This parameter provides the size in kilobytes of the minimum memory required by OOC Intel® oneAPI Math Kernel Library (oneMKL) PARDISO for internal floating point arrays. This parameter is computed in phase 1. Total peak memory consumption of OOC Intel® oneAPI Math Kernel Library (oneMKL) PARDISO can be estimated as max(iparm(15), iparm(16) + iparm(63)). |
|
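The two peak-memory estimates in this table, max(iparm(15), iparm(16) + iparm(17)) for in-core mode and max(iparm(15), iparm(16) + iparm(63)) for OOC mode, can be evaluated from the output components after phase 1. A minimal Python sketch follows; the function name and the sample values are made up for illustration.

```python
def peak_memory_kb(iparm, out_of_core=False):
    """Estimate total peak memory in kilobytes from phase 1 outputs.

    `iparm` is a zero-based list, so iparm(k) in the table is iparm[k - 1].
    """
    analysis_peak = iparm[14]                          # iparm(15)
    permanent = iparm[15]                              # iparm(16)
    factors = iparm[62] if out_of_core else iparm[16]  # iparm(63) or iparm(17)
    return max(analysis_peak, permanent + factors)


# Illustrative values only, in kilobytes.
iparm = [0] * 64
iparm[14] = 5000   # iparm(15)
iparm[15] = 1200   # iparm(16)
iparm[16] = 9000   # iparm(17)
iparm[62] = 3000   # iparm(63)

in_core_kb = peak_memory_kb(iparm)
ooc_kb = peak_memory_kb(iparm, out_of_core=True)
```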
iparm(64) input |
Reserved. Set to zero. |
Generally in sparse matrices, components which are equal to zero can be considered non-zero if necessary. For example, in order to make a matrix structurally symmetric, elements which are zero can be considered non-zero. See Sparse Matrix Storage Formats for an example.