A.3. Cholesky Decomposition Library
The Cholesky Decomposition library supplied with the Intel HLS Compiler is an FPGA-optimized templated library that factors a Hermitian (for real-valued matrices, symmetric) positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose.
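As a point of reference for what the factorization computes, the following is a minimal software-only sketch of a real-valued Cholesky decomposition in standard C++. It is not the library routine and makes no claim about how the library is implemented; the function name `reference_cholesky_real` and the row-major `std::vector` storage are assumptions made for illustration. Such a routine might be used in a testbench to check component results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (NOT part of the Intel HLS library).
// Factors a real symmetric positive-definite matrix A (n x n, row-major)
// into a lower triangular L such that A = L * transpose(L).
// Returns false if a non-positive pivot shows A is not positive-definite.
bool reference_cholesky_real(const std::vector<double>& A,
                             std::vector<double>& L, std::size_t n) {
  assert(A.size() == n * n);
  L.assign(n * n, 0.0);
  for (std::size_t j = 0; j < n; ++j) {        // column being solved
    for (std::size_t i = j; i < n; ++i) {      // rows on or below the diagonal
      // Dot product of the already-computed parts of rows i and j.
      double dot = 0.0;
      for (std::size_t k = 0; k < j; ++k) {
        dot += L[i * n + k] * L[j * n + k];
      }
      const double v = A[i * n + j] - dot;
      if (i == j) {
        if (v <= 0.0) return false;            // not positive-definite
        L[j * n + j] = std::sqrt(v);           // diagonal element
      } else {
        L[i * n + j] = v / L[j * n + j];       // off-diagonal element
      }
    }
  }
  return true;
}
```

Each column depends on dot products over all previously computed columns, which is why the decomposition reads the partially built L matrix so heavily.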
Header File
#include "HLS/cholesky_decompose.h"
Variants
You can call four different variants. All variants are in the ihc::cholesky namespace.
Cholesky decomposition iterates heavily on the matrix. For the best component performance, the component needs low-latency access to the matrix it iterates on, and that matrix must be vectorized. However, this requirement is not always easy to meet, so two of the variants let you use separate pieces of memory for the iterative decomposition and for recording the final output.
The variants provided are as follows:
- cholesky_decompose_real(A_input, L_iter, n);
Used for real-valued matrices.
- cholesky_decompose_real(A_input, L_output, L_iter, n);
Used for real-valued matrices with separate memory for iterative decomposition and final output.
- cholesky_decompose_complex(A_input, L_iter, n);
Used for complex-valued matrices.
- cholesky_decompose_complex(A_input, L_output, L_iter, n);
Used for complex-valued matrices with separate memory for iterative decomposition and final output.
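Mathematically, the complex-valued case differs from the real-valued case in that the second factor is the conjugate transpose of L rather than the plain transpose. The sketch below illustrates this in standard C++ with `std::complex<float>`. It is a software-only illustration, not the library's `cholesky_decompose_complex`; the name `reference_cholesky_complex` and the row-major storage are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical reference routine (NOT part of the Intel HLS library).
// Factors a Hermitian positive-definite matrix A (n x n, row-major) into a
// lower triangular L such that A = L * conjugate_transpose(L).
// Only the lower triangle of A is read.
bool reference_cholesky_complex(const std::vector<cfloat>& A,
                                std::vector<cfloat>& L, std::size_t n) {
  assert(A.size() == n * n);
  L.assign(n * n, cfloat(0.0f, 0.0f));
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = j; i < n; ++i) {
      // Same dot product as the real case, but with the second operand
      // conjugated.
      cfloat dot(0.0f, 0.0f);
      for (std::size_t k = 0; k < j; ++k) {
        dot += L[i * n + k] * std::conj(L[j * n + k]);
      }
      const cfloat v = A[i * n + j] - dot;
      if (i == j) {
        // The diagonal of a Hermitian positive-definite matrix is real
        // and positive, so the diagonal of L is real as well.
        if (v.real() <= 0.0f) return false;
        L[j * n + j] = cfloat(std::sqrt(v.real()), 0.0f);
      } else {
        L[i * n + j] = v / L[j * n + j].real();
      }
    }
  }
  return true;
}
```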
Arguments
- A_input
- A pointer to an array of size MATRIX_SIZE * MATRIX_SIZE that provides the input matrix A. Its data width and alignment should match those of a single matrix element.
- L_iter
- A pointer to an array of size MATRIX_SIZE * MATRIX_SIZE that holds the L matrix that the decomposition iterates on. In the 3-argument variants, the final result is also recorded here. Its data width and alignment should match those of a vector of matrix elements with the specified vectorization width. If a pointer to component memory is given, the compiler should be able to optimize the access automatically.
- L_output
- A pointer to an array of size MATRIX_SIZE * MATRIX_SIZE that records the final matrix result in the 4-argument variants. Its data width and alignment should match those of a single matrix element.
- n
- The actual matrix size. It must not be greater than MATRIX_SIZE.
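The size and alignment requirements above can be sketched as plain C++ buffer declarations, for example in a testbench. This is only an illustration under stated assumptions: the values chosen for MATRIX_SIZE and VEC_SIZE_PWR, the struct `CholeskyBuffers`, and the helpers `is_aligned` and `valid_size` are hypothetical, and the sketch does not show how these buffers are passed to the library or mapped to component memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical values chosen for illustration only.
constexpr std::size_t MATRIX_SIZE = 8;    // compile-time capacity per dimension
constexpr std::size_t VEC_SIZE_PWR = 2;   // log2 of the vectorization width
constexpr std::size_t VEC_WIDTH = std::size_t{1} << VEC_SIZE_PWR;  // 4 elements

// Hypothetical grouping of the three buffers described in the text.
struct CholeskyBuffers {
  // Element-aligned input matrix A.
  float A_input[MATRIX_SIZE * MATRIX_SIZE];
  // Iteration matrix, aligned to a whole vector of VEC_WIDTH elements.
  alignas(VEC_WIDTH * sizeof(float)) float L_iter[MATRIX_SIZE * MATRIX_SIZE];
  // Element-aligned final output (used by the 4-argument variants).
  float L_output[MATRIX_SIZE * MATRIX_SIZE];
};

// Returns true if pointer p is a multiple of the given alignment in bytes.
bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// The actual matrix size n must fit inside the allocated capacity.
bool valid_size(std::size_t n) {
  return n >= 1 && n <= MATRIX_SIZE;
}
```

Checking `valid_size(n)` before invoking the component guards against an n larger than the capacity the buffers were declared with.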
Template Arguments
- FP_T
- The floating-point data type. Can be float or double.
- VEC_SIZE_PWR
- The logarithm (base 2) of the vectorization width. Vectorization with a width of 2^VEC_SIZE_PWR is used for accessing the L_iter matrix and for the dot product computation.
- INNER_SAFELEN
- The INNER_SAFELEN value applies to the partial_dot array. Its value roughly matches the combined latency of the index calculation and one floating-point addition in the algorithm. This safelen() value does not grow with matrix size. The default value is 16, which should be ideal for single precision. For double precision, increase it accordingly.
- OUTER_SAFELEN_OVERWRITE
- The OUTER_SAFELEN_OVERWRITE value applies to the L_iter matrix and is related to intercolumn dependencies. The library provides an estimate of this safelen() value, along with an estimate of how the value grows with matrix size. However, the optimal value for this template argument depends on the clock target, the target device family, and the precision used. Use this parameter to overwrite the estimated safelen() value. If OUTER_SAFELEN_OVERWRITE=-1, the safelen() value is tuned for single-precision float data types on an Intel® Arria® 10 device with a default clock target. The tuning also assumes that the L_iter matrix is in component memory with proper vectorization provided.
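To make the roles of VEC_SIZE_PWR and INNER_SAFELEN more concrete, the sketch below shows one general technique that matches the description: a dot product that consumes 2^VEC_SIZE_PWR elements per step and rotates through INNER_SAFELEN independent partial sums, so that no accumulator is reused before a pipelined floating-point addition of that latency could finish. This is an illustration in standard C++ only. The library's actual implementation is not shown in this section, and the function `partial_dot_product` is hypothetical; only the template argument names and the partial_dot array name are taken from the text.

```cpp
#include <cstddef>

// Hypothetical illustration (NOT the library's implementation).
// FP_T:          floating-point type (float or double)
// VEC_SIZE_PWR:  log2 of the number of elements consumed per step
// INNER_SAFELEN: number of independent partial sums
template <typename FP_T, int VEC_SIZE_PWR, int INNER_SAFELEN>
FP_T partial_dot_product(const FP_T* a, const FP_T* b, std::size_t len) {
  constexpr std::size_t VEC = std::size_t{1} << VEC_SIZE_PWR;
  FP_T partial_dot[INNER_SAFELEN] = {};  // zero-initialized accumulators
  std::size_t slot = 0;

  for (std::size_t base = 0; base < len; base += VEC) {
    // One "vector" step: multiply-add up to VEC adjacent elements.
    FP_T chunk = 0;
    for (std::size_t v = 0; v < VEC && base + v < len; ++v) {
      chunk += a[base + v] * b[base + v];
    }
    // Each accumulator is revisited only every INNER_SAFELEN steps, so
    // consecutive steps carry no dependency on one another.
    partial_dot[slot] += chunk;
    slot = (slot + 1) % static_cast<std::size_t>(INNER_SAFELEN);
  }

  // Final reduction of the independent partial sums.
  FP_T total = 0;
  for (int s = 0; s < INNER_SAFELEN; ++s) {
    total += partial_dot[s];
  }
  return total;
}
```

In this sketch, a larger INNER_SAFELEN tolerates a longer adder latency at the cost of more accumulators and a longer final reduction, which is consistent with the advice to increase the value for double precision.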