7.7.12. Single-Precision Complex Floating-Point Matrix Multiply
A matrix multiplication computes a row-by-column dot product for each output element. For 8×8 matrices A and B, each output element is Cij = Ai1B1j + Ai2B2j + … + Ai8B8j.
If latency is not a concern, you can accumulate the adjacent partial results or build adder trees. However, to implement the design with a smaller dot product, consider folding the resource usage: folding reuses a smaller number of multipliers rather than performing every multiplication in parallel. Split the loop over k into smaller chunks, then reorder the calculations so that adjacent accumulations do not feed the same adder back to back.
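As an illustration of this chunking and reordering, the following C sketch computes one output element with the loop over k split into chunks of an assumed folding factor F; each of the F lanes keeps its own partial sum, so consecutive additions never target the same accumulator. The function name, the choice F = 4, and the final reduction loop are illustrative assumptions, not details of the DSP Builder model.

```c
#include <complex.h>

#define N 8   /* matrix dimension (8x8, as in the text)          */
#define F 4   /* assumed folding factor: parallel multiplier lanes */

/* One output element C[i][j] of C = A * B, with the loop over k
 * split into chunks of F. Each lane f keeps its own partial sum,
 * so back-to-back additions never target the same accumulator.   */
float complex dot_folded(const float complex A[N][N],
                         const float complex B[N][N],
                         int i, int j)
{
    float complex partial[F] = {0};

    /* Outer loop over chunks, inner loop over the F folded lanes. */
    for (int k0 = 0; k0 < N; k0 += F) {
        for (int f = 0; f < F; f++) {
            partial[f] += A[i][k0 + f] * B[k0 + f][j];
        }
    }

    /* Final reduction of the F partial sums (a small adder tree). */
    float complex sum = 0;
    for (int f = 0; f < F; f++) {
        sum += partial[f];
    }
    return sum;
}
```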
A traditional implementation of a matrix multiply design is structured around a delay line and an adder tree:
A11B11 + A12B21 + A13B31, and so on.
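The following C sketch illustrates one pass of such an adder tree, assuming the delay line has collected 8 products before the tree reduces them; the function name and sizes are illustrative assumptions, not details taken from the model.

```c
#include <complex.h>

#define FOLD 8   /* assumed number of products collected per tree pass */

/* Traditional structure: a delay line gathers FOLD products, then a
 * three-level adder tree (7 adders for 8 inputs) reduces them in one
 * pass. The tree then sits idle while the delay line refills, which is
 * why its adders are used only once every several cycles.             */
float complex adder_tree_pass(const float complex delay_line[FOLD])
{
    float complex stage1[4], stage2[2];

    for (int n = 0; n < 4; n++)              /* level 1: 4 adders */
        stage1[n] = delay_line[2 * n] + delay_line[2 * n + 1];
    for (int n = 0; n < 2; n++)              /* level 2: 2 adders */
        stage2[n] = stage1[2 * n] + stage1[2 * n + 1];
    return stage2[0] + stage2[1];            /* level 3: 1 adder  */
}
```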
The traditional implementation has the following features:
- The delay-line length and adder-tree size grow with the folding factor (typically 8 to 12)
- Uses adder trees of 7 to 10 adders that are used only once every 10 cycles
- Each matrix size needs a different delay-line length, so you must provision for the worst case
A better implementation is to use FIFO buffers to provide self-timed control: new data is accumulated only when both FIFO buffers have data (see the sketch after the following list). This implementation has the following advantages:
- Runs as fast as possible
- Is not sensitive to the dot product latency, which varies with device and fMAX
- Is not sensitive to matrix size (hardware just stalls for small N)
- Can be responsive to back pressure, which stops the FIFO buffers from emptying; the empty and full flags feed back to the control logic
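The following C sketch models this self-timed behavior. It assumes one FIFO buffer carries new partial products and the other carries the running sum fed back around the adder; the names, the FIFO depth, and the explicit back-pressure flag are illustrative assumptions rather than details of the DSP Builder model.

```c
#include <complex.h>
#include <stdbool.h>

#define DEPTH 16   /* assumed FIFO depth, for illustration only */

typedef struct {
    float complex data[DEPTH];
    int head, tail, count;
} fifo_t;

bool has_data(const fifo_t *f) { return f->count > 0; }

float complex pop(fifo_t *f)
{
    float complex v = f->data[f->head];
    f->head = (f->head + 1) % DEPTH;
    f->count--;
    return v;
}

void push(fifo_t *f, float complex v)
{
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % DEPTH;
    f->count++;
}

/* One self-timed step: a new value is accumulated only when both FIFO
 * buffers have data; otherwise the datapath simply stalls. Back
 * pressure from downstream stops the FIFO buffers from being read, and
 * the empty/full flags feed back into this control decision.          */
void accumulate_step(fifo_t *new_data, fifo_t *feedback, bool back_pressure)
{
    if (back_pressure)
        return;                          /* downstream cannot accept: stall */
    if (has_data(new_data) && has_data(feedback))
        push(feedback, pop(new_data) + pop(feedback));
}
```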
The model file is matmul_CS.mdl.