Variable Precision DSP Blocks User Guide: Agilex™ 5 FPGAs and SoCs

ID 813968
Date 9/20/2024
Public

2.3. Tensor Mode

In Tensor mode, the DSP block is arranged as two columns, each containing a DOT engine and an ALU, as shown in the following figure.

Figure 18. Tensor Mode High-Level Block Diagram

The DOT engine multiplies 10 individual 8-bit fixed-point data inputs by the values in 10 pre-loaded 8-bit buffers and sums the 10 products. The data inputs are common to both columns, and each column has two separate sets of buffers intended for storing weights.
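
The dot-product arithmetic described above can be sketched in software. This is a minimal behavioral model, not the hardware implementation: the function name `dot_engine`, the constant `LANES`, and the choice of a signed 8-bit range (-128 to 127) are assumptions made for illustration only.

```python
from typing import Sequence

LANES = 10  # multiplies per DOT engine, as stated in the text


def dot_engine(data: Sequence[int], weights: Sequence[int]) -> int:
    """Sum of 10 products of 8-bit values (the signed range is an assumption)."""
    if len(data) != LANES or len(weights) != LANES:
        raise ValueError(f"expected {LANES} data inputs and {LANES} weights")
    for v in (*data, *weights):
        if not -128 <= v <= 127:
            raise ValueError(f"{v} does not fit in a signed 8-bit value")
    return sum(d * w for d, w in zip(data, weights))


# The same data inputs drive both columns; only the weights differ.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
weights_col0 = [1] * LANES
weights_col1 = [-1, 1] * (LANES // 2)

col0 = dot_engine(data, weights_col0)  # 55
col1 = dot_engine(data, weights_col1)  # 5
```

Because the data inputs are shared, one set of 10 inputs produces two independent results per evaluation, one per column.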

The load_buf_sel signal controls which set of weights is active. This allows one set to be updated while the other set is being used for computations. The DOT product outputs from each column are fed into independent ALUs.
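
The ping-pong arrangement of the two weight sets can be illustrated with a small model of a single column. The class `PingPongWeights` and its methods are hypothetical names, and the polarity chosen here (the value of `load_buf_sel` is the index of the active set) is an assumption; consult the signal description for the actual encoding.

```python
from typing import List, Sequence


class PingPongWeights:
    """Two weight sets for one column: one active, one free to be reloaded."""

    def __init__(self, lanes: int = 10) -> None:
        self.lanes = lanes
        self.sets: List[List[int]] = [[0] * lanes, [0] * lanes]

    def active(self, load_buf_sel: int) -> List[int]:
        # Assumption: load_buf_sel (0 or 1) is the index of the active set.
        return self.sets[load_buf_sel]

    def load_inactive(self, load_buf_sel: int, weights: Sequence[int]) -> None:
        # New weights go into the set that is NOT currently used for computation.
        if len(weights) != self.lanes:
            raise ValueError(f"expected {self.lanes} weights")
        self.sets[1 - load_buf_sel] = list(weights)


buf = PingPongWeights()
buf.load_inactive(load_buf_sel=0, weights=[3] * 10)  # fills set 1
in_use_before_swap = buf.active(0)  # still the original set 0
in_use_after_swap = buf.active(1)   # the newly loaded weights
```

Toggling the select value swaps the roles of the two sets, so a new set of weights can be staged without interrupting computation.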

Figure 19. DOT Engine for a Single Column

There are three operational modes:
  • Tensor Floating-point Mode
    • The DOT engine feeds the ALU, which operates in floating-point mode. A shared exponent is supplied with the input data, and separate shared exponents are loaded into the two sets of ping-pong buffers. The ALU uses these shared exponents to convert the fixed-point DOT result to floating point. In this mode, the ALU represents and manipulates rational numbers in the floating-point format (IEEE-754 standard).
  • Tensor Fixed-point Mode
    • The DOT engine feeds the ALU, which operates in fixed-point mode. In this mode, the ALU represents and manipulates integers and fixed-point numbers.
  • Tensor Accumulation Mode
    • The DOT engine is bypassed and the ALU operates in floating-point mode. A 32-bit floating-point input is supplied directly to the ALU for each column. In this mode, the ALU represents and manipulates rational numbers in the floating-point format (IEEE-754 standard).

In all three operational modes, the ALU features an accumulator and cascade signals between DSP blocks.
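
The difference between the three modes can be sketched as a single accumulate step for one column. This is an illustrative model only. The names `TensorMode` and `alu_step` are hypothetical, and the floating-point conversion shown (scaling the DOT result by two raised to the sum of unbiased data and weight exponents) is an assumed block-floating-point interpretation, not the documented hardware behavior. Python floats are 64-bit, so the rounding of a 32-bit ALU is not reproduced, and cascade signals are not modeled.

```python
import math
from enum import Enum


class TensorMode(Enum):
    FLOATING_POINT = "tensor_floating_point"
    FIXED_POINT = "tensor_fixed_point"
    ACCUMULATION = "tensor_accumulation"


def alu_step(mode, acc, dot=0, data_exp=0, weight_exp=0, fp_in=0.0):
    """Return the accumulator value after one step in the given mode."""
    if mode is TensorMode.FIXED_POINT:
        # Integer DOT result is accumulated directly.
        return acc + dot
    if mode is TensorMode.FLOATING_POINT:
        # Assumption: value = dot * 2**(data_exp + weight_exp), unbiased exponents.
        return acc + math.ldexp(float(dot), data_exp + weight_exp)
    if mode is TensorMode.ACCUMULATION:
        # DOT engine bypassed: a floating-point input goes straight to the ALU.
        return acc + fp_in
    raise ValueError(f"unknown mode: {mode}")


fixed = alu_step(TensorMode.FIXED_POINT, acc=5, dot=3)
floating = alu_step(TensorMode.FLOATING_POINT, acc=0.0, dot=3, data_exp=2, weight_exp=-1)
bypass = alu_step(TensorMode.ACCUMULATION, acc=1.5, fp_in=2.25)
```

In this sketch only the source of the value added to the accumulator changes between modes, which mirrors the statement that the accumulator is available in all three.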