Variable Precision DSP Blocks User Guide: Agilex™ 5 FPGAs and SoCs

ID 813968
Date 4/01/2024
Public

2.3.1.2. Side Input Feed Preloading Method

The side input feed preloading method preloads the 80-bit weight data and the 8-bit shared exponent data into the ping-pong buffers through the data_in[87:80] and data_in[95:88] buses. Preloading takes 12 cycles for one set of ping-pong buffers, or 24 cycles for two sets. The weight and shared exponent data are preloaded independently, even when the DSP blocks are cascaded, so the tensor computation can continue while the second and subsequent sets are being preloaded. The following figure shows the dataflow for the side input feed method.
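The bus fields and cycle counts above can be sketched as simple bit-field extraction and arithmetic. This is a minimal illustration, not an Altera/Intel model: the 96-bit data_in width is an assumption inferred from the highest bit index named in the text, and the helper names split_data_in and preload_cycles are hypothetical. Because this section does not say which side byte carries weight data and which carries shared exponent data, both bytes are returned unlabeled.

```python
# Sketch only: data_in is ASSUMED to be 96 bits wide (the text names bits up
# to 95). Which side byte carries weight data and which carries shared
# exponent data is not stated in this section, so both are returned as-is.

PRELOAD_CYCLES_PER_SET = 12  # from the text: 12 cycles per set of buffers
DATA_IN_WIDTH = 96  # assumption: highest bit index named in the text is 95


def split_data_in(word: int) -> dict:
    """Split one data_in word into the bit ranges named in the text."""
    if not 0 <= word < (1 << DATA_IN_WIDTH):
        raise ValueError("word does not fit in the assumed 96-bit data_in bus")
    return {
        "data_in[79:0]": word & ((1 << 80) - 1),  # activation data field
        "data_in[87:80]": (word >> 80) & 0xFF,    # side input feed byte
        "data_in[95:88]": (word >> 88) & 0xFF,    # side input feed byte
    }


def preload_cycles(num_sets: int) -> int:
    """Cycles needed to preload one or two sets of ping-pong buffers."""
    if num_sets not in (1, 2):
        raise ValueError("the text covers one or two sets of ping-pong buffers")
    return PRELOAD_CYCLES_PER_SET * num_sets


if __name__ == "__main__":
    word = (0xAB << 88) | (0xCD << 80) | 0x1234
    fields = split_data_in(word)
    print({name: hex(value) for name, value in fields.items()})
    print(preload_cycles(1), preload_cycles(2))
```

Shifting right by the low bit index and masking with 0xFF mirrors how a Verilog part-select such as data_in[87:80] picks out an 8-bit field.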

Figure 20. Dataflow for Side Input Feed Method
The feed paths are highlighted in red in this figure.
Figure 21. Side Input Feed Method Timing Diagrams
  1. In cycles 1 to 12, the dynamic control signals are set as follows:
    • load_bb_one = 1'b1 and load_bb_two = 1'b0 to preload the weight and shared exponent data into the first set of ping-pong buffers
    • load_buf_sel = 1'b0 so that the ping-pong buffers are not switched
  2. In cycles 13 to 24, the DSP block takes the activation data and the shared exponents from data_in[79:0] and shared_exponent[7:0], respectively. N denotes the cycle in which the DSP block completes loading the activation data and the shared exponents. The load_buf_sel signal is held at 1'b0, so the ping-pong buffers are not switched, and the DSP block uses the weights loaded in the first set of ping-pong buffers for dot-product computations.
  3. Simultaneously, in cycles 13 to 24, the dynamic control signals are set as follows:
    • load_bb_one = 1'b0 and load_bb_two = 1'b1 to preload the weight and shared exponent data into the second set of ping-pong buffers
    • load_buf_sel = 1'b0 so that the ping-pong buffers are not switched
  4. From cycle N+1 to cycle 2N+1, the DSP block takes the activation data and the shared exponents from data_in[79:0] and shared_exponent[7:0], respectively. The load_buf_sel signal is set to 1'b1 to switch to the data in the second set of ping-pong buffers for dot-product computations.
  5. The process repeats once all the data in the ping-pong buffers has been processed.
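The control-signal sequence in steps 1 to 4 can be sketched as a small cycle-indexed lookup. This is a hedged behavioral illustration, not a cycle-accurate model of the DSP block: the function and class names are hypothetical, cycles not covered by the steps return None, and two points are labeled assumptions in the code, namely that load_buf_sel stays at 1'b0 until cycle N and that N is at least 24 so that the second set is fully preloaded before the switch at cycle N+1.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch only: signal values for cycles 1 to 24 follow steps 1 and 3; the
# load_buf_sel behaviour between cycle 25 and cycle N is ASSUMED to stay at
# 1'b0 until the switch at cycle N+1 described in step 4.


@dataclass(frozen=True)
class PreloadControls:
    load_bb_one: int
    load_bb_two: int


def preload_controls(cycle: int) -> Optional[PreloadControls]:
    """Steps 1 and 3: which set of ping-pong buffers is being preloaded."""
    if 1 <= cycle <= 12:
        return PreloadControls(load_bb_one=1, load_bb_two=0)  # first set
    if 13 <= cycle <= 24:
        return PreloadControls(load_bb_one=0, load_bb_two=1)  # second set
    return None  # not specified by the steps above


def load_buf_sel(cycle: int, n: int) -> Optional[int]:
    """Steps 1 to 4: 0 keeps the first set, 1 switches to the second set."""
    # Assumption: the second set finishes preloading at cycle 24, so the
    # switch at cycle N+1 only makes sense when N >= 24.
    if n < 24:
        raise ValueError("N is assumed to be at least 24")
    if 1 <= cycle <= n:
        return 0  # ping-pong buffers not switched
    if n + 1 <= cycle <= 2 * n + 1:
        return 1  # compute with the second set of ping-pong buffers
    return None  # the next iteration (step 5) is not modelled here


if __name__ == "__main__":
    n = 30  # hypothetical value of N, for illustration only
    for cycle in (1, 12, 13, 24, n, n + 1, 2 * n + 1):
        print(cycle, preload_controls(cycle), load_buf_sel(cycle, n))
```

Returning None for unspecified cycles keeps the sketch from asserting signal values that the steps do not define.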