Variable Precision DSP Blocks User Guide: Agilex™ 5 FPGAs and SoCs

ID 813968
Date 9/20/2024
Public

3.3.2. Side Input Feed Preloading Method

The side input feed preloading method loads the 8-bit weights (ten per column) and the 8-bit shared exponents (one per column) into the ping-pong buffers through the side_in_1[7:0] and side_in_2[7:0] buses. Preloading one set of ping-pong buffers takes 12 cycles. The weight and shared exponent data are preloaded into each DSP block independently, even when DSP blocks are cascaded. Because there are two buffer sets, tensor computation can continue using one set while the other set is being preloaded. The following figure shows the dataflow for the side input feed method.
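The ping-pong arrangement can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not Intel's implementation: the class `PingPongWeights` and its methods are hypothetical, the model covers a single column of ten weights, it ignores the shared exponent, and the `active` index merely stands in for the buffer set chosen by load_buf_sel (the real signal's polarity is not modeled).

```python
from dataclasses import dataclass, field

PRELOAD_CYCLES = 12  # from the text: preloading one buffer set takes 12 cycles
WEIGHTS_PER_COLUMN = 10


@dataclass
class PingPongWeights:
    """Hypothetical behavioral model of two weight buffer sets for one column."""
    sets: list = field(default_factory=lambda: [None, None])
    active: int = 0  # index of the set used for computation (stands in for load_buf_sel)

    def preload(self, weights):
        """Write new weights into the set that computation is NOT reading.

        Returns the number of cycles the preload occupies."""
        weights = list(weights)
        if len(weights) != WEIGHTS_PER_COLUMN:
            raise ValueError("expected ten weights for one column")
        idle = 1 - self.active
        self.sets[idle] = weights
        return PRELOAD_CYCLES

    def compute(self, data):
        """Dot product of data with the active set (shared exponent ignored)."""
        weights = self.sets[self.active]
        return sum(d * w for d, w in zip(data, weights))

    def swap(self):
        """Switch computation to the other buffer set."""
        self.active = 1 - self.active


buf = PingPongWeights()
buf.sets[0] = [1] * 10
buf.preload(range(10))        # fills set 1 while set 0 stays in use
before = buf.compute([1] * 10)
buf.swap()                    # switch computation to the newly loaded set
after = buf.compute([1] * 10)
```

Because `preload()` always writes the set that `compute()` is not reading, the result of `compute()` does not change until the active set is switched.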

Figure 54. Dataflow for Side Input Feed Method
The feed paths are highlighted in red in this figure.
Figure 55. Side Input Feed Method Timing Diagrams
Note: SEC1 refers to the shared exponent for column 1 and SEC2 refers to the shared exponent for column 2. BxCy refers to the buffer associated with data_in_x in column y; for example, B10C2 is the buffer whose weight is multiplied by data_in_10 in column 2.
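The BxCy and SECy labels follow a simple pattern, which the hypothetical helpers below make explicit. They assume two columns and ten data_in inputs per column, numbered from 1 as in B10C2. The list order is arbitrary and is not the load order, which Figure 55 defines.

```python
COLUMNS = (1, 2)            # assumption: two columns, as in SEC1/SEC2
INPUTS = range(1, 11)       # assumption: data_in_1 .. data_in_10 per column


def buffer_name(x: int, y: int) -> str:
    """Hypothetical helper: label of the buffer multiplied by data_in_x in column y."""
    if x not in INPUTS:
        raise ValueError("x must be in 1..10")
    if y not in COLUMNS:
        raise ValueError("y must be 1 or 2")
    return f"B{x}C{y}"


def shared_exponent_name(y: int) -> str:
    """Hypothetical helper: label of the shared exponent for column y."""
    if y not in COLUMNS:
        raise ValueError("y must be 1 or 2")
    return f"SEC{y}"


def all_labels() -> list:
    """Every label in one buffer set (arbitrary order, NOT the Figure 55 load order)."""
    exponents = [shared_exponent_name(y) for y in COLUMNS]
    weights = [buffer_name(x, y) for y in COLUMNS for x in INPUTS]
    return exponents + weights
```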
  1. The update process starts in cycle 1 by setting load_bb_one or load_bb_two to 1'b1. During this first cycle, computation can continue using the buffer set selected by load_buf_sel.
  2. New shared exponents and weights are loaded during cycles 2 through 13.
    • Data is loaded through side_in_2 and side_in_1 following the pattern shown in Figure 55.
    • Cycles 2 and 3 are still required even when the shared exponent is not used, as in tensor fixed-point mode.
    • During side feed loading, computation can continue using the other set of buffers as determined by the load_buf_sel signal.
    • load_bb_one or load_bb_two should become inactive in cycle 13, so that its last active cycle is one cycle before the last data is fed in through side_in_1 and side_in_2.
  3. In cycle 14, the newly loaded buffer contents are ready for computation. load_buf_sel can now be switched to the new buffer set so that computation continues with the new weights, and the previous set becomes available for the next preload.
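The numbered steps can be summarized as a cycle-by-cycle table. The sketch below is a hypothetical Python model, not a testbench: it counts cycles from the start of the update, assumes load_bb_one or load_bb_two is held at 1'b1 from cycle 1 until it goes inactive in cycle 13, and does not model which byte appears on side_in_1 or side_in_2 in a given cycle (Figure 55 defines that pattern).

```python
from typing import NamedTuple

LOAD_START = 1   # step 1: load_bb_one/load_bb_two set to 1'b1
DATA_FIRST = 2   # step 2: first cycle of side_in_1/side_in_2 data
DATA_LAST = 13   # step 2: last cycle of side_in_1/side_in_2 data
LOAD_LOW = 13    # load_bb_* inactive from this cycle (assumed held high before it)
READY = 14       # step 3: new set usable; load_buf_sel may switch


class CycleState(NamedTuple):
    cycle: int
    load_bb: int          # level of load_bb_one or load_bb_two for the set being loaded
    side_in_active: bool  # side_in_1/side_in_2 carry preload data this cycle
    may_switch: bool      # load_buf_sel may select the newly loaded set


def preload_schedule() -> list:
    """Return one CycleState per cycle of a single side-feed preload."""
    rows = []
    for c in range(LOAD_START, READY + 1):
        load_bb = 1 if LOAD_START <= c < LOAD_LOW else 0
        side_in = DATA_FIRST <= c <= DATA_LAST
        rows.append(CycleState(c, load_bb, side_in, c >= READY))
    return rows


for row in preload_schedule():
    print(row)
```

Under these assumptions the table shows the load control leading the side-input data by one cycle, and its 12 data cycles match the 12-cycle preload time given for one buffer set.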