FPGA AI Suite: IP Reference Manual

ID 768974
Date 12/16/2024

2.5.2.3. Parameter Group: pe_array

This parameter group configures the PE Array. The PE Array is used to calculate dot products.

Parameter: pe_array/dsp_limit

Use this parameter to limit the number of DSP blocks that the PE Array uses; multipliers that do not fit within the limit are implemented in ALM logic on the FPGA.

The number of multipliers that the PE Array requires is determined by the k_vector and c_vector global parameters. Given the value of the arch_precision global parameter and the target device family (for example, Arria® 10 or Agilex™ 7), this multiplier count determines the number of DSP blocks that the PE Array tries to use. If that number exceeds the value of the dsp_limit parameter, some multipliers are implemented in ALM logic so that the PE Array DSP usage stays within the limit.

If this parameter is omitted, all multipliers in the FPGA AI Suite IP are implemented in DSP blocks.

Typically, this parameter is set by the architecture optimizer.
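
The following Python sketch illustrates the sizing behavior described above. It is not part of the FPGA AI Suite tooling: the assumption that the multiplier count is simply k_vector multiplied by c_vector, and the multipliers-per-DSP packing factor, are placeholders, because the real mapping depends on arch_precision and the target device family.

    # Illustrative sketch only. Assumes the multiplier count is k_vector * c_vector
    # and that a fixed number of multipliers packs into each DSP block; the real
    # mapping depends on arch_precision and the target device family.
    def split_multipliers(k_vector, c_vector, dsp_limit=None, mults_per_dsp=2):
        total_mults = k_vector * c_vector
        dsps_needed = -(-total_mults // mults_per_dsp)   # ceiling division
        if dsp_limit is None or dsps_needed <= dsp_limit:
            # No limit, or the limit is not exceeded: all multipliers use DSPs.
            return {"dsps": dsps_needed, "alm_multipliers": 0}
        # The limit is exceeded: the remaining multipliers fall back to ALM logic.
        return {"dsps": dsp_limit,
                "alm_multipliers": total_mults - dsp_limit * mults_per_dsp}

    print(split_multipliers(k_vector=16, c_vector=16, dsp_limit=100))
    # {'dsps': 100, 'alm_multipliers': 56}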

Parameters: pe_array/num_interleaved_features, pe_array/num_interleaved_filters

To support layers with bias values, the PE Array uses a threaded accumulator that is time-multiplexed across multiple accumulations. Each accumulation corresponds to an output filter and an output feature. The num_interleaved_features and num_interleaved_filters parameters set how many features and filters, respectively, are interleaved in this accumulator.

Common values (num_interleaved_features x num_interleaved_filters):
Agilex™ 5 devices
12x1
Agilex™ 7 devices
5x1, 3x2
Arria® 10 devices
4x1, 2x2
Cyclone® 10 GX and Stratix® 10 devices
5x1, 3x2

All architectures support a 1x1 interleave. Selecting a 1x1 interleave typically reduces ALM consumption, but an IP generated from a 1x1-interleave architecture does not support layers with bias. Because most deep learning graphs include bias, the 1x1 interleave is typically not used.

The architecture optimizer does not modify the num_interleaved_features and num_interleaved_filters values. You must set them manually.

The filter interleave multiplies the effective KVEC, which means that graphs with a depthwise convolution (such as certain versions of MobileNet) might perform best when using num_interleaved_filters=1. Multilayer perceptron graphs might perform best when using num_interleaved_features=1.

Except in the 1x1 case, the interleave values must meet the following requirements:
Agilex™ 5 devices
The value of num_interleaved_features must be greater than or equal to 12.
Agilex™ 7 devices
The value of num_interleaved_features multiplied by num_interleaved_filters must be greater than or equal to 5.
Arria® 10 devices
The value of num_interleaved_features multiplied by num_interleaved_filters must be greater than or equal to 4.
Cyclone® 10 GX and Stratix® 10 devices
The value of num_interleaved_features multiplied by num_interleaved_filters must be greater than or equal to 5.

There is no advantage in choosing interleave factors larger than the minimum required.
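
The following Python sketch restates the interleave rules from this section as a quick check. The device-name strings are arbitrary labels, and the effective-KVEC line reflects the interpretation that the filter interleave multiplies k_vector.

    # Illustrative sketch only: checks an interleave setting against the minimums
    # listed above and reports the effective KVEC, interpreted here as
    # k_vector * num_interleaved_filters. Device keys are arbitrary labels.
    MIN_INTERLEAVE_PRODUCT = {"agilex7": 5, "arria10": 4, "cyclone10gx": 5, "stratix10": 5}

    def check_interleave(device, num_interleaved_features, num_interleaved_filters, k_vector):
        if (num_interleaved_features, num_interleaved_filters) == (1, 1):
            print("1x1 interleave: layers with bias are not supported")
        elif device == "agilex5":
            # Agilex 5 constrains num_interleaved_features directly.
            assert num_interleaved_features >= 12
        else:
            product = num_interleaved_features * num_interleaved_filters
            assert product >= MIN_INTERLEAVE_PRODUCT[device], "interleave product too small"
        print("effective KVEC =", k_vector * num_interleaved_filters)

    check_interleave("agilex7", 3, 2, k_vector=16)   # 3x2 interleave, effective KVEC 32
    check_interleave("agilex7", 5, 1, k_vector=16)   # 5x1 interleave, effective KVEC 16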

Parameter: pe_array/exit_fifo_depth

This parameter controls the depth of the PE Array exit FIFO. Larger values might reduce the incidence of stalling, but at the cost of area.

Typically, this parameter is not modified.

Parameter: pe_array/enable_scale

This parameter controls whether the IP supports scaling feature values by a per-channel weight. Per-channel scaling is used to support batch normalization and INT8 scaling (in graphs that are INT8-quantized and do not use block floating point).

In most graphs, the graph compiler (dla_compiler command) adjusts the convolution weights to account for scale, so this option is usually not required. (Similarly, if a shift is required, the convolution bias values are adjusted.)

Legal values:
true, false
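
As an informal illustration of what scaling feature values by a per-channel weight means arithmetically (for example, a folded batch-normalization scale or an INT8 per-channel scale), the following Python sketch applies one multiplier per channel to a feature tensor. It models only the arithmetic that the option makes available, not the IP datapath.

    # Illustrative sketch only: per-channel scaling of a (C, H, W) feature tensor,
    # which is the kind of operation enable_scale makes available in hardware.
    import numpy as np

    def apply_channel_scale(features, scale):
        # features: (C, H, W); scale: (C,), one multiplier per output channel
        return features * scale[:, None, None]

    feats = np.random.rand(8, 4, 4).astype(np.float32)       # 8 channels, 4x4 map
    scale = np.linspace(0.5, 2.0, num=8, dtype=np.float32)   # per-channel weights
    print(apply_channel_scale(feats, scale).shape)           # (8, 4, 4)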