2.5. FPGA AI Suite IP Block Configuration
The FPGA AI Suite IP block is described by parameters that set its arithmetic precision, its feature set, the sizes of various modules (such as the PE Array), and the details of the internal buses and the external AXI interface.
Configurable parameters are specified in the Architecture Description (.arch) file, as described in "Architecture Description File Format for Instance Parameterization" and "Architecture Description File Parameters".
The following table lists the major parameters that describe the IP block. Some of these parameters are not configurable.
Common Parameter Name | Description | Valid Range |
---|---|---|
c_vector (CVEC) | Size of the dot product performed by each PE in the PE Array. Typically optimized when generating an optimized architecture with the FPGA AI Suite compiler. | [4,8,16,32,64] |
k_vector (PE KVEC) | Number of PEs in the PE Array | [4-128] Must be a multiple of c_vector |
N/A | Number of auxiliary modules connecting to the crossbar (XBAR) | [1-4] |
pool k_vector (Pool KVEC) | Width of the pool interface. Typically optimized when generating an optimized architecture with the FPGA AI Suite compiler. | [1,2,4,8,16,32,64] |
pool max_window_height, pool max_window_width | Size of the pooling window | [3x3, 7x7, 13x13] |
depthwise k_vector (Depthwise KVEC) | Number of output channels processed in parallel. Typically optimized when generating an optimized architecture with the FPGA AI Suite compiler. | [16, 32, 64] Must be equal to k_vector |
depthwise max_window_height, depthwise max_window_width | Size of the depthwise filter | [3x3, 5x5, 7x7] |
depthwise max_dilation_vertical, depthwise max_dilation_horizontal | Maximum supported value for the depthwise dilation | [1-6] |
activation k_vector (Activation KVEC) | Width of the activation interface. Typically optimized when generating an optimized architecture with the FPGA AI Suite compiler. | [2,4,8,16,32,64] |
enable_clamp | Enables clamp activation function | [true, false] |
enable_relu | Enables ReLU activation function | [true, false] |
enable_leaky_relu | Enables Leaky ReLU activation function | [true, false] |
enable_prelu | Enables PReLU activation function | [true, false] |
enable_round_clamp | Enables round clamp activation function | [true, false] |
enable_sigmoid | Enables Sigmoid and Swish activation functions | [true, false] |
enable_tanh | Enables Tanh activation function | [true, false] |
enable_parameter_rom | Enables storing graph parameters in on-chip memory, which requires input and output streaming to be enabled and configured. For details about DDR-free operation, refer to "Generating Artifacts for DDR-Free Operation" in the FPGA AI Suite Compiler Reference Manual. | [true, false] |
arch_precision (PE precision) | Precision of features and weights in the PE Array. | "FP11" (INT7-BFP / 1s.6m.5e); "FP12AGX" (INT8-BFP / 8m.5e, two's complement); "FP13AGX" (INT9-BFP / 9m.5e, two's complement); "FP16" (INT12-BFP / 1s.11m.5e) |
PE bias add precision | Precision of accumulator bias value in the PE Array. | fp16 |
PE accumulator precision | Precision of the accumulators in the PE Array. | fp32 |
PE drain precision | Precision of values drained from the PE Accumulators to the XBAR and AUX Modules. | fp16 |
PE interleave factor | Multi-threading factor for the features x filters in the PE array accumulators. | Agilex™ 5 devices: 12x1; Agilex™ 7 devices: 2x3, 3x2, 5x1, 1x5; Arria® 10 devices: 2x2, 4x1, 1x4; Stratix® 10 devices: 2x3, 3x2, 5x1, 1x5; 1×1 supported for graphs with no bias |
PE scale precision | Precision of scale multiplier in the PE array | fp16 |
Aux module precision | Precision of the Aux Modules | fp16 |
Memory port width | Width of memory port | [64, 128, 256, 512] |
enable_debug | Toggle the FPGA AI Suite debug network that includes interface profiling counters that can be queried with the CSR. Enabled by default. | [true, false] |
enable_layout_transform | Enables the dedicated input tensor layout transform module. Early access only: This feature has early access support only. Full support for this feature is planned for a future release. | [true, false] |
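As an illustration of how the valid ranges in the table above might be checked before building an architecture, the following Python sketch transcribes several of them into a lookup table. The dictionary keys and the `check_ranges` helper are hypothetical names chosen for this example. They are not the .arch file syntax and are not part of any FPGA AI Suite tool.

```python
# Valid ranges transcribed from the parameter table. The keys are
# hypothetical Python names used only in this sketch, not .arch syntax.
VALID_RANGES = {
    "c_vector": {4, 8, 16, 32, 64},
    "k_vector": set(range(4, 129)),  # [4-128]
    "pool_k_vector": {1, 2, 4, 8, 16, 32, 64},
    "depthwise_k_vector": {16, 32, 64},
    "activation_k_vector": {2, 4, 8, 16, 32, 64},
    "arch_precision": {"FP11", "FP12AGX", "FP13AGX", "FP16"},
    "memory_port_width": {64, 128, 256, 512},
}


def check_ranges(config):
    """Return a message for every value outside its documented range."""
    problems = []
    for name, value in config.items():
        allowed = VALID_RANGES.get(name)
        if allowed is None:
            continue  # parameter is not covered by this sketch
        if value not in allowed:
            problems.append(f"{name}={value!r} is outside the valid range")
    return problems


example = {
    "c_vector": 16,
    "k_vector": 32,
    "arch_precision": "FP11",
    "memory_port_width": 100,  # not one of 64, 128, 256, 512
}
print(check_ranges(example))
```

The sketch checks each value only against its own range. Relationships between parameters, such as k_vector being a multiple of c_vector, need a separate check.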
The major constraints include:
- PE KVEC must be a multiple of CVEC
- PE KVEC must be divisible by XBAR and AUX KVECs
- PE drain width must be equal to XBAR KVEC
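The three constraints above can be expressed as simple arithmetic checks. The sketch below is one possible reading, assuming that "divisible by XBAR and AUX KVECs" means PE KVEC is an integer multiple of the XBAR KVEC and of each AUX module KVEC. The function name and its arguments are hypothetical.

```python
def check_constraints(cvec, pe_kvec, xbar_kvec, aux_kvecs, pe_drain_width):
    """Return a message for every major constraint that is violated.

    aux_kvecs is a list with one KVEC value per auxiliary module.
    """
    problems = []
    if pe_kvec % cvec != 0:
        problems.append("PE KVEC must be a multiple of CVEC")
    # Assumed reading: PE KVEC is a whole multiple of each of these widths.
    for label, kvec in [("XBAR", xbar_kvec)] + [("AUX", k) for k in aux_kvecs]:
        if pe_kvec % kvec != 0:
            problems.append(f"PE KVEC must be divisible by {label} KVEC {kvec}")
    if pe_drain_width != xbar_kvec:
        problems.append("PE drain width must be equal to XBAR KVEC")
    return problems


# A configuration that satisfies all three constraints.
print(check_constraints(cvec=16, pe_kvec=32, xbar_kvec=16,
                        aux_kvecs=[16, 8], pe_drain_width=16))

# A configuration that violates each constraint at least once.
print(check_constraints(cvec=16, pe_kvec=24, xbar_kvec=16,
                        aux_kvecs=[8], pe_drain_width=8))
```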
Graph limitations include:
- Convolution filter size: 1×1 to 28×28, including asymmetric filters
- Convolution filter stride: 1 to 15
- No limitation on convolution padding
- The limits of the depthwise layers are the same as normal convolution. Depthwise convolution is handled with software emulation using regular convolution passes.
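A graph can be screened against these limits before compilation. The following sketch assumes that the filter size and stride limits apply independently to the height and width dimensions, which the list above does not state explicitly. The function name is hypothetical.

```python
def conv_layer_supported(filter_h, filter_w, stride_h, stride_w):
    """Check a convolution layer against the listed graph limitations.

    Padding is not checked because the list places no limit on it.
    Depthwise layers use the same limits as normal convolution.
    """
    # Filter sizes from 1x1 to 28x28, asymmetric shapes allowed.
    filter_ok = 1 <= filter_h <= 28 and 1 <= filter_w <= 28
    # Stride from 1 to 15, assumed to apply per dimension.
    stride_ok = 1 <= stride_h <= 15 and 1 <= stride_w <= 15
    return filter_ok and stride_ok


print(conv_layer_supported(3, 3, 1, 1))    # common 3x3 filter, stride 1
print(conv_layer_supported(1, 7, 2, 2))    # asymmetric 1x7 filter
print(conv_layer_supported(29, 29, 1, 1))  # filter larger than 28x28
print(conv_layer_supported(3, 3, 16, 16))  # stride larger than 15
```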
The maximum supported DDR size is 4 GB.