FPGA AI Suite: IP Reference Manual

ID 768974
Date 12/16/2024
Public

2.6.4.2. Input Folding

Many graphs, particularly those processing image data, have very shallow input channels. In high CVEC instantiations, very shallow input channels can lead to low computational efficiency.
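The efficiency loss can be estimated with a simplified model. The sketch below assumes that CVEC is the number of input channels the hardware consumes in parallel, and that lanes without a channel to process sit idle. The function name `lane_utilization` is hypothetical, and the model ignores any mitigation the IP applies to shallow layers.

```python
import math


def lane_utilization(input_channels: int, cvec: int) -> float:
    """Fraction of channel lanes doing useful work (simplified model).

    Assumption: the hardware processes `cvec` input channels per pass and
    pads the last pass with idle lanes.
    """
    passes = math.ceil(input_channels / cvec)
    return input_channels / (passes * cvec)


# A 3-channel RGB input on a CVEC=16 instance uses few of the lanes.
print(lane_utilization(3, 16))   # 0.1875
# Folding by a 2x2 stride yields 3 * 2 * 2 = 12 channels.
print(lane_utilization(12, 16))  # 0.75
```

Under this model, moving spatial data into the channel dimension raises utilization because more lanes receive real data on each pass.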

Input folding mitigates this by moving spatial input data into the channel dimension. Folding can be done in conjunction with the FPGA AI Suite compiler and the FPGA AI Suite IP, or it can be performed in hardware using the hardware layout transform described in Input Layout Transform Hardware.

Input folding is typically most beneficial to the first layer of a graph.

The following figure illustrates the folding transform performed by the FPGA AI Suite compiler.

Figure 4. Illustration of the folding transform for a 1×1×5×5 input, 1×1×3×3 filter, and stride_height = stride_width = 2

In this transformation, the input depth, height, and width are folded into the channel dimension by a factor corresponding to the stride of the first convolution of a network. In the preceding figure, the stride of 2 in height and width transforms the input channels from 1 to 4 (1 × 2 × 2), the input height from 5 to 3 (⌈5/2⌉), and the input width from 5 to 3. Each color corresponds to the new filter window, which in this case is 4×1×2×2, with the gray boxes corresponding to zero padding for the filters. Folding is done in a similar way for inputs with depths greater than one, but the illustration omits that case for simplicity.
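The transform can be sketched in NumPy for the two-dimensional case shown in the figure (depth of one, so the depth axis is dropped). This is an illustrative model, not the compiler's implementation: the helper names `fold_hw` and `conv2d_valid` are hypothetical, and the order in which stride offsets are packed into the channel axis is an assumption. The sketch folds both the input and the filter, then checks that a stride-1 convolution on the folded tensors matches the original stride-2 convolution.

```python
import numpy as np


def fold_hw(x, sh, sw):
    """Fold the two trailing spatial axes into the axis before them.

    (..., C, H, W) -> (..., C*sh*sw, ceil(H/sh), ceil(W/sw)), zero-padded
    at the bottom/right when H or W is not a multiple of the stride.
    """
    *lead, c, h, w = x.shape
    hp, wp = -(-h // sh) * sh, -(-w // sw) * sw  # round up to stride multiple
    x = np.pad(x, [(0, 0)] * (x.ndim - 2) + [(0, hp - h), (0, wp - w)])
    x = x.reshape(*lead, c, hp // sh, sh, wp // sw, sw)
    n = len(lead)
    # Move the within-stride offsets next to the channel axis (assumed order).
    x = x.transpose(*range(n), n, n + 2, n + 4, n + 1, n + 3)
    return x.reshape(*lead, c * sh * sw, hp // sh, wp // sw)


def conv2d_valid(x, f, sh=1, sw=1):
    """Naive unpadded convolution: x is (C, H, W), f is (K, C, R, S)."""
    k, _, r, s = f.shape
    oh = (x.shape[1] - r) // sh + 1
    ow = (x.shape[2] - s) // sw + 1
    out = np.zeros((k, oh, ow), dtype=x.dtype)
    for y in range(oh):
        for z in range(ow):
            window = x[:, y * sh:y * sh + r, z * sw:z * sw + s]
            out[:, y, z] = np.tensordot(f, window, axes=3)
    return out


x = np.arange(25, dtype=np.float64).reshape(1, 5, 5)        # C=1, 5x5 input
f = np.arange(1, 10, dtype=np.float64).reshape(1, 1, 3, 3)  # one 3x3 filter

reference = conv2d_valid(x, f, sh=2, sw=2)  # original stride-2 convolution
x_folded = fold_hw(x, 2, 2)                 # shape (4, 3, 3)
f_folded = fold_hw(f, 2, 2)                 # shape (1, 4, 2, 2), zero-padded
folded = conv2d_valid(x_folded, f_folded)   # stride 1 on folded tensors

print(x_folded.shape, f_folded.shape, np.allclose(reference, folded))
```

The folded filter holds 16 weights, of which only the 9 original weights are nonzero; the remaining 7 are the zero padding. For other combinations of input size, filter size, and stride, the folded convolution can produce extra rows or columns at the padded edge that would need to be cropped.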

The FPGA AI Suite IP has various enhancements that reduce, but do not eliminate, the efficiency hit of shallow first layers. In many cases, you can disable first layer folding in the compiler and pass shallow-channel tensors directly to the IP hardware.

You can disable or enable folding with the following FPGA AI Suite compiler options:

  • NoFolding

    No folding is performed by the host or an external module. This leads to low efficiency for the FPGA AI Suite IP but might be useful for debugging purposes.

  • ExternalFullFolding

    Folding is performed by the host or an external module for the depth, height, and width stride of the first convolution layer.

  • ExternalFullExtraPEFolding

    Folding is performed by the host or an external module for the depth, height, and width stride of the first convolution layer, with additional folding performed afterward by the FPGA AI Suite IP. This might lead to better performance than ExternalFullFolding, depending on the instantiation parameters of the FPGA AI Suite IP.

  • PEFolding

    Folding is performed entirely by the FPGA AI Suite IP without the need for any host or external module. Performance should be similar to ExternalFullFolding, depending on the instantiation parameters of the FPGA AI Suite IP and the neural network topology.

    PEFolding mode does not support input with a depth greater than one.
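The four options differ in where the folding work happens. The sketch below summarizes that split and encodes the PEFolding depth restriction. It is a hypothetical helper for reasoning about the modes, not part of the FPGA AI Suite API: the `FoldingMode` class, the `MODES` table, and `validate_folding_mode` are illustrative names, and treating NoFolding as "no folding anywhere" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldingMode:
    name: str
    external_folding: bool  # host or external module folds the input
    ip_folding: bool        # FPGA AI Suite IP performs folding


# Summary of the option descriptions above (NoFolding row is an assumption).
MODES = {
    m.name: m
    for m in (
        FoldingMode("NoFolding", external_folding=False, ip_folding=False),
        FoldingMode("ExternalFullFolding", external_folding=True, ip_folding=False),
        FoldingMode("ExternalFullExtraPEFolding", external_folding=True, ip_folding=True),
        FoldingMode("PEFolding", external_folding=False, ip_folding=True),
    )
}


def validate_folding_mode(mode_name: str, input_depth: int) -> FoldingMode:
    """Return the mode, rejecting PEFolding for inputs with depth > 1."""
    mode = MODES[mode_name]
    if mode.name == "PEFolding" and input_depth > 1:
        raise ValueError("PEFolding does not support input depth greater than one")
    return mode


print(validate_folding_mode("PEFolding", input_depth=1))
```

A check like this could run on the host before compilation, so that a volumetric (depth greater than one) input falls back to one of the external folding modes.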