Intel® FPGA AI Suite: IP Reference Manual

ID 768974
Date 7/03/2023
Public

2.5.4.2. Input Folding

Many graphs, particularly those processing image data, have very shallow input channels. In high CVEC instantiations, very shallow input channels can lead to low computational efficiency.
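As a rough illustration of why shallow channels hurt, the following Python sketch models utilization under a deliberately simplified assumption: the IP processes up to CVEC input channels in parallel, and any lanes beyond the available channels sit idle. The function name `lane_utilization` and the model itself are illustrative assumptions, not part of the Intel® FPGA AI Suite. Real efficiency depends on other architecture parameters as well.

```python
import math


def lane_utilization(channels: int, cvec: int) -> float:
    """Fraction of channel lanes doing useful work (simplified, assumed model).

    Assumes input channels are processed in groups of `cvec`, so a layer
    with `channels` input channels needs ceil(channels / cvec) passes and
    the unused lanes in the last pass are wasted.
    """
    passes = math.ceil(channels / cvec)
    return channels / (passes * cvec)


if __name__ == "__main__":
    # Compare a few input channel counts against a hypothetical CVEC of 16.
    for channels in (1, 3, 12, 16):
        print(channels, lane_utilization(channels, cvec=16))
```

Under this model, a 3-channel image input with CVEC = 16 keeps fewer than one in five lanes busy, while a 12-channel input keeps three in four busy.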

Input folding can be performed by the Intel® FPGA AI Suite compiler and the Intel® FPGA AI Suite IP working together, or by a separate layout transform block that you provide.

Input folding is typically most beneficial to the first layer of a graph.

The following figure illustrates an example of the functionality of the folding transform performed by the Intel® FPGA AI Suite compiler.

Figure 4. Illustration of the folding transform for a 1×1×5×5 input, 1×1×3×3 filter, and stride_height = stride_width = 2

In this transformation, the input depth, height, and width are folded into the channel dimension by a factor corresponding to the stride of the first convolution of a network. In the preceding figure, folding transforms the input channels from 1 to 4 (1 × 2 × 2), the input height from 5 to 3 (⌈5/2⌉), and the input width from 5 to 3 (⌈5/2⌉). Each color corresponds to the new filter window, which in this case is 4×1×2×2, with the gray boxes corresponding to zero padding for the filters. Folding is done in a similar way for inputs with depths greater than one, but the illustration omits that case for simplicity.
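The folding arithmetic for this example can be sketched in Python with NumPy. This is a minimal sketch, not the compiler's implementation: the helper names `fold` and `conv2d` are hypothetical, the depth dimension is dropped (depth = 1), no input padding is applied, and the order in which folded values are placed in the channel dimension is an assumption. The sketch folds both the input and the filter by the stride, then checks that a stride-1 convolution on the folded tensors reproduces the original stride-2 convolution.

```python
import numpy as np


def fold(t, sh, sw):
    """Fold a C x H x W tensor into (C*sh*sw) x ceil(H/sh) x ceil(W/sw).

    The tensor is zero-padded up to a multiple of the stride, then each
    sh x sw block of pixels is moved into the channel dimension.
    The channel ordering used here is an assumption for illustration.
    """
    c, h, w = t.shape
    hf, wf = -(-h // sh), -(-w // sw)  # ceiling division
    padded = np.zeros((c, hf * sh, wf * sw), dtype=t.dtype)
    padded[:, :h, :w] = t
    # Split H and W into (block index, offset within block).
    blocks = padded.reshape(c, hf, sh, wf, sw)
    # Move both offsets next to the channel axis and merge them into it.
    return blocks.transpose(0, 2, 4, 1, 3).reshape(c * sh * sw, hf, wf)


def conv2d(x, f, sh=1, sw=1):
    """Direct, unpadded convolution of a C x H x W input with one C x kh x kw filter."""
    _, h, w = x.shape
    _, kh, kw = f.shape
    oh, ow = (h - kh) // sh + 1, (w - kw) // sw + 1
    out = np.zeros((oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            window = x[:, i * sh:i * sh + kh, j * sw:j * sw + kw]
            out[i, j] = np.sum(window * f)
    return out


# Shapes from the figure, with the depth dimension omitted.
x = np.arange(25, dtype=np.int64).reshape(1, 5, 5)     # 1 x 5 x 5 input
f = np.arange(1, 10, dtype=np.int64).reshape(1, 3, 3)  # 1 x 3 x 3 filter

xf = fold(x, 2, 2)  # shape (4, 3, 3)
ff = fold(f, 2, 2)  # shape (4, 2, 2); the padded entries are zeros

direct = conv2d(x, f, sh=2, sw=2)  # original stride-2 convolution
folded = conv2d(xf, ff)            # stride-1 convolution on folded tensors
assert np.array_equal(direct, folded)
```

For these shapes, the folded input is 4×3×3 and the folded filter is 4×2×2, matching the figure once the depth dimension is omitted. The zero entries added when padding the filter play the role of the gray boxes.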

The Intel® FPGA AI Suite IP has various enhancements that reduce, but do not eliminate, the efficiency hit of shallow first layers. In many cases, you can disable first-layer folding in the compiler and pass shallow-channel tensors directly to the IP hardware.

You can disable or enable folding with the following Intel® FPGA AI Suite compiler options:

  • NoFolding

    No folding is performed by the host or an external module. This leads to low efficiency for the Intel® FPGA AI Suite IP but might be useful for debugging purposes.

  • ExternalFullFolding

    Folding is performed by the host or an external module for the depth, height, and width stride of the first convolution layer.

  • ExternalFullExtraPEFolding

    Folding is performed by the host or an external module for the depth, height, and width stride of the first convolution layer, with additional folding performed afterward by the Intel® FPGA AI Suite IP. This might lead to better performance than ExternalFullFolding, depending on the instantiation parameters of the Intel® FPGA AI Suite IP.

  • PEFolding

    Folding is performed entirely by the Intel® FPGA AI Suite IP without the need for any host or external module. Performance should be similar to ExternalFullFolding, depending on the instantiation parameters of the Intel® FPGA AI Suite IP and the neural network topology.

    PEFolding mode does not support input with a depth greater than one.