FPGA AI Suite: SoC Design Example User Guide

ID 768979
Date 3/29/2024
Public

8.1.2.1. Review of M2M mode

To explain how the buffers are managed in streaming mode, it can help to review the existing flow for M2M mode.

The inference application loads source images from .bmp files into memory allocated from its heap. These buffers are 224x224x3 uint8_t samples (150528 bytes). During the load, the BGR channels are rearranged from interleaved channels into planes.
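The interleaved-to-planar rearrangement can be illustrated with a minimal sketch. It assumes the pixel data has already been decoded from the .bmp file into a flat BGRBGR... byte string; the function name interleaved_to_planar is hypothetical and is not taken from the design example source.

```python
WIDTH, HEIGHT, CHANNELS = 224, 224, 3


def interleaved_to_planar(pixels: bytes) -> bytes:
    """Rearrange BGRBGR... samples into BBB...GGG...RRR... planes."""
    expected = WIDTH * HEIGHT * CHANNELS  # 150528 bytes of uint8 samples
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(pixels)}")
    # A stride-3 slice starting at offset c selects every sample of channel c.
    return b"".join(pixels[c::CHANNELS] for c in range(CHANNELS))
```

The output has the same size as the input; only the ordering of the samples changes.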

The application uses the inference engine to create OpenVINO™ inference requests. These inference requests allocate buffers in the on-board EMIF memory. The size of each of these buffers is the size of the input buffer plus the size of the output buffer. The input buffer size depends on the FPGA AI Suite IP parameters (specified in the .arch file) for which the graph was compiled.

The BGR planar image buffers are attached as input blobs to these OpenVINO™ inference requests, which are then scheduled for execution.

Preprocessing Steps

In M2M mode, the preprocessing steps are performed in software.

  • The samples are converted to 32-bit floating point, and the mean G, B and R values of the ImageNet dataset are subtracted from each sample accordingly.
  • The samples are converted to 16-bit floating point.
  • A layout transform then maps these samples into a larger buffer which has padding, in the layout expected by the FPGA AI Suite.
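The three preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: the mean values are the widely used Caffe-style ImageNet constants in B, G, R order, the c_vec padding width is a hypothetical stand-in for the real layout (which depends on the .arch file), and Python's double-precision floats stand in for the 32-bit intermediate stage.

```python
import struct

# Assumed per-channel ImageNet means in B, G, R order; the design example
# may use different constants.
MEAN_BGR = (103.939, 116.779, 123.68)


def preprocess(planar: bytes, width: int, height: int, c_vec: int = 4) -> bytes:
    """Mean-subtract, convert to FP16, and pad each pixel to c_vec channels.

    c_vec is a hypothetical padding width chosen for illustration only.
    """
    plane = width * height
    channels = len(MEAN_BGR)
    if len(planar) != plane * channels:
        raise ValueError("unexpected planar buffer size")
    if c_vec < channels:
        raise ValueError("c_vec must be at least the channel count")
    out = bytearray()
    for i in range(plane):
        # Gather this pixel's B, G, R samples from the three planes and
        # subtract the matching channel mean in floating point.
        vals = [float(planar[c * plane + i]) - MEAN_BGR[c] for c in range(channels)]
        # Zero-fill the padding lanes of the larger destination layout.
        vals += [0.0] * (c_vec - channels)
        # 'e' packs IEEE 754 half precision; '<' selects little-endian.
        out += struct.pack(f"<{c_vec}e", *vals)
    return bytes(out)
```

With c_vec set to 4, each pixel occupies 8 bytes in the output, so the transformed buffer is larger than the 3-byte-per-pixel source, which mirrors the padding described above.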

Inference Execution Steps

  • The transformed image is written directly to board memory at its allocated address.
  • The FPGA AI Suite IP CSR registers are programmed to schedule the inference.
  • The FPGA AI Suite OpenVINO™ plugin monitors the completion count register (located on the FPGA AI Suite IP), either by polling or receiving an interrupt, and waits until the count increments.
  • The results buffer (2048 bytes) is read directly from the EMIF on the board to HPS memory.
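The polling variant of the completion wait can be sketched as below. The read_count callable is a hypothetical stand-in for reading the completion count register; the actual register access, the timeout, and the polling interval used by the plugin are not described in this section and are assumptions here.

```python
import time
from typing import Callable


def wait_for_completion(read_count: Callable[[], int], previous: int,
                        timeout_s: float = 1.0,
                        poll_interval_s: float = 0.001) -> int:
    """Poll until the completion count differs from `previous`.

    Returns the new count, or raises TimeoutError if it never changes.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        count = read_count()
        # Compare with != rather than > so a counter wrap is still detected.
        if count != previous:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError("completion count did not change")
        time.sleep(poll_interval_s)
```

An interrupt-driven implementation would block on the interrupt instead of sleeping between reads, but the condition it waits for is the same.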

Postprocessing Steps

  • The samples in the results buffer (1001 16-bit floating point values) are converted to 32-bit floating point.
  • The inference application receives these buffers, sorts them, and collects the top five results.
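The postprocessing steps above can be sketched as follows. Note that 1001 16-bit values occupy 2002 bytes, so this sketch assumes the remaining 46 bytes of the 2048-byte results buffer are padding and that the values are stored little-endian from offset 0; neither detail is stated in this section. The function name top_results is hypothetical.

```python
import heapq
import struct

NUM_CLASSES = 1001   # 16-bit floating point scores in the results buffer
RESULT_BYTES = 2048  # size of the results buffer read from EMIF


def top_results(results: bytes, k: int = 5):
    """Decode FP16 scores and return the k best as (class_index, score)."""
    if len(results) < NUM_CLASSES * 2:
        raise ValueError("results buffer is too small")
    # unpack_from reads only the first 2002 bytes and ignores trailing bytes.
    # Each 'e' value becomes a Python float, widening it beyond FP16.
    scores = struct.unpack_from(f"<{NUM_CLASSES}e", results)
    # nlargest avoids sorting all 1001 entries when only k are needed.
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
```

Using a partial selection instead of a full sort gives the same top five entries in descending score order.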