FPGA AI Suite: PCIe-based Design Example User Guide

ID 768977
Date 3/29/2024
Public

6.3.1. OpenVINO™ FPGA Runtime Plugin

The FPGA runtime plugin uses the OpenVINO™ Inference Engine Plugin API.

The OpenVINO™ Plugin architecture is described in the OpenVINO™ Developer Guide for Inference Engine Plugin Library.

The source files are located under runtime/plugin. The three main components of the runtime plugin are the Plugin class, the Executable Network class, and the Inference Request class. The primary responsibilities for each class are as follows:

Plugin class

  • Initializes the runtime plugin with an FPGA AI Suite architecture file, which you specify through an OpenVINO™ configuration key (refer to Running the Ported OpenVINO Demonstration Applications).
  • Contains the QueryNetwork function, which analyzes the network layers and returns a list of the layers that the specified architecture supports. This function allows network execution to be distributed between the FPGA and other devices, and it is enabled with the HETERO mode.
  • Creates an executable network instance in one of the following ways:
    • Just-in-time (JIT) flow: Compiles a network so that it is compatible with the hardware that corresponds to the FPGA AI Suite architecture file, and then loads the compiled network onto the FPGA device.
    • Ahead-of-time (AOT) flow: Imports a precompiled network (exported by the FPGA AI Suite compiler) and loads it onto the FPGA device.
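The following C++ sketch illustrates the idea behind QueryNetwork and the HETERO mode: layers whose types the architecture supports are assigned to the FPGA, and the remaining layers fall back to another device. It is not the plugin's implementation. The Layer struct, the function names, and the layer types used here are hypothetical; in practice, the set of supported layers is determined by the FPGA AI Suite architecture file.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one network layer.
struct Layer {
    std::string name;
    std::string type;
};

// Hypothetical stand-in for the layer types an architecture file permits.
using SupportedTypes = std::set<std::string>;

// Plays the role of QueryNetwork: returns the names of the layers
// that the (hypothetical) architecture can run on the FPGA.
std::vector<std::string> querySupportedLayers(const std::vector<Layer>& layers,
                                              const SupportedTypes& supported) {
    std::vector<std::string> result;
    for (const Layer& layer : layers) {
        if (supported.count(layer.type) > 0) {
            result.push_back(layer.name);
        }
    }
    return result;
}

// HETERO-style affinity: supported layers go to the FPGA,
// every other layer goes to the fallback device.
std::map<std::string, std::string> assignAffinity(const std::vector<Layer>& layers,
                                                  const SupportedTypes& supported,
                                                  const std::string& fallback) {
    assert(!fallback.empty() && "a fallback device is required");
    std::set<std::string> onFpga;
    for (const std::string& name : querySupportedLayers(layers, supported)) {
        onFpga.insert(name);
    }
    std::map<std::string, std::string> affinity;
    for (const Layer& layer : layers) {
        affinity[layer.name] = onFpga.count(layer.name) > 0 ? "FPGA" : fallback;
    }
    return affinity;
}
```

For example, if only Convolution and ReLU layers are marked as supported, a network made of conv1, relu1, and nms runs conv1 and relu1 on the FPGA and nms on the fallback device.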

Executable Network class

  • Represents an FPGA AI Suite compiled network.
  • Loads the compiled model and configuration data for the network onto an FPGA device that has already been programmed with an FPGA AI Suite AFU/AF bitstream. When the bitstream contains two FPGA AI Suite instances, the Executable Network class loads the network onto both instances, allowing them to perform batch inference in parallel.
  • Stores input/output processing information.
  • Creates infer request instances so that the execution of multiple batches can be pipelined.
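As a rough illustration of how one compiled network can serve two FPGA AI Suite instances, the C++ sketch below binds each new infer request to an instance in round-robin order. This guide does not state how the runtime maps requests to instances; the round-robin policy and the ExecutableNetworkSketch and distributeRequests names are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a compiled network that has been loaded onto
// several FPGA AI Suite instances (for example, two in one bitstream).
class ExecutableNetworkSketch {
public:
    explicit ExecutableNetworkSketch(std::size_t numInstances)
        : numInstances_(numInstances) {
        assert(numInstances_ > 0 && "at least one instance is required");
    }

    // Binds each new infer request to the next instance in round-robin
    // order (an assumed policy) and returns that instance's index.
    std::size_t createInferRequest() {
        std::size_t instance = next_ % numInstances_;
        ++next_;
        return instance;
    }

    std::size_t numInstances() const { return numInstances_; }

private:
    std::size_t numInstances_;
    std::size_t next_ = 0;
};

// Returns, for each instance, the indices of the requests bound to it.
std::vector<std::vector<std::size_t>> distributeRequests(std::size_t numRequests,
                                                         std::size_t numInstances) {
    ExecutableNetworkSketch net(numInstances);
    std::vector<std::vector<std::size_t>> perInstance(net.numInstances());
    for (std::size_t r = 0; r < numRequests; ++r) {
        perInstance[net.createInferRequest()].push_back(r);
    }
    return perInstance;
}
```

With five requests and two instances, requests 0, 2, and 4 are bound to instance 0 and requests 1 and 3 to instance 1, so both instances have work at the same time.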

Infer Request class

  • Runs a single batch inference serially.
  • Executes five stages in one inference job:
    • Input layout transformation on the CPU
    • Input transfer to DDR
    • FPGA AI Suite execution on the FPGA
    • Output transfer from DDR
    • Output layout transformation on the CPU
  • In asynchronous mode, executes the stages on multiple threads that are shared across all infer request instances, so that multiple batch jobs are pipelined and the FPGA is kept busy while other jobs are in their CPU and transfer stages.
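To show why sharing stage threads across infer requests keeps the FPGA busy, the C++ sketch below computes stage completion times for the five-stage job, assuming one worker per stage that serves all requests in order. The Stage enum, the function names, and the stage timings are hypothetical; the actual thread count and scheduling in the runtime may differ.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical indices for the five stages of one inference job.
enum Stage : std::size_t {
    kInputLayout = 0,  // input layout transformation on the CPU
    kInputToDdr,       // input transfer to DDR
    kFpgaExec,         // FPGA AI Suite execution
    kOutputFromDdr,    // output transfer from DDR
    kOutputLayout,     // output layout transformation on the CPU
    kNumStages
};

// Time that one job spends in each stage (arbitrary units).
using StageTimes = std::array<double, kNumStages>;

// Serial execution: each job finishes all five stages before the next starts.
double serialMakespan(const StageTimes& t, std::size_t jobs) {
    double perJob = 0.0;
    for (double d : t) perJob += d;
    return perJob * static_cast<double>(jobs);
}

// Pipelined execution with one worker per stage, shared by all requests.
// A stage starts job j only after it has finished job j-1 and the
// previous stage has finished job j:
//   finish[j][s] = max(finish[j-1][s], finish[j][s-1]) + t[s]
std::vector<StageTimes> pipelinedFinishTimes(const StageTimes& t, std::size_t jobs) {
    std::vector<StageTimes> finish(jobs);  // zero-initialized
    for (std::size_t j = 0; j < jobs; ++j) {
        for (std::size_t s = 0; s < kNumStages; ++s) {
            double stageFree = (j > 0) ? finish[j - 1][s] : 0.0;
            double inputReady = (s > 0) ? finish[j][s - 1] : 0.0;
            finish[j][s] = std::max(stageFree, inputReady) + t[s];
        }
    }
    return finish;
}

// Time at which the last job leaves the last stage.
double pipelinedMakespan(const StageTimes& t, std::size_t jobs) {
    if (jobs == 0) return 0.0;
    return pipelinedFinishTimes(t, jobs).back()[kOutputLayout];
}

// Idle time of the FPGA stage between its first start and its last finish.
double fpgaIdleTime(const StageTimes& t, std::size_t jobs) {
    assert(jobs > 0);
    std::vector<StageTimes> finish = pipelinedFinishTimes(t, jobs);
    double firstStart = finish.front()[kFpgaExec] - t[kFpgaExec];
    double lastFinish = finish.back()[kFpgaExec];
    return (lastFinish - firstStart) - static_cast<double>(jobs) * t[kFpgaExec];
}
```

With one time unit per stage and four jobs, serial execution takes 20 units while the pipeline finishes in 8. When the FPGA stage is the slowest (for example, 4 units versus 1 unit for each other stage), the FPGA stage has no idle time between the first and last job. In this model, if a CPU stage is the slowest instead, the FPGA stage waits on it, so keeping the FPGA busy also depends on the CPU stages keeping up.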