FPGA AI Suite: PCIe-based Design Example User Guide

ID 768977
Date 12/16/2024
Public
Document Table of Contents

6.3.1. OpenVINO™ FPGA Runtime Plugin

The FPGA runtime plugin uses the OpenVINO™ Inference Engine Plugin API.

The OpenVINO™ Plugin architecture is described in the OpenVINO™ Developer Guide for Inference Engine Plugin Library.

The source files are located under runtime/plugin. The three main components of the runtime plugin are the Plugin class, the Executable Network class, and the Inference Request class. The primary responsibilities for each class are as follows:

Plugin class

  • Initializes the runtime plugin with an FPGA AI Suite architecture file which you set as an OpenVINO™ configuration key (refer to Running the Ported OpenVINO Demonstration Applications).
  • Contains QueryNetwork function that analyzes network layers and returns a list of layers that the specified architecture supports. This function allows network execution to be distributed between FPGA and other devices and is enabled with the HETERO mode.
  • Creates an executable network instance in one of the following ways:
    • Just-in-time (JIT) flow: Compiles a network such that the compiled network is compatible with the hardware corresponding to the FPGA AI Suite architecture file, and then loads the compiled network onto the FPGA device.
    • Ahead-of-time (AOT) flow: Imports a precompiled network (exported by FPGA AI Suite compiler) and loads it onto the FPGA device.

Executable Network Class

  • Represents an FPGA AI Suite compiled network
  • Loads the compiled model and config data for the network onto the FPGA device that has already been programmed with an FPGA AI Suite bitstream. For two instances of FPGA AI Suite, the Executable Network class loads the network onto both instances, allowing them to perform parallel batch inference.
  • Stores input/output processing information.
  • Creates infer request instances for pipelining multiple batch execution.

Infer Request class

  • Runs a single batch inference serially.
  • Executes five stages in one inference job – input layout transformation on CPU, input transfer to DDR, FPGA AI Suite FPGA execution, output transfer from DDR, output layout transformation on CPU.
  • In asynchronous mode, executes the stages on multiple threads that are shared across all inference request instances so that multiple batch jobs are pipelined, and the FPGA is always active.