
6.3.1. OpenVINO™ FPGA Runtime Plugin

The FPGA runtime plugin uses the OpenVINO™ Inference Engine Plugin API.

The OpenVINO™ Plugin architecture is described in the OpenVINO™ Developer Guide for Inference Engine Plugin Library.

The source files are located under runtime/plugin. The three main components of the runtime plugin are the Plugin class, the Executable Network class, and the Inference Request class. The primary responsibilities for each class are as follows:

Plugin class

  • Initializes the runtime plugin with an FPGA AI Suite architecture file, which you set as an OpenVINO™ configuration key (refer to Running the Ported OpenVINO Demonstration Applications).
  • Contains the QueryNetwork function, which analyzes the network layers and returns the list of layers that the specified architecture supports. This function allows network execution to be distributed between the FPGA and other devices, and it is enabled in HETERO mode.
  • Creates an executable network instance in one of the following ways (a sketch of both flows follows this list):
    • Just-in-time (JIT) flow: Compiles a network so that the compiled network is compatible with the hardware corresponding to the FPGA AI Suite architecture file, and then loads the compiled network onto the FPGA device.
    • Ahead-of-time (AOT) flow: Imports a precompiled network (exported by the FPGA AI Suite compiler) and loads it onto the FPGA device.
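
The following sketch shows, from the application side, how these responsibilities surface through the standard Inference Engine API. It is a minimal illustration, not a definitive implementation: it assumes the plugin is registered under the device name "FPGA", and the configuration key name "ARCH_PATH" and all file paths are placeholders. Refer to Running the Ported OpenVINO Demonstration Applications for the actual key.

    #include <map>
    #include <string>
    #include <inference_engine.hpp>

    int main() {
        InferenceEngine::Core core;

        // The FPGA AI Suite architecture file is passed as a configuration
        // key. "ARCH_PATH" and the path below are placeholders, not the
        // documented key name.
        const std::map<std::string, std::string> config = {
            {"ARCH_PATH", "/path/to/example.arch"}};

        // JIT flow: compile the network for the architecture, then load it
        // onto the FPGA device.
        auto network = core.ReadNetwork("model.xml");
        auto jitExec = core.LoadNetwork(network, "FPGA", config);

        // AOT flow: import a network precompiled by the FPGA AI Suite
        // compiler and load it onto the FPGA device.
        auto aotExec = core.ImportNetwork("precompiled_model.bin", "FPGA", config);

        // HETERO mode: QueryNetwork determines which layers stay on the
        // FPGA; unsupported layers fall back to the CPU plugin.
        auto heteroExec = core.LoadNetwork(network, "HETERO:FPGA,CPU", config);
        return 0;
    }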

Executable Network class

  • Represents an FPGA AI Suite compiled network.
  • Loads the compiled model and configuration data for the network onto the FPGA device that has already been programmed with an FPGA AI Suite bitstream. If the bitstream contains two FPGA AI Suite instances, the Executable Network class loads the network onto both instances, allowing them to perform batch inferences in parallel.
  • Stores input/output processing information.
  • Creates infer request instances for pipelined execution of multiple batches (see the sketch after this list).
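
As a rough sketch continuing the example above: creating more than one infer request from a single executable network is what enables pipelined batch execution. The pipeline depth of four is arbitrary, chosen only for illustration.

    // Create several infer requests from one executable network so that
    // multiple batch jobs can be in flight at the same time.
    std::vector<InferenceEngine::InferRequest> requests;
    for (int i = 0; i < 4; ++i) {
        requests.push_back(jitExec.CreateInferRequest());
    }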

Inference Request class

  • Runs a single batch inference serially.
  • Executes five stages in one inference job: input layout transformation on the CPU, input transfer to DDR memory, FPGA AI Suite execution on the FPGA, output transfer from DDR memory, and output layout transformation on the CPU.
  • In asynchronous mode, executes these stages on multiple threads that are shared across all inference request instances, so that multiple batch jobs are pipelined and the FPGA is always active (see the sketch after this list).
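
A minimal sketch of the asynchronous mode, reusing the requests vector from the previous sketch: starting all requests before waiting on any of them lets the plugin's shared worker threads overlap the CPU and DDR transfer stages of one batch with FPGA execution of another.

    // Launch every request asynchronously; inputs are assumed to have been
    // set on each request beforehand (for example, with SetBlob()).
    for (auto& req : requests) {
        req.StartAsync();
    }
    // Block until each request finishes; RESULT_READY waits indefinitely.
    for (auto& req : requests) {
        req.Wait(InferenceEngine::InferRequest::WaitMode::RESULT_READY);
    }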