FPGA AI Suite: SoC Design Example User Guide

ID 768979
Date 12/16/2024
Public

8.4.1. The streaming_inference_app Application

The streaming_inference_app application is an OpenVINO™-based application. It loads a given precompiled ResNet50 network, then creates inference requests that are executed asynchronously by the FPGA AI Suite IP.

The resulting tensors are captured from the EMIF using the mSGDMA controller. Postprocessing in software converts the output tensors to floating point, assigns the values to the appropriate image classification, sorts the results, and selects the top 5 classification results.
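The sort-and-select step can be illustrated with a short shell sketch. This is not the application's own code: it assumes a hypothetical text file with one "<score> <label>" pair per line, where the scores have already been converted to floating point, and the function name top5 is made up for the illustration.

```shell
#!/bin/sh
# Illustration only: print the five highest-scoring lines of a file.
# Assumed (hypothetical) input format: "<score> <label>", one pair per line,
# with scores written as plain decimals such as 0.70.
top5() {
    # Sort on the first field, numerically, highest first, then keep 5 lines.
    LC_ALL=C sort -k1,1nr "$1" | head -n 5
}
```

Calling top5 scores.txt prints the five best-scoring labels, highest first.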

The result of each inference is displayed on the terminal, and the results of the first 1000 inferences are logged in a results.txt file in the application folder.

The application depends on the following shared libraries. The system build adds these libraries to the directory /home/root/app on the SD card image, along with the application binary and a plugins.xml file that defines the plugins available to OpenVINO.
  • libhps_platform_mmd.so
  • libngraph.so
  • libinference_engine.so
  • libinference_engine_transformations.so
  • libcoreDLAHeteroPlugin.so
  • libcoreDlaRuntimePlugin.so
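
A quick way to confirm that the application directory is complete is a check like the sketch below. The function name check_app_dir is hypothetical and is not part of the distribution; the file names are the ones listed above, plus the application binary and plugins.xml.

```shell
#!/bin/sh
# Hypothetical helper: report any expected file that is missing from the
# application directory (default: /home/root/app).
# Returns 0 if every file is present, 1 otherwise.
check_app_dir() {
    app_dir="${1:-/home/root/app}"
    missing=0
    for f in \
        streaming_inference_app \
        plugins.xml \
        libhps_platform_mmd.so \
        libngraph.so \
        libinference_engine.so \
        libinference_engine_transformations.so \
        libcoreDLAHeteroPlugin.so \
        libcoreDlaRuntimePlugin.so
    do
        if [ ! -e "$app_dir/$f" ]; then
            echo "missing: $app_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_app_dir with no argument checks /home/root/app; pass another directory as the first argument to check a different location.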

You also need a compiled network binary file and an .arch file (which describes the FPGA AI Suite IP parameterization) to run inferences. These have been copied to the /home/root/resnet-50-tf directory.

For example, a ResNet50 model compiled for an Arria® 10 might have the following files:
  • RN50_Performance_no_folding.bin
  • A10_Performance.arch
Before running the application, set the LD_LIBRARY_PATH shell environment variable to define the location of the shared libraries:
root@arria10-1ac87246f24f:~# cd /home/root/app
root@arria10-1ac87246f24f:~# export LD_LIBRARY_PATH=.
Use the -help option of the streaming_inference_app command to display the command usage:
# ./streaming_inference_app -help
Usage:
        streaming_inference_app -model=<model> -arch=<arch> -device=<device>

Where:
        <model>    is the compiled model binary file, eg /home/root/resnet-50-tf/RN50_Performance_no_folding.bin
        <arch>     is the architecture file, eg /home/root/resnet-50-tf/A10_Performance.arch
        <device>   is the OpenVINO device ID, eg HETERO:FPGA or HETERO:FPGA,CPU
Start the streaming inference app with a command like this:
# ./streaming_inference_app \
-model=/home/root/resnet-50-tf/RN50_Performance_no_folding.bin \
-arch=/home/root/resnet-50-tf/A10_Performance.arch \
-device=HETERO:FPGA

The distribution includes a shell script utility called run_inference_stream.sh that runs the command above.
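
For reference, a launcher of this kind might look like the following sketch. It is not the contents of run_inference_stream.sh, which are not reproduced here. The function name run_stream and the APP_DIR and MODEL_DIR variables are assumptions made for the illustration; the paths and options come from the command above.

```shell
#!/bin/sh
# Sketch of a launcher; NOT the run_inference_stream.sh from the distribution.
# APP_DIR and MODEL_DIR are hypothetical overrides that default to the
# directories named in this section.
run_stream() {
    app_dir="${APP_DIR:-/home/root/app}"
    model_dir="${MODEL_DIR:-/home/root/resnet-50-tf}"
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$app_dir" || exit 1
        # The shared libraries sit next to the binary, so search "." for them.
        LD_LIBRARY_PATH=. ./streaming_inference_app \
            -model="$model_dir/RN50_Performance_no_folding.bin" \
            -arch="$model_dir/A10_Performance.arch" \
            -device="${1:-HETERO:FPGA}"
    )
}
```

Calling run_stream with no argument uses the HETERO:FPGA device; run_stream HETERO:FPGA,CPU selects the other device ID shown in the usage text.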

Note that the layout transform IP core does not support folding on the input buffer. For streaming, you must use models that have been compiled by the dla_compiler command with the --ffolding-option=0 command line option specified.