FPGA AI Suite: PCIe-based Design Example User Guide

ID 768977
Date 7/31/2024
Public

5.5.4.1. Interpreting System Throughput and Latency Metrics

The system throughput and latency metrics are measured by the host through the OpenVINO™ API. These measurements include any overhead incurred by both the API and the FPGA AI Suite runtime. They also include any time that an inference request spends waiting for an IP instance to become available, so they depend on the number of IP instances in the design.
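
The following is a minimal sketch of such a host-side measurement using the OpenVINO Python API. The model path, device string, and input shape are illustrative placeholders, not values taken from this design example:

    import time
    import numpy as np
    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")                      # placeholder model path
    compiled = core.compile_model(model, "HETERO:FPGA,CPU")   # assumed device string
    request = compiled.create_infer_request()

    data = np.zeros((1, 3, 224, 224), dtype=np.float32)       # hypothetical input shape

    start = time.perf_counter()
    request.infer({0: data})   # blocking call; timing includes API and runtime overhead
    elapsed = time.perf_counter() - start
    print(f"Host-measured latency: {elapsed * 1000:.2f} ms")

Because the timer wraps the entire API call, the measurement reflects system latency, including software overhead, rather than only the time the IP spends computing.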

In general, the system throughput is defined as follows:

    System Throughput (fps) = (Batch Size × Number of Batches) / Total Execution Time (s)

The Batch Size and Number of Batches values are set by the --batch-size and -niter options, respectively.
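
For example, with --batch-size=4 and -niter=8, a run processes 4 × 8 = 32 images. If the run takes 0.5 seconds of total execution time, the system throughput is 32 / 0.5 = 64 fps.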

For example, consider the case where -nireq=1 and there is a single IP instance. The system throughput is approximately the same as the IP-reported throughput because the runtime can perform only one inference at a time. However, if both the -nireq value and the number of IP instances are greater than one, the runtime can perform requests in parallel, so the total system throughput is greater than the throughput of any individual IP instance.
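
The following sketch illustrates such parallel requests with the OpenVINO Python API AsyncInferQueue class, which keeps several inference requests in flight at once. As before, the model path, device string, and input shape are placeholders:

    import numpy as np
    import openvino as ov

    core = ov.Core()
    model = core.read_model("model.xml")                      # placeholder model path
    compiled = core.compile_model(model, "HETERO:FPGA,CPU")   # assumed device string

    nireq = 4                                  # for example, two IP instances x 2
    queue = ov.AsyncInferQueue(compiled, nireq)

    data = np.zeros((1, 3, 224, 224), dtype=np.float32)       # hypothetical input shape
    for _ in range(32):                        # up to nireq requests run concurrently
        queue.start_async({0: data})
    queue.wait_all()                           # block until all requests complete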

In general, set the -nireq value to twice the number of IP instances. This setting enables the FPGA AI Suite runtime to pipeline inference requests, which allows the host to prepare the data for the next request while an IP instance is processing the previous one.
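
For example, in a design with two IP instances, setting -nireq=4 keeps two requests executing on the IP instances while the host prepares the data for the next two.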