FPGA AI Suite: PCIe-based Design Example User Guide

ID 768977
Date 7/31/2024
Public
5.5.4. The dla_benchmark Performance Metrics

The -save_run_summary option makes the dla_benchmark demonstration application collect performance metrics during inference. These metrics can help you determine how efficient an architecture is at executing a model.

Note: The dla_benchmark application provides throughput in "frames per second". The time per frame (latency) is 1/throughput.
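As a minimal sketch of the conversion described in the note, the following Python helper turns a throughput figure into a time per frame. The function name and the 250 frames-per-second input are hypothetical and chosen only for illustration; they are not values produced by dla_benchmark.

```python
def time_per_frame_ms(throughput_fps: float) -> float:
    """Return the time per frame in milliseconds for a throughput in frames per second."""
    if throughput_fps <= 0:
        raise ValueError("throughput must be a positive number of frames per second")
    # Time per frame is 1/throughput seconds; multiply by 1000 for milliseconds.
    return 1000.0 / throughput_fps


# Hypothetical throughput value, used only to illustrate the arithmetic.
print(f"{time_per_frame_ms(250.0):.2f} ms per frame")
```

Reporting the result in milliseconds is a presentation choice; the relationship itself is just the reciprocal stated in the note.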

Statistic

Description

Count

The number of times inference was performed. This is set by the -niter option.

System duration

The total time from when the first inference request was made until the last request finished, as measured by the host program.

IP duration

The total time spent on inference. This is reported by the IP on the FPGA.

Latency

The median time of all inference requests made by the host. This includes any overhead from OpenVINO™ or the FPGA AI Suite runtime.

System throughput

The total throughput of the system, including any OpenVINO™ or FPGA AI Suite runtime overhead.

Number of hardware instances

The number of IP instances on the FPGA.

Number of network instances

The number of graphs that the IP processes in parallel.

IP throughput per instance

The throughput of a single IP instance. This is reported by the IP on the FPGA.

IP throughput per fMAX per instance

The IP throughput per instance value scaled by the IP clock frequency value.

IP clock frequency

The clock frequency, as reported by the IP running on the FPGA device.

The dla_benchmark application treats this value as the IP core fMAX value.

Estimated IP throughput per instance

The per-IP throughput, as estimated by the dla_compiler command with the --fanalyze-performance option.

Estimated IP throughput per fMAX per instance

The Estimated IP throughput per instance value scaled by the compiler fMAX estimate.
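
The per-fMAX statistics make it possible to compare throughput figures taken at different clock frequencies. The sketch below assumes that "scaled by" means dividing the throughput by the clock frequency in MHz; the guide does not spell out the formula or the units, so treat that as an assumption. The function names and all numeric inputs are hypothetical.

```python
def throughput_per_fmax(throughput_fps: float, clock_mhz: float) -> float:
    """Normalize a throughput (frames per second) by a clock frequency (MHz).

    Assumption: scaling is a plain division of throughput by frequency.
    """
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    return throughput_fps / clock_mhz


def measured_vs_estimated(measured_fps: float, measured_mhz: float,
                          estimated_fps: float, estimated_mhz: float) -> float:
    """Return the ratio of measured to estimated frequency-normalized throughput."""
    measured = throughput_per_fmax(measured_fps, measured_mhz)
    estimated = throughput_per_fmax(estimated_fps, estimated_mhz)
    return measured / estimated


# Hypothetical values: 180 fps measured at 300 MHz, 200 fps estimated at 250 MHz.
ratio = measured_vs_estimated(180.0, 300.0, 200.0, 250.0)
print(f"measured / estimated (per fMAX): {ratio:.2f}")
```

Under this reading, a ratio close to 1 would suggest that the IP achieves roughly the per-clock efficiency that the compiler predicted, independent of the clock frequency actually reached on the device.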