5.5.4. The dla_benchmark Performance Metrics
The -save_run_summary option makes the dla_benchmark demonstration application collect performance metrics during inference. These metrics can help you determine how efficient an architecture is at executing a model.
Note: The dla_benchmark application provides throughput in "frames per second". The time per frame (latency) is 1/throughput.
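As a minimal sketch of the conversion described in the note, the following Python helper turns a throughput value in frames per second into a time per frame in milliseconds. The function name `frame_time_ms` and the sample value are illustrative only and are not part of the dla_benchmark application.

```python
def frame_time_ms(throughput_fps: float) -> float:
    """Return the time per frame in milliseconds for a throughput in frames per second.

    Hypothetical helper: implements time per frame = 1 / throughput,
    converted from seconds to milliseconds.
    """
    if throughput_fps <= 0:
        raise ValueError("throughput must be a positive number of frames per second")
    return 1000.0 / throughput_fps


if __name__ == "__main__":
    # Illustrative value, not a measured result.
    print(frame_time_ms(250.0))  # 4.0
```

This value is derived from throughput alone. It is a separate quantity from the Latency statistic, which is defined as the median time of the inference requests made by the host.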
| Statistic | Description |
|---|---|
| Count | The number of times inference was performed. This is set by the -niter option. |
| System duration | The total time from when the first inference request was made to when the last request finished, as measured by the host program. |
| IP duration | The total time spent on inference. This is reported by the IP on the FPGA. |
| Latency | The median time of all inference requests made by the host. This includes any overhead from OpenVINO™ or the FPGA AI Suite runtime. |
| System throughput | The total throughput of the system, including any OpenVINO™ or FPGA AI Suite runtime overhead. |
| Number of hardware instances | The number of IP instances on the FPGA. |
| Number of network instances | The number of graphs that the IP processes in parallel. |
| IP throughput per instance | The throughput of a single IP instance. This is reported by the IP on the FPGA. |
| IP throughput per fMAX per instance | The IP throughput per instance value scaled by the IP clock frequency value. |
| IP clock frequency | The clock frequency, as reported by the IP running on the FPGA device. The dla_benchmark application treats this value as the IP core fMAX value. |
| Estimated IP throughput per instance | The per-IP throughput, as estimated by the dla_compiler command with the --fanalyze-performance option. |
| Estimated IP throughput per fMAX per instance | The Estimated IP throughput per instance value scaled by the compiler fMAX estimate. |
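
The "per fMAX" statistics make it possible to compare measured and estimated results that were obtained at different clock frequencies. The following Python sketch assumes that "scaled by" means the throughput divided by the clock frequency in MHz; that interpretation, the function names, and the sample numbers are all assumptions for illustration and are not output from dla_benchmark or dla_compiler.

```python
def throughput_per_fmax(throughput_fps: float, clock_mhz: float) -> float:
    """Return throughput normalized by clock frequency (frames per second per MHz).

    Assumption: "scaled by" is taken to mean division by the clock frequency.
    """
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    return throughput_fps / clock_mhz


def measured_to_estimated_ratio(measured_fps: float, measured_mhz: float,
                                estimated_fps: float, estimated_mhz: float) -> float:
    """Return the ratio of measured to estimated frequency-normalized throughput.

    A ratio near 1.0 suggests the hardware run matches the compiler estimate
    once the difference in clock frequency is factored out.
    """
    measured = throughput_per_fmax(measured_fps, measured_mhz)
    estimated = throughput_per_fmax(estimated_fps, estimated_mhz)
    if estimated <= 0:
        raise ValueError("estimated throughput must be positive")
    return measured / estimated


if __name__ == "__main__":
    # Hypothetical numbers: 100 fps measured at 200 MHz,
    # 120 fps estimated at a 300 MHz compiler fMAX estimate.
    print(throughput_per_fmax(100.0, 200.0))
    print(measured_to_estimated_ratio(100.0, 200.0, 120.0, 300.0))
```

Normalizing by frequency before comparing keeps a difference between the IP clock frequency and the compiler fMAX estimate from being mistaken for a difference in architectural efficiency.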