FPGA AI Suite: PCIe-based Design Example User Guide

ID 768977
Date 7/31/2024

5.5.3. Additional dla_benchmark Options

The dla_benchmark tool is part of the design example, and the distributed runtime includes the full source code for the tool.
Table 3. dla_benchmark Command-Line Options
-nireq=<N> This controls the number of simultaneous inference requests that are sent to the FPGA.

Typically, this should be at least twice the number of IP instances; this ensures that each IP can execute one inference request while dla_benchmark loads the feature data for a second inference request to the FPGA-attached DDR memory.
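
For example, on a bitstream with two IP instances, a run with four in-flight inference requests might look like the following sketch (the model, architecture file, and image directory names are placeholders; adjust them to your setup):

  ./dla_benchmark -m=resnet50.xml -arch_file=example.arch -i=images/ \
      -d=HETERO:FPGA,CPU -nireq=4 -api=async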

-b=<N>

--batch-size=<N>

This controls the batch size.

A batch size greater than 1 is created by repeating configuration data for multiple copies of the graph.

A batch size of 1 is typically best.

-niter=<N> Number of images to process in each batch.
-d=<STRING> Using -d=HETERO:FPGA,CPU causes dla_benchmark to use the OpenVINO™ heterogeneous plugin to execute inference on the FPGA, with fallback to the CPU for any layers that cannot run on the FPGA.

Using -d=HETERO:CPU or -d=CPU executes inference on the CPU, which may be useful for testing the flow when an FPGA is not available. Using -d=HETERO:FPGA may be useful for ensuring that all graph layers are accelerated on the FPGA (an error is issued if this is not possible).
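
For example (the model and architecture file names below are placeholders):

  # FPGA with CPU fallback for any unsupported layers
  ./dla_benchmark -m=model.xml -arch_file=example.arch -i=images/ -d=HETERO:FPGA,CPU

  # CPU only, for testing the flow without an FPGA
  ./dla_benchmark -m=model.xml -arch_file=example.arch -i=images/ -d=CPU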

-arch_file=<FILE>

--arch=<FILE>

This specifies the location of the .arch file that was used to configure the IP on the FPGA. dla_benchmark issues an error if this file does not match the .arch file used to generate the IP on the FPGA.
-m=<FILE>

--network_file=<FILE>

This points to the XML file from the OpenVINO™ Model Optimizer that describes the graph. The BIN file from Model Optimizer must be in the same directory and have the same filename (except for the file extension) as the XML file.
-i=<DIRECTORY> This points to the directory containing the input images. Each input file corresponds to one inference request. The files are read in order sorted by filename; set the environment variable VERBOSE=1 to see details describing the file order.
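
A typical invocation combining these options might look like the following sketch (file and directory names are placeholders); setting VERBOSE=1 also prints the order in which the input files are read:

  VERBOSE=1 ./dla_benchmark -m=model.xml -arch_file=example.arch \
      -i=input_images/ -b=1 -niter=8 -d=HETERO:FPGA,CPU
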
-api=[sync|async] The -api=async option allows dla_benchmark to take full advantage of multithreading to improve performance. The -api=sync option may be used during debugging.
-groundtruth_loc=<FILE> Location of the file with ground truth data. If not provided, then dla_benchmark will not evaluate accuracy. This may contain classification data or object detection data, depending on the graph.
-yolo_version=<STRING> This option is used when evaluating the accuracy of a YOLOv3 or TinyYOLOv3 object detection graph. The options are yolo-v3-tf and yolo-v3-tiny-tf.
-enable_object_detection_ap This option may be used with an object detection graph (YOLOv3 or TinyYOLOv3) to calculate the object detection accuracy.
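
For example, to measure the detection accuracy of a YOLOv3 graph (the model, image directory, and ground truth file names are placeholders):

  ./dla_benchmark -m=yolo-v3-tf.xml -arch_file=example.arch -i=val_images/ \
      -d=HETERO:FPGA,CPU -groundtruth_loc=groundtruth.txt \
      -yolo_version=yolo-v3-tf -enable_object_detection_ap
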
-bgr When used, this flag indicates that the graph expects input image channel data to use BGR order.
-plugins_xml_file=<FILE>
Deprecated: This option is deprecated and will be removed in a future release. Use the -plugins option instead.

This option specifies the location of the file that lists the OpenVINO™ plugins to use. In most cases, set this to $COREDLA_ROOT/runtime/plugins.xml. If you are porting the design to a new host or doing other development, it may be necessary to use a different value.

-plugins=<FILE>

This option specifies the location of the file that specifies the OpenVINO plugins to use.

The default behavior is to read the plugins.xml file from the runtime/ directory. This runs inference on the FPGA device.

If you want to run inference using the reference model, specify -plugins=reference.

If you are porting the design to a new host or doing other development, you might need to use a different value.
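
For example, to compare FPGA results against the software reference model (placeholder file names; the first command uses the default plugins.xml from the runtime/ directory):

  # Inference on the FPGA (default plugin selection)
  ./dla_benchmark -m=model.xml -arch_file=example.arch -i=images/

  # The same run on the software reference model
  ./dla_benchmark -m=model.xml -arch_file=example.arch -i=images/ -plugins=reference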

-mean_values=<input_name[mean_values]> Uses channel-specific mean values in input tensor creation through the following formula: tensor_value = (input_value − mean_value) / scale_value, applied per channel.

The Model Optimizer mean values are the preferred choice and the mean values defined by this option serve as fallback values.

-scale_values=<input_name[scale_values]> Uses channel-specific scale values in input tensor creation through the following formula: tensor_value = (input_value − mean_value) / scale_value, applied per channel.

The Model Optimizer scale values are the preferred choice and the scale values defined by this option serve as fallback values.
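
For example, to normalize 8-bit input images to roughly the [−1, 1] range when the graph itself does not do so ("data" is a placeholder input tensor name and the values are illustrative):

  ./dla_benchmark -m=model.xml -arch_file=example.arch -i=images/ \
      -mean_values="data[127.5,127.5,127.5]" -scale_values="data[127.5,127.5,127.5]"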

-pc This option reports the performance counters for the CPU subgraphs, if there are any. No sorting is done on the report.
-pcsort=[sort|no_sort|simple_sort] This option reports the performance counters for the CPU subgraphs and sets the sorting option for the performance counter report:
  • sort: Report is sorted by operation time cost
  • no_sort: Report is not sorted
  • simple_sort: Report is sorted by operation time cost, but only executed operations are printed
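
For example, to print the CPU-subgraph performance counters sorted by operation time cost (placeholder file names):

  ./dla_benchmark -m=model.xml -arch_file=example.arch -i=images/ \
      -d=HETERO:FPGA,CPU -pcsort=sort
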
-save_run_summary Collects performance metrics during inference. These metrics can help you determine how efficiently an architecture executes a model.

For more information, refer to The dla_benchmark Performance Metrics.