5.5.3. Additional dla_benchmark Options
Command Option | Description |
---|---|
-nireq=<N> | This option controls the number of simultaneous inference requests that are sent to the FPGA. Typically, this should be at least twice the number of IP instances; this ensures that each IP can execute one inference request while dla_benchmark loads the feature data for a second inference request to the FPGA-attached DDR memory. |
-b=<N> --batch-size=<N> | This option controls the batch size. A batch size greater than 1 is created by repeating configuration data for multiple copies of the graph. A batch size of 1 is typically best for latency. When inference operations are offloaded from a CPU to an FPGA, system throughput for small graphs may improve with a batch size greater than 1. On very small graphs, IP throughput may also improve with a batch size greater than 1. The default value is 1. |
-niter=<N> | Number of batches to run. Each batch has the size specified by the --batch-size option. The total number of images processed is the product of the --batch-size and -niter option values; see the example invocation following this table. |
-d=<STRING> | Using -d=HETERO:FPGA,CPU causes dla_benchmark to use the OpenVINO™ heterogeneous plugin to execute inference on the FPGA, with fallback to the CPU for any layers that cannot run on the FPGA. Using -d=HETERO:CPU or -d=CPU executes inference on the CPU, which may be useful for testing the flow when an FPGA is not available. Using -d=HETERO:FPGA may be useful for ensuring that all graph layers are accelerated on the FPGA (an error is issued if this is not possible). |
-arch_file=<FILE> --arch=<FILE> | This specifies the location of the .arch file that was used to configure the IP on the FPGA. The dla_benchmark application issues an error if this does not match the .arch file used to generate the IP on the FPGA. |
-m=<FILE> --network_file=<FILE> | This points to the XML file from OpenVINO™ Model Optimizer that describes the graph. The BIN file from Model Optimizer must be kept in the same directory and have the same filename (except for the file extension) as the XML file. |
-i=<DIRECTORY> | This points to the directory containing the input images. Each input file corresponds to one inference request. The files are read in sorted filename order; set the environment variable VERBOSE=1 to see details describing the file order. |
-api=[sync|async] | The -api=async option allows dla_benchmark to take full advantage of multithreading to improve performance. The -api=sync option may be useful during debugging. |
-groundtruth_loc=<FILE> | Location of the file with ground truth data. If not provided, then dla_benchmark will not evaluate accuracy. This may contain classification data or object detection data, depending on the graph. |
-yolo_version=<STRING> | This option is used when evaluating the accuracy of a YOLOv3 or TinyYOLOv3 object detection graph. The options are yolo-v3-tf and yolo-v3-tiny-tf. |
-enable_object_detection_ap | This option may be used with an object detection graph (YOLOv3 or TinyYOLOv3) to calculate the object detection accuracy; see the sketch following this table. |
-bgr | When used, this flag indicates that the graph expects input image channel data to use BGR order. |
-plugins_xml_file=<FILE> | Deprecated: This option is deprecated and will be removed in a future release. Use the -plugins option instead. This option specifies the location of the file specifying the OpenVINO™ plugins to use. In most cases, set this to $COREDLA_ROOT/runtime/plugins.xml. If you are porting the design to a new host or doing other development, it may be necessary to use a different value. |
-plugins=<FILE> | This option specifies the location of the file that lists the OpenVINO™ plugins to use. The default behavior is to read the plugins.xml file from the runtime/ directory, which runs inference on the FPGA device. To run inference using the emulation model, specify -plugins=emulation. If you are porting the design to a new host or doing other development, you might need to use a different value. |
-mean_values=<input_name[mean_values]> | Uses channel-specific mean values in input tensor creation through the following formula: input_value = original_value - mean_value. The Model Optimizer mean values are the preferred choice; the mean values defined by this option serve as fallback values. See the sketch following this table. |
-scale_values=<input_name[scale_values]> | Uses channel-specific scale values in input tensor creation through the following formula: input_value = original_value / scale_value. The Model Optimizer scale values are the preferred choice; the scale values defined by this option serve as fallback values. |
-pc | This option reports the performance counters for the CPU subgraphs, if there are any. No sorting is done on the report. |
-pcsort=[sort|no_sort|simple_sort] | This option reports the performance counters for the CPU subgraphs and sets the sorting option for the performance counter report: sort orders layers by execution time, no_sort keeps layers in execution order, and simple_sort reports only the executed layers, without sorting. |
-save_run_summary | Collects performance metrics during inference. These metrics can help you determine how efficiently an architecture executes a model. For more information, refer to The dla_benchmark Performance Metrics. |
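
As a point of reference, the following sketch shows how several of these options combine for a typical FPGA run. All paths and file names are placeholders for illustration, not shipped defaults:

```
# Placeholder model, .arch file, and image directory; substitute your own.
# -nireq=4 assumes two IP instances (at least twice the instance count).
./dla_benchmark \
  -m=my_model.xml \
  -arch_file=my_design.arch \
  -d=HETERO:FPGA,CPU \
  -i=./sample_images \
  -b=4 \
  -niter=25 \
  -nireq=4 \
  -api=async
# Total images processed = batch size x niter = 4 x 25 = 100.
```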
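
The -mean_values and -scale_values options take per-input, per-channel value lists. A minimal sketch follows, assuming a single three-channel input tensor named data; the input name, the comma-separated value syntax inside the brackets, and the values themselves are illustrative assumptions rather than defaults:

```
# Hypothetical input name "data" and illustrative per-channel values.
# Combined, each input value becomes: (original_value - mean) / scale.
./dla_benchmark \
  -m=my_model.xml \
  -arch_file=my_design.arch \
  -i=./sample_images \
  -mean_values="data[123.68,116.78,103.94]" \
  -scale_values="data[58.395,57.12,57.375]"
```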
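
Similarly, the accuracy-related options compose as follows for a TinyYOLOv3 graph. The ground truth file name and image directory are hypothetical; add -bgr only if the graph expects BGR channel order:

```
# Hypothetical file names shown for illustration.
./dla_benchmark \
  -m=yolo-v3-tiny-tf.xml \
  -arch_file=my_design.arch \
  -i=./validation_images \
  -groundtruth_loc=./groundtruth.txt \
  -enable_object_detection_ap \
  -yolo_version=yolo-v3-tiny-tf
```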