Run OpenVINO™ Benchmarking Tool
This tutorial shows how to run the benchmark application on an 11th Generation Intel® Core™ processor with an integrated GPU. The application runs in asynchronous mode and estimates deep learning inference engine performance and latency.
Start Docker* Container
Go to the AMR_containers folder:
cd <edge_insights_for_amr_path>/Edge_Insights_for_Autonomous_Mobile_Robots_<version>/AMR_containers
Start the Docker container as root:
./run_interactive_docker.sh eiforamr-full-flavour-sdk:<TAG> root
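To confirm that the container started, you can list the running containers from another terminal. This is a generic Docker check, not a command provided by the SDK:
docker ps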
Set Environment Variables
The environment variables must be set before you can compile and run OpenVINO™ applications.
Run the following script:
source /opt/intel/openvino/bin/setupvars.sh
-- or --
source <OPENVINO_INSTALL_DIR>/bin/setupvars.sh
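To verify that the environment was set, you can print one of the variables that setupvars.sh typically exports (variable names may differ between OpenVINO™ releases):
echo $INTEL_OPENVINO_DIR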
Build Benchmark Application
Change to the samples directory and build the benchmark application with the provided CMake build script, using the following commands:
cd /opt/intel/openvino/inference_engine/samples/cpp
./build_samples.sh
Once the build is successful, go to the directory containing the benchmark application:
cd /root/inference_engine_cpp_samples_build/intel64/Release
-- or --
cd <INSTALL_DIR>/inference_engine_cpp_samples_build/intel64/Release
The benchmark_app application is available inside the Release folder.
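As an optional sanity check, you can confirm that the binary exists before continuing (assuming the default build output path shown above):
ls /root/inference_engine_cpp_samples_build/intel64/Release/benchmark_app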
Input File
Select an image file or a sample video file from the following directory to provide as input to the benchmark application:
cd /root/inference_engine_cpp_samples_build/intel64/Release
Application Syntax and Options
The benchmark application syntax is as follows:
./benchmark_app [OPTION]
In this tutorial, we recommend using the following options:
./benchmark_app -m <model> -i <input> -d <device> -nireq <num_reqs> -nthreads <num_threads> -b <batch>
where:
<model> ------------ The complete path to the model .xml file
<input> ------------ The path to the folder containing the image or sample video file
<device> ----------- The device to run inference on, for example CPU or GPU
<num_reqs> --------- The number of parallel inference requests
<num_threads> ------ The number of CPU threads to use for inference (throughput mode)
<batch> ------------ The batch size
For complete details on the available options, run the following command:
./benchmark_app -h
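If you are more interested in single-request latency than throughput, the application can also run in synchronous mode through the -api option. Confirm with ./benchmark_app -h that this option is available in your build; a minimal sketch using the placeholders above:
./benchmark_app -m <model> -i <input> -d GPU -api sync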
Run the Application
The benchmark application is executed as shown below. This tutorial uses the following settings:
The benchmark application runs on the frozen_inference_graph model.
The number of parallel inference requests is set to 8.
The number of CPU threads to use for inference is set to 8.
The device type is GPU.
./benchmark_app -d GPU -i ~/<dir>/input/ -m /home/eiforamr/workspace/object_detection/src/object_detection/models/ssd_mobilenet_v2_coco/frozen_inference_graph.xml -nireq 8 -nthreads 8
./benchmark_app -d GPU -i /home/eiforamr/data_samples/media_samples/plates_720.mp4 -m /home/eiforamr/workspace/object_detection/src/object_detection/models/ssd_mobilenet_v2_coco/frozen_inference_graph.xml -nireq 8 -nthreads 8
Expected output:
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     /home/eiforamr/data_samples/media_samples/plates_720.mp4
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
        API version ............ 2.1
        Build .................. 2021.2.0-1877-176bdf51370-releases/2021/2
        Description ....... API
[ INFO ] Device info:
        GPU
        clDNNPlugin version ......... 2.1
        Build ........... 2021.2.0-1877-176bdf51370-releases/2021/2
[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for GPU device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 89.49 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 44714.68 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'image_tensor' precision U8, dimensions (NCHW): 1 3 300 300
[ WARNING ] No supported image inputs found! Please check your file extensions: bmp, dib, jpeg, jpg, jpe, jp2, png, pbm, pgm, ppm, sr, ras, tiff, tif
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 4 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 5 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 6 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[ INFO ] Infer Request 7 filling
[ INFO ] Fill input 'image_tensor' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests using 2 streams for GPU, limits: 60000 ms duration)
[ INFO ] First inference took 10.01 ms
[Step 11/11] Dumping statistics report
Count:      9456 iterations
Duration:   60066.11 ms
Latency:    51.33 ms
Throughput: 157.43 FPS
Benchmark Report
Sample execution results using an 11th Gen Intel® Core™ i7-1185GRE processor @ 2.80 GHz:
Read network time (ms) | 89
Load network time (ms) | 44714.68
First inference time (ms) | 10.01
Total execution time (ms) | 60066.11
Total number of iterations | 9456
Latency (ms) | 51.33
Throughput (FPS) | 157.43
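The reported figures are consistent with each other. Throughput follows from the iteration count and total duration, and latency can be approximated from the number of requests kept in flight (an estimate based on Little's law, not an exact formula used by the tool):
Throughput ≈ 9456 iterations / 60.066 s ≈ 157.4 FPS
Latency ≈ 8 requests / 157.43 FPS ≈ 50.8 ms, close to the reported 51.33 ms
The load network time (about 44.7 s) is reported separately and is not part of the 60 s measurement window.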
Troubleshooting
For general robot issues, go to: Troubleshooting for Robot Tutorials.