Quick steps to reduce the model load time on Vision Processing Unit (VPU)
- Loading a model to a Vision Processing Unit (VPU) takes longer than loading the same model on the CPU.
- Code using the Python API:
      net = ie.read_network(model=path_to_xml, weights=path_to_bin)
      exec_net = ie.load_network(network=net, device_name="CPU")
      res = exec_net.infer(inputs=data)
To reduce the load time, load the model from a Blob, which is a pre-parsed graph, so that the model parsing stage is bypassed.
- Generate the Blob file in advance, before loading, as follows:
- Generate the Blob using the myriad_compile tool in the command line:
- The precompiled tool is available in the Intel® Distribution of OpenVINO™ toolkit. You can also clone the open-source OpenVINO toolkit repo and build it.
- Generate the Blob. In inference-engine/bin/intel64/Release, run:
      ./myriad_compile -m <model_name>.xml -o <output_filename>
- Import the Blob in your code using the Inference Engine Core API: executable_network = ie.ImportNetwork("model_name.blob", device, config). A Python sketch of this step is shown after this list.
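For reference, here is a minimal Python sketch of importing a Blob on the MYRIAD plugin. It assumes the OpenVINO 2021.x Python API (openvino.inference_engine) and a Blob file named model_name.blob produced by myriad_compile; the dummy input is for illustration only.

    import numpy as np
    from openvino.inference_engine import IECore

    ie = IECore()

    # import_network loads the precompiled Blob directly, skipping the parsing stage.
    exec_net = ie.import_network(model_file="model_name.blob", device_name="MYRIAD")

    # Build a dummy input matching the network's first input shape (illustration only).
    input_name = next(iter(exec_net.input_info))
    dummy = np.zeros(exec_net.input_info[input_name].input_data.shape, dtype=np.float32)

    res = exec_net.infer(inputs={input_name: dummy})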
There are two internal processes when loading a model on VPU:
- Parse graph
- Allocate graph
During the loading process, the parsed VPU graph is sent from the host to the device, stage by stage, over XLink.
Loading a model from a Blob can save a significant amount of load time for some models, but it may not work for all models.
Besides model size, the load time also depends on the layer types, input data size, and other factors.
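To check how much a Blob helps for a specific model, the two load paths can be timed side by side. Below is a minimal sketch, assuming the OpenVINO 2021.x Python API and hypothetical file names for the IR and Blob files.

    import time
    from openvino.inference_engine import IECore

    # Hypothetical paths; replace them with your own IR and Blob files.
    path_to_xml = "model_name.xml"
    path_to_bin = "model_name.bin"
    path_to_blob = "model_name.blob"

    ie = IECore()

    # Regular path: parse the IR, then allocate the graph on the device.
    start = time.time()
    net = ie.read_network(model=path_to_xml, weights=path_to_bin)
    exec_net = ie.load_network(network=net, device_name="MYRIAD")
    print("read_network + load_network: %.2f s" % (time.time() - start))

    # Blob path: the parsing stage is skipped.
    start = time.time()
    exec_net_blob = ie.import_network(model_file=path_to_blob, device_name="MYRIAD")
    print("import_network: %.2f s" % (time.time() - start))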
The HDDL plugin is more efficient than the MYRIAD plugin when loading a model from a Blob.
Follow these steps to enable the HDDL plugin instead of the MYRIAD plugin on the Intel® Neural Compute Stick 2:
- Set autoboot_settings:abort_if_hw_reset_failed to false in $HDDL_INSTALL_DIR/config/hddl_autoboot.config.
- Set autoboot_settings:total_device_num to 1.
- Start hddldaemon.
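Once hddldaemon is running, the Blob can be imported through the HDDL plugin instead of the MYRIAD plugin. Here is a minimal sketch, assuming the same Python API and Blob file as in the examples above; whether a given Blob is accepted by the HDDL plugin may depend on how it was compiled.

    from openvino.inference_engine import IECore

    ie = IECore()

    # hddldaemon must already be running, otherwise the HDDL plugin cannot reach the device.
    exec_net = ie.import_network(model_file="model_name.blob", device_name="HDDL")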