The CPU has more compute power than the Intel® NCS2, so it is expected to run inference on the same model faster.
- Ran benchmark_app.py with -m model.xml (random input generated) on an Intel® Core™ i7 processor
- Performance on the NCS2 is slower than on the CPU:
For NCS2:
[ INFO ] First inference took 33.88 ms
[Step 11/11] Dumping statistics report
Count: 2596 iterations
Duration: 60141.63 ms
Latency: 92.60 ms
Throughput: 5525.09 FPS
For CPU:
[ INFO ] First inference took 17.07 ms
[Step 11/11] Dumping statistics report
Count: 148124 iterations
Duration: 60001.79 ms
Latency: 1.61 ms
Throughput: 315988.43 FPS
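For reference, the two runs above correspond to benchmark_app invocations along the following lines (the -d device values and the default 60-second run duration are assumptions inferred from the logs; model.xml is the same IR in both cases):

python benchmark_app.py -m model.xml -d MYRIAD
python benchmark_app.py -m model.xml -d CPU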
The performance of the CPU is expected to be better than that of the Intel® NCS2, since the CPU has more computing power.
The Intel® NCS2 is an accelerator device that helps in certain situations, especially when additional computing power is required.
Additionally, the CPU runs models in FP32 format, while the Intel® NCS2 requires models in FP16 format. An FP16 model may carry quantization error, since it is compressed from a full-precision model to make it smaller, and this can affect both accuracy and performance.
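As a rough sketch, a full-precision model can be converted to the FP16 IR that the NCS2 expects using the Model Optimizer. The exact invocation and flag name vary by OpenVINO release (--data_type FP16 in older releases, --compress_to_fp16 in newer ones), and model.onnx below is a placeholder for your source model, so treat this as an assumption to verify against your version:

mo --input_model model.onnx --data_type FP16 --output_dir FP16/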
Performance refers to how fast the model runs in deployment, measured by two key metrics: latency and throughput.
In OpenVINO™, there are two approaches to enhancing performance:
- During development: the Post-training Optimization Tool (POT), the Neural Network Compression Framework (NNCF), and the Model Optimizer.
- During deployment: tuning inference parameters and optimizing model execution (see the sketch after this list).
It is also possible to combine both approaches.
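For example, at deployment time the same benchmark_app run can be steered toward low latency or high throughput via performance hints; the -hint option is available in recent OpenVINO releases, while older releases expose lower-level options such as -nstreams and -nthreads instead, so check the flags supported by your version:

python benchmark_app.py -m model.xml -d CPU -hint latency
python benchmark_app.py -m model.xml -d CPU -hint throughput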