Article ID: 000089522 Content Type: Maintenance & Performance Last Reviewed: 11/20/2023

Slower Inferencing Performance on Intel® Neural Compute Stick 2 (Intel® NCS2) Compared to CPU

Summary

The CPU has more compute power than the Intel® NCS2, so it is expected to be faster when running inference on the same model.

Description
  • Ran benchmark_app.py with -m model.xml on an Intel® Core™ i7 processor, using randomly generated input
  • Performance on the Intel® NCS2 is slower than on the CPU:

    For NCS2:
    [ INFO ] First inference took 33.88 ms
    [Step 11/11] Dumping statistics report
    Count: 2596 iterations
    Duration: 60141.63 ms
    Latency: 92.60 ms
    Throughput: 5525.09 FPS

    For CPU:
    [ INFO ] First inference took 17.07 ms
    [Step 11/11] Dumping statistics report
    Count: 148124 iterations
    Duration: 60001.79 ms
    Latency: 1.61 ms
    Throughput: 315988.43 FPS
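
Using the first-inference times from the two reports above, the CPU's advantage on that one metric can be quantified with a short calculation:

```python
# First inference times taken from the benchmark reports above.
ncs2_first_ms = 33.88
cpu_first_ms = 17.07

# Ratio of the two first-inference times.
speedup = ncs2_first_ms / cpu_first_ms
print(f"CPU first inference is {speedup:.2f}x faster than Intel NCS2")
```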

Resolution

The CPU is expected to outperform the Intel® NCS2 because it has more computing power.

The Intel® NCS2 is an accelerator device that helps in certain situations, especially when the host system requires additional computing power.

Additionally, the CPU requires the model in FP32 format, while the Intel® NCS2 requires FP16. An FP16 model can carry quantization error, since it is squeezed down from a full-precision model to make it smaller; this can affect both accuracy and performance.
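
To illustrate the quantization error mentioned above, here is a minimal NumPy sketch; the weight value is hypothetical, not taken from any real model:

```python
import numpy as np

# Hypothetical FP32 weight value, for illustration only.
weight_fp32 = np.float32(0.1234567)

# Squeezing it into FP16, as the Intel NCS2 requires, rounds the value
# to the nearest representable half-precision number.
weight_fp16 = np.float16(weight_fp32)

# The rounding introduces a small quantization error.
error = abs(float(weight_fp32) - float(weight_fp16))
print(f"FP32: {float(weight_fp32):.7f}  FP16: {float(weight_fp16):.7f}  error: {error:.2e}")
```

Over millions of weights, these small per-value errors accumulate, which is why accuracy can change after conversion to FP16.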

Performance here means how fast the model runs in deployment, measured by two key metrics: latency (the time taken per inference) and throughput (the number of inferences completed per second).
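
As a rough sketch of how the two metrics relate, with made-up numbers rather than the ones reported above:

```python
# Illustrative numbers only; not taken from the benchmark report above.
iterations = 1000           # completed inference requests
duration_s = 20.0           # total wall-clock time of the run
total_latency_ms = 18500.0  # summed per-request execution time

throughput_fps = iterations / duration_s        # requests completed per second
avg_latency_ms = total_latency_ms / iterations  # average time per single request

print(f"Throughput: {throughput_fps:.2f} FPS")
print(f"Latency: {avg_latency_ms:.2f} ms")
```

Note that throughput is not simply 1/latency: when requests run asynchronously in parallel, many inferences can complete per second even though each individual request takes longer.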

In OpenVINO™, there are two approaches to enhance performance:

During development: the Post-training Optimization Tool (POT), the Neural Network Compression Framework (NNCF), and Model Optimizer.

During deployment: tuning inference parameters and optimizing model execution.

It is possible to combine both approaches.
