Model Optimization Flow with OpenVINO
The last paragraph of the Low Precision Optimization Guide mentions quantization-aware training, stating that it lets a user obtain an accurate optimized model that can be converted to IR, but it provides no further details.
Quantization-aware training is supported through OpenVINO™-compatible optimization extensions for common training frameworks: TensorFlow models via TensorFlow QAT, and PyTorch models via the Neural Network Compression Framework (NNCF).
NNCF is a PyTorch-based framework that supports a wide range of deep learning models across various use cases. It implements quantization-aware training with support for different quantization modes and settings, and provides a set of compression algorithms that includes Quantization, Binarization, Sparsity, and Filter Pruning.
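As a minimal sketch of how this looks in practice (the model choice, input shape, and configuration values below are illustrative assumptions, not taken from the guide), a PyTorch model can be wrapped for quantization-aware training with NNCF roughly as follows:

```python
import torchvision

from nncf import NNCFConfig
from nncf.torch import create_compressed_model

# Illustrative model; any PyTorch module that NNCF can trace would work.
model = torchvision.models.resnet18()

# Minimal NNCF configuration: the input shape lets NNCF trace the model,
# and the "quantization" algorithm inserts fake-quantization operations.
nncf_config = NNCFConfig.from_dict({
    "input_info": {"sample_size": [1, 3, 224, 224]},
    "compression": {"algorithm": "quantization"},
})

# NNCF rewrites the model graph and returns a controller that manages
# the compression algorithm during fine-tuning.
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config)

# Fine-tune compressed_model with an ordinary PyTorch training loop
# (optimizer, loss, data loader) so the weights adapt to quantization.
```

Fine-tuning then proceeds as usual; NNCF only changes how the forward pass simulates low-precision arithmetic.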
When fine-tuning finishes, the accurate optimized model can be exported to ONNX format. Model Optimizer then converts the ONNX model into Intermediate Representation (IR) files, which can subsequently be run with the OpenVINO™ Inference Engine, as sketched below.
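A hedged sketch of that export-convert-run sequence follows; the file names are illustrative, and the inference step uses the openvino.runtime API, the modern Python entry point to the Inference Engine:

```python
from openvino.runtime import Core

# Export the fine-tuned model to ONNX. compression_ctrl is the NNCF
# controller from the sketch above; the file name is illustrative.
compression_ctrl.export_model("resnet18_int8.onnx")

# Convert the ONNX model to IR with Model Optimizer on the command line:
#   mo --input_model resnet18_int8.onnx --output_dir ir/
# This produces resnet18_int8.xml and resnet18_int8.bin.

# Load the IR and compile it for a target device with the runtime.
core = Core()
model = core.read_model("ir/resnet18_int8.xml")
compiled_model = core.compile_model(model, "CPU")
```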
Refer to the following articles:
Enhanced Low-Precision Pipeline to Accelerate Inference with OpenVINO Toolkit
Introducing a Training Add-on for OpenVINO™ toolkit: Neural Network Compression Framework