Compress the Transformer: Optimize Your DistilBERT Models
Overview
Transformer natural language processing (NLP) models such as Bidirectional Encoder Representations from Transformers (BERT) are increasingly common, yet increasingly hard to use due to their size.
This session introduces a solution: Intel® Neural Compressor, an open source Python* library that provides a low-precision inference interface across multiple deep learning frameworks.
Topics covered include:
- A comprehensive overview of neural networks and the most popular optimization techniques.
- A demo of how to use Intel Neural Compressor to quantize a transformer model on the latest Intel® Xeon® Scalable processors (a minimal code sketch follows this list).
- How these optimizations significantly reduce model size and accelerate inference performance with minimal impact on accuracy.
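For a taste of what the demo covers, here is a minimal sketch of post-training quantization of a DistilBERT model with the Intel Neural Compressor Python API. It assumes the 2.x quantization.fit interface and the Hugging Face transformers library, and it uses a dynamic-quantization approach; the checkpoint name and save path are illustrative, not the session's exact demo code.

from transformers import AutoModelForSequenceClassification
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Load a fine-tuned DistilBERT model (illustrative checkpoint).
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# Dynamic quantization computes activation scales at runtime,
# so no calibration dataloader is needed.
config = PostTrainingQuantConfig(approach="dynamic")
q_model = fit(model=model, conf=config)

# Save the quantized model for low-precision inference.
q_model.save("./distilbert-sst2-int8")

Static quantization (approach="static") with a calibration dataloader typically unlocks further speedups on Intel Xeon processors, at the cost of an extra calibration step.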
Skill level: Intermediate
Featured Software
Download Intel Neural Compressor as a stand-alone component or as part of the AI Tools.
Code Samples
Download AI analytics samples from GitHub*, including:
Language Identification: How to train a model to perform language identification using the SpeechBrain speech toolkit from Hugging Face*; a minimal inference sketch follows this list.
Accelerate data science and AI pipelines, from preprocessing through machine learning, and provide interoperability for efficient model development.
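As a companion to the language identification sample, the following is a minimal sketch of running inference with a pretrained SpeechBrain language ID model hosted on Hugging Face. The model ID, save directory, and audio file name are illustrative; the GitHub sample itself goes further and covers training.

from speechbrain.pretrained import EncoderClassifier

# Download a pretrained spoken-language ID model from Hugging Face
# (illustrative model ID; VoxLingua107 covers 107 languages).
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",
    savedir="pretrained_models/lang-id",
)

# classify_file returns (log-probabilities, best score, index, label).
out_prob, score, index, text_lab = classifier.classify_file("utterance.wav")
print(text_lab)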