Compress the Transformer: Optimize Your DistilBERT Models
Overview
Transformer natural language processing (NLP) models such as Bidirectional Encoder Representations from Transformers (BERT) are increasingly common, yet increasingly hard to use due to their size.
This session introduces a solution: Intel® Neural Compressor, an open source Python* library that provides a low-precision inference interface across multiple deep learning frameworks.
Topics covered include:
- A comprehensive overview of neural networks and the most popular optimization techniques.
- A demo of how to use Intel Neural Compressor to quantize a transformer model on the latest Intel® Xeon® Scalable processors (a minimal code sketch follows this list).
- How these optimizations significantly reduce model size and accelerate inference performance with minimal impact on accuracy.
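For a taste of what the demo covers, here is a minimal sketch of post-training quantization of a DistilBERT model with the Intel Neural Compressor Python API. It assumes the 2.x quantization.fit interface and the Hugging Face transformers library, and it uses a dynamic-quantization approach; the checkpoint name and save path are illustrative, not the session's exact demo code.

from transformers import AutoModelForSequenceClassification
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Load a fine-tuned DistilBERT model (illustrative checkpoint).
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

# Dynamic quantization computes activation scales at runtime,
# so no calibration dataloader is needed.
config = PostTrainingQuantConfig(approach="dynamic")
q_model = fit(model=model, conf=config)

# Save the quantized model for low-precision inference.
q_model.save("./distilbert-sst2-int8")

Static quantization (approach="static") with a calibration dataloader typically unlocks further speedups on Intel Xeon processors, at the cost of an extra calibration step.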
Skill level: Intermediate
Featured Software
Download Intel Neural Compressor as a stand-alone component or as part of the AI Tools.
Code Samples
Download AI analytics samples from GitHub*, including:
Language Identification: How to train a model to perform language identification using the SpeechBrain speech toolkit from Hugging Face*; a minimal inference sketch follows this list.
Accelerate data science and AI pipelines, from preprocessing through machine learning, and provide interoperability for efficient model development.
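As a companion to the language identification sample, the following is a minimal sketch of running inference with a pretrained SpeechBrain language ID model hosted on Hugging Face. The model ID, save directory, and audio file name are illustrative; the GitHub sample itself goes further and covers training.

from speechbrain.pretrained import EncoderClassifier

# Download a pretrained spoken-language ID model from Hugging Face
# (illustrative model ID; VoxLingua107 covers 107 languages).
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",
    savedir="pretrained_models/lang-id",
)

# classify_file returns (log-probabilities, best score, index, label).
out_prob, score, index, text_lab = classifier.classify_file("utterance.wav")
print(text_lab)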