Enhance Deep Learning Workloads on the Latest Intel® Xeon® Processors
Overview
The 4th generation Intel® Xeon® Scalable processors (formerly code named Sapphire Rapids) offer several built-in features for boosting performance and efficiency of deep learning applications.
This session focuses on one of them—Intel® Advanced Matrix Extensions (Intel® AMX)—and how to take advantage of its AI acceleration power to boost model training and inference using Intel optimizations for PyTorch* and TensorFlow*.
Topics covered include:
- An overview of the Intel optimizations, including performance and features on the latest Intel CPUs and how they compare with stock PyTorch and TensorFlow.
- How the optimizations reduce the memory footprint and improve performance by automatically mixing precision using the bfloat16 or float16 data types.
- Using the Intel® oneAPI Deep Neural Network Library (oneDNN) with Intel optimizations for PyTorch and TensorFlow to take advantage of other built-in acceleration features of 4th gen Intel Xeon processors, such as Intel® Advanced Vector Extensions 512 and Vector Neural Network Instructions (VNNI).
- Reducing model inference time with the quantization features in Intel® Optimization for PyTorch*.
- How speedups over stock PyTorch and TensorFlow can be achieved on new Amazon Web Services* instances built on Intel Xeon Scalable processors.
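As a minimal illustration of the bfloat16 mixed-precision idea above, the following stock-PyTorch sketch runs inference under CPU autocast; the Intel optimizations layer operator fusion and AMX-enabled oneDNN kernels on top of this same mechanism. The model and tensor shapes are illustrative, not from the session.

```python
import torch

# Toy model; layer sizes are illustrative.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval()

x = torch.randn(8, 64)

# CPU autocast automatically runs eligible ops (e.g. Linear) in bfloat16,
# roughly halving activation memory; on 4th gen Xeon processors these
# bfloat16 matmuls can be dispatched to Intel AMX instructions via oneDNN.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```

With Intel® Extension for PyTorch* installed, the same model would typically be passed through `ipex.optimize(model, dtype=torch.bfloat16)` before the autocast block to enable the additional fusions.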
Skill level: Novice
Featured Software
- The Intel optimizations are available as part of the AI Tools, or as stand-alone downloads: PyTorch Optimization | TensorFlow Optimization.
- Get oneDNN stand-alone or as part of the Intel® oneAPI Base Toolkit.
Code Samples
Download a variety of samples on GitHub*, including:
- Get Started with Intel® Extension for PyTorch*
- Optimize PyTorch Models Using Quantization
- PyTorch Training Optimizations with bfloat16 for Intel AMX
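The quantization approach covered in the samples can be sketched with stock PyTorch's dynamic quantization API; this is a hedged illustration under assumed shapes, not the exact code from the sample above.

```python
import torch

# Toy float32 model; layer sizes are illustrative.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
).eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly; on 4th gen Xeon processors the int8 matmuls
# can use VNNI/AMX instructions through oneDNN for faster inference.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

y = qmodel(torch.randn(8, 64))
print(y.shape)  # torch.Size([8, 10])
```

The dynamically quantized model still accepts and returns float32 tensors, so it can be dropped into an existing inference pipeline without changing the surrounding code.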