TensorFlow* Optimizations from Intel
Production Performance for AI and Machine Learning
Accelerate TensorFlow Training and Inference on Intel® Hardware
TensorFlow* is an open source AI and machine learning platform used widely for production AI development and deployment. These applications often rely on deep neural networks and extremely large datasets, both of which can create compute bottlenecks.
Intel releases its newest optimizations and features in Intel® Extension for TensorFlow* before upstreaming them into open source TensorFlow.
With a few lines of code, you can extend TensorFlow to:
- Take advantage of the most up-to-date Intel software and hardware optimizations for TensorFlow.
- Speed up TensorFlow-based training and inference turnaround times on Intel hardware.
- Further accelerate TensorFlow performance on Intel CPU and GPU hardware.
Intel also works closely with the open source TensorFlow project to optimize the TensorFlow framework for Intel hardware. These optimizations for TensorFlow, along with the extension, are part of the end-to-end suite of Intel® AI and machine learning development tools and resources.
Download the AI Tools
TensorFlow and Intel Extension for TensorFlow are available in the AI Tools Selector, which provides accelerated machine learning and data analytics pipelines with optimized deep learning frameworks and high-performing Python* libraries.
Download the Stand-Alone Version
Stand-alone versions of TensorFlow and Intel Extension for TensorFlow are available. You can install them using a package manager or build them from source.
Features
Open Source TensorFlow Powered by Optimizations from Intel
- Accelerate AI performance with Intel® oneAPI Deep Neural Network Library (oneDNN) features such as graph optimizations and memory pool allocation.
- Automatically use Intel® Deep Learning Boost instruction set features to parallelize and accelerate AI workloads.
- Reduce inference latency for models deployed using TensorFlow Serving.
- Starting with TensorFlow 2.9, take advantage of oneDNN optimizations automatically.
- Enable optimizations in TensorFlow 2.5 through 2.8 by setting the environment variable TF_ENABLE_ONEDNN_OPTS=1, as shown in the sketch below.
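A minimal sketch of the flag in use, assuming a Python entry point; the variable must be set before TensorFlow is first imported to take effect:

```python
import os

# Opt in to oneDNN optimizations on TensorFlow 2.5 through 2.8.
# This must run before the first `import tensorflow`.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf

# On TensorFlow 2.9 and later, oneDNN optimizations are enabled by
# default and no environment variable is needed.
print(tf.__version__)
```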
Intel® Extension for TensorFlow*
- Plug into TensorFlow 2.10 or later to accelerate training and inference on Intel GPUs with no code changes.
- Automatically mix precision using bfloat16 or float16 data types to reduce memory footprint and improve performance (see the sketch after this list).
- Use TensorFloat-32 (TF32) math mode on Intel GPU hardware.
- Optimize CPU performance settings for latency or throughput using an autotuned CPU launcher.
- Perform more aggressive fusion through the oneDNN Graph API.
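As a minimal sketch of the plug-in model, assuming TensorFlow 2.10 or later and the intel-extension-for-tensorflow package are installed: Intel GPUs surface as XPU devices, and precision can be mixed with the stock Keras policy API (a standard TensorFlow call, not an extension-specific one).

```python
import tensorflow as tf

# With the extension installed, Intel GPUs are registered through the
# PluggableDevice mechanism and appear as "XPU" devices; no model code
# changes are required.
print(tf.config.list_physical_devices("XPU"))

# Standard Keras mixed-precision policy; bfloat16 reduces memory footprint
# and can improve throughput on hardware with native bfloat16 support.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")
```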
Optimized Deployment with OpenVINO™ Toolkit
- Import your TensorFlow model into OpenVINO™ Runtime and use the Neural Network Compression Framework (NNCF) to compress model size and increase inference speed (see the sketch after this list).
- Deploy with OpenVINO™ Model Server for optimized inference, accessed through the same API as TensorFlow Serving.
- Target a mix of Intel CPUs, GPUs (integrated or discrete), NPUs, or FPGAs.
- Deploy on-premises, on-device, in the browser, or in the cloud.
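A minimal conversion-and-inference sketch, assuming the openvino pip package (2023.1 or later) and a hypothetical SavedModel directory named my_saved_model with a 224x224 RGB image input:

```python
import numpy as np
import openvino as ov

# Convert a TensorFlow SavedModel to OpenVINO's in-memory representation.
ov_model = ov.convert_model("my_saved_model")  # hypothetical path

# Compile for a target device; "CPU" could be swapped for "GPU" or "AUTO".
compiled = ov.compile_model(ov_model, device_name="CPU")

# Run inference on a dummy input matching the assumed input shape.
dummy = np.random.rand(1, 224, 224, 3).astype(np.float32)
result = compiled(dummy)
```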
Performance Data: Access the latest AI benchmarks for TensorFlow and the OpenVINO toolkit when running on data center products from Intel.
Documentation & Code Samples
- TensorFlow Documentation
- Intel Extension for TensorFlow
- Get Started with TensorFlow in Docker* Containers
Code Samples
- Get Started with TensorFlow
- Optimize a Pretrained Model for Inference
- Analyze TensorFlow Performance
- Train a BERT Model for Text Classification
- Speed Up Inference of Inception v4 by Advanced Automatic Mixed Precision
- Quantize Inception v3 by Intel Extension for TensorFlow on Intel® Xeon® Processors
- Perform Stable Diffusion Inference on Intel GPUs
- Accelerate ResNet*-50 Training with XPUAutoShard on Intel GPUs
More Intel Extension for TensorFlow Samples
Demonstrations
Accelerate TensorFlow Machine Learning Performance Using Intel® AMX
Learn how to accelerate training and inference on 4th and 5th generation Intel® Xeon® Scalable processors by enabling mixed precision to take advantage of Intel® Advanced Matrix Extensions (Intel® AMX).
Get Better TensorFlow Performance on CPUs and GPUs
Learn about Intel Extension for TensorFlow, including the built-in optimizations and how to get started. Analyze performance bottlenecks by examining GPU kernel and data type usage profiles.
Use New Features in Intel Extension for TensorFlow on CPUs and GPUs
Learn about the latest optimization features and how to use them to get the most performance from CPUs and GPUs, how to use OpenXLA (Accelerated Linear Algebra) compilation, and how to switch the TensorFlow back end on CPUs.
News
Enable Mixed Precision in TensorFlow v2.13 through an Environment Variable
Take advantage of Intel AMX with TensorFlow by setting an environment variable that automatically mixes in lower-precision data types without affecting accuracy.
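The sketch below illustrates the pattern; the variable name shown is the oneDNN-documented ONEDNN_DEFAULT_FPMATH_MODE and is an assumption here, so confirm the exact variable against the TensorFlow 2.13 release notes.

```python
import os

# Assumption: ONEDNN_DEFAULT_FPMATH_MODE is the documented oneDNN variable;
# the exact variable honored by TensorFlow 2.13 may differ, so check the
# release notes. Setting BF16 lets oneDNN downcast fp32 math to bfloat16
# on Intel AMX hardware without any model code changes.
os.environ["ONEDNN_DEFAULT_FPMATH_MODE"] = "BF16"

import tensorflow as tf  # import after setting the variable
```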
Accelerate TensorFlow on Intel® Data Center GPU Flex Series
Google* and Intel co-architected PluggableDevice, a mechanism that lets hardware vendors add device support through plug-in packages installed alongside TensorFlow. Intel Extension for TensorFlow is the newest PluggableDevice.
Meituan* Optimizes TensorFlow
China's leading e-commerce platform for lifestyle services boosted distributed scalability more than tenfold in its recommendation system scenarios.
Specifications
Processor:
- Intel Xeon Scalable processor
- Intel® Core™ processor
- Intel GPU
Operating systems:
- Linux*
- Windows*
Languages:
- Python
- C++
Deploy TensorFlow models to a variety of devices and operating systems with Intel® Distribution of OpenVINO™ Toolkit.
Get Help
Your success is our success. Access these support resources when you need assistance.
Stay Up to Date on AI Workload Optimizations
Sign up to receive hand-curated technical articles, tutorials, developer tools, training opportunities, and more to help you accelerate and optimize your end-to-end AI and data science workflows. Take a chance and subscribe. You can change your mind at any time.