Intel® Xeon® Scalable Processors and Intel® Advanced Matrix Extensions
Deep learning workloads, such as those that rely on generative AI, large language models (LLMs), and computer vision, can be incredibly compute-intensive, requiring high levels of performance and, often, additional specialized hardware to ensure successful AI deployment. The associated costs of these requirements can quickly escalate, and adding discrete hardware solutions can create unnecessary layers of complexity and compatibility issues.
To help make your deep learning workloads more efficient, more cost-effective, and easier to train and deploy, Intel® AMX on Intel® Xeon® Scalable processors delivers acceleration for inferencing and training while minimizing the need for specialized hardware.
Intel® AMX is one of two Intel® AI Engines integrated into 4th Gen Intel® Xeon®, 5th Gen Intel® Xeon®, and Intel® Xeon® 6 processors with P-cores. It helps you make the most of your CPU to power AI training and inferencing workloads at scale, with benefits including improved efficiency; reduced inferencing, training, and deployment costs; and lower total cost of ownership (TCO). As a built-in accelerator that resides on each CPU core, close to system memory, Intel® AMX is often less complex to use than discrete accelerators, leading to faster time to value.
While there are many ways organizations can support advanced AI workloads, a foundation based on Intel® Xeon® Scalable processors with powerful, integrated AI accelerators can help you achieve your training and inferencing performance objectives while reducing system complexity and deployment and operational costs for greater business return.
How Intel® AMX Works
Intel® AMX is a dedicated hardware block found on the Intel® Xeon® Scalable processor core that helps optimize and accelerate deep learning training and inferencing workloads that rely on matrix math.
Intel® AMX enables AI workloads to run on the CPU instead of offloading them to a discrete accelerator, providing a significant performance boost.2 Its architecture supports the BF16 (training and inference) and INT8 (inference) data types and includes two main components:
- Tiles: These consist of eight two-dimensional registers, each 1 kilobyte in size, that store large chunks of data.
- Tile Matrix Multiplication (TMUL): TMUL is an accelerator engine attached to the tiles that performs matrix-multiply computations for AI.
Together, these components enable Intel® AMX to store more data in each core and compute larger matrices in a single operation. Additionally, Intel® AMX is architected to be fully extensible and scalable.
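To make the tile sizes concrete, the following Python sketch works through the arithmetic. It assumes the maximum tile layout of 16 rows of 64 bytes each, which yields the 1 KB per tile described above, and 32-bit accumulators in the result tile. The names `TILE_ROWS`, `TILE_ROW_BYTES`, `tile_shape`, and `macs_per_tile_multiply` are illustrative only and are not part of any Intel API.

```python
# Illustrative arithmetic only: this models tile capacity, not real hardware access.
TILE_COUNT = 8        # tile registers per core (from the text above)
TILE_ROWS = 16        # assumed maximum rows per tile
TILE_ROW_BYTES = 64   # assumed maximum bytes per row (16 * 64 = 1024 bytes)

ELEMENT_BYTES = {"int8": 1, "bf16": 2, "fp32": 4}


def tile_shape(dtype):
    """Return (rows, columns) of elements that fit in one full tile."""
    return TILE_ROWS, TILE_ROW_BYTES // ELEMENT_BYTES[dtype]


def macs_per_tile_multiply(dtype):
    """Multiply-accumulates in one C += A x B tile operation.

    A is a full tile of `dtype` elements; C is assumed to hold
    32-bit accumulators, so it is 16 x 16.
    """
    rows, inner = tile_shape(dtype)        # A is rows x inner
    out_cols = TILE_ROW_BYTES // 4         # columns of 32-bit results in C
    return rows * inner * out_cols


if __name__ == "__main__":
    print("bytes per tile:", TILE_ROWS * TILE_ROW_BYTES)
    print("total tile bytes per core:", TILE_COUNT * TILE_ROWS * TILE_ROW_BYTES)
    for dtype in ("bf16", "int8"):
        print(dtype, tile_shape(dtype), macs_per_tile_multiply(dtype))
```

Under these assumptions, an INT8 tile holds twice as many elements as a BF16 tile, which is one reason INT8 is attractive for inference when the model tolerates the lower precision.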
Benefits of Intel® AMX for Better Business Outcomes
Intel® AMX enables Intel® Xeon® Scalable processors to boost the performance of deep learning training and inferencing workloads by balancing inference, the most prominent use case for a CPU in AI applications, with more capabilities for training.
Many Intel® customers are taking advantage of Intel® AMX to enable better outcomes for their organizations. Focusing on GenAI workloads, Intel® Xeon® 6 processors with P-cores can deliver 2x higher GPT-J-6B (BF16) performance vs. 5th Gen Intel® Xeon® processors.3 On 5th Gen Intel® Xeon® processors, customers can experience up to 14x better training and inference performance vs. 3rd Gen Intel® Xeon® processors.4
Primary benefits of Intel® AMX include:
- Improved performance
CPU-based acceleration can improve power and resource utilization efficiencies, giving you better performance for the same price.
For example, 5th Gen Intel® Xeon® Platinum 8592+ with Intel® AMX BF16 has shown up to 10.7x higher real-time speech recognition inference performance (RNN-T) and 7.9x higher performance/watt vs. 3rd Gen Intel® Xeon® processors with FP32.5
- Reduced total cost of ownership (TCO)
Intel® Xeon® Scalable processors with Intel® AMX enable a range of efficiency improvements that help with decreasing costs, lowering TCO, and advancing sustainability goals.
As an integrated accelerator on Intel® Xeon® Scalable processors that you may already own, Intel® AMX enables you to maximize the investments you’ve already made and get more from your CPU, removing the cost and complexity typically associated with the addition of a discrete accelerator.
Intel® Xeon® Scalable processors with Intel® AMX can also provide a more cost-efficient server architecture compared to other available options, delivering both power and emission reduction benefits.
In a comparison with AMD Genoa 9654 servers, 5th Gen Intel® Xeon® Platinum processors with Intel® AMX delivered up to 2.69x higher batched Natural Language Processing inference (BERT-Large) performance and 2.96x higher performance per watt.6
- Reduced development time
To simplify the process of developing deep learning applications, we work closely with the open source community, including the TensorFlow and PyTorch projects, to optimize frameworks for Intel® hardware, upstreaming our newest optimizations and features so they’re immediately available to developers. This enables you to take advantage of the performance benefits of Intel® AMX with the addition of a few lines of code, reducing overall development time.
We also provide access to free Intel® development tools, libraries, and resources.
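As an illustration of what "a few lines of code" can look like, the sketch below runs a toy PyTorch model under CPU autocast with BF16. It assumes a PyTorch build that uses the oneDNN CPU backend; whether Intel® AMX instructions are actually dispatched depends on the processor and library versions. The model, the tensor shapes, and the function name `run_bf16_inference` are hypothetical placeholders, and the import is guarded so the snippet still runs where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # PyTorch is an optional dependency for this sketch
    torch = None


def run_bf16_inference():
    """Run a toy linear layer under CPU autocast and return the output dtype name.

    Returns None when PyTorch is not available.
    """
    if torch is None:
        return None
    model = torch.nn.Linear(64, 16).eval()   # hypothetical stand-in model
    batch = torch.randn(8, 64)                # hypothetical input batch
    # Wrapping the forward pass in autocast is the only change
    # compared with plain FP32 inference.
    with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        output = model(batch)
    return str(output.dtype)


if __name__ == "__main__":
    print(run_bf16_inference())
```

The model weights stay in FP32 here; autocast selects BF16 for eligible operations such as the linear layer, which keeps the code change small at the cost of some per-operation casting.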
Intel® AMX Deep Learning Use Cases
Intel® AMX can be deployed in a wide range of deep learning use cases to provide a significant performance boost that results in greater end user and business value.
- Recommender systems: Use Intel® AMX as a more cost-effective solution for AI recommender models that boost the responsiveness of product, content, and service recommendations in areas such as e-commerce, social media, streaming entertainment, and personalized banking. For example, content providers often use Intel® AMX to accelerate delivery of targeted movie or book recommendations and ads, or to deliver a deep learning-based recommender system that accounts for real-time user behavior signals and context features, such as time and location, in near-real time. 5th Gen Intel® Xeon® processors are delivering up to 8.7x higher batch Recommendation System inference performance (DLRM) and 6.2x higher performance/watt vs. 3rd Gen Intel® Xeon® processors with FP32.7
- Natural language processing (NLP): Accelerate text-based use cases to support and scale NLP applications, such as those used in healthcare and life sciences to extract insights from clinical notes or process large amounts of medical data to help with early detection of health issues and improve care delivery. In financial services, Intel® AMX can be used to improve online chatbot responsiveness to help connect customers with the information they need more quickly while freeing limited staff up to address more-complex requests.
Similar to the cost savings benefits for recommender systems, Intel® AMX can be a more cost-effective solution for NLP. For example, when used to deploy the BERT-Large AI Natural Language model, Intel® AMX on 4th Gen Intel® Xeon® processors provided up to 79 percent savings when compared to AMD Genoa 9354.8
- Generative AI: Leverage Intel® AMX to accelerate the performance of deep learning training and inference workloads for generative AI use cases such as content generation (including images, videos, and audio), language translation, data augmentation, and summarization. For example, in a performance evaluation, Intel® Xeon® Platinum 8480+ processors with Intel® AMX using BF16 data types, compared to Intel® Xeon® Platinum 8380 processors using FP32 data types, reduced Stable Diffusion text-to-image generation time to less than five seconds and fine-tuning of Stable Diffusion models to less than five minutes.9
- Computer vision: Reduce the time from video and image capture to insight and action to deliver exceptional customer experiences and help your business improve efficiency and reduce operational costs. For example, in retail stores, Intel® AMX can help minimize transaction time for customers using computer vision‒enabled frictionless checkout and support near-real-time monitoring of shelves to track inventory data and instantly notify staff when an item is out of stock. In manufacturing, accelerated analysis of video from computer vision cameras on robotic arms can help enable time and cost savings with automated defect detection capabilities.
To find additional examples of how Intel® customers are using Intel® AMX to drive better business outcomes, visit our customer spotlight library.
Get Started with Intel® AMX
We offer a wide variety of development resources to help you take advantage of the integrated Intel® AMX accelerator in your Intel® Xeon® Scalable processors.
To get started, review step-by-step instructions for boosting performance with Intel® AMX in the following guides:
- Intel® AI Optimizations Quick Start Guide: Provides directions for improving AI workload performance with Intel® Optimized AI Libraries and Frameworks. This guide includes step-by-step instructions for TensorFlow, XGBoost, PyTorch, and more.
- Tuning guide for improving deep learning AI performance: Offers recommendations for tuning processors for Intel® optimized AI toolkits to achieve the best performance possible.
For more in-depth technical information, tutorials, code examples, and testing modules, access:
- Intel® AMX AI frameworks
- Intel® AMX AI reference kits
- Intel® AMX developer reference guide
- Intel® AMX code sample
You can access all of our tuning guides for Intel® Xeon® Scalable processors in our developer software tools catalog.
To help you streamline your AI development efforts, we offer our Intel® oneAPI Toolkits, components, and optimizations, including:
- Intel® oneAPI AI Analytics Toolkit
- Intel® oneAPI Math Kernel Library
- Intel® Extension for TensorFlow
- PyTorch Optimizations from Intel
Experiment with Intel® AMX Today
In addition to consulting our reference materials, you can experiment with Intel® hardware, Intel® AMX, and other integrated acceleration features using Intel® Developer Cloud.
This free online platform for learning, prototyping, testing, and running workloads also includes support for a number of Intel® software development toolkits, tools, and libraries.
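Before experimenting, it can help to confirm that the machine you are using actually exposes Intel® AMX. The sketch below assumes a Linux system, where the kernel lists CPU features in `/proc/cpuinfo` under flag names such as `amx_tile`, `amx_bf16`, and `amx_int8`. The function names `amx_flags_in` and `detect_amx` are illustrative, and other operating systems would need a different detection method.

```python
from pathlib import Path

# Flag names as reported by the Linux kernel in /proc/cpuinfo.
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def amx_flags_in(cpuinfo_text):
    """Map each AMX flag to whether it appears on a 'flags' line of the text."""
    present = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # Everything after the first colon is a space-separated flag list.
            _, _, values = line.partition(":")
            present.update(values.split())
    return {flag: flag in present for flag in AMX_FLAGS}


def detect_amx(path="/proc/cpuinfo"):
    """Read cpuinfo from disk; report every flag as absent if it cannot be read."""
    try:
        return amx_flags_in(Path(path).read_text())
    except OSError:
        return {flag: False for flag in AMX_FLAGS}


if __name__ == "__main__":
    for flag, found in detect_amx().items():
        print(f"{flag}: {'yes' if found else 'no'}")
```

A flag being present only shows that the processor and kernel expose the feature; frameworks and libraries still need AMX-enabled code paths to benefit from it.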
Expand and Enhance AI Capabilities on Your CPU with Intel® AMX
As your organization looks for solutions to meet the growing compute demands of deep learning training and inferencing workloads, Intel® AMX can help boost performance using the Intel® hardware you may already own, without the cost and complexity that come with additional specialized hardware. It can also shorten development time through Intel® optimizations in popular open source frameworks and access to free Intel® development tools and resources.