Intel Gaudi, Xeon and AI PC Accelerate Meta Llama 3 GenAI Workloads

Published April 18, 2024

Habana Gaudi2 Mezzanine Card (Credit: Habana Labs)

Intel’s AI products – from Gaudi and Xeon in the data center to the edge and AI PCs – offer developers the latest optimizations to run Meta Llama 3, Meta’s next-generation large language model.

What’s New: Today, Meta launched Meta Llama 3, its next-generation large language model (LLM). Effective on launch day, Intel has validated its AI product portfolio for the first Llama 3 8B and 70B models across Intel® Gaudi® accelerators, Intel® Xeon® processors, Intel® Core™ Ultra processors and Intel® Arc™ graphics.

“Intel actively collaborates with the leaders in the AI software ecosystem to deliver solutions that blend performance with simplicity. Meta Llama 3 represents the next big iteration in large language models for AI. As a major supplier of AI hardware and software, Intel is proud to work with Meta to take advantage of models such as Llama 3 that will enable the ecosystem to develop products for cutting-edge AI applications.”

–Wei Li, Intel vice president and general manager of AI Software Engineering

Why It Matters: As part of its mission to bring AI everywhere, Intel invests in the software and AI ecosystem to ensure that its products are ready for the latest innovations in the dynamic AI space. In the data center, Intel Gaudi accelerators and Intel Xeon processors with Intel® Advanced Matrix Extensions (Intel® AMX) acceleration give customers options to meet dynamic and wide-ranging requirements.

Intel Core Ultra processors and Intel Arc graphics products provide both a local development vehicle and a deployment target across millions of devices. They support comprehensive software frameworks and tools, including PyTorch and Intel® Extension for PyTorch® for local research and development, and the OpenVINO™ toolkit for model development and inference.

About Llama 3 Running on Intel: Intel’s initial testing and performance results for the Llama 3 8B and 70B models use open source software, including PyTorch, DeepSpeed, the Intel Optimum Habana library and Intel Extension for PyTorch, to provide the latest software optimizations. For more performance details, visit the Intel Developer Blog.

Intel® Gaudi® 2 accelerators have optimized performance on Llama 2 models – 7B, 13B and 70B parameters – and now have initial performance measurements for the new Llama 3 model. With the maturity of the Intel Gaudi software, Intel easily ran the new Llama 3 model and generated results for inference and fine-tuning. Llama 3 is also supported on the recently announced Intel® Gaudi® 3 accelerator.

Intel Xeon processors address demanding end-to-end AI workloads, and Intel invests in optimizing LLM results to reduce latency. Intel® Xeon® 6 processors with Performance-cores (code-named Granite Rapids) show a 2x improvement on Llama 3 8B inference latency compared with 4th Gen Intel® Xeon® processors, and can run larger language models, like Llama 3 70B, at under 100 ms per generated token.

Intel Core Ultra and Intel Arc graphics deliver impressive performance for Llama 3. In an initial round of testing, Intel Core Ultra processors already generate text faster than typical human reading speeds. Further, the Intel® Arc™ A770 GPU has Xe Matrix Extensions (XMX) AI acceleration and 16GB of dedicated memory to provide exceptional performance for LLM workloads.
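
To put the latency figures above in more familiar terms, the following is a minimal Python sketch that converts per-token latency into throughput. It is illustrative arithmetic only, not Intel's benchmarking method. The function names are hypothetical, the words-per-token ratio and reading speed are rough assumptions rather than figures from Intel, and the sketch assumes that a "2x improvement" in latency means the per-token latency is halved.

```python
# Illustrative arithmetic only; none of these constants come from Intel's results.
ASSUMED_WORDS_PER_TOKEN = 0.75   # assumption: rough ratio for English text
ASSUMED_READING_WPM = 250.0      # assumption: a typical adult reading speed


def tokens_per_second(latency_ms_per_token: float) -> float:
    """Convert per-token latency (milliseconds) to throughput (tokens/second)."""
    if latency_ms_per_token <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms_per_token


def words_per_minute(latency_ms_per_token: float, words_per_token: float) -> float:
    """Estimate generated words per minute from per-token latency."""
    return tokens_per_second(latency_ms_per_token) * words_per_token * 60.0


def latency_after_speedup(baseline_ms: float, speedup: float) -> float:
    """Per-token latency after an N-times latency improvement (assumed: baseline / N)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_ms / speedup


if __name__ == "__main__":
    # The 100 ms bound is the figure quoted for Llama 3 70B on Xeon 6.
    bound_ms = 100.0
    print(f"{bound_ms} ms/token -> {tokens_per_second(bound_ms)} tokens/s")

    wpm = words_per_minute(bound_ms, ASSUMED_WORDS_PER_TOKEN)
    print(f"about {wpm} words/min vs. assumed reading speed {ASSUMED_READING_WPM}")

    # Hypothetical baseline of 80 ms/token, used only to show the 2x relationship.
    print(f"80 ms/token with 2x improvement -> {latency_after_speedup(80.0, 2.0)} ms/token")
```

Under these assumptions, any latency below 100 ms per token corresponds to more than 10 tokens per second, which is consistent with the claim that generation can outpace a typical reader.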

What’s Next: In the coming months, Meta expects to introduce new capabilities, additional model sizes and enhanced performance. Intel will continue to optimize performance for its AI products to support this new LLM.

More Context: Intel Developer Blog | Meta Llama 3 Blog | Llama 3

The Small Print:

Full performance disclaimers and configurations available at: https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-meta-llama3-with-intel-ai-solutions.html

Integrated Intel® Arc™ graphics only available on select H-series Intel® Core™ Ultra processor-powered systems.
