Model Performance Data for Intel® Gaudi® 2 AI Accelerators
These performance numbers were measured using SynapseAI* software release version 1.18.0, unless otherwise noted.
Note: All models for both training and inference use the PyTorch* 2.4.0 framework. Other applicable frameworks used for training or inference are noted for each model.
Large Language Model (LLM) Throughput: Intel Gaudi 2 Accelerator
Model | # HPU | Sequence Length | Precision | Batch Size | Throughput (tokens/sec) |
---|---|---|---|---|---|
LLaMA V2 7B | 8 | 4,096 | FP8 | 1,024 | 68,464 |
LLaMA V2 13B | 16 | 4,096 | FP8 | 256 | 58,282 |
LLaMA V2 70B | 64 | 4,096 | FP8 | 1,024 | 54,274 |
LLaMA V3.1 8B | 8 | 8,192 | FP8 | 128 | 36,309 |
LLaMA V3.1 70B | 64 | 8,192 | FP8 | 128 | 43,677 |
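When comparing rows with different card counts, it can help to normalize the aggregate throughput to a single accelerator. A minimal sketch using figures from the table above (the helper function name is illustrative, not part of any Intel Gaudi API):

```python
# Normalize aggregate throughput (tokens/sec) to a per-HPU figure
# by dividing by the number of cards. Values come from the table above.
def per_hpu_throughput(total_tokens_per_sec: float, num_hpus: int) -> float:
    """Return throughput per accelerator for a multi-card run."""
    return total_tokens_per_sec / num_hpus

# LLaMA V2 7B: 68,464 tokens/sec across 8 HPUs
print(per_hpu_throughput(68_464, 8))   # 8558.0 tokens/sec per HPU

# LLaMA V2 70B: 54,274 tokens/sec across 64 HPUs -> roughly 848 tokens/sec per HPU
print(per_hpu_throughput(54_274, 64))
```

Per-HPU numbers naturally drop as model size and card count grow, since larger models spend more time on cross-card communication.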
Intel Gaudi 2 Accelerator with MLPerf* 3.1 Training Performance
These performance numbers have been generated with the latest version of SynapseAI* and are improvements over the officially submitted numbers posted on the MLCommons website.
Model | # HPU | Precision | Time to Train | Framework Version |
---|---|---|---|---|
MLPerf 3.1 - GPT3 | 384 | FP8 | 153.58 min† | |
MLPerf 3.1 - GPT3 | 256 | FP8 | 223.75 min‡ | |
MLPerf 3.1 - Stable Diffusion v2 | 64 | BF16 | 19.4 min‡ | Lightning 2.1.2 |
MLPerf 3.1 - ResNet | 8 | BF16 | 16.4 min | |
MLPerf 3.1 - BERT | 8 | BF16 | 15.01 min | |
†The GPT3 measurement with 384 cards was taken using a prelaunch version of the SynapseAI 1.13.0 software stack.
‡The GPT3 measurement with 256 cards and the Stable Diffusion* measurement were taken using the SynapseAI 1.13.0 software stack.
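The two GPT3 rows also allow a rough scaling-efficiency estimate: going from 256 to 384 cards is a 1.5x increase in hardware, and the measured time-to-train improvement can be compared against that ideal. A small sketch using the figures above (note the two rows used different software versions per the footnotes, so treat the result as approximate):

```python
# Rough scaling efficiency for GPT3 time-to-train, 256 -> 384 HPUs.
# The two measurements used different SynapseAI versions (see footnotes),
# so this is only an approximation.
time_256, time_384 = 223.75, 153.58   # minutes, from the table above
cards_256, cards_384 = 256, 384

actual_speedup = time_256 / time_384          # ~1.46x
ideal_speedup = cards_384 / cards_256         # 1.5x
efficiency = actual_speedup / ideal_speedup   # ~0.97

print(f"scaling efficiency: {efficiency:.1%}")  # ~97%
```

Near-linear scaling at this card count is the headline result; the same arithmetic applies to any pair of rows with matching model and precision.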
System Configuration
Intel Gaudi 2 Platform
System: HLS-Gaudi2 with eight Intel Gaudi 2 platform HL-225H mezzanine cards, two Intel Xeon Platinum 8380 CPUs at 2.30 GHz, and 1 TB of system memory
Common Software
- Ubuntu* v22.04
- Intel Gaudi software v1.18.0 (full software support details)
- PyTorch: Models run with PyTorch v2.4.0