Model Performance Data for Intel® Gaudi® 3 AI Accelerators
These performance numbers were measured using Intel Gaudi software (SynapseAI*) release v1.18.0, unless otherwise noted.
Note: All models, for both training and inference, use the PyTorch* 2.4.0 framework. Other applicable frameworks used for training or inference are noted for each model.
Large Language Models (LLM) for Intel Gaudi 3 Accelerator
| Model | # HPU | Sequence Length | Precision | Batch Size | Throughput (tokens/sec) |
|---|---|---|---|---|---|
| LLaMA V2 7B | 8 | 4,096 | FP8 | 1,024 | 99,225 |
| LLaMA V2 13B | 16 | 4,096 | FP8 | 256 | 89,946 |
| LLaMA V2 70B | 64 | 4,096 | FP8 | 1,024 | 78,082 |
| LLaMA V3.1 8B | 8 | 8,192 | FP8 | 128 | 59,359 |
| LLaMA V3.1 70B | 64 | 8,192 | FP8 | 128 | 61,153 |
System Configuration
Intel Gaudi 3 Platform
System: HLS-Gaudi3 with eight Intel Gaudi 3 platform HL-325L mezzanine cards, two Intel Xeon Platinum 8380 CPUs at 2.30 GHz, and 1 TB of system memory
Common Software
- Ubuntu* v22.04
- Intel Gaudi software v1.18.0
- PyTorch v2.4.0
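A minimal sketch of checking whether the software stack listed above is present in the current environment. `habana_frameworks` is the Intel Gaudi PyTorch bridge package; the check only probes importability, so it runs on machines without Gaudi hardware:

```python
import importlib.util

def gaudi_stack_status():
    """Return which of the packages from the software list above are
    importable in this environment (no accelerator hardware required)."""
    packages = ("torch", "habana_frameworks")
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# On a machine without the Gaudi software stack installed, the
# 'habana_frameworks' entry will simply be False.
status = gaudi_stack_status()
print(status)
```

On a properly provisioned Gaudi system both entries should be `True`; version pinning (e.g. confirming PyTorch v2.4.0) would additionally compare `torch.__version__` against the listed release.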
Stay Informed
Register for the latest Intel Gaudi AI accelerator developer news, events, training, and updates.