Model Performance Data for Intel® Gaudi® 2 AI Accelerators
These performance numbers were measured with the latest SynapseAI* software release, version 1.18.0, unless otherwise noted.
Note: All models, for both training and inference, use the PyTorch* 2.4.0 framework. Other applicable frameworks used for training or inference are noted for each model.
Large Language Models (LLM) for Throughput with Intel Gaudi 2 Accelerator
Model | # HPU | Precision | Input Length | Output Length | Batch Size | Throughput (tokens/sec)
---|---|---|---|---|---|---
LLaMA 2 7b | 1 | fp8 | 128 | 128 | 1,230 | 13,137
LLaMA 2 7b | 1 | fp8 | 128 | 2,048 | 163 | 4,791
LLaMA 2 7b | 1 | fp8 | 2,048 | 128 | 94 | 1,449
LLaMA 2 7b | 1 | fp8 | 2,048 | 2,048 | 81 | 1,953
LLaMA 2 70b | 2 | fp8 | 128 | 128 | 1,750 | 2,844
LLaMA 2 70b | 4 | fp8 | 128 | 2,048 | 327 | 3,211
LLaMA 2 70b | 2 | fp8 | 2,048 | 128 | 95 | 310
LLaMA 2 70b | 2 | fp8 | 2,048 | 2,048 | 78 | 1,438
Mistral 7b | 1 | fp8 | 128 | 128 | 896 | 18,023
Mistral 7b | 1 | fp8 | 128 | 2,048 | 120 | 11,294
Mistral 7b | 1 | fp8 | 2,048 | 128 | 120 | 1,427
Mistral 7b | 1 | fp8 | 2,048 | 2,048 | 44 | 3,947
LLaMA 3.1 8B | 1 | fp8 | 128 | 128 | 2,429 | 18,201
LLaMA 3.1 8B | 1 | fp8 | 128 | 2,048 | 289 | 11,138
LLaMA 3.1 8B | 1 | fp8 | 2,048 | 128 | 179 | 2,028
LLaMA 3.1 8B | 1 | fp8 | 2,048 | 2,048 | 155 | 5,511
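The throughput column reports generated tokens per second aggregated across the whole batch. Assuming the metric counts output tokens only (an assumption, not stated on this page), the wall-clock time to complete one batch can be back-calculated from the table:

```python
# Hedged sketch: estimate per-batch generation time from a reported
# throughput figure. Assumes throughput counts generated (output)
# tokens only, summed over the batch.
def batch_generation_time(batch_size, output_length, throughput_tokens_per_sec):
    total_tokens = batch_size * output_length
    return total_tokens / throughput_tokens_per_sec  # seconds

# Example: the LLaMA 2 7b, 128-in / 128-out row above.
t = batch_generation_time(1230, 128, 13137)
print(f"{t:.1f} s per batch")  # prints "12.0 s per batch"
```

This is only an illustration of how the metric composes; it is not a measurement methodology claim.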
LLM for Low Latency with Intel Gaudi 2 Accelerator
Model | # HPU | Precision | Input Length | Batch Size | Latency (ms)
---|---|---|---|---|---
LLaMA 2 7b | 1 | fp8 | 128 | 1 | 7.62
LLaMA 2 7b | 1 | fp8 | 2,048 | 1 | 56.31
LLaMA 2 70b | 8 | fp8 | 128 | 1 | 26.93
LLaMA 2 70b | 8 | fp8 | 2,048 | 1 | 116
LLaMA 3.1 8B | 1 | fp8 | 128 | 1 | 8.17
LLaMA 3.1 8B | 1 | fp8 | 2,048 | 1 | 60.52
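If the latency figures above are per-token decode latencies at batch size 1 (an assumption; the page does not say whether they are per-token or time-to-first-token), each one implies a single-stream decode rate:

```python
# Hedged sketch: convert a per-token latency (ms) into an implied
# single-stream decode rate. Only valid under the assumption that the
# latency column is per generated token, not time-to-first-token.
def implied_tokens_per_sec(latency_ms):
    return 1000.0 / latency_ms

rate = implied_tokens_per_sec(7.62)  # LLaMA 2 7b, 128-token input row
print(f"{rate:.0f} tokens/sec")      # prints "131 tokens/sec"
```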
System Configuration
Intel Gaudi 2 Platform
System: HLS-Gaudi2 with eight Intel Gaudi 2 platform HL-225H mezzanine cards, two Intel Xeon Platinum 8380 CPUs at 2.30 GHz, and 1 TB of system memory
Common Software
- Ubuntu* v22.04
- Intel Gaudi software v1.18.0 (full software support details)
- PyTorch v2.4.0