Model Performance Data for Intel® Gaudi® 2 AI Accelerators
These performance numbers were measured with the latest SynapseAI* software release, version 1.18.0, unless otherwise noted.
Note: All models, for both training and inference, use the PyTorch* 2.4.0 framework. Other applicable frameworks used for training or inference are noted for each model.
Large Language Models (LLM) for Throughput with Intel Gaudi 2 Accelerator
Model | # HPU | Precision | Input Length | Output Length | Batch Size | Throughput (tokens/sec)
---|---|---|---|---|---|---
LLaMA 2 7b | 1 | fp8 | 128 | 128 | 1,230 | 13,137
LLaMA 2 7b | 1 | fp8 | 128 | 2,048 | 163 | 4,791
LLaMA 2 7b | 1 | fp8 | 2,048 | 128 | 94 | 1,449
LLaMA 2 7b | 1 | fp8 | 2,048 | 2,048 | 81 | 1,953
LLaMA 2 70b | 2 | fp8 | 128 | 128 | 1,750 | 2,844
LLaMA 2 70b | 4 | fp8 | 128 | 2,048 | 327 | 3,211
LLaMA 2 70b | 2 | fp8 | 2,048 | 128 | 95 | 310
LLaMA 2 70b | 2 | fp8 | 2,048 | 2,048 | 78 | 1,438
Mistral 7b | 1 | fp8 | 128 | 128 | 896 | 18,023
Mistral 7b | 1 | fp8 | 128 | 2,048 | 120 | 11,294
Mistral 7b | 1 | fp8 | 2,048 | 128 | 120 | 1,427
Mistral 7b | 1 | fp8 | 2,048 | 2,048 | 44 | 3,947
LLaMA 3.1 8B | 1 | fp8 | 128 | 128 | 2,429 | 18,201
LLaMA 3.1 8B | 1 | fp8 | 128 | 2,048 | 289 | 11,138
LLaMA 3.1 8B | 1 | fp8 | 2,048 | 128 | 179 | 2,028
LLaMA 3.1 8B | 1 | fp8 | 2,048 | 2,048 | 155 | 5,511
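The throughput column reports generated tokens per second aggregated across the whole batch. Assuming the metric counts output tokens only (an assumption, not stated on this page), the wall-clock time to complete one batch can be back-calculated from the table:

```python
# Hedged sketch: estimate per-batch generation time from a reported
# throughput figure. Assumes throughput counts generated (output)
# tokens only, summed over the batch.
def batch_generation_time(batch_size, output_length, throughput_tokens_per_sec):
    total_tokens = batch_size * output_length
    return total_tokens / throughput_tokens_per_sec  # seconds

# Example: the LLaMA 2 7b, 128-in / 128-out row above.
t = batch_generation_time(1230, 128, 13137)
print(f"{t:.1f} s per batch")  # prints "12.0 s per batch"
```

This is only an illustration of how the metric composes; it is not a measurement methodology claim.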
LLM for Low Latency with Intel Gaudi 2 Accelerator
Model | # HPU | Precision | Input Length | Batch Size | Latency (ms)
---|---|---|---|---|---
LLaMA 2 7b | 1 | fp8 | 128 | 1 | 7.62
LLaMA 2 7b | 1 | fp8 | 2,048 | 1 | 56.31
LLaMA 2 70b | 8 | fp8 | 128 | 1 | 26.93
LLaMA 2 70b | 8 | fp8 | 2,048 | 1 | 116
LLaMA 3.1 8B | 1 | fp8 | 128 | 1 | 8.17
LLaMA 3.1 8B | 1 | fp8 | 2,048 | 1 | 60.52
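If the latency figures above are per-token decode latencies at batch size 1 (an assumption; the page does not say whether they are per-token or time-to-first-token), each one implies a single-stream decode rate:

```python
# Hedged sketch: convert a per-token latency (ms) into an implied
# single-stream decode rate. Only valid under the assumption that the
# latency column is per generated token, not time-to-first-token.
def implied_tokens_per_sec(latency_ms):
    return 1000.0 / latency_ms

rate = implied_tokens_per_sec(7.62)  # LLaMA 2 7b, 128-token input row
print(f"{rate:.0f} tokens/sec")      # prints "131 tokens/sec"
```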
System Configuration
Intel Gaudi 2 Platform
System: HLS-Gaudi2 with eight Intel Gaudi 2 platform HL-225H mezzanine cards, two Intel Xeon Platinum 8380 CPUs at 2.30 GHz, and 1 TB of system memory
Common Software
- Ubuntu* v22.04
- Intel Gaudi software v1.18.0 (full software support details)
- PyTorch v2.4.0