Max Throughput [TpS - higher is better]
Model            # HPU   Sequence Length   Precision   Batch Size   Throughput (tokens/sec)
LLaMA V2 7B      8       4096              FP8         1024         70523
LLaMA V2 13B     16      4096              FP8         256          59397
LLaMA V2 70B     64      4096              FP8         1024         54614
LLaMA V3.1 8B    8       8192              FP8         128          37440
LLaMA V3.1 70B   64      8192              FP8         128          43332
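
The throughput figures above are aggregates over all HPUs in each run. As a rough illustration only, the sketch below divides each figure by its HPU count to get tokens/sec per HPU. The names `ROWS` and `tokens_per_hpu` are hypothetical and not part of any benchmark tooling. The rows differ in sequence length and batch size, so the per-HPU values are not directly comparable across models.

```python
# Rows copied from the throughput table above: (model, number of HPUs, tokens/sec).
ROWS = [
    ("LLaMA V2 7B", 8, 70523),
    ("LLaMA V2 13B", 16, 59397),
    ("LLaMA V2 70B", 64, 54614),
    ("LLaMA V3.1 8B", 8, 37440),
    ("LLaMA V3.1 70B", 64, 43332),
]


def tokens_per_hpu(rows):
    """Return {model: aggregate throughput divided by HPU count}."""
    return {model: tps / hpus for model, hpus, tps in rows}


if __name__ == "__main__":
    for model, value in tokens_per_hpu(ROWS).items():
        print(f"{model:<16}{value:>10.1f} tokens/sec per HPU")
```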

Model                              # HPU   Precision   Time To Train   Frameworks Version
MLPerf 3.1 - GPT3                  384     fp8         153.58 min†
MLPerf 3.1 - GPT3                  256     fp8         223.75 min‡
MLPerf 3.1 - Stable Diffusion v2   64      bf16        19.4 min‡       Lightning 2.1.2
MLPerf 3.1 - ResNet                8       bf16        16.4 min
MLPerf 3.1 - BERT                  8       bf16        15.01 min