| Model | # HPUs | Sequence Length | Precision | Batch Size | Throughput (tokens/sec) |
|---|---|---|---|---|---|
| LLaMA V2 7B | 8 | 4,096 | FP8 | 1,024 | 68,464 |
| LLaMA V2 13B | 16 | 4,096 | FP8 | 256 | 58,282 |
| LLaMA V2 70B | 64 | 4,096 | FP8 | 1,024 | 54,274 |
| LLaMA V3.1 8B | 8 | 8,192 | FP8 | 128 | 36,309 |
| LLaMA V3.1 70B | 64 | 8,192 | FP8 | 128 | 43,677 |

| Model | # HPUs | Precision | Time to Train | Framework Version |
|---|---|---|---|---|
| MLPerf 3.1 - GPT3 | 384 | FP8 | 153.58 min† | |
| MLPerf 3.1 - GPT3 | 256 | FP8 | 223.75 min‡ | |
| MLPerf 3.1 - Stable Diffusion v2 | 64 | BF16 | 19.4 min‡ | Lightning 2.1.2 |
| MLPerf 3.1 - ResNet | 8 | BF16 | 16.4 min | |
| MLPerf 3.1 - BERT | 8 | BF16 | 15.01 min | |