| Model | # HPU | Precision | Time To Train | Framework Version |
|---|---|---|---|---|
| MLPerf 3.1 - GPT3 | 384 | fp8 | 153.58 min** | |
| MLPerf 3.1 - GPT3 | 256 | fp8 | 223.75 min† | |
| MLPerf 3.1 - Stable Diffusion v2 | 64 | bf16 | 19.4 min† | Lightning 2.1.2 |
| MLPerf 3.1 - ResNet | 8 | bf16 | 16.4 min‡ | |
| MLPerf 3.1 - BERT | 8 | bf16 | 15.01 min‡ | |

| Model | # HPU | Precision | Throughput | Sequence Length | TP,PP,DP | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| LLaMA 2 7B | 8 | FP8 | 68439 tokens/sec | 4,096 | 1,1,8 | 1,024 | Megatron DeepSpeed PR #372 |
| LLaMA 2 13B | 16 | FP8 | 52428 tokens/sec | 4,096 | 2,2,4 | 256 | Megatron DeepSpeed PR #372 |
| LLaMA 2 70B | 64 | FP8 | 52838 tokens/sec | 4,096 | 8,2,4 | 1,024 | Megatron DeepSpeed PR #372 |
| LLaMA 2 70B** | 256 | bf16 | 137625 tokens/sec | 4,096 | 8,8,4 | 1,024 | Megatron DeepSpeed PR #307 |
| LLaMA 2 70B** | 512 | bf16 | 226918 tokens/sec | 4,096 | 8,8,8 | 2,048 | Megatron DeepSpeed PR #307 |
| LLaMA 2 70B** | 1024 | bf16 | 427622 tokens/sec | 4,096 | 8,8,16 | 4,096 | Megatron DeepSpeed PR #307 |
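The TP,PP,DP column gives the tensor-, pipeline-, and data-parallel degrees of each run; their product must equal the HPU count, and the global batch splits across the data-parallel replicas. A minimal sanity-check sketch in plain Python (the helper name is ours; values are copied from the rows above):

```python
# Check the 3D-parallelism layouts: tensor-parallel (tp) x pipeline-parallel (pp)
# x data-parallel (dp) must equal the device count, and the global batch must
# divide evenly across the dp replicas.

def check_layout(hpus, tp, pp, dp, global_batch):
    assert tp * pp * dp == hpus, f"{tp}x{pp}x{dp} != {hpus}"
    assert global_batch % dp == 0, "batch must divide across dp replicas"
    return global_batch // dp  # samples handled per data-parallel replica

rows = [
    # (HPUs, tp, pp, dp, global batch) -- LLaMA 2 rows from the table
    (8,    1, 1, 8,  1024),
    (16,   2, 2, 4,  256),
    (64,   8, 2, 4,  1024),
    (256,  8, 8, 4,  1024),
    (512,  8, 8, 8,  2048),
    (1024, 8, 8, 16, 4096),
]

per_replica = [check_layout(*row) for row in rows]
print(per_replica)  # -> [128, 64, 256, 256, 256, 256]
```

Every row passes the product check, so the listed device counts are fully accounted for by the parallelism degrees.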

 

| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| Llama 2 13B | 16 | bf16 | 10 samples/sec | | | 256 | DeepSpeed 0.14.0 |
| Llama 2 70B | 64 | bf16 | 8.88 samples/sec | | | 1024 | DeepSpeed 0.14.0 |
| Llama 2 70B | 64 | FP8 | 12.9 samples/sec | | | 1024 | DeepSpeed 0.14.0 |
| Stable Diffusion | 64 | bf16 | 11145.8 img/sec | | | 32 | Lightning 2.2.4 |
| Stable Diffusion Fine Tuning** | 1 | bf16 | 71 img/sec | | | 7 | Lightning 2.2.4 |
| Stable Diffusion Fine Tuning Textual Inversion** | 1 | bf16 | 20.9 img/sec | | | 7 | Lightning 2.2.4 |
| ResNet50 LARS | 32 | bf16 | 18399 img/sec | 76.15 | 7.81 min | 256 | |
| ResNet50 LARS | 8 | bf16 | 47070 img/sec | 76.14 | 18.98 min | 256 | |
| ResNet50 LARS | 1 | bf16 | 6233 img/sec | | | 256 | |
| BERT Pre Training Phase 1 | 32 | bf16 | 32450 sent/sec | | 254 min | 64 | |
| BERT Pre Training Phase 1 | 8 | bf16 | 9218 sent/sec | 0 | | 64 | |
| BERT Pre Training Phase 1 | 1 | bf16 | 1178 sent/sec | | | 64 | |
| BERT Pre Training Phase 2 | 32 | bf16 | 10861 sent/sec | 0 | 80.21 min | 16 | |
| BERT Pre Training Phase 2 | 8 | bf16 | 2777.5 sent/sec | 0 | | 16 | |
| BERT Pre Training Phase 2 | 1 | bf16 | 351 sent/sec | | | 16 | |
| BERT SQUAD Fine Tuning | 8 | bf16 | 2075 sent/sec | 90.64 | 4.68 min | 24 | |
| BERT SQUAD Fine Tuning | 1 | bf16 | 285 sent/sec | | | 24 | |
| ResNext101 | 8 | bf16 | 22184 img/sec | 77.93 | 100 min | 256 | |
| ResNext101 | 1 | bf16 | 2853 img/sec | | | 256 | |
| SSD | 8 | bf16 | 14651 img/sec | 23.02 | 10.3 min | 128 | |
| SSD | 1 | bf16 | 2140 img/sec | | | 128 | |
| Transformer | 8 | bf16 | 1110435 token/sec | 27.82 | 41.73 min | 8,192 | |
| Transformer | 1 | bf16 | 138173.66 token/sec | | | 8,192 | |
| Unet2D (torch.compile) | 8 | bf16 | 19938.29 img/sec | 72.66 | 12.55 min | 64 | Lightning 2.2.4 |
| Unet2D (torch.compile) | 1 | bf16 | 2626 img/sec | | | 64 | Lightning 2.2.4 |
| Unet3D | 8 | bf16 | 252 img/sec | 74.26 | | 2 | Lightning 2.2.4 |
| Unet3D | 1 | bf16 | 32.42 img/sec | | | 2 | Lightning 2.2.4 |
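Because throughput is reported at 1, 8, and (for some models) 32 HPUs, scaling efficiency can be read straight off the table: divide the N-card throughput by N times the single-card throughput. A quick sketch in plain Python, using the ResNet50 LARS and BERT SQUAD fine-tuning rows above (the helper name is ours):

```python
# Scaling efficiency: throughput on n cards relative to n x single-card throughput.
def scaling_efficiency(thr_1, thr_n, n):
    return thr_n / (n * thr_1)

# ResNet50 LARS img/sec from the table: 6233 (1 HPU), 47070 (8 HPU)
resnet_8x = scaling_efficiency(6233, 47070, 8)   # ~94.4%

# BERT SQUAD fine-tuning sent/sec: 285 (1 HPU), 2075 (8 HPU)
bert_8x = scaling_efficiency(285, 2075, 8)       # ~91.0%

print(f"ResNet50 LARS 1->8 HPU efficiency: {resnet_8x:.1%}")
print(f"BERT SQUAD   1->8 HPU efficiency: {bert_8x:.1%}")
```

Both workloads retain over 90% of linear scaling when going from one card to eight.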

 

| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|---|
| Llama2-70B Fine Tuning FSDP (LoRA with torch.compile) | 8 | bf16 | 1.3 sentences/sec | 2.13 | 81.75 min | 10 | language-modeling | Optimum Habana 1.11.1 |
| Llama2-70B Fine Tuning (LoRA) | 8 | bf16 | 2.6 sentences/sec | 2.13 | 39.43 min | 10 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| Llama1-7B Fine Tuning (LoRA) | 8 | bf16 | 150 sentences/sec | 2.35 | 5.08 min | 64 | language-modeling | Optimum Habana 1.11.1 |
| Falcon-180B Fine Tuning (LoRA) | 8 | bf16 | 2.67 sentences/sec | 3.71 | 149.41 min | 1 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| Falcon-40B Fine Tuning (LoRA) | 8 | bf16 | 27.99 sentences/sec | 4.06 | 15.85 min | 1 | language-modeling | Optimum Habana 1.11.1 |
| GPTJ-CLM | 8 | bf16 | 22.24 sentences/sec | 0.53 | 17.18 min | 4 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| GPTNEOX-20B-CLM | 16 | bf16 | 294 sentences/sec | 0.53 | 27.21 min | 2 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| BridgeTower | 8 | bf16 | 726 sentences/sec | | 20.63 min | 40 | contrastive-image-text | Optimum Habana 1.11.1 |
| GPT2 | 8 | bf16 | 651 sentences/sec | 0.4 | 1.61 min | 4 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| GPT2-XL | 8 | bf16 | 94.24 sentences/sec | 0.47 | 6.55 min | 4 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| ALBERT-Large | 8 | bf16 | 2479 sentences/sec | 91.7 | 1.86 min | 32 | question-answering | Optimum Habana 1.11.1 |
| ALBERT-XXL | 8 | bf16 | 456 sentences/sec | 94.8 | 6.73 min | 16 | question-answering | Optimum Habana 1.11.1 |
| BERT Base (torch.compile) | 8 | bf16 | 4172 sentences/sec | 85.35 | 1.16 min | 24 | question-answering | Optimum Habana 1.11.1 |
| BERT-Large Fine Tuning (torch.compile) | 8 | bf16 | 2117 sentences/sec | 93.4 | 1.98 min | 32 | question-answering | Optimum Habana 1.11.1 |
| ClipRoBERTa | 8 | bf16 | 16366 images/sec | | 9.35 min | 64 | contrastive-image-text | Optimum Habana 1.11.1 |
| DistilBERT | 8 | bf16 | 9992 sentences/sec | 82.43 | 0.56 min | 64 | question-answering | Optimum Habana 1.11.1 |
| Flan-T5 XXL | 8 | bf16 | 26.99 sentences/sec | 37.06 | 369.91 min | 22 | question-answering | Optimum Habana 1.11.1 |
| RoBERTa Base | 8 | bf16 | 6640 sentences/sec | 92.14 | 0.73 min | 64 | question-answering | Optimum Habana 1.11.1 |
| RoBERTa Large (torch.compile) | 8 | bf16 | 2122 sentences/sec | 94.43 | 2.06 min | 32 | question-answering | Optimum Habana 1.11.1 |
| Swin Transformer | 8 | bf16 | 5841 images/sec | 99.09 | 1.8 min | 160 | image-classification | Optimum Habana 1.11.1 |
| T5-LARGE | 8 | bf16 | 87.57 sentences/sec | 44.34 | 246.95 min | 4 | summarization | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| T5-Small | 8 | bf16 | 553 sentences/sec | 26.19 | 106.61 min | 4 | translation | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| Vision Transformer | 8 | bf16 | 6496 images/sec | 98.91 | 1 min | 128 | image-classification | Optimum Habana 1.11.1 |
| Wav2Vec2.0 AC | 8 | bf16 | 1960 sentences/sec | 80.94 | 2.45 min | 16 | speech-recognition | Optimum Habana 1.11.1 |
| Wav2Vec2.0 ASR | 8 | bf16 | 76 sentences/sec | 3.96 | 20.65 min | 4 | speech-recognition | Optimum Habana 1.11.1 |


 

| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| MosaicML MPT-1B | 8 | bf16 | 24145.17 samples/sec | 7.35 | 13.41 min | 512 | PyTorch 2.2.2 |
| MosaicML MPT-70B | 32 | bf16 | 17937.17 samples/sec | 6.95 | 106.43 min | 512 | PyTorch 2.2.2 |


 

 

| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Framework Version |
|---|---|---|---|---|---|---|---|
| ResNet50 Keras LARS (torch.compile) | 32 | bf16 | 45063 img/sec | 76.34 | 24.5 min | 256 | |
| ResNet50 Keras LARS (torch.compile) | 8 | bf16 | 11633 img/sec | 76.55 | 69.76 min | 256 | |
| ResNet50 Keras LARS (torch.compile) | 1 | bf16 | 1621 img/sec | | | 256 | |
| BERT Pre Training combine | 32 | bf16 | 4792.62 sent/sec | | 1751 min | 64 | |
| BERT Pre Training combine | 8 | bf16 | 1234 sent/sec | | | 64 | |
| BERT Pre Training combine | 1 | bf16 | 155 sent/sec | | | 64 | |
| BERT Pre Training Phase 1 | 32 | bf16 | 5732.07 sent/sec | Loss: | 1315 min | 64 | |
| BERT Pre Training Phase 1 | 8 | bf16 | 1481.31 sent/sec | | | 64 | |
| BERT Pre Training Phase 1 | 1 | bf16 | 186.2 sent/sec | | | 64 | |
| BERT Pre Training Phase 2 | 32 | bf16 | 1917.35 sent/sec | Loss: | 436 min | 16 | |
| BERT Pre Training Phase 2 | 8 | bf16 | 487.99 sent/sec | | | 16 | |
| BERT Pre Training Phase 2 | 1 | bf16 | 61.25 sent/sec | | | 16 | |
| BERT SQUAD Fine Tuning | 8 | bf16 | 404.52 sent/sec | 90.68 | 12.96 min | 24 | |
| BERT SQUAD Fine Tuning | 1 | bf16 | 53.58 sent/sec | | | 24 | |
| BART Fine Tuning | 8 | bf16 | | | | 32 | |
| DINO | 8 | bf16 | 947 exmpl/sec | 77 | 2315 min | 64 | |
| MobileNetV2 | 8 | bf16 | 12632 img/sec | 71.49 | 505 min | 256 | |
| ResNet152 | 8 | bf16 | 4967 img/sec | 78.63 | 399 min | 128 | |
| SSD** | 8 | bf16 | 3439 img/sec | | | 128 | |
| Transformer | 8 | bf16 | 187860.33 tokens/sec | 28.1 | 1023 min | 4,096 | |
| Unet2D (torch.compile) | 8 | bf16 | 4773 img/sec | 72.86 | 63 min | 64 | Lightning 2.2.4 |
| Unet3D | 8 | bf16 | 62 img/sec | 74.3 | 373 min | 2 | Lightning 2.2.4 |
| YOLOX | 8 | bf16 | 313.37 img/sec | 39.75 | 2326.8 min | 16 | |


 

 

| Model | # HPU | Precision | Throughput | Accuracy | Time To Train | Batch Size | Task | Framework Version |
|---|---|---|---|---|---|---|---|---|
| GPT2-XL | 8 | bf16 | 19.37 sentences/sec | 0.47 | 74 min | 4 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| GPT2 | 8 | bf16 | 167.41 sentences/sec | 0.41 | 4.2 min | 4 | language-modeling | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| T5-LARGE | 8 | bf16 | 50 sentences/sec | 44.34 | 365 min | 4 | summarization | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| T5-Small | 8 | bf16 | 192 sentences/sec | 26.12 | 116.8 min | 4 | translation | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| ALBERT-L | 8 | bf16 | 490.11 sentences/sec | 92.57 | 7.9 min | 32 | question-answering | Optimum Habana 1.11.1 |
| ALBERT-XXL | 8 | bf16 | 75.34 sentences/sec | 94.88 | 41.4 min | 12 | question-answering | Optimum Habana 1.11.1 |
| BERT-BASE FT (torch.compile) | 8 | bf16 | 1178 sentences/sec | 85.53 | 3 min | 24 | question-answering | Optimum Habana 1.11.1 |
| BERT-Large FT (torch.compile) | 8 | bf16 | 413 sentences/sec | 93.29 | 8.6 min | 24 | question-answering | Optimum Habana 1.11.1 |
| Clip-RoBERTa | 8 | bf16 | 895 images/sec | | 45.2 min | 64 | contrastive-image-text | Optimum Habana 1.11.1 |
| DistilBERT | 8 | bf16 | 1524 sentences/sec | 85.72 | 3 min | 8 | question-answering | Optimum Habana 1.11.1 |
| RoBERTa Base | 8 | bf16 | 1066 sentences/sec | 91.81 | 3.13 min | 12 | question-answering | Optimum Habana 1.11.1 |
| RoBERTa Large (torch.compile) | 8 | bf16 | 410 sentences/sec | 94.76 | 8.6 min | 12 | question-answering | Optimum Habana 1.11.1 |
| Swin Transformer | 8 | bf16 | 1573 images/sec | 98.68 | 4.8 min | 64 | image-classification | Optimum Habana 1.11.1 |
| Vision Transformer | 8 | bf16 | 2461 images/sec | 97.19 | 2.81 min | 64 | image-classification | Optimum Habana 1.11.1 |
| Wav2Vec2-AC | 8 | bf16 | 667 sentences/sec | 81.84 | 6.3 min | 16 | speech-recognition | Optimum Habana 1.11.1 |
| Wav2Vec2-ASR | 8 | bf16 | 41.83 sentences/sec | 4.2 | 36.73 min | 4 | speech-recognition | Optimum Habana 1.11.1 |
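The two Optimum Habana tables cover several of the same models, so per-model throughput ratios between the earlier table and this one fall out by simple division. A rough sketch in plain Python (values copied from the two tables; dictionary layout and names are ours):

```python
# Throughput ratios for models that appear in both Optimum Habana tables,
# same precision (bf16) and framework version (1.11.1).
pairs = {
    # model: (earlier-table throughput, this-table throughput)
    "GPT2-XL":      (94.24, 19.37),   # sentences/sec
    "GPT2":         (651,   167.41),
    "DistilBERT":   (9992,  1524),
    "Wav2Vec2 ASR": (76,    41.83),
}

speedups = {m: fast / slow for m, (fast, slow) in pairs.items()}
for model, s in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {s:.2f}x")
```

Note that batch sizes differ for some of these pairs (e.g. DistilBERT runs at batch 64 in the earlier table and batch 8 here), so the ratios mix hardware and configuration effects rather than isolating one.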