​​​​
| Model | #HPU | Precision | Performance | Framework Version |
| --- | --- | --- | --- | --- |
| MLPerf 4.0 Llama 2 70B Server | 8 | fp8 | 6222.9 token/sec | PyTorch 2.2.2 |
| MLPerf 4.0 Llama 2 70B Offline | 8 | fp8 | 7808 token/sec | PyTorch 2.2.2 |
| MLPerf 4.0 Stable Diffusion XL Server | 8 | fp8 | 6.25 Queries/s | |
| MLPerf 4.0 Stable Diffusion XL Offline | 8 | fp8 | 6.45 samples/sec | 620.15 ms |

 

​​​​​​​​​​​​
| Model | #HPU | Precision | Input Length | Output Length | Throughput | Batch | Framework Version |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LLaMA 2 7B | 1 | fp8 | 128 | 128 | 13163 tokens/sec | 1230 | Optimum Habana 1.11.1 |
| LLaMA 2 7B | 1 | fp8 | 128 | 2048 | 4777 tokens/sec | 163 | Optimum Habana 1.11.1 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 128 | 1291 tokens/sec | 94 | Optimum Habana 1.11.1 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 2048 | 1943 tokens/sec | 81 | Optimum Habana 1.11.1 |
| LLaMA 2 70B | 2 | fp8 | 128 | 128 | 2727 tokens/sec | 1750 | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| LLaMA 2 70B | 4 | fp8 | 128 | 2048 | 7422 tokens/sec | 750 | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 128 | 276 tokens/sec | 95 | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| LLaMA 2 70B | 2 | fp8 | 2048 | 2048 | 958 tokens/sec | 78 | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| Mistral 7B Instruct | 1 | fp8 | 128 | 128 | 13112 tokens/sec | 896 | Optimum Habana 1.11.1 |
| Mistral 7B Instruct | 1 | fp8 | 128 | 2048 | 7947 tokens/sec | 120 | Optimum Habana 1.11.1 |
| Mistral 7B Instruct | 1 | fp8 | 2048 | 128 | 1360 tokens/sec | 120 | Optimum Habana 1.11.1 |
| Mistral 7B Instruct | 1 | fp8 | 2048 | 2048 | 3143 tokens/sec | 44 | Optimum Habana 1.11.1 |

​​​​​​
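| Model | #HPU | Precision | Input Length | Latency | Batch | Framework Version |
| --- | --- | --- | --- | --- | --- | --- |
| LLaMA 2 7B | 1 | fp8 | 128 | 8.19 ms | 1 | Optimum Habana 1.11.1 |
| LLaMA 2 7B | 1 | fp8 | 2048 | 56.97 ms | 1 | Optimum Habana 1.11.1 |
| LLaMA 2 70B | 8 | fp8 | 128 | 24.33 ms | 1 | Optimum Habana 1.11.1 |
| LLaMA 2 70B | 8 | fp8 | 2048 | 122 ms | 1 | Optimum Habana 1.11.1 |
| Mistral 7B Instruct | 1 | fp8 | 128 | 10.8 ms | 1 | Optimum Habana 1.11.1 |
| Mistral 7B Instruct | 1 | fp8 | 2048 | 92 ms | 1 | Optimum Habana 1.11.1 |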

​​​​​​​
| Model | #HPU | Precision | Throughput | Latency‡ | Batch | Framework Version |
| --- | --- | --- | --- | --- | --- | --- |
| Stable Diffusion v2.1 (512x512)** | 1 | bf16 | 1.23 img/sec | 813 ms | 1 | Lightning 2.2.0 |
| Stable Diffusion v2.1 (768X768)** | 1 | bf16 | 0.4 img/sec | 2500 ms | 1 | Lightning 2.2.0 |
| Bert FT (torch.compile) | 1 | bf16 | 806 token/sec | 29.77 ms | 24 | |
| Resnet50 (torch.compile) | 1 | bf16 | 17172.69 img/sec | 14.9 ms | 256 | |
| Resnext101 | 1 | bf16 | 10670 img/sec | 23.99 ms | 256 | |
| Unet2D | 1 | bf16 | 7483 img/sec | 8.55 ms | 64 | Lightning 2.2.4 |
| Unet3D | 1 | bf16 | 128 img/sec | 15.62 ms | 2 | Lightning 2.2.4 |
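
The Latency‡ values in the table above appear to equal batch size divided by throughput. This is an observation from the listed numbers, not a definition stated here, and the helper name `latency_ms` is hypothetical. A minimal Python sketch that checks a few rows under that assumption:

```python
def latency_ms(batch_size: int, throughput_per_sec: float) -> float:
    """Time to process one batch, in milliseconds, at a given throughput.

    Assumes latency = batch size / throughput, which is inferred from the
    table values rather than documented.
    """
    if throughput_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return batch_size / throughput_per_sec * 1000.0


# (model, batch, throughput per second, latency in ms as listed in the table)
rows = [
    ("Bert FT (torch.compile)", 24, 806.0, 29.77),
    ("Resnet50 (torch.compile)", 256, 17172.69, 14.9),
    ("Resnext101", 256, 10670.0, 23.99),
    ("Unet2D", 64, 7483.0, 8.55),
    ("Unet3D", 2, 128.0, 15.62),
]

for model, batch, throughput, listed in rows:
    derived = latency_ms(batch, throughput)
    print(f"{model}: derived {derived:.2f} ms, listed {listed} ms")
```

The derived and listed values agree to within rounding for these rows, so the same helper can serve as a rough sanity check when comparing configurations with different batch sizes.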

 

​​​​​​​​​​​​​​​​​​​​​​​
| Model | #HPU | Precision | Input Length | Output Length | Throughput | Latency | Batch | Task | Framework Version |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 2-7B (torch.compile) | 1 | bf16 | 128 | 128 | 5820 token/sec | 51.54 ms | 300 | text-generation | Optimum Habana 1.11.1 |
| Falcon 180B | 8 | bf16 | 128 | 2048 | 700 token/sec | 57.14 ms | 40 | text-generation | Optimum Habana 1.11.1 |
| Falcon-40B 2048 Tokens | 8 | bf16 | 128 | 2048 | 92.34 token/sec | 10.82 ms | 1 | text-generation | Optimum Habana 1.11.1 |
| Falcon-7B 8192 Tokens | 1 | bf16 | 128 | 8192 | 118.19 token/sec | 8.46 ms | 1 | text-generation | Optimum Habana 1.11.1 |
| GPT-J | 8 | bf16 | 128 | 100 | 628.74 token/sec | 6.36 ms | 4 | text-generation | Optimum Habana 1.11.1 |
| StableLM-3B | 1 | bf16 | 128 | 2048 | 250 token/sec | 4 ms | 1 | text-generation | Optimum Habana 1.11.1 |
| StableLM-7B | 1 | bf16 | 128 | 2048 | 128 token/sec | 7.81 ms | 1 | text-generation | Optimum Habana 1.11.1 |
| MPT-7B | 1 | bf16 | 128 | 1932 | 121 token/sec | 8.26 ms | 1 | text-generation | Optimum Habana 1.11.1 |
| Bloomz | 8 | bf16 | 128 | 100 | 36.78 token/sec | 27.18 ms | 1 | text-generation | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| StarCoder | 1 | bf16 | 100 | 100 | 65 token/sec | 15.38 ms | 1 | text-generation | DeepSpeed 0.14.0, Optimum Habana 1.11.1 |
| OPT | 1 | bf16 | 100 | 100 | 1120 token/sec | 0.89 ms | 1 | text-generation | Optimum Habana 1.11.1 |
| T5-3B Summarization 1024-128 Beam4 | 1 | bf16 | 1024 | 128 | 0.94 token/sec | 1063.82 ms | 1 | summarization | Optimum Habana 1.11.1 |
| Bert (Text Classification) | 1 | bf16 | | 128 | 2125 token/sec | 3.76 ms | 8 | text-classification | Optimum Habana 1.11.1 |
| Bert (Language Modeling) | 1 | bf16 | | | 66.64 token/sec | 60.02 ms | 4 | language-modeling | Optimum Habana 1.11.1 |
| Bert (Question Answering) | 1 | bf16 | | 384 | 613 token/sec | 13.05 ms | 8 | question-answering | Optimum Habana 1.11.1 |
| StableDiffusion v2.1 (512x512) | 1 | bf16 | | | 1.33 images/sec | 3007.51 ms | 4 | stable-diffusion | PyTorch Lightning 2.2.4 |
| Bart | 1 | bf16 | | | 6.79 token/sec | 294.55 ms | 2 | summarization | Optimum Habana 1.11.1 |
| BridgeTower | 1 | bf16 | | | 321 token/sec | 49.84 ms | 16 | contrastive-image-text | Optimum Habana 1.11.1 |
| ESMFold | 1 | bf16 | | | 2.97 token/sec | 336.7 ms | 1 | protein-folding | Optimum Habana 1.11.1 |
| T5-3B Summarization Greedy | 1 | bf16 | | | 2.46 token/sec | 406.5 ms | 1 | summarization | Optimum Habana 1.11.1 |
| HF-T5-Small-Translation-Greedy | 1 | bf16 | | | 30.85 token/sec | 129.65 ms | 4 | translation | Optimum Habana 1.11.1 |
| Wav2vec (Audio Classification) | 1 | bf16 | | | 1002 token/sec | 3.99 ms | 4 | audio-classification | Optimum Habana 1.11.1 |
| Wav2vec (Speech Recognition) | 1 | bf16 | | | 16.62 token/sec | 240.67 ms | 4 | speech-recognition | Optimum Habana 1.11.1 |

​​​
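| Model | #HPU | Precision | Throughput | Latency | Batch Size | Framework Version |
| --- | --- | --- | --- | --- | --- | --- |
| Bert | 1 | bf16 | 154.1 token/sec | 155.74 ms | 24 | |
| Unet2D | 1 | bf16 | 3730 img/sec | 17.15 ms | 64 | Lightning 2.2.4 |
| Unet3D | 1 | bf16 | 64.1 img/sec | 31.2 ms | 2 | Lightning 2.2.4 |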

​​​​​​​​​
| Model | #HPU | Precision | Throughput | Latency | Batch | Task | Framework Version |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HF Bert (Language Modeling) | 1 | bf16 | | | 4 | language-modeling | Optimum Habana 1.11.1 |
| HF Bert (Question Answering) | 1 | bf16 | 127.7 token/sec | 62.64 ms | 8 | question-answering | Optimum Habana 1.11.1 |
| HF Bert (Text Classification) | 1 | bf16 | 434.4 token/sec | 18.41 ms | 8 | text-classification | Optimum Habana 1.11.1 |
| HF Bart-Greedy | 1 | bf16 | 3.1 token/sec | 645.16 ms | 2 | summarization | Optimum Habana 1.11.1 |
| HF ESMFold | 1 | bf16 | 13.9 token/sec | 71.94 ms | 1 | protein-folding | Optimum Habana 1.11.1 |
| HF StableDiffusion V2-1 (512x512) | 1 | bf16 | 0.4 token/sec | 10000 ms | 4 | text to image generation | Optimum Habana 1.11.1 |
| HF-T5-Small-Translation-Greedy | 1 | bf16 | 16.8 token/sec | 238.09 ms | 4 | translation | Optimum Habana 1.11.1 |
| HF Wav2vec (Audio Classification) | 1 | bf16 | 494.6 token/sec | 8.08 ms | 4 | audio-classification | Optimum Habana 1.11.1 |
| HF Wav2vec (Speech Recognition) | 1 | bf16 | 9.5 token/sec | 421.05 ms | 4 | speech-recognition | Optimum Habana 1.11.1 |