| Framework | Model/Dataset | Usage | Precision | Throughput | Perf/Watt | Latency (ms) | Batch Size | Instance Configuration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | | | 40 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | 130.4 tokens/s | | 92 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | | | 59.5 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | 125 tokens/s | | 96 | 6 | 1 instance per socket |
| MLPerf Inference v3.1 | GPT-J (offline, 99.0% acc) | Large Language Model | int8 | 2.05 samp/s | | | 7 | 4 cores per instance |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | | | 47 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | 111.6 tokens/s | | 107.5 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | | | 68 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | 109.1 tokens/s | | 110 | 6 | 1 instance per socket |
| MLPerf Inference v3.1 | ResNet50 v1.5 (offline) | Image Recognition | int8 | 20,565.5 samp/s | | | 256 | 1 core per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 10,215.7 img/s | 9.98 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 13,862.96 img/s | 14.09 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 6,210.69 img/s | 6.13 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 7,295.63 img/s | 7.33 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,319.52 img/s | 1.27 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,360.05 img/s | 1.28 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,659.37 img/s | 1.65 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,985.26 img/s | 2.02 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 7,440.61 img/s | 7.70 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 12,345.54 img/s | 11.80 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 5,053.76 img/s | 5.01 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 6,704.17 img/s | 6.34 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,282.77 img/s | 1.17 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,342.91 img/s | 1.27 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 1,529.49 img/s | 1.41 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 2,017.54 img/s | 1.89 | | 116 | 1 instance per socket |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | int8 | 8,819.66 img/s | 8.81 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | bf16 | 5,915.79 img/s | 5.82 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | fp32 | 1,281.34 img/s | 1.25 | | 1 | 4 cores per instance |
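The headline of these ResNet50 v1.5 rows is the throughput gain from reduced precision. A minimal sketch of how to read that off the table, using the Intel PyTorch 2.1 single-socket (1 instance per socket) figures quoted above:

```python
# Speedup of reduced-precision inference over fp32, computed from the
# Intel PyTorch 2.1 ResNet50 v1.5 "1 instance per socket" rows above.
throughput = {  # img/s
    "int8": 13862.96,
    "bf16": 7295.63,
    "bf32": 1985.26,
    "fp32": 1360.05,
}

def speedup_vs_fp32(precision: str) -> float:
    """Throughput ratio of the given precision over the fp32 baseline."""
    return throughput[precision] / throughput["fp32"]

for p in ("int8", "bf16", "bf32"):
    print(f"{p}: {speedup_vs_fp32(p):.2f}x vs fp32")
# int8 comes out at roughly 10x fp32, bf16 at roughly 5x.
```

The same ratio can be taken for any framework in the table; only the absolute throughputs differ.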
| MLPerf Inference v3.1 | BERT-Large (offline, 99.0% acc) | Natural Language Processing | int8 | 1,357.33 samp/s | | | 1,300 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 335.1 sent/s | 0.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 378.73 sent/s | 0.36 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 204.52 sent/s | 0.21 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 201.44 sent/s | 0.21 | | 16 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 35.25 sent/s | 0.03 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 41.05 sent/s | 0.04 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 72.42 sent/s | 0.07 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 71.63 sent/s | 0.07 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 253.27 sent/s | 0.24 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 239.89 sent/s | 0.25 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 181.02 sent/s | 0.18 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 184.06 sent/s | 0.17 | | 128 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 44.73 sent/s | 0.04 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 38.58 sent/s | 0.04 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 72.78 sent/s | 0.07 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 71.77 sent/s | 0.07 | | 16 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 298.44 sent/s | 0.30 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 285.68 sent/s | 0.28 | | 48 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 202.48 sent/s | 0.20 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 191.25 sent/s | 0.19 | | 32 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 47.34 sent/s | 0.05 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 44.23 sent/s | 0.04 | | 48 | 1 instance per socket |
| MLPerf Inference v3.1 | DLRM-v2 (offline, 99.0% acc) | Recommender | int8 | 5,367.77 samp/s | | | 300 | 1 core per instance |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | int8 | 23,444,587 rec/s | 23611.92 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf16 | 10,646,560 rec/s | 10238.88 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | fp32 | 2,278,228 rec/s | 2220.37 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf32 | 4,530,200 rec/s | 4427.38 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 4,726.15 sent/s | 4.94 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 7,759.25 sent/s | 8.42 | | 168 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 3,306.46 sent/s | 3.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 5,057.47 sent/s | 5.50 | | 120 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 900.58 sent/s | 0.85 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 1,007.05 sent/s | 1.04 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,513.66 sent/s | 1.49 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,926.1 sent/s | 1.77 | | 288 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 61.03 sent/s | 0.06 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 245.66 sent/s | 0.24 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 41.44 sent/s | 0.04 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 278.81 sent/s | 0.28 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 20.27 sent/s | 0.02 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 102.48 sent/s | 0.10 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 20.28 sent/s | 0.02 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 114.08 sent/s | 0.11 | | 448 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 24.68 samp/s | 0.02 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 21.86 samp/s | 0.02 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 13.05 samp/s | 0.01 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 11.87 samp/s | 0.01 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 2.88 samp/s | 0.00 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 2.62 samp/s | 0.00 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | int8 | 459.36 img/s | 0.44 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | bf16 | 218.41 img/s | 0.20 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | fp32 | 31.17 img/s | 0.03 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1289.95 fps | 1.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1923.77 fps | 1.83 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 648.58 fps | 0.66 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 867.05 fps | 0.87 | | 64 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 151.29 fps | 0.14 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 160.93 fps | 0.15 | | 64 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 215.11 fps | 0.21 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 241.98 fps | 0.22 | | 116 | 1 instance per socket |
| MLPerf Inference v3.1 | RetinaNet (offline) | Object Detection | int8 | 284.75 samp/s | | | 2 | 4 cores per instance |
| MLPerf Inference v3.1 | RNN-T (offline) | Speech-to-text | int8+bf16 | 5,782.18 samp/s | | | 256 | 4 cores per instance |
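Every row in this table follows the same pipe-delimited, nine-column layout, so it can be consumed programmatically. A minimal sketch, assuming that column order (framework, model, usage, precision, throughput, perf/watt, latency, batch size, instance configuration); the column names here are my labels, not part of the source:

```python
# Parse one pipe-delimited benchmark row into a dict.
# Nine-column layout assumed from the table above; empty cells become None.
COLUMNS = ("framework", "model", "usage", "precision",
           "throughput", "perf_per_watt", "latency_ms",
           "batch_size", "instances")

def parse_row(line: str) -> dict:
    cells = [c.strip() or None for c in line.strip().strip("|").split("|")]
    return dict(zip(COLUMNS, cells))

row = parse_row("| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition "
                "| int8 | 10,215.7 img/s | 9.98 | | 1 | 4 cores per instance |")
print(row["precision"], row["throughput"], row["latency_ms"])
# The empty latency cell parses as None.
```

Note that throughput stays a string because the units vary by row (img/s, sent/s, tokens/s, rec/s, samp/s, fps); splitting value from unit is a separate step.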