Performance Data for Intel® AI Data Center Products
Find the latest AI benchmark performance data for Intel Data Center products, including detailed hardware and software configurations.
Pretrained models, sample scripts, best practices, and tutorials:
- Intel® Developer Cloud
- Intel® AI Reference Models and Jupyter Notebooks*
- AI-Optimized CPU Containers from Intel
- AI-Optimized GPU Containers from Intel
- Jupyter Notebook tutorials for OpenVINO™
- AI Performance Debugging on Intel® CPUs
Measurements were taken using:
- PyTorch* Optimizations from Intel
- TensorFlow* Optimizations from Intel
- Intel® Distribution of OpenVINO™ Toolkit
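In the tables below, a model entry such as "Token Size 1024/128" denotes a 1024-token input prompt and 128 generated output tokens. Throughput figures in tokens/s of this kind are conventionally derived as generated tokens divided by wall-clock decode time; a minimal sketch of that arithmetic (a hypothetical helper, not Intel's measurement harness):

```python
# Hypothetical helper illustrating how a tokens/s figure is derived.
# This is not Intel's benchmarking harness, just the underlying arithmetic.
def decode_throughput(output_tokens: int, decode_seconds: float) -> float:
    """Generated tokens per second of wall-clock decode time."""
    if decode_seconds <= 0:
        raise ValueError("decode time must be positive")
    return output_tokens / decode_seconds

# Example: 128 generated tokens in 2.5 s of decode time.
print(round(decode_throughput(128, 2.5), 2))  # 51.2
```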
Intel® Xeon® 6 Processors
Intel® Xeon® 6980P Processor (128 Cores)
Inference
Framework Version | Model | Usage | Precision | Throughput | Batch size | Perf/Watt | Latency (ms) |
---|---|---|---|---|---|---|---|
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | avx_fp32 | 51.33 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | avx_fp32 | 49.95 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | avx_fp32 | 405.71 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | avx_fp32 | 351.60 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_int8 | 163.14 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_int8 | 150.00 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_int8 | 981.52 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_int8 | 686.74 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_bf16 | 99.17 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_bf16 | 93.67 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_bf16 | 787.75 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_bf16 | 587.60 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_fp16 | 101.69 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_fp16 | 97.47 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_fp16 | 964.57 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_fp16 | 765.42 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_bf32 | 51.38 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_bf32 | 50.02 tokens/s | 1 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_bf32 | 576.72 tokens/s | 30 | ||
Intel PyTorch 2.6.0+IPEX Inf LLMs | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_bf32 | 466.59 tokens/s | 30 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | fp32 | 52.37 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | fp32 | 51.04 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | fp32 | 351.10 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | fp32 | 272.81 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_int8 | 162.54 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_int8 | 150.34 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_int8 | 962.09 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_int8 | 542.06 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_bf16 | 93.69 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_bf16 | 88.85 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_bf16 | 830.94 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_bf16 | 480.82 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_fp16 | 100.33 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_fp16 | 95.44 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 2016/32 | Natural Language Processing | amx_fp16 | 771.99 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | ChatGLM3-6B Token Size 1024/128 | Natural Language Processing | amx_fp16 | 551.93 tokens/s | 64 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | avx_fp32 | 52.07 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | avx_fp32 | 50.30 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | avx_fp32 | 282.47 tokens/s | 17 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | avx_fp32 | 237.34 tokens/s | 17 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_int8 | 158.13 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_int8 | 146.94 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_int8 | 765.28 tokens/s | 15 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_int8 | 590.43 tokens/s | 25 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_bf16 | 99.54 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_bf16 | 93.75 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_bf16 | 673.26 tokens/s | 29 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_bf16 | 512.50 tokens/s | 31 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_fp16 | 101.03 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_fp16 | 97.47 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_fp16 | 687.17 tokens/s | 24 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_fp16 | 559.85 tokens/s | 22 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_bf32 | 52.20 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_bf32 | 50.40 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_bf32 | 360.43 tokens/s | 25 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_bf32 | 260.74 tokens/s | 17 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 2016/32 | Natural Language Processing | avx_fp32 | 53.78 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 1024/128 | Natural Language Processing | avx_fp32 | 51.81 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 2016/32 | Natural Language Processing | avx_fp32 | 281.41 tokens/s | 4 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 1024/128 | Natural Language Processing | avx_fp32 | 136.37 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_int8 | 168.86 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_int8 | 152.81 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_int8 | 746.81 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_int8 | 480.53 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_bf16 | 96.83 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_bf16 | 92.52 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_bf16 | 657.46 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_bf16 | 441.71 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_fp16 | 99.86 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_fp16 | 94.62 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 2016/32 | Natural Language Processing | amx_fp16 | 558.09 tokens/s | 8 | ||
OpenVINO 2024.4.0 Inf LLM | GPT-J-6B Token Size 1024/128 | Natural Language Processing | amx_fp16 | 347.37 tokens/s | 16 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | avx_fp32 | 24.18 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | avx_fp32 | 23.37 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | avx_fp32 | 122.01 tokens/s | 10 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | avx_fp32 | 86.45 tokens/s | 6 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_int8 | 82.13 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_int8 | 75.69 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_int8 | 436.95 tokens/s | 15 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_int8 | 283.91 tokens/s | 15 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_bf16 | 47.18 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_bf16 | 44.70 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_bf16 | 367.51 tokens/s | 30 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_bf16 | 245.02 tokens/s | 18 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_fp16 | 48.45 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_fp16 | 46.06 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_fp16 | 380.90 tokens/s | 24 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_fp16 | 288.17 tokens/s | 18 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_bf32 | 24.20 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_bf32 | 23.39 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_bf32 | 134.45 tokens/s | 10 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_bf32 | 89.23 tokens/s | 6 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 2016/32 | Natural Language Processing | avx_fp32 | 23.40 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 1024/128 | Natural Language Processing | avx_fp32 | 22.78 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 2016/32 | Natural Language Processing | avx_fp32 | 132.09 tokens/s | 8 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 1024/128 | Natural Language Processing | avx_fp32 | 103.81 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_int8 | 80.59 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_int8 | 72.12 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_int8 | 477.90 tokens/s | 8 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_int8 | 266.37 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_bf16 | 47.34 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_bf16 | 44.76 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_bf16 | 387.62 tokens/s | 8 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_bf16 | 231.78 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_bf32 | 47.60 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_bf32 | 44.36 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 2016/32 | Natural Language Processing | amx_bf32 | 395.06 tokens/s | 8 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-13B Token size 1024/128 | Natural Language Processing | amx_bf32 | 226.38 tokens/s | 16 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | avx_fp32 | 45.23 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | avx_fp32 | 43.55 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | avx_fp32 | 268.14 tokens/s | 23 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | avx_fp32 | 201.32 tokens/s | 17 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_int8 | 143.24 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_int8 | 132.19 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_int8 | 660.96 tokens/s | 15 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_int8 | 430.51 tokens/s | 15 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_bf16 | 86.96 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_bf16 | 82.07 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_bf16 | 604.93 tokens/s | 25 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_bf16 | 432.21 tokens/s | 25 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_fp16 | 88.89 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_fp16 | 84.11 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_fp16 | 638.30 tokens/s | 25 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_fp16 | 470.77 tokens/s | 22 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_bf32 | 45.26 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_bf32 | 43.51 tokens/s | 1 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_bf32 | 307.81 tokens/s | 23 | ||
Intel PyTorch 2.6.0+ IPEX Inf LLMs | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_bf32 | 224.92 tokens/s | 17 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 2016/32 | Natural Language Processing | avx_fp32 | 45.94 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 1024/128 | Natural Language Processing | avx_fp32 | 44.23 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 2016/32 | Natural Language Processing | avx_fp32 | 262.65 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 1024/128 | Natural Language Processing | avx_fp32 | 190.16 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_int8 | 138.41 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_int8 | 126.22 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_int8 | 703.42 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_int8 | 481.97 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_bf16 | 84.21 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_bf16 | 79.43 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_bf16 | 618.63 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_bf16 | 424.50 tokens/s | 32 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_bf32 | 85.19 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_bf32 | 80.02 tokens/s | 1 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 2016/32 | Natural Language Processing | amx_bf32 | 613.47 tokens/s | 16 | ||
OpenVINO 2024.4.0 Inf LLM | LLaMA2-7B Token size 1024/128 | Natural Language Processing | amx_bf32 | 439.74 tokens/s | 32 | ||
OpenVINO 2024.4.0 | Stable-Diffusion | Image Generation | fp32 | 0.09 samp/s | 1 | ||
OpenVINO 2024.4.0 | Stable-Diffusion | Image Generation | amx_int8 | 0.25 samp/s | 1 | ||
OpenVINO 2024.4.0 | Stable-Diffusion | Image Generation | amx_bf16 | 0.25 samp/s | 1 | ||
OpenVINO 2024.4.0 | Stable-Diffusion | Image Generation | amx_fp16 | 0.25 samp/s | 1 | ||
OpenVINO 2024.4.0 | BERTLarge | Natural Language Processing | fp32 | 121.65 sent/s | 1 | ||
OpenVINO 2024.4.0 | BERTLarge | Natural Language Processing | fp32 | 113.96 sent/s | 16 | ||
OpenVINO 2024.4.0 | BERTLarge | Natural Language Processing | amx_int8 | 733.89 sent/s | 1 | ||
OpenVINO 2024.4.0 | BERTLarge | Natural Language Processing | amx_int8 | 761.71 sent/s | 32 | ||
OpenVINO 2024.4.0 | BERTLarge | Natural Language Processing | amx_bf16 | 456.85 sent/s | 1 | ||
OpenVINO 2024.4.0 | BERTLarge | Natural Language Processing | amx_bf16 | 462.06 sent/s | 16 | ||
OpenVINO 2024.4.0 | BERTLarge | Natural Language Processing | amx_fp16 | 457.07 sent/s | 1 | ||
OpenVINO 2024.4.0 | BERTLarge | Natural Language Processing | amx_fp16 | 407.68 sent/s | 16 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | avx_fp32 | 113.44 sent/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | avx_fp32 | 109.88 sent/s | 40 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | amx_int8 | 829.34 sent/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | amx_int8 | 1030.82 sent/s | 64 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | amx_bf16 | 473.94 sent/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | amx_bf16 | 554.99 sent/s | 32 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | amx_fp16 | 441.32 sent/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | amx_fp16 | 460.63 sent/s | 88 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | amx_bf32 | 212.01 sent/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | BERT Large | Natural Language Processing | amx_bf32 | 212.02 sent/s | 88 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | fp32 | 108.62 sent/s | 1 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | fp32 | 99.07 sent/s | 32 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | amx_int8 | 484.98 sent/s | 1 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | amx_int8 | 569.70 sent/s | 16 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | amx_bf16 | 403.19 sent/s | 1 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | amx_bf16 | 438.23 sent/s | 32 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | amx_fp16 | 405.00 sent/s | 1 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | amx_fp16 | 432.31 sent/s | 32 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | amx_bf32 | 202.29 sent/s | 1 | ||
Intel TensorFlow 2.19.0 | BERT Large | Natural Language Processing | amx_bf32 | 190.89 sent/s | 32 | ||
Intel PyTorch 2.6.0 + IPEX | DLRM-v2 | Recommender | avx_fp32 | 844,726.49 rec/s | 128 | ||
Intel PyTorch 2.6.0 + IPEX | DLRM-v2 | Recommender | amx_int8 | 6,676,543.49 rec/s | 128 | ||
Intel PyTorch 2.6.0 + IPEX | DLRM-v2 | Recommender | amx_bf16 | 4,481,704.53 rec/s | 128 | ||
Intel PyTorch 2.6.0 + IPEX | DLRM-v2 | Recommender | amx_fp16 | 4,321,739.37 rec/s | 128 | ||
Intel PyTorch 2.6.0 + IPEX | DLRM-v2 | Recommender | amx_bf32 | 1,588,266.49 rec/s | 128 | ||
Intel PyTorch 2.6.0 + IPEX | Stable-Diffusion | Image Generation | avx_fp32 | 0.12 img/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Stable-Diffusion | Image Generation | amx_int8 | 0.41 img/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Stable-Diffusion | Image Generation | amx_bf16 | 0.35 img/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Stable-Diffusion | Image Generation | amx_fp16 | 0.37 img/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Stable-Diffusion | Image Generation | amx_bf32 | 0.15 img/s | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | avx_fp32 | 779.13 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | avx_fp32 | 807.66 fps | 160 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | amx_int8 | 4490.99 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | amx_int8 | 6277.16 fps | 94 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | amx_bf16 | 2624.42 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | amx_bf16 | 3570.05 fps | 96 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | amx_fp16 | 2558.35 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | amx_fp16 | 3442.89 fps | 256 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | amx_bf32 | 1352.02 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Vision-Transformer | Image Recognition | amx_bf32 | 1572.22 fps | 256 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | fp32 | 744.63 fps | 1 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | fp32 | 771.33 fps | 252 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | amx_int8 | 2876.37 fps | 1 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | amx_int8 | 4085.75 fps | 252 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | amx_bf16 | 2332.85 fps | 1 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | amx_bf16 | 3143.31 fps | 159 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | amx_fp16 | 2379.54 fps | 1 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | amx_fp16 | 3058.30 fps | 159 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | amx_bf32 | 1641.15 fps | 1 | ||
Intel TensorFlow 2.19.0 | Vision-Transformer | Image Recognition | amx_bf32 | 1891.57 fps | 239 | ||
OpenVINO 2024.4.0 | Vision-Transformer | Image Recognition | fp32 | 812.05 fps | 1 | ||
OpenVINO 2024.4.0 | Vision-Transformer | Image Recognition | fp32 | 847.73 fps | 32 | ||
OpenVINO 2024.4.0 | Vision-Transformer | Image Recognition | amx_int8 | 3997.38 fps | 1 | ||
OpenVINO 2024.4.0 | Vision-Transformer | Image Recognition | amx_int8 | 4198.79 fps | 32 | ||
OpenVINO 2024.4.0 | Vision-Transformer | Image Recognition | amx_bf16 | 2406.63 fps | 1 | ||
OpenVINO 2024.4.0 | Vision-Transformer | Image Recognition | amx_bf16 | 2609.63 fps | 64 | ||
OpenVINO 2024.4.0 | Vision-Transformer | Image Recognition | amx_fp16 | 2358.47 fps | 1 | ||
OpenVINO 2024.4.0 | Vision-Transformer | Image Recognition | amx_fp16 | 2537.90 fps | 64 | ||
OpenVINO 2024.4.0 | ResNet50-v1-5 | Image Classification | fp32 | 3776.73 fps | 1 | ||
OpenVINO 2024.4.0 | ResNet50-v1-5 | Image Classification | fp32 | 3800.48 fps | 64 | ||
OpenVINO 2024.4.0 | ResNet50-v1-5 | Image Classification | amx_int8 | 21,118.56 fps | 1 | ||
OpenVINO 2024.4.0 | ResNet50-v1-5 | Image Classification | amx_int8 | 29,484 fps | 64 | ||
OpenVINO 2024.4.0 | ResNet50-v1-5 | Image Classification | amx_bf16 | 14,487.85 fps | 1 | ||
OpenVINO 2024.4.0 | ResNet50-v1-5 | Image Classification | amx_bf16 | 17,805.47 fps | 32 | ||
OpenVINO 2024.4.0 | ResNet50-v1-5 | Image Classification | amx_fp16 | 14,475.74 fps | 1 | ||
OpenVINO 2024.4.0 | ResNet50-v1-5 | Image Classification | amx_fp16 | 17,687.28 fps | 32 | ||
Intel PyTorch 2.6.0 + IPEX | LCM | Reasoning and Understanding | avx_fp32 | 1.78 | 1 | ||
Intel PyTorch 2.6.0 + IPEX | LCM | Reasoning and Understanding | amx_int8 | 6.43 | 1 | ||
Intel PyTorch 2.6.0 + IPEX | LCM | Reasoning and Understanding | amx_bf16 | 4.96 | 1 | ||
Intel PyTorch 2.6.0 + IPEX | LCM | Reasoning and Understanding | amx_fp16 | 5.1 | 1 | ||
Intel PyTorch 2.6.0 + IPEX | LCM | Reasoning and Understanding | amx_bf32 | 2.07 | 1 | ||
OpenVINO 2024.4.0 | LCM | Reasoning and Understanding | fp32 | 1.4 | 1 | ||
OpenVINO 2024.4.0 | LCM | Reasoning and Understanding | amx_int8 | 3.6 | 1 | ||
OpenVINO 2024.4.0 | LCM | Reasoning and Understanding | amx_bf16 | 3.7 | 1 | ||
OpenVINO 2024.4.0 | LCM | Reasoning and Understanding | amx_fp16 | 3.58 | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | avx_fp32 | 282.23 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | avx_fp32 | 283.59 fps | 21 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | amx_int8 | 1403.46 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | amx_int8 | 1038.72 fps | 10 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | amx_bf16 | 1058.43 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | amx_bf16 | 1011.21 fps | 21 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | amx_fp16 | 994.78 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | amx_fp16 | 961.03 fps | 21 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | amx_bf32 | 400.53 fps | 1 | ||
Intel PyTorch 2.6.0 + IPEX | Yolo-v7 | Object Detection | amx_bf32 | 376.71 fps | 21 | ||
Intel TensorFlow 2.19.0 | Yolo-v5 | Object Detection | fp32 | 1415.87 img/s | 1 | ||
Intel TensorFlow 2.19.0 | Yolo-v5 | Object Detection | fp32 | 1509.53 img/s | 94 | ||
Intel TensorFlow 2.19.0 | Yolo-v5 | Object Detection | amx_bf16 | 2726.87 img/s | 1 | ||
Intel TensorFlow 2.19.0 | Yolo-v5 | Object Detection | amx_bf16 | 3986.90 img/s | 94 | ||
Intel TensorFlow 2.19.0 | Yolo-v5 | Object Detection | amx_fp16 | 2882.10 img/s | 1 | ||
Intel TensorFlow 2.19.0 | Yolo-v5 | Object Detection | amx_fp16 | 4199.40 img/s | 84 | ||
Intel TensorFlow 2.19.0 | Yolo-v5 | Object Detection | amx_bf32 | 1587.33 img/s | 1 | ||
Intel TensorFlow 2.19.0 | Yolo-v5 | Object Detection | amx_bf32 | 1879.03 img/s | 94 | ||
OpenVINO 2024.4.0 | Yolo-v5s | Object Detection | fp32 | 1570.38 img/s | 1 | ||
OpenVINO 2024.4.0 | Yolo-v5s | Object Detection | fp32 | 1388.07 img/s | 16 | ||
OpenVINO 2024.4.0 | Yolo-v5s | Object Detection | amx_int8 | 6151.55 img/s | 1 | ||
OpenVINO 2024.4.0 | Yolo-v5s | Object Detection | amx_int8 | 5170.74 img/s | 16 | ||
OpenVINO 2024.4.0 | Yolo-v5s | Object Detection | amx_bf16 | 4738.79 img/s | 1 | ||
OpenVINO 2024.4.0 | Yolo-v5s | Object Detection | amx_bf16 | 3825.40 img/s | 16 | ||
OpenVINO 2024.4.0 | Yolo-v5s | Object Detection | amx_fp16 | 4585.38 img/s | 1 | ||
OpenVINO 2024.4.0 | Yolo-v5s | Object Detection | amx_fp16 | 3505.09 img/s | 16 | ||
Intel TensorFlow 2.19.0 | R-GAT | Multi-Relational Graphs | fp32 | 15,749.10 | 1 | ||
Intel TensorFlow 2.19.0 | R-GAT | Multi-Relational Graphs | fp32 | 15,927.51 | 2625 | ||
Intel TensorFlow 2.19.0 | R-GAT | Multi-Relational Graphs | amx_bf16 | 31,608.73 | 1 | ||
Intel TensorFlow 2.19.0 | R-GAT | Multi-Relational Graphs | amx_bf16 | 40,945.58 | 2625 | ||
Intel TensorFlow 2.19.0 | R-GAT | Multi-Relational Graphs | amx_fp16 | 29,505.19 | 1 | ||
Intel TensorFlow 2.19.0 | R-GAT | Multi-Relational Graphs | amx_fp16 | 32,624.98 | 2625 | ||
Intel TensorFlow 2.19.0 | R-GAT | Multi-Relational Graphs | amx_bf32 | 16,486.88 | 1 | ||
Intel TensorFlow 2.19.0 | R-GAT | Multi-Relational Graphs | amx_bf32 | 22,650.06 | 2625 | ||
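Speedups across precisions can be read straight off the tables above. For example, for ChatGLM3-6B at batch size 1 with Token Size 1024/128, Intel PyTorch 2.6.0+IPEX moves from 51.33 tokens/s (avx_fp32) to 163.14 tokens/s (amx_int8) and 99.17 tokens/s (amx_bf16). A quick check of those ratios, with values copied from the table:

```python
# Throughputs from the ChatGLM3-6B batch-1, Token Size 1024/128 rows above.
fp32_tps = 51.33    # avx_fp32
int8_tps = 163.14   # amx_int8
bf16_tps = 99.17    # amx_bf16

print(round(int8_tps / fp32_tps, 2))  # ~3.18x, fp32 -> int8
print(round(bf16_tps / fp32_tps, 2))  # ~1.93x, fp32 -> bf16
```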
Hardware and software configuration (measured March 13, 2025):
1-node, 2x Intel® Xeon® 6980P processors, 128 cores, hyperthreading on, turbo on, 6 non-uniform memory access (NUMA) nodes.
Integrated accelerators available (used): DLB 8 [0], DSA 8 [0], IAA 8 [0], QAT 8 [0].
Total memory: 1536 GB (24 x 64 GB DDR5 8800 MT/s [8800 MT/s]), BIOS BHSDCRB1.IPC.0033.D57.2406240014, microcode 0x81000290, 1x Ethernet controller I225-LM, 1x 3.5T Intel SSDPF2KX038TZ, 1x 894.3G Micron_7450_MTFDKBG960TFR, CentOS* Stream 9 (kernel 6.6.43). Software: TensorFlow* 2.19.0 with Intel® oneAPI Deep Neural Network Library (oneDNN) e34cb13; PyTorch* 2.6.0.dev20241124+cpu with Intel® Extension for PyTorch* 2.6.0+gitc5a2330 and oneDNN v3.6.2; OpenVINO™ toolkit 2024.4.0 with oneDNN 3.5.0. Test by Intel as of March 13, 2025, 10:45:43 a.m. UTC.