| Framework | Model/Dataset | Usage | Precision | Throughput | Perf/Watt | Latency (ms) | Batch Size | Instance Configuration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | | | 40 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | 130.4 tokens/s | | 92 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | | | 59.5 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | GPT-J 6B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | 125 tokens/s | | 96 | 6 | 1 instance per socket |
| MLPerf Inference v3.1 | GPT-J (offline, 99.0% acc) | Large Language Model | int8 | 2.05 samp/s | | | 7 | 4 cores per instance |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | | | 47 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | int8 | 111.6 tokens/s | | 107.5 | 6 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | | | 68 | 1 | 1 instance per socket |
| Intel PyTorch 2.1 DeepSpeed | LLaMA2-7B Token size 1024/128 | text-generation, Beam Search, Width=4 | bf16 | 109.1 tokens/s | | 110 | 6 | 1 instance per socket |
| MLPerf Inference v3.1 | ResNet50 v1.5 (offline) | Image Recognition | int8 | 20,565.5 samp/s | | | 256 | 1 core per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 10,215.7 img/s | 9.98 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | int8 | 13,862.96 img/s | 14.09 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 6,210.69 img/s | 6.13 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf16 | 7,295.63 img/s | 7.33 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,319.52 img/s | 1.27 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | fp32 | 1,360.05 img/s | 1.28 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,659.37 img/s | 1.65 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition | bf32 | 1,985.26 img/s | 2.02 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 7,440.61 img/s | 7.70 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | int8 | 12,345.54 img/s | 11.80 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 5,053.76 img/s | 5.01 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf16 | 6,704.17 img/s | 6.34 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,282.77 img/s | 1.17 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | fp32 | 1,342.91 img/s | 1.27 | | 116 | 1 instance per socket |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 1,529.49 img/s | 1.41 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | ResNet50 v1.5 | Image Recognition | bf32 | 2,017.54 img/s | 1.89 | | 116 | 1 instance per socket |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | int8 | 8,819.66 img/s | 8.81 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | bf16 | 5,915.79 img/s | 5.82 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | ResNet50 v1.5 | Image Recognition | fp32 | 1,281.34 img/s | 1.25 | | 1 | 4 cores per instance |
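The headline of these ResNet50 v1.5 rows is the throughput gain from reduced precision. A minimal sketch of how to read that off the table, using the Intel PyTorch 2.1 single-socket (1 instance per socket) figures quoted above:

```python
# Speedup of reduced-precision inference over fp32, computed from the
# Intel PyTorch 2.1 ResNet50 v1.5 "1 instance per socket" rows above.
throughput = {  # img/s
    "int8": 13862.96,
    "bf16": 7295.63,
    "bf32": 1985.26,
    "fp32": 1360.05,
}

def speedup_vs_fp32(precision: str) -> float:
    """Throughput ratio of the given precision over the fp32 baseline."""
    return throughput[precision] / throughput["fp32"]

for p in ("int8", "bf16", "bf32"):
    print(f"{p}: {speedup_vs_fp32(p):.2f}x vs fp32")
# int8 comes out at roughly 10x fp32, bf16 at roughly 5x.
```

The same ratio can be taken for any framework in the table; only the absolute throughputs differ.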
| MLPerf Inference v3.1 | BERT-Large (offline, 99.0% acc) | Natural Language Processing | int8 | 1,357.33 samp/s | | | 1,300 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 335.1 sent/s | 0.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | int8 | 378.73 sent/s | 0.36 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 204.52 sent/s | 0.21 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf16 | 201.44 sent/s | 0.21 | | 16 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 35.25 sent/s | 0.03 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | fp32 | 41.05 sent/s | 0.04 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 72.42 sent/s | 0.07 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | BERTLarge | Natural Language Processing | bf32 | 71.63 sent/s | 0.07 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 253.27 sent/s | 0.24 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | int8 | 239.89 sent/s | 0.25 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 181.02 sent/s | 0.18 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf16 | 184.06 sent/s | 0.17 | | 128 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 44.73 sent/s | 0.04 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | fp32 | 38.58 sent/s | 0.04 | | 16 | 1 instance per socket |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 72.78 sent/s | 0.07 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | BERTLarge | Natural Language Processing | bf32 | 71.77 sent/s | 0.07 | | 16 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 298.44 sent/s | 0.30 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | int8 | 285.68 sent/s | 0.28 | | 48 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 202.48 sent/s | 0.20 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | bf16 | 191.25 sent/s | 0.19 | | 32 | 1 instance per socket |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 47.34 sent/s | 0.05 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | BERTLarge | Natural Language Processing | fp32 | 44.23 sent/s | 0.04 | | 48 | 1 instance per socket |
| MLPerf Inference v3.1 | DLRM-v2 (offline, 99.0% acc) | Recommender | int8 | 5,367.77 samp/s | | | 300 | 1 core per instance |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | int8 | 23,444,587 rec/s | 23611.92 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf16 | 10,646,560 rec/s | 10238.88 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | fp32 | 2,278,228 rec/s | 2220.37 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DLRM Criteo Terabyte | Recommender | bf32 | 4,530,200 rec/s | 4427.38 | | 128 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 4,726.15 sent/s | 4.94 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | int8 | 7,759.25 sent/s | 8.42 | | 168 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 3,306.46 sent/s | 3.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf16 | 5,057.47 sent/s | 5.50 | | 120 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 900.58 sent/s | 0.85 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | fp32 | 1,007.05 sent/s | 1.04 | | 56 | 1 instance per socket |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,513.66 sent/s | 1.49 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | DistilBERT | Natural Language Processing | bf32 | 1,926.1 sent/s | 1.77 | | 288 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 61.03 sent/s | 0.06 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | int8 | 245.66 sent/s | 0.24 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 41.44 sent/s | 0.04 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf16 | 278.81 sent/s | 0.28 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 20.27 sent/s | 0.02 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | fp32 | 102.48 sent/s | 0.10 | | 448 | 1 instance per socket |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 20.28 sent/s | 0.02 | | 1 | 4 cores per instance |
| Intel TensorFlow 2.14 | Transformer MLPerf | Language Translation | bf32 | 114.08 sent/s | 0.11 | | 448 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 24.68 samp/s | 0.02 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | int8 | 21.86 samp/s | 0.02 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 13.05 samp/s | 0.01 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | bf16 | 11.87 samp/s | 0.01 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 2.88 samp/s | 0.00 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | 3D-Unet | Image Segmentation | fp32 | 2.62 samp/s | 0.00 | | 6 | 1 instance per socket |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | int8 | 459.36 img/s | 0.44 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | bf16 | 218.41 img/s | 0.20 | | 1 | 4 cores per instance |
| OpenVINO 2023.2 | SSD-ResNet34 COCO 2017 (1200x1200) | Object Detection | fp32 | 31.17 img/s | 0.03 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1289.95 fps | 1.35 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | int8 | 1923.77 fps | 1.83 | | 116 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 648.58 fps | 0.66 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf16 | 867.05 fps | 0.87 | | 64 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 151.29 fps | 0.14 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | fp32 | 160.93 fps | 0.15 | | 64 | 1 instance per socket |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 215.11 fps | 0.21 | | 1 | 4 cores per instance |
| Intel PyTorch 2.1 | ResNeXt101 32x16d ImageNet | Image Classification | bf32 | 241.98 fps | 0.22 | | 116 | 1 instance per socket |
| MLPerf Inference v3.1 | RetinaNet (offline) | Object Detection | int8 | 284.75 samp/s | | | 2 | 4 cores per instance |
| MLPerf Inference v3.1 | RNN-T (offline) | Speech-to-text | int8+bf16 | 5,782.18 samp/s | | | 256 | 4 cores per instance |
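Every row in this table follows the same pipe-delimited, nine-column layout, so it can be consumed programmatically. A minimal sketch, assuming that column order (framework, model, usage, precision, throughput, perf/watt, latency, batch size, instance configuration); the column names here are my labels, not part of the source:

```python
# Parse one pipe-delimited benchmark row into a dict.
# Nine-column layout assumed from the table above; empty cells become None.
COLUMNS = ("framework", "model", "usage", "precision",
           "throughput", "perf_per_watt", "latency_ms",
           "batch_size", "instances")

def parse_row(line: str) -> dict:
    cells = [c.strip() or None for c in line.strip().strip("|").split("|")]
    return dict(zip(COLUMNS, cells))

row = parse_row("| Intel PyTorch 2.1 | ResNet50 v1.5 | Image Recognition "
                "| int8 | 10,215.7 img/s | 9.98 | | 1 | 4 cores per instance |")
print(row["precision"], row["throughput"], row["latency_ms"])
# The empty latency cell parses as None.
```

Note that throughput stays a string because the units vary by row (img/s, sent/s, tokens/s, rec/s, samp/s, fps); splitting value from unit is a separate step.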