2.2. Model Performance
- Intel® Arria® 10: 265 MHz
- Intel Agilex® 7: 400 MHz
The performance results for the designs that follow were achieved using the dla_build_example_design.py script that is included with the Intel® FPGA AI Suite. The script targets the standard (-2) speed bin, compiles with a single seed, and uses high-effort compiler settings.
- Intel® Arria® 10 runtime host: CentOS 7 host on an Intel® Xeon® processor E5-1650 @ 3.6 GHz
- Intel Agilex® 7 runtime host: SLES 12 host on an Intel® Xeon® processor E5-1650 @ 3.5 GHz
The following Intel® Quartus® Prime global assignments were also set:

```tcl
set_global_assignment -name ALLOW_SHIFT_REGISTER_MERGING_ACROSS_HIERARCHIES ALWAYS
set_global_assignment -name DISABLE_REGISTER_MERGING_ACROSS_HIERARCHIES OFF
```
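The two register-merging assignments can be added to an existing project's settings file by hand. As a minimal sketch, the Python helper below appends whichever of them are missing from a .qsf file. The helper name add_assignments and any path you pass to it are hypothetical. The check is an exact line match, so an existing assignment of the same name with a different value is neither detected nor changed.

```python
from pathlib import Path

# The two global assignments listed above, one per line as they would
# appear in a Quartus Prime .qsf settings file.
ASSIGNMENTS = [
    "set_global_assignment -name ALLOW_SHIFT_REGISTER_MERGING_ACROSS_HIERARCHIES ALWAYS",
    "set_global_assignment -name DISABLE_REGISTER_MERGING_ACROSS_HIERARCHIES OFF",
]


def add_assignments(qsf_path):
    """Append the assignments that are not already present to qsf_path.

    Hypothetical helper: compares whole lines only, so it does not
    reconcile an existing assignment that uses a different value.
    Returns the list of lines that were appended (empty if none).
    """
    path = Path(qsf_path)
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [a for a in ASSIGNMENTS if a not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "\n" if text and not text.endswith("\n") else ""
        with path.open("a") as f:
            f.write(prefix + "\n".join(missing) + "\n")
    return missing
```

For example, add_assignments("my_project.qsf") returns the lines it appended, or an empty list when both assignments are already present, so running it twice is harmless.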
The architectures in the tables that follow are in the $COREDLA_ROOT/example_architectures/ directory. Review the README file in that directory for information about each architecture.
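To check which architecture descriptions are available in a local installation, a short Python sketch can list that directory. It assumes that COREDLA_ROOT is set in the environment and that the architecture files use the .arch extension; list_example_architectures is a hypothetical helper, not part of the Intel FPGA AI Suite.

```python
import os
from pathlib import Path


def list_example_architectures(root=None):
    """Return the sorted base names of the .arch files in
    <root>/example_architectures/, where root defaults to $COREDLA_ROOT.

    Assumption: architecture descriptions use the .arch file extension.
    Other files in the directory (such as the README) are ignored.
    """
    root = root or os.environ["COREDLA_ROOT"]
    arch_dir = Path(root) / "example_architectures"
    return sorted(p.stem for p in arch_dir.glob("*.arch"))
```

Calling list_example_architectures() with COREDLA_ROOT set returns names such as those in the Architecture column of the tables, if the assumed file layout holds.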
fMAX and Resource Usage Details - Intel FPGA AI Suite V2023.2
Architecture | fMAX | ALMs | DSPs | M20Ks | Registers |
---|---|---|---|---|---|
A10_FP16_Generic | 327 MHz | 25.2 k | 162 | 485 | 66 k |
A10_FP16_Performance | 296 MHz | 78.1 k | 1114 | 1443 | 239 k |
A10_Small_NoSoftmax | 348 MHz | 14.5 k | 80 | 247 | 40 k |
A10_Small_Softmax | 331 MHz | 15.8 k | 90 | 255 | 43 k |
A10_Generic | 306 MHz | 26.9 k | 178 | 597 | 72 k |
A10_Performance | 302 MHz | 51.6 k | 602 | 909 | 156 k |
AGX7_FP16_Generic | 615 MHz | 28.0 k | 162 | 504 | 94 k |
AGX7_FP16_Performance | 600 MHz | 91.1 k | 1114 | 1505 | 310 k |
AGX7_Small_NoSoftmax | 615 MHz | 16.1 k | 80 | 307 | 56 k |
AGX7_Small_Softmax | 615 MHz | 17.3 k | 90 | 315 | 62 k |
AGX7_Generic | 600 MHz | 30.1 k | 178 | 765 | 108 k |
AGX7_Performance | 570 MHz | 54.8 k | 602 | 1221 | 179 k |
AGX7_Performance_NoPrelu_NoEltwise | 595 MHz | 84.0 k | 1162 | 2795 | 295 k |
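The fMAX column can be compared across device families by pairing each A10_* architecture with its AGX7_* counterpart. The sketch below copies the fMAX values from the table and computes the Agilex 7 to Arria 10 ratio; family_fmax_ratio is a hypothetical helper name. AGX7_Performance_NoPrelu_NoEltwise has no Arria 10 counterpart in the table, so it is skipped.

```python
# fMAX values in MHz, copied from the table above.
FMAX_MHZ = {
    "A10_FP16_Generic": 327, "A10_FP16_Performance": 296,
    "A10_Small_NoSoftmax": 348, "A10_Small_Softmax": 331,
    "A10_Generic": 306, "A10_Performance": 302,
    "AGX7_FP16_Generic": 615, "AGX7_FP16_Performance": 600,
    "AGX7_Small_NoSoftmax": 615, "AGX7_Small_Softmax": 615,
    "AGX7_Generic": 600, "AGX7_Performance": 570,
    "AGX7_Performance_NoPrelu_NoEltwise": 595,
}


def family_fmax_ratio(table):
    """Map the shared suffix of each A10_*/AGX7_* architecture pair to
    AGX7 fMAX / A10 fMAX, rounded to two decimals. Rows without a
    counterpart in the other family are skipped."""
    ratios = {}
    for name, a10_mhz in table.items():
        if not name.startswith("A10_"):
            continue
        suffix = name[len("A10_"):]
        agx7_mhz = table.get("AGX7_" + suffix)
        if agx7_mhz is not None:
            ratios[suffix] = round(agx7_mhz / a10_mhz, 2)
    return ratios
```

With these values, every ratio falls between about 1.77 (Small_NoSoftmax) and 2.03 (FP16_Performance).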
public/mobilenet-v1-1.0-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 1248 | 94 | 71.2 | 89.5 |
A10_FP16_Performance | 78.1 k | 1114 | 4949 | 306 | 71.2 | 89.5 |
A10_Small_NoSoftmax | 14.5 k | 80 | 1151 | 99 | 69.8 | 89.1 |
A10_Small_Softmax | 15.8 k | 90 | 1103 | 94 | 69.6 | 89.0 |
A10_Generic | 26.9 k | 178 | 1244 | 131 | 69.6 | 89.0 |
A10_Performance | 51.6 k | 602 | 2890 | 323 | 70.0 | 88.9 |
AGX7_FP16_Generic | 28.0 k | 162 | 2295 | 173 | 71.2 | 89.5 |
AGX7_FP16_Performance | 91.1 k | 1114 | 8969 | 555 | 71.2 | 89.5 |
AGX7_Small_NoSoftmax | 16.1 k | 80 | 2781 | 168 | 70.8 | 89.6 |
AGX7_Small_Softmax | 17.3 k | 90 | 2793 | 168 | 70.9 | 89.5 |
AGX7_Generic | 30.1 k | 178 | 4002 | 237 | 70.9 | 89.5 |
AGX7_Performance | 54.8 k | 602 | 6034 | 380 | 70.9 | 89.5 |
AGX7_Performance_NoPrelu_NoEltwise | 84.0 k | 1162 | 11923 | 501 | 70.9 | 89.5 |
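Dividing the DDR1 bandwidth by the throughput gives a rough figure for DDR traffic per inference. This reading assumes that the DDR1 column is the average bandwidth sustained while running at the listed throughput, which the table does not state explicitly; ddr_mb_per_frame is a hypothetical helper, and only a few rows of the public/mobilenet-v1-1.0-224 table are copied here.

```python
# (DDR1 bandwidth [MB/s], throughput [fps]) for a few rows of the
# public/mobilenet-v1-1.0-224 table above.
MOBILENET_V1 = {
    "A10_FP16_Generic": (1248, 94),
    "A10_Performance": (2890, 323),
    "AGX7_FP16_Performance": (8969, 555),
    "AGX7_Performance_NoPrelu_NoEltwise": (11923, 501),
}


def ddr_mb_per_frame(bandwidth_mb_s, fps):
    """Average DDR1 traffic per inference in MB.

    Assumption: bandwidth_mb_s is the mean bandwidth measured while the
    IP runs at the listed throughput.
    """
    return bandwidth_mb_s / fps


for arch, (bw, fps) in MOBILENET_V1.items():
    print(f"{arch:36s} {ddr_mb_per_frame(bw, fps):6.1f} MB/frame")
```

For A10_FP16_Generic this works out to about 13.3 MB per frame (1248 / 94).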
public/mobilenet-v2
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 2098 | 85 | 71.8 | 89.6 |
A10_FP16_Performance | 78.1 k | 1114 | 3861 | 201 | 71.7 | 89.6 |
A10_Small_NoSoftmax | 14.5 k | 80 | 2426 | 86 | 70.1 | 88.6 |
A10_Small_Softmax | 15.8 k | 90 | 2349 | 83 | 70.0 | 88.7 |
A10_Generic | 26.9 k | 178 | 1067 | 107 | 70.0 | 88.7 |
A10_Performance | 51.6 k | 602 | 2324 | 213 | 69.6 | 88.3 |
AGX7_FP16_Generic | 28.0 k | 162 | 3691 | 150 | 71.8 | 89.6 |
AGX7_FP16_Performance | 91.1 k | 1114 | 7095 | 369 | 71.7 | 89.6 |
AGX7_Small_NoSoftmax | 16.1 k | 80 | 4522 | 139 | 71.7 | 89.6 |
AGX7_Small_Softmax | 17.3 k | 90 | 4551 | 139 | 71.8 | 89.5 |
AGX7_Generic | 30.1 k | 178 | 3290 | 190 | 71.8 | 89.5 |
AGX7_Performance | 54.8 k | 602 | 5725 | 268 | 71.7 | 89.4 |
AGX7_Performance_NoPrelu_NoEltwise | 84.0 k | 1162 | 9669 | 273 | 71.7 | 89.4 |
public/mobilenet-v2-1.4-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 2309 | 68 | 74.8 | 91.9 |
A10_FP16_Performance | 78.1 k | 1114 | 5303 | 170 | 74.9 | 91.8 |
A10_Generic | 26.9 k | 178 | 1751 | 83 | 73.1 | 90.9 |
A10_Performance | 51.6 k | 602 | 3304 | 183 | 72.4 | 90.4 |
AGX7_FP16_Generic | 28.0 k | 162 | 4117 | 122 | 74.8 | 91.9 |
AGX7_FP16_Performance | 91.1 k | 1114 | 8964 | 288 | 74.9 | 91.8 |
AGX7_Generic | 30.1 k | 178 | 4456 | 139 | 74.7 | 91.8 |
AGX7_Performance | 54.8 k | 602 | 7241 | 233 | 74.7 | 91.7 |
AGX7_Performance_NoPrelu_NoEltwise | 84.0 k | 1162 | 11751 | 251 | 74.7 | 91.7 |
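To quantify how much Top-1 accuracy the non-FP16 architectures give up relative to their FP16 counterparts, the sketch below pairs each X_FP16_Y row of the public/mobilenet-v2-1.4-224 table with the matching X_Y row. fp16_top1_gap is a hypothetical helper; the accuracy values are copied from the table.

```python
# Top-1 accuracy [%] from the public/mobilenet-v2-1.4-224 table above.
TOP1 = {
    "A10_FP16_Generic": 74.8, "A10_FP16_Performance": 74.9,
    "A10_Generic": 73.1, "A10_Performance": 72.4,
    "AGX7_FP16_Generic": 74.8, "AGX7_FP16_Performance": 74.9,
    "AGX7_Generic": 74.7, "AGX7_Performance": 74.7,
    "AGX7_Performance_NoPrelu_NoEltwise": 74.7,
}


def fp16_top1_gap(top1):
    """For each X_FP16_Y row with a matching X_Y row, return the Top-1
    difference in percentage points (FP16 minus non-FP16), keyed by the
    non-FP16 architecture name. Rows without a pair are ignored."""
    gaps = {}
    for name, acc in top1.items():
        if "_FP16_" not in name:
            continue
        other = name.replace("_FP16_", "_", 1)
        if other in top1:
            gaps[other] = round(acc - top1[other], 1)
    return gaps
```

For this model the gap is 1.7 and 2.5 points on the Arria 10 Generic and Performance architectures, and 0.1 and 0.2 points on their Agilex 7 counterparts.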
public/mobilenet-v3-large-1.0-224-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 2161 | 85 | 75.8 | 92.1 |
A10_FP16_Performance | 78.1 k | 1114 | 13061 | 29 | 75.8 | 92.1 |
AGX7_FP16_Generic | 28.0 k | 162 | 3782 | 149 | 75.8 | 92.1 |
AGX7_FP16_Performance | 91.1 k | 1114 | 17695 | 40 | 75.8 | 92.1 |
AGX7_Generic | 30.1 k | 178 | 2395 | 64 | 72.3 | 90.7 |
AGX7_Performance | 54.8 k | 602 | 4205 | 10 | 72.3 | 90.5 |
public/resnet-50-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 1664 | 18 | 76.8 | 92.9 |
A10_FP16_Performance | 78.1 k | 1114 | 6868 | 97 | 76.8 | 92.9 |
A10_Small_NoSoftmax | 14.5 k | 80 | 2041 | 17 | 76.6 | 92.7 |
A10_Small_Softmax | 15.8 k | 90 | 1947 | 16 | 76.4 | 92.6 |
A10_Generic | 26.9 k | 178 | 1454 | 32 | 76.4 | 92.6 |
A10_Performance | 51.6 k | 602 | 4654 | 104 | 76.6 | 92.7 |
AGX7_FP16_Generic | 28.0 k | 162 | 3074 | 32 | 76.8 | 92.9 |
AGX7_FP16_Performance | 91.1 k | 1114 | 11525 | 163 | 76.8 | 92.9 |
AGX7_Small_NoSoftmax | 16.1 k | 80 | 5970 | 28 | 77.0 | 92.9 |
AGX7_Small_Softmax | 17.3 k | 90 | 5970 | 28 | 77.0 | 92.9 |
AGX7_Generic | 30.1 k | 178 | 4387 | 60 | 77.0 | 92.9 |
AGX7_Performance | 54.8 k | 602 | 10136 | 143 | 76.9 | 92.8 |
AGX7_Performance_NoPrelu_NoEltwise | 84.0 k | 1162 | 13722 | 208 | 76.9 | 92.8 |
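The tables can also guide architecture selection under a resource constraint. The sketch below picks, for one device family, the architecture with the highest public/resnet-50-tf throughput whose DSP count fits a given budget. It considers DSPs only (a real selection would also check ALMs and M20Ks), and best_under_budget is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    arch: str
    dsps: int
    fps: float


# (architecture, DSPs, throughput [fps]) from the public/resnet-50-tf table above.
RESNET50_TF = [
    Row("A10_FP16_Generic", 162, 18),
    Row("A10_FP16_Performance", 1114, 97),
    Row("A10_Small_NoSoftmax", 80, 17),
    Row("A10_Small_Softmax", 90, 16),
    Row("A10_Generic", 178, 32),
    Row("A10_Performance", 602, 104),
    Row("AGX7_FP16_Generic", 162, 32),
    Row("AGX7_FP16_Performance", 1114, 163),
    Row("AGX7_Small_NoSoftmax", 80, 28),
    Row("AGX7_Small_Softmax", 90, 28),
    Row("AGX7_Generic", 178, 60),
    Row("AGX7_Performance", 602, 143),
    Row("AGX7_Performance_NoPrelu_NoEltwise", 1162, 208),
]


def best_under_budget(rows, family, dsp_budget) -> Optional[Row]:
    """Return the highest-throughput row whose name starts with the given
    family prefix ("A10_" or "AGX7_") and whose DSP count does not exceed
    dsp_budget, or None if no architecture fits."""
    fits = [r for r in rows if r.arch.startswith(family) and r.dsps <= dsp_budget]
    return max(fits, key=lambda r: r.fps, default=None)
```

For example, with a 200-DSP budget on Arria 10 it returns A10_Generic (32 fps), and with a 700-DSP budget on Agilex 7 it returns AGX7_Performance (143 fps).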
Resnet50 v1 (Caffe)
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 1563 | 21 | 74.4 | 91.4 |
A10_FP16_Performance | 78.1 k | 1114 | 7263 | 116 | 74.4 | 91.4 |
A10_Small_NoSoftmax | 14.5 k | 80 | 1431 | 21 | 73.9 | 91.2 |
A10_Small_Softmax | 15.8 k | 90 | 1364 | 20 | 73.8 | 91.2 |
A10_Generic | 26.9 k | 178 | 1471 | 38 | 73.8 | 91.2 |
A10_Performance | 51.6 k | 602 | 4757 | 128 | 74.2 | 91.2 |
AGX7_FP16_Generic | 28.0 k | 162 | 2889 | 39 | 74.4 | 91.4 |
AGX7_FP16_Performance | 91.1 k | 1114 | 12028 | 193 | 74.4 | 91.4 |
AGX7_Small_NoSoftmax | 16.1 k | 80 | 4186 | 37 | 74.1 | 91.4 |
AGX7_Small_Softmax | 17.3 k | 90 | 4189 | 37 | 74.2 | 91.3 |
AGX7_Generic | 30.1 k | 178 | 4704 | 72 | 74.2 | 91.3 |
AGX7_Performance | 54.8 k | 602 | 10259 | 164 | 74.0 | 91.3 |
AGX7_Performance_NoPrelu_NoEltwise | 84.0 k | 1162 | 14336 | 228 | 74.0 | 91.3 |
intel/unet-camvid-onnx-0001
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] |
---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 459 | 0.59 |
A10_FP16_Performance | 78.1 k | 1114 | 2250 | 3.74 |
AGX7_FP16_Generic | 28.0 k | 162 | 833 | 1.10 |
AGX7_FP16_Performance | 91.1 k | 1114 | 4335 | 7.21 |
AGX7_Small_NoSoftmax | 16.1 k | 80 | 1135 | 1.10 |
AGX7_Small_Softmax | 17.3 k | 90 | 1134 | 1.10 |
AGX7_Generic | 30.1 k | 178 | 1283 | 2.09 |
AGX7_Performance | 54.8 k | 602 | 1894 | 3.22 |
AGX7_Performance_NoPrelu_NoEltwise | 84.0 k | 1162 | 6066 | 8.44 |
public/yolo-v3-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | COCO AP | mAP |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 767 | 2.3 | 31.58 | 62.27 |
A10_FP16_Performance | 78.1 k | 1114 | 3232 | 14.2 | 31.58 | 62.25 |
A10_Generic | 26.9 k | 178 | 666 | 4.1 | 31.26 | 62.07 |
A10_Performance | 51.6 k | 602 | 1932 | 12.5 | 31.32 | 62.25 |
AGX7_FP16_Generic | 28.0 k | 162 | 1425 | 4.2 | 31.58 | 62.27 |
AGX7_FP16_Performance | 91.1 k | 1114 | 6262 | 27.5 | 31.58 | 62.25 |
AGX7_Generic | 30.1 k | 178 | 1818 | 7.9 | 31.48 | 62.21 |
AGX7_Performance | 54.8 k | 602 | 2649 | 11.7 | 31.47 | 62.22 |
public/yolo-v3-tiny-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | COCO AP | mAP |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 583 | 20 | 14.77 | 35.79 |
A10_FP16_Performance | 78.1 k | 1114 | 2401 | 60 | 14.78 | 35.81 |
A10_Generic | 26.9 k | 178 | 790 | 37 | 14.78 | 35.76 |
A10_Performance | 51.6 k | 602 | 1500 | 48 | 14.70 | 35.71 |
AGX7_FP16_Generic | 28.0 k | 162 | 1095 | 38 | 14.77 | 35.79 |
AGX7_FP16_Performance | 91.1 k | 1114 | 4558 | 113 | 14.78 | 35.81 |
AGX7_Generic | 30.1 k | 178 | 1989 | 67 | 14.74 | 35.76 |
AGX7_Performance | 54.8 k | 602 | 1569 | 39 | 14.72 | 35.73 |
public/squeezenet1.1
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 1043 | 117 | 58.5 | 81.1 |
A10_FP16_Performance | 78.1 k | 1114 | 8134 | 289 | 58.5 | 81.1 |
A10_Small_NoSoftmax | 14.5 k | 80 | 746 | 126 | 58.9 | 81.0 |
A10_Small_Softmax | 15.8 k | 90 | 716 | 120 | 58.1 | 81.1 |
A10_Generic | 26.9 k | 178 | 12219 | 62 | 58.1 | 81.1 |
A10_Performance | 51.6 k | 602 | 5446 | 375 | 58.8 | 81.1 |
AGX7_FP16_Generic | 28.0 k | 162 | 1904 | 214 | 58.5 | 81.1 |
AGX7_FP16_Performance | 91.1 k | 1114 | 12682 | 450 | 58.5 | 81.1 |
AGX7_Small_NoSoftmax | 16.1 k | 80 | 2140 | 211 | 58.5 | 81.0 |
AGX7_Small_Softmax | 17.3 k | 90 | 2155 | 211 | 58.5 | 81.0 |
AGX7_Generic | 30.1 k | 178 | 17918 | 46 | 58.5 | 81.0 |
AGX7_Performance | 54.8 k | 602 | 9152 | 325 | 58.4 | 81.0 |
AGX7_Performance_NoPrelu_NoEltwise | 84.0 k | 1162 | 14731 | 262 | 58.4 | 81.0 |
public/i3d_rgb_tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
A10_FP16_Generic | 25.2 k | 162 | 117 | 0.17 | 65.79 | 82.89 |
A10_FP16_Performance | 78.1 k | 1114 | 1268 | 2.01 | 65.79 | 82.89 |
A10_Small_NoSoftmax | 14.5 k | 80 | 119 | 0.16 | 66.01 | 83.77 |
A10_Small_Softmax | 15.8 k | 90 | 113 | 0.16 | 66.23 | 83.11 |
A10_Generic | 26.9 k | 178 | 178 | 0.34 | 66.23 | 83.11 |
A10_Performance | 51.6 k | 602 | 602 | 1.03 | 66.67 | 83.77 |
AGX7_FP16_Generic | 28.0 k | 162 | 224 | 0.31 | 65.79 | 82.89 |
AGX7_FP16_Performance | 91.1 k | 1114 | 2432 | 3.87 | 65.79 | 82.89 |
AGX7_Small_NoSoftmax | 16.1 k | 80 | 245 | 0.29 | 65.35 | 83.11 |
AGX7_Small_Softmax | 17.3 k | 90 | 245 | 0.29 | 65.57 | 83.11 |
AGX7_Generic | 30.1 k | 178 | 364 | 0.66 | 65.57 | 83.11 |
AGX7_Performance | 54.8 k | 602 | 582 | 0.93 | 65.13 | 83.11 |
AGX7_Performance_NoPrelu_NoEltwise | 84.0 k | 1162 | 1947 | 2.39 | 65.13 | 83.11 |