2.2. Model Performance
- Arria® 10: 265 MHz
- Agilex™ 7: 400 MHz
The performance results for the designs that follow were obtained with the dla_build_example_design.py script that is included with the FPGA AI Suite. The script targets a standard (-2) speed bin, compiles with a single seed, and uses high-effort compiler settings.
- Agilex™ 7 runtime host: SUSE Linux Enterprise Server 15 host on an Intel® Xeon® processor E5-1650 @ 3.5 GHz.
set_global_assignment -name ALLOW_SHIFT_REGISTER_MERGING_ACROSS_HIERARCHIES ALWAYS
set_global_assignment -name DISABLE_REGISTER_MERGING_ACROSS_HIERARCHIES OFF
The architectures in the tables that follow are in the $COREDLA_ROOT/example_architectures/ directory. Review the README file in that directory for information about each architecture.
The IP Throughput column in the tables that follow shows the performance for the portion of the graph that runs on the FPGA device. In many cases, the entire graph runs on the FPGA device. The IP Throughput is representative of performance if the IP is used in a hostless configuration.
The IP+host Throughput column in the tables that follow shows the performance including the host. The IP+host performance may be lower than the IP-only performance if the host cannot stream data to the FPGA device quickly enough, or if the host is limited by the part of the graph processing that it performs itself (for example, the host performs non-maximum suppression (NMS) for the YOLOv3 graph). Achievable IP+host performance depends on the speed and loading of both the host and the FPGA AI Suite IP.
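
As a rough mental model of the two throughput columns, the host and the IP can be treated as two stages of a pipeline whose sustained rate is set by the slower stage. The sketch below is a simplification under that assumption, not a description of how the FPGA AI Suite measures performance. The function name `effective_fps` and the throughput values are hypothetical and chosen only for illustration.

```python
def effective_fps(ip_fps: float, host_fps: float) -> float:
    """Sustained rate of a two-stage pipeline: the slower stage sets the pace.

    Simplifying assumption: the host work and the IP work overlap fully,
    so neither stage adds latency that lowers the steady-state rate.
    """
    if ip_fps <= 0 or host_fps <= 0:
        raise ValueError("throughput values must be positive")
    return min(ip_fps, host_fps)


# Hypothetical case: the IP could sustain 27 fps, but the host only 11 fps.
host_bound = effective_fps(ip_fps=27.0, host_fps=11.0)

# Hypothetical case: the host keeps up easily, so the IP is the limit.
ip_bound = effective_fps(ip_fps=27.0, host_fps=100.0)

print(host_bound, ip_bound)
```

Under this model, a faster architecture raises the IP+host figure only until the host becomes the slower stage.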
Details - FPGA AI Suite 2024.3
Architecture | fMAX | ALMs | DSPs | M20Ks | Registers |
---|---|---|---|---|---|
AGX7_FP16_Generic | 600 MHz | 33.6 k | 186 | 511 | 95 k |
AGX7_FP16_Performance | 605 MHz | 103.9 k | 1162 | 1533 | 324 k |
AGX7_Small_NoSoftmax | 610 MHz | 17.2 k | 80 | 296 | 49 k |
AGX7_Small_Softmax | 616 MHz | 18.6 k | 90 | 304 | 57 k |
AGX7_Generic | 600 MHz | 38.9 k | 202 | 778 | 113 k |
AGX7_Performance | 585 MHz | 70.5 k | 650 | 1278 | 207 k |
AGX7_Performance_Giant | 535 MHz | 127.8 k | 1546 | 2371 | 359 k |
public/mobilenet-v1-1.0-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 2261 | 171 | 171 | 71.2 | 89.5 |
AGX7_FP16_Performance | 103.9 k | 1162 | 9117 | 572 | 567 | 71.2 | 89.5 |
AGX7_Small_NoSoftmax | 17.2 k | 80 | 2770 | 167 | 167 | 70.9 | 89.6 |
AGX7_Small_Softmax | 18.6 k | 90 | 2796 | 169 | 168 | 70.9 | 89.5 |
AGX7_Generic | 38.9 k | 202 | 3306 | 255 | 251 | 70.9 | 89.5 |
AGX7_Performance | 70.5 k | 650 | 8893 | 566 | 399 | 70.9 | 89.5 |
AGX7_Performance_Giant | 127.8 k | 1546 | 8987 | 1483 | 764 | 71.0 | 89.6 |
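
One way to read the public/mobilenet-v1-1.0-224 table is to divide the IP+host throughput by the IP throughput for each architecture. The sketch below does that with the values copied from the table above. The ratio and the name `host_efficiency` are illustrative choices, not metrics defined by the FPGA AI Suite.

```python
# (IP fps, IP+host fps) copied from the public/mobilenet-v1-1.0-224 table.
mobilenet_v1_rows = {
    "AGX7_FP16_Generic": (171, 171),
    "AGX7_FP16_Performance": (572, 567),
    "AGX7_Small_NoSoftmax": (167, 167),
    "AGX7_Small_Softmax": (169, 168),
    "AGX7_Generic": (255, 251),
    "AGX7_Performance": (566, 399),
    "AGX7_Performance_Giant": (1483, 764),
}


def host_efficiency(ip_fps: float, ip_host_fps: float) -> float:
    """Fraction of the IP-only throughput that survives once the host is included."""
    return ip_host_fps / ip_fps


for arch, (ip_fps, ip_host_fps) in mobilenet_v1_rows.items():
    print(f"{arch:24s} {host_efficiency(ip_fps, ip_host_fps):6.1%}")
```

For this model, the ratio stays at or close to 100% for the smaller architectures and falls to about 70% for AGX7_Performance and about 52% for AGX7_Performance_Giant, which suggests that the host is the limiting factor for the two fastest configurations.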
public/mobilenet-v2
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 3653 | 148 | 147 | 71.8 | 89.6 |
AGX7_FP16_Performance | 103.9 k | 1162 | 6948 | 372 | 367 | 71.8 | 89.6 |
AGX7_Small_NoSoftmax | 17.2 k | 80 | 4609 | 141 | 138 | 71.6 | 89.6 |
AGX7_Small_Softmax | 18.6 k | 90 | 4645 | 142 | 139 | 71.8 | 89.4 |
AGX7_Generic | 38.9 k | 202 | 2720 | 203 | 198 | 71.8 | 89.4 |
AGX7_Performance | 70.5 k | 650 | 7166 | 343 | 276 | 71.7 | 89.4 |
AGX7_Performance_Giant | 127.8 k | 1546 | 6370 | 1081 | 726 | 71.8 | 89.4 |
public/mobilenet-v2-1.4-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 4085 | 122 | 121 | 74.8 | 91.9 |
AGX7_FP16_Performance | 103.9 k | 1162 | 8717 | 290 | 288 | 74.8 | 91.9 |
AGX7_Generic | 38.9 k | 202 | 4184 | 151 | 145 | 74.7 | 91.8 |
AGX7_Performance | 70.5 k | 650 | 8716 | 290 | 226 | 74.7 | 91.8 |
AGX7_Performance_Giant | 127.8 k | 1546 | 7539 | 847 | 618 | 74.7 | 91.7 |
public/mobilenet-v3-large-1.0-224-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 3774 | 169 | 165 | 75.8 | 92.1 |
AGX7_FP16_Performance | 103.9 k | 1162 | 11260 | 240 | 234 | 75.8 | 92.1 |
AGX7_Generic | 38.9 k | 202 | 4530 | 181 | 174 | 72.3 | 90.7 |
AGX7_Performance | 70.5 k | 650 | 11293 | 246 | 201 | 72.1 | 90.5 |
AGX7_Performance_Giant | 127.8 k | 1546 | 8492 | 355 | 304 | 72.6 | 90.6 |
public/resnet-50-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 3005 | 32 | 32 | 76.8 | 92.9 |
AGX7_FP16_Performance | 103.9 k | 1162 | 11715 | 166 | 164 | 76.8 | 92.9 |
AGX7_Small_NoSoftmax | 17.2 k | 80 | 5935 | 28 | 28 | 77.0 | 92.9 |
AGX7_Small_Softmax | 18.6 k | 90 | 5989 | 28 | 28 | 77.1 | 92.9 |
AGX7_Generic | 38.9 k | 202 | 4206 | 60 | 60 | 77.1 | 92.9 |
AGX7_Performance | 70.5 k | 650 | 11540 | 163 | 143 | 76.9 | 92.9 |
AGX7_Performance_Giant | 127.8 k | 1546 | 8067 | 237 | 229 | 76.9 | 92.8 |
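
To compare architectures of different sizes, the IP throughput can be normalized by the DSP block count. The sketch below does this for the public/resnet-50-tf values copied from the table above. The metric "fps per 100 DSPs" and the name `fps_per_100_dsps` are illustrative assumptions rather than figures reported by the FPGA AI Suite, and the metric ignores ALM, M20K, and DDR bandwidth usage.

```python
# (DSP blocks, IP throughput in fps) copied from the public/resnet-50-tf table.
resnet50_rows = {
    "AGX7_FP16_Generic": (186, 32),
    "AGX7_FP16_Performance": (1162, 166),
    "AGX7_Small_NoSoftmax": (80, 28),
    "AGX7_Small_Softmax": (90, 28),
    "AGX7_Generic": (202, 60),
    "AGX7_Performance": (650, 163),
    "AGX7_Performance_Giant": (1546, 237),
}


def fps_per_100_dsps(dsps: int, ip_fps: float) -> float:
    """IP throughput normalized to 100 DSP blocks (an illustrative metric)."""
    return 100.0 * ip_fps / dsps


# Rank architectures from most to least throughput per DSP block.
ranked = sorted(
    resnet50_rows.items(),
    key=lambda item: fps_per_100_dsps(*item[1]),
    reverse=True,
)

for arch, (dsps, ip_fps) in ranked:
    print(f"{arch:24s} {fps_per_100_dsps(dsps, ip_fps):5.1f} fps per 100 DSPs")
```

On these numbers the smallest architectures rank highest on this metric, while the largest architectures deliver the highest absolute throughput.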
Resnet50 v1 (Caffe)
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 2822 | 38 | 38 | 74.4 | 91.4 |
AGX7_FP16_Performance | 103.9 k | 1162 | 12139 | 195 | 195 | 74.4 | 91.4 |
AGX7_Small_NoSoftmax | 17.2 k | 80 | 4161 | 37 | 37 | 74.1 | 91.4 |
AGX7_Small_Softmax | 18.6 k | 90 | 4203 | 37 | 37 | 74.2 | 91.3 |
AGX7_Generic | 38.9 k | 202 | 4489 | 73 | 73 | 74.2 | 91.3 |
AGX7_Performance | 70.5 k | 650 | 12119 | 195 | 162 | 74.0 | 91.4 |
AGX7_Performance_Giant | 127.8 k | 1546 | 8379 | 270 | 247 | 74.1 | 91.4 |
intel/unet-camvid-onnx-0001
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] |
---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 825 | 1.09 |
AGX7_FP16_Performance | 103.9 k | 1162 | 4552 | 7.57 |
AGX7_Small_NoSoftmax | 17.2 k | 80 | 1140 | 1.10 |
AGX7_Small_Softmax | 18.6 k | 90 | 1153 | 1.11 |
AGX7_Generic | 38.9 k | 202 | 1319 | 2.14 |
AGX7_Performance | 70.5 k | 650 | 4331 | 7.36 |
AGX7_Performance_Giant | 127.8 k | 1546 | 5426 | 11.71 |
public/yolo-v3-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 1428 | 4.2 | 4 | 62.27 | 31.58 |
AGX7_FP16_Performance | 103.9 k | 1162 | 6347 | 27.9 | 28 | 62.25 | 31.58 |
AGX7_Generic | 38.9 k | 202 | 1901 | 8.2 | 8 | 62.28 | 31.49 |
AGX7_Performance | 70.5 k | 650 | 6170 | 27.0 | 11 | 62.22 | 31.47 |
AGX7_Performance_Giant | 127.8 k | 1546 | 6634 | 40.5 | 30 | 62.25 | 31.46 |
public/yolo-v3-tiny-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 1200 | 41 | 36 | 35.79 | 14.77 |
AGX7_FP16_Performance | 103.9 k | 1162 | 4680 | 116 | 113 | 35.81 | 14.78 |
AGX7_Generic | 38.9 k | 202 | 2433 | 82 | 66 | 35.76 | 14.74 |
AGX7_Performance | 70.5 k | 650 | 4647 | 115 | 40 | 35.73 | 14.72 |
AGX7_Performance_Giant | 127.8 k | 1546 | 5028 | 109 | 64 | 35.81 | 14.75 |
public/yolo-v8-nano detection
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|---|
AGX7_FP16_Performance | 103.9 k | 1162 | 6728 | 94 | 91 | 51.15 | 36.52 |
AGX7_Generic | 38.9 k | 202 | 2427 | 50 | 39 | 51.14 | 36.50 |
AGX7_Performance | 70.5 k | 650 | 6720 | 95 | 32 | 51.10 | 36.48 |
public/yolo-v8-nano classification
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Performance | 103.9 k | 1162 | 10345 | 1384 | 67.92 | 87.72 |
AGX7_Generic | 38.9 k | 202 | 5489 | 943 | 67.96 | 87.86 |
AGX7_Performance | 70.5 k | 650 | 10178 | 1358 | 67.72 | 87.72 |
public/squeezenet1.1
Architecture | ALMs | DSPs | DDR1 [MB/s] | IP Throughput [fps] | IP+host Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 631 | 218 | 219 | 58.5 | 81.1 |
AGX7_FP16_Performance | 103.9 k | 1162 | 4679 | 940 | 886 | 58.5 | 81.1 |
AGX7_Small_NoSoftmax | 17.2 k | 80 | 923 | 220 | 219 | 58.5 | 81.0 |
AGX7_Small_Softmax | 18.6 k | 90 | 933 | 222 | 222 | 58.5 | 81.0 |
AGX7_Generic | 38.9 k | 202 | 1722 | 535 | 536 | 58.5 | 81.0 |
AGX7_Performance | 70.5 k | 650 | 4654 | 932 | 419 | 58.4 | 81.0 |
AGX7_Performance_Giant | 127.8 k | 1546 | 3631 | 951 | 735 | 58.3 | 81.1 |
public/i3d_rgb_tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 33.6 k | 186 | 442 | 0.61 | 65.79 | 82.89 |
AGX7_FP16_Performance | 103.9 k | 1162 | 2562 | 4.14 | 65.79 | 82.89 |
AGX7_Small_NoSoftmax | 17.2 k | 80 | 492 | 0.58 | 65.35 | 82.89 |
AGX7_Small_Softmax | 18.6 k | 90 | 496 | 0.59 | 65.57 | 82.89 |
AGX7_Generic | 38.9 k | 202 | 742 | 1.36 | 65.57 | 83.11 |
AGX7_Performance | 70.5 k | 650 | 2486 | 4.01 | 65.13 | 83.11 |
AGX7_Performance_Giant | 127.8 k | 1546 | 2839 | 4.64 | 65.79 | 82.89 |