2.2. Model Performance
- Arria® 10: 265 MHz
- Agilex™ 7: 400 MHz
The performance results for the designs that follow were achieved using the dla_build_example_design.py script that is included with the FPGA AI Suite. The script uses a standard (-2) speed bin with a single seed and uses high-effort compiler settings.
- Agilex™ 7 runtime host: SLES12 host on an Intel® Xeon® processor E5-1650 @ 3.5 GHz.
set_global_assignment -name ALLOW_SHIFT_REGISTER_MERGING_ACROSS_HIERARCHIES ALWAYS
set_global_assignment -name DISABLE_REGISTER_MERGING_ACROSS_HIERARCHIES OFF
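To check whether a Quartus settings file carries the two assignments listed above, their lines can be read into a name/value map. The following is a minimal sketch, not part of the FPGA AI Suite: the function name `parse_global_assignments` is hypothetical, and it assumes each assignment sits on its own line in the `set_global_assignment -name NAME VALUE` form.

```python
import shlex


def parse_global_assignments(qsf_text):
    """Map assignment name -> value for each set_global_assignment line.

    Assumes one assignment per line; '#' starts a comment.
    """
    assignments = {}
    for line in qsf_text.splitlines():
        tokens = shlex.split(line, comments=True)
        if (
            len(tokens) >= 4
            and tokens[0] == "set_global_assignment"
            and tokens[1] == "-name"
        ):
            # Anything after the name is kept as the value string.
            assignments[tokens[2]] = " ".join(tokens[3:])
    return assignments


# The two assignments quoted in this section.
EXPECTED = {
    "ALLOW_SHIFT_REGISTER_MERGING_ACROSS_HIERARCHIES": "ALWAYS",
    "DISABLE_REGISTER_MERGING_ACROSS_HIERARCHIES": "OFF",
}


def missing_assignments(qsf_text):
    """Return the expected names that are absent or set to another value."""
    found = parse_global_assignments(qsf_text)
    return [name for name, value in EXPECTED.items() if found.get(name) != value]
```

Passing the text of a project's settings file to `missing_assignments` returns an empty list when both assignments are present with the values shown.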
The architectures in the tables that follow are in the $COREDLA_ROOT/example_architectures/ directory. Review the README file in that directory for information about each architecture.
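As a quick way to see which example architectures ship with an installation, the directory named above can be listed from a script. This is a minimal sketch: the helper name `list_architectures` is hypothetical, and it makes no assumption about file extensions, so the README and any other files in the directory are listed as well.

```python
import os
from pathlib import Path


def list_architectures(coredla_root):
    """Return the sorted file names in <coredla_root>/example_architectures.

    Returns an empty list if the directory does not exist.
    """
    arch_dir = Path(coredla_root) / "example_architectures"
    if not arch_dir.is_dir():
        return []
    return sorted(p.name for p in arch_dir.iterdir() if p.is_file())


def architectures_from_env():
    """List the files using the COREDLA_ROOT environment variable, if set."""
    root = os.environ.get("COREDLA_ROOT")
    return list_architectures(root) if root else []
```

Calling `architectures_from_env()` in a shell where `COREDLA_ROOT` is set returns the file names; an unset variable or a missing directory yields an empty list instead of an exception.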
Details - FPGA AI Suite 2024.2
Architecture | fMAX | ALMs | DSPs | M20Ks | Registers |
---|---|---|---|---|---|
AGX7_FP16_Generic | 616 MHz | 32.5 k | 186 | 501 | 100 k |
AGX7_FP16_Performance | 600 MHz | 103. k | 1162 | 1533 | 346 k |
AGX7_Small_NoSoftmax | 612 MHz | 16.7 k | 80 | 296 | 54 k |
AGX7_Small_Softmax | 610 MHz | 18.3 k | 90 | 304 | 57 k |
AGX7_Generic | 600 MHz | 38.6 k | 202 | 778 | 126 k |
AGX7_Performance | 585 MHz | 70.5 k | 650 | 1278 | 209 k |
AGX7_Performance_Giant | 537 MHz | 127.9 k | 1546 | 2371 | 372 k |
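For further analysis, the rows of the resource table above can be converted to numbers. The sketch below assumes the cell conventions visible in that table (`k` meaning thousands, an `MHz` suffix on fMAX); the function names are hypothetical.

```python
def parse_quantity(text):
    """Convert a cell such as '32.5 k', '616 MHz', or '186' to a float.

    A 'k' suffix multiplies by 1000; an 'MHz' value stays in MHz.
    """
    parts = text.strip().split()
    value = float(parts[0])
    unit = parts[1] if len(parts) > 1 else ""
    if unit == "k":
        value *= 1000
    return value


def parse_row(line):
    """Split one pipe-delimited row into (architecture, [numeric cells])."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return cells[0], [parse_quantity(c) for c in cells[1:]]


name, (fmax_mhz, alms, dsps, m20ks, registers) = parse_row(
    "AGX7_FP16_Generic | 616 MHz | 32.5 k | 186 | 501 | 100 k |"
)
print(name, fmax_mhz, alms, dsps, m20ks, registers)
```

Values with a `k` suffix are rounded in the source table, so the parsed ALM and register counts are approximate.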
public/mobilenet-v1-1.0-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 2325 | 176 | 71.2 | 89.5 |
AGX7_FP16_Performance | 103. k | 1162 | 8845 | 555 | 71.2 | 89.5 |
AGX7_Small_NoSoftmax | 16.7 k | 80 | 2774 | 168 | 70.9 | 89.6 |
AGX7_Small_Softmax | 18.3 k | 90 | 2765 | 167 | 70.9 | 89.5 |
AGX7_Generic | 38.6 k | 202 | 3247 | 250 | 70.9 | 89.5 |
AGX7_Performance | 70.5 k | 650 | 6231 | 397 | 70.9 | 89.5 |
AGX7_Performance_Giant | 127.9 k | 1546 | 4755 | 785 | 70.9 | 89.6 |
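One way to compare the architectures in the table above is throughput normalized by DSP count. This metric is an illustration chosen here, not one reported by the source, and the sketch simply copies the DSP and throughput columns from the mobilenet-v1 table.

```python
# (architecture, DSPs, throughput in fps) copied from the table above.
ROWS = [
    ("AGX7_FP16_Generic", 186, 176),
    ("AGX7_FP16_Performance", 1162, 555),
    ("AGX7_Small_NoSoftmax", 80, 168),
    ("AGX7_Small_Softmax", 90, 167),
    ("AGX7_Generic", 202, 250),
    ("AGX7_Performance", 650, 397),
    ("AGX7_Performance_Giant", 1546, 785),
]


def fps_per_dsp(rows):
    """Return {architecture: throughput divided by DSP count}."""
    return {name: fps / dsps for name, dsps, fps in rows}


ratios = fps_per_dsp(ROWS)
# Print architectures from most to least throughput per DSP.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:24s} {ratio:.2f} fps/DSP")
```

On this model the smallest architecture delivers the most throughput per DSP (about 2.1 fps/DSP for AGX7_Small_NoSoftmax), while the largest delivers the highest absolute throughput at about 0.51 fps/DSP.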
public/mobilenet-v2
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 3720 | 151 | 71.8 | 89.6 |
AGX7_FP16_Performance | 103. k | 1162 | 6979 | 374 | 71.8 | 89.6 |
AGX7_Small_NoSoftmax | 16.7 k | 80 | 4527 | 139 | 71.6 | 89.6 |
AGX7_Small_Softmax | 18.3 k | 90 | 4532 | 139 | 71.8 | 89.4 |
AGX7_Generic | 38.6 k | 202 | 2635 | 197 | 71.8 | 89.4 |
AGX7_Performance | 70.5 k | 650 | 5804 | 278 | 71.7 | 89.4 |
AGX7_Performance_Giant | 127.9 k | 1546 | 4242 | 720 | 71.7 | 89.4 |
public/mobilenet-v2-1.4-224
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 4194 | 125 | 74.8 | 91.9 |
AGX7_FP16_Performance | 103. k | 1162 | 8833 | 294 | 74.8 | 91.9 |
AGX7_Generic | 38.6 k | 202 | 4074 | 147 | 74.7 | 91.8 |
AGX7_Performance | 70.5 k | 650 | 6881 | 229 | 74.7 | 91.8 |
AGX7_Performance_Giant | 127.9 k | 1546 | 5729 | 644 | 74.6 | 91.8 |
public/mobilenet-v3-large-1.0-224-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 3831 | 171 | 75.8 | 92.1 |
AGX7_FP16_Performance | 103. k | 1162 | 11087 | 237 | 75.8 | 92.1 |
AGX7_Generic | 38.6 k | 202 | 4420 | 176 | 72.3 | 90.7 |
AGX7_Performance | 70.5 k | 650 | 9165 | 193 | 72.1 | 90.5 |
AGX7_Performance_Giant | 127.9 k | 1546 | 7637 | 319 | 72.4 | 90.4 |
public/resnet-50-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 3081 | 32 | 76.8 | 92.9 |
AGX7_FP16_Performance | 103. k | 1162 | 11626 | 164 | 76.8 | 92.9 |
AGX7_Small_NoSoftmax | 16.7 k | 80 | 5950 | 28 | 77.0 | 92.9 |
AGX7_Small_Softmax | 18.3 k | 90 | 5931 | 28 | 77.1 | 92.9 |
AGX7_Generic | 38.6 k | 202 | 4205 | 60 | 77.1 | 92.9 |
AGX7_Performance | 70.5 k | 650 | 10180 | 144 | 76.9 | 92.9 |
AGX7_Performance_Giant | 127.9 k | 1546 | 7838 | 230 | 76.9 | 92.8 |
Resnet50 v1 (Caffe)
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 2894 | 39 | 74.4 | 91.4 |
AGX7_FP16_Performance | 103. k | 1162 | 11970 | 193 | 74.4 | 91.4 |
AGX7_Small_NoSoftmax | 16.7 k | 80 | 4171 | 37 | 74.1 | 91.4 |
AGX7_Small_Softmax | 18.3 k | 90 | 4156 | 37 | 74.2 | 91.3 |
AGX7_Generic | 38.6 k | 202 | 4490 | 73 | 74.2 | 91.3 |
AGX7_Performance | 70.5 k | 650 | 10214 | 164 | 74.0 | 91.4 |
AGX7_Performance_Giant | 127.9 k | 1546 | 7595 | 245 | 74.1 | 91.4 |
intel/unet-camvid-onnx-0001
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] |
---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 828 | 1.09 |
AGX7_FP16_Performance | 103. k | 1162 | 4298 | 7.14 |
AGX7_Small_NoSoftmax | 16.7 k | 80 | 1121 | 1.08 |
AGX7_Small_Softmax | 18.3 k | 90 | 1114 | 1.08 |
AGX7_Generic | 38.6 k | 202 | 1262 | 2.05 |
AGX7_Performance | 70.5 k | 650 | 3691 | 6.28 |
AGX7_Performance_Giant | 127.9 k | 1546 | 4198 | 9.06 |
public/yolo-v3-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 1422 | 4.2 | 62.27 | 31.58 |
AGX7_FP16_Performance | 103. k | 1162 | 6284 | 27.6 | 62.25 | 31.58 |
AGX7_Generic | 38.6 k | 202 | 1795 | 7.8 | 62.28 | 31.49 |
AGX7_Performance | 70.5 k | 650 | 2662 | 11.6 | 62.22 | 31.47 |
AGX7_Performance_Giant | 127.9 k | 1546 | 4918 | 30.0 | 62.25 | 31.46 |
public/yolo-v3-tiny-tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 1073 | 37 | 35.79 | 14.77 |
AGX7_FP16_Performance | 103. k | 1162 | 4567 | 113 | 35.81 | 14.78 |
AGX7_Generic | 38.6 k | 202 | 1969 | 66 | 35.76 | 14.74 |
AGX7_Performance | 70.5 k | 650 | 1604 | 40 | 35.73 | 14.72 |
AGX7_Performance_Giant | 127.9 k | 1546 | 2980 | 64 | 35.81 | 14.75 |
public/yolo-v8-nano detection
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Detection mAP @0.5 | Detection mAP @0.5:0.95 |
---|---|---|---|---|---|---|
AGX7_FP16_Performance | 103. k | 1162 | 6154 | 96 | 51.15 | 36.52 |
AGX7_Generic | 38.6 k | 202 | 1942 | 40 | 51.14 | 36.50 |
AGX7_Performance | 70.5 k | 650 | 2135 | 34 | 51.10 | 36.48 |
public/yolo-v8-nano classification
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Performance | 103. k | 1162 | 4761 | 636 | 67.92 | 87.72 |
AGX7_Generic | 38.6 k | 202 | 1628 | 280 | 67.96 | 87.86 |
AGX7_Performance | 70.5 k | 650 | 1232 | 164 | 67.72 | 87.72 |
public/squeezenet1.1
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 649 | 225 | 58.5 | 81.1 |
AGX7_FP16_Performance | 103. k | 1162 | 4468 | 898 | 58.5 | 81.1 |
AGX7_Small_NoSoftmax | 16.7 k | 80 | 924 | 220 | 58.5 | 81.0 |
AGX7_Small_Softmax | 18.3 k | 90 | 920 | 219 | 58.5 | 81.0 |
AGX7_Generic | 38.6 k | 202 | 1713 | 532 | 58.5 | 81.0 |
AGX7_Performance | 70.5 k | 650 | 2155 | 432 | 58.4 | 81.0 |
AGX7_Performance_Giant | 127.9 k | 1546 | 2767 | 724 | 58.3 | 81.1 |
public/i3d_rgb_tf
Architecture | ALMs | DSPs | DDR1 [MB/s] | Throughput [fps] | Top-1 [%] | Top-5 [%] |
---|---|---|---|---|---|---|
AGX7_FP16_Generic | 32.5 k | 186 | 449 | 0.62 | 65.79 | 82.89 |
AGX7_FP16_Performance | 103. k | 1162 | 2393 | 3.87 | 65.79 | 82.89 |
AGX7_Small_NoSoftmax | 16.7 k | 80 | 488 | 0.58 | 65.35 | 83.11 |
AGX7_Small_Softmax | 18.3 k | 90 | 487 | 0.57 | 65.57 | 83.11 |
AGX7_Generic | 38.6 k | 202 | 728 | 1.33 | 65.57 | 83.11 |
AGX7_Performance | 70.5 k | 650 | 2341 | 3.78 | 65.13 | 83.11 |
AGX7_Performance_Giant | 127.9 k | 1546 | 2571 | 4.20 | 65.79 | 82.89 |