Intel® FPGA AI Suite: IP Reference Manual

ID 768974
Date 4/05/2023
Public

2.2. Model Performance

The performance estimator tool (described in the Intel® FPGA AI Suite Compiler Reference Manual) assumes the following fMAX values for FPGA devices:
  • Intel® Arria® 10: 265 MHz
  • Intel Agilex® 7: 400 MHz
These assumptions are reasonable and conservative for the standard speed bin. As shown by the results in this section, the achieved fMAX of the example design typically exceeds these assumptions.

The performance results for the designs below were achieved using the dla_build_example_design.py script that is included with the Intel® FPGA AI Suite. The script uses a standard (-2) speed bin with a single seed and does not use high-effort compiler settings. The runtime host runs CentOS 7 on an Intel® Xeon® processor E5-1650 @ 3.5 GHz. The design uses a dedicated DDR interface for the IP. Performance varies with the clock speed, the DDR latency and bandwidth, and, depending on the graph, the host CPU speed.

The architectures in the tables that follow are in the $COREDLA_ROOT/example_architectures/ directory. Review the README file in that directory for information about each architecture.
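To see which example architectures are available before reading the tables, the directory can be listed from a script. The following is a minimal sketch: the helper name list_example_architectures is hypothetical, and it makes no assumption about the file names or extensions used inside the directory.

```python
from pathlib import Path


def list_example_architectures(coredla_root):
    """Return the sorted entry names in <coredla_root>/example_architectures/.

    coredla_root is the value of the COREDLA_ROOT environment variable,
    for example os.environ["COREDLA_ROOT"].
    """
    arch_dir = Path(coredla_root) / "example_architectures"
    if not arch_dir.is_dir():
        raise FileNotFoundError(f"{arch_dir} is not a directory")
    # No filtering by extension: the README and every architecture
    # description in the directory are returned as-is.
    return sorted(entry.name for entry in arch_dir.iterdir())
```

Passing the value of COREDLA_ROOT to this helper returns the README alongside the architecture entries, so the README can be opened first as recommended above.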

Details - Intel FPGA AI Suite V2023.1

Architecture fMAX ALMs DSPs M20Ks Registers
A10_FP16_Generic 315 MHz 25.6 k 162 485 68 k
A10_FP16_Performance 281 MHz 79.7 k 1114 1444 244 k
A10_Small_NoSoftmax 356 MHz 14.9 k 80 247 42 k
A10_Small_Softmax 353 MHz 16.1 k 90 255 45 k
A10_Generic 283 MHz 27.3 k 178 598 74 k
A10_Performance 300 MHz 52.6 k 602 910 160 k
AGX7_FP16_Generic 600 MHz 29. k 162 489 105 k
AGX7_FP16_Performance 600 MHz 91.1 k 1114 1477 315 k
AGX7_Small_NoSoftmax 600 MHz 16.9 k 80 296 57 k
AGX7_Small_Softmax 600 MHz 18.3 k 90 304 65 k
AGX7_Generic 600 MHz 30.7 k 178 751 110 k
AGX7_Performance 600 MHz 54.5 k 602 1222 189 k
AGX7_Performance_NoPrelu_NoEltwise 600 MHz 80.8 k 1162 2717 317 k
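The margin between the achieved fMAX values in the table above and the fMAX values assumed by the performance estimator can be checked with a few lines of arithmetic. This is only an illustrative sketch: the dictionary and function names are hypothetical, the achieved values are copied from the table, and the Agilex 7 assumption is taken to be 400 MHz.

```python
# fMAX assumed by the performance estimator, in MHz (from this section;
# the Agilex 7 value is taken to be 400 MHz).
ASSUMED_FMAX_MHZ = {"A10": 265, "AGX7": 400}

# Achieved fMAX of the example design, in MHz (copied from the table above).
ACHIEVED_FMAX_MHZ = {
    "A10_FP16_Generic": 315,
    "A10_FP16_Performance": 281,
    "A10_Small_NoSoftmax": 356,
    "A10_Small_Softmax": 353,
    "A10_Generic": 283,
    "A10_Performance": 300,
    "AGX7_FP16_Generic": 600,
    "AGX7_FP16_Performance": 600,
    "AGX7_Small_NoSoftmax": 600,
    "AGX7_Small_Softmax": 600,
    "AGX7_Generic": 600,
    "AGX7_Performance": 600,
    "AGX7_Performance_NoPrelu_NoEltwise": 600,
}


def fmax_headroom(arch):
    """Return achieved fMAX divided by the estimator's assumed fMAX."""
    family = arch.split("_", 1)[0]  # "A10" or "AGX7"
    return ACHIEVED_FMAX_MHZ[arch] / ASSUMED_FMAX_MHZ[family]


for arch in ACHIEVED_FMAX_MHZ:
    print(f"{arch:36s} {fmax_headroom(arch):.2f}x")
```

A ratio above 1.0 means that the example design closed timing above the estimator's assumption for that device family.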

public/mobilenet-v1-1.0-224

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] Top-1 [%] Top-5 [%]
A10_FP16_Generic 25.6 k 162 1203 91 71.2 89.5
A10_FP16_Performance 79.7 k 1114 4738 293 71.2 89.5
A10_Small_NoSoftmax 14.9 k 80 1176 101 69.9 89.1
A10_Small_Softmax 16.1 k 90 1173 100 69.9 89.1
A10_Generic 27.3 k 178 1156 122 69.9 89.1
A10_Performance 52.6 k 602 2869 321 69.6 88.9
AGX7_FP16_Generic 29. k 162 2241 169 71.2 89.5
AGX7_FP16_Performance 91.1 k 1114 9089 562 71.2 89.5
AGX7_Small_NoSoftmax 16.9 k 80 2719 165 70.8 89.5
AGX7_Small_Softmax 18.3 k 90 2729 165 70.9 89.4
AGX7_Generic 30.7 k 178 4029 238 70.9 89.4
AGX7_Performance 54.5 k 602 5920 373 70.9 89.5
AGX7_Performance_NoPrelu_NoEltwise 80.8 k 1162 10497 441 70.9 89.5
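For readers who want to process these tables programmatically, the rows of the classification tables can be split into typed fields. The sketch below assumes the seven-column layout used by the table above (architecture, ALMs in thousands, DSPs, DDR bandwidth, throughput, Top-1, Top-5). The ModelResult and parse_row names are hypothetical and are not part of the Intel® FPGA AI Suite.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    architecture: str
    alms_k: float          # ALMs, in thousands
    dsps: int
    ddr_mb_per_s: float    # estimated minimum average read + write bandwidth
    throughput_fps: float
    top1_pct: float
    top5_pct: float


def parse_row(line):
    """Parse one whitespace-separated row of a classification table."""
    tokens = line.split()
    if len(tokens) != 8:
        raise ValueError(f"expected 8 tokens, got {len(tokens)}: {line!r}")
    # The ALM count is written as "<number> k", so it occupies two tokens.
    arch, alms, k_suffix, dsps, ddr, fps, top1, top5 = tokens
    if k_suffix != "k":
        raise ValueError(f"expected 'k' after the ALM count: {line!r}")
    return ModelResult(arch, float(alms), int(dsps), float(ddr),
                       float(fps), float(top1), float(top5))


row = parse_row("A10_FP16_Generic 25.6 k 162 1203 91 71.2 89.5")
print(row.architecture, row.dsps, row.throughput_fps)
```

Tables without accuracy columns, or with different accuracy metrics, have a different number of fields and would need a variant of this parser.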

public/mobilenet-v2

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] Top-1 [%] Top-5 [%]
A10_FP16_Generic 25.6 k 162 2030 82 71.8 89.6
A10_FP16_Performance 79.7 k 1114 3699 192 71.7 89.6
A10_Small_NoSoftmax 14.9 k 80 2470 88 70.2 88.6
A10_Small_Softmax 16.1 k 90 2476 88 70.0 88.6
A10_Generic 27.3 k 178 992 99 70.0 88.6
A10_Performance 52.6 k 602 2308 211 70.1 88.1
AGX7_FP16_Generic 29. k 162 3609 146 71.8 89.6
AGX7_FP16_Performance 91.1 k 1114 7104 369 71.7 89.6
AGX7_Small_NoSoftmax 16.9 k 80 4460 137 71.6 89.7
AGX7_Small_Softmax 18.3 k 90 4468 137 71.6 89.6
AGX7_Generic 30.7 k 178 3246 187 71.6 89.6
AGX7_Performance 54.5 k 602 5773 270 71.8 89.4
AGX7_Performance_NoPrelu_NoEltwise 80.8 k 1162 9760 275 71.8 89.4

public/mobilenet-v2-1.4-224

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] Top-1 [%] Top-5 [%]
A10_FP16_Generic 25.6 k 162 2233 66 74.8 91.8
A10_FP16_Performance 79.7 k 1114 5094 163 74.8 91.9
A10_Generic 27.3 k 178 1636 77 73.2 91.0
A10_Performance 52.6 k 602 3285 182 72.2 90.4
AGX7_FP16_Generic 29. k 162 4030 119 74.8 91.8
AGX7_FP16_Performance 91.1 k 1114 8990 289 74.8 91.9
AGX7_Generic 30.7 k 178 4495 140 74.7 91.8
AGX7_Performance 54.5 k 602 7590 245 74.6 91.7
AGX7_Performance_NoPrelu_NoEltwise 80.8 k 1162 11780 252 74.6 91.7

public/mobilenet-v3-large-1.0-224-tf

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] Top-1 [%] Top-5 [%]
A10_FP16_Generic 25.6 k 162 2090 81 75.8 92.1
A10_FP16_Performance 79.7 k 1114 12817 29 75.8 92.1
AGX7_FP16_Generic 29. k 162 3693 143 75.8 92.1
AGX7_FP16_Performance 91.1 k 1114 17656 39 75.8 92.1
AGX7_Generic 30.7 k 178 5077 133 72.1 90.8
AGX7_Performance 54.5 k 602 17037 38 72.5 90.5

public/resnet-50-tf

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] Top-1 [%] Top-5 [%]
A10_FP16_Generic 25.6 k 162 1606 17 76.8 92.9
A10_FP16_Performance 79.7 k 1114 6619 93 76.8 92.9
A10_Small_NoSoftmax 14.9 k 80 2086 17 76.6 92.7
A10_Small_Softmax 16.1 k 90 2071 17 76.4 92.6
A10_Generic 27.3 k 178 1350 29 76.4 92.6
A10_Performance 52.6 k 602 4652 104 76.6 92.7
AGX7_FP16_Generic 29. k 162 3023 32 76.8 92.9
AGX7_FP16_Performance 91.1 k 1114 11575 163 76.8 92.9
AGX7_Small_NoSoftmax 16.9 k 80 5846 27 77.0 92.9
AGX7_Small_Softmax 18.3 k 90 5847 27 77.0 92.9
AGX7_Generic 30.7 k 178 4387 60 77.0 92.9
AGX7_Performance 54.5 k 602 10301 145 76.9 92.8
AGX7_Performance_NoPrelu_NoEltwise 80.8 k 1162 13697 208 76.9 92.8
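One way to compare architectures of very different sizes is to normalize throughput by DSP count. This metric is not defined in this manual; the sketch below is an illustrative calculation on a subset of rows copied from the public/resnet-50-tf table above, and the names in it are hypothetical.

```python
# (architecture, DSPs, throughput in fps), copied from the table above.
RESNET50_TF_ROWS = [
    ("A10_FP16_Generic", 162, 17),
    ("A10_FP16_Performance", 1114, 93),
    ("A10_Generic", 178, 29),
    ("A10_Performance", 602, 104),
    ("AGX7_FP16_Performance", 1114, 163),
    ("AGX7_Performance", 602, 145),
    ("AGX7_Performance_NoPrelu_NoEltwise", 1162, 208),
]


def fps_per_100_dsps(dsps, fps):
    """Throughput normalized to 100 DSP blocks."""
    return 100.0 * fps / dsps


for arch, dsps, fps in RESNET50_TF_ROWS:
    print(f"{arch:36s} {fps_per_100_dsps(dsps, fps):6.1f} fps per 100 DSPs")
```

The normalized figure ignores ALM, M20K, and DDR bandwidth usage, so it should be read as a rough comparison rather than a complete measure of efficiency.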

Resnet50 v1 (Caffe)

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] Top-1 [%] Top-5 [%]
A10_FP16_Generic 25.6 k 162 1507 20 74.4 91.4
A10_FP16_Performance 79.7 k 1114 7031 113 74.4 91.4
A10_Small_NoSoftmax 14.9 k 80 1462 22 73.9 91.2
A10_Small_Softmax 16.1 k 90 1451 21 73.8 91.2
A10_Generic 27.3 k 178 1365 36 73.8 91.2
A10_Performance 52.6 k 602 4739 127 74.2 91.2
AGX7_FP16_Generic 29. k 162 2822 38 74.4 91.4
AGX7_FP16_Performance 91.1 k 1114 11948 191 74.4 91.4
AGX7_Small_NoSoftmax 16.9 k 80 4090 36 74.1 91.4
AGX7_Small_Softmax 18.3 k 90 4093 36 74.2 91.3
AGX7_Generic 30.7 k 178 4703 72 74.2 91.3
AGX7_Performance 54.5 k 602 10514 168 74.0 91.3
AGX7_Performance_NoPrelu_NoEltwise 80.8 k 1162 14288 228 74.0 91.3

intel/unet-camvid-onnx-0001

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps]
A10_FP16_Generic 25.6 k 162 435 0.56
A10_FP16_Performance 79.7 k 1114 2126 3.53
AGX7_FP16_Generic 29. k 162 785 1.03
AGX7_FP16_Performance 91.1 k 1114 4244 7.05
AGX7_Small_NoSoftmax 16.9 k 80 1090 1.05
AGX7_Small_Softmax 18.3 k 90 1074 1.04
AGX7_Generic 30.7 k 178 1216 1.98
AGX7_Performance 54.5 k 602 1771 3.01
AGX7_Performance_NoPrelu_NoEltwise 80.8 k 1162 5517 7.68

public/yolo-v3-tf

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] COCO AP mAP
A10_FP16_Generic 25.6 k 162 728 2.2 31.58 62.24
A10_FP16_Performance 79.7 k 1114 3078 13.5 31.57 62.24
A10_Generic 27.3 k 178 591 3.7 31.33 62.21
A10_Performance 52.6 k 602 1783 11.6 31.37 62.20
AGX7_FP16_Generic 29. k 162 1309 3.9 31.58 62.24
AGX7_FP16_Performance 91.1 k 1114 6281 27.6 31.57 62.24
AGX7_Generic 30.7 k 178 1622 7.0 31.49 62.23
AGX7_Performance 54.5 k 602 2405 10.6 31.47 62.24

public/yolo-v3-tiny-tf

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] COCO AP mAP
A10_FP16_Generic 25.6 k 162 544 18.7 14.77 35.81
A10_FP16_Performance 79.7 k 1114 2287 56.7 14.77 35.81
A10_Generic 27.3 k 178 684 32.4 14.78 35.69
A10_Performance 52.6 k 602 1408 44.7 14.69 35.69
AGX7_FP16_Generic 29. k 162 1041 35.7 14.77 35.81
AGX7_FP16_Performance 91.1 k 1114 4531 112.3 14.77 35.81
AGX7_Generic 30.7 k 178 1886 63.6 14.73 35.76
AGX7_Performance 54.5 k 602 1503 37.2 14.72 35.73

public/squeezenet1.1

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] Top-1 [%] Top-5 [%]
A10_FP16_Generic 25.6 k 162 1006 113 58.5 81.1
A10_FP16_Performance 79.7 k 1114 7904 280 58.5 81.0
A10_Small_NoSoftmax 14.9 k 80 762 129 58.8 80.9
A10_Small_Softmax 16.1 k 90 762 128 58.2 80.7
A10_Generic 27.3 k 178 12011 61 58.2 80.7
A10_Performance 52.6 k 602 5415 373 58.2 80.8
AGX7_FP16_Generic 29. k 162 1858 209 58.5 81.1
AGX7_FP16_Performance 91.1 k 1114 11269 400 58.5 81.0
AGX7_Small_NoSoftmax 16.9 k 80 2088 206 58.4 81.1
AGX7_Small_Softmax 18.3 k 90 2104 206 58.4 81.1
AGX7_Generic 30.7 k 178 17916 46 58.4 81.1
AGX7_Performance 54.5 k 602 8701 309 58.1 81.1
AGX7_Performance_NoPrelu_NoEltwise 80.8 k 1162 14859 264 58.1 81.1

public/i3d_rgb_tf

Architecture ALMs DSPs DDR* [MB/s] Throughput [fps] Top-1 [%] Top-5 [%]
A10_FP16_Generic 25.6 k 162 113 0.16 64 85
A10_FP16_Performance 79.7 k 1114 1207 1.92 65 85
A10_Small_NoSoftmax 14.9 k 80 121 0.17 64 86
A10_Small_Softmax 16.1 k 90 120 0.17 65 86
A10_Generic 27.3 k 178 165 0.32 64 85
A10_Performance 52.6 k 602 591 1.01 63 87
AGX7_FP16_Generic 29. k 162 218 0.30 64 85
AGX7_FP16_Performance 91.1 k 1114 2403 3.82 65 85
AGX7_Small_NoSoftmax 16.9 k 80 239 0.28 65 86
AGX7_Small_Softmax 18.3 k 90 239 0.28 65 85
AGX7_Generic 30.7 k 178 363 0.65 65 85
AGX7_Performance 54.5 k 602 602 0.96 65 86
AGX7_Performance_NoPrelu_NoEltwise 80.8 k 1162 1920 2.35 65 86
* DDR is the estimated minimum average read + write bandwidth (that is, reads and writes together require at least this much bandwidth on average). Peak bandwidth is higher.
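Because the DDR column is an average bandwidth and the throughput column is a frame rate, dividing one by the other gives a rough estimate of the DDR traffic per inference. The sketch below applies this to three rows of the public/resnet-50-tf table. It is an illustrative calculation under the assumption that one frame corresponds to one inference, and the function name is hypothetical.

```python
def ddr_mb_per_frame(ddr_mb_per_s, throughput_fps):
    """Estimated minimum average DDR traffic (read + write) per frame, in MB."""
    if throughput_fps <= 0:
        raise ValueError("throughput must be positive")
    return ddr_mb_per_s / throughput_fps


# (architecture, DDR in MB/s, throughput in fps), from public/resnet-50-tf.
rows = [
    ("A10_FP16_Generic", 1606, 17),
    ("A10_FP16_Performance", 6619, 93),
    ("A10_Performance", 4652, 104),
]

for arch, ddr, fps in rows:
    print(f"{arch:24s} {ddr_mb_per_frame(ddr, fps):6.1f} MB per frame")
```

Since the table values are minimum averages, the per-frame figure is a lower bound on average traffic and says nothing about peak bandwidth demand.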