A.3.1. Loop Analysis Example
Figure 4 shows an example High Level Design Report (report.html) containing the loop analysis of a component design taken from the transpose_and_fold.cpp file. This file is part of the tutorial files provided in <quartus_installdir>/hls/examples/tutorials/best_practices/loop_memory_dependency.
Consider the following example code snippet for transpose_and_fold.cpp:
01: #include "HLS/hls.h"
02: #include <stdio.h>
03: #include <stdlib.h>
04:
05: #define SIZE 32
06:
07: typedef ihc::stream_in<int> my_operand;
08: typedef ihc::stream_out<int> my_result;
09:
10: component void transpose_and_fold(my_operand &data_in, my_result &res)
11: {
12:   int i;
13:   int j;
14:   int in_buf[SIZE][SIZE];
15:   int tmp_buf[SIZE][SIZE];
16:   for (i = 0; i < SIZE * SIZE; i++) {
17:     in_buf[i / SIZE][i % SIZE] = data_in.read();
18:     tmp_buf[i / SIZE][i % SIZE] = 0;
19:   }
20:
21: #ifdef USE_IVDEP
22: #pragma ivdep safelen(SIZE)
23: #endif
24:   for (j = 0; j < SIZE * SIZE * SIZE; j++) {
25: #pragma unroll
26:     for (i = 0; i < SIZE; i++) {
27:       tmp_buf[j % SIZE][i] += in_buf[i][j % SIZE];
28:     }
29:   }
30:   for (i = 0; i < SIZE * SIZE; i++) {
31:     res.write(tmp_buf[i / SIZE][i % SIZE]);
32:   }
33: }
The transpose_and_fold component has four loops. The loop analysis report shows that the compiler performed different kinds of loop optimizations:
- The loop on line 26 is fully unrolled, as defined by #pragma unroll.
- The loops on lines 16 and 30 are pipelined with an initiation interval (II) value of ~1. The value is ~1 rather than exactly 1 because both loops contain stream accesses that could stall. If these accesses stall, the loop II becomes greater than 1.
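The effect of stream stalls on II can be illustrated with a toy host-side model in plain C++ (no HLS headers). This is only a sketch under simplifying assumptions, not the compiler's scheduler: it assumes each iteration launches one cycle after the previous one, plus any cycles spent waiting on a stream. The function name effective_ii and the stall_cycles input are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (an assumption, not how the HLS compiler schedules loops):
// iteration k launches 1 cycle after iteration k-1, plus stall_cycles[k]
// extra cycles spent waiting on a stream access.
// Returns the average number of cycles between consecutive launches.
double effective_ii(const std::vector<int>& stall_cycles) {
    if (stall_cycles.size() < 2) {
        return 1.0;  // a single iteration has no launch interval to measure
    }
    long launch = 0;  // launch cycle of the latest iteration; iteration 0 starts at cycle 0
    for (std::size_t k = 1; k < stall_cycles.size(); ++k) {
        assert(stall_cycles[k] >= 0);  // a stall cannot be negative
        launch += 1 + stall_cycles[k];
    }
    return static_cast<double>(launch) /
           static_cast<double>(stall_cycles.size() - 1);
}
```

With no stalls the model returns exactly 1. Any nonzero entry in stall_cycles pushes the average above 1, which is the behavior the "~1" in the report hints at.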
The Block1.start loop in the loop analysis report is not present in the code. It is an implicit infinite loop that the compiler adds so that the component can run continuously instead of only once. In hardware, the component runs continuously and checks its inputs to see whether it should start executing.
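Conceptually, the implicit loop behaves like the following software model, written in plain C++ so that it runs on a host without HLS headers. It is a rough sketch, not a description of the generated hardware. The names component_body and run_component_model, the queue of start requests, and the cycle budget are all hypothetical. The budget exists only so that the model terminates, whereas the real loop never exits.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for one invocation of a component body.
int component_body(int x) { return 2 * x; }

// Software model of the implicit outer loop: on every pass it checks whether
// a start request is pending and, if so, runs the body once.
// Assumption: one invocation per pass, which real hardware need not match.
std::vector<int> run_component_model(std::queue<int> start_requests,
                                     int cycle_budget) {
    assert(cycle_budget >= 0);
    std::vector<int> results;
    // Stands in for the compiler-added infinite loop.
    for (int cycle = 0; cycle < cycle_budget; ++cycle) {
        if (start_requests.empty()) {
            continue;  // nothing to start this cycle: keep idling
        }
        results.push_back(component_body(start_requests.front()));
        start_requests.pop();
    }
    return results;
}
```

In this model the body runs once per pending request and the loop otherwise idles, which mirrors the component checking its inputs to decide whether to start executing.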