10.1.2. Using a Single Kernel to Describe Systolic Arrays
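Unoptimized multi-kernel systolic array pseudocode, in which each processing element (PE) is a replicated kernel that exchanges data with its neighbors through channels: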
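#pragma OPENCL EXTENSION cl_intel_channels : enable

// data distribution network over an array of channels
channel int c[ROWS][COLS];
channel int d[ROWS][COLS];

// replicate PE into a ROWS x COLS grid of compute units
__attribute__((max_global_work_dim(0)))
__attribute__((autorun))
__attribute__((num_compute_units(ROWS,COLS)))
kernel void PE() {
  // position of this PE in the grid
  int row = get_compute_id(0);
  int col = get_compute_id(1);
  while(1){
    // get data values from my neighbors
    // (PEs in the top row and left column need another data source; omitted)
    int x = read_channel_intel(c[row-1][col]);
    int y = read_channel_intel(d[row][col-1]);
    // some code that uses x and y
    ...
    // send the same data values to the next neighbors
    write_channel_intel(c[row][col], x);
    write_channel_intel(d[row][col], y);
  }
}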
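To make the data movement concrete, the following plain C model simulates the neighbor-forwarding pattern of the PE kernel above on a small grid. It is a sketch under assumptions that are not part of the original: the PEs compute a matrix product (each PE accumulates x * y), every channel is modeled as a single register that all PEs update in lockstep once per cycle (real channels are FIFOs with blocking reads and writes), and the edge PEs are fed skewed rows of a matrix A and columns of a matrix B. The names systolic_matmul, feed_left, and feed_top and the sizes ROWS = 2, COLS = 3, K = 4 are hypothetical.

```c
#include <string.h>

#define ROWS 2
#define COLS 3
#define K 4

/* Left-edge feed (hypothetical): row r of A, delayed (skewed) by r cycles. */
static int feed_left(int A[ROWS][K], int r, int t) {
    int k = t - r;
    return (k >= 0 && k < K) ? A[r][k] : 0;
}

/* Top-edge feed (hypothetical): column c of B, delayed (skewed) by c cycles. */
static int feed_top(int B[K][COLS], int c, int t) {
    int k = t - c;
    return (k >= 0 && k < K) ? B[k][c] : 0;
}

/* Lockstep model: each PE takes x from the PE above and y from the PE to its
   left, accumulates x * y, and forwards both values on the next cycle. */
void systolic_matmul(int A[ROWS][K], int B[K][COLS], int acc[ROWS][COLS]) {
    int x[ROWS][COLS] = {{0}}, y[ROWS][COLS] = {{0}};  /* values forwarded last cycle */
    int nx[ROWS][COLS], ny[ROWS][COLS];
    memset(acc, 0, sizeof(int) * ROWS * COLS);
    for (int t = 0; t < K + ROWS + COLS - 2; t++) {
        for (int r = 0; r < ROWS; r++) {
            for (int c = 0; c < COLS; c++) {
                int xin = (r == 0) ? feed_top(B, c, t) : x[r - 1][c];
                int yin = (c == 0) ? feed_left(A, r, t) : y[r][c - 1];
                acc[r][c] += xin * yin;  /* "some code that uses x and y" */
                nx[r][c] = xin;          /* forward down on the next cycle */
                ny[r][c] = yin;          /* forward right on the next cycle */
            }
        }
        memcpy(x, nx, sizeof x);
        memcpy(y, ny, sizeof y);
    }
}
```

Because row r of A is delayed by r cycles and column c of B by c cycles, the matching operands A[r][k] and B[k][c] meet at PE (r, c) on cycle k + r + c, so after K + ROWS + COLS - 2 cycles each PE holds one element of the product.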
Optimized single-kernel pseudocode, in which fully unrolled loops instantiate all PEs inside one kernel:
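kernel void allPEs() {
  while(1){
    int c[ROWS], d[COLS];
    // ... fill c and d with this iteration's input data
    #pragma unroll
    for (int i = 0; i < ROWS; i++)
      #pragma unroll
      for (int j = 0; j < COLS; j++) {
        // each unrolled (i, j) iteration becomes one PE
        PE(c[i], d[j]);
      }
  }
}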
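A plain C emulation of allPEs() can help show how the loop nest maps onto PEs: each (i, j) iteration acts as one PE that receives c[i] and d[j]. The sketch below assumes, hypothetically, that each PE is a multiply-accumulate unit (pe_mac) and that each trip of the outer loop loads column k of a matrix A into c and row k of a matrix B into d, so the array accumulates the product of A and B as a sum of outer products. The names all_pes and pe_mac and the sizes ROWS = 2, COLS = 3, K = 4 are illustrative only. In the OpenCL kernel, #pragma unroll turns the inner nest into ROWS * COLS hardware PEs; here it simply runs sequentially.

```c
#include <string.h>

#define ROWS 2
#define COLS 3
#define K 4

/* Hypothetical PE body: multiply-accumulate into this PE's local result. */
static void pe_mac(int *acc, int ci, int dj) {
    *acc += ci * dj;
}

/* Software emulation of allPEs(): each trip of the k loop stands in for one
   iteration of while(1). c[i] is shared by every PE in row i and d[j] by
   every PE in column j. */
void all_pes(int A[ROWS][K], int B[K][COLS], int acc[ROWS][COLS]) {
    memset(acc, 0, sizeof(int) * ROWS * COLS);
    for (int k = 0; k < K; k++) {
        int c[ROWS], d[COLS];
        for (int i = 0; i < ROWS; i++) c[i] = A[i][k];  /* column k of A */
        for (int j = 0; j < COLS; j++) d[j] = B[k][j];  /* row k of B */
        /* In the kernel, this nest is fully unrolled into ROWS * COLS PEs. */
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                pe_mac(&acc[i][j], c[i], d[j]);
    }
}
```

In this form, the same c[i] value reaches every PE in row i (and d[j] every PE in column j) within a single iteration, so the PEs that share an operand are tied together by the same signal.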
Optimized pseudocode with the __fpga_reg() function:
kernel void allPEs() {
  int c[ROWS], d[COLS];
  // ... initialize c and d
  while(1){
    #pragma unroll
    for (int i = 0; i < ROWS; i++)
      #pragma unroll
      for (int j = 0; j < COLS; j++) {
        // compute and store outputs
        PE(c[i], d[j]);
        // register c[i] and d[j] before they reach the next PE
        c[i] = __fpga_reg(c[i]);
        d[j] = __fpga_reg(d[j]);
      }
  }
}
After the offline compiler unrolls the loops, the data paths for c and d contain one additional register between neighboring PEs. Because each register splits a long route into shorter segments, the Intel® Quartus® Prime Pro Edition software can place the PEs farther apart. You may add more than one register by nesting __fpga_reg() calls. For example, __fpga_reg(__fpga_reg(x)) adds two registers on the data path. However, excessive __fpga_reg() calls increase the design area, and the resulting routing congestion might degrade fMAX.
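The latency side of this trade-off can be sketched with a cycle-level C model of one row of the register chain. It assumes that depth stands for the number of nested __fpga_reg() calls per PE, that each register delays its value by exactly one clock cycle, and that PE j receives c[i] after depth * j registers, as in the unrolled loop of the __fpga_reg() pseudocode (PE 0 sees the unregistered value). The model counts only the registers between PE 0 and the last PE and ignores how the compiler actually schedules the kernel; simulate_row and MAX_STAGES are hypothetical names.

```c
#include <string.h>

#define MAX_STAGES 64

/* Cycle-level model of one row of registered c[i] values.
   regs[0] is the unregistered input; regs[n] is the output of the n-th register.
   PE j taps regs[depth * j]. first_seen[j] receives the cycle on which PE j
   first sees a token injected on cycle 0. Returns the number of registers
   modeled, or -1 if the arguments are out of range. */
int simulate_row(int cols, int depth, int first_seen[]) {
    if (cols < 1 || depth < 0 || depth * (cols - 1) > MAX_STAGES)
        return -1;
    int stages = depth * (cols - 1);
    int regs[MAX_STAGES + 1];
    memset(regs, 0, sizeof regs);
    for (int j = 0; j < cols; j++)
        first_seen[j] = -1;
    for (int t = 0; t <= stages; t++) {
        regs[0] = (t == 0);                      /* inject one token on cycle 0 */
        for (int j = 0; j < cols; j++)           /* each PE samples its tap */
            if (first_seen[j] < 0 && regs[depth * j])
                first_seen[j] = t;
        for (int n = stages; n > 0; n--)         /* clock edge: shift the chain */
            regs[n] = regs[n - 1];
    }
    return stages;
}
```

Under this model, going from one to two registers per PE doubles both the register count per row (for example, 3 to 6 registers for four PEs) and the number of cycles a value needs to reach the last PE.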