Kernel Variable Accesses
This section shows techniques you can use to optimize local and private variables in kernels.
Inferring a Shift Register
The shift register design pattern is essential for implementing many applications efficiently on the FPGA. However, its implementation might seem counterintuitive at first.
Consider the following code example:
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
using namespace sycl;

using InPipe = ext::intel::pipe<class PipeIn, int, 4>;
using OutPipe = ext::intel::pipe<class PipeOut, int, 4>;

// Shift register size must be statically determinable
#define SIZE 512

// This function is used in a kernel
void foo()
{
  // The key is that the array size is a compile-time constant
  int shift_reg[SIZE];

  // Initialization loop
  #pragma unroll
  for (int i = 0; i < SIZE; i++)
  {
    // All elements of the array should be initialized to the same value
    shift_reg[i] = 0;
  }

  while (1) {
    // Fully unrolling the shifting loop produces constant accesses
    #pragma unroll
    for (int j = 0; j < SIZE - 1; j++)
    {
      shift_reg[j] = shift_reg[j + 1];
    }
    // Shift the newest input value into the last position
    shift_reg[SIZE - 1] = InPipe::read();

    // Using fixed access points of the shift register
    int res = (shift_reg[0] + shift_reg[1]) / 2;

    // OutPipe receives the average of the two oldest values in the register
    OutPipe::write(res);
  }
}
In each clock cycle, the kernel shifts a new value into the array. By placing this shift register into a block RAM, the Intel® oneAPI DPC++/C++ Compiler can efficiently handle multiple access points into the array. The shift register design pattern is ideal for implementing filters (for example, image filters like a Sobel filter or time-delay filters like a finite impulse response (FIR) filter).
When implementing a shift register in your kernel code, remember the following key points:
- Unroll the shifting loop so that it can access every element of the array.
- All access points must have constant data accesses. For example, if you write a calculation in nested loops using multiple access points, unroll these loops to establish the constant access points.
- Initialize all elements of the array to the same value. Alternatively, you may leave the elements uninitialized if you do not require a specific initial value.
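The key points above can be illustrated with a small FIR-style filter. The following is a minimal host-side sketch, not kernel code: it is plain C++ so that it can run without an FPGA toolchain, and `std::vector` arguments stand in for the input and output pipes. The names `fir_model`, `kTaps`, and `kCoeffs`, as well as the coefficient values, are hypothetical and chosen only for illustration. In an actual kernel, the loops marked in the comments would carry `#pragma unroll`.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical tap count and coefficients, for illustration only.
constexpr int kTaps = 4;
constexpr std::array<int, kTaps> kCoeffs = {1, 2, 3, 4};

// Host-side model of a shift-register filter. The vectors stand in for
// the input and output pipes that a real kernel would use.
std::vector<int> fir_model(const std::vector<int>& input) {
  // The array size is a compile-time constant.
  int shift_reg[kTaps];

  // Initialize every element to the same value.
  // (In a kernel: #pragma unroll on this loop.)
  for (int i = 0; i < kTaps; i++) {
    shift_reg[i] = 0;
  }

  std::vector<int> output;
  output.reserve(input.size());

  for (std::size_t n = 0; n < input.size(); n++) {
    // Shifting loop with a fixed trip count, so unrolling it touches
    // every element at a constant index.
    // (In a kernel: #pragma unroll on this loop.)
    for (int j = 0; j < kTaps - 1; j++) {
      shift_reg[j] = shift_reg[j + 1];
    }
    shift_reg[kTaps - 1] = input[n];

    // Tap loop with a fixed trip count: once unrolled, shift_reg[0]
    // through shift_reg[kTaps - 1] are all constant access points.
    // (In a kernel: #pragma unroll on this loop.)
    int acc = 0;
    for (int k = 0; k < kTaps; k++) {
      acc += kCoeffs[k] * shift_reg[k];
    }
    output.push_back(acc);
  }
  return output;
}
```

In this model the newest sample sits at the highest index, so it is weighted by the last coefficient. Feeding in a single 1 followed by zeros therefore returns the coefficients in reverse order.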
Memory Access Considerations
Intel® recommends the following kernel programming strategies that can improve memory access efficiency and reduce area use of your kernel:
- Minimize the number of access points to external memory to reduce area. The compiler infers a load-store unit (LSU) for each access point in your kernel, and each LSU consumes area. If possible, structure your kernel such that it reads its input from one location, processes the data internally, and then writes the output to another location.
- Instead of relying on local or global memory accesses, structure your kernel as a single work-item with shift register inference whenever possible.
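To make the two recommendations concrete, here is a minimal host-side sketch that compares two ways of computing a three-element sliding sum. It is plain C++ rather than kernel code, and the names `window_sum_multi_read`, `window_sum_single_read`, and `kWindow` are hypothetical. The input vector plays the role of external memory. The first function reads it at three different sites per output. The second reads each element exactly once and keeps the window in a small shift register.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical window width, for illustration only.
constexpr int kWindow = 3;

// Variant with three read sites into "external memory" per output.
// The three indices are written out by hand and assume kWindow == 3.
std::vector<int> window_sum_multi_read(const std::vector<int>& in) {
  std::vector<int> out;
  for (std::size_t i = 0; i + kWindow <= in.size(); i++) {
    out.push_back(in[i] + in[i + 1] + in[i + 2]);
  }
  return out;
}

// Variant with a single read site: each input element is read once,
// shifted into a local window, and reused from there.
std::vector<int> window_sum_single_read(const std::vector<int>& in) {
  int window[kWindow] = {};  // all elements start at the same value (0)
  std::vector<int> out;
  for (std::size_t i = 0; i < in.size(); i++) {
    // Fixed-trip-count shift (in a kernel: #pragma unroll).
    for (int j = 0; j < kWindow - 1; j++) {
      window[j] = window[j + 1];
    }
    window[kWindow - 1] = in[i];  // the only read of the input

    // Emit a result once the window holds kWindow real samples.
    if (i + 1 >= static_cast<std::size_t>(kWindow)) {
      int sum = 0;
      // Fixed-trip-count reduction (in a kernel: #pragma unroll).
      for (int k = 0; k < kWindow; k++) {
        sum += window[k];
      }
      out.push_back(sum);  // the only write of the output
    }
  }
  return out;
}
```

Both functions return the same values. The difference is in the access pattern: following the guidance above, the single-read variant would be expected to need fewer LSUs in a kernel, although the actual area saving depends on the compiler and the target device.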