Visible to Intel only — GUID: heb1521224567785
Ixiasoft
Visible to Intel only — GUID: heb1521224567785
Ixiasoft
5.10.1. Inferring a Shift Register
Consider the following code example:
channel int in, out;
#define SIZE 512
//Shift register size must be statically determinable
__kernel void foo()
{
int shift_reg[SIZE];
//The key is that the array size is a compile time constant
// Initialization loop
#pragma unroll
for (int i=0; i < SIZE; i++)
{
//All elements of the array should be initialized to the same value
shift_reg[i] = 0;
}
while(1)
{
// Fully unrolling the shifting loop produces constant accesses
#pragma unroll
for (int j=0; j < SIZE–1; j++)
{
shift_reg[j] = shift_reg[j + 1];
}
shift_reg[SIZE – 1] = read_channel_intel(in);
// Using fixed access points of the shift register
int res = (shift_reg[0] + shift_reg[1]) / 2;
// ‘out’ channel will have running average of the input channel
write_channel_intel(out, res);
}
}
In each clock cycle, the kernel shifts a new value into the array. By placing this shift register into a block RAM, the Intel® FPGA SDK for OpenCL™ Offline Compiler can efficiently handle multiple access points into the array. The shift register design pattern is ideal for implementing filters (for example, image filters like a Sobel filter or time-delay filters like a finite impulse response (FIR) filter).
When implementing a shift register in your kernel code, keep in mind the following key points:
- Unroll the shifting loop so that it can access every element of the array.
- All access points must have constant data accesses. For example, if you write a calculation in nested loops using multiple access points, unroll these loops to establish the constant access points.
- Initialize all elements of the array to the same value. Alternatively, you may leave the elements uninitialized if you do not require a specific initial value.
- If some accesses to a large array are not inferable statically, they force the offline compiler to create inefficient hardware. If these accesses are necessary, use __local memory instead of __private memory.
- Do not shift a large shift register conditionally. The shifting must occur in very loop iteration that contains the shifting code to avoid creating inefficient hardware.