Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/19/2022
Public
Document Table of Contents

5.10.1. Inferring a Shift Register

The shift register design pattern is a very important design pattern for efficient implementation of many applications on the FPGA. However, the implementation of a shift register design pattern might seem counterintuitive at first.

Consider the following code example:

channel int in, out;

#define SIZE 512
//Shift register size must be statically determinable

__kernel void foo()
{
    int shift_reg[SIZE];
	   //The key is that the array size is a compile time constant

    // Initialization loop
    #pragma unroll
	for (int i=0; i < SIZE; i++)
	{
        //All elements of the array should be initialized to the same value
	    shift_reg[i] = 0;
    }
	
	while(1)
    {
        // Fully unrolling the shifting loop produces constant accesses
        #pragma unroll
        for (int j=0; j < SIZE–1; j++)
	    {
            shift_reg[j] = shift_reg[j + 1];
        } 
        shift_reg[SIZE – 1] = read_channel_intel(in);

        // Using fixed access points of the shift register
        int res = (shift_reg[0] + shift_reg[1]) / 2;

        // ‘out’ channel will have running average of the input channel
        write_channel_intel(out, res);
	}
}

In each clock cycle, the kernel shifts a new value into the array. By placing this shift register into a block RAM, the Intel® FPGA SDK for OpenCL™ Offline Compiler can efficiently handle multiple access points into the array. The shift register design pattern is ideal for implementing filters (for example, image filters like a Sobel filter or time-delay filters like a finite impulse response (FIR) filter).

When implementing a shift register in your kernel code, keep in mind the following key points:

  1. Unroll the shifting loop so that it can access every element of the array.
  2. All access points must have constant data accesses. For example, if you write a calculation in nested loops using multiple access points, unroll these loops to establish the constant access points.
  3. Initialize all elements of the array to the same value. Alternatively, you may leave the elements uninitialized if you do not require a specific initial value.
  4. If some accesses to a large array are not inferable statically, they force the offline compiler to create inefficient hardware. If these accesses are necessary, use __local memory instead of __private memory.
  5. Do not shift a large shift register conditionally. The shifting must occur in very loop iteration that contains the shifting code to avoid creating inefficient hardware.