Developer Guide

Intel oneAPI FPGA Handbook

ID 785441
Date 2/07/2024
Public


Kernel Variable Accesses

This section shows techniques you can use to optimize local and private variables in kernels.

Inferring a Shift Register

The shift register design pattern is central to implementing many applications efficiently on the FPGA. However, how you write a shift register in kernel code might seem counterintuitive at first.

Consider the following code example:

#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>

using namespace sycl;

using InPipe = ext::intel::pipe<class PipeIn, int, 4>;
using OutPipe = ext::intel::pipe<class PipeOut, int, 4>;

// The shift register size must be a compile-time constant
#define SIZE 512

// This function is called from within a kernel
void foo()
{
  int shift_reg[SIZE];

  // Initialization loop
  #pragma unroll
  for (int i = 0; i < SIZE; i++)
  {
    // Initialize all elements of the array to the same value
    shift_reg[i] = 0;
  }

  while (1)
  {
    // Fully unrolling the shifting loop produces constant accesses
    #pragma unroll
    for (int j = 0; j < SIZE - 1; j++)
    {
      shift_reg[j] = shift_reg[j + 1];
    }

    // Shift the newest value in at the end of the array
    shift_reg[SIZE - 1] = InPipe::read();

    // Use fixed access points of the shift register
    int res = (shift_reg[0] + shift_reg[1]) / 2;

    // OutPipe receives a two-sample running average of the input pipe
    OutPipe::write(res);
  }
}

In each clock cycle, the kernel shifts a new value into the array. By placing this shift register into a block RAM, the Intel® oneAPI DPC++/C++ Compiler can efficiently handle multiple access points into the array. The shift register design pattern is ideal for implementing filters (for example, image filters like a Sobel filter or time-delay filters like a finite impulse response (FIR) filter).
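To make the FIR filter case concrete, the following is a minimal host-side sketch in plain C++ that models the behavior of a shift-register filter. It is not a SYCL kernel and is not taken from this guide: the names fir_filter, kTaps, and kCoeffs, and the coefficient values, are hypothetical and chosen only for illustration. In kernel code, the two inner loops would be fully unrolled so that every access to the array uses a constant index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tap count and coefficients, chosen only for illustration.
constexpr std::size_t kTaps = 4;
constexpr std::array<int, kTaps> kCoeffs = {1, 2, 2, 1};

// Host-side model of a FIR filter built on a shift register.
// Produces one output sample per input sample.
std::vector<int> fir_filter(const std::vector<int>& input) {
  int shift_reg[kTaps] = {};  // every element starts at the same value (0)
  std::vector<int> output;
  output.reserve(input.size());

  for (int sample : input) {
    // Shift every element one position toward index 0.
    // The trip count is a compile-time constant, so a kernel version
    // of this loop could be fully unrolled.
    for (std::size_t j = 0; j < kTaps - 1; j++) {
      shift_reg[j] = shift_reg[j + 1];
    }
    shift_reg[kTaps - 1] = sample;  // newest sample enters at the end

    // Fixed access points: coefficient k multiplies the sample that
    // arrived k iterations ago.
    int acc = 0;
    for (std::size_t k = 0; k < kTaps; k++) {
      acc += kCoeffs[k] * shift_reg[kTaps - 1 - k];
    }
    output.push_back(acc);
  }

  assert(output.size() == input.size());
  return output;
}
```

Feeding this model a unit impulse (a single 1 followed by zeros) returns the coefficients in order, which is a quick way to check that the tap positions are wired as intended.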

When implementing a shift register in your kernel code, remember the following key points:

  • Unroll the shifting loop so that it can access every element of the array.
  • All access points must have constant data accesses. For example, if you write a calculation in nested loops using multiple access points, unroll these loops to establish the constant access points.
  • Initialize all elements of the array to the same value. Alternatively, you may leave the elements uninitialized if you do not require a specific initial value.
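The second point above, about nested loops with multiple access points, can be sketched as a 3x3 sliding window over a streamed image. This is a plain C++ host-side model under stated assumptions, not a kernel from this guide: window_sums, kWidth, and kRegSize are hypothetical names, and the image width is fixed at compile time so that every index r * kWidth + c is a constant once the nested loops are unrolled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical image width, chosen only for illustration.
constexpr std::size_t kWidth = 4;
// Two full rows plus three pixels cover every tap of a 3x3 window.
constexpr std::size_t kRegSize = 2 * kWidth + 3;

// Host-side model: streams pixels in row-major order and returns, for each
// pixel, the sum of the 3x3 window whose bottom-right corner is that pixel.
// Outputs for the first two rows and the first two columns mix in initial
// zeros or pixels wrapped from the previous row, so only positions with
// row >= 2 and column >= 2 are meaningful.
std::vector<int> window_sums(const std::vector<int>& pixels) {
  int shift_reg[kRegSize] = {};  // every element starts at the same value (0)
  std::vector<int> sums;
  sums.reserve(pixels.size());

  for (int pixel : pixels) {
    // Shifting loop with a constant trip count (unrolled in a kernel).
    for (std::size_t j = 0; j < kRegSize - 1; j++) {
      shift_reg[j] = shift_reg[j + 1];
    }
    shift_reg[kRegSize - 1] = pixel;

    // Nested loops over the window. Both bounds are compile-time constants,
    // so unrolling them leaves nine fixed access points.
    int sum = 0;
    for (std::size_t r = 0; r < 3; r++) {
      for (std::size_t c = 0; c < 3; c++) {
        sum += shift_reg[r * kWidth + c];
      }
    }
    sums.push_back(sum);
  }

  assert(sums.size() == pixels.size());
  return sums;
}
```

If the window loops used a bound or offset known only at run time, the access points would no longer be constant, which is the situation the key points above warn against.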

Memory Access Considerations

Intel® recommends the following kernel programming strategies that can improve memory access efficiency and reduce area use of your kernel:

  • Minimize the number of access points to external memory to reduce area. The compiler infers a load-store unit (LSU) for each access point in your kernel, and each LSU consumes area.

    If possible, structure your kernel such that it reads its input from one location, processes the data internally, and then writes the output to another location.

  • Instead of relying on local or global memory accesses, structure your kernel as a single work-item with shift register inference whenever possible.
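As a rough illustration of these strategies, the following plain C++ host-side sketch compares two ways to compute a three-element moving sum. It is a software model under stated assumptions, not kernel code from this guide: Result, moving_sum_multi_read, moving_sum_single_read, and kWindow are hypothetical names, and the read counter only stands in for the number of accesses to the input buffer. It does not measure LSUs or FPGA area.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWindow = 3;  // hypothetical window length

struct Result {
  std::vector<int> out;
  std::size_t reads;  // number of reads from the input buffer
};

// Reads the input buffer at several different offsets for each output.
// In a kernel, each distinct offset would be a separate access point.
Result moving_sum_multi_read(const std::vector<int>& in) {
  Result r{{}, 0};
  for (std::size_t i = 0; i < in.size(); i++) {
    int sum = 0;
    for (std::size_t k = 0; k < kWindow; k++) {
      if (i >= k) {  // treat samples before the start as zero
        sum += in[i - k];
        r.reads++;
      }
    }
    r.out.push_back(sum);
  }
  assert(r.out.size() == in.size());
  return r;
}

// Reads each input element exactly once, keeps recent values in a shift
// register, and writes each output exactly once.
Result moving_sum_single_read(const std::vector<int>& in) {
  Result r{{}, 0};
  int shift_reg[kWindow] = {};  // every element starts at the same value (0)
  for (std::size_t i = 0; i < in.size(); i++) {
    // Shifting loop with a constant trip count (unrolled in a kernel).
    for (std::size_t j = 0; j < kWindow - 1; j++) {
      shift_reg[j] = shift_reg[j + 1];
    }
    shift_reg[kWindow - 1] = in[i];  // the only read of the input buffer
    r.reads++;

    int sum = 0;
    for (std::size_t k = 0; k < kWindow; k++) {
      sum += shift_reg[k];
    }
    r.out.push_back(sum);
  }
  assert(r.out.size() == in.size());
  return r;
}
```

Both functions return the same sums, but the shift-register version touches the input buffer once per element, which corresponds to reading from one location, processing internally, and writing to another.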