4.1.3. Optimizing Buffer Inference for Channels or Pipes

Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

Download PDF

ID 683521

Date 3/28/2022

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

4.1.3. Optimizing Buffer Inference for Channels or Pipes

In addition to the manual addition of buffered channels or pipes, the Intel® FPGA SDK for OpenCL™ Offline Compiler improves kernel throughput by adjusting buffer sizes whenever possible.

During compilation, the offline compiler computes scheduling mismatches between interacting channels or pipes. These mismatches might cause imbalances between read and write operations. The offline compiler performs buffer inference optimization automatically to correct the imbalance.

Consider the following examples:

Kernels with Channels Kernels with Pipes

Table 16. Buffer Inference Optimization for Channels and Pipes
Kernels with Channels	Kernels with Pipes
`__kernel void producer ( __global const uint * restrict src, const uint iterations) { for(int i = 0; i < iteration; i++) { write_channel_intel(c0,src[2i]); write_channel_intel(c1,src[2i+1]); } } __kernel void consumer ( __global uint * restrict dst, const uint iterations) { for(int i = 0; i < iterations; i++) { dst[2i] = read_channel_intel(c0); dst[2i+1] = read_channel_intel(c1); } }`	__kernel void producer ( __global const uint * restrict src, const uint iterations, write_only pipe uint __attribute__((blocking)) c0, write_only pipe uint __attribute__((blocking)) c1) { for(int i = 0; i < iteration; i++) { write_pipe(c0,&src[2i]); write_pipe(c1,&src[2i+1]); } } __kernel void consumer ( __global uint * restrict dst, const uint iterations, read_only pipe uint __attribute__((blocking)) c0, read_only pipe uint __attribute__((blocking)) c1) { for(int i = 0; i < iterations; i++) { read_pipe(c0,&dst[2i]); read_pipe(c1,&dst[2i+1]); } }

__kernel void producer (
  __global const uint * restrict src,
  const uint iterations)
{
  for(int i = 0; i < iteration; i++)
  {
    write_channel_intel(c0,src[2*i]);
    write_channel_intel(c1,src[2*i+1]);
  }
}

__kernel void consumer (
  __global uint * restrict dst,
  const uint iterations)
{
  for(int i = 0; i < iterations; i++)
  {
    dst[2*i] = read_channel_intel(c0);
    dst[2*i+1] = read_channel_intel(c1);
  }
}

__kernel void producer (
  __global const uint * restrict src,
  const uint iterations,
  write_only pipe uint
    __attribute__((blocking)) c0,
  write_only pipe uint
    __attribute__((blocking)) c1)
{
  for(int i = 0; i < iteration; i++)
  {
    write_pipe(c0,&src[2*i]);
    write_pipe(c1,&src[2*i+1]);
  }
}

__kernel void consumer (
  __global uint * restrict dst,
  const uint iterations,
  read_only pipe uint
    __attribute__((blocking)) c0,
  read_only pipe uint
    __attribute__((blocking)) c1)
{
  for(int i = 0; i < iterations; i++)
  {
    read_pipe(c0,&dst[2*i]);
    read_pipe(c1,&dst[2*i+1]);
  }
}

The offline compiler performs buffer inference optimization if channels or pipes between kernels cannot form a cycle. A cycle between kernels is a path that originates from a kernel, through a write channel or a write pipe call, and returns to the original kernel. For the example, assume that the write channel or write pipe calls in the kernel producer are scheduled 10 cycles apart and the read channel or read pipe calls are scheduled 15 cycles apart. There exists a temporary mismatch in the read and write operations to c1 because five extra write operations might occur before a read operation to c1 occurs. To correct this imbalance, the offline compiler assigns a buffer size of five cycles to c1 to avoid stalls. The extra buffer capacity decouples the c1 write operations in the producer kernel and the c1 read operations in the consumer kernel.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

4.1.3. Optimizing Buffer Inference for Channels or Pipes