Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/13/2021
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

5.5.5.5. Implementing Buffered Pipes Using the depth Attribute

You may have buffered or unbuffered pipes in your kernel program. If there are imbalances in pipe read and write operations, create buffered pipes to prevent kernel stalls by including the depth attribute in your pipe declaration. Buffered pipes decouple the operation of concurrent work-items executing in different kernels.

You may use a buffered pipe to control data traffic, such as limiting throughput or synchronizing accesses to shared memory. In an unbuffered pipe, a write operation can only proceed when the read operation is expecting to read data. Use unbuffered pipes in conjunction with blocking read and write behaviors in kernels that execute concurrently. The unbuffered pipes provide self-synchronizing data transfers efficiently.

In a buffered pipe, a write operation can only proceed if there is capacity in the pipe to hold the incoming packet. A read operation can only proceed if there is at least one packet in the pipe.

Use buffered pipes if pipe calls are predicated differently in the writer and reader kernels, and the kernels do not execute concurrently.

If you expect any temporary mismatch between the consumption rate and the production rate to the pipe, set the buffer size using the depth attribute.
The following example demonstrates the use of the depth attribute in kernel code that implements the OpenCL pipes. The depth(N) attribute specifies the minimum depth of a buffered pipe, where N is the number of data values. If the read and write kernels specify different depths for a given buffered pipe, the Intel® FPGA SDK for OpenCL™ Offline Compiler uses the larger depth of the two.
__kernel void
producer (__global int *in_data,
          write_only pipe int __attribute__((blocking))
		                    __attribute__((depth(10))) c)
{ 
    for (i = 0; i < N; i++)
    {
        if (in_data[i])
        {
            write_pipe( c, &in_data[i] );
        }
    }
}

__kernel void
consumer (__global int *check_data,
          __global int *out_data,
          read_only pipe int __attribute__((blocking)) c ) 
{
    int last_val = 0;
    for (i = 0; i < N; i++)
    {
        if (check_data[i])
        {
            read_pipe( c, &last_val );
        }
        out_data[i] = last_val;
    }
}

In this example, the write operation can write ten data values to the pipe successfully. After the pipe is full, the write kernel returns failure until a read kernel consumes some of the data in the pipe.

Because the pipe read and write calls are conditional statements, the pipe might experience an imbalance between read and write calls. You may add a buffer capacity to the pipe to ensure that the producer and consumer kernels are decoupled. This step is particularly important if the producer kernel is writing data to the pipe when the consumer kernel is not reading from it.