Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 10/04/2021
Public

11.2.1. Double Buffered Host Application Utilizing Kernel Invocation Queue

Double buffering in an OpenCL host application allows the OpenCL runtime environment to coalesce memory transfers and kernel execution, so that data for one invocation is transferred while another invocation is executing.

To utilize the hardware kernel invocation queue while double buffering, write your host code as shown in the following code snippet:

int main()
{	…
  cl_event dependencies[2];
  for (int i = 0; i < MAX_ITERATIONS; i++) {
    if (i < 2) {
      // First use of each device buffer pair: the write has no dependencies.
      clEnqueueWriteBuffer(writeQ, inputBufferD[i%2], CL_FALSE, …, inputBufferH[i], 0, NULL, &writeEvent[i]);
      clFlush(writeQ);
      // A cl_mem kernel argument is passed with size sizeof(cl_mem).
      clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputBufferD[i%2]);
      clSetKernelArg(kernel, 1, sizeof(cl_mem), &outputBufferD[i%2]);
      // The kernel waits only for its own input write.
      clEnqueueNDRangeKernel(kernelQ, kernel, …, 1, &writeEvent[i], &kernelEvent[i]);
      clFlush(kernelQ);
    } else {
      // Reusing input buffer i%2: wait until kernel i-2 has finished with it.
      clEnqueueWriteBuffer(writeQ, inputBufferD[i%2], CL_FALSE, …, inputBufferH[i], 1, &kernelEvent[i-2], &writeEvent[i]);
      clFlush(writeQ);
      // The kernel waits for its input write and for read i-2,
      // which frees output buffer i%2.
      dependencies[0] = writeEvent[i];
      dependencies[1] = readEvent[i-2];
      clSetKernelArg(kernel, 0, sizeof(cl_mem), &inputBufferD[i%2]);
      clSetKernelArg(kernel, 1, sizeof(cl_mem), &outputBufferD[i%2]);
      clEnqueueNDRangeKernel(kernelQ, kernel, …, 2, dependencies, &kernelEvent[i]);
      clFlush(kernelQ);
    }
    // Read the result back once kernel i has completed.
    clEnqueueReadBuffer(readQ, outputBufferD[i%2], CL_FALSE, …, outputBufferH[i], 1, &kernelEvent[i], &readEvent[i]);
    clFlush(readQ);
  }
  …
}

The following diagram helps you visualize the event dependencies:

Note: Each arrow represents the source of an event in a command's event wait list.
Figure 93. Event Dependency Graph
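
The wait lists built by the host code snippet can also be tabulated without an OpenCL device. The following is a minimal sketch derived only from that snippet, not from the SDK; the names cmd_kind, event_id, wait_list, and print_wait_lists are hypothetical and introduced purely for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* The three commands enqueued per loop iteration in the host snippet. */
typedef enum { CMD_WRITE, CMD_KERNEL, CMD_READ } cmd_kind;

/* An event, identified by the command kind and the iteration that created it. */
typedef struct { cmd_kind kind; int iter; } event_id;

/* Fills deps[] with the events that command `kind` of iteration `i` waits on,
 * mirroring the event wait lists in the snippet. Returns the count (0 to 2). */
int wait_list(cmd_kind kind, int i, event_id deps[2])
{
    int n = 0;
    assert(i >= 0);
    switch (kind) {
    case CMD_WRITE:
        /* Input buffer i%2 is reused: wait for the kernel that last read it. */
        if (i >= 2) deps[n++] = (event_id){ CMD_KERNEL, i - 2 };
        break;
    case CMD_KERNEL:
        /* Wait for this iteration's input write ... */
        deps[n++] = (event_id){ CMD_WRITE, i };
        /* ... and for the read that drained output buffer i%2. */
        if (i >= 2) deps[n++] = (event_id){ CMD_READ, i - 2 };
        break;
    case CMD_READ:
        /* The result can be read back only after kernel i completes. */
        deps[n++] = (event_id){ CMD_KERNEL, i };
        break;
    }
    return n;
}

static const char *cmd_name(cmd_kind k)
{
    switch (k) {
    case CMD_WRITE:  return "write";
    case CMD_KERNEL: return "kernel";
    default:         return "read";
    }
}

/* Prints one line per command listing the events it waits on. */
void print_wait_lists(int iterations)
{
    static const cmd_kind kinds[] = { CMD_WRITE, CMD_KERNEL, CMD_READ };
    for (int i = 0; i < iterations; i++) {
        for (int k = 0; k < 3; k++) {
            event_id deps[2];
            int n = wait_list(kinds[k], i, deps);
            printf("%s[%d] waits on:", cmd_name(kinds[k]), i);
            for (int d = 0; d < n; d++)
                printf(" %s[%d]", cmd_name(deps[d].kind), deps[d].iter);
            printf("%s\n", n == 0 ? " (nothing)" : "");
        }
    }
}
```

Calling print_wait_lists(4) from a test program lists, for example, that kernel 2 waits on write 2 and read 0, which matches the two-entry dependencies array in the else branch of the snippet.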

The following figure illustrates the order in which the commands are executed on the device, assuming that kernel execution takes longer than the reads and writes and that the device supports concurrent reads and writes:

Figure 94. Order of Event Execution
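
The resulting schedule can be approximated with a small timing model. The sketch below is an illustration under stated assumptions, not a description of the runtime: it assumes that writeQ, kernelQ, and readQ are each in-order queues, that a command starts as soon as its queue predecessor and every event in its wait list have completed, and that each command kind has a fixed duration. The names span and simulate, and the durations passed to it, are hypothetical.

```c
#include <assert.h>

/* Start and end time of one command, in arbitrary ticks. */
typedef struct { long start, end; } span;

static long max2(long a, long b) { return a > b ? a : b; }

/* Simulates n iterations of the double-buffered loop.
 * t_write, t_kernel, t_read are assumed fixed durations in ticks.
 * wr[], kr[], rd[] must each have room for n entries. */
void simulate(int n, long t_write, long t_kernel, long t_read,
              span wr[], span kr[], span rd[])
{
    assert(n >= 0 && t_write >= 0 && t_kernel >= 0 && t_read >= 0);
    for (int i = 0; i < n; i++) {
        /* Write i: after write i-1 (in-order queue) and after kernel i-2
         * (event wait list, input buffer reuse). */
        long ready = (i > 0) ? wr[i - 1].end : 0;
        if (i >= 2) ready = max2(ready, kr[i - 2].end);
        wr[i].start = ready;
        wr[i].end = ready + t_write;

        /* Kernel i: after kernel i-1 (in-order queue), write i, and
         * read i-2 (event wait list, output buffer reuse). */
        ready = wr[i].end;
        if (i > 0) ready = max2(ready, kr[i - 1].end);
        if (i >= 2) ready = max2(ready, rd[i - 2].end);
        kr[i].start = ready;
        kr[i].end = ready + t_kernel;

        /* Read i: after read i-1 (in-order queue) and kernel i. */
        ready = kr[i].end;
        if (i > 0) ready = max2(ready, rd[i - 1].end);
        rd[i].start = ready;
        rd[i].end = ready + t_read;
    }
}
```

With a write and read of 1 tick each and a kernel of 4 ticks, this model starts every kernel at the moment the previous one ends, because each input write and the read two iterations back finish while an earlier kernel is still running. Under these assumptions the transfers are hidden behind kernel execution.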