Avoid Handling Edge Conditions in Kernels

OpenCL™ Developer Guide for Intel® Core™ and Intel® Xeon® Processors

Download PDF

ID 773005

Date 10/30/2018

Version 2018

Public

Visible to Intel only — GUID: GUID-CB77D598-349F-472A-B697-3F4DB51B4EF7

View Details

Avoid Handling Edge Conditions in Kernels

To understand how to avoid handling edge conditions in kernels, consider the following smoothing 2x2 filter:

__kernel void smooth(const __global float* input, 
                     __global float* output)
{
  const int myX = get_global_id(0);
  const int myY = get_global_id(1);
  const int image_width = get_global_size(0);
  uint neighbors = 1;
  float sum = 0.0f;
  if ((myX + 1) < (image_width-1))
  {
    sum += input[myY * image_width + (myX + 1)];
    ++neighbors;
  }
  if (myX > 0)
  {
    sum += input[myY * image_width + (myX - 1)];
    ++neighbors;
  }
  if ((myY + 1) < (image_height-1))
  {
    sum += input[(myY + 1) * image_width + myX];
    ++neighbors;
  }
  if (myY > 0)
  {
    sum += input[(myY – 1) * image_width + myX];
    ++neighbors;
  }
  sum += input[myY * image_width + myX];
  output[myY * image_width + myX] = sum / (float)neighbors;
}

Assume that you have a full HD image with size of 1920x1080 pixels. The four edge if conditions are tested for every pixel, that is, roughly two million times.

However, they are only relevant for the 6000 pixels on the image edges, which is 0.2% of the pixels. For the remaining 99.8% work-items, the edge condition check is a waste of time. See the following optimized code:

__kernel void smooth(const __global float* input, 
                     __global float* output)
{
  const int myX = get_global_id(0);
  const int myY = get_global_id(1);
  const int image_width = get_global_size(0);
  float sum = 0.0f;
  sum += input[myY * image_width + (myX + 1)];
  sum += input[myY * image_width + (myX - 1)];
  sum += input[(myY + 1) * image_width + myX];
  sum += input[(myY – 1) * image_width + myX];
  sum += input[myY * image_width + myX];
  output[myY * image_width + myX] = sum / 5.0f;
}

This code requires padding (enlarging) the input buffer appropriately, while using the original global size. This way querying the neighbors for the border pixels does not result in buffer overrun. When you process several frames this way, do this padding exactly once (which means you initially allocate frames of the proper size), instead of copying each frame to larger buffer before processing, and then copying the result back to smaller (original size) buffer.

If padding through larger input is not possible, make sure you use the min or max built-ins, so checking a work-item does not access outside the actual image adds only four lines. Consider the following example:

__kernel void smooth(const __global float* input, 
                     __global float* output)
{
  const int image_width = get_global_size(0);
  const int image_height = get_global_size(0);
  int myX = get_global_id(0);
  //since for myX== image_width–1 the (myX+1) is incorrect 
  myX =  min(myX, image_width -2);
  //since for myX==0 the (myX-1) is incorrect
  myX =  max(myX, 1);
  int myY = get_global_id(1);
  //since for myY== image_height-1 the (myY+1) is incorrect 
  myY =  min(myY, image_height -2);
  //since for myY==0 the (myY-1) is incorrect
  myY =  max(myY - 1, 0);
  float sum = 0.0f;
  sum += input[myY * image_width + (myX + 1)];
  sum += input[myY * image_width + (myX - 1)];
  sum += input[(myY + 1) * image_width + myX];
  sum += input[(myY – 1) * image_width + myX];
  sum += input[myY * image_width + myX];
  output[myY * image_width + myX] = sum / 5.0f;
}

At a cost of duplicating calculations for border work-items, this code avoids testing for the edge conditions, which is otherwise needed to perform for all work-items.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

OpenCL™ Developer Guide for Intel® Core™ and Intel® Xeon® Processors

Avoid Handling Edge Conditions in Kernels

See Also