Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 10/04/2021
Public

5.2.11. Specifying Work-Group Sizes

Specify a maximum or required work-group size whenever possible. The Intel® FPGA SDK for OpenCL™ Offline Compiler uses this specification to optimize the hardware usage of the OpenCL™ kernel without generating excess logic.

If you do not specify a max_work_group_size or a reqd_work_group_size attribute in your kernel, the work-group size takes a default value that depends on compile-time and runtime constraints.

  • If your kernel contains a barrier, the offline compiler sets a default maximum scalarized work-group size of 128 work-items (that is, the total number of work-items in a work-group, X × Y × Z, cannot exceed 128).
  • If your kernel does not query any OpenCL intrinsics that allow different work-items to behave differently (that is, local or global IDs, or the work-group ID), the offline compiler infers a single-threaded execution mode and sets the maximum work-group size to (1,1,1). In this case, the OpenCL runtime also enforces a global enqueue size of (1,1,1), and the offline compiler enables loop pipelining optimizations.

To specify the work-group size, add one of the following attributes to your kernel code:

  • To specify the maximum number of work-items that the offline compiler provisions for a work-group in a kernel, insert the max_work_group_size(X, Y, Z) attribute in your kernel source code.
    For example:
    __attribute__((max_work_group_size(512,1,1)))
    __kernel void sum (__global const float * restrict a,
                       __global const float * restrict b,
                       __global float * restrict answer)
    {
        size_t gid = get_global_id(0);
        answer[gid] = a[gid] + b[gid];
    }
  • To specify the required number of work-items that the offline compiler provisions for a work-group in a kernel, insert the reqd_work_group_size(X, Y, Z) attribute in your kernel source code. The host must then launch the kernel with exactly this work-group size.
    For example:
    __attribute__((reqd_work_group_size(64,1,1)))
    __kernel void sum (__global const float * restrict a,
                       __global const float * restrict b,
                       __global float * restrict answer)
    {
        size_t gid = get_global_id(0);
        answer[gid] = a[gid] + b[gid];
    }