Specify a Workgroup Size

Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853

Date 7/13/2023

Specify a maximum or a required work-group size whenever possible. The Intel® oneAPI DPC++/C++ Compiler uses this specification to size the hardware generated for the SYCL* kernel, so that it does not provision excess logic for work-group sizes you never launch.

If you do not specify the [[intel::max_work_group_size(Z, Y, X)]] or [[sycl::reqd_work_group_size(Z, Y, X)]] attribute in your kernel, the compiler assumes a default work-group size that depends on compile-time and runtime constraints:
If your kernel contains a barrier, the Intel® oneAPI DPC++/C++ Compiler sets a default maximum scalarized work-group size of 128 work-items.
If your kernel does not query any SYCL intrinsic that allows different threads to behave differently (that is, local or global thread IDs, or the work-group ID), the Intel® oneAPI DPC++/C++ Compiler infers a single-threaded execution mode and sets the maximum work-group size to (1, 1, 1). In this case, the SYCL runtime also enforces a global enqueue size of (1, 1, 1), and the compiler enables loop pipelining optimizations.
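
The default rules above can be summarized as a small decision function. This is only a sketch in plain C++ for reasoning about the rules, not compiler code: the KernelTraits struct, the DefaultRule enum, and both function names are hypothetical, and the order in which the checks are applied (for example, for a kernel that has a barrier but queries no IDs) is an assumption that the text does not state.

```cpp
#include <cstddef>

// Hypothetical summary of the kernel properties the text mentions.
struct KernelTraits {
  bool has_size_attribute;  // max_work_group_size or reqd_work_group_size is present
  bool uses_barrier;        // the kernel contains a barrier
  bool queries_item_ids;    // the kernel reads local/global IDs or the work-group ID
};

enum class DefaultRule {
  FromAttribute,            // the attribute decides, no default is needed
  SingleWorkItem,           // inferred single-threaded mode, size (1, 1, 1)
  BarrierLimit128,          // default maximum of 128 work-items
  CompilerAndRuntimeChosen  // depends on compile-time and runtime constraints
};

// Assumed check order: attribute first, then single-threaded inference,
// then the barrier default. The source text does not specify this order.
inline DefaultRule default_rule(const KernelTraits& k) {
  if (k.has_size_attribute) return DefaultRule::FromAttribute;
  if (!k.queries_item_ids) return DefaultRule::SingleWorkItem;
  if (k.uses_barrier) return DefaultRule::BarrierLimit128;
  return DefaultRule::CompilerAndRuntimeChosen;
}

// Returns the default maximum number of work-items per work-group,
// or 0 when the rules above do not fix a number.
inline std::size_t default_max_work_items(DefaultRule rule) {
  switch (rule) {
    case DefaultRule::SingleWorkItem:  return 1;
    case DefaultRule::BarrierLimit128: return 128;
    default:                           return 0;
  }
}
```

Under these assumptions, a kernel that uses it.get_global_id(0) and a barrier but carries no attribute falls under the 128 work-item default, which is the case where adding an explicit attribute is most likely to save area.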

Deprecation Notice:

The [[cl::reqd_work_group_size(Z, Y, X)]] attribute is deprecated. Use the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute.

To specify the work-group size, modify your kernel code in the following manner:

To specify the maximum number of work-items that the compiler provisions for a work-group in a kernel, insert the [[intel::max_work_group_size(Z, Y, X)]] attribute in your kernel source code.

For example:


constexpr unsigned MAX_WG_SIZE = 4;
...
// wg_size (defined in the elided code) must not exceed MAX_WG_SIZE.
cgh.parallel_for<class kernelCompute>(
  nd_range<1>(range<1>(N), range<1>(wg_size)),
  [=](nd_item<1> it)
  [[intel::max_work_group_size(1, 1, MAX_WG_SIZE)]] {
    // Each work-item doubles one input element.
    auto gid = it.get_global_id(0);
    accessorRes[gid] = accessorIdx[gid] * 2;
});
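
Because the attribute is a promise about how the kernel is launched, it can help to check the launch configuration on the host before enqueueing. The following is a minimal sketch in plain C++ (no SYCL headers), assuming the limit applies to each dimension separately and that the extents are listed in the attribute's (Z, Y, X) order. The Extents alias and both helper names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Work-group extents in the attribute's (Z, Y, X) order.
using Extents = std::array<std::size_t, 3>;

// True when every launched local extent is non-zero and within the
// declared maximum (assumption: the limit is checked per dimension).
inline bool fits_max_work_group_size(const Extents& local,
                                     const Extents& max_size) {
  for (std::size_t d = 0; d < local.size(); ++d) {
    if (local[d] == 0 || local[d] > max_size[d]) return false;
  }
  return true;
}

// Picks the largest usable 1-D work-group size that does not exceed
// the declared maximum.
inline std::size_t clamp_to_max(std::size_t requested, std::size_t max_size) {
  return requested < max_size ? requested : max_size;
}
```

For the example above, fits_max_work_group_size(Extents{1, 1, wg_size}, Extents{1, 1, MAX_WG_SIZE}) should hold before the kernel is submitted.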

To specify the required number of work-items that the Intel® oneAPI DPC++/C++ Compiler provisions for a work-group in a kernel, insert the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute in your kernel source code.

For example:


constexpr unsigned REQD_WG_SIZE = 4;
...
// wg_size (defined in the elided code) must equal REQD_WG_SIZE.
cgh.parallel_for<class kernelCompute>(
  nd_range<1>(range<1>(N), range<1>(wg_size)),
  [=](nd_item<1> it)
  [[sycl::reqd_work_group_size(1, 1, REQD_WG_SIZE)]] {
    // Each work-item doubles one input element.
    auto gid = it.get_global_id(0);
    accessorRes[gid] = accessorIdx[gid] * 2;
});
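
A required work-group size constrains the launch more tightly than a maximum: the local size has to match exactly, and an nd_range also needs a global size that is a whole multiple of the local size. The sketch below, in plain C++ with hypothetical helper names, shows one common way to satisfy both by rounding the global size up. It assumes the kernel then guards against the padded work-items (for example, with a gid < N check), which the example above does not do.

```cpp
#include <cstddef>

// True when the launched local size is exactly the required size.
inline bool matches_reqd_work_group_size(std::size_t local,
                                         std::size_t required) {
  return local == required;
}

// Rounds n up to the next multiple of wg (wg must be non-zero) so the
// global range divides evenly into work-groups. Work-items with an
// index >= n are padding and must be skipped inside the kernel.
inline std::size_t round_up_global(std::size_t n, std::size_t wg) {
  return ((n + wg - 1) / wg) * wg;
}
```

With REQD_WG_SIZE = 4, a problem size of 10 elements would be launched with a global size of round_up_global(10, 4), that is 12, and a local size of 4.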
