Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 3/31/2023
Public

Specify Number of SIMD Work Items

You can increase the data-processing efficiency of a SYCL kernel by executing multiple work-items in a single instruction, multiple data (SIMD) manner, without manually vectorizing your kernel code.

Specify the number of work-items within a work-group that the Intel® oneAPI DPC++/C++ Compiler should execute in a SIMD or vectorized manner.

Deprecation Notice:

The [[cl::reqd_work_group_size(Z, Y, X)]] attribute is deprecated. Use the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute.

To specify the number of SIMD work-items in a work-group, insert the [[intel::num_simd_work_items(N)]] attribute in the kernel source code. The supported values for size N are 2, 4, 8, and 16. Other sizes are accepted, but ignored (no vectorization occurs).
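Because unsupported sizes are accepted silently, a compile-time guard can help catch a value that would disable vectorization. The following is a minimal sketch in plain C++. The helper name is_supported_simd_width is hypothetical and is not part of SYCL or the Intel compiler; it only encodes the list of supported values stated above.

```cpp
#include <cstddef>

// Hypothetical helper (not a SYCL or Intel API): returns true when n is one
// of the SIMD sizes this guide lists as supported (2, 4, 8, or 16).
constexpr bool is_supported_simd_width(std::size_t n) {
  return n == 2 || n == 4 || n == 8 || n == 16;
}

// Illustrative value only; choose the width that suits your kernel.
constexpr std::size_t kNumSimdWorkItems = 8;

// Fails the build if the chosen width would be accepted but ignored.
static_assert(is_supported_simd_width(kNumSimdWorkItems),
              "SIMD width must be 2, 4, 8, or 16");
```

Placing the static_assert next to the constant that feeds the attribute keeps the check close to the value it protects.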

Consider the following example:

cgh.parallel_for<class kernelComputeSIMD>(
  nd_range<1>(range<1>(N), range<1>(REQD_WORK_GROUP_SIZE)),
  [=] (nd_item<1> it)
    [[intel::num_simd_work_items(NUM_SIMD_WORK_ITEMS),
    sycl::reqd_work_group_size(1, 1, REQD_WORK_GROUP_SIZE)]] {
      // Each work-item computes one element of the result.
      auto gid = it.get_global_id(0);
      accessorRes[gid] = sycl::sqrt(accessorIdx[gid]);
    });
NOTE:

Use the [[intel::num_simd_work_items(N)]] attribute together with the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute. The value N that you specify must evenly divide the last argument (X) that you specify to the reqd_work_group_size attribute.
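The divisibility requirement in the note can also be checked at compile time. This is a sketch under stated assumptions: simd_divides_work_group is a hypothetical helper, not an Intel or SYCL API, and the two constants are illustrative stand-ins for whatever values you pass to the attributes.

```cpp
#include <cstddef>

// Hypothetical helper (not a SYCL or Intel API): returns true when the SIMD
// width evenly divides the last reqd_work_group_size argument.
constexpr bool simd_divides_work_group(std::size_t simd_width,
                                       std::size_t last_dim) {
  return simd_width != 0 && last_dim % simd_width == 0;
}

// Illustrative values only; this guide does not define them.
constexpr std::size_t kReqdWorkGroupSize = 64;
constexpr std::size_t kSimdWidth = 8;

// Fails the build if the two attribute values are inconsistent.
static_assert(simd_divides_work_group(kSimdWidth, kReqdWorkGroupSize),
              "SIMD width must evenly divide the last work-group dimension");
```

With these example values, 64 is a multiple of 8, so the assertion holds. A work-group size of 60 with the same width would fail the build.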

For additional information about the [[sycl::reqd_work_group_size(Z, Y, X)]] attribute, refer to Specify a Workgroup Size.