DPCT1096

Intel® DPC++ Compatibility Tool Developer Guide and Reference

ID 768918

Date 11/07/2023

Message

The right-most dimension of the work-group used in the SYCL kernel that calls this function may be less than <value of kernel sub-group size attribute>. The function <helper function name> may return an unexpected result on the CPU device. Modify the size of the work-group to ensure that the value of the right-most dimension is a multiple of <value of kernel sub-group size attribute>.

Detailed Help

The dpct::select_from_sub_group, dpct::shift_sub_group_left, dpct::shift_sub_group_right, and dpct::permute_sub_group_by_xor helper functions may return unexpected results on a CPU device with an OpenCL™ backend if the right-most dimension of the work-group used in the SYCL* kernel that calls them is less than the value of the kernel sub-group size attribute. In that case, the actual sub-group size may differ from the value specified by the attribute, which can cause the helper functions to return unexpected results.

Adjust the code by changing the work-group size so that its right-most dimension is a multiple of the value of the kernel sub-group size attribute.

For example, this original CUDA* code:

  __global__ void kernel() {
      ...
      value = __shfl_down(x, delta);
      ...
  }

  int main() {
      ...
      auto GridSize = dim3(2);
      auto BlockSize = dim3(8, 8, 1);
      kernel<<<GridSize, BlockSize>>>();
      ...
  }

results in the following migrated SYCL code:

  void kernel(sycl::nd_item<3> item) {
      ...
      value = dpct::shift_sub_group_left(item.get_sub_group(), x, delta); // May return unexpected result on CPU
      ...
  }

  int main() {
      ...
      auto GridSize = sycl::range<3>(1, 1, 2);
      auto BlockSize = sycl::range<3>(1, 8, 8); // Problem: the right-most dimension (8) is less than the kernel sub-group size attribute value (32).
      queue.parallel_for(sycl::nd_range<3>(GridSize * BlockSize, BlockSize), [=](sycl::nd_item<3> item)[[intel::reqd_sub_group_size(32)]] {
          kernel(item);
      });
      ...
  }

which is rewritten to:

  void kernel(sycl::nd_item<3> item) {
      ...
      value = dpct::shift_sub_group_left(item.get_sub_group(), x, delta);
      ...
  }

  int main() {
      auto GridSize = sycl::range<3>(1, 1, 2);
      auto BlockSize = sycl::range<3>(1, 2, 32); // Fix: the work-group size is changed so that the right-most dimension is a multiple of the kernel sub-group size attribute value (32).
      queue.parallel_for(sycl::nd_range<3>(GridSize * BlockSize, BlockSize), [=](sycl::nd_item<3> item)[[intel::reqd_sub_group_size(32)]] {
          kernel(item);
      });
      ...
  }
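The rewrite above keeps 64 work-items per work-group and changes only the shape, from (1, 8, 8) to (1, 2, 32). A minimal sketch of that reshaping in plain C++ follows. The helper name reshape_work_group is hypothetical and is not part of dpct or SYCL. The sketch assumes the kernel does not depend on the original work-group shape; if it indexes data by the individual dimensions, that indexing needs to be reviewed as well.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of dpct or SYCL).
// Reshapes a work-group so that the right-most dimension equals the sub-group
// size while the total number of work-items stays the same.
// Dimensions follow the sycl::range<3> order, so index 2 is the right-most.
// Returns false (and leaves `reshaped` untouched) when the total number of
// work-items is not a multiple of the sub-group size, because no such shape exists.
inline bool reshape_work_group(const std::array<std::size_t, 3> &work_group,
                               std::size_t sub_group_size,
                               std::array<std::size_t, 3> &reshaped) {
  if (sub_group_size == 0) {
    return false;
  }
  const std::size_t total = work_group[0] * work_group[1] * work_group[2];
  if (total == 0 || total % sub_group_size != 0) {
    return false;
  }
  reshaped = {{1, total / sub_group_size, sub_group_size}};
  return true;
}
```

For the example above, passing {1, 8, 8} with a sub-group size of 32 yields {1, 2, 32}, which matches the BlockSize used in the rewritten code.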

Suggestions to Fix

If the program needs to run on a CPU device, adjust the work-group size so that its right-most dimension is a multiple of the value of the kernel sub-group size attribute.
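One way to find out whether a given launch configuration is affected is to test the condition from the message before submitting the kernel. The following is a minimal sketch in plain C++. The function name rightmost_dim_matches_sub_group is hypothetical and is not part of dpct or SYCL; the work-group is passed as three sizes in the same order as sycl::range<3>.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of dpct or SYCL).
// Returns true when the right-most work-group dimension is a non-zero
// multiple of the sub-group size requested by the kernel attribute.
// Dimensions follow the sycl::range<3> order, so index 2 is the right-most.
inline bool rightmost_dim_matches_sub_group(
    const std::array<std::size_t, 3> &work_group, std::size_t sub_group_size) {
  assert(sub_group_size > 0 && "sub-group size must be positive");
  const std::size_t rightmost = work_group[2];
  return rightmost != 0 && rightmost % sub_group_size == 0;
}
```

With a sub-group size of 32, the check fails for the migrated work-group (1, 8, 8) and passes for the rewritten work-group (1, 2, 32).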
