DPCT1116

Intel® DPC++ Compatibility Tool Developer Guide and Reference

Download PDF

ID 768918

Date 3/22/2024

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

DPCT1116

Message

The <original API> was migrated to <migrated API> for performance, and <expression> is computed by (UINT_MAX + 1) / ('<argument>' + 1). This migration requires the initial value of <value> to be scaled by multiplying <expression>, and any usage of value of <value> outside atomic function to be scaled by dividing <expression>.

Detailed Help

By default, the migration tool uses the Compare-and-Swap atomic operation to migrate the atomicInc and atomicDec functions. If the atomic API is in a critical path, performance is not good.

To improve the performance, the migration tool checks the special value of the second argument of atomicInc/atomicDec. If it is 2n – 1 (1 < n <= 32), then the tool uses the atomic add API to migrate atomicInc/atomicDec.

Using atomicInc(addr, val) as an example, the essence of optimization involves:

Migrate the atomicInc(addr, val) function to dpct::atomic_fetch_add(addr, step), where step is calculated by (UINT_MAX + 1) / (val + 1). This strategy exploits the overflow behavior of unsigned integers to mimic the wrapping behavior of atomicInc/Dec().
Update the initial value of *addr to be scaled by multiplying with step, ensuring the wrapping behavior is consistent.
Update the references to the value *addr outside atomic functions, dividing it by step to retrieve the intended value.

Suggestions to Fix

For example, this original CUDA* code:

__device__ unsigned int a1 = 5;
__global__ void kernel(unsigned int *result){
  ...
  unsigned int old_val = atomicInc(&a1, 0x7fffffff);
  ...
  *result = a1;
}

results in the following migrated SYCL* code:

dpct::global_memory<unsigned int, 0> a1(5 * 2);
void kernel(unsigned int*result, unsigned int &a1){
  ...
  /*
  DPCT1116:0: The atomicInc was migrated to dpct::atomic_fetch_add(&a1, 2) / 2 for performance, and 2 is computed by (UINT_MAX + 1) / ('0x7fffffff' + 1). This migration requires the initial value of 'a1' to be scaled by multiplying 2, and any usage of the value of 'a1' outside atomic function to be scaled by dividing 2.
  */
  unsigned int old_val = dpct::atomic_fetch_add<sycl::access::address_space::generic_space>(&a1, 2) / 2;
  ...
  *result = a1 / 2; // Dividing a1 by step 2 to get the intended value.
}

which is rewritten to:

dpct::global_memory<unsigned int, 0> a1(5 * 2);
void kernel(unsigned int*result, unsigned int &a1){
  ...
  unsigned int old_val = dpct::atomic_fetch_add<sycl::access::address_space::generic_space>(&a1, 2) / 2;
  ...
  *result = a1 / 2; // Dividing a1 by step 2 to get the intended value.
}

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Intel® DPC++ Compatibility Tool Developer Guide and Reference

DPCT1116

Message

Detailed Help

Suggestions to Fix