Visible to Intel only — GUID: GUID-A3009071-2EA4-40D3-BD7B-C9E50271D793
Visible to Intel only — GUID: GUID-A3009071-2EA4-40D3-BD7B-C9E50271D793
DPCT1115
Message
The sycl::ext::oneapi::group_local_memory_for_overwrite is used to allocate group-local memory at the none kernel functor scope of a work-group data parallel kernel. You may need to adjust the code.
Detailed Help
The sycl::ext::oneapi::group_local_memory_for_overwrite can be used to allocate group-local memory at the kernel functor scope of a work-group data parallel kernel. The restriction that group-local variables must be defined at kernel functor scope may be lifted in a future version of this extension.
Refer to sycl_ext_oneapi_local_memory.asciidoc for more details.
Suggestions to Fix
For example, this original CUDA* code:
template <int S> __device__ void devfun() { __shared__ int slm1[32 * S]; ... } template <int S> __global__ void kernel() { __shared__ int slm2[S]; devfun<S>(); } void hostfun() { kernel<256><<<1, 1>>>(); }
results in the following migrated SYCL* code:
template <int S> inline void devfun(int *p, const sycl::nd_item<3> &item_ct1) { /* DPCT1115:0: The sycl::ext::oneapi::group_local_memory_for_overwrite is used to allocate group-local memory at the none kernel functor scope of a work-group data parallel kernel. You may need to adjust the code. */ auto &slm1 = *sycl::ext::oneapi::group_local_memory_for_overwrite<int[32 * S]>(item_ct1.get_group()); ... } template <int S> __dpct_inline__ void kernel(const sycl::nd_item<3> &item_ct1) { auto &slm2 = *sycl::ext::oneapi::group_local_memory_for_overwrite<int[S]>(item_ct1.get_group()); devfun<S>(item_ct1); } void hostfun() { dpct::get_default_queue().parallel_for( sycl::nd_range<3>(sycl::range<3>(1, 1, 1), sycl::range<3>(1, 1, 1)), [=](sycl::nd_item<3> item_ct1) { kernel<256>(item_ct1); }); }
which is rewritten to:
template <int S> inline void devfun(int *slm1) { ... } template <int S> __dpct_inline__ void kernel(int *slm1, int *slm2) { devfun<S>(slm1); } void hostfun() { dpct::get_default_queue().submit( [&](sycl::handler &cgh) { sycl::local_accessor<int, 1> slm1_acc_ct1(sycl::range<1>(32 * 256), cgh); sycl::local_accessor<int, 1> slm2_acc_ct1(sycl::range<1>(256), cgh); cgh.parallel_for( sycl::nd_range<3>(sycl::range<3>(1, 1, 1), sycl::range<3>(1, 1, 1)), [=](sycl::nd_item<3> item_ct1) { kernel<256>(slm1_acc_ct1.get_pointer(), slm2_acc_ct1.get_pointer()); }); });