Intel® Inspector User Guide for Linux* OS

ID 767796
Date 10/31/2024
Public

Invalid Kernel Argument Size

Occurs when the total size of kernel arguments exceeds the number of registers available for argument storage.

ID   Code Location     Description
1    Allocation site   Represents the source location where arguments are passed from the host to a kernel.

By default, kernel arguments are promoted to registers when possible. The number of registers available for kernel arguments varies by device (usually 25-50%). Arguments that exceed this limit are cut off, which causes logic errors in the kernel.

In OpenCL™, arguments are specified directly with the clSetKernelArg function. In this case, the problem is easier to spot and may appear when a large number of arguments or wide structures are used.
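As a rough illustration of the accounting involved, the sketch below adds up the sizes that would be passed as the arg_size parameter of successive clSetKernelArg calls and compares the total with a limit. It is host-only C++ and does not call OpenCL. The names kAssumedArgLimit, total_arg_bytes, fits_limit, and WideParams are hypothetical, and the 2048-byte limit is an assumed placeholder rather than a value taken from any device. OpenCL reports a per-device maximum through clGetDeviceInfo with CL_DEVICE_MAX_PARAMETER_SIZE; whether that value matches the limit Intel® Inspector reports is not stated in this guide.

```cpp
#include <cstddef>
#include <initializer_list>

// ASSUMPTION: placeholder limit in bytes. The real limit is device-specific.
constexpr std::size_t kAssumedArgLimit = 2048;

// Adds up the byte sizes of all kernel arguments, that is, the values that
// would be passed as arg_size in successive clSetKernelArg calls.
constexpr std::size_t total_arg_bytes(std::initializer_list<std::size_t> sizes) {
    std::size_t total = 0;
    for (std::size_t s : sizes) {
        total += s;
    }
    return total;
}

// True when the combined argument size stays within the given limit.
constexpr bool fits_limit(std::size_t total, std::size_t limit) {
    return total <= limit;
}

// Hypothetical wide structure passed by value as a single kernel argument.
struct WideParams {
    double coeffs[512];
};
static_assert(sizeof(WideParams) == 512 * sizeof(double), "no padding expected");

// Passing the whole structure plus an int exceeds the assumed limit...
static_assert(!fits_limit(total_arg_bytes({sizeof(WideParams), sizeof(int)}),
                          kAssumedArgLimit),
              "wide structure should exceed the assumed limit");

// ...while passing a pointer-sized handle (a stand-in for cl_mem) and an int
// stays well below it.
static_assert(fits_limit(total_arg_bytes({sizeof(void*), sizeof(int)}),
                         kAssumedArgLimit),
              "handle-sized arguments should fit");
```

Because the checks are static_assert declarations, an oversized argument list is rejected at compile time in this sketch. A real program would need the actual device limit, which is only known at run time.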

In Data Parallel C++ (DPC++), arguments may be specified implicitly through the lambda capture list. If the list is specified as [=] or [&], every variable used in the lambda body is captured. In this case, a large number of arguments, wide structures, or classes may be passed to a kernel accidentally.
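The effect of a default capture on argument size can be approximated on the host by measuring the lambda's closure object, since objects captured by value are stored inside it. The sketch below is plain C++ rather than DPC++, and the helper names closure_bytes, implicit_capture_bytes, and pointer_capture_bytes are hypothetical. It assumes that the closure size is a reasonable proxy for what a device kernel would receive, which this guide does not confirm.

```cpp
#include <cstddef>

struct Data {
    double numbers[1000];
};

// Size in bytes of a lambda's closure object, which holds its captures.
template <typename F>
constexpr std::size_t closure_bytes(const F&) {
    return sizeof(F);
}

// [=] copies every variable the body uses. Using data.numbers pulls the
// whole Data object into the closure.
inline std::size_t implicit_capture_bytes() {
    Data data{};
    auto kernel = [=](std::size_t i) { return data.numbers[i]; };
    return closure_bytes(kernel);
}

// An explicit capture of a pointer stores only the pointer. In a real DPC++
// kernel the pointer would have to refer to device-accessible memory.
inline std::size_t pointer_capture_bytes() {
    Data data{};
    const double* numbers = data.numbers;
    auto kernel = [numbers](std::size_t i) { return numbers[i]; };
    return closure_bytes(kernel);
}
```

On common compilers the first closure is at least as large as Data itself, while the second is typically the size of one pointer.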

In this diagnostic, Intel® Inspector displays the total size of the kernel arguments (in bytes) versus the device limit.

DPC++ Example

const int N = 1000;

struct Data
{
   double numbers[N];
};

Data data;

// queue and deviceData are declared elsewhere in the program.
queue.submit([&](cl::sycl::handler &cgh)
{
   cgh.parallel_for<class my_task>(cl::sycl::range<1> { N }, [=](cl::sycl::id<1> idx)
   {
       deviceData[0] += data.numbers[idx];  // Implicit usage of Data structure from host
   });
});

queue.wait();

// The structure contains 1000 doubles x 8 bytes = 8000 bytes of data > available kernel arguments limit.

Possible Correction Strategies

To avoid the problem, use the following hints:

  • Reduce the number of arguments used if possible.
  • Avoid passing complex structures to a kernel. Consider passing separate fields instead of a complete data structure.
  • Narrow the type of data. Consider using float instead of double, int instead of long, etc.
  • In a DPC++ program, specify the lambda capture list explicitly instead of using a default capture list specified by [=] or [&].
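The hints above can be combined, as in the following host-only C++ sketch: it passes separate fields instead of the whole structure, narrows their types, and names them in an explicit capture list. Settings, within_limit, make_wide_kernel, make_narrow_kernel, and the 2048-byte kAssumedArgLimit are all hypothetical names and values chosen for illustration, and the closure size is again used only as an assumed stand-in for the kernel argument size.

```cpp
#include <cstddef>

// ASSUMPTION: placeholder limit in bytes. The real limit is device-specific.
constexpr std::size_t kAssumedArgLimit = 2048;

struct Settings {
    double scale;        // used by the kernel
    long   offset;       // used by the kernel
    double table[1000];  // not used by the kernel
};

// True when the closure object (and therefore its captures) fits the limit.
template <typename F>
constexpr bool within_limit(const F&, std::size_t limit = kAssumedArgLimit) {
    return sizeof(F) <= limit;
}

// Captures the complete structure, including the unused table.
inline auto make_wide_kernel(const Settings& s) {
    return [s](int i) { return s.scale * i + s.offset; };
}

// Captures only the two fields that are needed, narrowed to float and int.
// Narrowing can lose precision or range, so it is only safe when the values
// are known to fit.
inline auto make_narrow_kernel(const Settings& s) {
    const float scale  = static_cast<float>(s.scale);
    const int   offset = static_cast<int>(s.offset);
    return [scale, offset](int i) { return scale * i + offset; };
}
```

Calling within_limit on the result of make_wide_kernel reports that the closure exceeds the assumed limit, while the result of make_narrow_kernel fits. Both produce the same value for inputs that survive the narrowing.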