Invalid Kernel Argument Size

Intel® Inspector User Guide for Linux* OS

ID 767796

Date 3/22/2024

Occurs when the total size of the kernel arguments exceeds the register space available for argument storage.

ID	Code Location	Description
1	Allocation site	Represents the source location where arguments are passed from the host to a kernel.

By default, kernel arguments are promoted to registers when possible. The number of registers available for kernel arguments varies by device (usually 25-50%). Arguments that fall outside this limit are cut off, which causes logic errors in the kernel.

In OpenCL™, arguments are specified directly with the clSetKernelArg function. In this case, the problem is more obvious and may appear when a large number of arguments or wide structures are used.

In Data Parallel C++ (DPC++), arguments may be specified implicitly through the lambda capture list. If the list is specified as [=] or [&], every variable used in the lambda body is captured. In this case, a large number of arguments, wide structures, or classes may be passed to a kernel accidentally.
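The effect of a default capture on argument size can be sketched in plain host C++, without any DPC++ runtime. The helper names closure_size_by_value and closure_size_by_pointer are hypothetical and only serve this illustration. The raw pointer stands in for whatever device-accessible handle a real kernel would use, which is an assumption of the sketch, not part of the original example.

```cpp
#include <cstddef>

constexpr int N = 1000;

struct Data {
    double numbers[N];  // 1000 doubles
};

// Hypothetical helper: size of a closure whose default by-value capture
// pulls in the whole Data object because the body touches data.numbers.
inline std::size_t closure_size_by_value() {
    Data data{};
    auto kernel = [=](std::size_t idx) { return data.numbers[idx]; };
    return sizeof(kernel);
}

// Hypothetical helper: size of a closure that explicitly captures only a
// pointer to the array (a stand-in for a device-accessible handle).
inline std::size_t closure_size_by_pointer() {
    Data data{};
    const double* numbers = data.numbers;
    auto kernel = [numbers](std::size_t idx) { return numbers[idx]; };
    return sizeof(kernel);
}
```

Calling closure_size_by_value() returns at least sizeof(Data), while closure_size_by_pointer() returns roughly the size of one pointer. The same growth happens to the argument block of a kernel lambda that captures a wide host object by value.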

In this diagnostic, Intel® Inspector displays the total size of the kernel arguments (in bytes) versus the device limit.
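A rough version of this comparison can be made by hand before running the tool. The sketch below is an approximation under stated assumptions: total_arg_bytes and fits_limit are hypothetical helpers, the sum ignores any padding or alignment the runtime may add, and the limit passed in is a placeholder, since the real limit depends on the device.

```cpp
#include <cstddef>

// Hypothetical helper: sum of sizeof over a list of argument types.
// Padding and alignment added by a real runtime are not modeled.
template <typename... Args>
constexpr std::size_t total_arg_bytes() {
    std::size_t sizes[] = {std::size_t{0}, sizeof(Args)...};
    std::size_t total = 0;
    for (std::size_t s : sizes) total += s;
    return total;
}

// Hypothetical helper: true if the summed size does not exceed limit_bytes.
// limit_bytes is an assumed value supplied by the caller.
template <typename... Args>
constexpr bool fits_limit(std::size_t limit_bytes) {
    return total_arg_bytes<Args...>() <= limit_bytes;
}

// A wide structure comparable to the one in the DPC++ example.
struct Wide {
    double numbers[1000];
};
```

For example, fits_limit<Wide>(1024) is false, while fits_limit<const double*, int>(1024) is true. The value 1024 is an arbitrary placeholder, not a documented device limit.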

DPC++ Example

const int N = 1000;

struct Data
{
   double numbers[N];
};

Data data;

// queue and deviceData are assumed to be declared earlier in the program.
queue.submit([&](cl::sycl::handler &cgh)
{
   cgh.parallel_for<class my_task>(cl::sycl::range<1> { N }, [=](cl::sycl::id<1> idx)
   {
       deviceData[0] += data.numbers[idx];  // Implicit capture of the whole host Data structure
   });
});

queue.wait();

// The structure contains 1000 doubles x 8 bytes = 8000 bytes of data, which exceeds the available kernel argument limit.

Possible Correction Strategies

To avoid the problem, use the following hints:

Reduce the number of arguments used if possible.
Avoid passing complex structures to a kernel. Consider passing separate fields instead of a complete data structure.
Narrow the type of data. Consider using float instead of double, int instead of long, etc.
In a DPC++ program, specify the lambda capture list explicitly instead of using a default capture list ([=] or [&]).
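The hints above can be illustrated with a small host-only C++ sketch that compares closure sizes. The Particle structure and the helpers size_whole_struct and size_selected_fields are hypothetical. Narrowing to float is shown only as an option and assumes the reduced precision is acceptable for the computation.

```cpp
#include <cstddef>

// Hypothetical structure with one large field the kernel never reads.
struct Particle {
    double position[3];
    double velocity[3];
    double history[256];
};

// Default capture: the whole Particle is copied into the closure,
// including the unused history field.
inline std::size_t size_whole_struct() {
    Particle p{};
    auto kernel = [=]() { return p.position[0] + p.velocity[0]; };
    return sizeof(kernel);
}

// Explicit capture of only the two values the kernel needs,
// narrowed from double to float.
inline std::size_t size_selected_fields() {
    Particle p{};
    float x = static_cast<float>(p.position[0]);
    float vx = static_cast<float>(p.velocity[0]);
    auto kernel = [x, vx]() { return x + vx; };
    return sizeof(kernel);
}
```

Here size_whole_struct() is at least sizeof(Particle), while size_selected_fields() only has to hold two float values. An explicit capture list also makes any later growth in captured state visible in the source.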
