Intel® Inspector User Guide for Linux* OS

ID 767796
Date 3/22/2024
Public



Invalid Kernel Argument Size

Occurs when the total size of the kernel arguments exceeds the number of registers available for argument storage.

ID | Code Location   | Description
1  | Allocation site | Represents the source location where arguments are passed from the host to a kernel.

By default, kernel arguments are promoted to registers when possible. The number of registers available for kernel arguments varies from device to device (usually 25-50%). Arguments beyond this limit are cut off, which causes logical errors in the kernel.

In OpenCL™, arguments are set explicitly with the clSetKernelArg function. In this case the problem is easier to spot. It can appear when many arguments or wide structures are passed.

In Data Parallel C++ (DPC++), arguments may be passed implicitly through the lambda capture list. With a default capture, [=] or [&], every variable that the kernel body uses is captured. As a result, many arguments, or wide structures and classes, may be passed to a kernel unintentionally.

In this diagnostic, Intel® Inspector displays the total size of the kernel arguments (in bytes) compared with the device limit.

DPC++ Example

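#include <CL/sycl.hpp>

const int N = 1000;

struct Data
{
   double numbers[N];
};

int main()
{
   cl::sycl::queue queue;
   double *deviceData = cl::sycl::malloc_shared<double>(N, queue);  // output buffer in USM shared memory
   Data data{};  // host-side structure

   queue.submit([&](cl::sycl::handler &cgh)
   {
      // [=] captures the whole data object, so all 8000 bytes become a kernel argument
      cgh.parallel_for<class my_task>(cl::sycl::range<1> { N }, [=](cl::sycl::id<1> idx)
      {
         deviceData[idx] = data.numbers[idx];  // Implicit usage of Data structure from host
      });
   });

   queue.wait();
   cl::sycl::free(deviceData, queue);
}

// The structure contains 1000 doubles x 8 bytes = 8000 bytes of data, which exceeds the available kernel argument limit.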

Possible Correction Strategies

To avoid the problem, use the following hints:

  • Reduce the number of arguments where possible.
  • Avoid passing complex structures to a kernel. Consider passing separate fields instead of a complete data structure.
  • Use narrower data types, for example float instead of double or int instead of long.
  • In a DPC++ program, specify the lambda capture list explicitly instead of using a default capture ([=] or [&]).