Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 12/13/2021
Public

11.3.1.2. Omit Communication Hardware between the Host and the Kernel

The autorun kernel attribute instructs the Intel® FPGA SDK for OpenCL™ Offline Compiler to omit logic that is used for communication between the host and the kernel. A kernel that uses the autorun attribute starts executing automatically before any kernel that the host launches explicitly. In addition, this kernel restarts automatically as soon as it finishes its execution.

The autorun kernel attribute tells the offline compiler that the kernel runs on its own and is never enqueued by the host.

To leverage the autorun attribute, a kernel must meet all of the following criteria:

  • Does not use I/O channels
    Note: Kernel-to-kernel channels are supported.
  • Does not have any arguments
  • Has either the max_global_work_dim(0) attribute or the reqd_work_group_size(X,Y,Z) attribute. If the reqd_work_group_size(X,Y,Z) attribute is used, then X, Y, and Z must be divisors of 2^32.
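
The work-group size criterion can be checked before compilation with a small host-side helper. The following is a minimal sketch in plain C, assuming the requirement is that each dimension divides 2^32 (two to the power 32), which means each dimension must be a power of two no larger than 2^32. The function names divides_2_pow_32, valid_autorun_wg_size, and example_usage are hypothetical and are not part of the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an SDK function): returns true when n divides 2^32.
 * The positive divisors of 2^32 are exactly the powers of two 2^0 .. 2^32. */
bool divides_2_pow_32(uint64_t n)
{
    const uint64_t limit = UINT64_C(1) << 32;   /* 2^32 */

    if (n == 0 || n > limit)
        return false;
    return (n & (n - 1)) == 0;   /* exactly one bit set, so n is a power of two */
}

/* Hypothetical helper: checks the X, Y, and Z values intended for
 * reqd_work_group_size(X,Y,Z) on an autorun kernel. */
bool valid_autorun_wg_size(uint64_t x, uint64_t y, uint64_t z)
{
    return divides_2_pow_32(x) && divides_2_pow_32(y) && divides_2_pow_32(z);
}

/* Usage example: (8,1,1) satisfies the criterion, (6,1,1) does not. */
void example_usage(void)
{
    assert(valid_autorun_wg_size(8, 1, 1));
    assert(!valid_autorun_wg_size(6, 1, 1));
}
```

This helper covers only the work-group size criterion. The other criteria (no I/O channels, no kernel arguments) still need to be verified against the kernel source.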

As mentioned above, kernels with the autorun attribute cannot have any arguments and start executing without the host launching them explicitly. As a result, the offline compiler does not need to generate the logic for communication between the host and the kernel. Omitting this logic reduces logic utilization and allows the offline compiler to apply additional performance optimizations.

A typical use case for the autorun attribute is a kernel that reads data from one or more kernel-to-kernel channels, processes the data, and then writes the results to one or more channels. When compiling the kernel, the offline compiler generates hardware as illustrated in Figure 36.

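// Enable the Intel channels extension required by the channel declarations.
#pragma OPENCL EXTENSION cl_intel_channels : enable

channel int chan_in;
channel int chan_out;

__attribute__((max_global_work_dim(0)))  // single work-item kernel
__attribute__((autorun))                 // starts without a host enqueue
__kernel void plusOne () {
    while(1) {
        // Blocking read from the input channel, then forward the incremented value.
        int data_in = read_channel_intel(chan_in);
        write_channel_intel(chan_out, data_in + 1);
    }
}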
Figure 36. Single Work-Item Kernel with No Interface Hardware
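
The data flow of the plusOne kernel can be approximated in software to reason about its behavior without FPGA hardware. The following is a rough sketch in plain C, not a description of the generated hardware. The fifo_t type, the FIFO_DEPTH value, and the functions fifo_push, fifo_pop, and plus_one_step are all hypothetical names introduced for this model. Where a real blocking channel call would stall, the model returns false instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 8   /* arbitrary depth chosen for this model */

/* Hypothetical software stand-in for a kernel-to-kernel channel. */
typedef struct {
    int    data[FIFO_DEPTH];
    size_t head;    /* index of the next element to read */
    size_t count;   /* number of elements currently stored */
} fifo_t;

/* Appends v to the FIFO. Returns false if the FIFO is full. */
bool fifo_push(fifo_t *f, int v)
{
    if (f->count == FIFO_DEPTH)
        return false;
    f->data[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
    return true;
}

/* Removes the oldest element into *v. Returns false if the FIFO is empty. */
bool fifo_pop(fifo_t *f, int *v)
{
    if (f->count == 0)
        return false;
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Models one iteration of the plusOne loop body: read one value from
 * chan_in and write that value plus one to chan_out. Returns false when
 * the iteration cannot proceed (no input available or no room for output). */
bool plus_one_step(fifo_t *chan_in, fifo_t *chan_out)
{
    int data_in = 0;

    if (chan_in->count == 0 || chan_out->count == FIFO_DEPTH)
        return false;

    bool popped = fifo_pop(chan_in, &data_in);
    bool pushed = fifo_push(chan_out, data_in + 1);
    assert(popped && pushed);   /* guaranteed by the checks above */
    return popped && pushed;
}
```

In this model the caller invokes plus_one_step repeatedly. The autorun kernel needs no such driver, because it starts and restarts without any host involvement.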