Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 3/28/2022
Public

10.1.1. Reducing the Number of Kernels

Instead of partitioning your design across multiple kernels, consider consolidating the design into fewer kernels. For Intel® Stratix® 10 designs, Intel® recommends that you use separate kernels only for truly asynchronous execution.

The following example shows a producer kernel and a consumer kernel communicating via channels:

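#pragma OPENCL EXTENSION cl_intel_channels : enable

// Channel name and element type are illustrative; match the type to what Produce() returns.
channel int c0;

kernel void producer(unsigned N) {
   for (unsigned int i = 0; i < N; i++) {
      write_channel_intel(c0, Produce(i));   // send each result to the consumer
   }
}

kernel void consumer(unsigned N) {
   for (unsigned int i = 0; i < N; i++) {
      Consume(i, read_channel_intel(c0));    // blocks until a value is available
   }
}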

The optimized code below merges the two kernels in the example above into a single kernel, which uses the computation results directly without channel accesses:

kernel void fused(unsigned N) {
   for (unsigned int i = 0; i < N; i++) {
      Consume(i, Produce(i));
   }
}
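
One way to gain confidence that fusing the kernels preserves results is to model both versions in plain C on the host before changing the FPGA design. The sketch below is a software model only, not kernel code: the Fifo type stands in for a channel, the bodies of Produce() and Consume() are hypothetical placeholders (the guide does not define them), and Consume() is assumed to return a value so that the results can be compared. The model also runs the producer to completion before the consumer, whereas on hardware the two kernels run concurrently.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_DEPTH 64  /* model limit: at most this many values in flight */

/* Hypothetical stand-ins for the guide's Produce() and Consume(). */
int Produce(unsigned i) { return (int)(3u * i + 1u); }
int Consume(unsigned i, int value) { return value + (int)i; }

/* Minimal linear FIFO that plays the role of the channel. */
typedef struct {
    int data[FIFO_DEPTH];
    size_t head;  /* index of the next value to read */
    size_t tail;  /* index of the next free slot to write */
} Fifo;

void fifo_write(Fifo *f, int v) {
    assert(f->tail < FIFO_DEPTH);  /* the model does not wrap around */
    f->data[f->tail++] = v;
}

int fifo_read(Fifo *f) {
    assert(f->head < f->tail);     /* reading an empty FIFO is a model error */
    return f->data[f->head++];
}

/* Two-kernel version: the producer loop fills the FIFO, the consumer drains it. */
void run_split(unsigned n, int *out) {
    Fifo ch = {{0}, 0, 0};
    for (unsigned i = 0; i < n; i++) {
        fifo_write(&ch, Produce(i));
    }
    for (unsigned i = 0; i < n; i++) {
        out[i] = Consume(i, fifo_read(&ch));
    }
}

/* Fused version: each produced value is consumed directly, with no FIFO. */
void run_fused(unsigned n, int *out) {
    for (unsigned i = 0; i < n; i++) {
        out[i] = Consume(i, Produce(i));
    }
}

/* Returns 1 if both versions yield identical results for n iterations, else 0. */
int results_match(unsigned n) {
    int split[FIFO_DEPTH];
    int fused[FIFO_DEPTH];
    assert(n <= FIFO_DEPTH);
    run_split(n, split);
    run_fused(n, fused);
    for (unsigned i = 0; i < n; i++) {
        if (split[i] != fused[i]) {
            return 0;
        }
    }
    return 1;
}
```

Calling results_match(n) for any n up to FIFO_DEPTH compares the two versions element by element. The check relies on the i-th value read from the FIFO being the i-th value written, which is the same ordering property the channel-based kernels depend on.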