Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 3/28/2022
Public

6.1. Addressing Single Work-Item Kernel Dependencies Based on Optimization Report Feedback

In many cases, designing your OpenCL™ application as a single work-item kernel is sufficient to maximize performance without performing additional optimization steps. To further improve the performance of your single work-item kernel, you can optimize it by addressing dependencies that the optimization report identifies.
Tip: If you are looking for Intel® oneAPI DPC++/C++ Compiler-specific details, refer to the Single Work-item Kernels section of the Intel® oneAPI DPC++ FPGA Optimization Guide.

The following flowchart outlines the approach you can take to iterate on your design and optimize your single work-item kernel. For usage information on the Intel® FPGA SDK for OpenCL™ Emulator and the Profiler, refer to the Emulating and Debugging Your OpenCL Kernel and Profiling Your OpenCL Kernel sections of the Intel® FPGA SDK for OpenCL™ Programming Guide, respectively. For information on the Intel® FPGA dynamic profiler for OpenCL™ GUI and on the profiling information it displays, refer to the Profile Your Kernel to Identify Performance Bottlenecks section.

Intel® recommends the following optimization options to address single work-item kernel loop-carried dependencies, in order of applicability: removal, relaxation, simplification, and transfer to local memory.

Figure 75. Optimization Work Flow of a Single Work-Item Kernel
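
To make the relaxation option listed above more concrete, here is a minimal sketch of one common pattern: a running sum whose loop-carried dependency is spread across a small shift register, so that each addition depends on a value produced several iterations earlier rather than on the previous iteration. The sketch is not taken from this section. It is written in plain C so that it can be compiled and checked on a host; in an OpenCL kernel the functions would be declared `__kernel` with `__global` pointers, and the short inner loops would typically be fully unrolled with `#pragma unroll`. The names `sum_serial`, `sum_relaxed`, and `SHIFT_DEPTH` are hypothetical, and the depth of 8 is an arbitrary assumption; in practice it would be chosen to cover the latency that the optimization report attributes to the dependency.

```c
#include <stddef.h>

/* Hypothetical depth of the shift register. On an FPGA this would be
 * sized to cover the latency of the operation on the critical path. */
#define SHIFT_DEPTH 8

/* Baseline: every iteration needs the sum from the previous iteration,
 * so the loop carries a dependency with a distance of one iteration. */
double sum_serial(const double *arr, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += arr[i];
    }
    return sum;
}

/* Relaxed: SHIFT_DEPTH partial sums rotate through a shift register.
 * Each addition reads a value written SHIFT_DEPTH iterations earlier,
 * which increases the dependence distance from 1 to SHIFT_DEPTH. */
double sum_relaxed(const double *arr, size_t n)
{
    double shift_reg[SHIFT_DEPTH + 1];

    /* Start every partial sum at zero. */
    for (size_t k = 0; k <= SHIFT_DEPTH; k++) {
        shift_reg[k] = 0.0;
    }

    for (size_t i = 0; i < n; i++) {
        /* Add to the oldest partial sum and write it into the tail. */
        shift_reg[SHIFT_DEPTH] = shift_reg[0] + arr[i];

        /* Shift every element one position toward the head.
         * (Candidate for "#pragma unroll" in an OpenCL kernel.) */
        for (size_t k = 0; k < SHIFT_DEPTH; k++) {
            shift_reg[k] = shift_reg[k + 1];
        }
    }

    /* Combine the partial sums once, after the main loop has finished. */
    double sum = 0.0;
    for (size_t k = 0; k < SHIFT_DEPTH; k++) {
        sum += shift_reg[k];
    }
    return sum;
}
```

Both functions compute the same total, but `sum_relaxed` adds the elements in a different order, so floating-point results can differ in the last bits for general inputs. Whether the relaxed form actually improves the kernel has to be confirmed by recompiling and checking the optimization report again.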