Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 3/28/2022
Public

5.8.4. No Stalls, Low Occupancy Percentage, and Low Bandwidth

Loop-carried dependencies might create a bottleneck in your design that causes a low occupancy percentage and a low bandwidth.
Remember: An ideal kernel pipeline has a stall percentage of 0%, an occupancy percentage of 100%, and a bandwidth that equals the board's available bandwidth.
Figure 73. Example OpenCL Kernel and Profiler Analysis

In this example, the dst[] access executes only once for every 20 iterations of the FACTOR2 loop and once for every four iterations of the FACTOR1 loop. Because the FACTOR2 loop needs the most iterations per dst[] access, it is the source of the bottleneck.

Solutions for resolving loop bottlenecks:

  • Unroll the FACTOR1 and FACTOR2 loops evenly. Unrolling only the FACTOR2 loop further does not resolve the bottleneck, because the FACTOR1 loop then becomes the limiting loop.
  • Vectorize your kernel to allow multiple work-items to execute during each loop iteration.
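
The effect of unrolling evenly can be illustrated with a small host-side model in plain C. This is only a sketch under stated assumptions, not the profiler's method and not the kernel from Figure 73: it assumes trip counts of 4 and 20 (taken from the iteration counts quoted above), assumes that each remaining loop iteration occupies one pipeline slot, and assumes that the loop with the most remaining iterations limits how often the dst[] access can issue. The function names are hypothetical.

```c
/* Trip counts assumed from the text: the dst[] access is reached once per
 * 4 iterations of the FACTOR1 loop and once per 20 iterations of the
 * FACTOR2 loop. */
#define FACTOR1 4
#define FACTOR2 20

/* Iterations a loop still needs after unrolling by `unroll`.
 * Uses ceiling division; `unroll` must be >= 1. */
static int iterations_after_unroll(int trip_count, int unroll)
{
    return (trip_count + unroll - 1) / unroll;
}

/* Toy model (assumption): the loop with the most remaining iterations
 * determines how many iterations pass between dst[] accesses. */
static int iterations_per_store(int unroll1, int unroll2)
{
    int n1 = iterations_after_unroll(FACTOR1, unroll1);
    int n2 = iterations_after_unroll(FACTOR2, unroll2);
    return n1 > n2 ? n1 : n2;
}
```

Under this model, iterations_per_store(1, 1) is 20, so the FACTOR2 loop is the limit. Fully unrolling only the FACTOR2 loop, iterations_per_store(1, 20), still gives 4 because the FACTOR1 loop is now the limit. Unrolling both loops, for example iterations_per_store(4, 20), brings the value down to 1.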