Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 3/28/2022
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

5.7.1. Stall, Occupancy, Bandwidth

For specific lines of kernel code, the Source View tab of the Intel® VTune Profiler GUI shows stall percentage, occupancy percentage, data transfer size, and average memory bandwidth.

For definitions of stall, occupancy, and bandwidth, refer to Types of Performance Data.

The Intel® FPGA SDK for OpenCL™ generates a pipeline architecture where work-items traverse through the pipeline stages sequentially (that is, in a pipeline-parallel manner). As soon as a pipeline stage becomes empty, a work-item enters and occupies the stage. Pipeline parallelism also applies to iterations of pipelined loops, where iterations enter a pipelined loop sequentially.

Figure 72. Simplified Representation of a Kernel Pipeline Instrumented with Performance Counters

The following are simplified equations that describe the Profiler calculates stall, occupancy, and bandwidth:

Note: ivalid_count in the bandwidth equation also includes the predicate=true input to the load-store unit.

Ideal kernel pipeline conditions:

  • Stall percentage equals 0%
  • Occupancy percentage equals 100%
  • Bandwidth equals the board's bandwidth

For a given location in the kernel pipeline if the sum of the stall percentage and the occupancy percentage approximately equals 100%, the Profiler identifies the location as the stall source. If the stall percentage is low, the Profiler identifies the location as the victim of the stall.

The Profiler reports a high occupancy percentage if the offline compiler generates a highly efficient pipeline from your kernel, where work-items or iterations are moving through the pipeline stages without stalling.

If all LSUs are accessed the same number of times, they have the same occupancy value.

  • If work-items cannot enter the pipeline consecutively, they insert bubbles into the pipeline.
  • In loop pipelining, loop-carried dependencies also form bubbles in the pipeline because of bubbles that exist between iterations.
  • If an LSU is accessed less frequently than other LSUs, such as the case when an LSU is outside a loop that contains other LSUs, this LSU has a lower occupancy value than the other LSUs.

The same rule regarding occupancy value applies to channels.