Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 12/16/2022
Public



Clustering the Datapath

Dynamically scheduling all operations adds overhead, because the handshaking control logic that dynamic scheduling requires consumes additional FPGA area.

To reduce this overhead, the compiler groups fixed-latency operations into clusters. A cluster of fixed-latency operations, such as arithmetic operations, requires fewer handshaking interfaces than the same operations scheduled individually, which reduces the area overhead.

Clustered Logic

If A, B, and C from Figure 1 do not contain variable-latency operations, the compiler can cluster them together, as illustrated in the Clustered Logic figure. Clustering reduces area because the logic inside the cluster no longer needs stall signals or other handshaking logic.

Cluster Types

The Intel® oneAPI DPC++/C++ Compiler can create the following types of clusters:

  • Stall-Enable Cluster (SEC): This cluster type passes the handshaking logic to every pipeline stage in the cluster in parallel. This means that if the cluster is stalled by logic from further down in the datapath, all logic in the SEC stalls simultaneously.
    Stall-Enable Cluster

  • Stall-Free Cluster (SFC): This cluster type adds a first-in, first-out (FIFO) buffer to the end of the cluster. The FIFO is deep enough to accommodate the entire latency of the pipeline in the cluster. It is often called an exit FIFO because it is attached to the exit of the cluster datapath.

    Because of this FIFO, the pipeline stages in the cluster do not require any handshaking logic. The stages can run freely and drain into the exit FIFO, even if the cluster is stalled by logic further down in the datapath.

    Stall-Free Cluster
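
The difference between the two cluster types can be sketched with a toy cycle-level model in plain C++. This is only an illustration of the behavior described above, not the hardware the compiler generates. All names (Slot, StallEnableCluster, StallFreeCluster, stall_then_run) are hypothetical, the exit FIFO depth is assumed to equal the number of pipeline stages, and upstream is assumed to send only bubbles while the scenario runs.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// A pipeline slot holds a value, or std::nullopt for a bubble.
using Slot = std::optional<int>;

// Shift every stage one step toward the exit and load `input` at the entry.
// stages.front() is the entry stage; stages.back() is the exit stage.
inline void advance(std::vector<Slot>& stages, Slot input) {
    for (std::size_t i = stages.size() - 1; i > 0; --i) {
        stages[i] = stages[i - 1];
    }
    stages[0] = input;
}

// Toy stall-enable cluster: a downstream stall freezes every stage at once.
struct StallEnableCluster {
    std::vector<Slot> stages;

    // One clock cycle. Returns what leaves the cluster (nullopt = nothing or a bubble).
    Slot tick(Slot input, bool downstream_stall) {
        assert(!stages.empty());
        if (downstream_stall) {
            return std::nullopt;  // all stages hold; `input` is not consumed
        }
        Slot out = stages.back();
        advance(stages, input);
        return out;
    }
};

// Toy stall-free cluster: the stages never stall and drain into an exit FIFO.
// Assumption: the FIFO depth equals the number of pipeline stages.
struct StallFreeCluster {
    std::vector<Slot> stages;
    std::deque<int> exit_fifo;

    Slot tick(Slot input, bool downstream_stall) {
        assert(!stages.empty());
        Slot out;
        if (!downstream_stall && !exit_fifo.empty()) {
            out = exit_fifo.front();  // downstream reads from the exit FIFO
            exit_fifo.pop_front();
        }
        if (stages.back()) {
            exit_fifo.push_back(*stages.back());  // valid data enters the FIFO
        }                                         // a bubble is simply dropped
        advance(stages, input);
        assert(exit_fifo.size() <= stages.size());  // the FIFO must not overflow
        return out;
    }
};

// Stall for `stall_cycles`, then release the stall and record `run_cycles` outputs.
template <typename Cluster>
std::vector<Slot> stall_then_run(Cluster& c, int stall_cycles, int run_cycles) {
    for (int i = 0; i < stall_cycles; ++i) {
        c.tick(std::nullopt, true);
    }
    std::vector<Slot> outputs;
    for (int i = 0; i < run_cycles; ++i) {
        outputs.push_back(c.tick(std::nullopt, false));
    }
    return outputs;
}
```

Starting both models with the stages {30, bubble, 20, 10} (entry to exit) and calling stall_then_run with a four-cycle stall and four run cycles, the stall-enable model emits 10, 20, a bubble, then 30, because nothing moved during the stall. The stall-free model kept draining into its exit FIFO during the stall, so it emits 10, 20, 30 back to back.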

Cluster Characteristics

The exit FIFO of the stall-free cluster results in the following tradeoffs between the two cluster types:

  • Area: Because an SEC does not use an exit FIFO, it can save FPGA area compared to an SFC. If you have a design with many small, low-latency clusters, you can save a substantial amount of area by asking the compiler to use SECs instead of SFCs.
  • Latency: Logic that uses SFCs might have a larger latency than logic that uses SECs because of the write-read latency of the exit FIFO. If you use a zero-latency FIFO for the exit FIFO, you can mitigate the latency, but fMAX or FPGA area use might be negatively impacted. For additional information, refer to Global Control of Exit FIFO Latency of Stall-free Clusters (-Xssfc-exit-fifo-type=<value>).
  • fMAX: In an SFC, the oStall signal has less fanout than in an SEC. For a cluster with many pipeline stages, you can improve your design fMAX by using an SFC.
  • Handshaking: The exit FIFO in SFCs allows them to take advantage of hyper-optimized handshaking between clusters. For more information, refer to Hyper Optimized Handshaking. SECs do not support this capability.
  • Bubble Handling: SECs remove only leading bubbles from the pipeline, and only under limited circumstances. A leading bubble is a bubble that arrives before the first piece of valid data arrives in the cluster. SECs do not remove bubbles that arrive afterward.

    SFCs can use the exit FIFO to remove all bubbles from the pipeline if the SFC receives a downstream stall signal.

  • Stall Behavior: When an SEC receives a downstream stall, it stalls any logic upstream of it within one clock cycle. When an SFC receives a downstream stall, the exit FIFO allows it to consume additional valid data depending on how deep the exit FIFO is and how many bubbles are in the cluster datapath.
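
The stall behavior of an SFC can be approximated with simple bookkeeping. The sketch below assumes a simplified model in which every valid item already in the cluster datapath must eventually fit in the exit FIFO during a long stall, while bubbles take no FIFO space. The function name and parameters are hypothetical, and real hardware thresholds may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of additional valid inputs a toy stall-free cluster could accept
// while downstream stays stalled, without overflowing its exit FIFO.
// stage_valid[i] is true when pipeline stage i holds valid data,
// and false when it holds a bubble.
inline std::size_t sfc_extra_inputs_during_stall(
        std::size_t fifo_depth,
        std::size_t fifo_occupancy,
        const std::vector<bool>& stage_valid) {
    assert(fifo_occupancy <= fifo_depth);

    // Count the valid items still in flight; each one needs a FIFO slot.
    std::size_t in_flight = 0;
    for (bool valid : stage_valid) {
        if (valid) {
            ++in_flight;
        }
    }

    // Slots already taken, plus slots reserved for in-flight data.
    const std::size_t committed = fifo_occupancy + in_flight;
    return committed >= fifo_depth ? 0 : fifo_depth - committed;
}
```

Under this model, an empty exit FIFO of depth 8 behind an eight-stage datapath that holds five valid items and three bubbles leaves room for three more valid inputs during the stall. A datapath full of valid data leaves room for none. The corresponding number for an SEC is zero, because an SEC stalls upstream logic as soon as it receives the downstream stall.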