Developer Guide

Intel® oneAPI DPC++/C++ Compiler Handbook for FPGAs

ID 785441

Date 6/24/2024

Reduce Kernel Area and Latency (use_stall_enable_clusters)

The [[intel::use_stall_enable_clusters]] attribute directs the Intel® oneAPI DPC++/C++ Compiler to reduce the area and latency of your kernel. The latency reduction has little effect on pipelined loops unless the loop runs for very few iterations.
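To see why the iteration count matters, consider a common first-order model of a pipelined loop: total cycles are roughly the pipeline latency plus one initiation interval (II) for each additional iteration. The sketch below is plain C++ arithmetic, not compiler output. The function names and all cycle counts are hypothetical and chosen only for illustration, and the model ignores stalls.

```cpp
#include <cassert>
#include <cstdint>

// First-order estimate (an assumption, not the compiler's scheduler):
// the first iteration needs `latency_cycles` to pass through the pipeline,
// and each later iteration starts `ii` cycles after the previous one.
constexpr std::uint64_t PipelinedLoopCycles(std::uint64_t iterations,
                                            std::uint64_t latency_cycles,
                                            std::uint64_t ii) {
  return iterations == 0 ? 0 : latency_cycles + (iterations - 1) * ii;
}

// Fraction of the total cycle count removed when the latency shrinks by
// `saved_cycles` (a hypothetical saving, for example from a removed FIFO).
inline double RelativeSaving(std::uint64_t iterations,
                             std::uint64_t latency_cycles,
                             std::uint64_t ii,
                             std::uint64_t saved_cycles) {
  assert(saved_cycles <= latency_cycles);
  const std::uint64_t base = PipelinedLoopCycles(iterations, latency_cycles, ii);
  if (base == 0) return 0.0;  // an empty loop has nothing to save
  const std::uint64_t reduced =
      PipelinedLoopCycles(iterations, latency_cycles - saved_cycles, ii);
  return static_cast<double>(base - reduced) / static_cast<double>(base);
}

// Compile-time check of the model with hypothetical numbers:
// 4 iterations, 40-cycle latency, II = 1 gives 40 + 3 * 1 = 43 cycles.
static_assert(PipelinedLoopCycles(4, 40, 1) == 43, "short loop estimate");
```

With these hypothetical numbers, saving 10 cycles of latency shortens a 4-iteration loop from 43 to 33 cycles (about 23%), while a 100000-iteration loop goes from 100039 to 100029 cycles (about 0.01%). Under this model, the latency benefit is visible only when the loop is short.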

Computations in an FPGA kernel are normally grouped into the following cluster types:

Stall-Free Clusters (SFC): Simplify the signals within a cluster, but require a FIFO queue at the end of the cluster to save intermediate results if the computation must stall. For more information about SFCs, refer to Clustering the Datapath.
Stall-Enable Clusters (SEC): Save area and cycles by removing the FIFO queue and passing the stall signals to each part of the computation. These extra signals may reduce f_MAX. For more information, refer to Clustering the Datapath.

CAUTION:

If you specify the [[intel::use_stall_enable_clusters]] attribute on one or more kernels, the compiler might reduce the f_MAX of the generated FPGA bitstream, which may reduce performance on all kernels.

Hyper-Optimized Handshaking Restriction:

The [[intel::use_stall_enable_clusters]] attribute prevents the use of hyper-optimized handshaking on designs that target Stratix® 10 and later FPGA architectures.

Example

h.single_task<class KernelComputeStallEnable>([=]() [[intel::use_stall_enable_clusters]] {
  // The computations in this device kernel use stall-enable clusters where possible
  Work(accessor_vec_a, accessor_vec_b, accessor_res);
});

The compiler uses stall-enable clusters for the kernel when possible. Some computations might not be stallable, so the compiler places them in a stall-free cluster even if a stall-enable cluster was requested.

NOTE:

For more information, refer to the FPGA tutorial sample “Stall Enable Clusters” on GitHub.
