Developer Guide

Intel® oneAPI DPC++/C++ Compiler Handbook for FPGAs

ID 785441
Date 10/24/2024
Public

NDRange Kernels

FPGA Acceleration Flow Only:
The information presented here applies only to multiarchitecture binary kernels in the FPGA acceleration flow. It does not apply to RTL IP core kernels in the SYCL* HLS flow.
If your program naturally describes multiple concurrent threads operating in a data-parallel manner, write your kernel so that it executes as parallel instances over a work-item index space (NDRange).

Avoid Work-Item ID-Dependent Backward Branching

The Intel® oneAPI DPC++/C++ Compiler collapses conditional statements into single bits that indicate when a particular functional unit becomes active. It also eliminates simple control flow paths that do not involve looping structures, which results in a flat control structure and more efficient hardware use.
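
The flattening described above can be pictured with a small host-side sketch in ordinary C++. This is a conceptual model only, not compiler output: the function names and the threshold of 100 are hypothetical, and unsigned arithmetic is used so that computing both paths unconditionally is well defined for every input.

```cpp
#include <cstdint>

// Branching form, as a programmer might write it. There is no loop,
// so both branches are forward branches.
std::uint32_t branching(std::uint32_t x) {
  if (x > 100u) {
    return x - 100u;
  } else {
    return x + 1u;
  }
}

// Flattened form: both paths are always computed, and a one-bit
// predicate selects which result is kept. This mirrors the idea of a
// single bit indicating which functional unit's output is active.
std::uint32_t flattened(std::uint32_t x) {
  const bool over_limit = x > 100u;          // the single predicate bit
  const std::uint32_t reduced = x - 100u;    // "then" path
  const std::uint32_t bumped = x + 1u;       // "else" path
  return over_limit ? reduced : bumped;
}
```

Both functions return the same value for every input. The second form simply makes the predicate explicit.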

Avoid including any work-item ID-dependent backward branching (that is, branching that occurs in a loop) in your kernel because it degrades performance.

For example, the following code fragment contains a loop whose bound depends on a work-item ID query such as get_global_id or get_local_id:

for (size_t i = 0; i < get_global_id(0); i++)
{
   // statements
}
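
One possible way to restructure such a loop is sketched below. The source does not prescribe this rewrite. It is a host-side emulation in plain C++, not kernel code, and it assumes a compile-time upper bound on the trip count is known. The names kMaxIters, id_dependent_loop, fixed_bound_loop, and variants_agree are hypothetical, and the gid parameter stands in for the value that get_global_id(0) would return. The loop bound becomes a constant, and the ID-dependent test moves into the loop body as a forward branch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical upper bound on the trip count, assumed known at compile time.
constexpr std::size_t kMaxIters = 8;

// Original pattern: the loop bound depends on the work-item ID, so the
// backward branch is taken a different number of times per work-item.
int id_dependent_loop(std::size_t gid) {
  int acc = 0;
  for (std::size_t i = 0; i < gid; i++) {
    acc += static_cast<int>(i);  // placeholder for "statements"
  }
  return acc;
}

// Restructured pattern: every work-item runs exactly kMaxIters iterations.
// The ID-dependent comparison is now a forward branch inside the body.
int fixed_bound_loop(std::size_t gid) {
  assert(gid <= kMaxIters);  // the rewrite is only valid under this bound
  int acc = 0;
  for (std::size_t i = 0; i < kMaxIters; i++) {
    if (i < gid) {
      acc += static_cast<int>(i);
    }
  }
  return acc;
}

// Emulates a 1-D index space of kMaxIters + 1 work-items on the host and
// reports whether both variants produce the same result for every ID.
bool variants_agree() {
  for (std::size_t gid = 0; gid <= kMaxIters; gid++) {
    if (id_dependent_loop(gid) != fixed_bound_loop(gid)) {
      return false;
    }
  }
  return true;
}
```

The trade-off is that every work-item now runs the full kMaxIters iterations, so this form may only pay off when the bound is small or close to the typical trip count. Whether it improves a given design is best confirmed with the compiler's reports.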