7.3. Multiple Compute Units

Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521

Date 3/28/2022

To achieve higher throughput, the Intel® FPGA SDK for OpenCL™ Offline Compiler can generate multiple compute units for each kernel. The offline compiler implements each compute unit as a unique pipeline. Generally, each kernel compute unit can execute multiple work-groups simultaneously.

To increase overall kernel throughput, the hardware scheduler in the FPGA dispatches work-groups to additional available compute units. A compute unit is available for work-group assignments provided that it has not reached its full capacity.

Assume each work-group takes the same amount of time to complete its execution. If the offline compiler implements two compute units, each compute unit executes half of the work-groups. Because the hardware scheduler dispatches the work-groups, you do not need to manage this process in your own code.
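The even split described above can be checked with a small host-side model in plain C. This is only a sketch under stated assumptions, not SDK code and not the actual hardware scheduler. The function name dispatch_work_groups and the constant MAX_COMPUTE_UNITS are hypothetical. To keep the arithmetic simple, each modeled compute unit holds one work-group at a time, although a real compute unit can execute several work-groups simultaneously.

```c
#include <assert.h>

/* Hypothetical upper bound for this model only; not an SDK limit. */
#define MAX_COMPUTE_UNITS 16

/*
 * Simplified model (an assumption, not the real scheduler): every
 * work-group takes group_time time units, and each compute unit runs
 * one work-group at a time. Each work-group goes to the compute unit
 * that becomes available first (lowest index on ties).
 *
 * On return, counts[u] holds the number of work-groups that unit u
 * executed. The return value is the time at which the last
 * work-group finishes.
 */
unsigned dispatch_work_groups(unsigned num_groups, unsigned num_units,
                              unsigned group_time, unsigned counts[])
{
    unsigned busy_until[MAX_COMPUTE_UNITS] = {0};
    unsigned finish = 0;

    assert(num_units >= 1 && num_units <= MAX_COMPUTE_UNITS);

    for (unsigned u = 0; u < num_units; ++u)
        counts[u] = 0;

    for (unsigned g = 0; g < num_groups; ++g) {
        /* Find the compute unit that is free earliest. */
        unsigned best = 0;
        for (unsigned u = 1; u < num_units; ++u)
            if (busy_until[u] < busy_until[best])
                best = u;

        busy_until[best] += group_time;
        counts[best] += 1;
        if (busy_until[best] > finish)
            finish = busy_until[best];
    }
    return finish;
}
```

In this model, calling dispatch_work_groups(8, 2, 5, counts) assigns four work-groups to each of the two compute units and finishes at time 20, compared with time 40 for a single compute unit.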

The offline compiler does not automatically determine the optimal number of compute units for a kernel. To generate more than one compute unit for a kernel, specify the number that the offline compiler should create with the num_compute_units attribute. The code sample below requests two compute units for the sum kernel.

__attribute__((num_compute_units(2)))
__kernel void sum (__global const float * restrict a,
                   __global const float * restrict b,
                   __global float * restrict answer)
{
    size_t gid = get_global_id(0);

    answer[gid] = a[gid] + b[gid];
}

Increasing the number of compute units can achieve higher throughput. However, as shown in the figure below, all compute units share the global memory bandwidth, so each additional compute unit increases contention for that bandwidth. Additional compute units also increase hardware resource utilization.

Figure 76. Data Flow with Multiple Compute Units
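To make the bandwidth trade-off concrete, the following plain C sketch estimates how throughput scales when compute units share a fixed global memory bandwidth. It is a first-order model based on assumptions, not a formula from the SDK. The function name estimated_speedup and the parameters unit_bw (bandwidth one compute unit requests) and total_bw (bandwidth the board provides) are hypothetical, and the model ignores resource limits, clock frequency changes, and memory access patterns.

```c
#include <assert.h>

/*
 * Hypothetical first-order model (an assumption, not SDK behavior):
 * each compute unit requests unit_bw of global memory bandwidth, and
 * all compute units together cannot exceed total_bw. The estimated
 * speedup is the achieved aggregate bandwidth divided by the
 * bandwidth a single compute unit achieves on its own.
 */
double estimated_speedup(unsigned num_units, double unit_bw, double total_bw)
{
    assert(num_units >= 1 && unit_bw > 0.0 && total_bw > 0.0);

    double demand   = num_units * unit_bw;
    double achieved = demand < total_bw ? demand : total_bw;
    double single   = unit_bw < total_bw ? unit_bw : total_bw;

    return achieved / single;
}
```

With unit_bw = 4.0 and total_bw = 16.0 (arbitrary illustrative units), two compute units give an estimated speedup of 2.0, while eight compute units give only 4.0 because the shared bandwidth is already saturated at four. Actual results depend on the kernel and the board, so profile the compiled design before settling on a value for num_compute_units.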
