Strategies for Inferring the Accumulator

Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853

Date 3/31/2023

To take advantage of the single-cycle floating-point accumulator feature, you can modify the accumulator description in your kernel code to improve efficiency or to work around programming restrictions.

Describe an Accumulator Using Multiple Loops

Consider a case where you want to describe an accumulator using multiple loops, some of which are unrolled:


float acc = 0.0f;
for (int i = 0; i < k; i++) {
  // Fully unroll the 16-iteration inner loop
  #pragma unroll
  for (int j = 0; j < 16; j++)
    acc += (x[i+j]*y[i+j]);
}

Because fast math is enabled by default, the Intel® oneAPI DPC++/C++ Compiler automatically rearranges the operations in a way that exposes the accumulation.

Modify a Multi-Loop Accumulator Description

If you cannot compile an accumulator description with the -Xsfp-relaxed compiler command option, rewrite the code so that the accumulation is exposed explicitly.

For example, rewrite the preceding code as follows:


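float acc = 0.0f;
for (int i = 0; i < k; i++) {
  // Sum the 16 products into a local partial result first
  float my_dot = 0.0f;
  #pragma unroll
  for (int j = 0; j < 16; j++)
    my_dot += (x[i+j]*y[i+j]);
  // One accumulation per outer-loop iteration
  acc += my_dot;
}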
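
The rewrite changes the order in which the floating-point additions are performed, so the two forms are not guaranteed to produce bit-identical results. The following is a minimal host-side sketch in plain C++ (not kernel code) for comparing the two forms on the CPU. The function names dot_single_acc and dot_partial_sum, the helper check_bounds, and the constant kUnroll are hypothetical and introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kUnroll = 16;  // inner-loop trip count used in the example above

// Hypothetical helper: the loops read indices 0 .. k - 1 + (kUnroll - 1).
inline void check_bounds(const std::vector<float>& v, int k) {
  assert(k >= 0);
  assert(k == 0 || v.size() >= static_cast<std::size_t>(k - 1 + kUnroll));
}

// Original form: every product is added directly to acc.
float dot_single_acc(const std::vector<float>& x,
                     const std::vector<float>& y, int k) {
  check_bounds(x, k);
  check_bounds(y, k);
  float acc = 0.0f;
  for (int i = 0; i < k; i++)
    for (int j = 0; j < kUnroll; j++)
      acc += x[i + j] * y[i + j];
  return acc;
}

// Rewritten form: a local partial sum per outer iteration,
// followed by a single addition to acc.
float dot_partial_sum(const std::vector<float>& x,
                      const std::vector<float>& y, int k) {
  check_bounds(x, k);
  check_bounds(y, k);
  float acc = 0.0f;
  for (int i = 0; i < k; i++) {
    float my_dot = 0.0f;
    for (int j = 0; j < kUnroll; j++)
      my_dot += x[i + j] * y[i + j];
    acc += my_dot;
  }
  return acc;
}
```

When all products and intermediate sums are exactly representable in float, both functions return the same value. For general inputs, the results can differ in the low-order bits because floating-point addition is not associative.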

Modify an Accumulator Description Containing a Variable or Non-Zero Initial Value

Consider a situation where you want an accumulator description to begin with a non-zero or variable value, for example to apply an offset:


// Accumulator starts from a variable, possibly non-zero, value
float acc = array[0];
for (int i = 0; i < k; i++) {
  acc += x[i];
}

Because the accumulator hardware does not support variable or non-zero initial values in a description, you must rewrite the description so that the accumulator starts at zero and the initial value is added after the loop:


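// Accumulator starts at zero so that it can be inferred
float acc = 0.0f;
for (int i = 0; i < k; i++) {
  acc += x[i];
}
// Apply the initial value once, after the loop
acc += array[0];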

Rewriting the description in this manner enables the kernel to use an accumulator in the loop. The loop is then followed by a single addition of array[0].
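
As a rough illustration, the following host-side sketch in plain C++ (not kernel code) places the two forms side by side. The function names sum_with_initial and sum_then_offset and the parameter name initial are hypothetical. The parameter stands in for array[0] in the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: the accumulator starts from a variable initial value.
float sum_with_initial(const std::vector<float>& x, int k, float initial) {
  assert(k >= 0 && static_cast<std::size_t>(k) <= x.size());
  float acc = initial;
  for (int i = 0; i < k; i++)
    acc += x[i];
  return acc;
}

// Rewritten form: the accumulator starts at zero and the initial
// value is added once, after the loop.
float sum_then_offset(const std::vector<float>& x, int k, float initial) {
  assert(k >= 0 && static_cast<std::size_t>(k) <= x.size());
  float acc = 0.0f;
  for (int i = 0; i < k; i++)
    acc += x[i];
  acc += initial;
  return acc;
}
```

Both functions compute the same mathematical sum. Because the initial value is added at a different point in the sequence, the rounded floating-point results can differ slightly for general inputs.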

Parent topic: Single-Cycle Floating-Point Accumulator for Single Work-Item Kernels
