Developer Guide

Intel® oneAPI DPC++/C++ Compiler Handbook for FPGAs

ID 785441
Date 10/24/2024
Public

Strategies for Inferring the Accumulator

To leverage the single-cycle floating-point accumulator feature, you can modify the accumulator description in your kernel code to improve efficiency or to work around programming restrictions.

Describe an Accumulator Using Multiple Loops

Consider a case where you want to describe an accumulator using multiple loops, with some of the loops being unrolled:

float acc = 0.0f;
for (i = 0; i < k; i++) {
  #pragma unroll
  for (j = 0; j < 16; j++)
    acc += (x[i+j]*y[i+j]);
}

Because fast math is enabled by default, the Intel® oneAPI DPC++/C++ Compiler automatically reorders these operations so that the accumulation is exposed.

Modify a Multi-Loop Accumulator Description

If you want an accumulator to be inferred even when using -fp-model=precise, rewrite your code to expose the accumulation.

Rewrite the preceding code example as follows:

float acc = 0.0f;
for (i = 0; i < k; i++) {
  float my_dot = 0.0f;
  #pragma unroll
  for (j = 0; j < 16; j++)
    my_dot += (x[i+j]*y[i+j]);
  acc += my_dot;
}
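To check the rewrite on the host before compiling it as a kernel, the two forms can be compared with a minimal sketch like the one below. It is plain C++, not kernel code. The function names dot_direct and dot_partial and the constant kInner are hypothetical names introduced for illustration, and the unroll pragma appears only as a comment. Floating-point addition is not associative, so in general the two forms can differ in the last bits; they agree exactly when every intermediate sum is exactly representable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inner trip count taken from the example above (hypothetical name).
constexpr std::size_t kInner = 16;

// Original form: each product is added straight into acc, so the
// accumulation is spread across both loops.
float dot_direct(const std::vector<float>& x, const std::vector<float>& y,
                 std::size_t k) {
  // The largest index read is (k - 1) + (kInner - 1).
  assert(k == 0 || (x.size() >= k + kInner - 1 && y.size() >= k + kInner - 1));
  float acc = 0.0f;
  for (std::size_t i = 0; i < k; i++) {
    // In kernel code, "#pragma unroll" would precede this inner loop.
    for (std::size_t j = 0; j < kInner; j++)
      acc += x[i + j] * y[i + j];
  }
  return acc;
}

// Rewritten form: the inner loop builds a partial dot product, and the
// outer loop performs exactly one accumulation into acc per iteration.
float dot_partial(const std::vector<float>& x, const std::vector<float>& y,
                  std::size_t k) {
  assert(k == 0 || (x.size() >= k + kInner - 1 && y.size() >= k + kInner - 1));
  float acc = 0.0f;
  for (std::size_t i = 0; i < k; i++) {
    float my_dot = 0.0f;
    // In kernel code, "#pragma unroll" would precede this inner loop.
    for (std::size_t j = 0; j < kInner; j++)
      my_dot += x[i + j] * y[i + j];
    acc += my_dot;  // the single accumulation exposed to the compiler
  }
  return acc;
}
```

With small integer-valued inputs, both functions return the same value, which makes them a convenient regression check when restructuring the loops.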

Modify an Accumulator Description Containing a Variable or Non-Zero Initial Value

Consider an accumulator description that starts from a variable or non-zero initial value, so that the accumulated sum is offset by that value:

float acc = array[0];
for (i = 0; i < k; i++) {
  acc += x[i];
}

Because the accumulator hardware does not support variable or non-zero initial values in a description, you must rewrite the description so that the accumulator starts at zero:

float acc = 0.0f;
for (i = 0; i < k; i++) {
  acc += x[i];
}
acc += array[0];

Rewriting the description this way enables the kernel to use an accumulator in the loop. The initial value array[0] is then added once, after the loop completes.
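The same transformation can be sketched as a pair of host-side C++ functions. This is an illustration under stated assumptions, not kernel code: the names sum_seeded and sum_zero_then_offset are hypothetical, and the initial value is passed as a parameter init in place of array[0]. Because the offset is added at the end instead of the start, the rounding order changes, so the two results can differ slightly for general floating-point data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Source form: the accumulator starts from a runtime value, which the
// accumulator hardware does not support.
float sum_seeded(const std::vector<float>& x, std::size_t k, float init) {
  assert(k <= x.size());
  float acc = init;
  for (std::size_t i = 0; i < k; i++) {
    acc += x[i];
  }
  return acc;
}

// Rewritten form: the loop accumulates from zero, and the initial value
// is added once after the loop.
float sum_zero_then_offset(const std::vector<float>& x, std::size_t k,
                           float init) {
  assert(k <= x.size());
  float acc = 0.0f;
  for (std::size_t i = 0; i < k; i++) {
    acc += x[i];
  }
  acc += init;  // apply the offset outside the accumulation loop
  return acc;
}
```

For inputs whose sums are exactly representable, such as small integers stored as float, both functions return the same value.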