Visible to Intel only — GUID: GUID-A226CEFD-3D71-4E6E-9F78-5E6A722A3043
Strategies for Inferring the Accumulator
To take advantage of the single-cycle floating-point accumulator feature, you can modify the accumulator description in your kernel code to improve efficiency or to work around programming restrictions.
Describe an Accumulator Using Multiple Loops
Consider a case where you want to describe an accumulator using multiple loops, with some of the loops being unrolled:
float acc = 0.0f;
for (i = 0; i < k; i++) {
  #pragma unroll
  for (j = 0; j < 16; j++)
    acc += (x[i+j]*y[i+j]);
}
With fast math enabled by default, the Intel® oneAPI DPC++/C++ Compiler automatically rearranges operations in a way that exposes the accumulation.
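The rearrangement needs fast math because floating-point addition is not associative, so reordering the additions can change the result. The following host-side sketch illustrates this with plain C++. It is not FPGA kernel code, and the function names `sum_left_to_right` and `sum_right_to_left` are hypothetical names chosen for this illustration.

```cpp
// Host-side illustration only. Function names are hypothetical.

// Computes (a + b) + c, rounding the intermediate sum to float.
float sum_left_to_right(float a, float b, float c) {
    float t = a + b;
    return t + c;
}

// Computes a + (b + c), rounding the intermediate sum to float.
float sum_right_to_left(float a, float b, float c) {
    float t = b + c;
    return a + t;
}
```

With IEEE 754 single precision and the inputs `1e8f`, `-1e8f`, and `1.0f`, the first function returns `1.0f` while the second returns `0.0f`, because `1.0f` is smaller than half the spacing between adjacent floats near `1e8f`. A compiler that must preserve precise semantics cannot apply such reorderings on its own.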
Modify a Multi-Loop Accumulator Description
If you want an accumulator to be inferred even when using -fp-model=precise, rewrite your code to expose the accumulation.
Rewrite the code example above as follows:
float acc = 0.0f;
for (i = 0; i < k; i++) {
  float my_dot = 0.0f;
  #pragma unroll
  for (j = 0; j < 16; j++)
    my_dot += (x[i+j]*y[i+j]);
  acc += my_dot;
}
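As a rough check of the rewrite, the two loop structures can be compared on the host. The sketch below is plain C++, not kernel code. The function names `dot_flat` and `dot_partial` are hypothetical, the `#pragma unroll` directive is omitted because it only matters to the FPGA compiler, and both input vectors are assumed to hold at least `k + 15` elements.

```cpp
#include <cstddef>
#include <vector>

// Original form: every product is added directly into acc.
// Assumes x and y each hold at least k + 15 elements.
float dot_flat(const std::vector<float>& x, const std::vector<float>& y,
               std::size_t k) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < k; i++) {
        for (std::size_t j = 0; j < 16; j++)
            acc += x[i + j] * y[i + j];
    }
    return acc;
}

// Rewritten form: the inner loop builds a partial sum (my_dot),
// and acc receives exactly one addition per outer iteration.
float dot_partial(const std::vector<float>& x, const std::vector<float>& y,
                  std::size_t k) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < k; i++) {
        float my_dot = 0.0f;
        for (std::size_t j = 0; j < 16; j++)
            my_dot += x[i + j] * y[i + j];
        acc += my_dot;
    }
    return acc;
}
```

For inputs whose products and sums are exactly representable (for example, small integer values), both functions return the same value. For general inputs the results may differ in the last bits, because the additions are grouped differently.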
Modify an Accumulator Description Containing a Variable or Non-Zero Initial Value
Consider a situation where the accumulator description starts from a variable or non-zero initial value:
float acc = array[0];
for (i = 0; i < k; i++) {
  acc += x[i];
}
Because the accumulator hardware does not support variable or non-zero initial values in a description, you must rewrite the description so that the accumulator starts at zero:
float acc = 0.0f;
for (i = 0; i < k; i++) {
  acc += x[i];
}
acc += array[0];
Rewriting the description in this manner enables the kernel to use an accumulator in the loop. After the loop completes, array[0] is added to the accumulated result.
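The effect of moving the initial value can be sketched on the host in plain C++. This is an illustration under stated assumptions, not kernel code. The function names `sum_from_initial` and `sum_then_offset` are hypothetical, and the `initial` parameter stands in for `array[0]`.

```cpp
#include <cstddef>
#include <vector>

// Original form: the accumulator starts at a non-zero (or variable) value.
float sum_from_initial(const std::vector<float>& x, float initial) {
    float acc = initial;
    for (std::size_t i = 0; i < x.size(); i++)
        acc += x[i];
    return acc;
}

// Rewritten form: the accumulator starts at zero and the initial value
// is added once, after the loop.
float sum_then_offset(const std::vector<float>& x, float initial) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); i++)
        acc += x[i];
    acc += initial;
    return acc;
}
```

Both functions return the same value when every intermediate sum is exactly representable. Because the rewrite changes the order in which values are added, general floating-point inputs may produce results that differ slightly in rounding.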