Intel® FPGA SDK for OpenCL™ Standard Edition: Programming Guide

ID 683342
Date 4/22/2019
Public
Document Table of Contents

5.12. Single-Cycle Floating-Point Accumulator for Single Work-Item Kernels

Single work-item kernels that perform accumulation in a loop can leverage the single-cycle floating-point accumulator feature of the Intel® FPGA SDK for OpenCL™ Offline Compiler. The offline compiler searches for these kernel instances and attempts to map an accumulation that executes in a loop into the accumulator structure.

The offline compiler supports an accumulator that adds or subtracts a value. To leverage this feature, describe the accumulation in a way that allows the offline compiler to infer the accumulator.

Attention:
  • The accumulator is only available on Arria® 10 devices.
  • The accumulator must be part of a loop.
  • The accumulator must have an initial value of 0.
  • The accumulator cannot be conditional.

Below are examples of a description that results in the correct inference of the accumulator by the offline compiler.

channel float4 RANDOM_STREAM; __kernel void acc_test(__global float *a, int k) { // Simplest example of an accumulator. // In this loop, the accumulator acc is incremented by 5. int i; float acc = 0.0f; for (i = 0; i < k; i++) { acc+=5; } a[0] = acc; } __kernel void acc_test2(__global float *a, int k) { // Extended example showing that an accumulator can be // conditionally incremented. The key here is to describe the increment // as conditional, not the accumulation itself. int i; float acc = 0.0f; for (i = 0; i < k; i++) { acc += ((i < 30) ? 5 : 0); } a[0] = acc; } __kernel void acc_test3(__global float *a, int k) { // A more complex case where the accumulator is fed // by a dot product. int i; float acc = 0.0f; for (i = 0; i < k; i++ ){ float4 v = read_channel_intel(RANDOM_STREAM); float x1 = v.x; float x2 = v.y; float y1 = v.z; float y2 = v.w; acc += (x1*y1+x2*y2); } a[0] = acc; } __kernel void loader(__global float *a, int k) { int i; float4 my_val = 0; for(i = 0; i < k; i++) { if ((i%4) == 0) write_channel_intel(RANDOM_STREAM, my_val); if ((i%4) == 0) my_val.x = a[i]; if ((i%4) == 1) my_val.y = a[i]; if ((i%4) == 2) my_val.z = a[i]; if ((i%4) == 3) my_val.w = a[i]; } }