Intel® FPGA SDK for OpenCL™ Standard Edition: Programming Guide

ID 683342
Date 4/22/2019
Public

5.2.7. Specifying Number of SIMD Work-Items

You can increase the data-processing efficiency of an OpenCL™ kernel by executing multiple work-items in a single instruction multiple data (SIMD) manner, without manually vectorizing your kernel code. Specify the number of work-items within a work-group that the Intel® FPGA SDK for OpenCL™ Offline Compiler should execute in a SIMD, or vectorized, manner.
Important: Introduce the num_simd_work_items attribute in conjunction with the reqd_work_group_size attribute. The num_simd_work_items attribute you specify must evenly divide the work-group size you specify for the reqd_work_group_size attribute.
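You can check the divisibility rule on the host before you build a kernel. The following C sketch uses a hypothetical helper, simd_config_is_valid, which is not part of the SDK. It assumes a one-dimensional work-group, as in reqd_work_group_size(64,1,1), so the work-group size is simply the first argument of that attribute.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an SDK API. Assumes a 1-D work-group, so
 * wg_size is the first argument of reqd_work_group_size(wg_size,1,1).
 * Returns true when num_simd_work_items(simd_lanes) evenly divides it. */
static bool simd_config_is_valid(size_t simd_lanes, size_t wg_size)
{
    if (simd_lanes == 0 || wg_size == 0)
        return false; /* a zero value is meaningless for either attribute */
    return wg_size % simd_lanes == 0;
}
```

With a work-group size of 64, lane counts of 4 or 16 pass this check, while a lane count of 3 fails it.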
To specify the number of SIMD work-items in a work-group, insert the num_simd_work_items(N) attribute in the kernel source code.
For example, the code fragment below assigns a fixed work-group size of 64 work-items to a kernel. It then consolidates the work-items within each work-group into four SIMD vector lanes:
__attribute__((num_simd_work_items(4)))
__attribute__((reqd_work_group_size(64,1,1)))
__kernel void test(__global const float * restrict a,
                   __global const float * restrict b,
                   __global float * restrict answer)
{
    // Each work-item adds one pair of elements.
    size_t gid = get_global_id(0);
    answer[gid] = a[gid] + b[gid];
}
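The C sketch below makes the lane grouping in this example concrete. It is a sequential model of one work-group of the test kernel. It steps through the 64 work-items four at a time, and each group of four stands for one SIMD vector operation, so one work-group takes 64 / 4 = 16 steps. This is a conceptual illustration only. The names run_work_group, WG_SIZE, and SIMD_LANES are hypothetical, and the model says nothing about the pipelined hardware that the offline compiler actually generates.

```c
#include <stddef.h>

#define WG_SIZE    64 /* mirrors reqd_work_group_size(64,1,1) */
#define SIMD_LANES 4  /* mirrors num_simd_work_items(4) */

/* The attribute constraint: the lane count must evenly divide the work-group size. */
_Static_assert(WG_SIZE % SIMD_LANES == 0, "SIMD_LANES must divide WG_SIZE");

/* Conceptual model: executes one work-group of the example kernel by
 * stepping through its work-items SIMD_LANES at a time. All lanes in a
 * step perform the same addition on different data. Returns the number
 * of vector steps taken for the work-group. */
static size_t run_work_group(size_t group_id,
                             const float *a, const float *b, float *answer)
{
    size_t steps = 0;
    for (size_t base = 0; base < WG_SIZE; base += SIMD_LANES) {
        for (size_t lane = 0; lane < SIMD_LANES; ++lane) {
            /* Equivalent of get_global_id(0) for this work-item. */
            size_t gid = group_id * WG_SIZE + base + lane;
            answer[gid] = a[gid] + b[gid];
        }
        ++steps;
    }
    return steps; /* WG_SIZE / SIMD_LANES */
}
```

If you change SIMD_LANES to a value that does not divide WG_SIZE, such as 3, the static assertion fails at compile time. This mirrors the requirement on the two attributes.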
The offline compiler vectorizes the kernel datapath according to the value you specify for num_simd_work_items whenever possible.