Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/19/2022
Public

7. Strategies for Improving NDRange Kernel Data Processing Efficiency

Consider the following kernel code:

__kernel void sum (__global const float * restrict a,
                   __global const float * restrict b,
                   __global float * restrict answer)
{
    size_t gid = get_global_id(0);
    answer[gid] = a[gid] + b[gid];
}

This kernel adds arrays a and b, one element at a time. Each work-item adds two elements, one from each array, and stores the sum in the array answer. Without optimization, the kernel performs one addition per work-item.
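
To make the one-addition-per-work-item behavior concrete, the following is a minimal host-side sketch in plain C that emulates a one-dimensional NDRange sequentially. It is not part of the SDK. The function names sum_work_item and sum_ndrange_reference are hypothetical, and the loop index stands in for the value that get_global_id(0) would return. Real work-items may run in parallel or in a pipeline, so this only illustrates the result, not how the hardware executes the kernel.

```c
#include <stddef.h>

/* Body of a single work-item: mirrors the one addition in the kernel.
 * gid plays the role of get_global_id(0) (assumption for illustration). */
void sum_work_item(size_t gid, const float *a, const float *b, float *answer)
{
    answer[gid] = a[gid] + b[gid];
}

/* Hypothetical reference model of a 1-D NDRange launch:
 * invokes the work-item body once per global ID, in order. */
void sum_ndrange_reference(size_t global_size,
                           const float *a, const float *b, float *answer)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        sum_work_item(gid, a, b, answer);
    }
}
```

Calling sum_ndrange_reference with a global size of N performs exactly N additions, one per emulated work-item. A reference like this can be useful for checking kernel output on the host.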

To maximize the performance of your OpenCL™ kernel, consider implementing the applicable optimization techniques to improve data processing efficiency.