Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/13/2021
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

8.1. General Guidelines on Optimizing Memory Accesses

Optimizing the memory accesses in your OpenCL™ kernels can improve overall kernel performance.

Consider implementing the following techniques for optimizing memory accesses, whenever possible:

  • If your OpenCL program has a pair of kernels—one produces data and the other one consumes that data—convert them into a single kernel that performs both functions. Also, implement helper functions to logically separate the functions of the two original kernels.
    FPGA implementations favor one large kernel over separate smaller kernels. Kernel unification removes the need to write the results from one kernel into global memory temporarily before fetching the same data in the other kernel.

  • The Intel® FPGA SDK for OpenCL™ Offline Compiler implements local memory in FPGAs very differently than in GPUs. If your OpenCL kernel contains code to avoid GPU-specific local memory bank conflicts, remove that code because the offline compiler generates hardware that avoids local memory bank conflicts automatically whenever possible.