Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 12/16/2022
Public



Memory Accesses

Memory access efficiency often dictates the overall performance of your SYCL* kernel. Refer to Memory Types for an introduction to memory accesses.

The pipeline-parallel nature of SYCL execution on FPGA means that memory loads and stores in your SYCL code compete for access to memory resources (global, local, and private memories). If your SYCL kernel performs a large number of memory accesses, the compiler must generate arbitration logic to share the available memory bandwidth among the memory access sites in your kernel's datapath. If the bandwidth demanded by the datapath exceeds what the memory and arbitration logic can provide, the datapath stalls. This degrades the kernel's throughput because the compute pipeline must wait for the memory access to complete before it can resume.

When optimizing your design, it is important to understand whether your kernel's throughput is limited by memory accesses (a memory-bound kernel) or by the structure of the kernel datapath (a compute-bound kernel). These situations require different optimization techniques. The following sections discuss memory access optimization in detail.
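As a rough illustration of the memory-bound versus compute-bound distinction, the following host-side C++ sketch compares the bandwidth a stall-free datapath would demand against the bandwidth the memory system can supply. The `KernelProfile` struct, its field names, and both helper functions are hypothetical and are not part of SYCL or the oneAPI toolkits. The sketch assumes a simple steady-state model and is not a substitute for the compiler's reports or profiling.

```cpp
#include <cassert>

// Hypothetical summary of a kernel's steady-state memory demand.
// None of these names come from SYCL or the oneAPI toolkits.
struct KernelProfile {
    double bytes_per_iteration;   // global-memory bytes loaded and stored per loop iteration
    double iterations_per_cycle;  // pipeline throughput with no memory stalls (1.0 for II = 1)
    double clock_hz;              // kernel clock frequency in Hz
};

// Bandwidth (bytes per second) the datapath requests if it never stalls.
inline double demanded_bandwidth(const KernelProfile& k) {
    return k.bytes_per_iteration * k.iterations_per_cycle * k.clock_hz;
}

// Under this simplified model, the kernel is memory-bound when the datapath
// demands more bandwidth than the memory system can provide.
inline bool is_memory_bound(const KernelProfile& k, double available_bytes_per_s) {
    assert(available_bytes_per_s > 0.0);
    return demanded_bandwidth(k) > available_bytes_per_s;
}
```

For example, under the assumed figures of a 300 MHz kernel clock and 16 GB/s of available global memory bandwidth, a kernel that moves 64 bytes per iteration at one iteration per cycle demands 19.2 GB/s and is memory-bound in this model, while the same kernel moving 8 bytes per iteration demands 2.4 GB/s and is not.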

Consider the following when developing your SYCL code:

  • The maximum computation bandwidth of an FPGA is much larger than the available global memory bandwidth.
  • The available global memory bandwidth is much smaller than the local and private memory bandwidth.
  • Because global memory bandwidth is the scarcest of these resources, minimize the number of global memory accesses.
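One common way to apply the last point is to keep intermediate results in a private variable and write to global memory only once. The sketch below is a plain host-side C++ model, not SYCL device code. `CountedBuffer` is a hypothetical stand-in for a global-memory buffer that counts each load and store, so the two variants can be compared.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for a global-memory buffer.
// It counts every load and store so access totals can be compared.
struct CountedBuffer {
    std::vector<int> data;
    std::size_t accesses;

    explicit CountedBuffer(std::vector<int> d) : data(std::move(d)), accesses(0) {}

    int load(std::size_t i) { ++accesses; return data[i]; }
    void store(std::size_t i, int v) { ++accesses; data[i] = v; }
};

// Naive variant: the running sum lives in "global memory", so every
// element costs an extra load and store on the output buffer.
inline void sum_naive(CountedBuffer& in, CountedBuffer& out) {
    out.store(0, 0);
    for (std::size_t i = 0; i < in.data.size(); ++i) {
        out.store(0, out.load(0) + in.load(i));
    }
}

// Reduced-access variant: accumulate in a private variable (a register
// on the FPGA) and store the result to "global memory" exactly once.
inline void sum_private(CountedBuffer& in, CountedBuffer& out) {
    int acc = 0;
    for (std::size_t i = 0; i < in.data.size(); ++i) {
        acc += in.load(i);
    }
    out.store(0, acc);
}
```

For N input elements, the naive variant performs 2N + 1 accesses on the output buffer, while the private-accumulator variant performs one. Both still read each input element once. In a real SYCL kernel the same idea applies to values accessed through an accessor or a device pointer, although the compiler may already perform part of this optimization.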