Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 7/13/2023
Public

Load-Store Unit Controls

The Intel® oneAPI DPC++/C++ Compiler lets you control the load-store unit (LSU) that it generates for a global memory access through a set of template options on the ext::intel::lsu class. The class has two member functions, load() and store(), which load from and store to a global pointer, respectively. See also Annotating Unified Shared Memory Pointers.

NOTE:
Not all LSU control options listed below work with both the load() and store() member functions, and not all LSU styles and types are supported on every target device. The combination of control options that you specify determines the type of LSU that results.

LSU Control Options Available in the ext::intel::lsu Class

Control            Syntax                              Value                                       Default Value                   Supported Functionality
Burst-Coalesce     ext::intel::burst_coalesce<B>       B is a boolean                              false                           Load and store
Static Coalescing  ext::intel::statically_coalesce<B>  B is a boolean                              true                            Load and store
Prefetch           ext::intel::prefetch<B>             B is a boolean                              false                           Only loads
Cache              ext::intel::cache<N>                N is an integer greater than or equal to 0  0                               Only loads
Pipeline           No option provided                  -                                           Defaults above result in this.  Load and store
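
To make the default values in the table concrete, the following host-only sketch models how a list of control options could resolve to a set of settings, with every unnamed option keeping its default. It is an illustration under stated assumptions, not the compiler's implementation: the lsu_model namespace, the Settings struct, and the record(), resolve(), and requested_style() functions are hypothetical names, and the tag types only mimic the shape of the real controls. The style mapping reflects the table only, since the compiler makes the final decision on the LSU it generates. The sketch assumes a C++17 compiler.

```cpp
// Host-only model of how LSU control options resolve to settings.
// Every name below is a hypothetical stand-in, not the ext::intel API.
namespace lsu_model {

// Tag types that mimic the shape of the controls in the table.
template <bool B> struct burst_coalesce {};
template <bool B> struct statically_coalesce {};
template <bool B> struct prefetch {};
template <int N> struct cache {};

// Settings start at the default values listed in the table.
struct Settings {
  bool burst_coalesce = false;
  bool statically_coalesce = true;
  bool prefetch = false;
  int cache_size = 0;
};

// Each overload records one named option in the settings.
template <bool B>
constexpr void record(Settings& s, burst_coalesce<B>) { s.burst_coalesce = B; }
template <bool B>
constexpr void record(Settings& s, statically_coalesce<B>) { s.statically_coalesce = B; }
template <bool B>
constexpr void record(Settings& s, prefetch<B>) { s.prefetch = B; }
template <int N>
constexpr void record(Settings& s, cache<N>) { s.cache_size = N; }

// Options that are not named keep their defaults (C++17 fold expression).
template <class... Options>
constexpr Settings resolve() {
  Settings s{};
  (record(s, Options{}), ...);
  return s;
}

enum class Style { Pipelined, BurstCoalesced, BurstCoalescedCached, Prefetching };

// Assumed mapping from settings to the LSU style being requested.
constexpr Style requested_style(const Settings& s) {
  if (s.prefetch) return Style::Prefetching;
  if (s.burst_coalesce)
    return s.cache_size > 0 ? Style::BurstCoalescedCached : Style::BurstCoalesced;
  return Style::Pipelined;
}

}  // namespace lsu_model

// With no options, the defaults apply and the pipelined style results.
static_assert(lsu_model::requested_style(lsu_model::resolve<>()) ==
                  lsu_model::Style::Pipelined,
              "defaults should model a pipelined LSU");
static_assert(lsu_model::resolve<>().statically_coalesce,
              "static coalescing defaults to true");
```

In this model, resolve<burst_coalesce<true>, cache<1024>>() yields settings that map to the burst-coalesced cached style, while static coalescing keeps its default of true because it was not named.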

Burst-Coalesce

Currently, not every combination of LSU controls is supported by the compiler. The following rules apply:

  • For store, the ext::intel::cache control must be 0 and the ext::intel::prefetch control must be false.
  • For load, if the ext::intel::cache control is set to a value greater than 0, then the ext::intel::burst_coalesce control must be set to true.
  • For load, at most one of the ext::intel::prefetch and ext::intel::burst_coalesce controls can be true.
  • For load, the ext::intel::prefetch control cannot be true when the ext::intel::cache control is greater than 0.
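
The rules above can be restated as two small predicates. The following host-only sketch is an illustration, not part of the oneAPI API: the LsuControls struct and the valid_for_store() and valid_for_load() functions are hypothetical names, and the real compiler may report unsupported combinations differently. The sketch assumes a C++14 or later compiler.

```cpp
// Host-only sketch of the combination rules listed above.
// LsuControls and both functions are hypothetical, not oneAPI APIs.
struct LsuControls {
  bool burst_coalesce = false;      // default from the options table
  bool statically_coalesce = true;  // default from the options table
  bool prefetch = false;            // default from the options table
  int cache = 0;                    // 0 means no cache
};

// Store: the cache control must be 0 and prefetch must be false.
constexpr bool valid_for_store(const LsuControls& c) {
  return c.cache == 0 && !c.prefetch;
}

// Load: a cache needs burst-coalescing, and prefetch excludes both
// burst-coalescing and a cache.
constexpr bool valid_for_load(const LsuControls& c) {
  const bool cache_has_burst = (c.cache == 0) || c.burst_coalesce;
  const bool prefetch_without_burst = !(c.prefetch && c.burst_coalesce);
  const bool prefetch_without_cache = !(c.prefetch && c.cache > 0);
  return c.cache >= 0 && cache_has_burst && prefetch_without_burst &&
         prefetch_without_cache;
}

// Field order: burst_coalesce, statically_coalesce, prefetch, cache.
constexpr LsuControls kDefaults{};
constexpr LsuControls kBurst{true, false, false, 0};
constexpr LsuControls kPrefetch{false, false, true, 0};
constexpr LsuControls kCached{true, false, false, 1024};
constexpr LsuControls kCacheNoBurst{false, true, false, 1024};
constexpr LsuControls kPrefetchBurst{true, true, true, 0};

static_assert(valid_for_load(kDefaults) && valid_for_store(kDefaults),
              "defaults are valid for both load and store");
static_assert(valid_for_store(kBurst), "burst-coalesced store is allowed");
static_assert(valid_for_load(kPrefetch) && !valid_for_store(kPrefetch),
              "prefetch is load-only");
static_assert(valid_for_load(kCached) && !valid_for_store(kCached),
              "cache is load-only and needs burst-coalescing");
static_assert(!valid_for_load(kCacheNoBurst),
              "a cache without burst-coalescing is rejected");
static_assert(!valid_for_load(kPrefetchBurst),
              "prefetch and burst-coalescing cannot both be true");
```

The constants kBurst, kPrefetch, and kCached use the same control values as the burst-coalesce, prefetch, and cache examples in this section, so each of those examples passes the check for the member function it calls.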

The burst-coalesce control helps the Intel® oneAPI DPC++/C++ Compiler infer a burst-coalesced LSU. You can combine it with other controls, such as cache. For more details about this LSU type, refer to Burst-Coalesced Load-Store Units.

The following is an example of burst-coalesce LSU control:

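// Assumes `using namespace sycl;`, so ext::intel refers to sycl::ext::intel.
// Request a burst-coalesced LSU with static coalescing turned off.
using BurstCoalescedLSU =
  ext::intel::lsu<ext::intel::burst_coalesce<true>,
                  ext::intel::statically_coalesce<false>>;
// output_ptr is a global memory pointer; X is the value to store.
BurstCoalescedLSU::store(output_ptr, X);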

Static Coalescing

The static-coalescing control turns static coalescing of global memory accesses on or off. Because static coalescing is the compiler's default behavior, you typically use this control to turn it off. You can combine it with other controls, such as burst_coalesce. For more details about this memory optimization modifier, refer to Static Memory Coalescing.

The following example uses the LSU control to turn off static coalescing:

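// Keep the burst-coalesced LSU but turn off static coalescing.
using NonStaticCoalescedLSU =
  ext::intel::lsu<ext::intel::burst_coalesce<true>,
                  ext::intel::statically_coalesce<false>>;

// Load one value from the global memory pointer input_ptr.
int X = NonStaticCoalescedLSU::load(input_ptr);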

Prefetch

The prefetch control helps the Intel® oneAPI DPC++/C++ Compiler infer a prefetch-style LSU. The compiler honors this control only if it is physically possible. If a prefetching LSU might not be functionally correct, the compiler does not infer it. This control works only for load accesses. For more details about this LSU type, refer to Prefetching Load-Store Units.

The following is an example of prefetch LSU control:

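// Request a prefetching LSU (loads only) with static coalescing turned off.
using PrefetchingLSU =
  ext::intel::lsu<ext::intel::prefetch<true>,
                  ext::intel::statically_coalesce<false>>;

// Load one value from the global memory pointer input_ptr.
int X = PrefetchingLSU::load(input_ptr);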

Cache

The cache control helps the Intel® oneAPI DPC++/C++ Compiler turn on or off caching behavior and set the size of the cache. This LSU modifier can be useful for parallel_for kernels. The cache control can be combined with other attributes such as burst_coalesce. For more details about this LSU modifier, refer to Cached.

The following is an example of cache LSU control:

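// A cache size greater than 0 requires burst_coalesce<true> (loads only).
using CachingLSU =
  ext::intel::lsu<ext::intel::burst_coalesce<true>,
                  ext::intel::cache<1024>,
                  ext::intel::statically_coalesce<false>>;

// Load one value from the global memory pointer input_ptr.
int X = CachingLSU::load(input_ptr);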

Pipeline

There is no explicit pipeline control. The Intel® oneAPI DPC++/C++ Compiler infers the pipelined LSU style when you provide no options, so that the controls keep their default values. The pipelined style can be combined with other controls, such as statically_coalesce. For more details about this LSU type, refer to Pipelined Load-Store Units.

For example:

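// No options: all controls keep their defaults, which requests a pipelined LSU.
using PipelinedLSU = ext::intel::lsu<>;

// Store the value X to the global memory pointer output_ptr.
PipelinedLSU::store(output_ptr, X);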