3.6.3. Controlling the Load-Store Units

Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

Download PDF

ID 683521

Date 3/28/2022

Version

Public

A newer version of this document is available. Customers should click here to go to the newest version.

3.6.3. Controlling the Load-Store Units

The Intel® FPGA SDK for OpenCL™ Offline Compiler allows you to control the type of LSU that is being generated for global memory accesses via a set of built-in calls that you can use for loading from and storing to global memory.

Load Built-ins

The variations of the load built-in are summarized in the following table:

Table 10. Load Built-ins
Built-in	LSU Type Implemented
`__pipelined_load()`	Pipelined if possible
`__prefetching_load()`	Prefetching if possible
`__burst_coalesced_load()`	Burst-coalesced
`__burst_coalesced_cached_load()`	Burst-coalesced cached if possible

All variations expect the following arguments:

Table 11. Load Built-in Arguments
Built-in	Type	Description
Argument #1	Pointer	Memory location to load from.
Argument #2	Integer	Available only for `__burst_coalesced_cached_load()` function. Describes the LSU cache size in bytes. Non-negative compile-time constant integer.
Return value	Object	Data that the pointer argument points to. Same type as the base type of the pointer argument.

Store Built-ins

The variations of the store built-in are summarized in the following table:

Table 12. Store Built-ins
Built-in	LSU Type Implemented
`__pipelined_store()`	Pipelined if possible
`__burst_coalesced_store()`	Burst-coalesced

All variations expect the following arguments:

Table 13. Store Built-in Arguments
Built-in	Type	Description
Argument #1	Pointer	Memory location to store to.
Argument #2	Same as the pointer's base type	Value to be stored.

Note: All variations of the store built-in are non-value-returning.

Example

Following is an OpenCL example depicting different variations of the load and the store built-ins:

kernel void oclTest(global int * restrict in, 
                    global int * restrict out) {
    int i = get_global_id(0);
  
    int a1 = __pipelined_load(in + 3*i+0); // Uses a pipelined LSU
    // Uses a burst-coalesced LSU with a cache of size 1024 bytes
    int a2 = __burst_coalesced_cached_load(&in[3*i+1], 1024);  
    int a3 = __prefetching_load(&in[3*i+2]); // Uses a prefetching LSU

    __burst_coalesced_store(&out[3*i+0], a3); // Uses a burst-coalesced LSU
}

Note:

The compiler does not allow you to select an LSU that may cause functionally incorrect results in the context in which it is being requested. For example, if you request a prefetching LSU on a volatile pointer, the compiler errors out. The compiler also errors out if caching is requested in a situation where the cache (which is local to the LSU) may become incoherent due to other LSUs writing to memory.
The prefetching LSU is not available on the Intel® Stratix® 10 device.

Select Your Language

Using Intel.com Search

Quick Links

Recent Searches

Advanced Search

Only search in

Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

3.6.3. Controlling the Load-Store Units

Load Built-ins

Store Built-ins

Example