Intel® High Level Synthesis Compiler Pro Edition: Reference Manual
4.4.3.1. Load-Store Unit Types
The Intel® HLS Compiler determines which types of load-store units (LSUs) to instantiate, and whether to coalesce memory accesses, based on the memory access pattern that it infers.
- Burst-coalesced LSUs
  - Nonaligned burst-coalesced LSUs
  The Intel® HLS Compiler typically instantiates burst-coalesced LSUs for accessing variable-latency Avalon® MM Host interfaces.
- Pipelined LSUs
  - Never-stall pipelined LSUs
  The Intel® HLS Compiler typically instantiates pipelined LSUs for accessing fixed-latency Avalon® MM Host interfaces or on-chip memories.
Click LSUs in the System Viewer (in the High-Level Design Reports) to see which types of LSU the compiler instantiated for your component.

Burst-Coalesced Load-Store Units
By default, the compiler infers burst-coalesced load-store units (LSUs) for any variable-latency Avalon® MM Host interface.
A burst-coalesced LSU dynamically buffers contiguous memory requests until the largest possible burst can be made or until the LSU receives no new requests for a given period of time. The largest possible burst is defined by the ihc::maxburst parameter. For noncontiguous memory requests, a burst-coalesced LSU flushes the buffer between requests.
Burst-coalesced LSUs provide efficient, variable-latency access to memories outside of your component. However, they require a considerable amount of FPGA resources.
#include "HLS/hls.h"

component void
burst_coalesced(ihc::mm_host<int, ihc::dwidth<64>, ihc::awidth<32>,
                             ihc::aspace<1>, ihc::latency<0>> &in,
                ihc::mm_host<int, ihc::dwidth<64>, ihc::awidth<32>,
                             ihc::aspace<2>, ihc::latency<0>> &out,
                int i) {
  int value = in[i / 2]; // Burst-coalesced LSU
  out[i] = value;        // Burst-coalesced LSU
}
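The buffering behavior described above can be illustrated with a small software model. This is a minimal sketch in plain C++, not compiler-generated logic and not part of the HLS API: the `Burst` struct, the `coalesce` function, and the `max_burst` argument (standing in for the limit set by ihc::maxburst) are hypothetical names chosen for illustration. The model only covers the contiguity rule and the maximum burst length; it does not model the idle timeout that also flushes the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One burst issued to memory: a start word address and a length in words.
struct Burst {
  std::size_t start;
  std::size_t length;
};

// Hypothetical software model (not an Intel HLS API).
// A burst grows while each new address directly follows the previous one
// and the burst is still shorter than max_burst. A noncontiguous address,
// or a full burst, closes the current burst and starts a new one.
std::vector<Burst> coalesce(const std::vector<std::size_t> &addrs,
                            std::size_t max_burst) {
  assert(max_burst > 0);
  std::vector<Burst> bursts;
  for (std::size_t addr : addrs) {
    bool extends = !bursts.empty() &&
                   addr == bursts.back().start + bursts.back().length &&
                   bursts.back().length < max_burst;
    if (extends) {
      ++bursts.back().length;       // contiguous: extend the open burst
    } else {
      bursts.push_back({addr, 1});  // flush and begin a new burst
    }
  }
  return bursts;
}
```

For example, under this model `coalesce({0, 1, 2, 3, 4, 5, 8, 9}, 4)` yields three bursts: four words starting at address 0, two words starting at address 4, and two words starting at address 8.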
Depending on the memory access pattern and other attributes, the compiler might modify a burst-coalesced LSU to be a nonaligned burst-coalesced LSU.
Nonaligned Burst-Coalesced LSUs
When a burst-coalesced LSU can access memory at addresses that are not aligned to the external memory word size, the Intel® HLS Compiler creates a nonaligned burst-coalesced LSU. Nonaligned LSUs typically require more FPGA resources to implement than aligned LSUs. The throughput of a nonaligned LSU might be reduced if it receives many unaligned requests.
#include "HLS/hls.h"

struct State {
  int x;
  int y;
  int z;
};

component void
static_coalescing(ihc::mm_host<State, ihc::dwidth<128>, ihc::awidth<32>,
                               ihc::aspace<1>, ihc::latency<0>> &in,
                  ihc::mm_host<State, ihc::dwidth<128>, ihc::awidth<32>,
                               ihc::aspace<2>, ihc::latency<0>> &out,
                  int i) {
  out[i] = in[i]; // Two nonaligned burst-coalesced LSUs
}
The figure that follows (Figure 5) shows the external memory contents for the previous code example and the nonaligned burst-coalesced LSUs in the component pipeline.
The data type that is read and written is a 96-bit-wide struct. The external memory width is 128 bits. This difference between the read/write data width and the external memory width forces some of the memory requests to span two consecutive memory words.
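The word-spanning effect can be checked with a little arithmetic. The following is a minimal sketch in plain C++, assuming the struct occupies 12 bytes (three 4-byte ints), that array elements are packed back to back, and that element 0 starts on a 16-byte memory word boundary. The helper name `spans_two_words` is hypothetical and is not part of the HLS API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true when element `index` of an array of
// `elem_bytes`-sized elements, packed starting at byte 0, touches two
// consecutive memory words of `word_bytes` bytes each.
bool spans_two_words(std::size_t index, std::size_t elem_bytes,
                     std::size_t word_bytes) {
  assert(elem_bytes > 0 && elem_bytes <= word_bytes);
  std::size_t first_byte = index * elem_bytes;         // first byte of element
  std::size_t last_byte = first_byte + elem_bytes - 1; // last byte of element
  return first_byte / word_bytes != last_byte / word_bytes;
}
```

Under these assumptions, with 12-byte elements and 16-byte words, elements 0 and 3 each fit in a single word, while elements 1 and 2 each straddle a word boundary. The pattern repeats every four elements, so half of the accesses need two memory words.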
Pipelined Load-Store Units
By default, the compiler infers pipelined load-store units (LSUs) for any fixed-latency Avalon® MM Host interface and for on-device memories.
In a pipelined LSU, requests are submitted when they are received and no buffering occurs. Pipelined LSUs are also used for accessing memories inside your component.
You can tell the compiler to instantiate pipelined LSUs for variable-latency MM Host interfaces. However, variable-latency interface access with pipelined LSUs might reduce throughput because pipelined LSUs do not combine sequential memory requests into bursts.
Memory accesses are pipelined, so multiple requests can be in flight at the same time.
#include "HLS/hls.h"

component void
pipelined(ihc::mm_host<int, ihc::dwidth<64>, ihc::awidth<32>,
                       ihc::aspace<1>, ihc::latency<2>> &in,
          ihc::mm_host<int, ihc::dwidth<64>, ihc::awidth<32>,
                       ihc::aspace<1>, ihc::latency<2>> &out,
          int gi, int li) {
  int lmem[1024];
  int res = in[gi]; // Pipelined LSU
  for (int i = 0; i < 4; i++) {
    lmem[li - i] = res; // Pipelined LSU
    res >>= 1;
  }
  res = 0;
  for (int i = 0; i < 4; i++) {
    res ^= lmem[li - i]; // Pipelined LSU
  }
  out[gi] = res; // Pipelined LSU
}
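To see why keeping multiple requests in flight matters, consider an idealized cycle count. This is a rough sketch in plain C++, assuming one request can be issued per cycle, every request completes a fixed number of cycles after it is issued, and nothing ever stalls. The function names are hypothetical, and the numbers are not a prediction of what the compiler or hardware achieves.

```cpp
#include <cassert>
#include <cstdint>

// Idealized pipelined access: request k is issued at cycle k and completes
// at cycle k + latency, so the last of `requests` requests completes at
// cycle (requests - 1) + latency.
std::uint64_t pipelined_cycles(std::uint64_t requests, std::uint64_t latency) {
  assert(latency > 0);
  if (requests == 0) return 0;
  return latency + requests - 1;
}

// Idealized serialized access: each request is issued only after the
// previous one completes, so the total is requests * latency.
std::uint64_t serialized_cycles(std::uint64_t requests, std::uint64_t latency) {
  assert(latency > 0);
  return requests * latency;
}
```

With four requests and a latency of two cycles, this model gives 5 cycles for the pipelined case and 8 cycles for the serialized case. The gap widens as the latency or the number of requests grows.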
Never-Stall Pipelined LSUs
If a pipelined LSU is connected to a memory inside the component, or to a fixed-latency MM Host interface without arbitration, the compiler creates a never-stall LSU because every access to the memory takes a fixed number of cycles that is known to the compiler.
#include "HLS/hls.h"

component void
neverstall(ihc::mm_host<int, ihc::dwidth<128>, ihc::awidth<32>,
                        ihc::aspace<1>, ihc::latency<0>> &in,
           ihc::mm_host<int, ihc::dwidth<128>, ihc::awidth<32>,
                        ihc::aspace<1>, ihc::latency<0>> &out,
           int gi, int li) {
  int lmem[1024];
  for (int i = 0; i < 1024; i++)
    lmem[i] = in[i]; // Pipelined never-stall LSU
  out[gi] = lmem[li] ^ lmem[li + 1]; // Pipelined never-stall LSU
}