3.3.4.1. Component Memory
If you declare an array inside your component, the Intel® HLS Compiler creates component memory in hardware. Component memory is sometimes referred to as local memory or on-chip memory because it is created from memory resources (such as RAM blocks) available on the FPGA.
The following source code snippet creates a component memory system and an interface to an external memory system, and then accesses both memory systems:
```cpp
#include <HLS/hls.h>

constexpr int SIZE = 128;
constexpr int N = SIZE - 1;

using HostInterface = ihc::mm_host<int, ihc::waitrequest<true>, ihc::latency<0>>;

component void memoryComponent(HostInterface &hostA) {
  hls_memory int T[SIZE]; // declaring an array as a component memory
  for (unsigned i = 0; i < SIZE; i++) {
    T[i] = i; // writing to component memory
  }
  for (int i = 0; i < N; i += 2) {
    // reading from a component memory and writing to an external
    // Avalon memory-mapped agent component through an Avalon
    // memory-mapped host interface
    hostA[i] = T[i] + T[i + 1];
  }
}
```
The compiler performs the following tasks to build a memory system:
- Builds a component memory from FPGA memory resources (such as block RAMs) and presents it to the datapath as a single memory.
- Maps each array access to a load-store unit (LSU) in the datapath that transacts with the component memory through its ports.
- Automatically optimizes the component memory geometry to maximize the bandwidth available to loads and stores in the datapath.
- Attempts to guarantee that component memory accesses never stall.
Stallable and Stall-Free Memory Systems
- Stall-free memory access
- A memory access is stall-free if it has contention-free access to a memory port. A memory system is stall-free if each of its memory operations has contention-free access to a memory port.
- Stallable memory access
- A memory access is stallable if it does not have contention-free access to a memory port. When two datapath LSUs try to transact with a memory port in the same clock cycle, one of those memory accesses is delayed (or stalled) until the memory port in contention becomes available.
As much as possible, the Intel® HLS Compiler tries to create stall-free memory systems for your component.
- A: A stall-free memory system
This memory system is stall-free because, even though the reads are scheduled in the same cycle, they are mapped to different ports. There is no contention for accessing the memory ports.
- B: A stall-free memory system
This memory system is stall-free because the two reads are statically scheduled to occur in different clock cycles. The two reads can share a memory port without any contention for the read access.
- C: A stallable memory system
This memory system is stallable because the two reads are mapped to the same port in the same clock cycle. These reads require collision arbitration to manage their port access requests, and arbitration can affect throughput.
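The three cases above reduce to a single rule: a memory system is stall-free when no port serves more than one access in the same clock cycle. The following C++ sketch is a plain software model of that rule, not Intel HLS code. The `Access` struct and the `isStallFree` function are hypothetical names introduced only for illustration, and the model assumes every access has already been assigned a port and a static schedule cycle.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical software model (not Intel HLS code): one memory access
// issued by a load-store unit (LSU) to a given port in a given clock cycle.
struct Access {
    int lsu;    // which LSU issues the access
    int port;   // which memory port the access is mapped to
    int cycle;  // clock cycle in which the access is statically scheduled
};

// Returns true if no two accesses use the same port in the same cycle,
// that is, every access has contention-free access to its port.
bool isStallFree(const std::vector<Access>& accesses) {
    std::map<std::pair<int, int>, int> usesPerPortCycle;
    for (const Access& a : accesses) {
        if (++usesPerPortCycle[{a.port, a.cycle}] > 1) {
            return false;  // two LSUs contend for one port: arbitration needed
        }
    }
    return true;
}
```

In this model, case A corresponds to `isStallFree({{0, 0, 0}, {1, 1, 0}})` (true: same cycle, different ports), case B to `isStallFree({{0, 0, 0}, {1, 0, 1}})` (true: same port, different cycles), and case C to `isStallFree({{0, 0, 0}, {1, 0, 0}})` (false: same port, same cycle).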
- Port
- A memory port is a physical access point into a memory. A port can be connected to one or more load-store units (LSUs) in the datapath, and an LSU can connect to one or more ports.
- Bank
- A memory bank is a division of the component memory system that contains a subset of the stored data. That is, all of the data stored for a component is split across banks, with each bank containing a unique piece of the stored data.
A memory system always has at least one bank.
- Replicate
- A memory bank replicate is a copy of the data in the memory bank with its own ports. All replicates in a bank contain the same data. Each replicate can be accessed independently of the others.
A memory bank always has at least one replicate.
- Private Copy
- A private copy is a copy of the data in a replicate that the compiler creates for nested loops to enable concurrent iterations of the outer loop.
A replicate can comprise multiple private copies, with each iteration of an outer loop having its own private copy. Because each outer-loop iteration works on its own private copy, the private copies are not expected to contain the same data.
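To make the bank and replicate definitions concrete, the following C++ sketch models a component memory in software. It is not how the compiler implements memory, and `BankedMemory` is a hypothetical class. It assumes low-order interleaving (element `i` lives in bank `i % numBanks` at offset `i / numBanks`), which is only one possible bank-selection scheme; the compiler chooses its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software model of a banked, replicated component memory.
// Assumption: low-order interleaving, so logical index i lives in bank
// (i % numBanks) at offset (i / numBanks). Illustration only.
class BankedMemory {
public:
    BankedMemory(std::size_t size, std::size_t numBanks, std::size_t numReplicates)
        : numBanks_(requirePositive(numBanks)),
          // storage_[bank][replicate][offset]
          storage_(numBanks_,
                   std::vector<std::vector<int>>(
                       requirePositive(numReplicates),
                       std::vector<int>((size + numBanks_ - 1) / numBanks_, 0))) {}

    std::size_t bankOf(std::size_t index) const { return index % numBanks_; }
    std::size_t offsetOf(std::size_t index) const { return index / numBanks_; }

    // A store updates every replicate of the target bank, so all
    // replicates in a bank keep holding the same data.
    void write(std::size_t index, int value) {
        for (auto& replicate : storage_[bankOf(index)]) {
            replicate[offsetOf(index)] = value;
        }
    }

    // A load can be served by any replicate, because each has its own ports.
    int read(std::size_t index, std::size_t replicate) const {
        return storage_[bankOf(index)][replicate][offsetOf(index)];
    }

private:
    // Every memory has at least one bank and every bank at least one replicate.
    static std::size_t requirePositive(std::size_t n) {
        assert(n >= 1);
        return n;
    }

    std::size_t numBanks_;
    std::vector<std::vector<std::vector<int>>> storage_;
};
```

Under this assumed mapping with two banks, the loads `T[i]` and `T[i + 1]` in the earlier `memoryComponent` example fall in different banks for every even `i`; for instance, with `BankedMemory mem(128, 2, 2)`, `mem.bankOf(4)` is 0 and `mem.bankOf(5)` is 1, and both replicates of bank 1 return the same value for index 5.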
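The following figure illustrates the relationship between banks, replicates, ports, and private copies:
Figure: Schematic Representation of Local Memories Showing the Relationship between Banks, Replicates, Ports, and Private Copies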
Strategies that Enable Concurrent Stall-Free Memory Accesses
The compiler uses a variety of strategies to ensure that concurrent accesses are stall-free, including:
- Adjusting the number of ports the memory system has. The compiler can do this either by replicating the memory to provide more read ports or by clocking the RAM block at twice the component clock speed, which provides four ports per replicate instead of two.
Clocking the RAM block at twice the component clock speed to double the number of available ports to the memory system is called double pumping.
All of a replicate's physical access ports can be accessed concurrently.
- Partitioning memory content into one or more banks, such that each bank contains a subset of the data contained in the original memory (corresponds to the top-right box of Schematic Representation of Local Memories Showing the Relationship between Banks, Replicates, Ports, and Private Copies).
The banks of a component memory can be accessed concurrently by the datapath.
- Replicating a bank to create multiple coherent replicates (corresponds to the bottom-left box of Schematic Representation of Local Memories Showing the Relationship between Banks, Replicates, Ports, and Private Copies). Each replicate in a bank contains identical data.
Loads can access the replicates concurrently.
- Creating private copies of an array that is declared inside of a loop nest (corresponds to the bottom-right box of Schematic Representation of Local Memories Showing the Relationship between Banks, Replicates, Ports, and Private Copies).
These private copies enable loop pipelining because each pipeline-parallel loop iteration accesses its own private copy of the array declared within the loop body. Private copies are not expected to contain the same data.
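In a simple counting model, these strategies combine multiplicatively. The following C++ sketch works through that arithmetic for a hypothetical geometry under simplifying assumptions: storage padding is ignored, and private copies are assumed to live inside their replicate and share its ports. `MemoryGeometry`, `totalPorts`, and `storedElements` are illustrative names, not compiler options.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a component memory geometry, using the terms
// defined in this section. Field names are illustrative, not compiler settings.
struct MemoryGeometry {
    std::size_t words;          // logical array size, in elements
    std::size_t banks;          // each bank holds a disjoint slice of the words
    std::size_t replicates;     // copies of each bank, each with its own ports
    std::size_t privateCopies;  // copies per replicate for concurrent outer-loop iterations
    bool doublePumped;          // RAM clocked at twice the component clock
};

// Ports per replicate: two normally, four when double pumped.
constexpr std::size_t portsPerReplicate(bool doublePumped) {
    return doublePumped ? 4 : 2;
}

// Physical ports across the whole memory system. In this model, private
// copies share their replicate's ports, so they do not add ports.
constexpr std::size_t totalPorts(const MemoryGeometry& g) {
    return g.banks * g.replicates * portsPerReplicate(g.doublePumped);
}

// Elements of storage consumed (padding ignored). Banking splits the data
// without duplicating it; every replicate and private copy holds its own copy.
constexpr std::size_t storedElements(const MemoryGeometry& g) {
    return g.words * g.replicates * g.privateCopies;
}
```

For a 128-element array split into 2 banks with 2 replicates each, the model gives 2 × 2 × 2 = 8 ports, or 16 when double pumped. Adding 4 private copies raises storage from 256 to 1,024 elements, which shows why replication and private copies trade RAM resources for bandwidth.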
Despite the compiler’s best efforts, the component memory system can still be stallable. This might happen due to resource constraints or memory attributes defined in your source code. In that case, the compiler tries to minimize the hardware resources consumed by the arbitrated memory system.
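When contention remains, an arbitrated port can grant only one request at a time, and the remaining requests stall. The following C++ sketch is a simplified, hypothetical cycle model of one arbitrated port: one grant per cycle, no arbitration latency, and no modeling of which LSU wins. `cyclesToDrain` is an illustrative name and does not describe the compiler's actual arbiter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cycle model of one arbitrated memory port: requests[c] is the
// number of LSU requests that arrive at the port in cycle c. The port grants
// one request per cycle; the others stall and retry in the next cycle.
// Returns the number of cycles needed to serve every request.
std::size_t cyclesToDrain(const std::vector<std::size_t>& requests) {
    std::size_t pending = 0;
    std::size_t cycle = 0;
    while (cycle < requests.size() || pending > 0) {
        if (cycle < requests.size()) {
            pending += requests[cycle];  // new requests arrive this cycle
        }
        if (pending > 0) {
            --pending;  // one grant per cycle on this port
        }
        ++cycle;
    }
    return cycle;
}
```

With one request per cycle for four cycles, `cyclesToDrain({1, 1, 1, 1})` returns 4. If two LSUs share the port every cycle, `cyclesToDrain({2, 2, 2, 2})` returns 8, so in this model the same work takes twice as many cycles, which is the throughput cost that stall-free memory systems avoid.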