Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 10/04/2021
Public


3.3.4.1. Component Memory

If you declare an array inside your component, the Intel® HLS Compiler creates component memory in hardware. Component memory is sometimes referred to as local memory or on-chip memory because it is created from memory resources (such as RAM blocks) available on the FPGA.

The following source code snippet results in the creation of a component memory system, an interface to an external memory system, and access to these memory systems:

#include <HLS/hls.h>

constexpr int SIZE = 128;
constexpr int N = SIZE - 1;

using HostInterface = ihc::mm_host<int, ihc::waitrequest<true>, 
                                       ihc::latency<0>>;

component void memoryComponent(HostInterface &hostA)
{
    hls_memory int T[SIZE]; // declaring an array as a component memory
    for (unsigned i = 0; i < SIZE; i++)
    {
        T[i] = i; // writing to component memory
    }
    for (int i = 0; i < N; i += 2)
    {
        // reading from a component memory and writing to an external
        // Avalon memory-mapped agent component through an Avalon
        // memory-mapped host interface
        hostA[i] = T[i] + T[i + 1];
    }
}

The compiler performs the following tasks to build a memory system:

  • Builds a component memory from FPGA memory resources (such as block RAMs) and presents it to the datapath as a single memory.
  • Maps each array access to a load-store unit (LSU) in the datapath that transacts with the component memory through its ports.
  • Automatically optimizes the component memory geometry to maximize the bandwidth available to loads and stores in the datapath.
  • Attempts to guarantee that component memory accesses never stall.

Stallable and Stall-Free Memory Systems

Accesses to a memory (read or write) can be stall-free or stallable:
Stall-free memory access
A memory access is stall-free if it has contention-free access to a memory port. A memory system is stall-free if each of its memory operations has contention-free access to a memory port.
Stallable memory access
A memory access is stallable if it does not have contention-free access to a memory port. When two datapath LSUs try to transact with a memory port in the same clock cycle, one of those memory accesses is delayed (or stalled) until the memory port in contention becomes available.

As much as possible, the Intel® HLS Compiler tries to create stall-free memory systems for your component.

Figure 22. Examples of Stall-free and Stallable Memory Systems
This figure shows the following example memory systems:
  • A: A stall-free memory system

    This memory system is stall-free because, even though the reads are scheduled in the same cycle, they are mapped to different ports. There is no contention for accessing the memory ports.

  • B: A stall-free memory system

    This memory system is stall-free because the two reads are statically scheduled to occur in different clock cycles. The two reads can share a memory port without any contention for the read access.

  • C: A stallable memory system

    This memory system is stallable because two reads are mapped to the same port in the same cycle. These reads require arbitration to manage their colliding port access requests, and arbitration can reduce throughput.
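The three cases above can be expressed as a small software check. The following is a minimal sketch in plain C++, not compiler code: the `Access` struct and the `isStallFree` function are hypothetical names introduced here, and the model assumes every access has a known cycle and port, which simplifies what the compiler actually reasons about.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model of one scheduled memory access: the clock cycle in
// which it is issued and the memory port it is mapped to.
struct Access {
    int cycle;
    int port;
};

// Returns true if no two accesses use the same port in the same cycle,
// which is the contention-free condition described in the text.
bool isStallFree(const std::vector<Access>& accesses) {
    std::set<std::pair<int, int>> used;
    for (const Access& a : accesses) {
        assert(a.cycle >= 0 && a.port >= 0);
        // insert().second is false when this (cycle, port) slot is taken.
        if (!used.insert({a.cycle, a.port}).second) {
            return false;
        }
    }
    return true;
}
```

In this model, example A corresponds to two accesses in cycle 0 on ports 0 and 1, example B to two accesses on port 0 in cycles 0 and 1, and example C to two accesses on port 0 in cycle 0. Only the last one fails the check.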



A component memory system consists of the following parts:
Port
A memory port is a physical access point into a memory. The relationship between ports and load-store units (LSUs) in the datapath is many-to-many: a port can have one or more LSUs connected to it, and an LSU can connect to one or more ports.
Bank
A memory bank is a division of the component memory system that contains a subset of the data stored. That is, all of the data stored for a component is split across banks, with each bank containing a unique piece of the stored data.

A memory system always has at least one bank.
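To make the idea of splitting data across banks concrete, here is a minimal sketch in plain C++. It assumes, purely for illustration, that the low-order part of the element index selects the bank. The `BankLocation` struct and `locate` function are hypothetical names, and the mapping that the compiler actually chooses may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: which bank holds an element, and where
// inside that bank the element sits.
struct BankLocation {
    std::size_t bank;
    std::size_t offset;
};

// Assumed interleaved mapping: consecutive indices go to consecutive
// banks, so every element lives in exactly one bank.
BankLocation locate(std::size_t index, std::size_t numBanks) {
    assert(numBanks > 0);  // a memory system always has at least one bank
    return {index % numBanks, index / numBanks};
}
```

Under this assumed mapping with two banks, the reads of `T[i]` and `T[i + 1]` in the earlier example (where `i` is even) fall in different banks, so they would not compete for the same bank's ports.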

Replicate
A memory bank replicate is a copy of the data in the memory bank with its own ports. All replicates in a bank contain the same data. Each replicate can be accessed independently of the others.

A memory bank always has at least one replicate.
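The property that all replicates hold the same data can be modeled in software as follows. This is a minimal sketch in plain C++ rather than a description of the generated hardware: the `ReplicatedBank` class is a hypothetical name, and the sketch assumes every write is applied to every replicate while each read is served by one chosen replicate.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical software model of one bank with several replicates.
template <std::size_t Depth, std::size_t Replicates>
class ReplicatedBank {
    static_assert(Replicates >= 1, "a bank has at least one replicate");

public:
    // A write updates every replicate so that they stay identical.
    void write(std::size_t addr, int value) {
        assert(addr < Depth);
        for (auto& replicate : copies_) {
            replicate[addr] = value;
        }
    }

    // A read uses a single replicate, so reads that target different
    // replicates do not share a port in this model.
    int read(std::size_t replicate, std::size_t addr) const {
        assert(replicate < Replicates && addr < Depth);
        return copies_[replicate][addr];
    }

private:
    std::array<std::array<int, Depth>, Replicates> copies_{};
};
```

For example, after `bank.write(3, 42)` on a `ReplicatedBank<8, 2>`, both `bank.read(0, 3)` and `bank.read(1, 3)` return 42. The cost of the extra read bandwidth is that each replicate stores a full copy of the bank's data.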

Private Copy
A private copy is a copy of the data in a replicate that is created for nested loops to enable concurrent iterations of the outer loop.

A replicate can comprise multiple private copies, with each iteration of an outer loop having its own private copy. Because each outer loop iteration has its own private copy, private copies are not expected to all contain the same data.
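The following minimal sketch in plain C++ illustrates why private copies are not expected to hold the same data. It runs sequentially, so it shows only the data layout and not the concurrency. The names `kDepth`, `kCopies`, `kOuter`, and `rowSums` are hypothetical, and assigning copies to outer iterations with `i % kCopies` is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kDepth = 4;   // elements per private copy (assumed)
constexpr std::size_t kCopies = 2;  // number of private copies (assumed)
constexpr std::size_t kOuter = 3;   // outer loop trip count (assumed)

// Each outer iteration fills its own scratch copy and then sums it.
std::array<int, kOuter> rowSums() {
    std::array<std::array<int, kDepth>, kCopies> privateCopies{};
    std::array<int, kOuter> result{};
    for (std::size_t i = 0; i < kOuter; ++i) {
        // Copy used by outer iteration i; its contents differ from the
        // copy that iteration i + 1 uses.
        auto& scratch = privateCopies[i % kCopies];
        for (std::size_t j = 0; j < kDepth; ++j) {
            scratch[j] = static_cast<int>(i * 10 + j);
        }
        int sum = 0;
        for (std::size_t j = 0; j < kDepth; ++j) {
            sum += scratch[j];
        }
        result[i] = sum;
    }
    return result;
}
```

Because iterations `i` and `i + 1` write to different copies, the inner loops of one iteration do not overwrite data that the other iteration still needs. That independence is what allows the outer iterations to overlap in hardware.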

The following figure illustrates the relationship between banks, replicates, ports, and private copies:

Figure 23. Schematic Representation of Component Memories Showing the Relationship between Banks, Replicates, Ports, and Private Copies


Strategies that Enable Concurrent Stall-Free Memory Accesses

The compiler uses a variety of strategies to try to make concurrent memory accesses stall-free.

Despite the compiler’s best efforts, the component memory system can still be stallable. This might happen due to resource constraints or memory attributes defined in your source code. In that case, the compiler tries to minimize the hardware resources consumed by the arbitrated memory system.