Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 4/01/2024
Public

4.1.2. Avalon® Memory Mapped Host Interfaces

By default, pointers in a component are implemented as Avalon® Memory Mapped (Avalon® MM) host interfaces with default settings. If the default settings give poor performance, you can improve it by configuring the Avalon® MM host interfaces explicitly.

You can configure the Avalon® MM host interface for the vector addition component example using the ihc::mm_host class as follows:

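#include "HLS/hls.h"

// Each vector gets its own address space, so each gets its own host interface.
// dwidth is in bits (8 ints per bus word); align is in bytes (8 ints).
component void vector_add(
  ihc::mm_host<int, ihc::aspace<1>, ihc::dwidth<8*8*sizeof(int)>,
                 ihc::align<8*sizeof(int)> >& a,
  ihc::mm_host<int, ihc::aspace<2>, ihc::dwidth<8*8*sizeof(int)>,
                 ihc::align<8*sizeof(int)> >& b,
  ihc::mm_host<int, ihc::aspace<3>, ihc::dwidth<8*8*sizeof(int)>,
                 ihc::align<8*sizeof(int)> >& c,
  int N) {
  // Unroll by 8 so that one unrolled iteration touches one full bus word per vector.
  #pragma unroll 8
  for (int i = 0; i < N; ++i) {
      c[i] = a[i] + b[i];
  }
}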
The memory interfaces for vector a, vector b, and vector c have the following attributes specified:
  • The vectors are each assigned to different address spaces with the ihc::aspace attribute, and each vector receives a separate Avalon® MM host interface.

    With the vectors assigned to different physical interfaces, the vectors can be accessed concurrently without interfering with each other, so memory arbitration is not needed.

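  • The width of each interface is set to 8*8*sizeof(int) bits (256 bits for a 32-bit int) with the ihc::dwidth attribute, so one bus word holds the eight elements that one iteration of the unrolled loop accesses.
  • The alignment of each interface is set to 8*sizeof(int) bytes (32 bytes for a 32-bit int) with the ihc::align attribute, so the base address of each vector falls on a bus-word boundary.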
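The arithmetic behind the ihc::dwidth and ihc::align arguments can be checked in plain C++. The following is a minimal sketch that compiles with any standard C++ compiler; the names kUnroll, interface_width_bits, interface_align_bytes, and elements_per_word are hypothetical helpers introduced for illustration and are not part of the HLS API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not part of the HLS API) that mirror the template
// arguments used in the vector_add example.
constexpr std::size_t kUnroll = 8;  // matches "#pragma unroll 8"

// ihc::dwidth takes a width in bits: 8 elements * 8 bits per byte * sizeof(int).
constexpr std::size_t interface_width_bits() {
  return kUnroll * 8 * sizeof(int);
}

// ihc::align takes an alignment in bytes: 8 elements * sizeof(int).
constexpr std::size_t interface_align_bytes() {
  return kUnroll * sizeof(int);
}

// Number of int elements that fit in one bus word.
constexpr std::size_t elements_per_word() {
  return interface_width_bits() / (8 * sizeof(int));
}

// The bus word holds exactly the elements of one unrolled iteration, and the
// alignment equals the bus word size expressed in bytes.
static_assert(elements_per_word() == kUnroll, "one word per unrolled iteration");
static_assert(interface_align_bytes() * 8 == interface_width_bits(),
              "alignment matches bus word size");
```

With a 4-byte int, these expressions evaluate to a 256-bit interface aligned to 32 bytes. Matching the width and the alignment to the unroll factor is what lets the eight accesses of one unrolled iteration map onto a single bus word.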
The following diagram shows the Function View in the System Viewer that is generated when you compile this example.
Figure 25. System Viewer Function View for vector_add Component with Avalon® MM Host Interface


The diagram shows that vector_add.B2 has two loads and one store. The default Avalon® MM Host settings used by the code example in Pointer Interfaces had 16 loads and 8 stores.

By expanding the width and alignment of the vector interfaces, the original pointer interface loads and stores were coalesced into one wide load each for vector a and vector b, and one wide store for vector c.
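The load and store counts quoted above follow from simple counting. The sketch below is a simplified model in plain C++, not output from the compiler: it assumes that an array whose accesses cannot be coalesced issues one memory operation per unrolled element, and that a coalesced array issues a single wide operation. The names AccessCount, ops_per_array, and vector_add_ops are hypothetical.

```cpp
#include <cassert>

// Hypothetical model (not part of the HLS API): memory operations issued by
// one iteration of the unrolled vector_add loop.
struct AccessCount {
  int loads;
  int stores;
};

// Operations issued for one array per unrolled iteration.
// coalesced == true models an interface that is wide and aligned enough for
// all unrolled accesses to merge into one wide access.
inline int ops_per_array(int unroll, bool coalesced) {
  assert(unroll > 0);
  return coalesced ? 1 : unroll;
}

// vector_add reads two arrays (a and b) and writes one (c).
inline AccessCount vector_add_ops(int unroll, bool coalesced) {
  const int per_array = ops_per_array(unroll, coalesced);
  return AccessCount{2 * per_array, per_array};
}
```

Under this model, vector_add_ops(8, false) gives 16 loads and 8 stores, and vector_add_ops(8, true) gives 2 loads and 1 store, which matches the counts reported for the two versions of the component.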

Also, the memories are stall-free because the loads and stores in this example access separate memories.

Compiling this component with a Quartus® Prime compilation flow targeting an Arria® 10 device results in the following QoR metrics:
Table 3. QoR Metrics Comparison for Avalon MM Host Interface (1)

| QoR Metric                        | Pointer | Avalon MM Host |
|-----------------------------------|---------|----------------|
| ALMs                              | 15593.5 | 643            |
| DSPs                              | 0       | 0              |
| RAMs                              | 30      | 0              |
| fMAX (MHz) (2)                    | 298.6   | 472.37         |
| Latency (cycles)                  | 24071   | 142            |
| Initiation Interval (II) (cycles) | ~508    | 1              |

(1) The compilation flow used to calculate the QoR metrics used Quartus® Prime Pro Edition Version 17.1.
(2) The fMAX measurement was calculated from a single seed.
Changing the component interface from a pointer interface to a specialized Avalon® MM Host interface improved every QoR metric except DSP usage, which was already 0. The latency is close to the ideal latency value of 128, and the loop initiation interval (II) is 1.
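The size of the improvement can be worked out from the table. The sketch below is plain C++ with hypothetical helper names (ideal_cycles, ratio). It assumes N = 1024, a value that this section does not state; it is chosen here only because 1024 elements processed eight at a time with an II of 1 gives the quoted ideal of 128 cycles.

```cpp
#include <cassert>

// Hypothetical helper: ideal cycle count for a loop of n iterations that is
// unrolled by `unroll` and starts one unrolled iteration per cycle (II = 1).
constexpr int ideal_cycles(int n, int unroll) {
  return (n + unroll - 1) / unroll;  // ceiling division
}

// Hypothetical helper: improvement factor between two measurements.
inline double ratio(double before, double after) {
  assert(after != 0.0);
  return before / after;
}

// Assumption: N = 1024 (not stated in this section).
static_assert(ideal_cycles(1024, 8) == 128, "matches the quoted ideal latency");
```

Using the table values, ratio(24071, 142) is roughly 170 for latency, ratio(15593.5, 643) is roughly 24 for ALMs, and ratio(472.37, 298.6) is roughly 1.58 for fMAX. The measured latency of 142 cycles exceeds the ideal by 14 cycles.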
Important: This change to a specialized Avalon® MM Host interface from a pointer interface requires the system to have three separate memories with the expected width. The initial pointer implementation requires only one system memory with a 64-bit wide data bus. If the system cannot provide the required memories, you cannot use this optimization.