Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 6/26/2023
Public



8.4.1. Enable the Intel® HLS Compiler to Infer Data Path Buffer Capacity Requirements

In many situations, the Intel® HLS Compiler can add buffer capacity automatically to the data path in a system of tasks design to achieve maximum throughput for your design. Follow a few best practices to help the Intel® HLS Compiler effectively add data path buffer capacity to your design when needed.

As an example, consider a design that runs two independent tasks. Code like the following example can produce this kind of structure:
component void foo() {
  // Parse/compute data for tasks
  ihc::launch<task1>(data1);
  ihc::launch<task2>(data2);

  auto r1 = ihc::collect<task1>();
  auto r2 = ihc::collect<task2>();
  // Usage of r1, r2
}
The following diagram shows the state of the system of tasks at the start of the third invocation of the component, and the location of data in the overall pipeline from previous invocations.
Figure 43. Data Flow of Multiple Component Invocations Through a System of Tasks
The circles represent pipelined stages of the component, while the numbers indicate the location of data from different invocations of component foo. This diagram shows three invocations of the component underway.


In this diagram, Entry represents the two independent launch calls, and Exit represents the two independent collect calls.

Entry provides work to both tasks only if both tasks can take in data (that is, both tasks have available buffer capacity). Similarly, Exit consumes the results only when both results are available.

If Task1 and Task2 have the same number of pipeline stages, then the data path performs at full throughput. Some data path buffer capacity is needed in the caller function to ensure that the caller can continue issuing launch calls while the collect calls wait for the task functions to complete. The compiler adds this data path buffer capacity automatically.

If the two tasks have different pipeline depths, then the design encounters a bottleneck because the task with the smaller pipeline depth lacks the buffer capacity to store finished results while waiting for the other task to finish. In this case, you can add buffer capacity to either the launch call or the collect call of the task with the smaller pipeline depth. For details about adding launch/collect buffer capacity, see Explicitly Add Buffer Capacity to Your Design When Needed.
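
The effect of mismatched pipeline depths can be illustrated with a small host-side simulation. The following is a minimal sketch in standard C++, not Intel® HLS Compiler code and not a description of the hardware the compiler generates. It assumes an idealized model in which each task holds at most (pipeline depth + extra buffer slots) invocations in flight, Entry launches only when both tasks can accept data, and Exit collects only when both results are ready. The names TaskModel, extra_capacity, and simulate are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <deque>

// Idealized model of one task function (hypothetical, for illustration only):
// a pipeline of `depth` stages plus `extra_capacity` additional buffer slots.
struct TaskModel {
  int depth;                 // pipeline depth, used as latency in cycles
  int extra_capacity;        // assumed extra launch/collect buffer slots
  std::deque<int> ready_at;  // cycle at which each in-flight result is ready

  TaskModel(int d, int extra) : depth(d), extra_capacity(extra) {
    assert(d >= 1 && extra >= 0);
  }
  // The task accepts new work only while it has a free slot.
  bool can_accept() const {
    return static_cast<int>(ready_at.size()) < depth + extra_capacity;
  }
  // The oldest in-flight invocation has finished by this cycle.
  bool has_result(int cycle) const {
    return !ready_at.empty() && ready_at.front() <= cycle;
  }
  void launch(int cycle) { ready_at.push_back(cycle + depth); }
  void collect() { ready_at.pop_front(); }
};

// Runs the Entry/Exit structure for `cycles` clock cycles and returns the
// number of component invocations whose results Exit consumed.
int simulate(TaskModel t1, TaskModel t2, int cycles) {
  int completed = 0;
  for (int cycle = 0; cycle < cycles; ++cycle) {
    // Exit: consume results only when both tasks have one available.
    if (t1.has_result(cycle) && t2.has_result(cycle)) {
      t1.collect();
      t2.collect();
      ++completed;
    }
    // Entry: issue work only when both tasks can take in data.
    if (t1.can_accept() && t2.can_accept()) {
      t1.launch(cycle);
      t2.launch(cycle);
    }
  }
  return completed;
}
```

In this model, simulate(TaskModel(4, 0), TaskModel(4, 0), 600) completes 596 invocations, which is one per cycle after the initial latency. With depths of 2 and 6 and no extra capacity, the same call completes only 198, because the shallower task fills up and stalls Entry. Giving the shallower task 4 extra slots, simulate(TaskModel(2, 4), TaskModel(6, 0), 600), restores one invocation per cycle and completes 594.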

The Intel® HLS Compiler tries to balance data path buffer capacity automatically, but it can only add data path capacity automatically when your design follows certain practices.

Use the following best practices to obtain the maximum throughput for your system of tasks design:
  • A component or task function should do one of the following things:
    • Do all of the work by itself without launching other tasks.
    • Act as an orchestrator for issuing ihc::launch or ihc::collect calls and do none of the work.
  • If throughput is a priority for your design, avoid using multiple ihc::launch or ihc::collect calls to the same task function unless you are reusing the calls to the function by iterating in a loop.
  • Keep ihc::launch and ihc::collect calls to the same task function within the same block.

    Review the block structure of your design with the Graph Viewer in the High-Level Design Reports to confirm that your calls are in the same block.

  • Avoid guarding your ihc::launch and ihc::collect calls with an if-condition.

    If you must guard your ihc::launch and ihc::collect calls with an if-condition, use the same if-condition for both the ihc::launch and ihc::collect calls.
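
The structure that these practices recommend can be sketched in standard C++ as follows. This is a host-side illustration only: the mock_tasks namespace is a hypothetical stand-in that runs each task immediately and queues its result, so it shows code shape rather than hardware behavior, and it is not the ihc API. The task functions square_task and offset_task, the Results type, and the orchestrate function are likewise assumptions made for this example. The orchestrator performs no computation of its own, uses a single launch site and a single collect site per task function that are reused by iterating in a loop, keeps both calls in the same loop body, and does not guard them with an if-condition.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical host-side stand-in for a launch/collect pair (NOT the ihc API):
// each task function gets its own queue of finished results.
namespace mock_tasks {
template <int (*Task)(int)>
std::queue<int>& result_queue() {
  static std::queue<int> q;
  return q;
}

// Runs the task immediately and stores its result for a later collect.
template <int (*Task)(int)>
void launch(int arg) {
  result_queue<Task>().push(Task(arg));
}

// Returns the oldest uncollected result of the task.
template <int (*Task)(int)>
int collect() {
  std::queue<int>& q = result_queue<Task>();
  assert(!q.empty());  // every collect needs a matching earlier launch
  int result = q.front();
  q.pop();
  return result;
}
}  // namespace mock_tasks

// Task functions do all of their work themselves and launch nothing.
int square_task(int x) { return x * x; }
int offset_task(int x) { return x + 10; }

struct Results {
  std::vector<int> squares;
  std::vector<int> offsets;
};

// Orchestrator: only issues launch and collect calls and moves data.
Results orchestrate(const std::vector<int>& inputs) {
  Results out;
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    // One launch site per task, reused on every loop iteration.
    mock_tasks::launch<square_task>(inputs[i]);
    mock_tasks::launch<offset_task>(inputs[i]);
    // Matching collects in the same loop body, with no if-condition guard.
    out.squares.push_back(mock_tasks::collect<square_task>());
    out.offsets.push_back(mock_tasks::collect<offset_task>());
  }
  return out;
}
```

Calling orchestrate with the inputs 1, 2, and 3 yields the squares 1, 4, 9 and the offsets 11, 12, 13. In a real design, confirm with the Graph Viewer that the corresponding ihc::launch and ihc::collect calls land in the same block.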