Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 10/04/2021
Public



3.3.3.1.2. Pipelining

The compiler generates a deeply pipelined hardware datapath, similar to a CPU implementation with multiple pipeline stages. For more information, refer to Concepts of FPGA Hardware Design and How Source Code Becomes a Custom Hardware Datapath.

Pipelining allows many data items to be processed concurrently (in the same clock cycle, each in a different stage of the datapath), which makes efficient use of the datapath hardware by keeping it occupied.

Pipelining and Vectorizing a Pipelined Datapath

Consider the following example of code mapping to hardware:

Figure 13. Example Code Mapping to Hardware


When this code runs on a CPU, multiple invocations are not pipelined: each invocation produces its output before the inputs for the next invocation are passed in.

On an FPGA device, this kind of unpipelined invocation results in poor throughput and low occupancy of the datapath because many of the operations sit idle while other parts of the datapath operate on the data. The following figure shows the throughput and occupancy of invocations in this scenario:
Figure 14. Unpipelined Execution Resulting in Low Throughput and Low Occupancy


The Intel® HLS Compiler pipelines your design as much as possible. New inputs can be sent into the datapath each cycle, giving you a fully occupied datapath for higher throughput, as shown in the following figure:
Figure 15. Pipelining the Datapath Results in High Throughput and High Occupancy
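To make the throughput and occupancy difference concrete, the following C++ sketch uses a simplified cycle-count model. The model is an assumption for illustration, not compiler output: the datapath has a latency of `latency` cycles, each invocation occupies each stage for one cycle, and a pipelined datapath accepts a new input every `ii` cycles (its initiation interval). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model, not compiler output.

// Unpipelined: each invocation finishes before the next one starts,
// so n invocations take n * latency cycles.
std::uint64_t unpipelined_cycles(std::uint64_t n, std::uint64_t latency) {
    assert(latency > 0);
    return n * latency;
}

// Pipelined: a new invocation enters every `ii` cycles (ii = 1 means one
// new input per clock cycle). The first result leaves after `latency`
// cycles and each later result follows `ii` cycles after the previous one.
std::uint64_t pipelined_cycles(std::uint64_t n, std::uint64_t latency,
                               std::uint64_t ii = 1) {
    assert(latency > 0 && ii > 0);
    if (n == 0) {
        return 0;
    }
    return latency + (n - 1) * ii;
}

// Average occupancy: fraction of stage-cycles doing useful work. Each
// invocation keeps each of the `latency` stages busy for one cycle, so
// busy stage-cycles = n * latency out of latency * total_cycles available,
// which simplifies to n / total_cycles.
double occupancy(std::uint64_t n, std::uint64_t total_cycles) {
    assert(total_cycles > 0);
    return static_cast<double>(n) / static_cast<double>(total_cycles);
}
```

For example, with 100 invocations and a latency of 10 cycles, the unpipelined model takes 1000 cycles (occupancy 0.1), while the pipelined model with `ii = 1` takes 109 cycles (occupancy about 0.92).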


You can increase throughput further by vectorizing the pipelined hardware, that is, by using multiple copies of the pipelined datapath that process inputs in parallel. Vectorizing improves throughput but requires more FPGA area for the additional copies of the pipelined hardware:
Figure 16. Vectorizing the Pipelined Datapath Resulting in High Throughput and High Occupancy
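Extending the same kind of simplified model (again an assumption, not compiler output), the sketch below estimates cycles and area for `copies` identical pipelined datapaths that each accept one input per cycle, with inputs distributed evenly across the copies. `vectorized_cycles` and `vectorized_area` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: `copies` identical pipelined datapaths, each with
// latency `latency` and one new input per cycle, run side by side.
std::uint64_t vectorized_cycles(std::uint64_t n, std::uint64_t latency,
                                std::uint64_t copies) {
    assert(latency > 0 && copies > 0);
    if (n == 0) {
        return 0;
    }
    // The busiest copy handles ceil(n / copies) inputs.
    std::uint64_t per_copy = (n + copies - 1) / copies;
    return latency + (per_copy - 1);
}

// Simplifying assumption: area grows linearly with the number of copies.
std::uint64_t vectorized_area(std::uint64_t area_one_copy,
                              std::uint64_t copies) {
    return area_one_copy * copies;
}
```

With 100 inputs, a latency of 10 cycles, and 4 copies, this model gives 34 cycles instead of 109 for a single pipelined datapath, at four times the area. The linear area scaling is a simplification chosen to show the trade-off, not a prediction of real resource usage.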


Understanding where the data you need to pipeline comes from is key to achieving high-performance designs on the FPGA. You can use the following sources of data to take advantage of pipelining:

  • Components
  • Loop iterations
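The following sketch illustrates the two sources of pipelined data in plain C++. In an Intel HLS Compiler design, a function becomes a component, for example by marking it with the `component` keyword from `HLS/hls.h`; that markup is omitted here so the code compiles with any standard C++ compiler. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Source 1: component invocations. Each call is one invocation; in a
// pipelined component, a new call can enter the datapath before earlier
// calls have finished.
int scale_and_offset(int x) {
    return 3 * x + 1;
}

// Source 2: loop iterations. No iteration reads a value written by another
// iteration, so a new iteration can start before earlier ones complete.
void scale_array(const int* in, int* out, std::size_t n) {
    // Precondition: non-null buffers unless there is nothing to process.
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = 3 * in[i] + 1;
    }
}
```

Both functions compute the same expression; the difference is where the stream of data items comes from: successive calls in the first case, successive iterations in the second.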