Intel® FPGA SDK for OpenCL™ Pro Edition: Programming Guide

ID 683846
Date 10/04/2021
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

11.1.2.1. Overview: Intel FPGA SDK for OpenCL Pipeline Approach

The following figure depicts the architecture of an Intel FPGA SDK for OpenCL pipeline:
Figure 29. Parallel Execution Model of Intel FPGA SDK for OpenCL Pipeline StagesThe operations on the right represent the SDK's pipeline implementation of the OpenCL kernel code on the left. Each yellow box is an operation or data value found in the pipeline. The number associated with each operation represents the number of threads in the pipeline.

Assume each level of operation is one stage in the pipeline. At each stage, the Intel® FPGA SDK for OpenCL™ Offline Compiler executes all operations in parallel by the thread existing at that stage. For example, thread 2 executes Load A, Load B, and copies the current global ID (via gid) to the next pipeline stage. Similar to the pipelined execution on instructions in reduced instruction set computing (RISC) processors, the SDK's pipeline stages also execute in parallel. The threads advances to the next pipeline stage only after all the stages have completed execution.

Some operations are capable of stalling the Intel FPGA SDK for OpenCL pipeline. Examples of such operations include variable latency operations like memory load and store operations. To support stalls, ready and valid signals need to propagate throughout the pipeline so that the offline compiler can schedule the pipeline stages. However, ready signals are not necessary if all operations have fixed latency. In these cases, the offline compiler optimizes the pipeline to statically schedule the operations, which significantly reduces the logic necessary for pipeline implementation.