1.2. Pipelines
The designs of microprocessors, digital signal processors (DSPs), hardware accelerators, and other high performance implementations of digital hardware often contain pipeline architectures.
For example, the diagram below represents the following code fragment as a multistage pipeline:
for (i = 0; i < 1024; i++)
{
y[i] = (a[i] + b[i] + c[i] + d[i] + e[i] + f[i] + g[i] + h[i]) >> 3;
}
In a pipelined architecture, the arithmetic operations enter the pipeline one at a time. As the diagram above shows, a saturated pipeline therefore consists of eight stages (seven additions and one shift) that execute in parallel. Because the loop has a large number of iterations, the pipeline stages keep operating concurrently, each stage working on a different loop iteration.
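The benefit of a saturated pipeline can be estimated with a simple cycle count. The following is a minimal sketch, not a model of any particular device: it assumes every stage takes exactly one clock cycle, a new loop iteration enters the pipeline each cycle, and the pipeline never stalls. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles needed when each iteration must pass through all stages
   before the next iteration may start (no overlap). */
uint64_t sequential_cycles(uint64_t iterations, uint64_t stages)
{
    return iterations * stages;
}

/* Cycles needed when a new iteration enters the pipeline every cycle.
   Assumption: one cycle per stage and no stalls. The first result
   appears after `stages` cycles (ramp-up); every later result
   follows one cycle after the previous one. */
uint64_t pipelined_cycles(uint64_t iterations, uint64_t stages)
{
    assert(stages > 0);
    return iterations == 0 ? 0 : stages + (iterations - 1);
}
```

Under these assumptions, the 1024-iteration loop above on an eight-stage pipeline gives `pipelined_cycles(1024, 8) == 1031`, compared with `sequential_cycles(1024, 8) == 8192` without overlap.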
Pipeline Approach
A new pipeline is constructed for each design, so the pipeline can take advantage of the highly configurable nature of FPGAs.
Consider the following OpenCL code fragment:
C = (A >> 5) + B;
F = (D - E) << 3;
G = C + F;
You can configure an FPGA to instantiate a complex pipeline structure that executes the entire code simultaneously. In this case, the SDK implements the code as two independent pipelined entities that feed into a pipelined adder, as shown in the figure below.
The SDK provides a custom pipeline structure that speeds up computation by allowing operations within a large number of work-items to occur concurrently. The offline compiler can create a custom pipeline that calculates the values for variables C, F, and G every clock cycle, as shown below. After a ramp-up phase, the pipeline sustains a throughput of one work-item per cycle.
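The ramp-up and steady-state behavior can be illustrated with a small software simulation of the C, F, and G fragment. This is only a sketch under simplifying assumptions: the pipeline is modeled as two stages (stage 1 computes C and F side by side, stage 2 adds them), each stage takes one cycle, and unsigned 32-bit values are used so the shifts are well defined in C. The `stage1_regs` type and `run_pipeline` function are hypothetical names, and the hardware that the offline compiler actually generates may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pipeline registers between stage 1 and stage 2 (hypothetical). */
typedef struct {
    uint32_t c;
    uint32_t f;
    int valid;          /* 1 when the registers hold a real work-item */
} stage1_regs;

/* Simulates n work-items flowing through the two-stage pipeline.
   Writes G for each work-item and returns the number of cycles used. */
size_t run_pipeline(const uint32_t *a, const uint32_t *b,
                    const uint32_t *d, const uint32_t *e,
                    uint32_t *g, size_t n)
{
    stage1_regs r = {0, 0, 0};
    size_t issued = 0, retired = 0, cycles = 0;

    while (retired < n) {
        /* Stage 2: consume the values registered in the previous cycle. */
        if (r.valid) {
            g[retired++] = r.c + r.f;           /* G = C + F */
        }
        /* Stage 1: in the same cycle, start the next work-item. */
        if (issued < n) {
            r.c = (a[issued] >> 5) + b[issued]; /* C = (A >> 5) + B */
            r.f = (d[issued] - e[issued]) << 3; /* F = (D - E) << 3 */
            r.valid = 1;
            issued++;
        } else {
            r.valid = 0;                        /* pipeline is draining */
        }
        cycles++;
    }
    return cycles;
}
```

In this model, n work-items complete in n + 1 cycles: one cycle of ramp-up, then one result per cycle.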
A traditional processor has a limited set of shared registers. Eventually, a processor must write the stored data out to memory to allow more data to occupy the registers. The offline compiler keeps data "live" by generating enough registers to store the data for all the active work-items within the pipeline. The following code example and figure illustrate a live variable C in the OpenCL pipeline:
size_t index = get_global_id(0);
C = A[index] + B[index];
E[index] = C - D[index];
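One way to picture a live variable is as a chain of registers that carries each work-item's copy of C forward until D[index] is available. The sketch below models that idea in software. It assumes a fixed, hypothetical latency of `LOAD_LATENCY` cycles between computing C and consuming it, so that `LOAD_LATENCY` copies of C are held at once, one per active work-item. The `live_slot` type and `run_live_pipeline` function are illustrative names only, not part of the SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOAD_LATENCY 3   /* hypothetical cycles between producing and using C */

/* One register slot holding the live value of C for one work-item. */
typedef struct {
    int32_t c;
    size_t  index;
    int     valid;
} live_slot;

/* Simulates n work-items; returns the number of cycles used. */
size_t run_live_pipeline(const int32_t *a, const int32_t *b,
                         const int32_t *d, int32_t *e, size_t n)
{
    live_slot regs[LOAD_LATENCY] = {{0}};
    size_t issued = 0, retired = 0, cycles = 0;

    while (retired < n) {
        /* Final slot: D[index] is assumed available now, so C is consumed. */
        live_slot *last = &regs[LOAD_LATENCY - 1];
        if (last->valid) {
            e[last->index] = last->c - d[last->index];
            retired++;
        }
        /* Every live copy of C advances by one register. */
        for (size_t s = LOAD_LATENCY - 1; s > 0; s--) {
            regs[s] = regs[s - 1];
        }
        /* A new work-item enters and produces its own copy of C. */
        if (issued < n) {
            regs[0] = (live_slot){ a[issued] + b[issued], issued, 1 };
            issued++;
        } else {
            regs[0].valid = 0;
        }
        cycles++;
    }
    return cycles;
}
```

No copy of C is ever written out to memory in this model: each value stays in a register from the cycle it is produced until the cycle it is used, and n work-items finish in n + `LOAD_LATENCY` cycles.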