Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 10/04/2021
Public

10.5. Optimizing Data Path Control

To make the best use of the new data path optimizations that are specific to Intel® Stratix® 10 designs, modify your code to remove constructs or features that might prevent the Intel® FPGA SDK for OpenCL™ Offline Compiler from applying them. If your design contains such constructs or features, the offline compiler reverts to the legacy optimizations, which might result in a lower fMAX. For example, if the offline compiler must instantiate cached LSUs for a memory access pattern, it does not enable the new optimizations.

The following constructs or features prevent Intel® Stratix® 10 data path optimization:

  • NDRange designs with loops
  • Stallable LSUs with the exception of burst-coalesced LSUs

    Burst-coalesced LSUs are the default type of LSU that the offline compiler instantiates. The following kernel is an example of a burst-coalesced LSU instantiation:

    kernel void burst_coalesced (global int * restrict in,
                                 global int * restrict out){
       int i = get_global_id(0);
       int value = in[i/2];  //Burst-coalesced LSU
       out[i] = value;
    }

    You can view the LSU type of each load or store instruction in the High Level Design Report by hovering over the operation in the System Viewer. Refer to the Load-Store Units section for more information about the types of LSUs and how you can influence which type the compiler instantiates.

  • Channels with multiple call sites
  • Stallable RTL library calls

    Refer to the Create RTL Modules section for more information.

  • Reconvergent control flow in the optimized control flow graph, with the exception of loops that use the new control optimization

    The following pseudocode example of a simple reconvergent control flow shows that the flow of the code goes in one of two paths. The offline compiler implements different control logic for each path. It also implements logic to reconverge the control flow after the two paths are completed.

    while (some_condition){
       if (some_other_condition){
          for(...){ }
       } else{
          for(...){ }
       }
    }

  • Loops that do not use the new loop control scheme

    Refer to the Loop Control Optimization section for more information about what loops are affected by this restriction.

  • Basic block structures with the exception of the following:
    • Basic block with only one predecessor, as shown in Figure 88
    • Basic block with exactly two predecessors, where one predecessor is the back-edge of a loop, as shown in Figure 89
    Note: The majority of optimized designs belong to one of the two supported basic block structures. You may review images of these basic blocks in the System Viewer of the High Level Design Report.

    The following code example generates the two types of supported basic block structures:

    __attribute__((max_global_work_dim(0)))
    kernel void basic_block(global unsigned int *myvar,
                            unsigned int insize)
    {
       for(int i=0; i < insize; i++){
          myvar[i] += insize;
       }
    }
Figure 88. Basic Block Structure with One Predecessor
The basic block in question is gzip.B2, which is outlined in red. Its predecessor is the previous basic block, gzip.B1.

Figure 89. Basic Block Structure with Two Predecessors
The basic block in question is gzip.B1, which is outlined in red. This block is the body of the loop; its predecessors are itself and the previous basic block, gzip.B0. The purple line in gzip.B1 denotes the back-edge of the loop.
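
The reconvergent control flow pseudocode earlier in this section runs one of two loops depending on a condition. As a minimal sketch of one way such code might be restructured, the following single work-item kernel merges the two branch-specific loops into a single loop and selects the per-element result with a conditional expression, so there are no separate paths to reconverge. The kernel name merged_paths, its arguments, and the two arithmetic operations are hypothetical and chosen only for illustration. The #ifndef __OPENCL_VERSION__ shim is likewise an assumption made here so that the loop logic can be checked with a plain C compiler on the host. Whether the offline compiler then applies the new data path optimizations still depends on the other restrictions in this list, so confirm the result in the High Level Design Report.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim (assumption for illustration): lets a plain C compiler
   accept the OpenCL qualifiers so the loop logic can be checked off-device. */
#define kernel
#define global
#endif

/* Hypothetical single work-item kernel. The reconvergent form would be:
 *    if (use_scale) { for (...) data[i] *= 2; }
 *    else           { for (...) data[i] += 1; }
 * Here one loop covers both cases and selects the result per element. */
#ifdef __OPENCL_VERSION__
__attribute__((max_global_work_dim(0)))
#endif
kernel void merged_paths(global unsigned int *restrict data,
                         unsigned int size,
                         unsigned int use_scale)
{
   for (unsigned int i = 0; i < size; i++) {
      unsigned int v = data[i];
      /* The conditional expression replaces the if/else around two loops. */
      data[i] = use_scale ? (v * 2u) : (v + 1u);
   }
}
```

The trade-off in this sketch is that the loop body carries both operations on every iteration. That is usually inexpensive for simple arithmetic, but it might not suit branches whose bodies differ substantially in memory accesses or trip count.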