Intel® FPGA SDK for OpenCL™ Pro Edition: Best Practices Guide

ID 683521
Date 12/19/2022

3.4.3.1. Reducing the Area Consumed by Nested Loops Using loop_coalesce

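When loops are nested to a depth greater than three, the offline compiler generates additional blocks, and more area is consumed. Coalescing the nested loops with the loop_coalesce pragma can reduce this overhead.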

Consider the following example, in which the orig and lc_test kernels illustrate how loop coalescing reduces area and latency in nested loops.
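The following is a minimal sketch of such a kernel pair, not the source used to generate the reports in this section. The loop bodies, the trip count N, and the argument A are assumptions made for illustration. The only difference between the two kernels is #pragma loop_coalesce 4 ahead of the outermost loop in lc_test, which asks the offline compiler to coalesce four levels of nesting. The macros at the top are a hypothetical shim so that the same source also compiles as plain C for a functional check on the host.

```c
/* Sketch only: loop bodies, N, and the argument A are assumptions. */

/* Hypothetical shim: lets the kernels compile as plain C on the host.
 * An OpenCL C compiler defines __OPENCL_VERSION__ and skips this. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

#define N 4 /* assumed trip count of every loop level */

/* Loops nested to a depth of four, no coalescing requested. */
__kernel void orig(__global int *A) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        for (int l = 0; l < N; l++)
          A[((i * N + j) * N + k) * N + l] = i + j + k + l;
}

/* Same loop nest; the pragma requests coalescing of all four levels. */
__kernel void lc_test(__global int *A) {
/* The guard only keeps host C compilers from warning about the pragma. */
#ifdef __OPENCL_VERSION__
#pragma loop_coalesce 4
#endif
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        for (int l = 0; l < N; l++)
          A[((i * N + j) * N + k) * N + l] = i + j + k + l;
}
```

The pragma does not change what the kernel computes, so both kernels fill A with the same values. Any difference appears only in the generated hardware and in the reports.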

The orig kernel has loops nested to a depth of four. The nested loops create extra blocks (Blocks 2, 3, 4, 6, 7, and 8) that consume area because of the variables being carried, as shown in the following reports:

Figure 65. Area Report and System Viewer (System View) Before and After Loop Coalescing

Because of loop coalescing, the lc_test kernel shows reduced latency. Block 5 of the orig kernel and Block 12 of the lc_test kernel are the innermost loops.

Figure 66. Area Report of lc_test and orig Kernels
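To picture what coalescing does, it can help to write the flattened loop by hand. The sketch below is a conceptual model in plain C, not the hardware that the offline compiler generates. The function names nested_fill and coalesced_fill, the trip count N, and the loop body are assumptions. The coalesced version runs a single loop and advances the four indices with carry logic, so the loop state is carried in one place rather than across several levels of nesting.

```c
/* Conceptual model only: names, N, and the loop body are assumptions. */
#define N 4 /* assumed trip count of every loop level */

/* Reference: loops nested to a depth of four. */
void nested_fill(int *out) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        for (int l = 0; l < N; l++)
          out[((i * N + j) * N + k) * N + l] = i + j + k + l;
}

/* Hand-coalesced equivalent: one loop over all N*N*N*N iterations. */
void coalesced_fill(int *out) {
  int i = 0, j = 0, k = 0, l = 0;
  for (int t = 0; t < N * N * N * N; t++) {
    /* t equals ((i*N + j)*N + k)*N + l at this point. */
    out[t] = i + j + k + l;

    /* Advance the innermost index and carry into the outer ones. */
    if (++l == N) {
      l = 0;
      if (++k == N) {
        k = 0;
        if (++j == N) {
          j = 0;
          ++i;
        }
      }
    }
  }
}
```

Both functions write identical contents to out. With the pragma, the offline compiler performs this kind of restructuring itself, so the kernel source can keep its nested form.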