5.2.4. Loop Concurrency (max_concurrency Pragma)

Intel® FPGA SDK for OpenCL™ Standard Edition: Programming Guide

ID 683342

Date 4/22/2019

You can use the max_concurrency pragma to limit the concurrency of a loop in your kernel. The concurrency of a loop is the number of iterations of that loop that can be in progress at one time. By default, the Intel® FPGA SDK for OpenCL™ tries to maximize the concurrency of loops so that your kernel runs at peak throughput.
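
To make the link between concurrency and throughput concrete, the following C sketch uses a simplified pipeline model. The model is an assumption of this example, not something the SDK reports in this form: each iteration takes latency cycles from start to finish, and a new iteration can start every ii cycles unless the concurrency cap has been reached. The function names natural_concurrency and loop_cycles are hypothetical.

```c
#include <assert.h>

/* Iterations in flight at steady state when nothing limits concurrency:
 * ceil(latency / ii). */
static unsigned natural_concurrency(unsigned latency, unsigned ii) {
    assert(latency > 0 && ii > 0);
    return (latency + ii - 1) / ii;
}

/* Rough estimate of the cycles needed to run `trips` iterations with at most
 * `cap` iterations in flight. With a cap, iterations start on average no
 * faster than one per ceil(latency / cap) cycles. */
static unsigned long loop_cycles(unsigned long trips, unsigned latency,
                                 unsigned ii, unsigned cap) {
    assert(latency > 0 && ii > 0 && cap > 0);
    if (trips == 0) {
        return 0;
    }
    unsigned capped = (latency + cap - 1) / cap;      /* interval forced by the cap */
    unsigned interval = capped > ii ? capped : ii;    /* the slower of the two wins */
    return latency + (trips - 1) * interval;
}
```

Under this model, a loop with a latency of 20 cycles and an ii of 1 has a natural concurrency of 20. Running 100 iterations takes about 119 cycles with a cap of 20, and 2000 cycles with a cap of 1, which is the fully serialized case.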

The max_concurrency pragma applies to single work-item kernels (that is, single-threaded kernels) in which loops are pipelined. Refer to the Single Work-Item Kernel versus NDRange Kernel section of the Intel® FPGA SDK for OpenCL™ Standard Edition Best Practices Guide for information on loop pipelining, and on kernel properties that drive the offline compiler's decision on whether to treat a kernel as single-threaded.

The max_concurrency pragma enables you to control the on-chip memory resources required to implement your loop. To achieve simultaneous execution of loop iterations, the offline compiler must create independent copies of any memory that is private to a single iteration. The greater the permitted concurrency, the more copies the compiler must make.
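
As a rough illustration of this scaling, the following C sketch estimates the memory needed for a loop-private array as the array size multiplied by the number of copies. The helper name private_array_bytes is hypothetical, and the estimate ignores how the offline compiler maps arrays onto the device's memory blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical estimate: total bytes of on-chip memory for an array that is
 * private to one loop iteration, when `concurrency` iterations can be in
 * progress at once and each one needs its own copy. */
static size_t private_array_bytes(size_t elems, size_t elem_size,
                                  size_t concurrency) {
    assert(concurrency >= 1);
    return elems * elem_size * concurrency;
}
```

For a 1024-element array of 4-byte int values, this gives 4096 bytes at a concurrency of 1 and 32768 bytes at a concurrency of 8. Treat the Area Analysis report as the authoritative figure for a compiled kernel.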

The kernel's HTML report (report.html) provides the following information pertaining to loop concurrency:

Maximum concurrency that the offline compiler has chosen: This information is available in the Loop Analysis report. A message in the Details pane reports that the maximum number of simultaneous executions has been limited to N.

Impact on memory usage: This information is available in the Area Analysis report. A message in the Details pane reports that the offline compiler has created N independent copies of the memory to enable simultaneous execution of N loop iterations.

If you want to trade some performance for on-chip memory savings, apply #pragma max_concurrency <N> immediately before the loop, as shown below. The offline compiler then limits the number of simultaneously executing loop iterations to N, and reduces the number of independent copies of the loop's private memories to N as well.


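// Allow only one iteration in flight, so only one copy of arr is needed.
#pragma max_concurrency 1
for (int i = 0; i < N; i++) {  // here N is the loop trip count, not the pragma argument
  int arr[M];                  // private to each iteration; M is a compile-time constant
  // Doing work on arr
}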
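
For context, the following is a minimal sketch of how such a loop might sit inside a single work-item kernel. The kernel name scratch_sum, its arguments, and the array size M are hypothetical. The preprocessor guards are an assumption of this example: they only exist so that the same source also compiles as plain C for a functional check on a host, where the OpenCL qualifiers and the pragma are left out.

```c
/* An OpenCL compiler defines __OPENCL_VERSION__. When compiling as plain C
 * (host-side functional check), make the OpenCL qualifiers vanish. */
#ifndef __OPENCL_VERSION__
#define kernel
#define global
#endif

#define M 64  /* hypothetical size of the per-iteration scratch array */

/* Hypothetical single work-item kernel: it never queries a work-item ID,
 * and processes n groups of M input values in a loop. */
kernel void scratch_sum(global const int *restrict in,
                        global int *restrict out,
                        int n)
{
#ifdef __OPENCL_VERSION__
#pragma max_concurrency 1
#endif
    for (int i = 0; i < n; i++) {
        int arr[M];  /* private to one iteration of the outer loop */

        /* Fill the scratch array from this iteration's slice of the input. */
        for (int j = 0; j < M; j++) {
            arr[j] = in[i * M + j] * 2;
        }

        /* Reduce the scratch array to a single value. */
        int sum = 0;
        for (int j = 0; j < M; j++) {
            sum += arr[j];
        }
        out[i] = sum;
    }
}
```

The pragma does not change the values the kernel computes. It only limits how many iterations of the outer loop can be in progress at once, and therefore how many copies of arr are needed.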
