5.2.4. Loop Concurrency (max_concurrency Pragma)
The max_concurrency pragma applies to single work-item kernels (that is, single-threaded kernels) in which loops are pipelined. Refer to the Single Work-Item Kernel versus NDRange Kernel section of the Intel® FPGA SDK for OpenCL™ Standard Edition Best Practices Guide for information on loop pipelining, and on kernel properties that drive the offline compiler's decision on whether to treat a kernel as single-threaded.
The max_concurrency pragma enables you to control the on-chip memory resources required to implement your loop. To achieve simultaneous execution of loop iterations, the offline compiler must create independent copies of any memory that is private to a single iteration. The greater the permitted concurrency, the more copies the compiler must make.
The kernel's HTML report (report.html) provides the following information pertaining to loop concurrency:
- Maximum concurrency that the offline compiler has chosen
  This information is available in the Loop Analysis report. A message in the Details pane reports that the maximum number of simultaneous executions has been limited to N.
- Impact on memory usage
  This information is available in the Area Analysis report. A message in the Details pane reports that the offline compiler has created N independent copies of the memory to enable simultaneous execution of N loop iterations.
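As a rough illustration of how the number of copies scales memory usage, the relationship can be sketched as simple arithmetic. The helper name private_copy_bytes and its parameters are hypothetical, and the model assumes exactly one full copy of the iteration-private array per concurrent iteration. It ignores rounding to RAM block sizes and any additional replication the offline compiler may perform for other reasons, so treat the result as a logical lower bound rather than a figure from the report.

```c
#include <assert.h>
#include <stddef.h>

/* Logical bytes of iteration-private memory, assuming one full copy of the
   array per concurrently executing loop iteration (hypothetical model). */
static size_t private_copy_bytes(size_t concurrency, size_t elems,
                                 size_t elem_bytes) {
  assert(concurrency >= 1); /* at least one iteration must be in flight */
  return concurrency * elems * elem_bytes;
}
```

Under these assumptions, a 1024-element array of 4-byte integers needs 16384 bytes at a concurrency of 4 and 4096 bytes at a concurrency of 1.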
If you want to exchange some performance for physical memory savings, apply #pragma max_concurrency <N> to the loop, as shown below. When you apply this pragma, the offline compiler limits the number of simultaneously executed loop iterations to N. The number of independent copies of loop memories is also reduced to N. Here, N is the pragma argument (1 in the example), not the loop bound N that appears in the example code.
#pragma max_concurrency 1
for (int i = 0; i < N; i++) {
  int arr[M];
  // Doing work on arr
}
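For context, the fragment above might sit inside a complete single work-item kernel along the following lines. This is a minimal sketch rather than code from the SDK: the kernel name row_sums, its arguments, the ROW_LEN constant, and the concurrency value of 2 are all hypothetical. The #ifndef guard is an illustrative convenience that blanks out the OpenCL qualifiers so the same source can be compiled as plain C for a functional check on a host. A plain C compiler does not recognize the pragma and typically ignores it, possibly with a warning.

```c
/* Host-side shim (an assumption for illustration, not part of the SDK):
   blank out the OpenCL qualifiers when compiling as plain C. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

#define ROW_LEN 4 /* hypothetical array size, kept tiny for readability */

/* Single work-item kernel: it makes no work-item ID queries and does its
   work in one outer loop that the offline compiler can pipeline. */
__kernel void row_sums(__global const int *restrict in,
                       __global int *restrict out,
                       int rows) {
  /* Request at most two outer-loop iterations in flight, so at most
     two copies of arr should be needed. */
  #pragma max_concurrency 2
  for (int i = 0; i < rows; i++) {
    int arr[ROW_LEN]; /* private to a single iteration */
    for (int j = 0; j < ROW_LEN; j++) {
      arr[j] = in[i * ROW_LEN + j];
    }
    int sum = 0;
    for (int j = 0; j < ROW_LEN; j++) {
      sum += arr[j];
    }
    out[i] = sum; /* one result per row */
  }
}
```

With an array this small, the offline compiler may implement arr in registers rather than on-chip memory. A realistic design would use a larger array, and the Loop Analysis and Area Analysis reports remain the way to confirm what the compiler actually built.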