Visible to Intel only — GUID: GUID-F63BCFF2-DCD7-4834-AD14-12BC3DB7AD75
max_concurrency Attribute
Use the max_concurrency attribute to limit the concurrency of a loop in your kernel. The concurrency of a loop is how many iterations of that loop can be in progress at one time. By default, the Intel® oneAPI DPC++/C++ Compiler tries to maximize the concurrency of loops so that your kernel runs at peak throughput.
Syntax
[[intel::max_concurrency(n)]]
The max_concurrency attribute applies to pipelined loops in single-task kernels. For information about loop pipelining, refer to Pipelining.
The max_concurrency attribute enables you to control the on-chip memory resources required to pipeline your loop. To achieve simultaneous execution of loop iterations, the Intel® oneAPI DPC++/C++ Compiler must create copies of any memory that is private to a single iteration. These copies are called private copies. The greater the permitted concurrency, the more private copies the compiler must create.
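As a rough illustration of why higher concurrency costs more memory, the following sketch multiplies the footprint of one iteration-private array by the number of private copies. The helper name private_copy_bytes and the example sizes are hypothetical. The model also ignores banking, replication, and rounding to physical RAM block sizes, so treat the result as an order-of-magnitude estimate and not as the figure the compiler reports.

```cpp
#include <cstddef>

// Hypothetical helper: estimated bytes of on-chip memory needed when
// `copies` loop iterations each hold their own private array of
// `elements` items of type T. This is a simplified model, not the
// compiler's actual area estimate.
template <typename T>
constexpr std::size_t private_copy_bytes(std::size_t elements,
                                         std::size_t copies) {
  return sizeof(T) * elements * copies;
}

// Example: a 1024-element int array that is private to each iteration.
constexpr std::size_t kOneCopy = private_copy_bytes<int>(1024, 1);
constexpr std::size_t kFourCopies = private_copy_bytes<int>(1024, 4);

static_assert(kFourCopies == 4 * kOneCopy,
              "memory grows linearly with the number of private copies");
```

Under this simplified model, reducing the permitted concurrency from four iterations to one cuts the estimated private memory to a quarter.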
The attribute parameter n is required and must be a non-negative constant expression of integer type. The parameter directs the compiler to restrict the loop’s concurrency to n simultaneous iterations.
The kernel’s report.html file (refer to Review the FPGA Optimization Report) provides the following information about loop concurrency:
- Maximum concurrency that the Intel® oneAPI DPC++/C++ Compiler has chosen: This information is available in the Loop Analysis report and the Kernel Memory Viewer.
  - In the Loop Analysis report, a message in the Details pane reports that the maximum number of simultaneous executions has been limited to n.
  - In the Kernel Memory Viewer, the bank view of your local memory graphically shows the number of private copies.
- Impact on memory usage: This information is available in the Area Estimates report. A message in the Details pane reports that the Intel® oneAPI DPC++/C++ Compiler has created n independent copies of the memory to enable simultaneous execution of n loop iterations.
NOTE:
The value of n can be greater than or equal to zero. A value of n = 0 indicates unlimited concurrency.
If you want to exchange some performance for physical memory savings, apply [[intel::max_concurrency(n)]] to the loop, as shown in the following code snippet:
[[intel::max_concurrency(1)]]
for (int i = 0; i < N; i++) {
  int arr[M];
  // Doing work on arr
}
When you apply this attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to n. The number of private copies of loop-scoped memories is also restricted to n.
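The snippet above can be expanded into a complete function as a minimal sketch. The macro LOOP_MAX_CONCURRENCY, the function sum_of_row_sums, and the sizes kRows and kCols are hypothetical names introduced for illustration. The macro expands to the attribute only when the code is built as SYCL with the Intel compiler, so the same source also builds with a standard C++ compiler. In a real design, the loop would sit inside a single-task kernel. This sketch also assumes that a template parameter is accepted as the attribute argument because it is a constant expression.

```cpp
// The attribute is meaningful only to a SYCL FPGA compiler, so this sketch
// hides it behind a hypothetical macro that expands to nothing elsewhere.
#if defined(SYCL_LANGUAGE_VERSION) && defined(__INTEL_LLVM_COMPILER)
#define LOOP_MAX_CONCURRENCY(n) [[intel::max_concurrency(n)]]
#else
#define LOOP_MAX_CONCURRENCY(n)
#endif

constexpr int kRows = 8;   // plays the role of N in the snippet above
constexpr int kCols = 16;  // plays the role of M in the snippet above

// kConcurrency is a template parameter, so it is a constant expression,
// which is what the attribute parameter requires.
template <int kConcurrency>
int sum_of_row_sums(const int (&data)[kRows][kCols]) {
  static_assert(kConcurrency >= 0,
                "the attribute parameter must be non-negative");
  int total = 0;
  LOOP_MAX_CONCURRENCY(kConcurrency)
  for (int i = 0; i < kRows; i++) {
    int arr[kCols];  // private to one iteration of the outer loop
    for (int j = 0; j < kCols; j++) arr[j] = data[i][j] * 2;
    for (int j = 0; j < kCols; j++) total += arr[j];
  }
  return total;
}
```

The attribute changes only how many iterations may overlap in hardware. The computed result is the same for every value of kConcurrency, so you can tune the value without revalidating functional behavior.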
You can also control the number of private copies created for a local memory that is accessed within a loop by using [[intel::private_copies(N)]]. If a local memory with [[intel::private_copies(N)]] is accessed within a loop that has the [[intel::max_concurrency(M)]] attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to min(M,N). For more information about [[intel::private_copies(N)]], refer to FPGA Memory Attributes and to the FPGA tutorial sample “Private Copies” on GitHub.
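The min(M,N) rule above can be sketched as follows. The helper effective_concurrency, the macros, and the function scaled_sum are hypothetical names introduced for illustration, and the helper models positive values only (the special value 0 is not modelled). As before, the macros expand to the attributes only under a SYCL build with the Intel compiler, and the loop would sit inside a single-task kernel in a real design.

```cpp
#if defined(SYCL_LANGUAGE_VERSION) && defined(__INTEL_LLVM_COMPILER)
#define LOOP_MAX_CONCURRENCY(n) [[intel::max_concurrency(n)]]
#define MEM_PRIVATE_COPIES(n) [[intel::private_copies(n)]]
#else
#define LOOP_MAX_CONCURRENCY(n)
#define MEM_PRIVATE_COPIES(n)
#endif

// Hypothetical helper modelling the documented rule: a loop marked
// max_concurrency(M) that accesses a memory marked private_copies(N)
// runs at most min(M, N) iterations at once. Assumes M > 0 and N > 0.
constexpr int effective_concurrency(int max_concurrency, int private_copies) {
  return max_concurrency < private_copies ? max_concurrency : private_copies;
}

static_assert(effective_concurrency(4, 2) == 2, "limited by private_copies");
static_assert(effective_concurrency(1, 8) == 1, "limited by max_concurrency");

constexpr int kCols = 4;  // elements in the iteration-private array

// Loop allows 4 concurrent iterations, but the memory has only 2 private
// copies, so at most effective_concurrency(4, 2) == 2 iterations overlap.
int scaled_sum(const int* in, int rows) {
  int total = 0;
  LOOP_MAX_CONCURRENCY(4)
  for (int i = 0; i < rows; i++) {
    MEM_PRIVATE_COPIES(2) int scratch[kCols];
    for (int j = 0; j < kCols; j++) scratch[j] = in[i * kCols + j] * 3;
    for (int j = 0; j < kCols; j++) total += scratch[j];
  }
  return total;
}
```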