max_concurrency Attribute

Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853

Date 7/13/2023

Public

Use the max_concurrency attribute to limit the concurrency of a loop in your kernel. The concurrency of a loop is how many iterations of that loop can be in progress at one time. By default, the Intel® oneAPI DPC++/C++ Compiler tries to maximize the concurrency of loops so that your kernel runs at peak throughput.

Syntax

[[intel::max_concurrency(n)]]

The max_concurrency attribute applies to pipelined loops in single task kernels. Refer to Pipelining for information about loop pipelining.

The max_concurrency attribute enables you to control the on-chip memory resources required to pipeline your loop. To achieve simultaneous execution of loop iterations, the Intel® oneAPI DPC++/C++ Compiler must create copies of any memory that is private to a single iteration. These copies are called private copies. The greater the permitted concurrency, the more private copies the compiler must create.
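As a rough illustration of how private copies scale with concurrency, the sketch below estimates the bytes needed when each in-flight iteration gets its own copy of an iteration-private array. The helper name `private_copy_bytes` and the simple copies × size model are assumptions made for this example. The compiler's actual allocation can differ, and the Area Estimates report remains the source of truth.

```cpp
#include <cstddef>

// Hypothetical estimate (not a compiler API): one copy of the
// iteration-private memory per iteration allowed in flight.
constexpr std::size_t private_copy_bytes(std::size_t elements,
                                         std::size_t element_size,
                                         std::size_t concurrency) {
  return elements * element_size * concurrency;
}

// Example: a 1024-element int array with 1 and then 4 iterations in flight.
constexpr std::size_t kOneCopy = private_copy_bytes(1024, sizeof(int), 1);
constexpr std::size_t kFourCopies = private_copy_bytes(1024, sizeof(int), 4);

// Under this model the footprint grows linearly with permitted concurrency.
static_assert(kFourCopies == 4 * kOneCopy,
              "footprint scales linearly with concurrency");
```

Under this simplified model, lowering the permitted concurrency from 4 to 1 cuts the estimated footprint of the private array to a quarter.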

The attribute parameter n is required and must be a non-negative constant expression of integer type. The parameter directs the compiler to restrict the loop’s concurrency to n simultaneous iterations.

The kernel’s report.html (see Review the FPGA Optimization Report) provides the following information about loop concurrency:

- Maximum concurrency that the Intel® oneAPI DPC++/C++ Compiler has chosen: This information is available in the Loop Analysis report and the Kernel Memory Viewer.
  - In the Loop Analysis report, a message in the Details pane reports that the maximum number of simultaneous executions has been limited to n.

    NOTE:

    The value of n is unsigned, so it can be greater than or equal to zero. A value of n = 0 indicates unlimited concurrency.
  - In the Kernel Memory Viewer, the bank view of your local memory graphically shows the number of private copies.
- Impact to memory usage: This information is available in the Area Estimates report. A message in the Details pane reports that the Intel® oneAPI DPC++/C++ Compiler has created N independent copies of the memory to enable simultaneous execution of N loop iterations.

If you want to exchange some performance for physical memory savings, apply [[intel::max_concurrency(n)]] to the loop, as shown in the following code snippet:
```
[[intel::max_concurrency(1)]]
for (int i = 0; i < N; i++) {
  int arr[M];
  // Doing work on arr
}
```
When you apply this attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to n. The number of private copies of loop-scoped memories is also restricted to n.
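For a fuller picture of where the attribute sits, here is a minimal sketch of a complete function with a loop limited to two iterations in flight. Several things are assumptions for illustration: the macro `LOOP_MAX_CONCURRENCY`, the function `row_sums`, and the sizes `kRows` and `kCols` are hypothetical, and the macro expands to the attribute only when `SYCL_LANGUAGE_VERSION` is defined so that the same code also builds as ordinary C++. In a real design, a loop like this would run inside a single task kernel.

```cpp
// Assumption: emit the attribute only when compiling as SYCL code;
// other compilers see an empty macro and build a plain loop.
#if defined(SYCL_LANGUAGE_VERSION)
#define LOOP_MAX_CONCURRENCY(n) [[intel::max_concurrency(n)]]
#else
#define LOOP_MAX_CONCURRENCY(n)
#endif

constexpr int kRows = 4;  // hypothetical outer trip count
constexpr int kCols = 8;  // hypothetical size of the per-iteration array

// Doubles every element of a row into a scratch array, then sums it.
inline void row_sums(const int (&in)[kRows][kCols], int (&out)[kRows]) {
  LOOP_MAX_CONCURRENCY(2)  // request at most two outer iterations in flight
  for (int i = 0; i < kRows; i++) {
    int scratch[kCols];  // private to one iteration of the outer loop
    for (int j = 0; j < kCols; j++) {
      scratch[j] = in[i][j] * 2;
    }
    int sum = 0;
    for (int j = 0; j < kCols; j++) {
      sum += scratch[j];
    }
    out[i] = sum;
  }
}
```

The attribute changes only how many iterations may overlap in hardware, not the values the loop computes, which is why the same function can be checked on the host.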

You can also control the number of private copies created for a local memory that is accessed within a loop by using [[intel::private_copies(N)]]. If a local memory with [[intel::private_copies(N)]] is accessed within a loop that has the [[intel::max_concurrency(M)]] attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to min(M,N). For more information about [[intel::private_copies(N)]], refer to FPGA Memory Attributes. For a worked example, refer to the FPGA tutorial sample “Private Copies” listed in the Intel® oneAPI Samples Browser on Linux* or Windows*, or access the code sample on GitHub.
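The min(M,N) rule above can be written out as a small helper. This is only a sketch of the arithmetic: `effective_concurrency` is a hypothetical name, not part of any compiler interface, and it assumes both arguments are positive (it does not model a value of 0).

```cpp
#include <algorithm>

// Hypothetical helper mirroring the min(M, N) rule described above.
// Assumes both values are positive; 0 is not handled here.
constexpr unsigned effective_concurrency(unsigned max_concurrency_m,
                                         unsigned private_copies_n) {
  return std::min(max_concurrency_m, private_copies_n);
}

// A loop allowing 4 iterations over a memory with 2 private copies
// is limited to 2 simultaneous iterations, and vice versa.
static_assert(effective_concurrency(4, 2) == 2, "limited by private copies");
static_assert(effective_concurrency(2, 4) == 2, "limited by max_concurrency");
```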
