Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 7/13/2023
Public

max_concurrency Attribute

Use the max_concurrency attribute to limit the concurrency of a loop in your kernel. The concurrency of a loop is how many iterations of that loop can be in progress at one time. By default, the Intel® oneAPI DPC++/C++ Compiler tries to maximize the concurrency of loops so that your kernel runs at peak throughput.

Syntax

[[intel::max_concurrency(n)]]

The max_concurrency attribute applies to pipelined loops in single task kernels. Refer to Pipelining for information about loop pipelining.

The max_concurrency attribute enables you to control the on-chip memory resources required to pipeline your loop. To achieve simultaneous execution of loop iterations, the Intel® oneAPI DPC++/C++ Compiler must create copies of any memory that is private to a single iteration. These copies are called private copies. The greater the permitted concurrency, the more private copies the compiler must create.

The attribute parameter n is required and must be a non-negative constant expression of integer type. The parameter directs the compiler to restrict the loop’s concurrency to n simultaneous iterations.
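The link between concurrency and memory use described above can be approximated with simple arithmetic. The following is a minimal sketch, not a compiler formula: private_copy_bytes is a hypothetical helper, and it assumes one full copy of the array per concurrent iteration while ignoring any banking, replication, or padding the compiler may add.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of any Intel API): first-order estimate of
// the bytes needed for the private copies of one loop-scoped array.
// Assumption: one full copy per concurrent iteration, nothing else.
constexpr std::size_t private_copy_bytes(std::size_t concurrency,
                                         std::size_t elements,
                                         std::size_t element_size) {
  return concurrency * elements * element_size;
}

// Under this model, lowering the concurrency of a loop that uses a
// 1024-element int array from 8 to 2 cuts the estimate by a factor of 4.
static_assert(private_copy_bytes(8, 1024, sizeof(int)) ==
                  4 * private_copy_bytes(2, 1024, sizeof(int)),
              "estimate scales linearly with concurrency");
```

Treat the result as a rough guide only; the Area Estimates report remains the source for the memory the compiler actually allocates.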

The kernel’s report.html (Review the FPGA Optimization Report) provides the following information pertaining to loop concurrency:

  • Maximum concurrency that the Intel® oneAPI DPC++/C++ Compiler has chosen: This information is available in the Loops Analysis report and the Kernel Memory Viewer.
    • In the Loops Analysis report, a message in the Details pane reports that the maximum number of simultaneous executions has been limited to n.
      NOTE:

      The value of n is an unsigned integer, so it can be greater than or equal to zero. A value of n = 0 indicates unlimited concurrency.

    • In the Memory Viewer, the bank view of your local memory graphically shows the number of private copies.
  • Impact on memory usage: This information is available in the Area Estimates report. A message in the Details pane reports that the Intel® oneAPI DPC++/C++ Compiler has created N independent copies of the memory to enable simultaneous execution of N loop iterations.

    If you want to exchange some performance for physical memory savings, apply [[intel::max_concurrency(n)]] to the loop, as shown in the following code snippet:

    [[intel::max_concurrency(1)]]
    for (int i = 0; i < N; i++) {
      int arr[M];
      // Doing work on arr
    }

    When you apply this attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to n. The number of private copies of loop-scoped memories is also restricted to n.
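To make the snippet above concrete, here is a minimal sketch with the loop body filled in. The function name, the trip count, and the array size are hypothetical; in a real design the loop would sit inside a single task kernel. Compilers that do not recognize the attribute ignore it (typically with a warning), so the function also builds on a host and returns the same result there.

```cpp
#include <cassert>

constexpr int kIterations = 16;  // hypothetical trip count (N in the snippet)
constexpr int kScratch = 8;      // hypothetical array size (M in the snippet)

// Hypothetical example function, not taken from the guide.
int sum_of_scratch_totals() {
  int total = 0;
  [[intel::max_concurrency(2)]]  // at most 2 iterations in flight
  for (int i = 0; i < kIterations; i++) {
    int arr[kScratch];  // private to one iteration, so at most 2 copies
    for (int j = 0; j < kScratch; j++) arr[j] = i + j;
    for (int j = 0; j < kScratch; j++) total += arr[j];
  }
  return total;
}
```

The attribute restricts only how many iterations overlap in hardware; the value the loop computes does not change.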

    You can also control the number of private copies (created for a local memory and accessed within a loop) by using [[intel::private_copies(N)]]. If a local memory with [[intel::private_copies(N)]] is accessed within a loop that has the [[intel::max_concurrency(M)]] attribute, the Intel® oneAPI DPC++/C++ Compiler limits the number of simultaneously executed loop iterations to min(M,N). For more information about [[intel::private_copies(N)]], refer to FPGA Memory Attributes. For additional information about using [[intel::private_copies(N)]], refer to the FPGA tutorial sample “Private Copies” listed in the Intel® oneAPI Samples Browser on Linux* or Windows*, or access the code sample on GitHub.
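The interaction between the two attributes can be sketched as follows. This is an illustration under stated assumptions: effective_concurrency and fill_and_sum are hypothetical names, the helper covers only positive attribute values (the special value 0 is not modeled), and the loop would normally appear inside a single task kernel.

```cpp
#include <cassert>

// Hypothetical helper: the min(M, N) rule for positive attribute values.
constexpr int effective_concurrency(int max_concurrency, int private_copies) {
  return max_concurrency < private_copies ? max_concurrency : private_copies;
}

// Hypothetical example function, not taken from the guide.
int fill_and_sum() {
  int total = 0;
  [[intel::max_concurrency(4)]]  // M = 4
  for (int i = 0; i < 16; i++) {
    [[intel::private_copies(2)]] int arr[8];  // N = 2
    for (int j = 0; j < 8; j++) arr[j] = i;
    for (int j = 0; j < 8; j++) total += arr[j];
  }
  return total;
}

// With M = 4 and N = 2, the rule gives 2 simultaneous iterations.
static_assert(effective_concurrency(4, 2) == 2, "min(M, N) with M > N");
```

In this sketch the smaller value comes from private_copies, so raising max_concurrency beyond 2 would not allow more overlap unless the memory also received more copies.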