Developer Guide

FPGA Optimization Guide for Intel® oneAPI Toolkits

ID 767853
Date 7/13/2023
Public

A newer version of this document is available. Customers should click here to go to the newest version.

Document Table of Contents

Latency Controls (Beta)

The Intel® oneAPI DPC++/C++ Compiler allows you to set latency constraints between operations with side effects, such as pipes and LSUs, which are visible outside the kernel. Specifically, you can apply latency controls to pipe read/write and LSU load/store.

For stallable operations, the scheduler considers only the inherent latency of the operation without making any assumption about the actual stall time. The compiler strives to achieve the latency constraints. If the requested latency cannot be achieved, even when ignoring the possibility of stalls, the compiler errors out.

Use Model

CAUTION:

The APIs described in this section are experimental. Future versions of latency controls might change these APIs in ways that are incompatible with the version described here.

Latency control APIs are provided on read() and write() member functions of the sycl::ext::intel::experimental::pipe class, and on load() and store() member functions of the sycl::ext::intel::experimental::lsu class. Other than the latency controls support, the experimental pipe and LSU are identical to sycl::ext::intel::pipe and sycl::ext::intel::lsu, and the <sycl/ext/intel/fpga_extensions.hpp> header file provides experimental pipe and LSU. These member functions (read(), write(), load(), and store()) can accept a property list instance (sycl::ext::oneapi::experimental::properties) as a function argument, which can contain the following latency-control properties:

  • sycl::ext::intel::experimental::latency_anchor_id<N>, where N is a signed integer:

    A label that you can associate with pipes and LSU functions listed above. You can then reference this label using the latency_constraint property to define relative latency constraints. You can refer functions with this property as labeled functions.

  • sycl::ext::intel::experimental::latency_constraint<A, B, C>:

    A constraint that you can associate with pipes and LSU functions listed above. It provides a latency constraint between the function associated with the constraint and a different labeled function. You can refer functions with this property as constrained functions. The latency_constraint property has the following parameters:

    • A (signed integer): The label of the labeled function that is constrained relative to the constrained function.
    • B (enum value): The constraint type that can be one of the following:
      • latency_control_type::exact (exact latency)
      • latency_control_type::max (maximum latency)
      • latency_control_type::min (minimum latency)
    • C (signed integer): The relative clock cycle difference between the labeled function and constrained function that the constraint must infer, subject to the type of constraint (exact, max, or main).

Latency Controls and Stall-free Loops

In general, latency anchors and constraints may not span a loop because the number of cycles the loop takes to execute cannot be determined at compile time. Hence, it is not possible to satisfy the latency constraint.

An exception to this rule is a stall-free loop, which has a compile-time constant integer number of iterations and does not contain any stallable statements. These may include arithmetic operations and loads/stores to local stall-free memory. Stall-free loops execute in a compile-time known number of cycles.

Example of a stall-free loop:

float Value = <anything>;
float Result = 0;
for (int i = 0; i < 5; i++)  // compile time integer number of iterations
  Result *= (i+Value);       // arithmetic calculation that cannot stall
// Use Result  	

Example 1: Using Latency Controls on Pipes

The following example shows how to use latency controls on pipes. The example uses a function behaving both as a labeled function and a constrained function:

using Pipe1 = sycl::ext::intel::experimental::pipe<class PipeClass1, int, 8>;
using Pipe2 = sycl::ext::intel::experimental::pipe<class PipeClass2, int, 8>;
using Pipe3 = sycl::ext::intel::experimental::pipe<class PipeClass2, int, 8>;
...
// In kernel:
// The following read has a label 0.
int value = Pipe1::read(sycl::ext::oneapi::experimental::properties(
  sycl::ext::intel::experimental::latency_anchor_id<0>));

// The following write occurs exactly 2 cycles after the label-0 function, i.e.,
// the read above. Also, it has a label 1.
Pipe2::write(
  value,
  sycl::ext::oneapi::experimental::properties(
    sycl::ext::intel::experimental::latency_anchor_id<1>,
    sycl::ext::intel::experimental::latency_constraint<
      0, sycl::ext::intel::experimental::latency_control_type::exact,
      2>));

// The following write occurs at least 2 cycles after the label-1 function,
// i.e., the write above.
Pipe3::write(
  value,
  sycl::ext::oneapi::experimental::properties(
    sycl::ext::intel::experimental::latency_constraint<
      1, sycl::ext::intel::experimental::latency_control_type::min, 2>));

Example 2: Using Latency Controls on LSUs

The following example shows how to use latency controls on LSUs. It uses a negative relative cycle number in the latency_constraint property, which means that the constrained function is scheduled before the associated labeled function:

using BurstCoalescedLSU = sycl::ext::intel::experimental::lsu<
  sycl::ext::intel::experimental::burst_coalesce<false>,
  sycl::ext::intel::experimental::statically_coalesce<false>>;
...
// In kernel:
// The following load occurs at most 5 cycles before the label-2 function,
// i.e., the store below.
int value = BurstCoalescedLSU::load(
  input_ptr,
  sycl::ext::oneapi::experimental::properties(
    sycl::ext::intel::experimental::latency_constraint<
      2, sycl::ext::intel::experimental::latency_control_type::max, -5>));

// The following store has a label 2.
BurstCoalescedLSU::store(
  output_ptr, value,
  sycl::ext::oneapi::experimental::properties(
    sycl::ext::intel::experimental::latency_anchor_id<2>));

Using Latency Controls on LSUs With a Stall-free Loop

The following example shows how to use latency controls on LSUs with a stall-free loop. It uses a negative relative cycle number in the latency_constraint property, which means that the constrained function is scheduled before the associated labeled function:

using BurstCoalescedLSU = sycl::ext::intel::experimental::lsu<
  sycl::ext::intel::experimental::burst_coalesce<false>,
  sycl::ext::intel::experimental::statically_coalesce<false>>;
...
// In kernel:
// The following load occurs at most 25 cycles before the label-2 function,
// i.e., the store below.
int value = BurstCoalescedLSU::load(
  input_ptr,
  sycl::ext::oneapi::experimental::properties(
    sycl::ext::intel::experimental::latency_constraint<
      2, sycl::ext::intel::experimental::latency_control_type::max, -25>));
 
 float Result = 0;
 for (int i = 0; i < 5; i++)
   Result *= (Result-Value);    
// The following store has a label 2.
BurstCoalescedLSU::store(
  output_ptr, Result,
  sycl::ext::oneapi::experimental::properties(
    sycl::ext::intel::experimental::latency_anchor_id<2>));

Rules and Limitations

  • latency_anchor_id must be a non-negative number.
  • latency_anchor_id must be a unique number within the whole design.
  • The labeled function and constrained function of a constraint must meet one of the following conditions:
    • Both functions are in the same block but not in any cluster
    • Both functions are in the same cluster.

The compiler attempts to achieve latency constraints. However, it errors out if some constraints cannot be satisfied. For example, if one constraint specifies function A must be scheduled after function B, while another constraint specifies function B must be scheduled after function A, then that set of constraints is unsatisfiable.

TIP:

For additional information, refer to the FPGA tutorial sample "experimental/Latency Control" listed in the Intel® oneAPI Samples Browser on Linux* or Windows*, or access the code sample on GitHub.