Asynchronous Parallelism Within Kernels (task_sequence)
Your kernel design might contain operations that you want to run asynchronously from the main flow of your kernel. The Intel oneAPI DPC++/C++ Compiler allows you to define these asynchronous activities in task functions and to asynchronously launch parallel invocations of these task functions through object instances of the task_sequence class. To enable the task_sequence class, include the following task_sequence header file in your source code:
#include <sycl/ext/intel/experimental/task_sequence.hpp>
task_sequence is a class template that resides in the sycl::ext::intel::experimental namespace. Its template parameters are a reference to the task function associated with the class and optional parameters that set the depth of the queues used to launch tasks and to hold their results. Each object of a task_sequence specialization represents the FPGA hardware that implements the associated task function and its task queues. The number of objects you declare therefore controls how much of that hardware is replicated or reused.
The task_sequence class objects are helpful in situations where you want to express coarse-grained thread-level parallelism. For example:
- Improving performance by running operations, such as loops, in parallel.
- Reducing FPGA area utilization by sharing an expensive compute block among different parts of your kernel.
Template Parameter | Description |
---|---|
auto &f, specialized as typename ReturnT, typename... ArgsT, ReturnT (&f)(ArgsT...) | Callable f that defines the asynchronous task to be associated with the task_sequence. f must be statically resolvable at compile time, which means it cannot be a function pointer, and the return type (ReturnT) and argument types (ArgsT...) of f must be resolvable and fixed. |
uint32_t invocation_capacity | The size of the hardware queue instantiated for async() function calls. This parameter value corresponds to the minimum number of outstanding async() function calls to be supported. When the number of outstanding async() function calls reaches this value, further calls may block until the number of outstanding calls is reduced to the invocation_capacity. The default value of this parameter is 1. |
uint32_t response_capacity | The size of the hardware queue instantiated to hold task function results. This parameter value corresponds to the maximum number of outstanding async() calls such that all outstanding tasks are guaranteed to make forward progress. Further async() calls may block until the number of outstanding calls is reduced to the response_capacity. The default value of this parameter is 1. |
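
The two capacity parameters can be read as thresholds on the count of outstanding async() calls. The following is a minimal host-only bookkeeping sketch of that reading, not the Intel class: the name capacity_model and both query functions are hypothetical, and the model only counts calls without launching any work. It assumes that a new async() call is guaranteed not to block while fewer than invocation_capacity calls are outstanding, and that forward progress is guaranteed while at most response_capacity calls are outstanding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping model of the two capacity parameters.
// It counts outstanding calls only; it is NOT the Intel task_sequence class.
template <std::uint32_t invocation_capacity = 1,
          std::uint32_t response_capacity = 1>
struct capacity_model {
  std::uint32_t outstanding = 0;  // async() calls not yet matched by get()

  // Assumed reading: the next async() cannot block while fewer than
  // invocation_capacity calls are outstanding.
  bool next_async_guaranteed_nonblocking() const {
    return outstanding < invocation_capacity;
  }

  // Assumed reading: all outstanding tasks are guaranteed to make forward
  // progress while at most response_capacity calls are outstanding.
  bool forward_progress_guaranteed() const {
    return outstanding <= response_capacity;
  }

  // Each async() adds one outstanding call.
  void async() { ++outstanding; }

  // Each get() retires one outstanding call; in this model, calling get()
  // with nothing outstanding is treated as misuse.
  void get() {
    assert(outstanding > 0);
    --outstanding;
  }
};
```

With capacity_model<2, 1>, the first two async() calls are within the guaranteed range, a third one may block, and a single get() brings the model back within it.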
Function API | Description |
---|---|
void async(ArgsT... Args) | Asynchronously calls f with arguments Args. It increments the number of outstanding tasks by 1. |
ReturnT get() | Synchronously retrieves the result of an asynchronous call. Results are retrieved in FIFO order of their async() invocations. It decrements the number of outstanding tasks by 1. |
~task_sequence() | Destructor for the task_sequence class. It implicitly invokes the get() function on all outstanding invocations launched through the async() function call. |
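
The semantics in the table above (asynchronous launch, FIFO retrieval, and a destructor that drains outstanding calls) can be illustrated with a small host-only model built on std::async. This is a sketch under stated assumptions, not the Intel implementation: the class name task_sequence_model, its outstanding() helper, and the task function square are hypothetical, the capacity parameters are omitted, and C++17 is required for the auto & template parameter.

```cpp
#include <cstddef>
#include <future>
#include <queue>
#include <utility>

// Hypothetical host-only model of the documented async()/get() semantics.
// It is NOT the Intel task_sequence class and models no FPGA hardware.
template <auto &f>
class task_sequence_model;

// The specialization recovers ReturnT and ArgsT from the function reference,
// mirroring the template parameter row in the first table.
template <typename ReturnT, typename... ArgsT, ReturnT (&f)(ArgsT...)>
class task_sequence_model<f> {
 public:
  task_sequence_model() = default;
  task_sequence_model(const task_sequence_model &) = delete;
  task_sequence_model &operator=(const task_sequence_model &) = delete;

  // Launches f(args...) on another thread and records it as outstanding.
  void async(ArgsT... args) {
    outstanding_.push(std::async(std::launch::async, f, args...));
  }

  // Blocks until the oldest outstanding call finishes and returns its
  // result, so results come back in the order the calls were launched.
  ReturnT get() {
    std::future<ReturnT> oldest = std::move(outstanding_.front());
    outstanding_.pop();
    return oldest.get();
  }

  // Number of async() calls not yet matched by get() (model-only helper).
  std::size_t outstanding() const { return outstanding_.size(); }

  // Drains every outstanding call, as the documented destructor does.
  ~task_sequence_model() {
    while (!outstanding_.empty()) get();
  }

 private:
  std::queue<std::future<ReturnT>> outstanding_;
};

// Hypothetical task function used for illustration.
int square(int x) { return x * x; }
```

For example, given task_sequence_model<square> ts, calling ts.async(2) and then ts.async(3) leaves two calls outstanding; the first ts.get() returns the result for the argument 2 and the second returns the result for the argument 3, regardless of which thread finished first.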