11.4.1. Customization of Replicated Kernels Using the get_compute_id() Function
Retrieving compute IDs is a convenient alternative to replicating your kernel in source code and then adding specialized code to each kernel copy. When a kernel uses the num_compute_units(X,Y,Z) attribute and calls the get_compute_id() function, the Intel® FPGA SDK for OpenCL™ Offline Compiler assigns a unique compute ID to each compute unit, and get_compute_id() returns that ID inside each compute unit. Each compute unit can then use its compute ID to behave differently from the other compute units derived from the same kernel source code. For example, you can use the return value of get_compute_id() to index into an array of channels, so that each compute unit reads from or writes to a different channel.
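As a rough illustration of channel indexing, the following plain-C sketch models a daisy chain of four replicated compute units on the host. It is not OpenCL kernel code: the names NUM_CU, stage(), and run_chain() are hypothetical, the loop variable id stands in for get_compute_id(0), and each channel is modeled as a single int slot rather than a FIFO. In a real kernel, the reads and writes would go through Intel channel built-ins such as read_channel_intel() and write_channel_intel() on an array of channels.

```c
/* Hypothetical host-side model: NUM_CU replicated compute units connected
 * by NUM_CU + 1 "channels". Each channel is one int slot here; a real
 * FPGA channel is a FIFO, which this sketch does not model. */
#define NUM_CU 4

/* One body shared by every compute unit. id plays the role of
 * get_compute_id(0): it selects the input channel ch[id] and the output
 * channel ch[id + 1], and it also customizes the work the stage does. */
static void stage(int id, int ch[NUM_CU + 1])
{
    int data = ch[id];   /* read from this unit's input channel */
    data += id;          /* stage-specific work: add the compute ID */
    ch[id + 1] = data;   /* write to the next unit's input channel */
}

/* Feeds input into ch[0], runs every stage, and returns ch[NUM_CU]. */
int run_chain(int input)
{
    int ch[NUM_CU + 1] = {0};
    ch[0] = input;
    for (int id = 0; id < NUM_CU; ++id) {
        stage(id, ch);   /* on hardware, all units would run concurrently */
    }
    return ch[NUM_CU];
}
```

Calling run_chain(10) passes 10 through stages that add 0, 1, 2, and 3, returning 16. The sequential loop is valid here only because each stage depends solely on its predecessor; on the device, the compute units would run concurrently and synchronize through the channels.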
The num_compute_units attribute accepts up to three arguments (num_compute_units(X,Y,Z)). Together with the get_compute_id() function, this attribute lets you create one-, two-, or three-dimensional logical arrays of compute units. A typical use of a 1D array of compute units is a linear pipeline of kernels (also called a daisy chain of kernels), and a typical use of a 2D array is a systolic array of kernels. For example, the following autorun kernel is replicated into a 4 x 4 array of compute units, and each copy retrieves its row and column coordinates:
__attribute__((max_global_work_dim(0)))
__attribute__((autorun))
__attribute__((num_compute_units(4,4)))
__kernel void PE() {
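  int row = get_compute_id(0);  // row of this compute unit in the 4 x 4 array
  int col = get_compute_id(1);  // column of this compute unit in the 4 x 4 array
  // …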
}
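To show how row and col can steer each processing element, the following plain-C sketch models the dataflow of the 4 x 4 array above computing a matrix-vector product y = W * x. It is a hypothetical host-side model, not the kernel itself: pe(), matvec(), and the Channels struct are illustrative names, channels are single int slots, and the sequential loop captures the data dependencies but not the cycle-level pipelining of a real systolic array.

```c
#define ROWS 4
#define COLS 4

/* Channel model: h[] carries partial sums west to east along each row, and
 * v[] carries the input vector north to south down each column. */
typedef struct {
    int h[ROWS][COLS + 1];  /* h[row][col] feeds PE(row, col) from the west  */
    int v[ROWS + 1][COLS];  /* v[row][col] feeds PE(row, col) from the north */
} Channels;

/* One body shared by all 16 PEs. row and col stand in for get_compute_id(0)
 * and get_compute_id(1); they select the PE's channels and its weight. */
static void pe(int row, int col, int w[ROWS][COLS], Channels *c)
{
    int x = c->v[row][col];                      /* x[col] from the north */
    int sum = c->h[row][col];                    /* partial sum from west */
    c->h[row][col + 1] = sum + w[row][col] * x;  /* pass new sum east     */
    c->v[row + 1][col] = x;                      /* pass x[col] south     */
}

/* Computes y = W * x. Visiting PEs in row-major order satisfies every
 * west-to-east and north-to-south dependency, so one pass is enough. */
void matvec(int w[ROWS][COLS], const int x[COLS], int y[ROWS])
{
    Channels c = {0};                 /* all channels start at zero */
    for (int col = 0; col < COLS; ++col)
        c.v[0][col] = x[col];         /* inject x along the top edge */
    for (int row = 0; row < ROWS; ++row)
        for (int col = 0; col < COLS; ++col)
            pe(row, col, w, &c);
    for (int row = 0; row < ROWS; ++row)
        y[row] = c.h[row][COLS];      /* collect sums on the east edge */
}
```

With W set to the rows {1,0,0,0}, {0,2,0,0}, {0,0,3,0}, {1,1,1,1} and x = {1,2,3,4}, matvec() produces y = {1,4,9,10}. Every PE runs the same code; only its (row, col) pair decides which weight it holds and which channels it uses.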
For a 3D array of compute units, you can retrieve the X, Y, and Z coordinates of a compute unit in the logical compute unit array using get_compute_id(0), get_compute_id(1), and get_compute_id(2), respectively. This usage mirrors the work-item intrinsic functions (get_global_id(), get_local_id(), and get_group_id()), which also take a dimension index as their argument.
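The correspondence between compute IDs and coordinates can be sketched in plain C. The helper below is hypothetical: it lists the (x, y, z) triple that get_compute_id(0), get_compute_id(1), and get_compute_id(2) would return in each of the X * Y * Z compute units, using an enumeration order chosen only for this illustration.

```c
#include <stddef.h>

/* Logical coordinates of one compute unit: the values that
 * get_compute_id(0), get_compute_id(1), and get_compute_id(2)
 * would return inside it. */
typedef struct {
    int x, y, z;
} ComputeId;

/* Hypothetical helper: writes every coordinate of an X x Y x Z compute-unit
 * array to out[] (which must hold X * Y * Z entries), with z varying
 * fastest, and returns the number of entries written. */
size_t enumerate_compute_ids(int X, int Y, int Z, ComputeId *out)
{
    size_t n = 0;
    for (int x = 0; x < X; ++x) {
        for (int y = 0; y < Y; ++y) {
            for (int z = 0; z < Z; ++z) {
                out[n++] = (ComputeId){ x, y, z };
            }
        }
    }
    return n;
}
```

For num_compute_units(2,3,4), the helper yields 24 triples, from (0,0,0) to (1,2,3).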
Global IDs, local IDs, and group IDs can vary at runtime based on how the host invokes the kernel. However, compute IDs are known at compilation time, allowing the offline compiler to generate optimized hardware for each compute unit.
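A loose C analogy for why compile-time IDs matter: the hypothetical DEFINE_PE macro below stamps out one function per compute unit with its ID baked in as a literal, much as the offline compiler builds a separate hardware copy for each compute unit. Because ID is a constant in every copy, an optimizing C compiler can typically discard the branches that a given copy never takes. A value known only at runtime, such as a get_global_id() result, would force every branch to stay in place.

```c
/* Hypothetical analogy: one function per "compute unit", each with its
 * ID fixed at compile time, so the if/else chain folds to one return. */
#define DEFINE_PE(ID)                                               \
    int pe_##ID(int data)                                           \
    {                                                               \
        if ((ID) == 0) {                                            \
            return data;            /* first unit: pass through */  \
        } else if ((ID) == 3) {                                     \
            return -data;           /* last unit: negate */         \
        } else {                                                    \
            return data * (ID);     /* middle units: scale by ID */ \
        }                                                           \
    }

/* Four specialized copies, loosely analogous to num_compute_units(4). */
DEFINE_PE(0)
DEFINE_PE(1)
DEFINE_PE(2)
DEFINE_PE(3)
```

For example, pe_2(5) returns 10 and pe_3(5) returns -5, and each function body contains only the logic its ID selects once constant folding has run.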