Visible to Intel only — GUID: GUID-9C05BD8E-FBD3-4B4D-B39C-580FB1978542
Visible to Intel only — GUID: GUID-9C05BD8E-FBD3-4B4D-B39C-580FB1978542
The device_global Extension (Beta)
The device_global class introduces device-scoped memory allocations into SYCL that can be accessed within a kernel using C++ global variable syntax. These memory allocations have unique instances per SYCL device. Like global variables, the device_global class is declared at a namespace scope and visible to all kernels within that scope.
In the current oneAPI release, the device_global class supports only a few features on FPGAs that are described in this topic. Subsequent releases might include more features of this class. For detailed information about the device_global API, refer to the SYCL device_global Language Specification.
The device_global class is a class template that is parameterized by the underlying allocation type and a list of optional properties (see Table 1). The type of the allocation also encodes the size of the allocation for potentially multidimensional array types.
Declaring a device_global Object
The following is an example of the device_global object declaration that consists of an int array annotated with device_image_scope and host_access_none properties:
#include <sycl/sycl.hpp>
namespace exp = sycl::ext::oneapi::experimental;
using FPGAProperties = decltype(exp::properties(
exp::device_image_scope, exp::host_access_none));
exp::device_global<int[10], FPGAProperties> val;
In the above example, the device_image_scope property limits the lifetime of the val variable. The host_access_none property asserts that the host code does not copy to or from the device_global. These properties are further described in Table 1.
The current implementation does not support the device_global declaration without device_image_scope and host_access_none properties.
Accessing a device_global Object
The following is an example of accessing a device_global object:
int main () { sycl::queue q; q.single_task([=] { val[0] = 42; ... }).wait(); int val_0; q.copy(val, &val_0, 1 /*count*/, 0 /*startIndex*/).wait(); }
In most cases, you can directly access the underlying data through overloads of common operators. Host code needs to enqueue copy operations onto the queue to access the variable.
Properties of the device_global Class
The following table describes several compile-time-constant properties that the device_global class supports:
Property | Description |
---|---|
device_image_scope | Mandatory property for FPGAs. If you set this property, the device_global memory lifetime begins when you program the FPGA binary image on an FPGA and ends when you program a different FPGA binary image on that FPGA. |
host_access | Dictates how the host code accesses the device_global class and enables compiler optimizations. The following values are supported:
FPGA acceleration:
Only none is supported for multiarchitecture binaries
SYCL HLS:
All values are supported.
|
Hardware Implementation
Global variables are not captured implicitly by the lambda, so the compiler need not pass device_global variables as kernel arguments to the generated IP core.
A device_global variable with a device_image_scope property is implemented in on-chip memory, which carries all the on-chip memory advantages, as described in Perform Kernel Computations Using Local or Private Memory.
If you use a compiler-generated IP core in your hardware system, the values stored in a device_global variable can be reset to their initial values (reinitialized) only by reprogramming the device image on to the FPGA.
Host Access
Each device_global variable that allows for host (external) access exposes a dedicated Avalon® memory-mapped interface at the RTL module boundary. Accesses from this interface are not stallable and are also given priority over any arbitrated with any stallable kernel (internal) accesses. Addressing for this variable starts at 0x0 and goes up to the size of the variable.