Visible to Intel only — GUID: GUID-E45493B1-00B8-4F52-A621-A229CC690A56
Visible to Intel only — GUID: GUID-E45493B1-00B8-4F52-A621-A229CC690A56
Explicit SIMD SYCL Extension
oneAPI provides an Explicit SIMD SYCL extension (ESIMD) for lower-level Intel GPU programming.
ESIMD provides APIs that are similar to Intel's GPU Instruction Set Architecture (ISA), but it enables you to write explicitly vectorized device code. This explicit enabling gives you more control over the generated code and allows you to depend less on compiler optimizations.
The specification, API reference, and working code examples are available on GitHub.
ESIMD kernels and functions always require a subgroup size of one, which means that the compiler does not provide vectorization across work items in a subgroup. Instead, you must explicitly express the vectorization in your code. Below is an example that adds the elements of two arrays and writes the results to the third:
float *A = malloc_shared<float>(Size, q); float *B = malloc_shared<float>(Size, q); float *C = malloc_shared<float>(Size, q); for (unsigned i = 0; i != Size; i++) { A[i] = B[i] = i; } q.submit([&](handler &cgh) { cgh.parallel_for<class Test>( Size / VL, [=](id<1> i)[[intel::sycl_explicit_simd]] { auto offset = i * VL; // pointer arithmetic, so offset is in elements: simd<float, VL> va(A + offset); simd<float, VL> vb(B + offset); simd<float, VL> vc = va + vb; vc.copy_to(C + offset); }); }).wait_and_throw();
In the example above, the lambda function passed to the parallel_for is marked with a special attribute: [[intel::sycl_explicit_simd]]. This attribute tells the compiler that the kernel is ESIMD-based and ESIMD APIs can be used inside it. Here the simd objects and copy_to intrinsics are used. They are available only in the ESIMD extension.
Fully runnable code samples can be found on GitHub.
Compile and Run ESIMD Code
Code that uses the ESIMD extension can be compiled and run using the same commands as you would with standard SYCL:
To compile using the open-source oneAPI DPC++ Compiler:
clang++ -fsycl vadd_usm.cpp
To compile using an Intel® oneAPI Toolkit:
icpx -fsycl vadd_usm.cpp
To run on an Intel specific GPU device, through the oneAPI Level Zero (Level Zero) backend:
SYCL_DEVICE_FILTER=level_zero:gpu ./a.out
The resulting executable ($./a.out) can be run only on Intel GPU hardware, such as Intel® UHD Graphics 600 or later. The SYCL runtime automatically recognizes ESIMD kernels and dispatches their execution, so no additional setup is needed. Both Linux and Windows platforms are supported, including OpenCL™ and Level Zero backends.
Restrictions
This section contains lists of the main restrictions that apply when using the ESIMD extension.
- Features not supported with ESIMD:
- C and C++ standard libraries support.
- Device library extensions.
- A host device.
- Unsupported standard SYCL APIs:
- Local accessors. Local memory is allocated and accessed via explicit device-side APIs.
- 2D and 3D accessors.
- Constant accessors.
- sycl::accessor::get_pointer(). All memory accesses through an accessor are done via explicit APIs. Example: sycl::ext::intel::esimd::block_store(acc, offset)
- Accessors with offsets and/or access range specified.
- sycl::sampler and sycl::stream classes.
- Other restrictions:
- Only Intel GPU devices are supported.
- Interoperability between regular SYCL and ESIMD kernels is not yet supported. It is not possible to invoke an ESIMD kernel from SYCL kernel and vice versa.