Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 6/24/2024
Public


Intel® oneAPI Level Zero Switch

Data Parallel C++ (DPC++) is one of many components of the oneAPI project. The Intel® oneAPI Level Zero (Level Zero) API provides low-level, direct-to-metal interfaces that are tailored to the devices in a oneAPI platform. Although heavily influenced by other low-level APIs, such as the OpenCL™ API, Level Zero is designed to evolve independently.

More information on Level Zero is available in the oneAPI Specification.

Packages to Install

The packages you must install are intel-level-zero-gpu and level-zero.

Level Zero Loader

Level Zero can be supported across different oneAPI compute device architectures. The Level Zero loader discovers all Level Zero drivers in the system. The loader also serves as the Level Zero software development kit: it carries the Level Zero headers and libraries that you use to build Level Zero programs.

Level Zero GPU Driver

The driver is open source, and public releases are published regularly. It does not come with DPC++ and must be installed independently. The Level Zero driver and the OpenCL™ driver come in the same package. More information about the Level Zero driver is available on GitHub.

DPC++ Plugins

SYCL targets a variety of devices: CPU, GPU, and Field Programmable Gate Array (FPGA). Different devices can be operated through different low-level drivers, such as OpenCL for FPGA. The Plugin Interface (PI) is a single SYCL API for working with these devices in a uniform way. SYCL plugins translate the PI API into calls to a specific low-level runtime. The Level Zero PI plugin was created to enable devices supported through the Level Zero system.

Scenario Information

SYCL Device Selection

The PI performs device discovery of all available devices through all available PI plugins. The same physical hardware device can be seen as multiple different SYCL devices if multiple plugins support it (for example, OpenCL Gen90 and Level Zero Gen90). The SYCL runtime performs device selection from the available devices based on device selectors. The device selectors can be user-defined or built in (for example, gpu_selector).

Discovery of Multiple PI Plugins

The implication of support for the discovery of multiple plugins is that the same GPU card can be seen as multiple different GPU devices available under different PI plugins.

NOTE:
Corresponding runtimes (OpenCL and/or Level Zero) must be installed correctly and independently for PI to see their devices. The SYCL specification does not define which device will be used if there are multiple devices that match criteria (for example, is_gpu()).

Default Preference is Given to a Level Zero GPU

By default, if no special action is taken and the Level Zero runtime reports support for the installed GPU, then the SYCL runtime uses that GPU through Level Zero. This applies to both the standard built-in device selectors and custom device selectors, unless action is taken to change the default behavior.

Devices that are not supported with the Level Zero runtime (CPU/FPGA) continue to run with OpenCL.

How to Change the Default Preference

Use the ONEAPI_DEVICE_SELECTOR environment variable to change the default preference. Its value must follow the selector grammar described in the ONEAPI_DEVICE_SELECTOR section; for example, opencl:* selects the OpenCL backend and level_zero:* selects the Level Zero backend.

For example, if you specify ONEAPI_DEVICE_SELECTOR=opencl:* and the PI OpenCL plugin reports a device of the required type, then that device is used. This overrides the default preference given to the Level Zero GPU, provided that the GPU is supported by the installed version of OpenCL.

NOTE:
The ONEAPI_DEVICE_SELECTOR setting only has an effect when more than one device is available to choose from.

Recommendation:
If your code does not work, try running it with ONEAPI_DEVICE_SELECTOR=opencl:* to see if the problem is related to Level Zero.
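
One way to apply this recommendation from a script is to launch the same program once per backend and compare the results. The following Python sketch is only an illustration: the helper names are hypothetical and are not part of any oneAPI tool, and ./my_sycl_app in the usage comment is a placeholder for your own executable.

```python
import os
import subprocess

def env_with_selector(selector, base_env=None):
    """Return a copy of the environment with ONEAPI_DEVICE_SELECTOR set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ONEAPI_DEVICE_SELECTOR"] = selector
    return env

def run_with_selector(command, selector):
    """Run `command` (a list of arguments) restricted to the given selector."""
    # No shell is involved, so characters such as * ; ! need no quoting here.
    return subprocess.run(command, env=env_with_selector(selector), check=False)

def compare_backends(command, selectors=("level_zero:*", "opencl:*")):
    """Run the same command once per selector and collect the exit codes."""
    return {s: run_with_selector(command, s).returncode for s in selectors}

# Usage (placeholder executable name):
#   compare_backends(["./my_sycl_app"])
```

If the program fails only with level_zero:*, the problem is more likely to be related to Level Zero than to the application itself.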

How to See Where the Code is Running

Use the SYCL_PI_TRACE=1 environment variable to see where your code is running. It reports the choice made by the built-in device selectors, if they are used.

Use SYCL_PI_TRACE=-1 to enable verbose tracing of the PI and show all the devices detected by the PI discovery process.

How to Find all DPC++ Plugins and Supported Devices Discovered in the System

Use the sycl-ls utility to find all the plugins on your system. sycl-ls queries all the platforms and devices available through the plugins, and prints useful information about SYCL devices and their ID numbers. This information is useful when you want to designate a specific device to run a SYCL program. The ONEAPI_DEVICE_SELECTOR string is printed on each line to show three pieces of information:

  • The backend that the plugin supports
  • The device_type
  • The device_id

Verbose output is available with sycl-ls --verbose, which also shows the choices made by the standard built-in device selectors and other custom device selectors.

ONEAPI_DEVICE_SELECTOR

With no environment variables set to say otherwise, all platforms and devices presently on the machine are available. The default choice will be one of these devices, usually preferring a Level Zero GPU device, if available. The ONEAPI_DEVICE_SELECTOR can be used to limit that choice of devices, and to expose GPU sub-devices or sub-sub-devices as individual devices.

The syntax of this environment variable follows this BNF grammar:

ONEAPI_DEVICE_SELECTOR = <selector-string>
<selector-string> ::= { <accept-filters> | <discard-filters> | <accept-filters>;<discard-filters> }
<accept-filters> ::= <accept-filter>[;<accept-filter>...]
<discard-filters> ::= <discard-filter>[;<discard-filter>...]
<accept-filter> ::= <term>
<discard-filter> ::= !<term>
<term> ::= <backend>:<devices>
<backend> ::= { * | level_zero | opencl | cuda | hip | esimd_emulator }  // case insensitive
<devices> ::= <device>[,<device>...]
<device> ::= { * | cpu | gpu | fpga | <num> | <num>.<num> | <num>.* | *.* | <num>.<num>.<num> | <num>.<num>.* | <num>.*.* | *.*.*  }  // case insensitive

Each term in the grammar selects a collection of devices from a particular backend. The device names cpu, gpu, and fpga select all devices from that backend with the corresponding type. A backend's device can also be selected by its numeric index (zero-based) or by using * which selects all devices in the backend.
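
To make the grammar concrete, the following Python sketch checks a selector string against the rules above and splits it into accept and discard terms. It is a simplified model written for this guide, not the SYCL runtime's parser; the names parse_selector, BACKENDS, DEVICE_TYPES, and DEVICE_PATH are hypothetical.

```python
import re

BACKENDS = {"*", "level_zero", "opencl", "cuda", "hip", "esimd_emulator"}
DEVICE_TYPES = {"*", "cpu", "gpu", "fpga"}
# <num>, <num>.<num>, <num>.*, *.*, <num>.<num>.<num>, <num>.<num>.*, <num>.*.*, *.*.*
DEVICE_PATH = re.compile(
    r"\d+|\d+\.(\d+|\*)|\*\.\*|\d+\.\d+\.(\d+|\*)|\d+\.\*\.\*|\*\.\*\.\*"
)

def parse_selector(text):
    """Split a selector into (accepts, discards), each a list of (backend, devices)."""
    accepts, discards = [], []
    for raw in text.split(";"):
        discard = raw.startswith("!")
        term = raw[1:] if discard else raw
        backend, colon, devices = term.partition(":")
        backend = backend.lower()  # backend names are case insensitive
        if not colon or backend not in BACKENDS:
            raise ValueError(f"bad or missing backend in term {raw!r}")
        device_list = [d.lower() for d in devices.split(",")]
        for device in device_list:
            if device not in DEVICE_TYPES and not DEVICE_PATH.fullmatch(device):
                raise ValueError(f"bad device {device!r} in term {raw!r}")
        if not discard and discards:
            # Discard filters must all appear at the end of the string.
            raise ValueError("accept filter found after a discard filter")
        (discards if discard else accepts).append((backend, device_list))
    return accepts, discards

print(parse_selector("opencl:gpu;level_zero:gpu"))
print(parse_selector("*:gpu;!cuda:*"))
```

A string such as opencl (no colon and no device) raises ValueError in this sketch, which mirrors the requirement that every term names both a backend and at least one device.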

The dot syntax (for example, <num>.<num>) causes one or more GPU sub-devices to be exposed to the application as SYCL root devices. For example, 1.0 exposes the first sub-device of the second device as a SYCL root device. The syntax <num>.* exposes all sub-devices of the given device as SYCL root devices. The syntax *.* exposes all sub-devices of all GPU devices as SYCL root devices.

In general, a term with one or more asterisks ( * ) matches all backends, devices, or sub-devices with the given pattern. However, a warning is generated if the term does not match anything. For example, *:gpu matches all GPU devices in all backends (ignoring backends with no GPU devices), but it generates a warning if there are no GPU devices in any backend. Likewise, level_zero:*.* matches all sub-devices of partitionable GPUs in the Level Zero backend, but it generates a warning if there are no Level Zero GPU devices that are partitionable into sub-devices.

The device indices are zero-based and are unique only within a backend. Therefore, level_zero:0 is a different device from cuda:0. To see the indices of all available devices, run the sycl-ls tool. Note that different backends sometimes expose the same hardware as different devices. For example, the level_zero and opencl backends both expose the Intel GPU devices.

Additionally, if a sub-device is chosen (via numeric index or wildcard), then an additional layer of partitioning can be specified. In other words, a sub-sub-device can be selected. As with sub-devices, this is done with a period ( . ) followed by a sub-sub-device specifier, which is either a wildcard symbol ( * ) or a numeric index. For example, ONEAPI_DEVICE_SELECTOR=level_zero:0.*.* partitions device 0 into sub-devices and then partitions each of those into sub-sub-devices. The resulting sub-sub-devices are the only devices available to the application; neither device 0 nor its sub-devices appear in that list.
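
The effect of the dot syntax can be modeled with a small Python sketch. The device tree below is invented for illustration (device 0 has two sub-devices with two sub-sub-devices each, and device 1 cannot be partitioned), and the helper names are hypothetical. The sketch handles only numeric indices and wildcards, and it does not reproduce the warnings that the real runtime emits.

```python
def children(roots, path):
    """Return the list of child nodes found by following `path` from the top."""
    node = roots
    for index in path:
        node = node[index]
    return node

def expand(spec, roots):
    """Return the index paths that `spec` would expose as SYCL root devices."""
    paths = [()]
    for part in spec.split("."):
        next_paths = []
        for path in paths:
            kids = children(roots, path)
            if part == "*":
                # A wildcard selects every child at this level.
                next_paths.extend(path + (i,) for i in range(len(kids)))
            elif int(part) < len(kids):
                # A numeric index selects one child, if it exists.
                next_paths.append(path + (int(part),))
        paths = next_paths
    return paths

# Hypothetical tree: each list holds the partitions of its parent; [] is a leaf.
ROOTS = [[[[], []], [[], []]], []]

print(expand("*", ROOTS))      # whole devices only
print(expand("0.*", ROOTS))    # sub-devices of device 0
print(expand("0.*.*", ROOTS))  # sub-sub-devices of device 0
```

In this model, 0.*.* yields four three-element paths and nothing shorter, which matches the statement that neither device 0 nor its sub-devices remain in the list.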

Lastly, a filter in the grammar can be thought of as a term combined with an action that is taken on all devices selected by the term. The action is either accept or discard, so a filter is either an accept filter or a discard filter. The string <term> represents an accept filter and the string !<term> represents a discard filter. The underlying term is the same, but the two filters perform different actions on the list of matching devices. For example, !opencl:* discards all devices of the opencl backend from the list of available devices.

Discard filters, if there are any, must all appear at the end of the selector string. When one or more filters accept a device and one or more filters discard it, the discard filters have priority and the device is not made available to the user. This allows selector strings such as *:gpu;!cuda:* that accept all GPU devices except those with a CUDA backend.

Furthermore, if the value of this environment variable contains only discard filters, an accept filter that matches all devices (but not sub-devices or sub-sub-devices) is implicitly included, so that the user can specify only the devices that must not be made available. Therefore, !*:cpu accepts all devices except CPU devices, and opencl:*;!*:cpu accepts all devices of the OpenCL backend except its CPU devices.

It is legal to have a discard filter that specifies devices that have already been omitted by previous filters in the selector string. Doing so has no effect; the discarded devices are still omitted.
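
The interaction between accept and discard filters can be sketched in Python as follows. This is a simplified model and not the runtime's implementation: the device list is invented for illustration, sub-devices are not modeled, and the function names are hypothetical.

```python
def term_matches(term, device):
    """Check whether a (backend, specs) term selects a (backend, type, index) device."""
    backend, specs = term
    dev_backend, dev_type, dev_index = device
    if backend not in ("*", dev_backend):
        return False
    return any(spec in ("*", dev_type, str(dev_index)) for spec in specs)

def available_devices(selector, devices):
    """Apply accept filters first, then let discard filters take priority."""
    accepts, discards = [], []
    for raw in selector.split(";"):
        target = discards if raw.startswith("!") else accepts
        backend, _, specs = raw.lstrip("!").partition(":")
        target.append((backend.lower(), specs.lower().split(",")))
    if not accepts:
        # Only discard filters were given: implicitly accept every device.
        accepts = [("*", ["*"])]
    return [
        d for d in devices
        if any(term_matches(t, d) for t in accepts)
        and not any(term_matches(t, d) for t in discards)
    ]

# Hypothetical machine; indices are unique only within each backend.
DEVICES = [
    ("opencl", "cpu", 0),
    ("opencl", "gpu", 1),
    ("level_zero", "gpu", 0),
    ("cuda", "gpu", 0),
]

print(available_devices("*:gpu;!cuda:*", DEVICES))
print(available_devices("!*:cpu", DEVICES))
print(available_devices("opencl:*;!*:cpu", DEVICES))
```

With this device list, *:gpu;!cuda:* leaves the OpenCL and Level Zero GPUs, !*:cpu leaves every device except the OpenCL CPU, and opencl:*;!*:cpu leaves only the OpenCL GPU.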

The following examples further illustrate the usage of this environment variable:

Each example is followed by its result.

ONEAPI_DEVICE_SELECTOR=opencl:*

Only the OpenCL devices are available.

ONEAPI_DEVICE_SELECTOR=level_zero:gpu

Only GPU devices on the Level Zero platform are available.

ONEAPI_DEVICE_SELECTOR="opencl:gpu;level_zero:gpu"

GPU devices from both Level Zero and OpenCL are available. Quoting or escaping (for example, with quotation marks) is usually needed when entries are separated by semicolons.

ONEAPI_DEVICE_SELECTOR=opencl:gpu,cpu

Only CPU and GPU devices on the OpenCL platform are available.

ONEAPI_DEVICE_SELECTOR=opencl:0

Only the device with index 0 on the OpenCL backend is available.

ONEAPI_DEVICE_SELECTOR=hip:0,2

Only devices with indices of 0 and 2 from the HIP backend are available.

ONEAPI_DEVICE_SELECTOR=opencl:0.*

All the sub-devices from the OpenCL device with index 0 are exposed as SYCL root devices. No other devices are available.

ONEAPI_DEVICE_SELECTOR=opencl:0.2

The third sub-device (2 in zero-based counting) of the OpenCL device with index 0 will be the sole device available.

ONEAPI_DEVICE_SELECTOR=level_zero:*,*.*

Exposes Level Zero devices to the application in two different ways. Each device (known as a card) is exposed as a SYCL root device and each sub-device is also exposed as a SYCL root device.

ONEAPI_DEVICE_SELECTOR="opencl:*;!opencl:0"

All OpenCL devices except for the device with index 0 are available.

ONEAPI_DEVICE_SELECTOR="!*:cpu"

All devices except for CPU devices are available.

Notes:

  • The backend argument is always required. An error will be thrown if it is absent.
  • Additionally, the backend MUST be followed by colon ( : ) and at least one device specifier of some sort, else an error is thrown.
  • The sub-device and sub-sub-device syntaxes attempt to partition the root device according to the rules defined by info::partition_property::partition_by_affinity_domain and info::partition_affinity_domain::next_partitionable. The root device is determined by the underlying backend.
  • When using the Level Zero backend, see also the documentation of the ZE_FLAT_DEVICE_HIERARCHY environment variable, because it affects how this backend exposes root devices to SYCL. For Intel GPUs, the sub-device and sub-sub-device syntaxes can be used to expose tiles or CCSs to the SYCL application as SYCL root devices. However, the exact mapping is determined by the ZE_FLAT_DEVICE_HIERARCHY environment variable.
  • The semi-colon character ( ; ) and the exclamation mark character ( ! ) are treated specially by many shells, so you may need to enclose the string in quotes if the selection string contains these characters.
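
When a selector string is generated by a script, the quoting described in the last note can be delegated to a library. The following Python sketch uses the standard shlex.quote function to build a POSIX shell export command; the helper name export_line is hypothetical, and other shells (for example, Windows cmd) have different quoting rules.

```python
import shlex

def export_line(selector):
    """Build a POSIX-shell `export` command with the selector safely quoted."""
    return "export ONEAPI_DEVICE_SELECTOR=" + shlex.quote(selector)

# No special characters: the value is left unquoted.
print(export_line("opencl:gpu,cpu"))
# Contains * ; and !: the value is wrapped in single quotes.
print(export_line("*:gpu;!cuda:*"))
```

Single quotes prevent the shell from treating the semicolon as a command separator, the asterisk as a glob pattern, or the exclamation mark as history expansion.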