Compiling and Running an OpenMP Application
Use the following compiler options to enable OpenMP offload onto Intel® GPUs. These options apply to both C/C++ and Fortran.
-fiopenmp -fopenmp-targets=spir64
By default, the Intel® compiler converts the device code (the target regions) into an intermediate representation, SPIR-V, and stores it in the binary produced by the compilation. At run time, the SPIR-V code is translated into the assembly code of the hardware it runs on, so the same binary can run on any supported platform. This process is called Just-In-Time (JIT) compilation.
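As an illustration, the following minimal C sketch contains a single target region that the options above would offload. The function name vec_add and the build line in the comment are illustrative assumptions (icx is the Intel C compiler driver; icpx and ifx are the C++ and Fortran drivers). With -fopenmp-targets=spir64, the body of the target region is stored as SPIR-V and JIT-compiled for the GPU at run time. If the file is built without OpenMP support, the pragma is ignored and the loop simply runs on the host.

```c
/* vec_add.c: minimal OpenMP offload sketch (function name is illustrative).
 * Possible build line: icx -fiopenmp -fopenmp-targets=spir64 vec_add.c
 * Without OpenMP support, the pragma is ignored and the loop runs on the host. */

/* Computes c[i] = a[i] + b[i] on the default device when offload is enabled. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    /* Copy a and b to the device, run the loop there, copy c back to the host. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```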
To enable the output of the compiler optimization report, add the following options:
-qopt-report=3 -O3
Note:
The -qopenmp compiler option is equivalent to -fiopenmp, and the two options can be used interchangeably.
Ahead-Of-Time (AOT) Compilation
For Ahead-Of-Time (AOT) compilation targeting Intel® Data Center GPU Max Series, change the offload target to spir64_gen and pass the target device to the backend with the additional -Xs option, as shown below. These options apply to both C/C++ and Fortran.
-fiopenmp -fopenmp-targets=spir64_gen -Xs "-device 0x0BD5 -revision_id 0x2f"
OpenMP Runtime Routines
The following are some device-related runtime routines:
omp_target_alloc
omp_target_free
omp_target_memcpy
The following runtime routines are supported by the Intel® compilers as Intel® extensions:
omp_target_alloc_host
omp_target_alloc_device
omp_target_alloc_shared
omp_target_free can be called to free up the memory allocated using the above Intel® extensions.
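A minimal sketch of the standard device memory routines: allocate a buffer on the default device, copy the input into it, operate on it in a target region, copy the result back, and free it. The function name scale_via_device is hypothetical, and the #ifdef _OPENMP fallback exists only so that the sketch also builds without OpenMP. With the Intel® compilers, omp_target_alloc could be replaced by the omp_target_alloc_device extension, assuming it takes the same (size, device number) arguments; the memory would still be released with omp_target_free.

```c
#include <stdlib.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Scales n doubles from src into dst through a device-resident buffer.
 * Returns 0 on success, -1 if the allocation fails. */
int scale_via_device(double *dst, double *src, int n, double factor)
{
    size_t bytes = (size_t)n * sizeof(double);
#ifdef _OPENMP
    int dev = omp_get_default_device();
    int host = omp_get_initial_device();
    double *buf = (double *)omp_target_alloc(bytes, dev);
    if (buf == NULL)
        return -1;
    /* Host -> device. Arguments: dst, src, length, dst_offset, src_offset,
     * dst_device_num, src_device_num. */
    omp_target_memcpy(buf, src, bytes, 0, 0, dev, host);
    /* buf already holds a device address, so pass it with is_device_ptr. */
    #pragma omp target teams distribute parallel for device(dev) is_device_ptr(buf)
    for (int i = 0; i < n; i++)
        buf[i] *= factor;
    /* Device -> host, then release the device memory. */
    omp_target_memcpy(dst, buf, bytes, 0, 0, host, dev);
    omp_target_free(buf, dev);
#else
    /* Fallback when compiled without OpenMP: ordinary host memory. */
    double *buf = (double *)malloc(bytes);
    if (buf == NULL)
        return -1;
    memcpy(buf, src, bytes);
    for (int i = 0; i < n; i++)
        buf[i] *= factor;
    memcpy(dst, buf, bytes);
    free(buf);
#endif
    return 0;
}
```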
For a listing of OpenMP features supported in the icx, icpx, and ifx compilers, see the References at the end of this section.
Environment Variables
Below are some environment variables that are useful for debugging or improving the performance of programs.
For additional information on environment variables, see the References at the end of this section.
LIBOMPTARGET_DEBUG=1
Enables the display of debugging information from libomptarget.so.
LIBOMPTARGET_DEVICES=<DeviceKind>
Controls how sub-devices are exposed to users.
<DeviceKind> := DEVICE | SUBDEVICE | SUBSUBDEVICE | device | subdevice | subsubdevice
DEVICE/device: Only top-level devices are reported as OpenMP devices, and the subdevice clause is supported.
SUBDEVICE/subdevice: Only first-level sub-devices are reported as OpenMP devices, and the subdevice clause is ignored.
SUBSUBDEVICE/subsubdevice: Only second-level sub-devices are reported as OpenMP devices, and the subdevice clause is ignored. On Intel® GPUs using the Level Zero backend, limiting a subsubdevice to a single compute slice within a stack also requires setting the GPU compute runtime environment variable CFESingleSliceDispatchCCSMode=1.
The default is <DeviceKind>=device.
LIBOMPTARGET_INFO=<Num>
Allows the user to request different types of runtime information from libomptarget. For details, see:
https://openmp.llvm.org/design/Runtimes.html#libomptarget-info
LIBOMPTARGET_LEVEL0_MEMORY_POOL=<Option>
Controls how the reusable memory pool is configured.
<Option> := 0 | <PoolInfoList>
<PoolInfoList> := <PoolInfo>[,<PoolInfoList>]
<PoolInfo> := <MemType>[,<AllocMax>[,<Capacity>[,<PoolSize>]]]
<MemType> := all | device | host | shared
<AllocMax> := positive integer or empty, max allocation size in MB
<Capacity> := positive integer or empty, number of allocations from a single block
<PoolSize> := positive integer or empty, max pool size in MB
A pool is a list of memory blocks, each of which can serve at least <Capacity> allocations of up to <AllocMax> MB, with the total size of the pool not exceeding <PoolSize> MB.
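To make the grammar concrete, the hypothetical C helper below splits an option string into <PoolInfo> entries, treating each <MemType> keyword as the start of a new entry and an empty field as "use the runtime default" (stored as 0). It is a sketch of the syntax only, not the parser used by libomptarget. The min_block_mb helper assumes that a block must be at least <AllocMax> times <Capacity> MB to serve <Capacity> allocations of up to <AllocMax> MB; the runtime's actual block sizing may differ.

```c
#include <stdlib.h>
#include <string.h>

/* One <PoolInfo> entry; 0 in a numeric field means "empty or omitted". */
typedef struct {
    char mem_type[8];   /* "all", "device", "host" or "shared" */
    long alloc_max_mb;
    long capacity;
    long pool_size_mb;
} pool_info;

static int is_mem_type(const char *s)
{
    return strcmp(s, "all") == 0 || strcmp(s, "device") == 0 ||
           strcmp(s, "host") == 0 || strcmp(s, "shared") == 0;
}

/* Parses spec into out[0..max_out-1]. Returns the number of entries,
 * 0 for the "0" option (pooling disabled), or -1 on a syntax error. */
int parse_pool_option(const char *spec, pool_info *out, int max_out)
{
    if (strcmp(spec, "0") == 0)
        return 0;
    int count = 0, field = 0;
    const char *p = spec;
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        char tok[32];
        if (len >= sizeof tok)
            return -1;
        memcpy(tok, p, len);
        tok[len] = '\0';
        if (is_mem_type(tok)) {
            /* A <MemType> keyword starts a new <PoolInfo>. */
            if (count == max_out)
                return -1;
            memset(&out[count], 0, sizeof out[count]);
            strcpy(out[count].mem_type, tok);
            count++;
            field = 0;
        } else {
            /* A number before any <MemType>, or more than three numbers. */
            if (count == 0 || field == 3)
                return -1;
            long v = 0;
            if (len > 0) {  /* an empty field keeps the default (0) */
                char *end;
                v = strtol(tok, &end, 10);
                if (*end != '\0' || v <= 0)
                    return -1;
            }
            long *fields[3] = { &out[count - 1].alloc_max_mb,
                                &out[count - 1].capacity,
                                &out[count - 1].pool_size_mb };
            *fields[field++] = v;
        }
        if (!comma)
            break;
        p = comma + 1;
    }
    return count;
}

/* Smallest block (MB) that can hold `capacity` allocations of `alloc_max_mb`. */
long min_block_mb(const pool_info *pi)
{
    return pi->alloc_max_mb * pi->capacity;
}
```

For example, under this reading, LIBOMPTARGET_LEVEL0_MEMORY_POOL=device,1,4,256,host,,,128 describes a device pool (1 MB maximum allocation, 4 allocations per block, 256 MB total) and a host pool that keeps the default <AllocMax> and <Capacity> with a 128 MB limit.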
LIBOMPTARGET_LEVEL0_STAGING_BUFFER_SIZE=<Num>
Sets the staging buffer size to <Num> KB. The staging buffer is used to optimize copy operations between the host and the device when the host memory is not Unified Shared Memory (USM). It is used only for discrete devices. The default staging buffer size is 16 KB.
LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=copy
Enables batching of commands for data transfer in a target region.
If a target construct has map(to: ) clauses, this environment variable allows multiple data transfers from the host to the device to occur concurrently. Similarly, if the target construct has map(from: ) clauses, it allows multiple data transfers from the device to the host to occur concurrently. Note that map(tofrom: ) and map( ) (a map clause with no map type, which defaults to tofrom) are split into map(to: ) and map(from: ).
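The following C sketch shows a target construct whose map clauses give the batching something to work with: x and y are copied from the host to the device, z is copied from the device to the host, and the tofrom mapping of v contributes one transfer in each direction. The function name is hypothetical, and whether the runtime actually overlaps these transfers depends on the implementation. The program would be run with the variable set, for example LIBOMPTARGET_LEVEL_ZERO_COMMAND_BATCH=copy ./a.out.

```c
/* to:     x, y  (host -> device copies, candidates for one batch)
 * from:   z     (device -> host copy)
 * tofrom: v     (split into a to part and a from part)
 * Without OpenMP support, the pragma is ignored and the loop runs on the host. */
void batched_maps(int n, float alpha, const float *x, const float *y,
                  float *z, float *v)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n], y[0:n]) map(from: z[0:n]) map(tofrom: v[0:n])
    for (int i = 0; i < n; i++) {
        z[i] = alpha * x[i] + y[i];
        v[i] += x[i];
    }
}
```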
LIBOMPTARGET_LEVEL_ZERO_USE_IMMEDIATE_COMMAND_LIST=<Bool>
Enables or disables the use of an immediate command list for kernel submission.
<Bool> := 1 | T | t | 0 | F | f
By default, using immediate command list is disabled.
LIBOMPTARGET_PLUGIN=<Name>
Designates the offload plugin name to use.
<Name> := LEVEL0 | OPENCL | X86_64 | level0 | opencl | x86_64
By default, the offload plugin is LEVEL0.
LIBOMPTARGET_PLUGIN_PROFILE=<Enable>[,<Unit>]
Enables basic plugin profiling and displays the result when the program finishes.
<Enable> := 1 | T
<Unit> := usec | unit_usec
By default, plugin profiling is disabled.
If <Unit> is not specified, microseconds (usec) is the default unit.
LIBOMPTARGET_PROFILE=<FileName>
Allows libomptarget.so to generate time profile output similar to Clang’s -ftime-trace option.
OMP_TARGET_OFFLOAD=MANDATORY
Specifies that program execution is terminated if a device construct or device memory routine is encountered and the device is not available or is not supported by the implementation.
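One way to see what OMP_TARGET_OFFLOAD=MANDATORY guards against is to ask, from inside a target region, whether the code is running on the host, using the standard omp_is_initial_device() routine. In this sketch (the function name is hypothetical), a result of 1 under the default setting indicates that the region silently fell back to the host; with OMP_TARGET_OFFLOAD=MANDATORY and no usable device, the program is terminated instead of falling back.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns 1 if the target region executed on the host (or if the program
 * was built without OpenMP), and 0 if it executed on a device. */
int target_ran_on_host(void)
{
    int on_host = 1;
#ifdef _OPENMP
    /* map(from:) copies the device-side result back to the host. */
    #pragma omp target map(from: on_host)
    on_host = omp_is_initial_device();
#endif
    return on_host;
}
```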
Environment Variables to Control Implicit and Explicit Scaling
To disable implicit scaling and use one GPU stack only, set: ZE_AFFINITY_MASK=0.0
To enable explicit scaling, set: LIBOMPTARGET_DEVICES=subdevice
On Intel® Data Center GPU Max Series, implicit scaling is on by default.
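With explicit scaling, each stack appears as a separate OpenMP device, so the application distributes the work itself. The sketch below (function and macro names are hypothetical) splits an array into one contiguous chunk per device, launches each chunk with a target construct using the device and nowait clauses, and then waits with taskwait. The nowait clause allows the chunks to run concurrently but does not guarantee it. With implicit scaling, the program would typically offload to a single device and leave the distribution across stacks to the runtime.

```c
#ifdef _OPENMP
#include <omp.h>
#define NUM_DEVICES() omp_get_num_devices()
#define HOST_DEVICE() omp_get_initial_device()
#else
/* Without OpenMP there are no devices and all pragmas are ignored. */
#define NUM_DEVICES() 0
#define HOST_DEVICE() 0
#endif

/* Multiplies x[0..n-1] by factor, one contiguous chunk per OpenMP device.
 * Intended to be run with LIBOMPTARGET_DEVICES=subdevice. */
void scale_across_devices(double *x, int n, double factor)
{
    int ndev = NUM_DEVICES();
    int nparts = ndev > 0 ? ndev : 1;  /* no devices: one chunk on the host */
    for (int d = 0; d < nparts; d++) {
        int lo = (int)((long long)n * d / nparts);
        int hi = (int)((long long)n * (d + 1) / nparts);
        int len = hi - lo;
        int dev = ndev > 0 ? d : HOST_DEVICE();
        /* Each chunk becomes a deferred target task on its own device. */
        #pragma omp target teams distribute parallel for nowait \
            device(dev) map(tofrom: x[lo:len])
        for (int i = lo; i < hi; i++)
            x[i] *= factor;
    }
    /* Wait for all outstanding target tasks. */
    #pragma omp taskwait
}
```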
Environment Variables for SYCL
There are several SYCL_PI_LEVEL_ZERO environment variables that are useful for the development and debugging of SYCL programs (not just OpenMP). They are documented at:
https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md
References
OpenMP Features and Extensions Supported in Intel® oneAPI DPC++/C++ Compiler
Fortran Language and OpenMP Features Implemented in Intel® Fortran Compiler
Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference - Supported Environment Variables
Environment variables that affect DPC++ compiler and runtime