C/C++ OpenMP* and SYCL* Composability
The oneAPI programming model provides a unified compiler based on LLVM/Clang with support for OpenMP* offload. This integration lets you use OpenMP constructs either to parallelize host-side applications or to offload work to a target device. The Intel® oneAPI DPC++/C++ Compiler, available with the Intel® oneAPI Base Toolkit, supports OpenMP and SYCL composability with a set of restrictions. A single application can offload execution to available devices using OpenMP target regions or SYCL constructs in different parts of the code, such as different functions or code segments.
OpenMP and SYCL offloading constructs may be used in separate files, in the same file, or in the same function with some restrictions. OpenMP and SYCL offloading code can be bundled together in executable files, in static libraries, in dynamic libraries, or in various combinations.
Restrictions
The following restrictions apply when mixing OpenMP and SYCL constructs in the same application.
OpenMP directives cannot be used inside SYCL kernels that run on the device. Similarly, SYCL code cannot be used inside OpenMP target regions. However, SYCL constructs can be used within OpenMP code that runs on the host CPU.
OpenMP and SYCL device parts of the program cannot have cross dependencies. For example, a function defined in the SYCL part of the device code cannot be called from the OpenMP code that runs on the device and vice versa. OpenMP and SYCL device parts are linked independently and they form separate binaries that become a part of the resulting fat binary that is generated by the compiler.
Direct interaction between the OpenMP and SYCL runtime libraries is not supported at this time. For example, a device memory object created by the OpenMP API is not accessible by SYCL code. That is, using a device memory object created by OpenMP in SYCL code results in unspecified execution behavior.
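One way to stay within the last restriction is to exchange data between the two models through ordinary host memory only. The sketch below shows the OpenMP half of such a hand-off; the names fillOnDevice and produceOnHost are hypothetical and not part of any Intel API. It assumes the result is small enough to copy back to the host, and it is an illustration rather than a recommended pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fills Out[I] = 2 * I inside an OpenMP target region.
// map(from : ...) copies the device results back into host memory when the
// region ends, so no OpenMP-owned device allocation outlives this function.
void fillOnDevice(float *Out, std::size_t N) {
  assert(Out != nullptr || N == 0);
#pragma omp target teams distribute parallel for map(from : Out[0:N])
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = 2.0f * static_cast<float>(I);
}

// Hypothetical driver: returns the results as plain host memory. A SYCL
// buffer constructed over Host.data() would then see only host data,
// never a device memory object created by the OpenMP runtime.
std::vector<float> produceOnHost(std::size_t N) {
  std::vector<float> Host(N);
  fillOnDevice(Host.data(), N);
  return Host;
}
```

If the file is built without OpenMP offload enabled, the pragma is ignored or falls back to the host and the loop produces the same values.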
Example
The following code snippet uses SYCL and OpenMP offloading constructs in the same application.
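#include <CL/sycl.hpp>
#include <array>
#include <iostream>

// Approximates pi with the midpoint rule inside an OpenMP target region.
float computePi(unsigned N) {
  // The reduction variable must start at zero, and tofrom copies that
  // initial value to the device and the final sum back to the host.
  float Pi = 0.0f;
#pragma omp target map(tofrom : Pi)
#pragma omp parallel for reduction(+ : Pi)
  for (unsigned I = 0; I < N; ++I) {
    float T = (I + 0.5f) / N;
    // Single-precision literal keeps the kernel free of double arithmetic.
    Pi += 4.0f / (1.0f + T * T);
  }
  return Pi / N;
}

// Fills A[I] = I with a SYCL kernel.
void iota(float *A, unsigned N) {
  cl::sycl::range<1> R(N);
  cl::sycl::buffer<float, 1> AB(A, R);
  cl::sycl::queue().submit([&](cl::sycl::handler &cgh) {
    auto AA = AB.template get_access<cl::sycl::access::mode::write>(cgh);
    cgh.parallel_for<class Iota>(R, [=](cl::sycl::id<1> I) {
      AA[I] = I;
    });
  });
  // The buffer destructor waits for the kernel and copies the data back to A.
}

int main() {
  std::array<float, 1024u> Vec;
  float Pi;
  // Run the SYCL part and the OpenMP offload part from separate host sections.
#pragma omp parallel sections
  {
#pragma omp section
    iota(Vec.data(), Vec.size());
#pragma omp section
    Pi = computePi(8192u);
  }
  std::cout << "Vec[512] = " << Vec[512] << std::endl;
  std::cout << "Pi = " << Pi << std::endl;
  return 0;
}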
The following command compiles the example code:
icpx -fsycl -fiopenmp -fopenmp-targets=spir64 offloadOmp_dpcpp.cpp
where
the -fsycl option enables SYCL
the -fiopenmp and -fopenmp-targets=spir64 options together enable OpenMP* offload
The following shows the program output from the example code.
./a.out
Vec[512] = 512
Pi = 3.14159
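The Pi value printed above comes from a midpoint-rule approximation of the integral of 4 / (1 + x^2) over [0, 1], which equals pi. A host-only reference such as the following sketch can be used to sanity-check the offloaded result. The names referencePi and closeToPi and the default tolerance are hypothetical choices made for illustration; a single-precision parallel reduction may differ from this reference in the last few digits because the summation order is not fixed.

```cpp
#include <cmath>

// Hypothetical host-only reference: midpoint rule with N intervals,
// accumulated in double precision to keep rounding error small.
double referencePi(unsigned N) {
  double Sum = 0.0;
  for (unsigned I = 0; I < N; ++I) {
    double T = (I + 0.5) / N;   // midpoint of interval I
    Sum += 4.0 / (1.0 + T * T);
  }
  return Sum / N;
}

// Hypothetical check: true if Value is within Tol of pi. The default
// tolerance is an assumption chosen loosely for single-precision results.
bool closeToPi(float Value, float Tol = 1e-3f) {
  const double Pi = 3.14159265358979323846;
  return std::fabs(static_cast<double>(Value) - Pi) < Tol;
}
```

For instance, the value returned by computePi(8192u) could be passed to closeToPi before it is printed.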