fsycl-pstl-offload
Enables the automatic offloading of C++ standard parallel algorithms to a SYCL device.
Syntax
Linux:
-fsycl-pstl-offload[=arg]
-fno-sycl-pstl-offload
Windows:
/fsycl-pstl-offload[:arg]
/fno-sycl-pstl-offload
Arguments
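arg
Is one of the following:
default: Tells the compiler to offload to the default SYCL device.
cpu: Tells the compiler to offload to a SYCL CPU device.
gpu: Tells the compiler to offload to a SYCL GPU device.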
Default
-fno-sycl-pstl-offload
C++ standard parallel algorithms are not offloaded automatically.
Description
This option enables the automatic offloading to a SYCL device of C++ standard parallel algorithms that are called with the std::execution::par_unseq execution policy. The offloaded algorithms are implemented by the oneAPI Data Parallel C++ Library (oneDPL).
If you do not specify arg, the compiler offloads to the default SYCL device.
oneDPL is required for offloading support. See the oneDPL documentation for information about how to make it available in the environment.
When using this option, you must also specify option -fsycl.
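As a hedged illustration of which calls the description covers, the following sketch contrasts a std::transform call made with std::execution::par_unseq, which is a candidate for automatic offload, with a std::reduce call made with std::execution::par, which per the description is not. The function names scale_values and sum_values are hypothetical, and the build line in the comment assumes the icpx compiler driver on Linux. The code itself is standard C++17, so it also builds without the option, in which case both calls run on the host.

```cpp
// Hypothetical example; assumed Linux build line:
//   icpx -fsycl -fsycl-pstl-offload=gpu app.cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <numeric>
#include <vector>

// Called with std::execution::par_unseq, so this call is a candidate for
// automatic offload when the file is built with -fsycl-pstl-offload.
std::vector<float> scale_values(const std::vector<float>& in, float factor) {
    std::vector<float> out(in.size());  // heap-allocated storage
    std::transform(std::execution::par_unseq, in.begin(), in.end(), out.begin(),
                   [factor](float x) { return x * factor; });
    return out;
}

// Called with std::execution::par, so, per the description, this call is
// not offloaded automatically and runs on the host.
float sum_values(const std::vector<float>& in) {
    return std::reduce(std::execution::par, in.begin(), in.end(), 0.0f);
}
```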
The following restrictions, requirements, and limitations apply when using option -fsycl-pstl-offload:
Restrictions on callable objects passed to parallel algorithms
Callable objects passed to parallel algorithms have the same limitations as SYCL kernels:
Exceptions are not allowed.
Dynamic memory allocation is not allowed.
Some APIs from the std namespace may not be supported.
For the complete list of kernel limitations, see the SYCL 2020 specification.
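The following sketch shows a callable that stays within these restrictions: it throws no exceptions, allocates no memory, and uses only plain comparisons and assignment. The function name clamp_in_place is hypothetical. The commented-out callable illustrates the kind of code that would violate the restrictions; it is shown for contrast only and is not compiled.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Clamps every element to [lo, hi]. The callable captures two ints by value
// and uses only comparisons and assignment: no exceptions, no allocation,
// and no calls into the standard library.
void clamp_in_place(std::vector<int>& v, int lo, int hi) {
    assert(lo <= hi);  // checked on the host, before the algorithm runs
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [lo, hi](int& x) { x = x < lo ? lo : (x > hi ? hi : x); });
}

// Violates the restrictions above (for contrast only, not compiled):
//
//   std::for_each(std::execution::par_unseq, v.begin(), v.end(), [](int& x) {
//       if (x < 0) throw std::out_of_range("negative");  // exception
//       std::string s = std::to_string(x);              // dynamic allocation
//   });
```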
Data placement requirements
Only heap memory allocated with the standard C++ dynamic memory facilities can be passed to standard algorithms that are offloaded.
std::vector can also be used with offloaded parallel algorithms because it allocates its storage dynamically.
Stack memory on the host, including std::array objects and C-style arrays allocated on the stack, cannot be used in offloaded parallel algorithms. To work around this, make a "deep copy" by capturing the object by value in the algorithm's callable object, or allocate the std::array or C-style array on the heap instead.
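Both workarounds can be sketched as follows, assuming the data being processed lives in a std::vector. The hypothetical apply_poly captures a small stack std::array by value, so the callable carries its own copy; the hypothetical map_through_table instead allocates its lookup table on the heap with std::make_unique and captures only a pointer to it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <execution>
#include <memory>
#include <vector>

// Workaround 1: "deep copy" by capturing a stack array by value.
// Evaluates 1 + 2x + 3x^2 for every element.
void apply_poly(std::vector<double>& data) {
    std::array<double, 3> coeffs{1.0, 2.0, 3.0};  // lives on the host stack
    std::for_each(std::execution::par_unseq, data.begin(), data.end(),
                  [coeffs](double& x) {  // by value: the callable owns a copy
                      x = coeffs[0] + coeffs[1] * x + coeffs[2] * x * x;
                  });
}

// Workaround 2: allocate the array on the heap and capture a pointer to it.
// Replaces each index with the table entry it selects.
void map_through_table(std::vector<int>& indices) {
    auto table = std::make_unique<std::array<int, 4>>(std::array<int, 4>{10, 20, 30, 40});
    const int* t = table->data();  // points into heap memory, not the stack
    std::for_each(std::execution::par_unseq, indices.begin(), indices.end(),
                  [t](int& i) { i = t[i]; });  // assumes every i is in [0, 4)
}
```

The by-value capture suits small, fixed-size data, since the whole array is copied into the callable; the heap allocation avoids that copy for larger tables.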
Other limitations:
Only a subset of standard C++ APIs can be used in callable objects passed to parallel algorithms. For the complete list, see the oneDPL documentation on Tested Standard C++ APIs.
Option -fsycl-pstl-offload must be applied with the same argument to all translation units (TUs) in an executable or a dynamic library.
Performance
If the performance is not satisfactory, the following environment variables may help:
Performance of memory allocations may be improved by using the SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR environment variable.
Launch-time performance of the algorithms may be improved by setting the SYCL_CACHE_PERSISTENT environment variable.
For more information about these environment variables, see Environment Variables on GitHub.
IDE Equivalent
Alternate Options
None
Example
The following shows a way to use this option:
#include <algorithm>
#include <execution>
#include <vector>

int main()
{
    std::vector<int> v(1000000);

    // If this code is compiled with -fsycl -fsycl-pstl-offload=gpu, the
    // for_each algorithm is offloaded to the default SYCL GPU device
    // automatically.
    std::for_each(std::execution::par_unseq, v.begin(), v.end(), [](auto& x)
    {
        // do some computation on x
    });
}