fsycl-pstl-offload

Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253

Date 3/22/2024

Enables the offloading of C++ standard parallel algorithms to a SYCL device. This is an experimental feature.

Syntax

Linux:

-fsycl-pstl-offload[=arg]

-fno-sycl-pstl-offload

Windows:

None

Arguments

arg

Is one of the following:

cpu	Tells the compiler to perform offloading to a SYCL CPU device.
gpu	Tells the compiler to perform offloading to a SYCL GPU device.

Default

-fno-sycl-pstl-offload

C++ standard parallel algorithms are not offloaded.

Description

This option offloads C++ standard parallel algorithms that are called with the std::execution::par_unseq policy to a SYCL device. The offloaded algorithms are implemented by the oneAPI Data Parallel C++ Library (oneDPL). This option is an experimental feature.

If you do not specify arg, the compiler offloads to the default SYCL device.

oneDPL is required for offloading support. See the oneDPL documentation for information about how to make it available in the environment.

NOTE:

When using this option, you must also specify option -fsycl.

The following restrictions, requirements, and limitations apply when you use option -fsycl-pstl-offload:

Parallel algorithm callable object restrictions

Callable objects passed to parallel algorithms have the same limitations as SYCL kernels:
- Exceptions are not allowed.
- Dynamic memory allocation is not allowed.
- Unsupported APIs from std cannot be used.

For the complete list of kernel limitations, see the SYCL 2020 specification.

Data placement requirements

- Only heap memory allocated with the dedicated C++ standard facilities can be passed to the standard algorithms for offloading.
- std::vector can also be used with parallel algorithms for offloading because it allocates its memory dynamically.
- Memory allocated on the host stack cannot be used in offloaded parallel algorithms. This includes std::array and C-style arrays on the stack. To work around this, either make a "deep copy" by capturing the array by value in the algorithm's callable, or allocate the std::array or C-style array on the heap.
- Performance of memory allocations may be improved by using the SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR environment variable. For more information about this environment variable, see Environment Variables on GitHub.

Other limitations

- Only a subset of standard C++ APIs can be used in parallel algorithm callable objects. For the complete list, see the oneDPL documentation on Tested Standard C++ APIs.
- This option is currently supported only on Linux.
- The maximum supported memory alignment is 2048 bytes.
- Option -fsycl-pstl-offload must be applied with the same argument to all translation units (TUs) in an executable or a dynamic library.

IDE Equivalent

None

Alternate Options

None

Parent topic: Offload Compilation, OpenMP*, and Parallel Processing Options
