OpenMP Offload Best Practices
This chapter presents best practices for improving the performance of applications that offload to the GPU. The practices are organized into the following categories, each described in the sections that follow:
- Using More GPU Resources
- Minimizing Data Transfers and Memory Allocations
- Making Better Use of OpenMP Constructs
- Memory Allocation
- Fortran Example
- Clauses: is_device_ptr, use_device_ptr, has_device_addr, use_device_addr
- Prefetching
- Atomics with SLM
- OpenMP Interop with SYCL
- Offloading DO CONCURRENT
Note:
We used the following configuration when collecting the OpenMP performance numbers:
- 2-stack Intel® GPU
- One GPU stack only (no implicit or explicit scaling)
- Intel® compilers, runtimes, and GPU drivers
- Level-Zero plugin
- A dummy target construct at the beginning of each program, so that startup time is not included in the measurements
- Just-In-Time (JIT) compilation mode
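
The dummy target construct mentioned in the note can be sketched as follows. This is a minimal illustration, not the code used to collect the numbers: the function names `warm_up_device` and `timed_saxpy`, the SAXPY kernel, and the use of `std::chrono` for timing are all assumptions made for this example. The idea is to run an empty `target` region first, so that one-time startup work done by the runtime at the first offload happens before the timer starts.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: an empty target region executed once, before any
// timing, so that one-time device startup work is not measured later.
void warm_up_device() {
  #pragma omp target
  { ; }
}

// Hypothetical kernel used only for illustration: y = a * x + y.
// Assumes x and y have the same length. Returns the elapsed wall-clock
// time in seconds, including the data transfers implied by the map clauses.
double timed_saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
  const float* xp = x.data();
  float* yp = y.data();
  const std::size_t n = x.size();

  const auto start = std::chrono::steady_clock::now();

  #pragma omp target teams distribute parallel for \
      map(to: xp[0:n]) map(tofrom: yp[0:n])
  for (std::size_t i = 0; i < n; ++i) {
    yp[i] = a * xp[i] + yp[i];
  }

  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

A program would call `warm_up_device()` once at the start and then time the kernels of interest, for example with `timed_saxpy(3.0f, x, y)`. If the code is built without OpenMP offload support, the pragmas are ignored or fall back to the host, so the sketch still runs but the warm-up has no effect.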