OpenMP Offload Best Practices
In this chapter we present best practices for improving the performance of applications that offload work to the GPU. The best practices are organized into the following categories, each described in one of the sections that follow:
- Using More GPU Resources
- Minimizing Data Transfers and Memory Allocations
- Making Better Use of OpenMP Constructs
- Memory Allocation
- Fortran Example
- Clauses: is_device_ptr, use_device_ptr, has_device_addr, use_device_addr
- Prefetching
- Atomics with SLM
Note:
We used the following configuration when collecting the OpenMP performance numbers:
- 2-stack Intel® GPU
- One GPU stack only (no implicit or explicit scaling)
- Intel® compilers, runtimes, and GPU drivers
- Level-Zero plugin
- A dummy target construct at the beginning of each program, so that startup time is not included in the measurements
- Just-In-Time (JIT) compilation mode
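The dummy target construct mentioned in the note can be sketched as follows. This is a minimal illustration and not the code used to collect the numbers: the helper names `warm_up_device`, `now_seconds`, and `timed_scale_offload`, and the scaling kernel itself, are hypothetical. The idea is that the first target region a program executes pays for device and runtime initialization, so an empty region is run once before any timed region.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: execute a trivial target region once so that
   device and runtime initialization happen before any timed region.
   Returns 1 if the region ran and the value was copied back. */
static int warm_up_device(void)
{
    int dummy = 0;
    #pragma omp target map(tofrom: dummy)
    {
        dummy = 1;
    }
    return dummy;
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical kernel: scale y by a on the device and return the
   elapsed time of the offloaded region, including data transfers. */
static double timed_scale_offload(size_t n, float a, float *y)
{
    assert(y != NULL);
    double start = now_seconds();
    #pragma omp target teams distribute parallel for map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * y[i];
    }
    return now_seconds() - start;
}
```

A program would call `warm_up_device()` once at the start of `main` and only then call `timed_scale_offload()`. Note that in JIT mode a kernel may still be compiled on its first launch, so the warm-up call is assumed here to cover only general startup cost. With the Intel compiler, a build line such as `icx -fiopenmp -fopenmp-targets=spir64` is assumed; without offload support the pragmas are ignored or fall back to the host, and the code still runs.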