Visible to Intel only — GUID: GUID-F145D09C-EFBD-49FA-B817-58CCF2F473E6
Execution Model Overview
Thread Mapping and GPU Occupancy
Kernels
Using Libraries for GPU Offload
Host/Device Memory, Buffer and USM
Host/Device Coordination
Using Multiple Heterogeneous Devices
Compilation
OpenMP Offloading Tuning Guide
Multi-GPU, Multi-Stack and Multi-C-Slice Architecture and Programming
Level Zero
Performance Profiling and Analysis
Configuring GPU Device
Sub-Groups and SIMD Vectorization
Removing Conditional Checks
Registerization and Avoiding Register Spills
Porting Code with High Register Pressure to Intel® Max GPUs
Small Register Mode vs. Large Register Mode
Shared Local Memory
Pointer Aliasing and the Restrict Directive
Synchronization among Threads in a Kernel
Considerations for Selecting Work-Group Size
Prefetch
Reduction
Kernel Launch
Executing Multiple Kernels on the Device at the Same Time
Submitting Kernels to Multiple Queues
Avoiding Redundant Queue Constructions
Programming Intel® XMX Using SYCL Joint Matrix Extension
Doing I/O in the Kernel
Explicit Scaling on Multi-GPU, Multi-Stack, Multi-C-Slice in SYCL
Explicit Scaling Using Intel® oneAPI Math Kernel Library (oneMKL) in SYCL
Explicit Scaling on Multi-GPU, Multi-Stack and Multi-C-Slice in OpenMP
Explicit Scaling Using Intel® oneAPI Math Kernel Library (oneMKL) in OpenMP
Explicit Scaling Summary
Visible to Intel only — GUID: GUID-F145D09C-EFBD-49FA-B817-58CCF2F473E6
Compilation
oneAPI has multiple types of compilation. The main source to the application is compiled, and the offloaded kernels are compiled. For the kernels, this might be Ahead-Of-Time (AOT) or Just-In-Time (JIT).
In this section we cover topics related to this compilation and how it can impact the efficiency of the execution.