Explicit Scaling
Under explicit scaling, the programmer takes direct control over work-group distribution and memory placement. This chapter covers:
- Explicit Scaling on Multi-GPU, Multi-Stack, Multi-C-Slice in SYCL
- Explicit Scaling Using Intel® oneAPI Math Kernel Library (oneMKL) in SYCL
- Explicit Scaling on Multi-GPU, Multi-Stack and Multi-C-Slice in OpenMP
- Explicit Scaling Using Intel® oneAPI Math Kernel Library (oneMKL) in OpenMP
- Explicit Scaling Summary
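
To make "direct control over work-group distribution" concrete, the following is a minimal sketch in plain C++ of one way a programmer might divide a range of work-groups into contiguous slices, one per stack. The names `StackChunk` and `split_work_groups` are hypothetical helpers introduced here for illustration; they are not part of SYCL, OpenMP, or oneMKL, and the even-split policy is an assumption rather than a recommendation from this guide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (illustration only): one contiguous slice of
// work-groups assigned to a single stack.
struct StackChunk {
    std::size_t first_group;  // index of the first work-group in the slice
    std::size_t num_groups;   // number of work-groups in the slice
};

// Split `total_groups` work-groups across `num_stacks` stacks as evenly as
// possible. The first (total_groups % num_stacks) stacks receive one extra
// work-group, so slice sizes differ by at most one.
std::vector<StackChunk> split_work_groups(std::size_t total_groups,
                                          std::size_t num_stacks) {
    assert(num_stacks > 0 && "need at least one stack");

    std::vector<StackChunk> chunks;
    chunks.reserve(num_stacks);

    const std::size_t base = total_groups / num_stacks;
    const std::size_t extra = total_groups % num_stacks;

    std::size_t next = 0;  // first work-group index of the next slice
    for (std::size_t s = 0; s < num_stacks; ++s) {
        const std::size_t count = base + (s < extra ? 1 : 0);
        chunks.push_back({next, count});
        next += count;
    }
    return chunks;
}
```

For example, `split_work_groups(10, 4)` yields slices of 3, 3, 2, and 2 work-groups starting at indices 0, 3, 6, and 8. In an explicitly scaled program, each slice would typically be submitted to its own stack, with the data that slice touches placed in that stack's local memory; the sections listed above describe how to do this in SYCL and OpenMP.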