General-Purpose Computing on GPU
Traditionally, GPUs have been used to render computer graphics such as images and video. Because they contain a large number of execution units capable of massive parallelism, modern GPUs are also used for computing tasks that are conventionally performed on the CPU. This is commonly referred to as General-Purpose Computing on GPU, or GPGPU.
Many high-performance computing and machine learning applications benefit greatly from GPGPU.
- Execution Model Overview
- Thread Mapping and GPU Occupancy
- Kernels
- Using Libraries for GPU Offload
- Host/Device Memory, Buffer and USM
- Host/Device Coordination
- Using Multiple Heterogeneous Devices
- Compilation
- OpenMP Offloading Tuning Guide
- Multi-GPU, Multi-Stack and Multi-C-Slice Architecture and Programming
- Level Zero
- Performance Profiling and Analysis
- Configuring GPU Device