SYCL* Thread and Memory Hierarchy
Thread Hierarchy
The SYCL* execution model exposes an abstract view of GPU execution. The SYCL thread hierarchy consists of a 1-, 2-, or 3-dimensional grid of work-items. These work-items are grouped into equal-sized thread groups called work-groups. The work-items in a work-group are further divided into equal-sized vector groups called sub-groups.
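The grouping described above can be illustrated with plain index arithmetic. The following is a minimal host-side sketch in standard C++, not SYCL API code: the names `WorkItemPosition`, `locate`, and `global_index` are hypothetical, and the sketch assumes a 1-dimensional grid in which the work-group size is an exact multiple of the sub-group size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: where one work-item sits in a 1-D hierarchy.
struct WorkItemPosition {
    std::size_t group_id;        // work-group that contains the work-item
    std::size_t local_id;        // index of the work-item inside its work-group
    std::size_t sub_group_id;    // sub-group inside the work-group
    std::size_t sub_group_lane;  // index of the work-item inside its sub-group
};

// Decompose a global work-item index into its position in the hierarchy.
// Assumption: work-groups and sub-groups are equal-sized, so the work-group
// size must be a multiple of the sub-group size.
inline WorkItemPosition locate(std::size_t global_id,
                               std::size_t work_group_size,
                               std::size_t sub_group_size) {
    assert(work_group_size > 0 && sub_group_size > 0);
    assert(work_group_size % sub_group_size == 0);
    const std::size_t local_id = global_id % work_group_size;
    return WorkItemPosition{
        global_id / work_group_size,  // group_id
        local_id,                     // local_id
        local_id / sub_group_size,    // sub_group_id
        local_id % sub_group_size     // sub_group_lane
    };
}

// Inverse mapping: rebuild the global index from the group and local indices.
inline std::size_t global_index(const WorkItemPosition& p,
                                std::size_t work_group_size) {
    return p.group_id * work_group_size + p.local_id;
}
```

For example, with 64 work-items per work-group and 16 work-items per sub-group, `locate(100, 64, 16)` places global index 100 in work-group 1 at local index 36, which is lane 4 of sub-group 2. In a real SYCL kernel the runtime supplies these indices; the sketch only shows how they relate to one another.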
To learn more about how this hierarchy works with a GPU or a CPU with Intel® UHD Graphics, see SYCL* Thread Mapping and GPU Occupancy in the oneAPI GPU Optimization Guide.
Memory Hierarchy
The General Purpose GPU (GPGPU) compute model consists of a host connected to one or more compute devices. Each compute device consists of many GPU Compute Engines (CE), also known as Execution Units (EU) or Xe Vector Engines (XVE). The compute devices may also include caches, shared local memory (SLM), high-bandwidth memory (HBM), and so on, as shown in the figure below. Applications are then built as a combination of host software (per the host framework) and kernels that the host submits to run on the XVEs, with a predefined decoupling point between the two.
To learn more about the memory hierarchy within the GPGPU compute model, see Execution Model Overview in the oneAPI GPU Optimization Guide.
Using Data Prefetching to Reduce Memory Latency in GPUs
Data prefetching can reduce the number of write-backs, lower memory latency, and improve performance on Intel® GPUs.
To learn more about how prefetching works with oneAPI, see Prefetching in the oneAPI GPU Optimization Guide.