Explicit Scaling Summary
Performance tuning for a multi-stack GPU can be tedious because parallelism is exposed at a finer granularity. However, the fundamentals are similar to those of CPU performance tuning. To understand the dominant factors in performance scaling, pay attention to:
VE utilization efficiency - how kernels utilize the execution resources of different stacks
Data placement - how allocations are spread across the HBM of different stacks
Thread-data affinity - where data is located and how it is accessed across the system
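In explicit scaling, the application itself decides which stack processes which part of the data. The following is a minimal sketch in plain C++ that uses no GPU API. `StackChunk` and `partition_for_stacks` are hypothetical names for illustration and are not part of any Intel library. The sketch splits a 1-D index space into one contiguous chunk per stack. Chunk sizes differ by at most one element, which keeps the work evenly spread across the stacks' execution resources. Because each chunk is contiguous, it can also live entirely in that stack's local HBM, which gives each stack good thread-data affinity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type: a half-open index range [begin, end) owned by one stack.
struct StackChunk {
    std::size_t stack;  // stack (sub-device) index
    std::size_t begin;  // first element owned by this stack
    std::size_t end;    // one past the last element owned by this stack
};

// Hypothetical helper: split n elements across num_stacks contiguous chunks.
// The first (n % num_stacks) stacks receive one extra element, so chunk sizes
// differ by at most one and no stack is left with a much smaller share of work.
std::vector<StackChunk> partition_for_stacks(std::size_t n, std::size_t num_stacks) {
    assert(num_stacks > 0 && "need at least one stack");
    std::vector<StackChunk> chunks;
    chunks.reserve(num_stacks);
    const std::size_t base = n / num_stacks;
    const std::size_t extra = n % num_stacks;
    std::size_t begin = 0;
    for (std::size_t s = 0; s < num_stacks; ++s) {
        const std::size_t size = base + (s < extra ? 1 : 0);
        chunks.push_back({s, begin, begin + size});
        begin += size;
    }
    return chunks;
}
```

For example, `partition_for_stacks(10, 2)` assigns [0, 5) to stack 0 and [5, 10) to stack 1. With 11 elements, stack 0 gets [0, 6) and stack 1 gets [6, 11). In a real explicit-scaling program, each chunk would typically be allocated on, and processed through, a queue bound to the matching sub-device. This keeps kernel accesses local to that stack's memory.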
In addition, application developers should keep several critical programming model concepts in mind when selecting the scaling scheme that best balances productivity, portability, and performance:
Sub-devices (numa_domains) and sub-sub-devices (subnuma_domains)
Explicit and implicit scaling
Contexts and queues
Environment variables and programming language APIs or constructs