Explicit Scaling Summary
Performance tuning for a multi-stack GPU can be tedious because parallelism must be managed at a finer granularity. However, the fundamentals are similar to those of CPU performance tuning. To understand what dominates performance scaling, pay attention to:
Vector Engine (VE) utilization efficiency: how kernels use the execution resources of different stacks
Data placement: how allocations are spread across the HBM of different stacks
Thread-data affinity: where data is located and how it is accessed in the system
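The interplay between data placement and thread-data affinity can be illustrated with a small model. The following is a minimal Python sketch, not an Intel API: it assumes a 1-D array split into contiguous per-stack blocks, with each block allocated in the HBM of the stack whose kernel processes it. The helper names (`split_across_stacks`, `owner_stack`, `remote_fraction`) are hypothetical, and the model only counts cross-stack accesses; it does not estimate bandwidth or latency.

```python
# Hypothetical model: each stack owns one contiguous block of a 1-D array,
# allocated in that stack's local HBM. Not a GPU runtime API.

def split_across_stacks(n, num_stacks):
    """Return one [begin, end) range per stack; sizes differ by at most 1."""
    base, extra = divmod(n, num_stacks)
    ranges = []
    begin = 0
    for s in range(num_stacks):
        size = base + (1 if s < extra else 0)
        ranges.append((begin, begin + size))
        begin += size
    return ranges


def owner_stack(index, ranges):
    """Stack whose local HBM holds element `index` under the placement `ranges`."""
    for stack, (begin, end) in enumerate(ranges):
        if begin <= index < end:
            return stack
    raise IndexError(index)


def remote_fraction(accesses, ranges):
    """Fraction of (executing_stack, element_index) accesses that hit another stack's HBM."""
    if not accesses:
        return 0.0
    remote = sum(1 for stack, idx in accesses if owner_stack(idx, ranges) != stack)
    return remote / len(accesses)


ranges = split_across_stacks(10, 2)
# Each stack processes exactly the block placed in its own HBM.
aligned = [(s, i) for s, (b, e) in enumerate(ranges) for i in range(b, e)]
# Same work, but every stack processes the other stack's block.
swapped = [(1 - s, i) for s, i in aligned]
print(ranges)
print(remote_fraction(aligned, ranges))
print(remote_fraction(swapped, ranges))
```

Keeping the remote fraction at zero corresponds to the affinity goal described above: each stack works only on data resident in its own HBM.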
In addition, application developers should keep several key programming-model concepts in mind when choosing the scaling scheme that best balances productivity, portability, and performance:
Sub-devices (numa_domains) and sub-sub-devices (subnuma_domains)
Implicit and explicit scaling
Contexts and queues
Environment variables and programming-language APIs or constructs
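One common way to apply explicit scaling is to run one process (for example, one MPI rank) per stack and use an environment variable so that each process sees only its own stack. The Python sketch below assumes the Level Zero variables `ZE_FLAT_DEVICE_HIERARCHY=COMPOSITE` and `ZE_AFFINITY_MASK` in its `device.subdevice` form, plus a fixed number of stacks per GPU; verify both against your driver's documentation. The helpers `affinity_mask_for_rank` and `rank_env` are hypothetical.

```python
import os

# Hypothetical helpers. Assumes the COMPOSITE device hierarchy, where
# ZE_AFFINITY_MASK entries take the form "device.subdevice" (one stack each).

def affinity_mask_for_rank(rank, stacks_per_device):
    """Return a ZE_AFFINITY_MASK value that exposes exactly one stack to `rank`."""
    if rank < 0 or stacks_per_device < 1:
        raise ValueError("rank must be >= 0 and stacks_per_device >= 1")
    device, stack = divmod(rank, stacks_per_device)
    return f"{device}.{stack}"


def rank_env(rank, stacks_per_device):
    """Copy of the current environment with this rank's stack pinning applied."""
    env = dict(os.environ)
    env["ZE_FLAT_DEVICE_HIERARCHY"] = "COMPOSITE"
    env["ZE_AFFINITY_MASK"] = affinity_mask_for_rank(rank, stacks_per_device)
    return env


# Four ranks on two GPUs with two stacks each: one rank per stack.
for rank in range(4):
    print(rank, affinity_mask_for_rank(rank, 2))
```

The returned dictionary is meant to be passed as the environment of each child process (for example via `subprocess.Popen(..., env=...)`), because the mask must be set before the GPU runtime initializes. Inside each process, the single visible stack then appears as the whole device, so no further sub-device partitioning is needed in the application code.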