Visible to Intel only — GUID: GUID-3E777BF4-EEFB-4AFC-839E-C2C2B1780A27
Kernel Overview
The Kernel Overview page provides data that can help you optimize your kernel code.
This section includes the API Calls report, which shows every OpenCL kernel that was launched during program execution.
Kernels that differ in name, global work size, or local work size are treated as different kernels and are presented in separate rows.
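The row-grouping rule above can be sketched in C. This is a minimal illustration only, not the tool's implementation: the `KernelLaunch` struct and the `same_report_row` function are hypothetical names introduced here, and treating launches with a different number of dimensions as different rows is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DIMS 3  /* OpenCL NDRanges have at most three dimensions */

/* Hypothetical record of one kernel launch; not a type from the tool. */
typedef struct {
    const char *kernel_name;
    unsigned    work_dim;                /* 1, 2 or 3 */
    size_t      global_size[MAX_DIMS];
    size_t      local_size[MAX_DIMS];
} KernelLaunch;

/* Returns true when two launches would share one report row:
 * same kernel name, same global work size, same local work size.
 * Assumption: a different number of dimensions means a different row. */
bool same_report_row(const KernelLaunch *a, const KernelLaunch *b)
{
    if (strcmp(a->kernel_name, b->kernel_name) != 0) return false;
    if (a->work_dim != b->work_dim) return false;
    for (unsigned d = 0; d < a->work_dim; ++d) {
        if (a->global_size[d] != b->global_size[d]) return false;
        if (a->local_size[d]  != b->local_size[d])  return false;
    }
    return true;
}
```

Under this sketch, two launches of a kernel named "scale" with global size 1024 but local sizes 64 and 128 would land in two separate rows, even though they run the same kernel code.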
Each row shows:
- The total, minimum, maximum and average kernel execution time.
- EU Active - The normalized sum of all cycles on all cores spent actively executing instructions.
- EU Stalled - The normalized sum of all cycles on all cores spent stalled. At least one thread is loaded, but the core is stalled for some reason.
- GPU Memory Reads/Writes - GPU reads from and writes to the chip uncore (LLC) and memory. These are all memory accesses that miss in the internal GPU L3 cache and are serviced either from the uncore or from main memory.
- L3 Cache Misses - All read and write misses in GPU L3 cache.
- Untyped Memory Reads/Writes - Memory accesses to buffers created with clCreateBuffer.
- Typed Memory Reads/Writes - Memory accesses to typed buffers, e.g., writes to buffers created with clCreateImage. Reads from images, however, are counted under Sampler accesses and Texture Read.
- SLM Reads/Writes - Memory accesses to Shared Local Memory.
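The timing columns at the top of this list (total, minimum, maximum and average execution time) can be read as plain aggregates over all executions that fall into one row. The following C sketch shows that arithmetic under this assumption; `TimeSummary` and `summarize_times` are hypothetical names used for illustration, and the tool's own computation is not documented here.

```c
#include <stddef.h>

/* Hypothetical summary of one row's timing columns. */
typedef struct {
    double total;
    double min;
    double max;
    double average;
} TimeSummary;

/* Summarizes n execution times given in any consistent unit.
 * For n == 0 every field is zero. */
TimeSummary summarize_times(const double *times, size_t n)
{
    TimeSummary s = {0.0, 0.0, 0.0, 0.0};
    if (n == 0) return s;

    s.min = s.max = times[0];
    for (size_t i = 0; i < n; ++i) {
        s.total += times[i];
        if (times[i] < s.min) s.min = times[i];
        if (times[i] > s.max) s.max = times[i];
    }
    s.average = s.total / (double)n;
    return s;
}
```

A large gap between the minimum and maximum for the same row is often worth a closer look in the expanded per-execution data, since every execution in the row used the same kernel and the same work sizes.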
Click the + button to the left of any kernel name to expand its row. The expanded area presents additional information for each execution of the kernel during the program run, including the latency, return value, command queue, context and timing data.