GPU Metrics
This section describes all GPU metrics accessible from Intel® GPA. The tables below provide an overview of the GPU metrics available for Intel GPUs, starting with 3rd Generation Intel® Core™ processors.
- Intel® Xe graphics product families, starting with Intel® Arc™ Alchemist (formerly DG2), use GPU architecture terminology that differs from legacy terms. For more information on the terminology changes and how they map to legacy content, see GPU Architecture Terminology for Intel® Xe Graphics.
For products formerly named Kaby Lake G, see GPU metrics description at https://gpuperfapi.readthedocs.io/en/latest/counters.html.
For DirectX* 11 targets, metrics are collected only for the application being profiled. For DirectX 12 targets, metrics are collected system-wide and include all running applications. While profiling a DirectX 12 application, it is recommended to close all other running graphics applications.
Main Metrics
Metric Name |
Description |
---|---|
GPU Duration |
Represents the total GPU time for the frame or, in Graphics Frame Analyzer, for the selected events within that frame. Examples: If GPU Duration is 80,000, the GPU spent around 80 milliseconds (80,000 microseconds) rendering the selected ergs. Improving Performance:
|
GPU Frequency |
Represents the average GPU core frequency during the measurement period. Recent Intel GPUs support Intel® Turbo Boost Technology 2.0 and can dynamically change frequency depending on CPU and GPU workloads. Examples: On Intel® HD Graphics 3000, the GPU Frequency increases to its maximum when a heavy GPU load occurs. Improving Performance: Typically, the system automatically adjusts the GPU Frequency to balance total system performance between the CPU and the GPU. When running the HUD, if the GPU frequency stays at its peak value for a particular system configuration, your system may be GPU-bound; if the GPU frequency stays at the low end of the range, you may be CPU-bound, the GPU may not be fully utilized, or both. In Graphics Frame Analyzer, this metric does not currently provide an accurate measure of GPU performance, because the CPU is not loaded the way it was while your game was running when the frame was captured.
NOTE:
If the Intel graphics device supports multiple GPU frequencies, Graphics Frame Analyzer locks the GPU at the maximum available frequency to minimize variation in metric values.
|
Avg GPU Core Frequency, MHz |
Represents the average GPU core frequency during the measurement period. |
GPU Core Clocks |
Represents the total number of GPU core clocks elapsed during the measurement period. |
GPU Busy |
Represents the percentage of time when the GPU is busy. Examples: For GPU-bound workloads, GPU Busy is 100%. A value below 100% indicates that the GPU spends time idle, waiting for data from the CPU, in which case your game or application might be CPU-bound. Improving Performance: If GPU Busy is consistently below 100% and you are encountering performance issues, consider multithreading your game and using Graphics Trace Analyzer to understand the interaction between the CPU and GPU. |
HUD Overhead Time |
Represents the heads-up display (HUD) overhead time. |
Non-Culled Polygons |
Represents the number of polygons processed that were not culled. |
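The GPU Duration example above implies that the raw value is in microseconds (80,000 is about 80 ms). The following Python sketch converts a GPU Duration value to milliseconds and applies the rough GPU Busy heuristic described above. The function names and the 95% threshold are hypothetical illustrations, not part of Intel GPA.

```python
def gpu_duration_ms(gpu_duration_us: float) -> float:
    """Convert a GPU Duration value, assumed to be in microseconds, to milliseconds."""
    return gpu_duration_us / 1000.0


def classify_workload(gpu_busy_pct: float, threshold_pct: float = 95.0) -> str:
    """Rough heuristic based on GPU Busy.

    A value near 100% suggests a GPU-bound workload; a noticeably lower
    value suggests the GPU is waiting on the CPU. The threshold is a
    hypothetical default, not an Intel GPA constant.
    """
    if not 0.0 <= gpu_busy_pct <= 100.0:
        raise ValueError("GPU Busy must be a percentage between 0 and 100")
    if gpu_busy_pct >= threshold_pct:
        return "likely GPU-bound"
    return "possibly CPU-bound"


if __name__ == "__main__":
    print(gpu_duration_ms(80_000))
    print(classify_workload(100.0))
    print(classify_workload(62.5))
```

A single threshold is only a starting point; as noted above, a low GPU Busy value is a cue to look at CPU/GPU interaction in Graphics Trace Analyzer rather than a diagnosis.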
GTI Metrics
Metric Name |
Description |
---|---|
GTI Write Throughput |
Represents the total number of GPU memory bytes written to GTI. |
GTI Read Throughput |
Represents the total number of GPU memory bytes read from GTI. |
DRAM LLC Throughput, bytes |
Represents the approximate amount of GPU memory bytes transferred between LLC and the DRAM controller. |
LLC GPU Accesses, messages |
Represents the total number of LLC cache lookups done from the GPU (64B reads, 32B writes).
NOTE:
This metric might show incorrect results and will be disabled with the next driver update.
|
LLC GPU Throughput, bytes |
Represents the total number of GPU memory bytes transferred between the GPU and LLC. |
LLC GPU Hits, messages |
Represents the total number of successful LLC cache lookups done from the GPU.
NOTE:
This metric might show incorrect results and will be disabled with the next driver update.
|
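The GTI throughput metrics are reported as total byte counts, so an average bandwidth figure needs a time base. The sketch below divides a byte count by GPU Duration, assuming that GPU Duration is in microseconds and covers the same interval as the GTI counters; both points are assumptions, and the helper name is hypothetical.

```python
def average_bandwidth_gb_s(total_bytes: int, duration_us: float) -> float:
    """Average bandwidth in GB/s (1 GB = 10**9 bytes) over a measurement.

    Assumes duration_us is in microseconds, as the GPU Duration example
    implies (80,000 is about 80 ms).
    """
    if duration_us <= 0:
        raise ValueError("duration_us must be positive")
    bytes_per_us = total_bytes / duration_us
    # bytes/us * 1e6 us/s / 1e9 bytes/GB == bytes/us / 1e3
    return bytes_per_us / 1e3


if __name__ == "__main__":
    gti_read_bytes = 400_000_000   # hypothetical GTI Read Throughput value
    gti_write_bytes = 100_000_000  # hypothetical GTI Write Throughput value
    gpu_duration = 50_000          # hypothetical GPU Duration (50 ms)
    print(average_bandwidth_gb_s(gti_read_bytes, gpu_duration))
    print(average_bandwidth_gb_s(gti_write_bytes, gpu_duration))
    print(average_bandwidth_gb_s(gti_read_bytes + gti_write_bytes, gpu_duration))
```

With these hypothetical inputs the read, write, and combined averages are 8, 2, and 10 GB/s.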
EU Array Metrics
Metric Name |
Description |
---|---|
EU Idle % |
Represents the percentage of time when the GPU execution units (EUs) were idle. An EU is idle when it is neither actively executing shader instructions nor stalled trying to execute shader instructions. Examples:
Improving Performance: If EU Idle % is significantly higher than 0%, this indicates that there are stalls elsewhere in the rendering pipeline. |
EU Active % |
Represents the percentage of time when the GPU execution units (EUs) were actively executing pixel, geometry, or vertex shader instructions. Examples: If EU Active % is 80, the EUs were active for 80% of the rendering time for the selected events. Improving Performance: EUs that are not active are either stalled waiting for a request to be fulfilled, or idle. To see how much of the non-active time is caused by stalls, examine the EU Stall % metric. If the total EU busy time (EU Active % + EU Stall %) is significantly lower than 100%, there are stalls elsewhere in the rendering pipeline. |
EU Stall % |
Represents the percentage of time when the GPU execution units (EUs) were stalled. An EU becomes stalled when all of its threads are waiting for results from fixed function units (for example, a pixel shader requests texels from the texture sampler).
Improving Performance: If this metric is unexpectedly high, especially compared with the EU Active % metric, you can analyze where the stalls happen by looking at the VS EU Stall %, GS EU Stall %, and PS EU Stall % metrics. If any of these metrics show that most of the stall time is in one particular shader, examine that shader's code in Graphics Frame Analyzer to determine why it might be causing the EUs to stall. |
EU AVG IPC Rate, Number |
Represents the average instructions-per-cycle (IPC) rate calculated for the two FPU pipelines.
NOTE:
This metric might show incorrect results and will be disabled with the next driver update.
|
VS Duration |
Represents an approximation of the total GPU time spent executing vertex shader code.
Improving Performance: If VS Duration is significant compared to GPU Duration, vertex processing optimizations might be needed. In this situation, optimize the geometry by minimizing the Vertex Count, Primitive Count, and VS Invocations. If you are using triangle lists, try converting them to a single triangle strip to minimize the number of vertices sent to the pipeline. Also optimize the geometry for the VCache (see the VS Invocations metric description). To see whether optimizations are possible, examine your vertex shader code in Graphics Frame Analyzer. Refer to the Graphics API Performance Guide for recommendations on vertex shader optimizations. |
VS EU Active % |
Represents the percentage of overall GPU time that the EUs were actively executing Vertex Shader instructions.
Inspect the shader code in the Graphics Frame Analyzer. |
VS EU Stall % |
Represents the percentage of overall GPU time that the EUs were stalled in Vertex Shader instructions.
NOTE:
This metric does not include the total amount of time stalled in the vertex shader, but only the fraction of the time when vertex shader stalls were causing the entire EU to stall. The entire EU stalls when all of its threads are stalled.
Inspect the shader code in the Graphics Frame Analyzer. |
VS Invocations |
Represents the number of vertex shader invocations; the vertex shader is invoked once per vertex. The number of vertex shader invocations depends on the vertex and primitive counts and on the operation of the post-transform vertex cache (VCache). Ideally, the GPU fetches already-processed vertices from the cache instead of recalculating them, which reduces the value of this metric. Therefore, when VS Invocations and Vertex Count have similar values, the geometry is not optimized to take advantage of the VCache. Examples:
Improving Performance: To improve vertex processing performance and reduce the number of vertex shader invocations, try to reorder the geometry for optimal VCache usage. The D3DX utility library contains functions that reorder geometry to improve VCache utilization (ID3DXMesh::Optimize, D3DXOptimizeFaces, D3DXOptimizeVertices).
NOTE:
|
VS Send Pipe Active % |
Represents the percentage of time during which the EU Send pipeline was actively processing a vertex shader instruction. |
VS FPU0 Pipe Active % |
Represents the percentage of time during which the EU FPU0 pipeline was actively processing a vertex shader instruction. |
VS FPU1 Pipe Active % |
Represents the percentage of time during which the EU FPU1 pipeline was actively processing a vertex shader instruction. |
HS Duration |
Represents the total amount of time the GPU spent executing hull shader code.
Improving Performance: If HS Duration is larger than you expect, you can examine your hull shader code in the Graphics Frame Analyzer to investigate possible optimizations. |
HS EU Active % |
Represents the percentage of overall GPU time that the EUs were actively executing Hull Shader instructions. |
HS EU Stall % |
Represents the percentage of overall GPU time that the EUs were stalled in Hull Shader instructions. A shader thread will stall when it reaches an instruction that cannot complete until some time-consuming operation is completed.
NOTE:
This metric does not include the total amount of stalled time in the Hull Shader, but only the amount of time when the Hull Shader was causing the entire EU to stall. The EUs in Intel® HD Graphics are hyperthreaded, which means that each EU can very quickly (within 2 clock cycles) switch from a stalled shader thread to another shader thread. Therefore, at any given time a number of shader threads can be stalled on an EU while the EU continues actively executing instructions on another shader thread. The entire EU is considered to be stalled only when all of its threads are stalled.
Improving Performance: If a large amount of stall time seems to be occurring in a particular shader, then you should examine that shader to see whether you can reduce or eliminate some of the stalls. Short shaders might normally stall for a majority of their execution time, since in such situations instruction or data fetch (texels, constants) latency cannot be ‘hidden’. If a large stall time occurs in longer shaders, it usually indicates inefficient shader execution and possible optimization opportunities. Inspect the shader code that was executed for a given draw call and experiment with optimizations in the Graphics Frame Analyzer. |
HS Invocations |
Represents the number of Hull Shader invocations. The Hull Shader is invoked once per patch. Examples: The SimpleBezier11 sample from the Microsoft* DirectX* SDK is a good example for understanding Hull Shaders. This sample renders a Mobius strip made up of four patches with 64 control points per patch. Running this sample results in an HS Invocations value of four. Improving Performance: The Hull Shader is not usually a performance bottleneck, but it can cause performance issues further down the rendering pipeline. Large tessellation factors specified by the Hull Shader, or a higher HS Invocations value, result in more work for the fixed-function tessellator and in higher DS Invocations and GS Invocations counts. |
DS Duration |
Represents the total amount of time the GPU spent executing domain shader code.
Improving Performance: If DS Duration is larger than you expect, you can examine your domain shader code in the Graphics Frame Analyzer to investigate possible optimizations. |
DS EU Active % |
Represents the percentage of overall GPU time that the EUs were actively executing Domain Shader instructions. |
DS EU Stall % |
Represents the percentage of overall GPU time that the EUs were stalled in Domain Shader instructions. A shader thread will stall when it reaches an instruction that cannot complete until some time-consuming operation is completed.
NOTE:
This metric does not include the total amount of stalled time in the Domain Shader, but only the amount of time when the Domain Shader was causing the entire EU to stall. The EUs in Intel® HD Graphics are hyperthreaded, which means that each EU can very quickly (within 2 clock cycles) switch from a stalled shader thread to another shader thread. Therefore, at any given time a number of shader threads can be stalled on an EU while the EU continues actively executing instructions on another shader thread. The entire EU is considered to be stalled only when all of its threads are stalled.
Improving Performance: If a large amount of stall time seems to be occurring in a particular shader, examine that shader to see whether you can reduce or eliminate some of the stalls. Short shaders might normally stall for a majority of their execution time, because in such situations instruction or data fetch (texels, constants) latency cannot be ‘hidden’. If a large stall time occurs in longer shaders, it usually indicates inefficient shader execution and possible optimization opportunities. You can inspect the shader code that was executed for a given draw call and experiment with optimizations in Graphics Frame Analyzer. |
DS Invocations |
Represents the number of Domain Shader invocations. The Domain Shader is invoked once per point output by the fixed-function tessellator. Examples: The SimpleBezier11 sample from the Microsoft* DirectX* SDK is a good example for understanding Domain Shaders. This sample renders a Mobius strip made up of 4 patches with 64 control points per patch. Increasing the Patch Divisions slider increases the tessellation factors of the Hull Shader, which results in an increased number of inputs to the Domain Shader. When the Patch Divisions slider is set to 4.0, the DS Invocations value is 192. When the Patch Divisions slider is set to 5.0, the DS Invocations value is 320. Improving Performance: The purpose of a Domain Shader is to calculate the vertex positions for the subdivided points output by the fixed-function tessellator. The best way to improve performance is to minimize the number of DS Invocations. You can do this by reducing the amount of tessellation, either by decreasing the number of HS Invocations or by decreasing the tessellation factors in the Hull Shader. |
GS Duration |
Represents the approximate total GPU time spent executing geometry shader code.
Improving Performance: If you are encountering performance issues and GS Duration exceeds 20% to 40% of the total GPU Duration, geometry shader code optimizations may be needed. Examine the geometry shader code in Graphics Frame Analyzer to see whether optimizations are possible. Refer to the Graphics API Performance Guide for recommendations on optimizing the geometry shader. |
GS EU Active % |
Represents the percentage of overall GPU time that the EUs were actively executing Geometry Shader instructions.
Inspect the shader code using the Graphics Frame Analyzer. |
GS EU Stall % |
Represents the percentage of overall GPU time that the EUs were stalled in Geometry Shader instructions.
NOTE:
This metric does not include the total amount of stalled time in the geometry shader but only the fraction of time when the geometry shader stalls were causing the entire EU to stall. The entire EU stalls when all of its threads are stalled.
Inspect the shader code using the Graphics Frame Analyzer. |
GS Invocations |
Represents the number of geometry shader invocations. The value is 0 if no geometry shader is associated with the rendering call.
NOTE:
See Microsoft* DirectX* SDK for a description of the shader invocation count.
Examples: If GS Invocations is 1000, the geometry shader was invoked for 1000 primitives. Improving Performance: The only way to minimize the number of geometry shader invocations is to minimize the number of input primitives. How much reducing the invocation count improves rendering performance depends heavily on your specific game or application. |
Post-GS Primitives |
Represents the number of primitives that flowed out of the geometry shader (GS), if enabled, to the clipper. This metric is important if a geometry shader was associated with the selected rendering calls, and even more important if the number of primitives spawned by geometry shader code is dynamic.
NOTE:
If the GS was not enabled for the selected rendering calls, the metric returns a value of 0.
Examples: If Post-GS Primitives is 1000 and Primitive Count is 100, it means that 1000 primitives were constructed in the geometry shader from the original 100. Improving Performance: Analyze the geometry shader code using Graphics Frame Analyzer. |
PS Duration |
Represents an approximation of the total GPU time spent executing pixel shader code.
Improving Performance: Compare PS Duration with GPU Duration; when PS Duration is high, you may improve overall rendering performance by optimizing your pixel shader code. Refer to the Graphics API Performance Guide for advice on pixel shader optimizations. |
PS EU Active % |
Represents the percentage of overall GPU time that the EUs were actively executing Pixel Shader instructions.
|
PS EU Stall % |
Represents the percentage of overall GPU time that the EUs were stalled in Pixel Shader instructions.
NOTE:
This metric does not show the total amount of stalled time in the pixel shader, but only the fraction of time when pixel shader stalls caused the entire EU to stall. The entire EU stalls when all of its threads are stalled.
|
PS Invocations |
Represents the number of pixel shader invocations. The pixel shader is invoked once per pixel. Examples: If you render an 8x8-pixel quad that is located entirely within the viewing frustum, PS Invocations is 64. Improving Performance: Pixel shading is usually one of the most expensive workloads in the rendering pipeline because of the processing time spent in the pixel shader. Therefore, keeping the number of invocations as low as possible will likely improve your rendering performance.
NOTE:
For Intel® microarchitectures code named Ivy Bridge and Bay Trail, this metric includes pixels rejected by the early depth test, even though the pixel shader was not actually invoked for these pixels.
|
PS Send Pipeline Active % |
Represents the percentage of time during which the EU Send pipeline was actively processing a pixel shader instruction. |
PS FPU0 Pipe Active % |
Represents the percentage of time during which the EU FPU0 pipeline was actively processing a pixel shader instruction. |
PS FPU1 Pipe Active % |
Represents the percentage of time during which the EU FPU1 pipeline was actively processing a pixel shader instruction. |
EU FPU0 Pipe Active % |
Represents the percentage of time during which the EU FPU0 pipeline was actively processing. |
EU FPU1 Pipe Active % |
Represents the percentage of time during which the EU FPU1 pipeline was actively processing. |
EU Both FPU Pipes Active % |
Represents the percentage of time during which both EU FPU pipelines were actively processing. |
EU Send Pipe Active % |
Represents the percentage of time during which the EU Send pipeline was actively processing. |
CS Duration |
Represents the total amount of time the GPU spent executing compute shader code.
Improving Performance: If CS Duration is larger than you expect, you can examine your compute shader code in the Graphics Frame Analyzer to investigate possible optimizations. |
CS EU Active % |
Represents the percentage of overall GPU time that the EUs were actively executing Compute Shader instructions.
|
CS EU Stall % |
Represents the percentage of overall GPU time that the EUs were stalled in Compute Shader instructions. A shader thread will stall when it reaches an instruction that cannot complete until some time-consuming operation is completed.
NOTE:
This metric does not include the total amount of stalled time in the Compute Shader, but only the amount of time when the Compute Shader was causing the entire EU to stall. The EUs in the Intel® HD Graphics are hyperthreaded, which means that each EU can very quickly (within 2 clock cycles) switch from a stalled shader thread to another shader thread. Therefore, it is possible at any given time for a number of shader threads to be stalled on an EU, but for the EU to continue actively executing instructions on another shader thread. The entire EU is considered to be stalled only when all of its threads are stalled.
Improving Performance: If a large amount of stall time seems to be occurring in a particular shader, examine that shader to see whether you can reduce or eliminate some of the stalls. Short shaders might normally stall for a majority of their execution time, because in such situations instruction or data fetch (texels, constants) latency cannot be ‘hidden’. If a large stall time occurs in longer shaders, it usually indicates inefficient shader execution and possible optimization opportunities. Inspect the shader code that was executed for a given draw call and experiment with optimizations in Graphics Frame Analyzer. |
CS Invocations |
Represents the number of compute shader invocations. The Compute Shader is invoked once per thread per thread group. The number of threads per thread group is defined by the Compute Shader’s numthreads attribute (numthreads(tX, tY, tZ)). The number of thread groups executed is determined by the parameters to the Dispatch call (Dispatch(gX, gY, gZ)). CS Invocations is equal to (gX*gY*gZ)*(tX*tY*tZ).
|
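Several EU Array metrics above are related by simple arithmetic: EU busy time is EU Active % + EU Stall %, CS Invocations is (gX*gY*gZ)*(tX*tY*tZ), VS Invocations is compared against Vertex Count to judge VCache reuse, and Post-GS Primitives is compared against Primitive Count. The Python sketch below expresses these relations; the class and function names are hypothetical, and treating active, stalled, and idle time as summing to 100% is an assumption based on the definitions above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EuTimeBreakdown:
    """EU Active % and EU Stall % as reported for one measurement."""
    active_pct: float
    stall_pct: float

    @property
    def busy_pct(self) -> float:
        # Total EU busy time, as defined above: EU Active % + EU Stall %.
        return self.active_pct + self.stall_pct

    @property
    def idle_pct(self) -> float:
        # Assumption: active, stalled, and idle time partition the measurement,
        # so idle time is the remainder (clamped at 0 to absorb counter noise).
        return max(0.0, 100.0 - self.busy_pct)


def cs_invocations(dispatch: Tuple[int, int, int],
                   numthreads: Tuple[int, int, int]) -> int:
    """CS Invocations = (gX*gY*gZ) * (tX*tY*tZ).

    dispatch holds the Dispatch(gX, gY, gZ) arguments; numthreads holds the
    shader's numthreads(tX, tY, tZ) attribute values.
    """
    gx, gy, gz = dispatch
    tx, ty, tz = numthreads
    return (gx * gy * gz) * (tx * ty * tz)


def vs_invocations_per_vertex(vs_invocations: int, vertex_count: int) -> float:
    """VS Invocations / Vertex Count.

    A ratio close to 1.0 matches the situation described above in which the
    geometry does not take advantage of the VCache; lower ratios mean more
    vertices were served from the cache.
    """
    if vertex_count <= 0:
        raise ValueError("vertex_count must be positive")
    return vs_invocations / vertex_count


def gs_amplification(post_gs_primitives: int, primitive_count: int) -> float:
    """Post-GS Primitives / Primitive Count (1000 / 100 gives 10.0)."""
    if primitive_count <= 0:
        raise ValueError("primitive_count must be positive")
    return post_gs_primitives / primitive_count


if __name__ == "__main__":
    eu = EuTimeBreakdown(active_pct=80.0, stall_pct=15.0)
    print(eu.busy_pct, eu.idle_pct)
    print(cs_invocations((16, 16, 1), (8, 8, 1)))
    print(vs_invocations_per_vertex(3000, 3000))
    print(gs_amplification(1000, 100))
```

For example, Dispatch(16, 16, 1) with numthreads(8, 8, 1) gives 256 thread groups of 64 threads each, or 16,384 invocations.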
Sampler Metrics
Metric Name |
Description |
---|---|
Sampler Busy % |
Represents the percentage of time the texture sampler was busy handling texel fetch requests (that is, was either active or stalled).
NOTE:
This metric is unreliable when protected HD media content is being played back on a system with Intel® HD Graphics 5000/ 4600 / 4400 / 4200, Intel® Iris® graphics 5100, or Intel® Iris® Pro graphics 5200 configuration.
Improving Performance: A high Sampler Busy % might lead to execution unit stalls, especially if texture fetch latency is not overlapped with mathematical instructions (the shader compiler attempts to optimize shader code to cover such latencies). Examine the EU Stall % metric to see the amount of EU stalls. If that percentage is high and Sampler Busy % is close to 100%, you most likely have a texturing bottleneck. To check, try the 2x2 textures experiment in the Experiments pane of Graphics Frame Analyzer. |
Sampler Texels, texels |
Represents the number of texels returned from the texture sampler.
NOTE:
This metric is unreliable when protected HD media content is being played back on a system with Intel® HD Graphics 5000/ 4600 / 4400 / 4200, Intel® Iris® graphics 5100, or Intel® Iris® Pro graphics 5200 configuration.
Examples: If Sampler Texels is 1000, 1000 texels were delivered to the execution units (EUs) from the texture sampler. Improving Performance: Fetching a high number of texels increases texture bandwidth and texture sampler stalls, which might in turn cause many EU stalls as shaders wait for texels from the sampler. Note that this metric also reflects texture fetches made inside branching logic. For example, if the shader fetches texture samples only inside an if() block, this metric can help you understand how often the shader takes the branch.
NOTE:
This metric is accurate only to within four texels and is generally slightly larger than the actual number of texels used, because the texture sampler returns data in 2x2 texel quads. This inaccuracy becomes more pronounced when sampling along angular edges.
|
Sampler Cache Misses, messages |
Represents the number of bytes of texture data read from memory by the GPU due to texture cache misses when rendering this frame. Because the texture sampler reads data from memory in 64-byte blocks, you can use this metric to calculate the number of texture cache misses as follows: texture cache misses = Sampler Cache Misses / 64.
NOTE:
This metric is unreliable when protected HD media content is being played back on a system with Intel® HD Graphics 5000/ 4600 / 4400 / 4200, Intel® Iris® graphics 5100, or Intel® Iris® Pro graphics 5200 configuration.
Improving Performance: Usually a higher value for this metric leads to a higher percentage of Texture Sampler stalls. Therefore, utilize techniques that minimize the number of texture reads, such as shown in the “Improving Performance” section of the Sampler Stalled metric. |
Sampler Bottleneck % |
Represents the percentage of time that the texture sampler is a bottleneck: the sampler is stalling execution units (EUs) because its input FIFO is full, or starving EUs because of a lack of results.
NOTE:
This metric is unreliable when protected HD media content is being played back on a system with Intel® HD Graphics 5000/ 4600 / 4400 / 4200, Intel® Iris® graphics 5100, or Intel® Iris® Pro graphics 5200 configuration.
Examples:
NOTE:
This metric might show incorrect results and will be disabled with the next driver update.
|
Sampler Stalled |
Represents the percentage of time the texture sampler was stalled. The texture sampler is stalled when its output queue is full, which can occur when it returns texture data faster than the EUs can process it. While stalled, the texture sampler cannot process new requests.
To inspect shader code, see the Shaders tab in the Graphics Frame Analyzer. |
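Two calculations from the Sampler metrics above lend themselves to a short sketch: converting the bytes read on texture cache misses into a miss count using the 64-byte block size, and the Sampler Busy % heuristic (high EU Stall % together with Sampler Busy % near 100% suggests a texturing bottleneck). In the Python below, the function names and both threshold defaults are hypothetical; only the 64-byte block size comes from the text.

```python
SAMPLER_READ_BLOCK_BYTES = 64  # the texture sampler reads memory in 64-byte blocks


def texture_cache_misses(miss_bytes: int) -> int:
    """Estimate texture cache misses from the bytes read because of misses."""
    if miss_bytes < 0:
        raise ValueError("miss_bytes must be non-negative")
    return miss_bytes // SAMPLER_READ_BLOCK_BYTES


def likely_texture_bound(sampler_busy_pct: float,
                         eu_stall_pct: float,
                         busy_threshold: float = 90.0,
                         stall_threshold: float = 30.0) -> bool:
    """Heuristic: Sampler Busy % close to 100% plus a high EU Stall %.

    Both thresholds are hypothetical defaults; tune them for your workload
    and confirm with the 2x2 textures experiment.
    """
    return sampler_busy_pct >= busy_threshold and eu_stall_pct >= stall_threshold


if __name__ == "__main__":
    print(texture_cache_misses(6400))
    print(likely_texture_bound(98.0, 45.0))
    print(likely_texture_bound(40.0, 45.0))
```

Integer division is used because each miss reads a whole 64-byte block, so the byte count is expected to be a multiple of 64.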
3D Pipe Metrics
Metric Name |
Description |
---|---|
Early Hi-Depth Test Fails, pixels |
Represents the total number of pixels dropped on the early hierarchical depth test. |
Early Depth Test Fails, pixels |
Represents the number of pixels that failed the early depth/stencil tests. |
Clipper Invocations |
Represents the number of primitives processed by the Clipper.
Improving Performance: In most cases you do not need to worry about clipper performance on Intel® HD Graphics 2000/3000 GPUs, because these GPUs use a fast clipping algorithm implemented in silicon. For more information on enabling or disabling hardware clipping, see the Microsoft* DirectX* SDK documentation. |
Post-Clip Primitives |
Represents the number of primitives that flowed out of the clipper. The metric includes original primitives that passed the trivial clipping test (trivial accept), and new primitives that were created by the clipper as a result of the clipping operation.
Improving Performance: In most cases you do not need to worry about clipper performance on Intel® HD Graphics 2000/3000 GPUs, because these GPUs implement an efficient clipping algorithm in silicon. For more information on enabling or disabling hardware clipping, see the Microsoft* DirectX* SDK documentation. |
Samples Killed in PS, pixels |
Represents the total number of samples or pixels dropped in pixel shaders. |
Primitive Count |
Represents the number of primitives sent to the 3D hardware.
NOTE:
For Microsoft* DirectX* 9: the Primitive Count metric matches the PrimitiveCount parameter in the rendering calls.
Improving Performance:
|
Vertex Count |
NOTE:
Improving Performance: To minimize the number of vertices sent to the pipeline and improve vertex processing performance, use graphics primitives that minimize the amount of data sent to and processed by the GPU, such as single triangle strips. |
Samples Blended, pixels |
Represents the total number of samples or pixels written to all render targets. |
Samples Written |
Represents the number of pixels/samples written to render targets. Graphics driver 9.17.10 introduced deferred clears: as an optimization, the driver may defer the actual rendering of a clear call in case subsequent clear and draw calls make it unnecessary. When a clear call is deferred, Graphics Frame Analyzer shows its GPU Duration and Samples Written as zero. If it later turns out that the clear needs to be performed, its work is included in the duration of the erg being drawn at that point, which is not necessarily a clear call. As a result, the Graphics Frame Analyzer metrics associated with a clear call might not accurately reflect the real work associated with that erg. |
Alpha Test Fails |
Represents the number of pixels that failed the alpha test and are ignored (not written to the surface). Examples: If Alpha Test Fails is 5000, then 5000 pixels failed the alpha test and were not written to the surface. |
Pixels Rendered |
Represents the number of pixels that passed the depth-test (both Z-buffer and Stencil if enabled). If the depth-test was disabled, Pixels Rendered counts all the pixels that passed through from the previous pipeline stage.
NOTE:
Pixels that passed the depth-test might not necessarily appear in the render target, which could occur if the color buffer write mask is set to 0.
Improving Performance: A high number of rendered pixels results in a high number of pixel shader executions, which requires more rendering time. To keep the number of rendered pixels as low as possible, optimize the rendering order to maximize Early-Z benefit or use a Z-only pass if possible. To find areas with high depth complexity, use the Overdraw option in the Graphics Frame Analyzer. |
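The advice to prefer single triangle strips over triangle lists, and the Pixels Rendered discussion of depth complexity, both come down to simple counts. The Python sketch below compares vertex counts for N triangles sent as a non-indexed list (3N vertices) versus one connected strip (N + 2 vertices), and estimates average depth complexity as Pixels Rendered divided by the render-target size. The function names are hypothetical, and the depth-complexity estimate assumes a single render target without MSAA; it is a rough companion to, not a replacement for, the Overdraw option in Graphics Frame Analyzer.

```python
def triangle_list_vertices(triangles: int) -> int:
    # A non-indexed triangle list sends three vertices per triangle.
    return 3 * triangles


def triangle_strip_vertices(triangles: int) -> int:
    # A single connected triangle strip sends N + 2 vertices for N triangles:
    # the first two vertices start the strip, and each further vertex adds
    # one triangle.
    return triangles + 2 if triangles > 0 else 0


def average_depth_complexity(pixels_rendered: int, width: int, height: int) -> float:
    # Rough overdraw estimate: rendered pixels per render-target pixel.
    # Assumes one width x height render target and no multisampling.
    if width <= 0 or height <= 0:
        raise ValueError("render target dimensions must be positive")
    return pixels_rendered / (width * height)


if __name__ == "__main__":
    print(triangle_list_vertices(100), triangle_strip_vertices(100))
    print(average_depth_complexity(4_147_200, 1920, 1080))
```

For 100 triangles the list form sends 300 vertices and the strip sends 102. Disconnected geometry needs degenerate triangles to stay in one strip, which adds a few vertices back.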