Introduction
The free toolset Intel® Graphics Performance Analyzers (Intel® GPA) provides actionable information for developers tuning graphics applications for Intel® hardware. The performance analysis and optimization data from Intel GPA can be invaluable to game developers who want to reach a desired target frame rate or resolve troublesome areas of game play. It identifies bottlenecks, highlights problem frames, and spots resource-hogging interactions across CPU and GPU threads, which means you can pinpoint areas for analysis early and avoid contentious rewrites later.
This article provides an overview of the toolset as of the 2020.4 release, based on recent videos and software updates.
Intel® GPA Overview
Intel GPA works on Microsoft DirectX*, Apple Metal*, Vulkan*, and OpenGL* applications on Windows*, macOS*, and Ubuntu* hosts. The toolset has, for many years, been a popular topic at GDC* among game developers at all levels – and 2021 is no exception. Intel GPA is vital to understanding performance on Intel® CPUs and GPUs; with it, you can:
- Optimize code for Intel hardware.
- Pinpoint GPU versus CPU issues.
- Identify bottlenecks throughout the rendering pipeline.
- Observe real-time data during game play.
To help highlight how Intel GPA works, we enlisted the help of power couple Alex and Tim Porter, the principals at Austin, Texas-based MOD Tech Labs*. Their new startup helps customers automate and optimize workflows for a wide range of graphics-heavy applications, from data visualization to VR driving simulators. The couple’s backgrounds in creative arts and technologies mean they are well positioned to explain issues to artists and developers alike – they quite simply speak the same language.
Alex Porter is the CEO and co-founder of MOD Tech Labs. Her background in physical and digital design began while earning her bachelor’s degree in interior design and construction technology at Texas State University, San Marcos. As a serial entrepreneur, Alex has founded several companies since 2001. She has been recognized as a Forbes “The Next 1000” honoree and as an Intel® Software Innovator. Alex has presented at SXSW* and Siggraph*, and she volunteers with organizations including Girls in Tech*, the Seedling foundation, and the Black Technology Mentorship Program.
Tim Porter, the CTO and co-founder of MOD Tech Labs, is busy creating groundbreaking AI-powered technology that is reshaping the future of content creation. After graduating from Full Sail University, Tim began his career as an animation artist and, over two decades, expanded into highly technical developer roles at leading game and production studios. He worked at Sony Pictures Imageworks*, Two Bit Circus*, Gameloft*, and GameHouse*, to name a few. Tim was recognized – as joint CEO of Underminer Studios* – with the Austin Chamber of Commerce Innovation Award in 2019, and has been an Intel® Software Innovator since 2017.
Tim credits much of his ability to troubleshoot client problems to his work with Intel. “It's really attractive for me to use Intel GPA,” Tim explained. “I enjoy having all of the information at my fingertips, and that's something that's really difficult with most performance analyzers that are out there. When I really need to see what is going on, I pop open Intel GPA.”
System Analyzer Provides Top-Level View
The System Analyzer tool helps developers visualize the high-level performance profile of their game, and provides a holistic view of a game's performance impact on the system. With the December 2020 R4 release, you’ll find a new user interface with a simplified workflow and new metric counter sets. You can view CPU, GPU, and graphics metrics in real time to determine whether your application is CPU- or GPU-bound. You can also experiment with graphics pipeline state overrides to perform a high-level iterative analysis of your game – without changing a single line of code.
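The CPU-bound versus GPU-bound decision described above can be sketched in a few lines of Python. This is only an illustrative heuristic, not how System Analyzer itself works: the `FrameSample` type, the `classify_bound` function, and the 10% margin are all hypothetical names and values chosen for the example, and the per-frame CPU and GPU times are assumed to have been read from whatever metrics you collected.

```python
from dataclasses import dataclass


@dataclass
class FrameSample:
    cpu_ms: float  # hypothetical: CPU time spent preparing the frame
    gpu_ms: float  # hypothetical: GPU time spent rendering the frame


def classify_bound(samples, margin=0.10):
    """Return 'CPU-bound', 'GPU-bound', or 'balanced' for a list of samples.

    The side whose average time exceeds the other by more than `margin`
    (an assumed 10% by default) is treated as the limiting one.
    """
    if not samples:
        raise ValueError("need at least one sample")
    avg_cpu = sum(s.cpu_ms for s in samples) / len(samples)
    avg_gpu = sum(s.gpu_ms for s in samples) / len(samples)
    if avg_gpu > avg_cpu * (1 + margin):
        return "GPU-bound"
    if avg_cpu > avg_gpu * (1 + margin):
        return "CPU-bound"
    return "balanced"


if __name__ == "__main__":
    samples = [FrameSample(cpu_ms=4.0, gpu_ms=12.0),
               FrameSample(cpu_ms=5.0, gpu_ms=11.0)]
    print(classify_bound(samples))
```

In practice the interesting cases are the ones near the margin, which is where a deeper look at individual frames becomes worthwhile.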
Tim says he often begins by running System Analyzer to get initial frames-per-second (fps) measurements. He can see GPU core clocks per cycle, identify frame time, and watch the primitive operations count – basic elements such as lines, curves, and polygons that combine to create more complex images.
Alex Porter says that while Tim is digging into the specifics of a single frame, she frequently works at the high level. “I am the bystander who's getting the issues explained to [them], so often the only part that I typically see is the dashboard, and it's really easy to follow,” she says. “It's easy to explain and understand clearly, because of the visual representations of the actual tool itself, which I think is very effective.”
Graphics Frame Analyzer
To go deeper into the game mechanics – and in particular resolve issues in the GPU such as bottlenecks in the graphics pipeline – the Porters point to Graphics Frame Analyzer. It helps developers visualize and analyze multiframe streams to identify single frames of interest, and profile them down to draw-call level. This ability is crucial to understand bottlenecks, resource usage, and system status.
Tim says it is extremely important to show clients the assets that are causing issues, which he can do frame-by-frame. He can give the client a breakdown analysis and allow them to watch the video as the data comes in.
“Then it becomes an education kind of thing,” he says. “I've always found most artists and developers really want to do a good job and make the product better. They love what they do, and they put their hearts and souls into it. But they can't solve for things they don't know about, so they want to be educated. I just have to figure out what level they want to be educated at.”
Single Frame Analysis for Extreme Detail
“Most of the time, I'm just going to do a frame analysis,” Tim says (to get a top-level view). “But it's nice to know that more detail is there if I need to get down into it, because sometimes with optimization, the issue is right there in front of you.”
When the issue is obscure, he can keep looking. “That last 10% of the time, you're sitting there scratching your head and you're going to optimize a billion things. And then you realize it's just this one little bizarre micro detail.”
Tim says it’s nice to actually watch how the packets jump around, what’s happening between states, and what the pipeline is doing. “You can go through each of the different assets and really dig in. Sometimes it seems like you’ve identified the wall, but you realize somebody didn’t turn on a particular call. So, it was actually in the assets that were behind that frame, but it didn't look like that according to the frame splits.”
Single Frame Analysis provides detailed metrics down to the draw call level, to see what the shaders, render states, pixel history, and textures reveal. You can also experiment with performance and visual impacts without having to recompile your source code.
Multiframe Analysis Finds Hotspots and Bottlenecks
Intel GPA can capture a multiframe stream to isolate intermittent rendering anomalies, glitches, and frame hitches. Tim uses multiframe analysis to profile otherwise undetectable multiframe algorithm issues in the user interface and to help him home in on a single frame by using the traditional Frame Analyzer workflow. He can spot issues in the graphics pipeline anatomy, gather metrics, and quickly find hotspots and bottlenecks.
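To make the idea of isolating frame hitches concrete, here is a minimal Python sketch that flags frames whose duration stands out from the rest of a captured stream. It assumes you already have a list of per-frame times in milliseconds; the `find_hitches` function, the sample data, and the "twice the median" threshold are illustrative assumptions, not part of Intel GPA.

```python
from statistics import median


def find_hitches(frame_times_ms, factor=2.0):
    """Return (index, frame_time) pairs whose time exceeds factor x median.

    The median is used as the baseline so that a handful of slow frames
    does not drag the threshold upward.
    """
    if not frame_times_ms:
        return []
    threshold = factor * median(frame_times_ms)
    return [(i, t) for i, t in enumerate(frame_times_ms) if t > threshold]


if __name__ == "__main__":
    # Made-up frame times: mostly ~16.7 ms with two obvious spikes.
    frame_times = [16.6, 16.8, 16.5, 41.2, 16.7, 16.6, 35.0, 16.9]
    print(find_hitches(frame_times))  # [(3, 41.2), (6, 35.0)]
```

The flagged indices are the natural candidates to open as single frames for closer inspection.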
Tim explains he can look at an issue from seven or eight different points of interest. The client may report a low frame rate, or heavy RAM usage, with uneven spikes. “That’s great for a starting point,” Tim says, “but unless you understand the ecosystem behind an issue, you can lose a lot of time on possible solutions that are actually a waste of energy and time.” And with time at such a premium for startups and established studios alike, it’s vital to identify issues early and solve at a system level whenever possible.
Graphics Trace Analyzer Helps Balance Resources
Another key ability in Intel GPA is the Graphics Trace Analyzer, which allows you to evaluate workload performance across the CPU and GPU. Without the right balance, bottlenecks can form, and Tim spends a lot of time making sure tasks are apportioned properly. Using the Graphics Trace Analyzer, he can explore queued GPU tasks, examine CPU thread utilization, and pinpoint CPU and GPU activity based on captured platform and hardware metrics.
Analyzing traces is crucial to reviewing tasks and threads in Microsoft DirectX, Vulkan, and GPU-accelerated media applications. Tim can correlate events on the same timeline and get further insight into performance details. He can understand full system latency with VR compositor events, and toggle event domains to collect only the data he needs, reducing the application workload.
Tim appreciates the very clear focus on the handoff between the GPU and the CPU. CPU cores and threads are great for handling complex branching tasks, but not all cores are equal. The first 9th Gen Intel® Core™ i9 desktop processor offered eight cores and 16 threads. Modern discrete graphics cards contain thousands of less-powerful GPU cores, which can handle one simple, non-branching calculation swiftly and repeatedly.
“Some of the material that we're playing in is targeting 120 fps, so that’s where the handoff between CPU and GPU is crucial,” Tim says. At 120 fps, the entire system is stressed continually. “You start running into these microbubbles of transfer issues,” as he puts it. “There's a driver overhead with every single frame transfer that happens from GPU to CPU. As you go down that stream, there's a small buffer time that happens between each handoff. As you start increasing your frame rate, that latency becomes more apparent as things go along.”
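A little arithmetic shows why a fixed per-frame cost matters more as the frame rate climbs. The Python sketch below computes the time budget per frame and the share of that budget consumed by a constant overhead. The 0.5 ms overhead is a made-up figure used purely for illustration, and the function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / fps


def overhead_share(fps, overhead_ms):
    """Fraction of the frame budget consumed by a fixed per-frame overhead."""
    return overhead_ms / frame_budget_ms(fps)


if __name__ == "__main__":
    # 0.5 ms is an assumed per-frame handoff cost, not a measured value.
    for fps in (30, 60, 120):
        budget = frame_budget_ms(fps)
        share = overhead_share(fps, overhead_ms=0.5)
        print(f"{fps:>3} fps: budget {budget:5.2f} ms, "
              f"0.5 ms overhead = {share:.1%} of the frame")
```

Under that assumption the same overhead takes roughly 1.5% of a frame at 30 fps but about 6% at 120 fps, where the whole budget is only about 8.33 ms.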
With more frames and more transfers between the CPU and the GPU, timing is crucial – and that’s where analysis tools come in. “You end up being able to watch the process,” Tim says, spotting individual packets. Eventually, Tim reaches a point where he can spot exactly where the issue is, and, in turn, show developers the “perceptual chug” that can make all the difference between immersion and frustration.
Alex says that Tim is not the ideal date to take to a new film with a lot of computer-generated imagery. “It is impossible to take him to a movie without hearing all the things that they've done wrong,” she says. “But the worst part is that it's now infected me, too. It's not the same as it used to be.”
Intel GPA Framework for Command-Line Scripting
For complete control of the system, Intel GPA also provides GPA Framework, a command-line and scripting interface that exposes the capture and playback functionality of Intel GPA. Intel developed the backend of Graphics Frame Analyzer into a separate product, enabling you to access its profiling capabilities through programmatic interfaces in C++ and Python*. This allows you to build automated performance reports and feed them into integration and deployment systems to assess performance regressions and improvements.
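As a rough illustration of the regression-checking idea, the Python sketch below compares frame times from a baseline build against a candidate build and reports whether the candidate is slower by more than a tolerance. It deliberately does not call GPA Framework: the frame-time lists are assumed to come from your own capture or export step, and `check_regression` and the 5% tolerance are hypothetical choices for this example.

```python
from statistics import mean


def check_regression(baseline_ms, candidate_ms, tolerance=0.05):
    """Compare average frame times and flag a regression.

    Returns a dict with both averages, the relative change, and a boolean
    that is True when the candidate is slower by more than `tolerance`.
    """
    if not baseline_ms or not candidate_ms:
        raise ValueError("both runs need at least one frame time")
    base = mean(baseline_ms)
    cand = mean(candidate_ms)
    change = (cand - base) / base
    return {
        "baseline_ms": base,
        "candidate_ms": cand,
        "change": change,
        "regressed": change > tolerance,
    }


if __name__ == "__main__":
    # Made-up numbers standing in for two captured runs.
    report = check_regression([10.0, 10.2, 9.8], [11.0, 11.1, 10.9])
    print(f"change: {report['change']:+.1%}, regressed: {report['regressed']}")
```

A build pipeline could run a check like this after each capture and fail the job when `regressed` is true.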
Download Intel GPA Today
Tim recommends that any developer using Intel® architecture – and that’s just about everyone – should immediately download the Intel GPA toolset and start getting up to speed on it. Intel has developed tremendous training resources for various aspects of the tool, from beginner cookbooks to detailed usage guides to take you from early conflict detection to resolution.
Explaining to a development team why they need to spend more energy in a certain area – without hard data and comparison video – can be a problem; it’s hard to visualize an issue without those elements. Tim says, “Intel GPA will give you not just the hard number results, but also the visual results so that people can actually take a look and feel educated as they go along.” Anybody who wants to get deeper into optimization will have the data they need to prove their case.
“You can have Intel GPA on your system, have it open, and get all the information that you want,” Tim added. “If you find yourself frustrated with optimization, it’s probably because you don't have a good tool. And Intel GPA is really going to meet you where you are.”