Intel® Graphics Performance Analyzers User Guide

ID 767266
Date 12/19/2023
Public

Instrument Your Application

Graphics Trace Analyzer is an instrumentation-based tool. It obtains its data from gpa_trace files, which are generated by Intel® GPA while profiling an application that has been instrumented with Instrumentation and Tracing Technology API (ITT API) calls.

To get the most out of the ITT APIs, you need to add API calls in your code to designate logical tasks. This will help you visualize the relationship between tasks in your code, including when they start and end, relative to other CPU and GPU tasks.

At the highest level, a task is a logical group of work executing on a specific thread. It can correspond to any grouping of code within your program that you consider important. You mark up your code by identifying the beginning and end of each logical execution chunk.
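Because every task needs a matching begin and end marker, one common C++ approach is to wrap the pair in a scope guard so that early returns cannot leave a task open. The following is a minimal sketch of that idea, not part of the ITT API: ScopedTask, RecordingTracer, and load_level are hypothetical names, and RecordingTracer only records events so the sketch builds without ittnotify.h. In a real build, begin() would be expected to call __itt_task_begin() and end() to call __itt_task_end().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII guard (not part of the ITT API): calls Tracer::begin()
// on construction and Tracer::end() on destruction, so every exit path
// from the enclosing scope closes the task.
template <typename Tracer>
class ScopedTask {
public:
    ScopedTask(Tracer& tracer, const char* name) : tracer_(tracer) {
        tracer_.begin(name);
    }
    ~ScopedTask() { tracer_.end(); }

    // Non-copyable: a copy would end the same task twice.
    ScopedTask(const ScopedTask&) = delete;
    ScopedTask& operator=(const ScopedTask&) = delete;

private:
    Tracer& tracer_;
};

// Hypothetical stand-in tracer that records events instead of calling ITT.
// Assumption: a real tracer would call __itt_task_begin() in begin() and
// __itt_task_end() in end(), using one shared domain.
struct RecordingTracer {
    std::vector<std::string> events;
    void begin(const char* name) { events.push_back(std::string("begin ") + name); }
    void end() { events.push_back("end"); }
};

// Hypothetical workload with an early return and a nested task.
inline int load_level(RecordingTracer& tracer, bool cached) {
    ScopedTask<RecordingTracer> task(tracer, "load_level");
    if (cached) {
        return 0;  // The guard ends "load_level" here.
    }
    ScopedTask<RecordingTracer> inner(tracer, "read_disk");
    return 1;      // "read_disk" ends first, then "load_level".
}
```

Guards are destroyed in reverse order of construction, so nested tasks close innermost first, which matches how begin and end markers are paired on a thread.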

To resolve the majority of performance bottlenecks, the following API calls are enough:

  • __itt_domain_create() creates a domain required in most ITT API calls. You need to define at least one domain.
  • __itt_string_handle_create() creates string handles for identifying your tasks. String handles are more efficient for identifying traces than strings.
  • __itt_task_begin() marks the beginning of a task.
  • __itt_task_end() marks the end of the current task on the calling thread.
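To illustrate why a handle can be cheaper than a raw string, the sketch below models the general idea of string interning: a name is looked up once, and afterwards it is identified by a pointer rather than by comparing characters. This is a conceptual model only and makes no claim about how the ITT library implements string handles; NameHandle and HandleRegistry are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical handle type: the object's address identifies the name.
struct NameHandle {
    std::string name;
};

// Hypothetical registry that returns the same handle for the same name.
// Assumption: this models interning in general, not ITT internals.
class HandleRegistry {
public:
    const NameHandle* create(const std::string& name) {
        auto it = handles_.find(name);
        if (it == handles_.end()) {
            // First request for this name: allocate a handle and keep it.
            it = handles_.emplace(
                name, std::make_unique<NameHandle>(NameHandle{name})).first;
        }
        return it->second.get();
    }

private:
    // unique_ptr keeps each handle at a stable address when the map rehashes.
    std::unordered_map<std::string, std::unique_ptr<NameHandle>> handles_;
};
```

The practical consequence for instrumentation is to create each handle once, for example at global scope as the sample in this section does, and reuse it in every task call instead of creating it inside a hot loop.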

Example

The following sample shows how four basic ITT API functions, together with __itt_thread_set_name(), are used in a multithreaded application:

#include <windows.h>
#include <ittnotify.h>
#include <atomic>

// Note: the narrow string literals below assume a build without UNICODE defined.

// Forward declaration of the thread function.
DWORD WINAPI workerthread(LPVOID);

// Set by main() to tell the worker threads to stop.
std::atomic<bool> g_done(false);

// Create a domain that is visible globally; every task in this example uses it.
__itt_domain* domain = __itt_domain_create("Example.Domain.Global");

// Create string handles that identify the "main", "CreateThread", and "work" tasks.
__itt_string_handle* handle_main = __itt_string_handle_create("main");
__itt_string_handle* handle_createthread = __itt_string_handle_create("CreateThread");
__itt_string_handle* handle_work = __itt_string_handle_create("work");

int main(int, char*[])
{
    // Begin a task associated with the "main" routine.
    __itt_task_begin(domain, __itt_null, __itt_null, handle_main);

    // Create 4 worker threads.
    HANDLE threads[4];
    for (int i = 0; i < 4; i++)
    {
        // We might be curious about the cost of CreateThread, so we wrap the call in its own task.
        __itt_task_begin(domain, __itt_null, __itt_null, handle_createthread);
        // Cast through INT_PTR so the index survives the round trip on 64-bit builds.
        threads[i] = ::CreateThread(NULL, 0, workerthread, (LPVOID)(INT_PTR)i, 0, NULL);
        __itt_task_end(domain);
    }

    // Let the workers run for a while, then ask them to stop.
    ::Sleep(5000);
    g_done = true;

    // Wait for the workers to finish their current task, then release the handles.
    ::WaitForMultipleObjects(4, threads, TRUE, INFINITE);
    for (int i = 0; i < 4; i++)
    {
        ::CloseHandle(threads[i]);
    }

    // Mark the end of the main task.
    __itt_task_end(domain);
    return 0;
}

DWORD WINAPI workerthread(LPVOID data)
{
    // Set the name of this thread so that it shows up in the UI as something meaningful.
    char threadname[32];
    wsprintfA(threadname, "Worker Thread %d", (int)(INT_PTR)data);
    __itt_thread_set_name(threadname);

    // Each worker thread runs "work" tasks until main() sets g_done.
    while (!g_done)
    {
        __itt_task_begin(domain, __itt_null, __itt_null, handle_work);
        ::Sleep(150);
        __itt_task_end(domain);
    }
    return 0;
}