GPU Roofline Accuracy Levels in Command Line

Intel® Advisor User Guide

ID 766448

Date 3/22/2024

For each perspective, Intel® Advisor provides several levels of collection accuracy. Each accuracy level is a set of analyses and properties that control what data is collected and in how much detail. The higher the accuracy level you choose, the higher the runtime overhead.

In the command line interface (CLI), each accuracy level corresponds to a set of commands with specific options. Run these commands one by one, in the order shown, to get the desired result.
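The mapping from accuracy level to command sequence can be sketched as a small bash function that prints the commands for one level so you can review them before running them. This is a minimal sketch, not an Intel-provided tool: `roofline_commands`, `PROJECT_DIR`, and `APP` are hypothetical names, the option lists follow the commands in this topic (with the shortened Performance Modeling command that the NOTE allows), and paths are assumed to contain no spaces because each command is printed as a single string.

```shell
#!/usr/bin/env bash
# Sketch: print the command sequence for one GPU Roofline accuracy level.
# roofline_commands, PROJECT_DIR, and APP are hypothetical names; paths are
# assumed to contain no spaces because each command is printed as one string.
PROJECT_DIR=${PROJECT_DIR:-./advi_results}
APP=${APP:-./myApplication}

roofline_commands() {
  # Performance Modeling step shared by the medium and high levels
  # (--enforce-baseline-decomposition omitted; --model-baseline-gpu enables it).
  local model="advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=$PROJECT_DIR"
  case "$1" in
    low)
      echo "advisor --collect=roofline --profile-gpu --project-dir=$PROJECT_DIR -- $APP"
      ;;
    medium)
      echo "advisor --collect=roofline --profile-gpu --enable-data-transfer-analysis --track-memory-objects --data-transfer=light --project-dir=$PROJECT_DIR -- $APP"
      echo "$model"
      ;;
    high)
      echo "advisor --collect=roofline --profile-gpu --stacks --enable-cache-simulation --enable-data-transfer-analysis --track-memory-objects --data-transfer=medium --project-dir=$PROJECT_DIR -- $APP"
      echo "$model"
      ;;
    *)
      # Reject anything other than the three documented levels.
      echo "unknown accuracy level: $1 (expected low, medium, or high)" >&2
      return 1
      ;;
  esac
}
```

For example, after sourcing the file, `roofline_commands medium` prints the two Medium Accuracy commands in the order they must run. Keeping the output as plain text makes it easy to inspect or paste into a job script.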

The following accuracy levels are available:

| Comparison / Accuracy Level | Low | Medium | High |
|---|---|---|---|
| Overhead | 5-10x | 15-20x | 20-50x |
| Goal | Analyze kernels in your application running on GPU | Analyze kernels running on GPU and loops/functions running on CPU in more detail | Analyze kernels running on GPU and loops/functions running on CPU in more detail |
| Analyses | Survey with GPU profiling + Characterization (FLOP) | Survey with GPU profiling + Characterization (FLOP, memory object analysis with light data transfer simulation between host and target device memory) + Performance Modeling for a baseline GPU | Survey with GPU profiling + Characterization (Trip Counts and FLOP with call stacks for CPU, CPU cache simulation, memory object analysis with medium data transfer simulation between host and target device memory) + Performance Modeling for a baseline GPU |
| Result for kernels on GPU | Memory-level GPU Roofline (for CARM, L3, SLM, GTI) with a basic set of recommendations for performance optimization | Memory-level GPU Roofline (for CARM, L3, SLM, GTI) with an extended set of recommendations for performance optimization | Memory-level GPU Roofline (for CARM, L3, SLM, GTI) with an extended set of recommendations for performance optimization |
| Result for loops/functions on CPU | Cache-aware CPU Roofline for L1 cache | Memory-level Roofline with call stacks (for L1, L2, L3, DRAM) | Memory-level Roofline with call stacks (for L1, L2, L3, DRAM) |

You can generate commands for a desired accuracy level from the Intel Advisor GUI. See Generate Command Lines from GUI for details.

NOTE:

There is a variety of techniques available to minimize data collection, result size, and execution overhead. Check Minimize Analysis Overhead.

Consider the following command examples.

NOTE: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.
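Placing application options after the executable name can be sketched with a bash array, which keeps each argument intact. This is a minimal sketch: the application path and the options `--size 1024 --iterations 10` are hypothetical placeholders for your own program's arguments. Advisor options go before `--`; the application and its own options go after it.

```shell
#!/usr/bin/env bash
# Sketch: passing application options through an Advisor command.
# The path and the options below are hypothetical placeholders.
APP=./myApplication
APP_ARGS=(--size 1024 --iterations 10)

# Advisor options go before "--"; the application and its own options go after it.
cmd=(advisor --collect=roofline --profile-gpu --project-dir=./advi_results -- "$APP" "${APP_ARGS[@]}")

# Print the full command line for review; execute it with: "${cmd[@]}"
printf '%s\n' "${cmd[*]}"
```

Using an array rather than a single string means arguments containing spaces are passed to the application unchanged when you run `"${cmd[@]}"`.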

Low Accuracy

To run the GPU Roofline Insights perspective with low accuracy:

```
advisor --collect=roofline --profile-gpu --project-dir=./advi_results -- ./myApplication
```

Medium Accuracy

Run the GPU Roofline.

```
advisor --collect=roofline --profile-gpu --enable-data-transfer-analysis --track-memory-objects --data-transfer=light --project-dir=./advi_results -- ./myApplication
```

Run Performance Modeling for the GPU that the application runs on.
```
advisor --collect=projection --profile-gpu --enforce-baseline-decomposition --model-baseline-gpu --project-dir=./advi_results
```
NOTE:
The --model-baseline-gpu option automatically enables --enforce-baseline-decomposition. To simplify the command, you can skip the --enforce-baseline-decomposition option and use only --model-baseline-gpu.
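Because the Performance Modeling command takes no application argument, it works on the data that the GPU Roofline step stored in the same project directory, so the two steps must use the same `--project-dir` and run in order. A minimal bash sketch of that sequence, assuming placeholder values for `APP` and `PROJECT_DIR`; `run` and `run_medium` are hypothetical helper names. It only prints the commands by default; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch: the two Medium Accuracy steps, run in order against one project directory.
# APP and PROJECT_DIR are placeholders. By default the commands are only printed;
# set DRY_RUN=0 to execute them.
set -eu

PROJECT_DIR=${PROJECT_DIR:-./advi_results}
APP=${APP:-./myApplication}

# Print the command in dry-run mode, otherwise execute it. Under "set -e",
# a failing step stops the script, so modeling never runs after a failed collection.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_medium() {
  # Step 1: GPU Roofline collection with light data transfer simulation.
  run advisor --collect=roofline --profile-gpu --enable-data-transfer-analysis \
    --track-memory-objects --data-transfer=light \
    --project-dir="$PROJECT_DIR" -- "$APP"
  # Step 2: Performance Modeling for the baseline GPU. No application argument:
  # it uses the data already stored in the same project directory.
  run advisor --collect=projection --profile-gpu --model-baseline-gpu \
    --project-dir="$PROJECT_DIR"
}

run_medium
```

The High Accuracy sequence has the same shape; only the options of the first command change.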

High Accuracy

Run the GPU Roofline.

```
advisor --collect=roofline --profile-gpu --stacks --enable-cache-simulation --enable-data-transfer-analysis --track-memory-objects --data-transfer=medium --project-dir=./advi_results -- ./myApplication
```

Run Performance Modeling for the GPU that the application runs on.
```
advisor --collect=projection --profile-gpu --enforce-baseline-decomposition --model-baseline-gpu --project-dir=./advi_results
```
NOTE:
The --model-baseline-gpu option automatically enables --enforce-baseline-decomposition. To simplify the command, you can skip the --enforce-baseline-decomposition option and use only --model-baseline-gpu.

You can view the results in the Intel Advisor GUI or generate an interactive HTML report.
