Intel® Advisor User Guide

ID 766448
Date 3/22/2024
Public


Run GPU Roofline Insights Perspective from Command Line

To plot a Roofline chart, Intel® Advisor runs two steps:

  1. Collect OpenCL™ kernel timings and memory data using the Survey analysis with GPU profiling.
  2. Measure the hardware limitations and collect floating-point and integer operations data using the Characterization analysis with GPU profiling.

    Intel® Advisor calculates compute operations (FLOP and INTOP) as a weighted sum of the following groups of instructions: BASIC COMPUTE, FMA, BIT, DIV, POW, MATH.

    Intel Advisor automatically determines the data type of the collected operations from the destination (dst) register.
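In symbols, the weighted sum has the following form, where n_g is the number of executed instructions from group g and w_g is the weight of that group. This notation is introduced here only for illustration; the weight values are not given in this section, so they are left symbolic.

```latex
N_{\mathrm{ops}} = \sum_{g \in G} w_g \, n_g,
\qquad
G = \{\text{BASIC COMPUTE},\ \text{FMA},\ \text{BIT},\ \text{DIV},\ \text{POW},\ \text{MATH}\}
```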

For convenience, Intel Advisor provides the shortcut --collect=roofline command line action, which runs both the Survey and Characterization analyses with a single command. This shortcut is the recommended way to run the GPU Roofline Insights perspective.

TIP:
See the Intel Advisor cheat sheet for a quick reference on the command line interface.

Prerequisites

  1. Configure your system to analyze GPU kernels.
  2. Set Intel Advisor environment variables with an automated script to enable the advisor command line interface (CLI).

Run the GPU Roofline Insights Perspective

There are two methods to run the GPU Roofline analysis. Use one of the following:

  • Run the shortcut --collect=roofline command line action to execute the Survey and Characterization analyses for GPU kernels with a single command. This method is recommended to run the GPU Roofline Insights perspective, but it does not support MPI applications.
  • Run the Survey and Characterization analyses for GPU kernels separately, one after the other, with the --collect=survey and --collect=tripcounts command line actions. This method is recommended if you want to analyze an MPI application.

Optionally, you can also run the Performance Modeling analysis as part of the GPU Roofline Insights perspective. This analysis models your application's performance on a baseline GPU device as the target, so that you can compare the modeled performance with the actual application performance. Intel Advisor uses this data to provide additional performance optimization recommendations.

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command line options, add them after the executable name.

Method 1. Run the Shortcut Command

  1. Collect data for a GPU Roofline chart with a shortcut.
    advisor --collect=roofline --profile-gpu --project-dir=./advi_results -- ./myApplication

    This command collects data for both GPU kernels and CPU loops/functions in your application. For kernels running on the GPU, it generates a Memory-Level Roofline.

  2. Run Performance Modeling for the GPU that the application runs on.
    advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=./advi_results
    IMPORTANT:
    Make sure to use the --model-baseline-gpu option for Performance Modeling to work correctly.

    This command models your application's potential performance on a baseline GPU as the target to determine additional optimization recommendations.
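The two steps of Method 1 can be chained in a small Bash script so that Performance Modeling runs only after a successful Roofline collection. This is a sketch, not an Intel-provided tool: the function names run and run_gpu_roofline and the DRY_RUN variable are hypothetical, and the script uses only the advisor options shown above. With DRY_RUN=1, it prints the commands instead of running them.

```shell
# Hypothetical helper for Method 1 (shortcut Roofline + Performance Modeling).
# Usage: run_gpu_roofline <project-dir> <app> [app-args...]
# Set DRY_RUN=1 to print the advisor commands instead of executing them.

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"   # dry run: show the command line only
  else
    "$@"                 # real run: execute the command as given
  fi
}

run_gpu_roofline() {
  local project_dir="$1"
  shift
  # Step 1: Survey + Characterization in one command; application
  # arguments follow the executable name after "--".
  # Step 2 (after &&): Performance Modeling on the baseline GPU, which
  # does not relaunch the application and runs only if step 1 succeeded.
  run advisor --collect=roofline --profile-gpu --project-dir="$project_dir" -- "$@" &&
  run advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir="$project_dir"
}
```

For example, after sourcing the script, DRY_RUN=1 run_gpu_roofline ./advi_results ./myApplication prints both commands; calling run_gpu_roofline without DRY_RUN runs them.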

Method 2. Run the Analyses Separately

Use this method if you want to analyze an MPI application.

  1. Run the Survey analysis.
    advisor --collect=survey --profile-gpu --project-dir=./advi_results -- ./myApplication
  2. Run the Characterization analysis to collect trip counts and FLOP data:
    advisor --collect=tripcounts --flop --profile-gpu --project-dir=./advi_results -- ./myApplication

    These commands collect data for both GPU kernels and CPU loops/functions in your application. For kernels running on the GPU, they generate a Memory-Level Roofline.

  3. Run Performance Modeling for the GPU that the application runs on.
    advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir=./advi_results
    IMPORTANT:
    Make sure to use the --model-baseline-gpu option for Performance Modeling to work correctly.

    This command models your application's potential performance on a baseline GPU as the target to determine additional optimization recommendations.
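The three steps of Method 2 can be scripted the same way. This Bash sketch is hypothetical (the names run and run_gpu_roofline_separate and the DRY_RUN variable are not part of Intel Advisor) and uses only the commands listed above; each step starts only if the previous one succeeded, so Characterization never runs against a project without Survey data.

```shell
# Hypothetical helper for Method 2 (separate Survey, Characterization,
# and Performance Modeling steps).
# Usage: run_gpu_roofline_separate <project-dir> <app> [app-args...]
# Set DRY_RUN=1 to print the advisor commands instead of executing them.

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"   # dry run: show the command line only
  else
    "$@"                 # real run: execute the command as given
  fi
}

run_gpu_roofline_separate() {
  local project_dir="$1"
  shift
  # 1. Survey, 2. Characterization (trip counts + FLOP),
  # 3. Performance Modeling on the baseline GPU.
  run advisor --collect=survey --profile-gpu --project-dir="$project_dir" -- "$@" &&
  run advisor --collect=tripcounts --flop --profile-gpu --project-dir="$project_dir" -- "$@" &&
  run advisor --collect=projection --profile-gpu --model-baseline-gpu --project-dir="$project_dir"
}
```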

You can view the results in the Intel Advisor graphical user interface (GUI) or in the CLI, or generate an interactive HTML report. See View the Results below for details.

Analysis Details

The GPU Roofline Insights workflow includes the Roofline analysis, which sequentially runs the Survey and Characterization (trip counts and FLOP) analyses, and, optionally, the Performance Modeling analysis.

The analysis has a set of additional options that modify its behavior and collect additional performance data.

Consider the following options:

Roofline Options

To run the Roofline analysis, use the following command line action: --collect=roofline.

NOTE:
You can also use these options with --collect=survey and --collect=tripcounts if you want to run the analyses separately.

Recommended action options:

Options

Description

--profile-gpu

Analyze GPU kernels. This option is required for each command.

--target-gpu

Specify one or more target GPU adapters to collect profiling data from. The adapter address must be in the following format: <domain>:<bus>:<device-number>.<function-number>. Only decimal numbers are accepted.

For example: --target-gpu=0:77:0.0

To specify multiple adapters, use a comma-separated list.

For example: --target-gpu=0:77:0.0,0:154:0.0

If this option is not set (default), Intel Advisor processes all GPUs available on your system.

TIP:
To see a list of GPU adapters available on your system, run advisor --help target-gpu and see the option description.

--gpu-sampling-interval=<double>

Set an interval (in milliseconds) between GPU samples. By default, it is set to 1.

--enable-data-transfer-analysis

Model data transfer between host memory and device memory. Use this option if you want to run the Performance Modeling analysis.

--track-memory-objects

Attribute memory objects to the analyzed loops that accessed the objects. Use this option if you want to run the Performance Modeling analysis.

--data-transfer=<level>

Set the level of detail for modeling data transfers during Characterization. Use this option if you want to run the Performance Modeling analysis.

Use one of the following values:

  • Use light to model only data transfer between host and device memory.
  • Use medium to model data transfers, attribute memory objects, and track accesses to stack memory.
  • Use high to model data transfers, attribute memory objects, track accesses to stack memory, and identify where data can be reused.

See the advisor Command Option Reference for more options.
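The address rules for --target-gpu can be checked before a long collection starts. The following Bash function is a hypothetical sketch, not part of Intel Advisor: it takes a comma-separated list of adapter addresses, rejects any address that is not in the decimal <domain>:<bus>:<device-number>.<function-number> form, and prints the matching --target-gpu option, or nothing for an empty list so that the default (all GPUs) applies.

```shell
# Hypothetical helper: validate GPU adapter addresses for --target-gpu.
# Prints "--target-gpu=<list>" on success, nothing for an empty list,
# and returns 1 (with a message on stderr) for a malformed address.
target_gpu_option() {
  local list="$1" addr
  # Decimal-only <domain>:<bus>:<device-number>.<function-number>
  local re='^[0-9]+:[0-9]+:[0-9]+[.][0-9]+$'
  # Empty list: print nothing, so advisor keeps its default (all GPUs).
  [[ -z "$list" ]] && return 0
  local IFS=','            # split the list on commas only
  for addr in $list; do
    if ! [[ "$addr" =~ $re ]]; then
      echo "invalid GPU adapter address: '$addr'" >&2
      return 1
    fi
  done
  printf '%s\n' "--target-gpu=$list"
}

# Example (GPUS is a hypothetical variable holding the address list):
#   opt="$(target_gpu_option "$GPUS")" || exit 1
#   advisor --collect=roofline --profile-gpu ${opt:+"$opt"} \
#     --project-dir=./advi_results -- ./myApplication
```

A hexadecimal address such as 0000:4d:00.0 is rejected, because the option accepts only decimal numbers.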

Performance Modeling Options

To run the Performance Modeling analysis, use the following command line action: --collect=projection.

The following action options are required when you run the Performance Modeling analysis as part of the GPU Roofline Insights perspective:

Options

Description

--profile-gpu

Analyze GPU kernels. This option is required for each command.

--enforce-baseline-decomposition

Use the same local size and SIMD width as measured on the baseline. This option is required.

--model-baseline-gpu

Use the baseline GPU configuration as a target device for modeling. This option is required.

This option automatically enables the --enforce-baseline-decomposition option, so you can specify only --model-baseline-gpu.

See the advisor Command Option Reference for more options.
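A pre-flight check can catch a missing required option before Performance Modeling runs. This Bash function is a hypothetical sketch (check_projection_args is not an Intel Advisor command) that encodes only the rules from the table above: --profile-gpu and --model-baseline-gpu must be present, and --enforce-baseline-decomposition is not checked separately because --model-baseline-gpu enables it.

```shell
# Hypothetical pre-flight check for a Performance Modeling command line.
# Returns 0 if every required option is present, 1 otherwise
# (and names each missing option on stderr).
check_projection_args() {
  local required arg found missing=0
  for required in --collect=projection --profile-gpu --model-baseline-gpu; do
    found=0
    for arg in "$@"; do
      if [[ "$arg" == "$required" ]]; then
        found=1
        break
      fi
    done
    if (( found == 0 )); then
      echo "missing required option: $required" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   args=(--collect=projection --profile-gpu --model-baseline-gpu --project-dir=./advi_results)
#   check_projection_args "${args[@]}" && advisor "${args[@]}"
```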

Next Steps

Continue to explore GPU Roofline results. For details about the metrics reported, see Accelerator Metrics.
