Model GPU Application Performance for a Different GPU Device
This recipe illustrates how to estimate application performance from one Intel® graphics processing unit (GPU) architecture to another by running the Offload Modeling perspective from the Intel® Advisor.
Performance estimation plays an important role in deciding next steps when planning for future-generation GPU architectures. If your application already runs on a GPU, GPU-to-GPU modeling is more accurate than CPU-to-GPU modeling because of inherent differences between CPU and GPU execution flows.
In this recipe, use the Intel Advisor to analyze the performance of a SYCL application with the GPU-to-GPU modeling flow of the Offload Modeling perspective and estimate the profitability of offloading the application to Intel® Iris® Xe MAX graphics (the gen12_dg1 configuration).
Directions:
- Prerequisites.
- Run GPU-to-GPU performance modeling.
- Examine performance speedup on the target GPU.
- Alternative steps.
Ingredients
This section lists the hardware and software used to produce the specific result shown in this recipe:
- Performance analysis tools: Intel Advisor 2021
Available for download as a standalone tool or as part of the Intel® oneAPI Base Toolkit.
- Application: SYCL implementation of the Mandelbrot sample application, which is part of oneAPI samples
- Compiler: Intel® oneAPI DPC++/C++ Compiler 2021
Available for download as part of the Intel® oneAPI Base Toolkit.
- Operating system: Ubuntu* 20.04
- Baseline GPU: Intel® Iris® Plus Graphics 655
You can download a precollected Offload Modeling report for the Mandelbrot application to follow this recipe and examine the analysis results.
Prerequisites
Set up environment variables for oneAPI tools:
source <oneapi-install-dir>/setvars.sh
- Configure your system to analyze GPU kernels.
- Build the SYCL version of the Mandelbrot application:
cd mandelbrot/ && mkdir build && cd build && cmake .. && make
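Before collecting data, you may want to confirm that the baseline GPU is visible to the SYCL runtime. A minimal check, assuming the oneAPI environment from the first step has already been sourced, uses the `sycl-ls` tool that ships with the oneAPI toolkits:

```shell
# List the devices visible to the SYCL runtime; the baseline GPU
# (Intel Iris Plus Graphics 655 in this recipe) should appear as a
# Level Zero or OpenCL GPU device.
sycl-ls
```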
Run GPU-to-GPU Performance Modeling
You can run the GPU-to-GPU modeling using Intel Advisor command line interface (CLI), Python* scripts, or Intel Advisor graphical user interface (GUI).
In this section, use a special command line collection preset for the Offload Modeling perspective with the --gpu option to run all perspective analyses for the GPU-to-GPU modeling with a single command:
advisor --collect=offload --project-dir=./mandelbrot-advisor --gpu --config=gen12_dg1 -- ./mandelbrot
This command runs the perspective with the default medium accuracy and runs the following analyses one-by-one:
- Survey analysis to collect baseline performance data
- Characterization analysis to collect trip counts and FLOP data and to model data transfers
- Performance Modeling from the baseline Intel® Iris® Plus Graphics 655 device to the target Intel® Iris® Xe MAX Graphics
Important: The command line collection preset does not support MPI applications. You need to run the analyses separately to analyze an MPI application.
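As a sketch of what the preset expands to, the three analyses can also be run as separate Advisor CLI commands. The options below (`--profile-gpu`, `--collect=tripcounts`, `--collect=projection`) follow the usual Offload Modeling flow, but verify them with `advisor --help collect` for your Advisor version:

```shell
# 1. Survey: collect baseline performance data on the GPU
advisor --collect=survey --profile-gpu --project-dir=./mandelbrot-advisor -- ./mandelbrot

# 2. Characterization: trip counts, FLOP, and data-transfer modeling
advisor --collect=tripcounts --flop --profile-gpu --project-dir=./mandelbrot-advisor -- ./mandelbrot

# 3. Performance modeling for the gen12_dg1 (Iris Xe MAX) target
advisor --collect=projection --profile-gpu --config=gen12_dg1 --project-dir=./mandelbrot-advisor
```

For an MPI application, steps 1 and 2 would be launched under your MPI launcher (for example, prefixed with `mpirun -n <ranks>`), which is why the single-command preset cannot be used there.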
Once the analyses are completed, the result summary is printed to the terminal. You can continue to view the results in the Intel Advisor GUI or in an interactive HTML report from your preferred web browser.
Examine Performance Speedup on the Target GPU
In this section, examine the HTML report to understand the GPU-to-GPU modeling results. The HTML report is generated automatically after you run the Offload Modeling from CLI or using the Python scripts and is saved to ./mandelbrot-advisor/e000/report/advisor-report.html. You can open the report in your preferred web browser.
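If you need to re-create the HTML report from an existing project, for example after copying the project directory to another machine, Advisor can regenerate it from the collected results. The `--report-output-dir` path below is an arbitrary example; check `advisor --help report` for the exact options in your version:

```shell
# Re-generate the interactive HTML report from an existing project
advisor --report=all --project-dir=./mandelbrot-advisor --report-output-dir=./mandelbrot-advisor-html
```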
In the Summary tab, examine the Top Metrics and Program Metrics panes to understand the performance gain.
- The Top Metrics pane shows an estimated average speedup of 5.311x from offloading one code region of the Mandelbrot application from the baseline Intel® Iris® Plus Graphics 655 GPU device to the target Intel® Iris® Xe MAX Graphics GPU device.
- The Program Metrics pane shows the measured execution time for the current run on the baseline GPU and an estimated execution time for a run on the target GPU.
You can navigate between the Summary, Accelerated Regions, and Source View tabs to see details about the offloaded regions and to examine useful metrics and the potential performance gain.
The Accelerated Regions tab provides detailed information for the offloaded code regions along with the source code in the bottom pane. In this view, you can examine different useful metrics for offloaded regions of interest. For example, examine the following metrics measured for the kernels running on the baseline GPU: iteration space, thread occupancy, SIMD width, local size, global size.
Examine the following metrics estimated for the target GPU: performance issues, time, speedup, data transfer with reuse.
See Accelerator Metrics for a detailed description and interpretation of these metrics.
Alternative Steps
As noted above, you can run the GPU-to-GPU modeling using the Intel Advisor command line interface (CLI), Python* scripts, or the Intel Advisor GUI. This section describes the Python script and GUI alternatives to the collection preset.
Run Intel Advisor Python Scripts (Instead of Offload Modeling Collection Preset)
Use the special Python scripts delivered with the Intel Advisor to run the GPU-to-GPU modeling. These scripts use the Intel Advisor Python API to run the analyses.
For example, run the run_oa.py script with the --gpu option to execute the perspective with a single command as follows:
advisor-python $APM/run_oa.py ./mandelbrot-advisor --collect=basic --gpu --config=gen12_dg1 -- ./mandelbrot
The run_oa.py script runs the following analyses one-by-one:
- Survey analysis to collect baseline performance data
- Characterization analysis to collect trip counts and FLOP data and to model data transfers
- Performance Modeling from the baseline Intel® Iris® Plus Graphics 655 device to the target Intel® Iris® Xe MAX Graphics
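If you prefer to separate data collection from modeling, for example to re-model the same profile for several target configurations, the perspective can also be sketched as two scripts. The `collect.py` and `analyze.py` scripts ship in the same `$APM` directory as `run_oa.py`; verify their options with `advisor-python $APM/collect.py --help` for your version:

```shell
# Collect survey and characterization data on the baseline GPU
advisor-python $APM/collect.py ./mandelbrot-advisor --collect=basic --gpu -- ./mandelbrot

# Model the collected profile for the gen12_dg1 target
advisor-python $APM/analyze.py ./mandelbrot-advisor --gpu --config=gen12_dg1
```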
Important: The Python scripts do not support MPI applications. Use the Intel Advisor CLI to analyze an MPI application.
Once the analyses are completed, the result summary is printed to the terminal. You can continue to view the results in the Intel Advisor GUI or in an interactive HTML report from your preferred web browser.
Run Intel Advisor GUI (Instead of Offload Modeling Collection Preset)
Prerequisite: Create a project for the Mandelbrot application.
To run GPU-to-GPU modeling from Intel Advisor GUI:
- From the Perspective Selector window, select the Offload Modeling perspective.
- In the Analysis Workflow pane, select the following:
- Select GPU from the Baseline Device drop-down.
- Select Xe LP Max from the Target Platform Model drop-down.
- Run the perspective.
Once the perspective is completed, the GPU-to-GPU offload modeling result is shown in the pane on the right.
Key Take-Aways
With GPU-to-GPU modeling, you can get more accurate projections of your application performance on next-generation GPUs even before you have the hardware. The metrics collected by Offload Modeling can help you understand the performance of the kernels running on the baseline GPU. The interactive HTML report gives a GUI-like experience and lets you switch between the Offload Modeling and GPU Roofline Insights perspectives, almost as in the GUI.