Run a Roofline Analysis

Intel® Advisor Tutorial: Use the Automated Roofline Chart to Make Optimization Decisions

ID 758351

Date 12/04/2020

Version 2021.1

This topic is part of a tutorial that shows how to use the automated Roofline chart to make prioritized optimization decisions.

Perform the following steps:

Run a Roofline Analysis.
Show/hide the Roofline chart.
Get to know Roofline chart data.
Get to know Roofline chart controls.

Key take-aways from these steps:

The Roofline analysis is a combination of the Survey analysis followed immediately by the Trip Counts/FLOPs analysis. The Trip Counts/FLOPs analysis may run three to four times longer than the Survey analysis.
The size and color of each Roofline chart dot represent relative execution time for each loop/function. Large red dots take the most time; small green dots take less time.
Horizontal Roofline chart lines (rooflines) indicate compute capacity limitations preventing loops/functions from achieving better performance without some form of optimization.
Diagonal Roofline chart lines indicate memory bandwidth limitations preventing loops/functions from achieving better performance without some form of optimization.
A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine; however, not all loops can utilize maximum machine capabilities.
The best candidates for the greatest performance improvement are large, red dots that are farther from the topmost achievable roofline.
The Roofline chart offers a variety of controls to configure appearance and focus on data of interest.

Run a Roofline Analysis

In the Vectorization Workflow pane, click the control under Run Roofline to execute your target application twice to:

Measure the hardware limitations of your machine and collect loop/function timings using the Survey analysis.
Collect FLOPs data using the Trip Counts/FLOPs analysis. This collection can take three to four times longer than the Survey analysis.

Upon completion, Intel Advisor displays a Roofline chart.
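
The two runs described above can also be driven from the command line. The following is a minimal sketch, assuming the `advisor` command-line tool from Intel Advisor 2021 is on your PATH and accepts the `--collect=survey`, `--collect=tripcounts --flop`, and `--project-dir` options; check the CLI help of your installed version before relying on them. The project directory `./advi_results` and the application `./my_app` are hypothetical placeholders. The script only builds and prints the commands; it does not run them.

```python
import shlex


def roofline_commands(project_dir, app_argv):
    """Build the two collections that together make up a Roofline analysis.

    The option names are assumed from the Intel Advisor 2021 CLI; verify them
    against your installed version.
    """
    base = ["advisor", f"--project-dir={project_dir}"]
    # Run 1: Survey analysis (loop/function timings).
    survey = base + ["--collect=survey", "--"] + list(app_argv)
    # Run 2: Trip Counts/FLOPs analysis (typically the slower of the two).
    trip_flops = base + ["--collect=tripcounts", "--flop", "--"] + list(app_argv)
    return [survey, trip_flops]


if __name__ == "__main__":
    # "./advi_results" and "./my_app" are hypothetical placeholders.
    for cmd in roofline_commands("./advi_results", ["./my_app"]):
        print(shlex.join(cmd))
```

Both commands point at the same project directory so that the second collection can extend the result produced by the first.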

NOTE:

If the Workflow is not displayed in the Visual Studio IDE: Click the icon on the Intel Advisor toolbar. (It may take a few seconds to display.)

Show/Hide the Roofline Chart

There are several controls to help you show/hide the Roofline chart:

1	Click to toggle between Roofline chart view and Survey Report view.
2	Click to toggle to and from side-by-side Roofline chart and Survey Report view.
3	Drag to adjust the dimensions of the Roofline chart and Survey Report.

TIP:

For the remainder of this tutorial, view the Roofline chart and Survey Report side by side.

Get to Know Roofline Chart Data

The Roofline chart plots an application's achieved performance and arithmetic intensity against the machine's maximum achievable performance:

Arithmetic intensity (x axis) - measured in floating-point operations (FLOPs) and/or integer operations (INTOPs) per byte transferred between the CPU/VPU and memory, as determined by the loop/function algorithm
Performance (y axis) - measured in billions of floating-point operations per second (GFLOPS) and/or billions of integer operations per second (GINTOPS)
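
To make the two axes concrete, here is a minimal sketch of how a single dot's coordinates follow from three measurements. The function names and the sample numbers are hypothetical and chosen only for easy arithmetic; Intel Advisor collects the real values for you.

```python
def arithmetic_intensity(flops, bytes_transferred):
    """x axis: operations performed per byte moved between CPU/VPU and memory."""
    return flops / bytes_transferred


def gflops(flops, seconds):
    """y axis: billions of floating-point operations per second."""
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical loop: 2e9 FLOPs, 8e9 bytes transferred, 0.5 s of self time.
    x = arithmetic_intensity(2e9, 8e9)
    y = gflops(2e9, 0.5)
    print(f"AI = {x} FLOP/byte, performance = {y} GFLOPS")
    # prints "AI = 0.25 FLOP/byte, performance = 4.0 GFLOPS"
```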

In general:

The size and color of each Roofline chart dot represent relative execution time for each loop/function. Large red dots take the most time, so are the best candidates for optimization. Small green dots take less time, so may not be worth optimizing.
Roofline chart diagonal lines indicate memory bandwidth limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The L1 Bandwidth roofline represents the maximum amount of work that can get done at a given arithmetic intensity if the loop always hits L1 cache. A loop does not benefit from L1 cache speed if a dataset causes it to miss L1 cache too often, and instead is subject to the limitations of the lower-speed L2 cache it is hitting. So a dot representing a loop that misses L1 cache too often but hits L2 cache is positioned somewhere below the L2 Bandwidth roofline.
Roofline chart horizontal lines indicate compute capacity limitations preventing loops/functions from achieving better performance without some form of optimization. For example: The Scalar Add Peak represents the peak number of add instructions that can be performed by the scalar loop under these circumstances. The Vector Add Peak represents the peak number of add instructions that can be performed by the vectorized loop under these circumstances. So a dot representing a loop that is not vectorized is positioned somewhere below the Scalar Add Peak roofline.
A dot cannot exceed the topmost rooflines, as these represent the maximum capabilities of the machine; however, not all loops can utilize maximum machine capabilities.
The greater the distance between a dot and the highest achievable roofline, the more opportunity exists for performance improvement.
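
The relationship between diagonal and horizontal rooflines can be sketched with the standard roofline formula: at a given arithmetic intensity, the ceiling is the lower of the memory roof (arithmetic intensity multiplied by bandwidth) and the compute roof. The bandwidth and peak values below are hypothetical, not measurements from any machine, and the function names are illustrative only.

```python
def attainable_gflops(ai, bandwidth_gbs, compute_peak_gflops):
    """Ceiling for a dot at arithmetic intensity `ai` (FLOP/byte).

    The diagonal (memory) roof is ai * bandwidth; the horizontal (compute)
    roof is a constant. The lower of the two limits the loop.
    """
    return min(ai * bandwidth_gbs, compute_peak_gflops)


def ridge_point(bandwidth_gbs, compute_peak_gflops):
    """Arithmetic intensity at which the diagonal roof meets the horizontal roof."""
    return compute_peak_gflops / bandwidth_gbs


def headroom(achieved_gflops, roof_gflops):
    """How many times faster the loop would be if it reached the roof."""
    return roof_gflops / achieved_gflops


if __name__ == "__main__":
    # Hypothetical roofs: L2 Bandwidth = 100 GB/s, Vector Add Peak = 40 GFLOPS.
    l2_bandwidth, vector_add_peak = 100.0, 40.0
    ai, achieved = 0.25, 4.0  # hypothetical dot
    roof = attainable_gflops(ai, l2_bandwidth, vector_add_peak)
    limit = "memory" if ai < ridge_point(l2_bandwidth, vector_add_peak) else "compute"
    print(roof, limit, headroom(achieved, roof))  # prints "25.0 memory 6.25"
```

In this sketch the dot sits to the left of the ridge point, so the diagonal roof is the binding limit, and the distance to that roof corresponds to a potential 6.25x improvement.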

In the following Roofline chart representation, loops A and G (large red dots), and to a lesser extent B (yellow dot far below the roofs), are the best candidates for optimization. Loops C, D, and E (small green dots) and H (yellow dot) are poor candidates because they do not have much room to improve or are too small to have significant impact on performance.
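
The reasoning behind these choices (large dots far from the roof first) can be sketched as a simple ranking. This is an illustrative heuristic, not the way Intel Advisor itself scores loops: it estimates the time a loop would save if it reached its roof. The loop names and numbers are invented for the example and are not taken from the chart.

```python
def potential_saving(self_time_s, achieved_gflops, roof_gflops):
    """Estimated seconds saved if the loop ran at its roof (illustrative heuristic)."""
    return self_time_s * (roof_gflops - achieved_gflops) / roof_gflops


def rank_candidates(loops):
    """Sort loops so the most promising optimization candidates come first."""
    return sorted(loops, key=lambda loop: potential_saving(*loop[1:]), reverse=True)


if __name__ == "__main__":
    # (name, self time in s, achieved GFLOPS, roof GFLOPS); all values hypothetical.
    loops = [
        ("hot_near_roof", 3.0, 18.0, 20.0),    # takes time, but little room left
        ("small_far_below", 0.5, 1.0, 10.0),   # lots of room, but little time
        ("hot_far_below", 10.0, 2.0, 20.0),    # large dot far from the roof
    ]
    for name, *metrics in rank_candidates(loops):
        print(name, potential_saving(*metrics))
```

A loop that is both time-consuming and far below its roof ranks first; a small loop or one already close to its roof contributes little no matter how it is tuned.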

NOTE:

The Roofline chart and Survey Report are synchronized: Click a dot in the Roofline chart to highlight the corresponding data row in the Survey Report, and single-click a data row in the Survey Report to make the corresponding dot flash in the Roofline chart - as long as the loop contains floating-point operations. Loops without floating-point operations do not appear in the Roofline chart.

Mouse over each roofline (line), peak (rectangle), and loop (dot) in your Roofline chart to learn more about each chart element.

Get to Know Roofline Chart Controls

There are several controls to help you focus on the Roofline chart data most important to you, including the following.

1	Select Loops by Mouse Rect: Select one or more loops/functions by tracing a rectangle with your mouse.
	Zoom by Mouse Rect: Zoom in and out by tracing a rectangle with your mouse. You can also zoom in and out using your mouse wheel.
	Move View By Mouse: Move the chart left, right, up, and down.
	Undo or Redo: Undo or redo the previous zoom action.
	Cancel Zoom: Reset to the default zoom level.
	Export as x: Export the chart as a dynamic and interactive HTML or SVG file that does not require the Intel Advisor viewer for display. Use the arrow to toggle between the options.
2	Use the Cores drop-down toolbar to:
	Adjust rooflines to see practical performance limits for your code on the host machine.
	Build roofs for single-threaded applications (or for multi-threaded applications configured to run single-threaded, such as one thread per rank for MPI applications). You can use Intel Advisor filters to control the loops displayed in the Roofline chart; however, the Roofline chart does not support the Threads filter.
	Choose the appropriate number of CPU cores to scale roof values up or down:
		1 – if your code is single-threaded
		Number of cores equal or close to the number of threads – if your code has fewer threads than available CPU cores
		Maximum number of cores – if your code has more threads than available CPU cores
	By default, the number of cores is set to the number of threads used by the application (even values only).
	You’ll see the following options if your code is running on a multisocket PC:
		Choose Bind cores to 1 socket (default) if your application binds memory to one socket. For example, choose this option for MPI applications structured as one rank per socket. NOTE: This option may be disabled if you choose a number of CPU cores exceeding the maximum number of cores available on one socket.
		Choose Spread cores between all n sockets if your application binds memory to all sockets. For example, choose this option for non-MPI applications.
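
The core-count guidance for the Cores drop-down can be written as a small decision function. This is a minimal sketch of the three rules as stated; the function name is hypothetical, and it does not model the "even values only" behavior of the default setting or the socket-binding options.

```python
def roofline_core_count(app_threads, available_cores):
    """Pick the number of CPU cores to use when scaling roof values."""
    if app_threads <= 1:
        # Single-threaded code: build roofs for one core.
        return 1
    if app_threads < available_cores:
        # Fewer threads than cores: match the thread count.
        return app_threads
    # As many or more threads than cores: use the maximum number of cores.
    return available_cores


if __name__ == "__main__":
    for threads in (1, 4, 16):
        print(threads, "threads ->", roofline_core_count(threads, 8), "cores")
```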
3	Toggle the display between floating-point (FLOP) operations, integer (INT) operations, and mixed operations (floating-point and integer).
	If you collected Roofline with Callstacks: Enable the display of Roofline with Callstacks additions to the Roofline chart.
4	Display Roofline chart data from other Intel Advisor results or non-archived snapshots for comparison purposes.
	Use the drop-down toolbar to:
		Load a result/snapshot and display the corresponding filename in the Compared Results region.
		Clear a selected result/snapshot and move the corresponding filename to the Ready for comparison region. NOTE: Click a filename in the Ready for comparison region to reload the result/snapshot.
		Save the comparison itself to a file. NOTE: The arrowed lines showing the relationship among loops/functions do not reappear if you upload the comparison file.
	Click a loop/function dot in the current result to show the relationship (arrowed lines) between it and the corresponding loop/function dots in loaded results/snapshots.
5	Add visual indicators to the Roofline chart to make the interpretation of data easier, including performance limits and whether loops/functions are memory bound, compute bound, or both.
	Use the drop-down toolbar to:
		Show a vertical line from a loop/function to the nearest and topmost performance ceilings by enabling the Display roof rulers checkbox. To view the ruler, hover the cursor over a loop/function. Where the line intersects with each roof, labels display hardware performance limits for the loop/function.
		If you collected Roofline for All Memory Levels: Visually emphasize the relationships among displayed memory levels and roofs for a selected loop/function dot by enabling the Show memory level relationships checkbox.
		Color the roofline zones to make it easier to see if enclosed loops/functions are fundamentally memory bound, compute bound, or bound by compute and memory roofs by enabling the Show Roofline boundaries checkbox.
	The preview picture is updated as you select guidance options, allowing you to see how changes will affect the Roofline chart’s appearance. Click Apply to apply your changes, or Default to return the Roofline chart to its original appearance.
	Once you have a loop/function's dots highlighted, you can zoom and fit the Roofline chart to the dots for the selected loop/function by once again double-clicking the loop/function or pressing SPACE or ENTER with the loop/function selected. Repeat this action to return to the original Roofline chart view. To hide the labeled dots, select another loop/function, or double-click an empty space in the Roofline chart.
6	Roofline View Settings: Adjust the default scale setting to show:
		The optimal scale for each Roofline chart view
		A scale that accommodates all Roofline chart views
	Roofs Settings: Change the visibility and appearance of roofline representations (lines):
		Enable calculating roof values based on single-threaded benchmark results instead of multi-threaded.
		Click a Visible checkbox to show/hide a roofline.
		Click a Selected checkbox to change roofline appearance: display a roofline as a solid or a dashed line.
		Manually fine-tune roof values in the Value column to set hardware limits specific to your code.
	Loop Weight Representation: Change the appearance of loop/function weight representations (dots):
		Point Weight Calculation: Change the Base Value for a loop/function weight calculation.
		Point Weight Ranges: Change the Size, Color, and weight Range (R) of a loop/function dot. Click the + button to split a loop weight range in two. Click the - button to merge a loop weight range with the range below.
		Point Colorization: Color loop/function dots by weight ranges or by type (vectorized or scalar). You can also change the color of loops with no self time.
	You can save your Roofs Settings or Point Weight Representation configuration to a JSON file or load a custom configuration.
7	Zoom in and out using numerical values.
8	Click a loop/function dot to:
		Outline it in black.
		Display metrics for it.
		Display corresponding data in other window tabs.
	Right-click a loop/function dot or a blank area in the Roofline chart to perform more functions, such as:
		Further simplify the Roofline chart by filtering out (temporarily hiding a dot), filtering in (temporarily hiding all other dots), and clearing filters (showing all originally displayed dots).
		Copy data to the clipboard.
9	Show/hide the metrics pane:
		Review the basic performance metrics in the Point Info pane.
		If you collected the Roofline for All Memory Levels: Review how efficiently the loop/function uses cache and what memory level bounds the loop/function in the Memory Metrics pane.
10	Display the number and percentage of loops in each loop weight representation category.

This tutorial uses prepackaged analysis results from this point forward because of tutorial duration and hardware dependency considerations.
