Intel® Advisor User Guide

ID 766448
Date 11/07/2023
Public

Loop Markup to Minimize Analysis Overhead

Issue

Running your target application with Intel® Advisor can take substantially longer than running it without Intel® Advisor. The overhead added to your application execution time depends on the accuracy level and the analyses you choose for a perspective. For example:

Analysis                         Target application runtime with Intel® Advisor compared to runtime without Intel® Advisor
Survey                           1.1x longer
Characterization                 2 - 55x longer
Dependencies                     5 - 100x longer
MAP (Memory Access Patterns)     5 - 20x longer
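To put these factors in perspective, the following sketch multiplies a baseline runtime by an overhead factor. The helper name estimate_runtime and the 60-second baseline are illustrative assumptions; the factors are the ones listed above.

```shell
# Hypothetical helper: estimate_runtime BASE_SECONDS FACTOR
# Prints the estimated runtime under Intel Advisor, rounded to whole seconds.
estimate_runtime() {
    awk -v base="$1" -v factor="$2" 'BEGIN { printf "%.0f\n", base * factor }'
}

# Example: a 60-second application under a Dependencies analysis (5 - 100x).
echo "Dependencies: $(estimate_runtime 60 5) to $(estimate_runtime 60 100) seconds"
```

For such an application, an unrestricted Dependencies run could take between 5 and 100 minutes, which is why restricting the analysis to a few loops matters.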

Solutions

Use the following techniques to skip uninteresting loops and analyze only interesting loops.

Select Loops by ID

Goal: Minimize collection overhead.

Applicable analyses: Characterization with Trip Counts and FLOP collection enabled, Dependencies, Memory Access Patterns.

Use when...

  • You want to perform a deeper analysis on only a few loops.

  • CLI environment: You cannot identify source file/line numbers, such as when you are analyzing a target application for which you do not have access to source code.

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command-line options, add them after the executable name.

Prerequisites:

  1. Run a Survey analysis.

  2. advisor CLI environment: Identify the loop IDs for the loops of interest.

    advisor --report=survey --project-dir=./advi_results -- ./myApplication

    In the report, the first column lists the loop IDs.

TIP:

Intel® Advisor reports tend to be very wide. Do one of the following to generate readable reports:

  • Set your console width appropriately to avoid line wrapping.

  • Pipe your report using the appropriate truncation command if you care only about the first few report columns.
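As one way to apply the second tip, the sketch below keeps only the first characters of each report line using the standard cut utility. The helper name truncate_report and the 120-character default width are illustrative assumptions, not part of Intel® Advisor.

```shell
# Hypothetical helper: truncate_report [WIDTH]
# Reads text on stdin and keeps only the first WIDTH characters of each line.
truncate_report() {
    width="${1:-120}"   # default width is an arbitrary choice
    cut -c "1-${width}"
}

# Usage (requires Intel Advisor and an existing Survey result):
#   advisor --report=survey --project-dir=./advi_results -- ./myApplication | truncate_report 120
```

Because the loop IDs are in the first column, a narrow width is usually enough when you only need the IDs.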

After performing the prerequisites, do one of the following:

  • For Vectorization and CPU Roofline: Mark the loop(s) of interest by enabling the associated checkbox on the Survey Report.

    Then run a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis.

  • For Offload Modeling: Go to Project Properties > Performance Modeling and enter the CLI action option --select=<string> in the Other parameters field. For example, --select=5,10,12.

  • Mark the loop(s) of interest using the CLI action option --select=<string> (recommended) or --mark-up-list=<string> when running a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis. For example, with the --select option:

    advisor --collect=tripcounts --flop --project-dir=./advi_results --select=5,10,12 -- ./myApplication

    The selection applies to this --collect command only. To analyze the same loops with Dependencies or Memory Access Patterns, pass the same --select option to that --collect command.

NOTE:

There are two ways to select loops in the CLI environment:

  • When used with the --collect action, the advisor CLI action options --mark-up-list=<string> and --select=<string> only simulate enabling a GUI checkbox. The selection is active only for the duration of that --collect command.

  • When used with the advisor CLI action --mark-up-loops, the same options actually enable a GUI checkbox. The selection remains active beyond the duration of the --mark-up-loops command and applies to all downstream analyses, such as Characterization with Trip Counts and FLOP collection enabled, Dependencies, and Memory Access Patterns.
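The difference between the two selection modes can be sketched as the following script. It is a minimal illustration that reuses the loop IDs 5,10,12 from the example above. The helper names run, select_per_collect, and select_persistent and the DRY_RUN variable are hypothetical and not part of Intel® Advisor. By default, the script only prints the commands.

```shell
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
PROJECT_DIR="./advi_results"   # project directory used in the commands above
APP="./myApplication"          # replace with your executable

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# One-shot selection: --select must be repeated for every --collect command.
select_per_collect() {
    run advisor --collect=tripcounts --flop --project-dir="$PROJECT_DIR" --select="$1" -- "$APP"
    run advisor --collect=dependencies --project-dir="$PROJECT_DIR" --select="$1" -- "$APP"
}

# Persistent selection: mark the loops once; later analyses reuse the marks.
select_persistent() {
    run advisor --mark-up-loops --select="$1" --project-dir="$PROJECT_DIR" -- "$APP"
    run advisor --collect=dependencies --project-dir="$PROJECT_DIR" -- "$APP"
}

select_persistent 5,10,12
```

The persistent form is convenient when several analyses target the same loops; the one-shot form leaves the project markup untouched.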

Select Loops by Source File/Line Number

Goal: Minimize collection overhead.

Applicable analyses: Characterization with Trip Counts and FLOP collection enabled, Dependencies, Memory Access Patterns.

Use when...

  • You want to perform a deeper analysis on only a few loops.

  • CLI environment: You are analyzing a target application for which you have access to source code and can identify source file/line numbers.

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command-line options, add them after the executable name.

Prerequisites:

  1. Run a Survey analysis.

  2. advisor CLI environment: If necessary, identify the source file and line number for the loops of interest.

    advisor --report=survey --project-dir=./advi_results -- ./myApplication

After performing the prerequisites, do one of the following:

  • For Vectorization and CPU Roofline: Mark the loop(s) of interest by enabling the associated checkbox on the Survey report.

    Then run a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis.

  • For Offload Modeling: Go to Project Properties > Performance Modeling and enter the CLI action option --select=<string> in the Other parameters field. For example, --select=foo.cpp:34,bar.cpp:192.

  • Mark the loop(s) of interest using the CLI action option --select=<string> (recommended) or --mark-up-list=<string> for a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis. For example, with the --select option:

    advisor --collect=tripcounts --flop --project-dir=./advi_results --select=foo.cpp:34,bar.cpp:192 -- ./myApplication

  • Mark the loop(s) of interest using the advisor CLI action --mark-up-loops and action option --select=<string>. For example:

    advisor --mark-up-loops --select=foo.cpp:34,bar.cpp:192 --project-dir=./advi_results -- ./myApplication

    Then run a Characterization with Trip Counts and FLOP collection enabled, Dependencies, or Memory Access Patterns analysis.

NOTE:
  • There is essentially no difference between selecting loops by ID and selecting loops by source file/line in the GUI environment. The difference is in the advisor CLI environment:

    • The advisor CLI action option --mark-up-list=<string> only simulates enabling a GUI checkbox; therefore, the selection persists only for the duration of the --collect command.

    • The advisor CLI action --mark-up-loops with the action option --select=<string> actually enables a GUI checkbox; therefore, the selection persists beyond the duration of the --mark-up-loops command and applies to downstream analyses, such as Characterization with Trip Counts and FLOP collection enabled, Dependencies, and Memory Access Patterns.

  • If you use the --mark-up-loops CLI action to mark up loops, you can add or remove source file/line numbers for a subsequent analysis run using the advisor CLI action options --append=<string> and --remove=<string>, respectively.

Select Loops by Criteria

Goal: Minimize collection overhead.

Applicable analyses: Dependencies, Memory Access Patterns.

Use when you want to perform a deeper analysis on loops chosen by criteria instead of by human input, such as when you run Intel® Advisor with a collection preset or from automated scripts.

To implement in the advisor CLI environment, either run commands similar to the following one at a time from the command line, or put them in a script and run the script to execute them automatically. Use the --select (recommended) or --loops option to select loops by criteria.

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command-line options, add them after the executable name.

For example, to analyze loop-carried dependencies in loops/functions that have the Assumed dependency present issue, use one of the following:

  • Example 1:

    advisor --collect=survey --project-dir=./advi_results -- ./myApplication
    advisor --collect=dependencies --loops="scalar,has-issue" --project-dir=./advi_results -- ./myApplication
    
  • Example 2:

    advisor --collect=survey --project-dir=./advi_results -- ./myApplication
    advisor --collect=dependencies --select="scalar,has-issue" --project-dir=./advi_results -- ./myApplication
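For automated runs, the two commands can be wrapped in a small script. The following is a minimal sketch, not an official workflow: the function name analyze_by_criteria, the run helper, and the DRY_RUN variable are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: analyze_by_criteria CRITERIA PROJECT_DIR APPLICATION [APP_ARGS...]
analyze_by_criteria() {
    criteria="$1"
    project_dir="$2"
    shift 2
    # Survey first; run Dependencies only if Survey succeeded.
    run advisor --collect=survey --project-dir="$project_dir" -- "$@" &&
    run advisor --collect=dependencies --select="$criteria" --project-dir="$project_dir" -- "$@"
}

analyze_by_criteria "scalar,has-issue" ./advi_results ./myApplication
```

Passing the criteria as a parameter keeps the script reusable for other selection criteria that your Intel® Advisor version supports.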
    

Select Loops by Markup Algorithm

Goal: Minimize collection overhead.

Applicable analyses: Characterization with Trip Counts and FLOP collection enabled, Dependencies, Memory Access Patterns.

NOTE:
This is only applicable to the Offload Modeling perspective.

Use --select=r:markup=<algorithm> when you want to perform a deeper analysis on loops chosen by a pre-defined markup algorithm based on a programming model used and/or estimated offload profitability.

If you analyze an application that runs on a CPU, use the gpu_generic algorithm. This algorithm selects all potentially profitable loops/functions for additional analyses to collect more data and make sure they can be safely offloaded.

If you analyze code regions that are already offloaded and use a specific programming model, use one of the following algorithms:

  • omp - Select OpenMP* loops.

  • dpcpp - Select SYCL loops.

  • ocl - Select OpenCL™ loops.

  • daal - Select Intel® oneAPI Data Analytics Library loops.

  • tbb - Select Intel® oneAPI Threading Building Blocks loops.

Note: In the commands below, replace myApplication with the path and name of your application executable before running a command. If your application requires additional command-line options, add them after the executable name.

For example, to run Offload Modeling and analyze potentially profitable code regions in detail:

  • Example 1. Use the --select=r:markup=<algorithm> option with the --collect action to select loops only for that specific analysis.

    advisor --collect=survey --project-dir=./advi_results --static-instruction-mix -- ./myApplication
    advisor --collect=tripcounts --project-dir=./advi_results --flop --cache-simulation=single  --target-device=xehpg_512xve --stacks --data-transfer=light  -- ./myApplication
    advisor --collect=dependencies --filter-reductions --loop-call-count-limit=16 --select markup=gpu_generic --project-dir=./advi_results -- ./myApplication
    advisor --collect=projection --project-dir=./advi_results

  • Example 2. Use the --select=r:markup=<algorithm> option with the --mark-up-loops action in a separate step to select loops for all analyses executed after this command.

    advisor --collect=survey --project-dir=./advi_results --static-instruction-mix -- ./myApplication
    advisor --collect=tripcounts --project-dir=./advi_results --flop --cache-simulation=single  --target-device=xehpg_512xve --stacks --data-transfer=light  -- ./myApplication
    advisor --mark-up-loops --project-dir=./advi_results --select markup=gpu_generic -- ./myApplication
    advisor --collect=dependencies --filter-reductions --loop-call-count-limit=16 --project-dir=./advi_results -- ./myApplication
    advisor --collect=projection --project-dir=./advi_results

NOTE:
Currently, there is no GUI equivalent of the markup algorithms. The gpu_generic algorithm is used by default.
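The examples above use gpu_generic. To try another algorithm from the list, such as omp for already offloaded OpenMP* code, the Example 1 sequence can be parameterized as in the following sketch. The function name offload_model, the run helper, and the DRY_RUN variable are hypothetical; the advisor options are copied from Example 1. The script only prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: offload_model [ALGORITHM] [PROJECT_DIR] [APPLICATION]
offload_model() {
    algorithm="${1:-gpu_generic}"
    project_dir="${2:-./advi_results}"
    app="${3:-./myApplication}"

    # Each step runs only if the previous one succeeded.
    run advisor --collect=survey --project-dir="$project_dir" --static-instruction-mix -- "$app" &&
    run advisor --collect=tripcounts --project-dir="$project_dir" --flop --cache-simulation=single --target-device=xehpg_512xve --stacks --data-transfer=light -- "$app" &&
    run advisor --collect=dependencies --filter-reductions --loop-call-count-limit=16 --select "markup=$algorithm" --project-dir="$project_dir" -- "$app" &&
    run advisor --collect=projection --project-dir="$project_dir"
}

# Example: select OpenMP* loops in code that is already offloaded.
offload_model omp
```

Calling offload_model without arguments falls back to gpu_generic, matching the examples above.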