Application Performance Snapshot User Guide for Linux* OS

ID 772048
Date 10/31/2024
Public

Analyzing Applications

Prerequisites

(Optional) Use the following software to get an advanced metric set when running Application Performance Snapshot:

  • Recommended compilers: Intel® C++ Compiler Classic, Intel® oneAPI DPC++/C++ Compiler, Intel® Fortran Compiler Classic, or Intel® Fortran Compiler. Although you can use other compilers for this purpose, information about OpenMP* imbalance is available only from the Intel OpenMP library.

Set up Application Performance Snapshot before you begin to analyze applications. In a terminal window, run this command to set the environment variables:

source /opt/intel/oneapi/vtune/latest/apsvars.sh

You can run apsvars.sh with the --help option to see a list of available setup options.

Analyzing Shared Memory Applications

  1. Run a data collection on your application using Application Performance Snapshot:

    $ aps <my app> [<app parameters>]

    where <my app> is the path to your application and <app parameters> are your application parameters.

  2. After the analysis completes, a report appears in the command window. You can also open an HTML report with the same information in a supported browser; the path to the HTML report is shown in the command window. For example:

     firefox ./aps_report_01012017_1234.html &

  3. Analyze the data shown in the report. Hover over a metric in the HTML report for more information.

  4. Determine appropriate next steps based on result analysis. Common next steps may include application tuning or using another performance analysis tool for more detailed information, such as Intel® VTune™ Profiler or Intel® Advisor.
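If several analysis runs have accumulated in the same directory, a modification-time sort picks out the newest HTML report. This is a minimal sketch: the report file names below are fabricated for illustration (following the aps_report_<date>_<time>.html naming pattern shown above), and the aps tool itself is not invoked here.

```shell
# Work in a scratch directory with two fabricated report files.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch aps_report_01012017_1234.html
sleep 1
touch aps_report_02012017_0900.html

# "ls -t" sorts by modification time, newest first.
newest=$(ls -t aps_report_*.html | head -n 1)
echo "$newest"
```

You could then pass "$newest" to a browser, as in the firefox example above.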

Analyzing MPI Applications

  1. Run the following command to collect data about your MPI application:

    $ <mpi launcher> <mpi parameters> aps <my app> [<app parameters>]

    where:

    • <mpi launcher> is an MPI job launcher, such as mpirun, srun, or aprun.

    • <mpi parameters> are the MPI launcher parameters.

      NOTE:

      aps must be the last <mpi launcher> parameter.

    • <my app> is the path to your application.

    • <app parameters> are your application parameters.

    Application Performance Snapshot launches the application and runs the data collection. After the analysis completes, an aps_result_<date> directory is created.

  2. Run this command to complete the analysis:

    $ aps --report aps_result_<date>

    After the analysis completes, a report appears in the command window. You can also open an HTML report with the same information in a web browser.

  3. Analyze the data shown in the report. Hover over a metric in the HTML report for more information.

  4. Use the analysis result to identify your next steps.

    You can continue the analysis by tuning with one of these tools:

    • The mpitune utility
    • Intel® Trace Analyzer and Collector
    • Intel® VTune™ Profiler

To learn about additional profiling capabilities, see Detailed MPI Analysis.

Storage Formats for MPI Analysis

When you use Application Performance Snapshot to analyze MPI applications, you can save results in two formats:

  • Simple Format
  • Compact Format
This table explains the differences between the two formats.

Purpose

  • Simple Format: Use this format when the number of application ranks is small (fewer than 1000) or the job is limited to a single node.
  • Compact Format: Use this format when the number of nodes and ranks per node is large enough to slow down report generation in Application Performance Snapshot significantly.

Usage

  • Simple Format: This is the default storage format for MPI analyses. To enable it explicitly, do one of the following:
    • Use --storage-format=simple in the analysis command.
    • Set this environment variable: APS_STORAGE=simple
  • Compact Format: This format must be enabled explicitly for MPI analyses. To enable it, do one of the following:
    • Use --storage-format=compact in the analysis command.
    • Set this environment variable: APS_STORAGE=compact

Requirements

  • Simple Format: None.
  • Compact Format:
    • More than one MPI rank runs per node.
    • A writable temporary folder is available. Depending on availability, the folder is chosen in this priority order:
      • A folder specified by the APS_TMP_FOLDER environment variable or by aps ... --tmp-dir=<path>
      • A folder specified by the TMPDIR environment variable
      • A folder specified by the TMP environment variable
      • The /tmp directory

    If these requirements are not fully met, the storage format falls back to the simple format.

Action

  • Simple Format: Saving trace data in this format creates a single trace file in the result directory for each MPI rank/process.
  • Compact Format: Each MPI rank/process writes its trace to a temporary folder that is local to the node. When all ranks on a node are finalized for the MPI application, one of the ranks merges the trace files from the temporary file system into a single file. This single file is then written to the specified result directory.

Analysis of Results

  • Simple Format: To analyze performance traces, use the aps-report or aps --report switches. You can use these switches to load traces collected by older versions of Application Performance Snapshot, if those traces were collected in the simple format.
  • Compact Format: To analyze performance traces, use the aps-report or aps --report switches. You can use these switches to load traces collected by older versions of Application Performance Snapshot, if those traces were collected in the compact format. Note that versions of Application Performance Snapshot older than 2025.0 cannot load traces collected in the compact format.
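The temporary-folder priority order for the compact format can be sketched as a small shell function. This is our own illustration of the documented order, not APS source code, and the --tmp-dir command-line option is omitted since only environment variables are visible to a shell sketch.

```shell
# Return the first usable temporary folder, following the documented
# priority: APS_TMP_FOLDER, then TMPDIR, then TMP, then /tmp.
pick_tmp_dir() {
  for d in "$APS_TMP_FOLDER" "$TMPDIR" "$TMP" /tmp; do
    # Skip unset/empty variables; require an existing, writable directory.
    if [ -n "$d" ] && [ -d "$d" ] && [ -w "$d" ]; then
      printf '%s\n' "$d"
      return 0
    fi
  done
  return 1   # no candidate found; APS would fall back to the simple format
}

# With none of the variables set, the function falls through to /tmp.
unset APS_TMP_FOLDER TMPDIR TMP
chosen=$(pick_tmp_dir)
echo "$chosen"
```

Setting APS_TMP_FOLDER to an existing writable directory would make it win over the later candidates.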

Analyze Python Applications with MPI

You can use Application Performance Snapshot to run data collections on Python applications that use MPI, such as PyTorch or other AI applications. To do this:

  1. Get the path to your MPI library:
    LD_DEBUG=libs mpirun -np 2 aps -c mpi python3 application.py 2> >(grep /libmpi)
         62663:       trying file=/lib/x86_64-linux-gnu/libmpich.so.12
         62664:       trying file=/lib/x86_64-linux-gnu/libmpich.so.12
         62663:     calling init: /lib/x86_64-linux-gnu/libmpich.so.12
         62664:     calling init: /lib/x86_64-linux-gnu/libmpich.so.12 
  2. Specify the path to the MPI library in your LD_PRELOAD command. For example:

    LD_PRELOAD=/lib/x86_64-linux-gnu/libmpich.so.12 mpirun -np 2 aps -c mpi python3 application.py

where application.py is the Python application you want to profile.
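The LD_DEBUG=libs technique in step 1 works with any dynamically linked executable, not just MPI programs. As a sketch, this runs it against /bin/true (assuming a glibc-based Linux system) so you can see the kind of loader output that the grep in step 1 filters:

```shell
# LD_DEBUG=libs makes the glibc dynamic loader trace library searches and
# initialization on stderr; 2>&1 routes that to stdout for grep.
init_lines=$(LD_DEBUG=libs /bin/true 2>&1 | grep "calling init")
echo "$init_lines" | head -n 3
```

On a glibc system this prints "calling init:" lines naming each shared library as it is initialized; in step 1 the same trace is filtered for /libmpi to find the MPI library path to pass to LD_PRELOAD.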