Intel® VTune™ Profiler

User Guide

ID 766319
Date 12/20/2024
Public

OpenSHMEM* Code Analysis with Fabric Profiler

On Linux systems, analyze the runtime behavior of OpenSHMEM or Intel® SHMEM applications with Fabric Profiler (preview feature).

NOTE:

This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases.

Fabric Profiler is a performance analysis application that you use to profile OpenSHMEM or Intel® SHMEM code. The application collects and displays diagnostics, trace files, and runtime information for these types of code. Fabric Profiler runs on Linux* platforms only.

Get Started with Fabric Profiler

The Fabric Profiler application has two components:

  • The data collector monitors the behavior of the application and the network while the OpenSHMEM application runs.

  • The analyzer is a collection of tools that run after the OpenSHMEM application has finished. These tools display profiling results with interactive features that help you explore communication-centric behavior.

Fabric Profiler operates in two modes:

  • Use the OpenSHMEM mode to profile applications based on Sandia OpenSHMEM (SoS).

  • Use the Intel® SHMEM mode to profile OpenSHMEM applications together with a subset of the Intel® SHMEM API. If your application uses any part of the Intel® SHMEM API, you must run Fabric Profiler in this mode.

Access Fabric Profiler

The Fabric Profiler package is bundled with the Intel® VTune™ Profiler installation package. You can find Fabric Profiler in the vtune/<vtune_profiler_version>/fabric_profiler directory.

In addition to the Fabric Profiler application, this directory contains:

  • Product documentation
  • Examples
  • Sample trace files

Install Fabric Profiler

To install Fabric Profiler, you set up the data collector and the analyzer components.

Set Up the Data Collector

The Fabric Profiler data collector is implemented as a library that intercepts the application's OpenSHMEM calls and/or Intel® SHMEM host calls. The data collector also monitors network activity and writes this information to binary trace files.

Prerequisites:

  • Unzip the data collector package.
  • Set the ESP_ROOT environment variable to point to the location where you unzipped the data collector.
  • Install these libraries:
    • Fabric Profiler uses PAPI to gather system metrics at runtime. To add PAPI to your environment, run module load papi. You can also download PAPI from https://icl.utk.edu/papi/ and build it.

    • Fabric Profiler uses the Libfabric library to access the fabric used by the OpenSHMEM code. Libfabric must be available on the cluster with CXI provider support. To add it to your environment, run module load libfabric. For more information, see the Libfabric Programmer's Manual.

    • To track portions of the Intel® SHMEM API, Fabric Profiler uses Intel® Pin. Download Intel® Pin version 3.28 or newer.

Set Up the Analyzer

The Fabric Profiler analyzer is a collection of MATLAB* programs that run in the MATLAB runtime environment. These programs read trace files and display results.

Prerequisite:

To set up the analyzer, you need the MATLAB Runtime. Download it from https://www.mathworks.com/products/compiler/mcr.html and select version R2021b (9.11) or newer.

The Fabric Profiler analyzer executable is located at $ESP_ROOT/bin/analyzer/fpro_analyzer.

Fabric Profiler Workflow

In the Fabric Profiler workflow, you perform these steps:

  1. Build and run an application using the data collector.
  2. Generate trace files.
  3. View trace files using the analyzer.

Step 1: Build and Run an Application

Once you have installed Fabric Profiler on a Linux machine, complete these steps to build and run an application. This procedure covers both OpenSHMEM and Intel® SHMEM applications.

  1. Define Fabric Profiler regions in the source code so that named regions appear in the analyzer displays, which makes analysis easier.

    1. Include the header file esp.h.
    2. Mark regions of interest:
       esp_enter("<region_A name>");
       /* code of interest for region A */
       esp_exit("<region_A name>");
       Use the same region name in the enter and exit calls.
    3. Rebuild the application.
    NOTE:
    You cannot nest or interleave regions.
  2. Build an application with Fabric Profiler instrumentation.

    Make sure that you have set the required environment variables. To set them, edit the setMyVars.sh script as needed.

    • OpenSHMEM Applications:

      Fabric Profiler uses LD_PRELOAD at runtime to link in the data collector library before the SHMEM library. If you did not add Fabric Profiler regions to your source code, you do not need to rebuild your application.

      For example, to build the sanity application, run make in the $ESP_ROOT/examples/SHMEM/sanity directory.

    • Intel® SHMEM Applications:

      For Intel® SHMEM applications, Fabric Profiler uses a combination of Intel® Pin and LD_PRELOAD at runtime to link in the data collector library before the SHMEM library. If you did not add Fabric Profiler regions to your source code, you do not need to rebuild your application.

  3. Run the application.

    Use the $ESP_ROOT/bin/collector/fpro script, which adds the data collector library to the LD_PRELOAD variable. Because the data collector library uses the PAPI library, you may need to run module load papi or add PAPI to your library paths.

    • OpenSHMEM Applications:

      For example, to run fpro on the sanity application,

      1. Go to the $ESP_ROOT/bin/collector directory.

      2. Run

        ./fpro -j pbs -r "R1234" -n 1 -p 2 -l 1 $ESP_ROOT/examples/SHMEM/sanity/sanity

        The -l 1 option indicates that sanity is an OpenSHMEM application. Replace R1234 with your own PBS reservation number.

    • Intel® SHMEM Applications:

      For example, to run fpro on the Intel® SHMEM sycl_sanity application,

      1. Go to the $ESP_ROOT/bin/collector directory.

      2. Run

        ./fpro -j pbs -r "R1234" -n 1 -p 2 -l 0 $ESP_ROOT/examples/iSHMEM/sycl_sanity/sycl_sanity

        The -l 0 option indicates that sycl_sanity is an Intel® SHMEM application.
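The region markers from step 1 can stay in the source permanently and be compiled in only when you profile. The following C sketch assumes that esp.h declares esp_enter and esp_exit taking a region-name string, as in the region example in step 1; the USE_FPRO macro, the FPRO_REGION_ENTER and FPRO_REGION_EXIT wrappers, and the run_step function are hypothetical names for illustration, not part of Fabric Profiler. It shows two sequential regions that are neither nested nor interleaved, each closed with the same name it was opened with.

```c
/* Hypothetical switch: build with -DUSE_FPRO to emit Fabric Profiler
 * region markers; without it, the markers compile to no-ops. */
#ifdef USE_FPRO
#include "esp.h" /* Fabric Profiler region API: esp_enter, esp_exit */
#define FPRO_REGION_ENTER(name) esp_enter(name)
#define FPRO_REGION_EXIT(name) esp_exit(name)
#else
#define FPRO_REGION_ENTER(name) ((void)0)
#define FPRO_REGION_EXIT(name) ((void)0)
#endif

#include <stddef.h>

/* Sums the local values for one step. The two regions run one after the
 * other: they are never nested or interleaved, and each exit call repeats
 * the name passed to the matching enter call. */
double run_step(const double *x, size_t n)
{
    FPRO_REGION_ENTER("compute");
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += x[i];
    FPRO_REGION_EXIT("compute");

    FPRO_REGION_ENTER("exchange");
    /* In an OpenSHMEM program, this step's communication (for example,
     * shmem_putmem followed by shmem_barrier_all) would go here. */
    FPRO_REGION_EXIT("exchange");

    return total;
}
```

To get named regions in the analyzer, compile with -DUSE_FPRO plus an -I option for the directory that contains esp.h; a plain build compiles the markers away and needs no Fabric Profiler headers.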

Step 2: Generate Trace Files

The data collector monitors network activity and the execution of your application. When the execution completes, the data collector writes its output to the trace files. This phase can add 10% to your wall time.
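As a rough planning aid for batch reservations, the 10% figure translates into the following wall-time budget, where the symbols are introduced here for illustration: $T_{\text{app}}$ is the run time without profiling and $T_{\text{wall}}$ is the time to reserve.

```latex
T_{\text{wall}} \approx 1.10\,T_{\text{app}}, \qquad
T_{\text{app}} = 60\ \text{min} \;\Rightarrow\; T_{\text{wall}} \approx 66\ \text{min}
```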

To generate trace files,

  1. Check the application output to confirm that the data collector instrumented the code successfully:

    1. Make sure that the ESP_VERBOSITY_LEVEL environment variable is set to a value greater than 0.

    2. When the application calls shmem_init (OpenSHMEM applications) or ishmem_init (Intel® SHMEM applications), the Fabric Profiler start banner displays.

    3. When the application calls shmem_finalize (OpenSHMEM applications) or ishmem_finalize (Intel® SHMEM applications), the Fabric Profiler stop banner displays.

  2. The fpro script merges the trace files, using these tools in sequence:

    1. mergeFuncFile
    2. mergeProfileFile
    3. mergePutFile
  3. Copy the merged trace files from the root level of the traces directory to the machine where you have installed the analyzer.
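The checks in this step can also be scripted before you copy the traces. The following C sketch is a set of hypothetical helpers (none of these function names come from Fabric Profiler): fpro_verbosity_enabled tests the ESP_VERBOSITY_LEVEL condition, assuming the collector reads the variable as a base-10 integer, and fpro_count_missing_traces checks that the five trace files listed under Contents of Trace Files exist for a given prefix, assuming the merged files keep those names.

```c
#define _POSIX_C_SOURCE 200809L /* expose POSIX declarations such as access() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Suffixes of the five trace files listed under "Contents of Trace Files". */
static const char *const fpro_trace_suffixes[] = {
    ".uc1.func", ".uc1.hfi", ".uc1.profile", ".uc1.put", ".uc1.ev.txt"
};
#define FPRO_NUM_TRACE_FILES \
    (sizeof fpro_trace_suffixes / sizeof fpro_trace_suffixes[0])

/* Returns 1 if ESP_VERBOSITY_LEVEL holds a base-10 integer greater than 0
 * (assumed parsing), otherwise 0. */
int fpro_verbosity_enabled(void)
{
    const char *value = getenv("ESP_VERBOSITY_LEVEL");
    if (value == NULL || *value == '\0')
        return 0;
    char *end;
    long level = strtol(value, &end, 10);
    return *end == '\0' && level > 0;
}

/* Writes "<prefix><suffix>" for trace file i into buf.
 * Returns 0 on success, -1 on a bad argument or truncation. */
int fpro_trace_path(char *buf, size_t len, const char *prefix, size_t i)
{
    if (prefix == NULL || strlen(prefix) == 0 || i >= FPRO_NUM_TRACE_FILES)
        return -1;
    int n = snprintf(buf, len, "%s%s", prefix, fpro_trace_suffixes[i]);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns how many of the five expected trace files are missing or
 * unreadable for the given prefix (0 means the set looks complete). */
int fpro_count_missing_traces(const char *prefix)
{
    char path[4096];
    int missing = 0;
    for (size_t i = 0; i < FPRO_NUM_TRACE_FILES; i++) {
        if (fpro_trace_path(path, sizeof path, prefix, i) != 0 ||
            access(path, R_OK) != 0)
            missing++;
    }
    return missing;
}
```

For example, with a hypothetical prefix such as traces/myrun, a return value of 0 from fpro_count_missing_traces would indicate that all five files are present and readable.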

You can now use the analyzer to view trace files.

Step 3: View Trace Files Using the Analyzer Suite

Fabric Profiler provides a suite of five analyzers that help you read trace files. All of them are in the $ESP_ROOT/bin/analyzer directory. The analyzers are:

  • ba - Barrier analyzer

  • fbla - Fabric backlog analyzer

  • la - Fabric latency analyzer

  • msa - Message straggler analyzer

  • r - An HTML report that contains a summary of all analyzer results.

Access the Analyzer Set

Access all of these analyzers through the fpro_analyzer executable in the $ESP_ROOT/bin/analyzer/ directory. You can run the executable from the command prompt.

Command Line Options

To access the help menu, run:

$ ./fpro_analyzer --help
The fpro_analyzer command accepts these forms:

  • fpro_analyzer --help
    Display usage.
  • fpro_analyzer --version | -v | -V
    Display version and build information.
  • fpro_analyzer (no arguments)
    Start the fabric backlog analyzer.
  • fpro_analyzer {ba|fbla|la|msa|r}
    Start the ba, fbla, la, msa, or r analyzer.
  • fpro_analyzer <trace file>
    Open the fabric backlog analyzer with the specified trace file.
  • fpro_analyzer {ba|fbla|la|msa|r} <trace file>
    Open the selected analyzer with the specified trace file.
  • fpro_analyzer {ba|fbla|la|msa|r} <fabric select> <trace file>
    Open the selected analyzer for the selected fabric with the specified trace file. For <fabric select>, specify Cray-Slingshot11 (or 1) or Cray-Aries (or 2).

You can specify the trace file either as the full path to a trace file or as the directory that contains the traces. For example, both of these commands are valid:

fpro_analyzer fbla /path/to/traces/
fpro_analyzer fbla /path/to/traces/my_trace.uc1.put

Here are some more examples that use the options described in this section:

  • Run the fpro_analyzer executable and start the fabric backlog analyzer:
    $ ./fpro_analyzer
  • Run the fpro_analyzer executable and start the message straggler analyzer:
    $ ./fpro_analyzer msa
  • Run the fpro_analyzer executable and start the fabric latency analyzer to review traces from the Cray-Slingshot11 fabric:
    $ ./fpro_analyzer la Cray-Slingshot11 /path/to/traces/
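If you launch the analyzers from your own tooling, the argument forms in the option list can be assembled programmatically. The following C sketch builds an argument vector for the fpro_analyzer {ba|fbla|la|msa|r} [<fabric select>] <trace file> forms and rejects analyzer or fabric values that the option list does not include. The function name fpro_analyzer_argv is hypothetical, and the sketch does not cover --help, --version, or the single-argument forms.

```c
#include <stddef.h>
#include <string.h>

/* Analyzer names accepted as the first argument of fpro_analyzer. */
static const char *const fpro_analyzers[] = { "ba", "fbla", "la", "msa", "r" };

/* Returns 1 if name is one of ba, fbla, la, msa, r. */
static int fpro_is_analyzer(const char *name)
{
    for (size_t i = 0; i < sizeof fpro_analyzers / sizeof fpro_analyzers[0]; i++)
        if (strcmp(name, fpro_analyzers[i]) == 0)
            return 1;
    return 0;
}

/* Returns 1 for the documented fabric selections:
 * Cray-Slingshot11 (or 1) and Cray-Aries (or 2). */
static int fpro_is_fabric(const char *fabric)
{
    return strcmp(fabric, "Cray-Slingshot11") == 0 || strcmp(fabric, "1") == 0 ||
           strcmp(fabric, "Cray-Aries") == 0 || strcmp(fabric, "2") == 0;
}

/* Fills argv (at least 5 slots) with
 *   fpro_analyzer <analyzer> [<fabric>] <trace>
 * followed by a NULL terminator. Pass fabric as NULL to omit it.
 * Returns the argument count, or -1 if an argument is invalid. */
int fpro_analyzer_argv(const char *argv[5], const char *analyzer,
                       const char *fabric, const char *trace)
{
    if (analyzer == NULL || trace == NULL || !fpro_is_analyzer(analyzer))
        return -1;
    if (fabric != NULL && !fpro_is_fabric(fabric))
        return -1;
    int argc = 0;
    argv[argc++] = "fpro_analyzer";
    argv[argc++] = analyzer;
    if (fabric != NULL)
        argv[argc++] = fabric;
    argv[argc++] = trace;
    argv[argc] = NULL;
    return argc;
}
```

The resulting vector can be passed to execvp(args[0], (char *const *)args) from a process whose PATH includes $ESP_ROOT/bin/analyzer.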

Contents of Trace Files

When your application calls shmem_finalize or ishmem_finalize, the data collector writes five trace files that contain information about application behavior:

  • {trace-file-prefix}.uc1.func (binary)
    Information about every profiled SHMEM function call. Each process writes a separate function trace file. When the job completes, the individual function trace files are merged into a single file with the $ESP_ROOT/bin/collector/mergeFuncFile tool. The analyzers require this merged file.

  • {trace-file-prefix}.uc1.hfi (binary)
    While the SHMEM application runs, Fabric Profiler monitors the send and receive counters on the host fabric interface (HFI) card. The HFI file contains these time-stamped counter values.

  • {trace-file-prefix}.uc1.profile (binary)
    While the SHMEM application runs, Fabric Profiler monitors system performance counters and gathers system information, and writes this data into profile files. Each process writes a separate profile file. When the job completes, the individual profile trace files are merged into a single file with the $ESP_ROOT/bin/collector/mergeProfileFile tool. The analyzers require this merged file.

  • {trace-file-prefix}.uc1.put (binary)
    Fabric Profiler monitors the amount of data that each shmem_put call injects into the network and the destination node of each put operation. The put file contains these values. When the job completes, the individual put trace files are merged into a single file with the $ESP_ROOT/bin/collector/mergePutFile tool.

  • {trace-file-prefix}.uc1.ev.txt (text)
    The environment file lists all environment variables that were defined when the SHMEM application ran.
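The fabric backlog analyzer correlates the put file (data the application asked to inject) with the HFI file (data the interface actually sent). One plausible reading of injection backlog at a given time is the cumulative requested data minus the cumulative actual data up to that time. The following C sketch computes that difference on a common sampling grid; it assumes both inputs are already converted to bytes per interval on the same time base, and it does not reproduce fbla's actual algorithm, file parsing, or offset adjustment. The function name injection_backlog is hypothetical.

```c
#include <stddef.h>

/* Computes a backlog series from per-interval byte counts on a common grid:
 *   requested[i]: bytes the application asked to inject in interval i
 *   actual[i]:    bytes the HFI counters report as injected in interval i
 * backlog[i] = sum(requested[0..i]) - sum(actual[0..i]), clamped at zero
 * so that samples where actual runs ahead of requested report no backlog. */
void injection_backlog(const unsigned long long *requested,
                       const unsigned long long *actual,
                       size_t n, unsigned long long *backlog)
{
    unsigned long long req_total = 0, act_total = 0;
    for (size_t i = 0; i < n; i++) {
        req_total += requested[i];
        act_total += actual[i];
        backlog[i] = req_total > act_total ? req_total - act_total : 0;
    }
}
```

With requested bytes of 100, 50, 0, 0 and actual bytes of 40, 60, 30, 40, the sketch yields a backlog of 60, 50, 20, 0.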

Types of Analyzers
ba: Barrier Trace Analyzer

Purpose: Reads the function trace file and displays, for each processing element (PE), the wait time at every barrier call in the source code.

Available operations:

  • Take these measurements:
    • PE wait time
    • PE arrival time
    • Node wait density
    • PE percent Late
    • PE Outlier Late
  • Vary the threshold.
  • Restrict your results to a specific lexical occurrence (a particular source code line that contains a barrier).

fbla: Fabric Backlog Analyzer

Purpose: Reads the put trace file and correlates it with the HFI trace file to visualize the fabric backlog at any point in time.

Available operations:

  • Select Show Region Bounds and choose regions of interest. If the SHMEM code defines regions, the corresponding time spans are highlighted on the graph of network backlog over time.
  • Select an individual node to display its backlog.
  • View injection and/or ejection backlog:
    • Injection requested: data that the application sends to a different node
    • Injection actual: data that the host fabric interface (HFI) actually sends into the network
    • Ejection requested: data that the current node receives
    • Ejection actual: data actually received from the network, according to the HFI
  • Zoom and pan to bring areas into focus.
  • Try the offset adjustment modes.
  • Switch between toggle and rate displays.
  • Use the data cursor: click the widget first, and then click anywhere on the plot to see the data values at that point.

la: Fabric (Latency) Trace Analyzer

Purpose: Reads the function trace file and displays fabric latency for all instrumented SHMEM calls. The default display shows the composite PE wait time for all calls at each point in time. Processing trace files that contain hundreds of thousands of function calls can take several minutes.

Available operations:

  • Select individual function calls to display the latency hot spots for each call.
  • If the application defines Fabric Profiler regions, click View Regions and choose regions to highlight the time spans on the graph that correspond to those regions of code.
  • Switch to the communications matrix, which visualizes the volume of data sent from each PE to every other PE.
  • Use the zoom, pan, and data cursor widgets (under the File and Help menus) to drill into the displayed data.
  • Experiment with the threshold controls for frequency, high value, and low value.

msa: Message Straggler Analyzer

Purpose: Reads the function trace file and correlates its activity with the network activity recorded in the HFI trace file.

r: Analyzer Report

Purpose: Generates a non-interactive HTML report that summarizes a SHMEM application run. Use the File menu to select the profile trace file for the application run you want to report on. Generating the report can take several minutes. When it completes, the HTML report is saved in the same location as the profile trace file, with a matching file name.
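To make the barrier measurements concrete: in a single barrier instance, a PE that arrives early waits until the last PE arrives. The following C sketch computes per-PE wait times from arrival times under that simplified model (all PEs are released at the moment of the last arrival, and the barrier's own latency is ignored). It is an illustration with a hypothetical function name, not the ba analyzer's definition of wait time.

```c
#include <stddef.h>

/* For one barrier instance, given each PE's arrival time in seconds,
 * stores each PE's wait time in wait[] assuming every PE is released when
 * the last PE arrives. Returns the index of the last-arriving PE.
 * npes must be at least 1 for the result to be meaningful. */
size_t barrier_wait_times(const double *arrival, size_t npes, double *wait)
{
    size_t last = 0;
    for (size_t pe = 1; pe < npes; pe++)
        if (arrival[pe] > arrival[last])
            last = pe;
    for (size_t pe = 0; pe < npes; pe++)
        wait[pe] = arrival[last] - arrival[pe];
    return last;
}
```

For arrival times of 1.0, 3.0, and 2.0 seconds on PEs 0, 1, and 2, the sketch reports PE 1 as the last arriver and wait times of 2.0, 0.0, and 1.0 seconds.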