Intel® MPI Library Developer Guide for Linux* OS

ID 768728
Date 6/24/2024
Public



MPI Tuning

Intel® MPI Library provides the following tuning utilities:

Autotuner

Autotuner is the recommended utility for application-specific tuning.

If an application spends significant time in MPI collective operations, autotuning may improve its performance. The autotuner is easy to use, and its overhead is very low.

The autotuner's tuning scope is the I_MPI_ADJUST_<opname> family of environment variables, which select the algorithms used for MPI collective operations. The autotuner limits tuning to the current cluster configuration (fabric, number of ranks, number of ranks per node). It works while an application is running, so performance can potentially be improved just by enabling it. It is also possible to generate a new tuning file with collective operation algorithms adjusted to the application's needs; this file can then be passed to the Intel MPI Library through the I_MPI_TUNING_BIN environment variable, as shown in the sketch below.
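
For reference, a minimal sketch of an autotuner session follows. The application name, rank counts, and tuning file name are placeholders; the sketch assumes the autotuner is enabled with I_MPI_TUNING_MODE=auto and that the collected tuning data is stored with I_MPI_TUNING_BIN_DUMP.

    # Enable the autotuner for this run; collectives are tuned while the application runs
    export I_MPI_TUNING_MODE=auto
    mpirun -np 128 -ppn 64 ./my_app

    # Optionally dump the collected tuning data to a file (placeholder name)
    export I_MPI_TUNING_BIN_DUMP=./my_app_tuning.dat
    mpirun -np 128 -ppn 64 ./my_app

    # Reuse the generated file in later runs with the autotuner disabled
    unset I_MPI_TUNING_MODE
    unset I_MPI_TUNING_BIN_DUMP
    export I_MPI_TUNING_BIN=./my_app_tuning.dat
    mpirun -np 128 -ppn 64 ./my_app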

mpitune_fast

mpitune_fast is the recommended easy-to-use utility for cluster-wide tuning. It uses the autotuner internally, so its search space is also the collective operation algorithms. mpitune_fast iteratively launches the Intel MPI Benchmarks (IMB) with the options provided (for example, the scale of tuning and the collective operations to tune) and generates a file with tuning parameters for the cluster configuration. This file can be provided to the Intel MPI Library with the I_MPI_TUNING_BIN environment variable. mpitune_fast supports the Slurm* and LSF* workload managers and automatically detects the hosts allocated to the job. It can also validate new tuning files and generate CSV files with performance results, so you do not have to validate the tuning manually.

mpitune_fast finds the optimal I_MPI_ADJUST_<opname> values after a single IMB run for each collective operation.
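
A minimal cluster-wide tuning sketch follows. The host file path, the -f option, the generated file name, and the application name are illustrative assumptions; under Slurm* or LSF*, mpitune_fast detects the allocated hosts automatically.

    # Run the tuner across the allocated nodes (host file assumed for a manual launch)
    mpitune_fast -f ./hostfile

    # Apply the generated tuning file (illustrative name) to production runs
    export I_MPI_TUNING_BIN=./tuning_results.dat
    mpirun -np 128 -ppn 64 ./my_app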

Differences between the tuning utilities:

Parameter                             Autotuner    mpitune_fast
Low tuning overhead                   +            +
Ease of use                           +            +
Application tuning                    +            -
Microbenchmark tuning                 +            +
Tuning beyond collective operations   -            -