Intel® MPI Library Developer Reference for Linux* OS

ID 768732
Date 12/16/2022
Public



I_MPI_ADJUST Family Environment Variables

I_MPI_ADJUST_<opname>

Control collective operation algorithm selection.

Syntax

I_MPI_ADJUST_<opname>="<presetid>[:<conditions>][;<presetid>:<conditions>[...]]"

Arguments

<presetid> Preset identifier
>= 0 Set a number to select the desired algorithm. A value of 0 applies the basic logic of collective algorithm selection.
<conditions> A comma-separated list of conditions. An empty list selects all message sizes and process combinations
<l> Messages of size <l>
<l>-<m> Messages of size from <l> to <m>, inclusive
<l>@<p> Messages of size <l> and number of processes <p>
<l>-<m>@<p>-<q> Messages of size from <l> to <m> and number of processes from <p> to <q>, inclusive

Description

Set this environment variable to select the desired algorithm(s) for the collective operation <opname> under particular conditions. Each collective operation has its own environment variable and algorithms.

Environment Variables, Collective Operations, and Algorithms
Environment Variable Collective Operation Algorithms
I_MPI_ADJUST_ALLGATHER MPI_Allgather
  1. Recursive doubling
  2. Bruck's
  3. Ring
  4. Topology aware Gatherv + Bcast
  5. Knomial
I_MPI_ADJUST_ALLGATHERV MPI_Allgatherv
  1. Recursive doubling
  2. Bruck's
  3. Ring
  4. Topology aware Gatherv + Bcast
I_MPI_ADJUST_ALLREDUCE MPI_Allreduce
  1. Recursive doubling
  2. Rabenseifner's
  3. Reduce + Bcast
  4. Topology aware Reduce + Bcast
  5. Binomial gather + scatter
  6. Topology aware binomial gather + scatter
  7. Shumilin's ring
  8. Ring
  9. Knomial
  10. Topology aware SHM-based flat
  11. Topology aware SHM-based Knomial
  12. Topology aware SHM-based Knary
I_MPI_ADJUST_ALLTOALL MPI_Alltoall
  1. Bruck's
  2. Isend/Irecv + waitall
  3. Pair wise exchange
  4. Plum's
I_MPI_ADJUST_ALLTOALLV MPI_Alltoallv
  1. Isend/Irecv + waitall
  2. Plum's

The message size calculation rules for the collective operations are described in the following table, where "n/a" means that the corresponding interval <l>-<m> should be omitted.

NOTE:
The I_MPI_ADJUST_SENDRECV_REPLACE=2 ("Uniform") algorithm can be used only when the datatype and object count are the same across all ranks.

To get the maximum number (range) of presets available for each collective operation, use the impi_info command:

$ impi_info -v I_MPI_ADJUST_ALLREDUCE
I_MPI_ADJUST_ALLREDUCE
  MPI Datatype:
    MPI_CHAR
  Description:
    Control selection of MPI_Allreduce algorithm presets.
    Arguments
    <presetid> - Preset identifier
    range: 0-27          

Message Collective Functions
Collective Function Message Size Formula
MPI_Allgather recv_count*recv_type_size
MPI_Allgatherv total_recv_count*recv_type_size
MPI_Allreduce count*type_size
MPI_Alltoall send_count*send_type_size
MPI_Alltoallv n/a
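As an illustration of the formulas above, the message size for an MPI_Allgather receiving 1024 elements of MPI_DOUBLE (8 bytes each) per rank can be computed as follows (the counts here are hypothetical):

```shell
# Hypothetical values: 1024 received elements of MPI_DOUBLE (8 bytes each)
recv_count=1024
recv_type_size=8

# MPI_Allgather message size formula: recv_count * recv_type_size
msg_size=$((recv_count * recv_type_size))
echo "$msg_size"   # prints 8192
```

A call with this message size would match, for example, the condition range in I_MPI_ADJUST_ALLGATHER="3:0-8192", selecting the Ring algorithm.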

Examples

Use the following setting to select the second algorithm for the MPI_Reduce operation: I_MPI_ADJUST_REDUCE=2

Use the following setting to define the algorithms for the MPI_Reduce_scatter operation: I_MPI_ADJUST_REDUCE_SCATTER="4:0-100,5001-10000;1:101-3200;2:3201-5000;3"

In this case, algorithm 4 is used for message sizes from 0 to 100 bytes and from 5001 to 10000 bytes, algorithm 1 is used for message sizes from 101 to 3200 bytes, algorithm 2 is used for message sizes from 3201 to 5000 bytes, and algorithm 3 is used for all other message sizes.

I_MPI_ADJUST_<opname>_LIST

Syntax

I_MPI_ADJUST_<opname>_LIST=<presetid1>[-<presetid2>][,<presetid3>][,<presetid4>-<presetid5>]

Description

Set this environment variable to specify the set of algorithms to be considered by the Intel MPI runtime for a specified <opname>. This variable is useful in autotuning scenarios, as well as tuning scenarios where users would like to select a certain subset of algorithms.

NOTE:
Setting an empty string disables autotuning for the <opname> collective.
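For example, the autotuner can be restricted to a subset of MPI_Allreduce algorithms. The preset IDs below refer to the MPI_Allreduce table above; treat the particular subset as an illustrative sketch:

```shell
# Consider only presets 1 (recursive doubling), 2 (Rabenseifner's),
# and 7-9 (Shumilin's ring, ring, Knomial) for MPI_Allreduce
export I_MPI_ADJUST_ALLREDUCE_LIST=1,2,7-9
echo "$I_MPI_ADJUST_ALLREDUCE_LIST"
```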

I_MPI_COLL_INTRANODE

Syntax

I_MPI_COLL_INTRANODE=<mode>

Arguments

<mode>  Intranode collectives type
pt2pt Use only point-to-point communication-based collectives
shm Use shared memory collectives. This is the default value

Description

Set this environment variable to switch the intranode communication type for collective operations. If the application uses a large set of communicators, you can switch off SHM collectives to avoid memory overconsumption.
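For example, a job that creates many communicators could fall back to point-to-point intranode collectives. The launch line below is a placeholder (./myapp and the rank count are hypothetical):

```shell
# Switch off SHM-based intranode collectives to limit shared memory usage
export I_MPI_COLL_INTRANODE=pt2pt
# Hypothetical launch; ./myapp is a placeholder application
# mpiexec -n 64 ./myapp
echo "$I_MPI_COLL_INTRANODE"
```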

I_MPI_COLL_INTRANODE_SHM_THRESHOLD

Syntax

I_MPI_COLL_INTRANODE_SHM_THRESHOLD=<nbytes>

Arguments

<nbytes>  Define the maximum data block size processed by shared memory collectives
> 0 Use the specified size. The default value is 16384 bytes.

Description

Set this environment variable to define the size of the shared memory area available to each rank for data placement. Messages larger than this value are not processed by SHM-based collective operations and fall back to point-to-point based collective operations. The value must be a multiple of 4096.
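Because the value must be a multiple of 4096, a raised threshold can be checked before being exported. The chosen size below (double the 16384-byte default) is illustrative:

```shell
# Double the default 16384-byte SHM threshold
threshold=32768
# Sanity check: the value must be a multiple of 4096 before exporting it
[ $((threshold % 4096)) -eq 0 ] && export I_MPI_COLL_INTRANODE_SHM_THRESHOLD=$threshold
echo "$I_MPI_COLL_INTRANODE_SHM_THRESHOLD"
```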

I_MPI_COLL_EXTERNAL

Syntax

I_MPI_COLL_EXTERNAL=<arg>

Arguments

<arg>  Description
enable | yes | on | 1 Enable the external collective operations functionality using available collectives libraries.
disable | no | off | 0 Disable the external collective operations functionality. This is the default value.
hcoll Enable the external collective operations functionality using HCOLL library.

Description

Set this environment variable to enable external collective operations. To reach better performance, run the autotuner after enabling I_MPI_COLL_EXTERNAL; this process obtains the optimal collective settings.

To force external collective operations usage, use the following I_MPI_ADJUST_<opname> values: I_MPI_ADJUST_ALLREDUCE=24, I_MPI_ADJUST_BARRIER=11, I_MPI_ADJUST_BCAST=16, I_MPI_ADJUST_REDUCE=13, I_MPI_ADJUST_ALLGATHER=6, I_MPI_ADJUST_ALLTOALL=5, I_MPI_ADJUST_ALLTOALLV=5, I_MPI_ADJUST_SCAN=3, I_MPI_ADJUST_EXSCAN=3, I_MPI_ADJUST_GATHER=5, I_MPI_ADJUST_GATHERV=4, I_MPI_ADJUST_SCATTER=5, I_MPI_ADJUST_SCATTERV=4, I_MPI_ADJUST_ALLGATHERV=5, I_MPI_ADJUST_ALLTOALLW=2, I_MPI_ADJUST_REDUCE_SCATTER=6, I_MPI_ADJUST_REDUCE_SCATTER_BLOCK=4, I_MPI_ADJUST_IALLGATHER=5, I_MPI_ADJUST_IALLGATHERV=5, I_MPI_ADJUST_IGATHERV=3, I_MPI_ADJUST_IALLREDUCE=9, I_MPI_ADJUST_IALLTOALLV=2, I_MPI_ADJUST_IBARRIER=2, I_MPI_ADJUST_IBCAST=5, I_MPI_ADJUST_IREDUCE=4.
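For instance, to enable HCOLL and force its use for MPI_Allreduce and MPI_Bcast specifically, the presets from the list above can be combined as follows (whether HCOLL is actually used at run time depends on the library being installed):

```shell
# Enable external collectives through the HCOLL library
export I_MPI_COLL_EXTERNAL=hcoll
# Force the external-collectives presets for selected operations
export I_MPI_ADJUST_ALLREDUCE=24
export I_MPI_ADJUST_BCAST=16
echo "$I_MPI_COLL_EXTERNAL $I_MPI_ADJUST_ALLREDUCE $I_MPI_ADJUST_BCAST"
```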

For more information on HCOLL tuning, refer to NVIDIA* documentation.

I_MPI_COLL_DIRECT

Syntax

I_MPI_COLL_DIRECT=<arg>

Arguments

<arg> Description
on Enable direct collectives. This is the default value.
off Disable direct collectives.

Description

Set this environment variable to control direct collectives usage. Disable this variable to eliminate OFI* usage for intranode communications when using the shm:ofi fabric.

I_MPI_CBWR

Control the reproducibility of floating-point operation results across different platforms, networks, and topologies, given the same number of processes.

Syntax

I_MPI_CBWR=<arg>

Arguments

<arg> CBWR compatibility mode
0 None. Do not use CBWR in a library-wide mode. CNR-safe communicators may be created explicitly with MPI_Comm_dup_with_info. This is the default value.
1 Weak mode. Disable topology-aware collectives. The result of a collective operation does not depend on the rank placement. This mode guarantees reproducible results across different runs on the same cluster (independent of the rank placement).
2 Strict mode. Disable topology-aware collectives and ignore the CPU architecture and interconnect during algorithm selection. This mode guarantees reproducible results across different runs on different clusters (independent of the rank placement, CPU architecture, and interconnect).

Description

Conditional Numerical Reproducibility (CNR) provides controls for obtaining reproducible floating-point results for collective operations. With this feature, Intel MPI collective operations are designed to return the same floating-point results from run to run, given the same number of MPI ranks.

Control this feature with the I_MPI_CBWR environment variable in a library-wide manner, where all collectives on all communicators are guaranteed to have reproducible results. To control floating-point reproducibility in a more precise, per-communicator way, pass the {"I_MPI_CBWR", "yes"} key-value pair to the MPI_Comm_dup_with_info call.

NOTE:

Setting I_MPI_CBWR in a library-wide mode using the environment variable leads to a performance penalty.

CNR-safe communicators created using MPI_Comm_dup_with_info always work in the strict mode. For example:

MPI_Info hint;
MPI_Comm cbwr_safe_world, cbwr_safe_copy;
MPI_Info_create(&hint);
/* Request a CNR-safe communicator via the I_MPI_CBWR info key */
MPI_Info_set(hint, "I_MPI_CBWR", "yes");
MPI_Comm_dup_with_info(MPI_COMM_WORLD, hint, &cbwr_safe_world);
/* Duplicates of a CNR-safe communicator are also CNR-safe */
MPI_Comm_dup(cbwr_safe_world, &cbwr_safe_copy);
MPI_Info_free(&hint);

In the example above, both cbwr_safe_world and cbwr_safe_copy are CNR-safe. Use cbwr_safe_world and its duplicates to get reproducible results for critical operations.

Note that MPI_COMM_WORLD itself may be used for performance-critical operations without reproducibility limitations.