Intel® MPI Library Developer Guide for Linux* OS

ID 768728
Date 10/31/2024
Public

OFI* Providers Support

Intel® MPI Library supports the mlx, tcp, psm2, psm3, verbs, and RxM OFI* providers. Each OFI provider is built as a separate dynamic library so that a single libfabric* library can run on top of different network adapters.

Additionally, Intel MPI Library supports the efa provider, which is not part of the Intel® MPI Library package and is supplied by the AWS EFA installer. See the efa section below for more details.

NOTE:
Use the environment variable FI_PROVIDER to select a provider. Set the FI_PROVIDER_PATH environment variable to specify the path to provider libraries.
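As a minimal sketch of the note above, the two variables can be exported before launching a job. The value tcp is one of the providers listed earlier; the directory path and the ./my_app binary are placeholders chosen for illustration, not real locations.

```shell
# Select the OFI provider by name (tcp is one of the supported providers).
export FI_PROVIDER=tcp

# Point libfabric at the directory that holds the provider libraries.
# This path is a placeholder, not a real install location.
export FI_PROVIDER_PATH=/path/to/libfabric/lib/prov

# Show what the launched processes will inherit.
echo "FI_PROVIDER=$FI_PROVIDER"
echo "FI_PROVIDER_PATH=$FI_PROVIDER_PATH"

# Hypothetical launch line; ./my_app stands in for your MPI binary.
# mpirun -n 4 ./my_app
```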

To get a full list of environment variables available for configuring OFI, run the following command:

$ fi_info -e

mlx

The MLX provider runs over UCX, which is currently available for Mellanox InfiniBand* hardware.

For more information on using MLX with InfiniBand, see Improve Performance and Stability with Intel MPI Library on InfiniBand.

The following runtime parameters can be used:

Name Description
FI_MLX_INJECT_LIMIT Sets the maximum message size for tinject/inject calls.
FI_MLX_ENABLE_SPAWN Enables dynamic processes support.
FI_MLX_TLS Specifies the transports available for the MLX provider.

tcp

The TCP provider is a general-purpose provider for the Intel MPI Library that uses TCP sockets to implement the libfabric API, so it can be used on any system that supports TCP sockets. The provider lets you run Intel MPI Library applications over regular Ethernet, in a cloud environment that has no specific fast interconnect (e.g., GCP, Ethernet-enabled Azure*, and AWS* instances), or over IPoIB.

The following runtime parameters can be used:

Name Description
FI_TCP_IFACE Specifies a particular network interface.
FI_TCP_PORT_LOW_RANGE, FI_TCP_PORT_HIGH_RANGE Set the range of ports that the TCP provider uses to create its passive endpoints. This is useful when the firewall allows TCP connections only on a range of ports.
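The TCP parameters above could be combined as in the following sketch. The interface name eth0 and the port numbers 50000 and 50100 are assumptions made for illustration; substitute the interface and the range that your firewall actually permits.

```shell
# Use the TCP provider on a specific interface (eth0 is an assumed name).
export FI_PROVIDER=tcp
export FI_TCP_IFACE=eth0

# Restrict passive endpoints to a firewall-approved range (assumed values).
export FI_TCP_PORT_LOW_RANGE=50000
export FI_TCP_PORT_HIGH_RANGE=50100

# Sanity check: the low bound must not exceed the high bound.
if [ "$FI_TCP_PORT_LOW_RANGE" -gt "$FI_TCP_PORT_HIGH_RANGE" ]; then
    echo "warning: low port exceeds high port" >&2
fi

# Number of ports available to the provider in this range.
PORT_COUNT=$((FI_TCP_PORT_HIGH_RANGE - FI_TCP_PORT_LOW_RANGE + 1))
echo "TCP provider may use $PORT_COUNT ports on $FI_TCP_IFACE"
```

A range that is too narrow for the number of ranks per node may prevent endpoints from being created, so size it with the job layout in mind.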

psm2

The PSM2 provider runs over the PSM 2.x interface supported by the Intel® Omni-Path Fabric. PSM 2.x has all the PSM 1.x features, plus a set of new functions with enhanced capabilities. Since PSM 1.x and PSM 2.x are not application binary interface (ABI) compatible, the PSM2 provider works with PSM 2.x only and does not support Intel® True Scale Fabric.

The following runtime parameters can be used:

Name Description
FI_PSM2_INJECT_SIZE Define the maximum message size allowed for fi_inject and fi_tinject calls. The default value is 64.
FI_PSM2_LAZY_CONN Control the connection mode between the PSM2 endpoints that OFI endpoints are built on. When set to 0 (eager connection mode), connections are established when addresses are inserted into the address vector. When set to 1 (lazy connection mode), connections are established when addresses are first used in communication.

NOTE:
Lazy connection mode may reduce the start-up time on large systems at the expense of higher data path overhead.
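For example, a large job that is dominated by start-up time might opt into lazy connections. This is a sketch only; whether the trade-off pays off depends on the system and the application.

```shell
# Run over the PSM2 provider.
export FI_PROVIDER=psm2

# 1 = lazy mode: connect on first use instead of at address insertion.
export FI_PSM2_LAZY_CONN=1

# Keep the documented default inject size (stated here for visibility).
export FI_PSM2_INJECT_SIZE=64

echo "psm2: lazy_conn=$FI_PSM2_LAZY_CONN inject_size=$FI_PSM2_INJECT_SIZE"
```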

psm3

The Intel® Performance Scaled Messaging 3 (Intel® PSM3) provider is a high-performance protocol that provides a low-level communication interface for the Intel® Ethernet Fabric Suite family of products. PSM3 enables mechanisms that are necessary for implementing higher level communication interfaces in parallel environments such as MPI and AI training frameworks.

The Intel® PSM3 interface differs from the Intel® Omni-Path PSM2 interface in the following ways:

  • PSM3 includes new features and optimizations for Intel® Ethernet Fabric hardware and processors.
  • PSM3 supports only the Open Fabrics Interface (OFI, aka Libfabric). The PSM API is no longer exposed.
  • PSM3 includes performance improvements specific to the Intel® Ethernet Fabric Suite.
  • PSM3 supports standard Ethernet networks and leverages standard RoCEv2 protocols as implemented by the Intel® Ethernet Fabric Suite NICs.

The following runtime parameters can be used:

Name Description
PSM3_NIC Specifies the device unit number or the RDMA device name (as shown in ibv_devices). The unit number is relative to the alphabetical sort order of the RDMA device names; unit 0 is the first name.

Default: PSM3_NIC=any.

PSM3_RDMA Controls the use of RC QPs and RDMA. Options:
  • 0 - Use only UD QPs.
  • 1 - Use Rendezvous module for node-to-node level RC QPs for Rendezvous.
  • 2 - Use user space RC QPs for Rendezvous.
  • 3 - Use user space RC QPs for eager and Rendezvous.

Default: 0

PSM3_ALLOW_ROUTERS Indicates whether endpoints with different IP subnets should be considered accessible.
  • 0 - Consider endpoints with different IPv4 subnets inaccessible.
  • 1 - Consider all endpoints accessible, even if they have different IPv4 subnets.

Default: 0

PSM3_IDENTIFY Enables verbose output of the PSM3 software version identification, including library location, build date, Rendezvous module API version (if the Rendezvous module is used), process rank IDs, total ranks per node, total ranks in the job, and NIC selected. Options:
  • 0 - disabled. No output.
  • 1 - enabled on all processes.
  • 1: - enabled only on rank 0 (abbreviation for PSM3_IDENTIFY=1:*:rank0).
  • 1:pattern - enabled only on processes whose label matches the extended glob pattern.

Default: 0

For the full list of controls and details, refer to the Intel® Ethernet Fabric Suite Host Software User Guide.

For more details about Intel® Ethernet Fabric Suite and PSM3 provider, see Intel® Ethernet Fabric Suite documentation.
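The PSM3 controls listed above could be set as in the following sketch. The values are taken from the option lists in this section; the particular combination is an illustrative choice, not a recommended configuration.

```shell
# Run over the PSM3 provider.
export FI_PROVIDER=psm3

# Let PSM3 pick the NIC (this is the documented default).
export PSM3_NIC=any

# 1 = use the Rendezvous module for node-to-node level RC QPs.
export PSM3_RDMA=1

# 1 = treat endpoints on different IPv4 subnets as accessible.
export PSM3_ALLOW_ROUTERS=1

# "1:" = print version identification on rank 0 only.
export PSM3_IDENTIFY="1:"

echo "psm3: nic=$PSM3_NIC rdma=$PSM3_RDMA identify=$PSM3_IDENTIFY"
```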

verbs

The verbs provider enables applications using OFI to be run over any verbs hardware (InfiniBand*, iWARP*, and so on). It uses the Linux Verbs API for network transport and provides a translation of OFI calls to appropriate verbs API calls. It uses librdmacm for communication management and libibverbs for other control and data transfer operations.

By default, the verbs provider uses the RxM utility provider to emulate the FI_EP_RDM endpoint over the verbs FI_EP_MSG endpoint. To use the verbs provider with its own FI_EP_RDM endpoint instead of RxM, set the FI_PROVIDER=^ofi_rxm runtime parameter.

The following runtime parameters can be used:

Name Description
FI_VERBS_INLINE_SIZE Define the maximum message size allowed for fi_inject and fi_tinject calls. The default value is 64.
FI_VERBS_IFACE Define the prefix or the full name of the network interface associated with the verbs device. The default value is ib.
FI_VERBS_MR_CACHE_ENABLE Enable memory registration caching. The default value is 0. Set this environment variable to enable the memory registration cache.

NOTE:
Cache usage substantially improves performance, but may lead to correctness issues.
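A sketch of the verbs parameters in use follows. Setting FI_VERBS_MR_CACHE_ENABLE to 1 is an assumption: the table states only that the default is 0 and that setting the variable enables the cache. Given the correctness caveat in the note, validate application results before relying on the cache.

```shell
# Run over the verbs provider.
export FI_PROVIDER=verbs

# Match interfaces whose names start with "ib" (the documented default).
export FI_VERBS_IFACE=ib

# Keep the documented default inline size.
export FI_VERBS_INLINE_SIZE=64

# Enable the memory registration cache (1 is an assumed "on" value).
export FI_VERBS_MR_CACHE_ENABLE=1

echo "verbs: iface=$FI_VERBS_IFACE mr_cache=$FI_VERBS_MR_CACHE_ENABLE"
```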

Dependencies

The verbs provider requires libibverbs (v1.1.8 or newer) and librdmacm (v1.0.16 or newer). If you are compiling libfabric from source and want to enable verbs support, it is essential to have the matching header files for the above two libraries. If the libraries and header files are not in default paths, specify them in the CFLAGS, LDFLAGS, and LD_LIBRARY_PATH environment variables.
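If the libraries live in a non-default location, the three variables might be set as in the sketch below before building libfabric. The /opt/rdma prefix is hypothetical, and the commented configure line is an assumption about a typical libfabric source build rather than a step taken from this guide.

```shell
# Hypothetical install prefix for libibverbs and librdmacm.
RDMA_PREFIX=/opt/rdma

# Header search path for the compiler.
export CFLAGS="-I$RDMA_PREFIX/include"

# Library search path for the linker.
export LDFLAGS="-L$RDMA_PREFIX/lib"

# Library search path at run time; keep any existing entries.
export LD_LIBRARY_PATH="$RDMA_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

echo "CFLAGS=$CFLAGS"
echo "LDFLAGS=$LDFLAGS"

# Assumed build step, run from the libfabric source tree:
# ./configure --enable-verbs && make
```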

RxM

The RxM (RDM over MSG) provider (ofi_rxm) is a utility provider that emulates the FI_EP_RDM endpoint over the FI_EP_MSG endpoint of the core provider.

The RxM provider requires the core provider to support the following features:

  • MSG endpoints (FI_EP_MSG)
  • FI_MSG transport (to support data transfers)
  • FI_RMA transport (to support rendezvous protocol for large messages and RMA transfers)
  • FI_OPT_CM_DATA_SIZE of at least 24 bytes

The following runtime parameters can be used:

Name Description
FI_OFI_RXM_BUFFER_SIZE Define the transmit buffer size/inject size. Messages smaller than this size are transmitted via an eager protocol, and larger messages are transmitted via a rendezvous protocol. Transmitted data is copied up to the specified size. By default, the size is 16k.
FI_OFI_RXM_SAR_LIMIT Control the RxM SAR (Segmentation and Reassembly) protocol. Messages of greater size are transmitted via rendezvous protocol.
FI_OFI_RXM_USE_SRX Control the RxM receive path. If the variable is set to 1, RxM uses the Shared Receive Context of the core provider. The default value is 0.

NOTE:
Setting this variable to 1 reduces memory consumption, but may increase small message latency as a side effect.
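The RxM parameters could be applied on top of the verbs provider as sketched below. The value 16384 assumes that the documented default of 16k means 16 KiB expressed in bytes; that reading is an assumption, not something this guide states.

```shell
# Use verbs as the core provider; RxM is layered on it by default.
export FI_PROVIDER=verbs

# Eager/rendezvous threshold (assumed to be in bytes; 16384 = 16 * 1024).
export FI_OFI_RXM_BUFFER_SIZE=16384

# 1 = use the core provider's Shared Receive Context to save memory.
export FI_OFI_RXM_USE_SRX=1

echo "rxm: buffer_size=$FI_OFI_RXM_BUFFER_SIZE use_srx=$FI_OFI_RXM_USE_SRX"
```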

efa

The efa provider enables applications to be run over AWS Elastic Fabric Adapter (EFA) hardware.

Refer to the Amazon EC2 User Guide for instructions on installing OFI and Intel® MPI Library on EFA-enabled instances.