Purpose
This document describes the best-known methods (BKMs) for using the Intel® Cluster Tools, including the Intel® MPI Library, Intel® Trace Analyzer and Collector (ITAC), and MPI Performance Snapshot (MPS), on Cray clusters [1]. In the second installment we will present the results of testing these BKMs with the NAS Parallel Benchmarks Block Tri-diagonal solver (its multi-zone version, NPB3.3.1-MZ) [2].
Intel® MPI in a Cray environment
Cray XC/XT clusters come equipped with the HPC software stack [3], which uses a proprietary ISV Application Acceleration (IAA) network layer [4] relying on the lower-level uGNI/DMAPP netmod. IAA presents an InfiniBand verbs interface to higher-level software such as the various MPI implementations and then transports the data over the Cray Aries or Gemini networks. The Intel® MPI Library can access the IAA IB verbs interface (also called IBGNI) through its standard fabrics: either through DAPL or through the OFA fabric, which accesses IBGNI directly [1], e.g.:
export I_MPI_FABRICS=DAPL
Or
export I_MPI_FABRICS=OFA
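Intra-node communication can additionally be routed through shared memory by using the composite form of the same variable. The lines below are a commonly used sketch (shown in the lowercase form used by the Intel® MPI Library reference); consult the library documentation for the variants supported by your version:
export I_MPI_FABRICS=shm:dapl
or
export I_MPI_FABRICS=shm:ofa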
This approach makes the use of the Intel® MPI Library on Cray systems transparent, thanks to the fact that both the Intel® MPI Library and Cray MPI participate in the MPICH ABI Compatibility Initiative [5]. Beginning with MPT 7.1.3, Cray MPICH supports ABI compatibility with the Intel® MPI Library 5.0 (or newer) and with ANL MPICH 3.1.1 and newer releases. At this time, Cray supports only the execution of applications built with the Intel® or GNU compilers on Cray systems (in addition to the Cray compilers) [5].
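As a rough illustration of what this ABI compatibility enables, the sequence below re-points a dynamically linked application, built with the Cray compiler wrapper, to the Intel® MPI Library runtime at load time. The installation path is a placeholder, the exact mechanism (for example, which Cray MPICH ABI-compatibility module has to be loaded at build time) depends on the site, and launching with srun additionally requires the PMI-related settings described below:
$cc -dynamic -o app_name app_name.c
$source /path_to_IMPI_installation/intel64/bin/mpivars.sh
$export I_MPI_FABRICS=OFA
$srun -n 32 ./app_name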
More recently, an OpenFabrics Interfaces* (OFI, also known as libfabric) provider relying on the Cray uGNI network layer, the so-called uGNI OFI provider, has been developed [6, 7]. If the uGNI OFI provider is available on the cluster, the Intel® MPI Library can use it instead of the DAPL or OFA fabrics:
export I_MPI_FABRICS=OFI
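If several OFI providers are installed, the desired one can be requested explicitly through the standard libfabric FI_PROVIDER variable. Whether the gni provider is actually available depends on how libfabric was built on the given system, so the following lines are only a sketch:
$export I_MPI_FABRICS=OFI
$export FI_PROVIDER=gni
$export I_MPI_DEBUG=5
The I_MPI_DEBUG output (see below) then confirms which fabric and provider were actually selected at run time.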
On Cray systems, the commonly used Intel "mpirun" script or the Hydra process manager may conflict with the job scheduler or produce other side effects. At the National Energy Research Scientific Computing Center (NERSC), the BKM is therefore to use a third-party Process Management Interface (PMI) library [8]. For example, to use the SLURM PMI:
$export I_MPI_PMI_LIBRARY=/path_to_SLURM_PMI_installation/libpmi.so
$srun -n 32 -c 8 ./mycode.exe
Employing these PMI-related settings along with the previously mentioned fabric settings enables seamless Intel® MPI Library runs on Cray clusters. When in doubt, one can print out debug information by setting, e.g., the I_MPI_DEBUG=5 runtime knob.
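Putting these pieces together, a minimal SLURM batch script for an Intel® MPI Library run on a Cray cluster might look as follows; the node count, time limit, installation paths, and srun parameters are placeholders that need to be adapted to the particular system:
#!/bin/bash
#SBATCH -N 4
#SBATCH -t 00:30:00

# Set up the Intel MPI environment (placeholder path)
source /path_to_IMPI_installation/intel64/bin/mpivars.sh

# Fabric and SLURM PMI settings described above
export I_MPI_FABRICS=OFA
export I_MPI_PMI_LIBRARY=/path_to_SLURM_PMI_installation/libpmi.so

# Print fabric selection and process pinning details at startup
export I_MPI_DEBUG=5

srun -n 32 -c 8 ./mycode.exe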
ITAC/MPS tools: working with Intel® MPI and Cray MPI in a Cray environment
ITAC and MPS can be used by preloading the corresponding libraries: LD_PRELOAD=libVT.so for ITAC and LD_PRELOAD=libmps_nopapi.so for MPS [1].
How to use ITAC
By default, the Cray "cc" compiler wrapper uses static linkage, so all libraries are linked statically. Hence we need to link libVT statically as well. To statically link the ITAC library, set up the ITAC environment by sourcing itacvars.sh and compile the program with these additional options:
$cc -o test test.c -L$VT_LIB_DIR -lVT $VT_ADD_LIBS
Then run the application as usual and a trace will be collected. Note that with static linkage the trace files are created on every run. To use the ITAC libraries dynamically with Cray MPI, compile the program with dynamic libraries, e.g.:
$cc -dynamic -o app_name app_name.c
A similar compilation method can be used with the Intel® MPI Library:
$mpiicc -o app_name app_name.c
Then one can preload the ITAC library:
$export LD_PRELOAD=/path_to_ITAC_installation/intel64/slib/libVT.so
(Note that the VT_ROOT environment variable can be used to simplify run scripts.) To capture OpenMP* regions (in the case of hybrid MPI and OpenMP applications), use these additional knobs:
$export INTEL_LIBITTNOTIFY64=/path_to_ITAC_installation/intel64/slib/libVT.so
$export KMP_FORKJOIN_FRAMES_MODE=0
With these settings in place, one can use SLURM srun to run the application [9], e.g.:
$srun -n 8 -c 8 ./app_name
As an additional benefit of dynamic linkage, it is possible to comment out the LD_PRELOAD line in the job script to disable tracing.
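For reference, the dynamic ITAC workflow can be gathered into a single job-script fragment; as before, the installation path and the srun parameters are placeholders:
# ITAC tracing via preloading (placeholder path)
export LD_PRELOAD=/path_to_ITAC_installation/intel64/slib/libVT.so

# Optional: resolve OpenMP regions in hybrid MPI+OpenMP codes
export INTEL_LIBITTNOTIFY64=/path_to_ITAC_installation/intel64/slib/libVT.so
export KMP_FORKJOIN_FRAMES_MODE=0

srun -n 8 -c 8 ./app_name
# Comment out the LD_PRELOAD line above to run the same binary without tracing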
How to use MPS
To get statistics from an application running with the Intel® MPI Library (this functionality is not available with Cray MPI), the recommendation is to use the MPS tool. First, compile your program with the Intel® MPI dynamic libraries (MPS does not have a static version):
$mpiicc -o app_name app_name.c
Then preload the MPS library (using the VT_ROOT or MPS_TOOL_DIR variables is recommended in this case):
$export LD_PRELOAD=/path_to_MPS_installation/intel64/slib/libmps_nopapi.so
Analogous to the ITAC case described above, to capture OpenMP regions for hybrid codes:
$export INTEL_LIBITTNOTIFY64=/path_to_MPS_installation/intel64/slib/libmps_nopapi.so
Also, it is possible to obtain information about OpenMP imbalance (this functionality is only available with the Intel® OpenMP library):
$export KMP_FORKJOIN_FRAMES_MODE=3
Finally, the application run can be accomplished with these lines added to the job script:
$export I_MPI_STATS=20
$export I_MPI_STATS_COMPACT=1
$srun -n 8 -c 8 ./app_name
The Intel® MPI Library then generates a stats.txt file, and MPI Performance Snapshot produces app_stat.txt. Running the "mps" tool (supplied with the ITAC distribution) with the following options on the stats.txt and app_stat.txt files produces an HTML-based report that can be opened in any common Web browser:
$mps -a -F app_stat.txt stats.txt -g -O out.html
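For convenience, the whole MPS workflow can be sketched as one job-script fragment; the installation path, the srun parameters, and the report name are placeholders:
# MPS statistics collection via preloading (placeholder path)
export LD_PRELOAD=/path_to_MPS_installation/intel64/slib/libmps_nopapi.so
export INTEL_LIBITTNOTIFY64=/path_to_MPS_installation/intel64/slib/libmps_nopapi.so
export KMP_FORKJOIN_FRAMES_MODE=3

# Intel MPI statistics knobs used by MPS
export I_MPI_STATS=20
export I_MPI_STATS_COMPACT=1

srun -n 8 -c 8 ./app_name

# Post-process the two statistics files into an HTML report
mps -a -F app_stat.txt stats.txt -g -O out.html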
References
[1] Intel® performance analysis tools, Intel® Trace Analyzer and Collector (ITAC) and MPI Performance Snapshot (MPS), documentation.
[2] NAS Parallel Benchmarks. http://www.nas.nasa.gov/publications/npb.html
[3] HPC Software Requirements to Support an HPC Cluster Supercomputer. http://www.cray.com/sites/default/files/resources/WP-CCS-Software01-0413.pdf
[4] Cray online documentation on ISV Application Acceleration (IAA). http://pubs.cray.com/#/Collaborate/00256453-FA/FA00256447/ISV%20Application%20Acceleration%20(IAA)
[5] Cray Support of the MPICH ABI Compatibility Initiative. http://docs.cray.com/books/S-2544-704/S-2544-704.pdf
[6] Howard Pritchard and Igor Gorodetsky. A uGNI-Based MPICH2 Nemesis Network Module for Cray XE Computer Systems. https://cug.org/5-publications/proceedings_attendee_lists/CUG11CD/pages/1-program/final_program/Wednesday/13C-Pritchard-Paper.pdf
[7] OpenFabrics Interface for Cray systems. https://github.com/ofi-cray
[8] Running Executables Built with Intel® MPI. http://www.nersc.gov/users/computational-systems/cori/running-jobs/example-batch-scripts/#Intel_MPI
[9] How to use SLURM* PMI with the Intel® MPI Library for Linux*? https://software.intel.com/en-us/articles/how-to-use-slurm-pmi-with-the-intel-mpi-library-for-Linux