Symptom
Intel® MPI Library versions prior to 2021.5 show an immediate deadlock after startup.
Root Cause
In Red Hat Enterprise Linux 8.6 (and probably other recent Linux releases with comparable kernel versions), the characteristics of pseudo files such as “/sys/devices/system/node/node0/cpulist” have changed. Intel MPI relies on the result of the C library function ftell to size the read buffer for sysfs files. Because of the kernel changes, ftell no longer returns a usable size for these files.
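The failure mode can be illustrated in C. The sketch below contrasts sizing a buffer from the seek position (a common idiom; whether Intel MPI's code looks exactly like this is an assumption) with reading until end of file, which does not depend on the size the kernel reports for the file. Both function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Fragile idiom: take the buffer size from the position after seeking to
 * the end. For sysfs/procfs pseudo files the reported size may be 0 or a
 * fixed placeholder, or the seek may fail, so the result need not match
 * the number of bytes a read actually returns. */
long size_via_ftell(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}

/* Robust idiom: read until EOF and grow the buffer as needed.
 * Returns a NUL-terminated buffer (caller frees) and stores its length. */
char *read_whole_file(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return NULL;

    size_t cap = 256, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }
    /* Always leave one byte free for the terminating NUL. */
    while ((n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len + 1 == cap) {
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    int failed = ferror(f);
    fclose(f);
    if (failed) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

For example, read_whole_file("/sys/devices/system/node/node0/cpulist", &len) returns the CPU list regardless of what stat or ftell report for that file.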
Intel MPI 2021.5 and 2021.6 do not show the deadlock, but the complete fix is only implemented in Intel MPI 2021.7, so a potential risk remains with 2021.5 and 2021.6. If issues occur with these versions, the workaround below may be applied to them as well.
Workaround
Customers who need to use Intel MPI versions before 2021.5 may apply a small workaround that overrides the default C library ftell function with an adapted version that computes the size of pseudo files in a backward-compatible way. The workaround package contains the adapted ftell source and a Makefile for building the “impi_patch.so” library, plus a short Readme.txt with the steps to build and apply it. Once built, the library is applied by setting LD_PRELOAD in the shell where mpirun will be executed:
$ export LD_PRELOAD=<path to library>/impi_patch.so
This replaces ftell with a function that computes the file size differently and returns an appropriate size value. For regular files that report a size greater than 0, the default C library ftell is used. Since the patch is only needed for the bootstrap of Intel MPI, the LD_PRELOAD setting can be reverted for the actual MPI application with:
$ export I_MPI_GTOOL="env LD_PRELOAD= :all"
This may look strange, but the LD_PRELOAD export only needs to act on the startup of Intel MPI and its proxy program. The I_MPI_GTOOL setting then runs the actual MPI program under env with an empty LD_PRELOAD, so that it uses the default C library ftell. This can be tested with the test.c program contained in the workaround package (see Readme.txt).
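To check on a given system that the I_MPI_GTOOL setting really gives the ranks an empty LD_PRELOAD, a small diagnostic can be launched in place of the MPI application. This is a hypothetical helper, not the test.c from the workaround package. The names library_is_loaded and report_preload_state are illustrative. dl_iterate_phdr is the glibc function for walking the shared objects mapped into a process.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* dl_iterate_phdr() callback: returning nonzero stops the walk, and that
 * value becomes dl_iterate_phdr()'s return value. */
static int match_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    const char *fragment = data;
    return info->dlpi_name != NULL && strstr(info->dlpi_name, fragment) != NULL;
}

/* Returns 1 if a mapped shared object's path contains fragment, else 0. */
int library_is_loaded(const char *fragment)
{
    return dl_iterate_phdr(match_object, (void *)fragment) != 0;
}

/* Prints what this process sees: the LD_PRELOAD value it inherited and
 * whether impi_patch.so is actually mapped into it. */
void report_preload_state(void)
{
    const char *preload = getenv("LD_PRELOAD");
    printf("LD_PRELOAD=\"%s\", impi_patch.so %s\n",
           preload != NULL ? preload : "(unset)",
           library_is_loaded("impi_patch.so") ? "loaded" : "not loaded");
}
```

For example, a main() that only calls report_preload_state() can be built as check_preload and started with `mpirun -n 2 ./check_preload`. With I_MPI_GTOOL set as above, each rank should report an empty LD_PRELOAD and impi_patch.so as not loaded.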
Download the workaround patch.
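The package contains the actual ftell replacement. The following is a hypothetical sketch, not Intel's source, of how an LD_PRELOAD library can interpose ftell in the way described under Workaround. It looks up the C library's ftell with dlsym(RTLD_NEXT, ...). Regular files that report a size greater than 0 are forwarded to it unchanged. For regular files that report size 0 (typical of sysfs and procfs entries), the sketch counts the bytes with pread(). Returning that content length regardless of the current position is an assumption that only suits the fseek(SEEK_END)/ftell() sizing idiom. That assumption is consistent with limiting the library to the startup phase.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical ftell() interposer; NOT the code shipped in impi_patch.so.
 * Possible build (older glibc also needs -ldl):
 *   gcc -shared -fPIC -o ftell_sketch.so ftell_sketch.c -ldl */

/* Count the bytes a pseudo file actually delivers. pread() leaves both the
 * descriptor offset and the stream's stdio buffer untouched. */
static long content_length(int fd)
{
    char chunk[4096];
    long total = 0;
    ssize_t n;
    while ((n = pread(fd, chunk, sizeof chunk, (off_t)total)) > 0)
        total += n;
    return n < 0 ? -1 : total;
}

long ftell(FILE *stream)
{
    /* Resolve the next ftell in lookup order (normally the C library's). */
    static long (*real_ftell)(FILE *) = NULL;
    if (real_ftell == NULL)
        real_ftell = (long (*)(FILE *))dlsym(RTLD_NEXT, "ftell");
    if (real_ftell == NULL)
        return -1;

    struct stat st;
    int fd = fileno(stream);
    /* Anything that is not a size-0 regular file goes to the C library. */
    if (fd < 0 || fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || st.st_size > 0)
        return real_ftell(stream);

    /* Size-0 regular file: assume the caller wants the total size. */
    return content_length(fd);
}
```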
Caveat
There may be nonstandard startup scenarios in which the LD_PRELOAD variable is not propagated to all nodes used by the application, for example when using options such as “--launcher=ssh” or “--rsh=ssh”. In these situations it may help to put the export LD_PRELOAD statement into .bashrc.