Introduction
Getting started with either MPI or OpenMP* can be enough of a challenge on its own, but mixing the two together adds another layer of complexity and a new set of considerations. This article is intended to help with those first few steps.
Requirements
- A compiler that supports C, C++, FORTRAN 77, and/or Fortran 90/95 and has an implementation of OpenMP*
- An MPI implementation containing runtime libraries and executables and development libraries and headers
To meet these requirements, I will be using the Intel® Parallel Studio XE Cluster Edition. I plan to update this article with compilation instructions for other compilers and MPI implementations as I am able to test them.
Basic Considerations
When programming with either MPI or OpenMP* individually, there are some basics to consider. Where is a particular variable being stored? Which process/thread can access it? How do I get it to another process/thread? All of these concerns are magnified when using both together. What happens when a call is made to MPI_Send within a threaded region? Should only the master thread make MPI calls? Should they be completely segregated from each other? If I am sending everything to the master thread of the root process for output control, how do I get each thread of other processes to efficiently and effectively communicate with the root/master thread?
So, how should I actually go about doing this? Well, there is no easy answer. A method that gives great performance in one program could very well cause excessive slowdown in another. The sample code included here is simply a Hello World program. For the purpose of illustration, I have designed it such that each thread of the root process outputs from an OpenMP* critical section. Once this section is completed, the master moves on to receive (via MPI) information from every thread of every other process. The root/master handles all output from that point on.
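Since only a few key lines are discussed below, here is a minimal, self-contained sketch of what such a program might look like. It follows the structure just described, but the variable names and message format are my own, not necessarily those of the original sample:

#include <mpi.h>
#include <omp.h>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main(int argc, char** argv)
{
    int required = MPI_THREAD_SERIALIZED; // Required level of MPI threading support
    int provided;                         // Provided level of MPI threading support
    MPI_Init_thread(&argc, &argv, required, &provided);

    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    if (provided < required)
    {
        // Insufficient support, degrade to 1 thread and warn the user
        if (rank == 0)
            cout << "Warning: This MPI implementation provides insufficient"
                 << " threading support." << endl;
        omp_set_num_threads(1);
    }

    #pragma omp parallel
    {
        ostringstream ss;
        ss << "Hello from thread " << omp_get_thread_num()
           << " of " << omp_get_num_threads()
           << " in rank " << rank << " of " << size
           << " on " << host;
        string msg = ss.str();

        if (rank == 0)
        {
            // Threads of the root print directly, one at a time
            #pragma omp critical
            cout << msg << endl;
        }
        else
        {
            // Threads of other ranks send their greeting to the root;
            // the critical section keeps MPI calls serialized within
            // the process, as MPI_THREAD_SERIALIZED requires
            #pragma omp critical
            MPI_Send(msg.c_str(), (int)msg.size() + 1, MPI_CHAR,
                     0, omp_get_thread_num(), MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
    {
        // The master thread of the root now receives one message per
        // thread of every other rank (this sketch assumes all ranks
        // run the same number of threads)
        char buf[256];
        for (int r = 1; r < size; r++)
            for (int t = 0; t < omp_get_max_threads(); t++)
            {
                MPI_Recv(buf, sizeof(buf), MPI_CHAR, r, t,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                cout << buf << endl;
            }
    }

    MPI_Finalize();
    return 0;
}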
Code
Let's take a look at a few key lines from the code here.
int required = MPI_THREAD_SERIALIZED; // Required level of MPI threading support
/* Each thread will call MPI routines, but these calls will be coordinated
   to occur only one at a time within a process. */
int provided; // Provided level of MPI threading support
...
MPI_Init_thread(&argc, &argv, required, &provided);
Instead of MPI_Init, the initialization should be done through MPI_Init_thread in a threaded program. There are two additional arguments, required and provided. First, required specifies what level of threading support the program needs. The actual level of threading support provided by the implementation is returned in provided. These can be used to check that the implementation provides sufficient threading support; the programmer decides what happens if it does not. The support levels are the following values (from the MPI 2.2 Standard):
- MPI_THREAD_SINGLE: Only one thread will execute.
- MPI_THREAD_FUNNELED: The process may be multi-threaded, but the application must ensure that only the main thread makes MPI calls (illustrated in the sketch after this list).
- MPI_THREAD_SERIALIZED: The process may be multi-threaded, and multiple threads may make MPI calls, but only one at a time.
- MPI_THREAD_MULTIPLE: Multiple threads may call MPI, with no restrictions.
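To make the distinction concrete, under MPI_THREAD_FUNNELED the usual pattern is to do the threaded work in a parallel region and make the MPI call afterward, when only the main thread is running. The per-thread work below is just a stand-in; only the structure matters:

double local = 0.0, global = 0.0;
#pragma omp parallel reduction(+:local)
{
    // Stand-in for real per-thread work
    local += omp_get_thread_num() + 1.0;
}
// Back outside the parallel region, only the main thread is running,
// so the MPI call is "funneled" through it
MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);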
// Check the threading support level
if (provided < required)
{
    // Insufficient support, degrade to 1 thread and warn the user
    if (rank == 0)
    {
        cout << "Warning: This MPI implementation provides insufficient"
             << " threading support." << endl;
    }
    omp_set_num_threads(1);
}
Here, I am comparing the provided support level with the support level I need. If the implementation does not provide sufficient support, I force the program to use only one thread, which guarantees serial behavior within each process.
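Degrading gracefully is only one option. If the program cannot do anything useful without real threading support, an alternative (a sketch, not taken from the original sample) would be to abort instead:

if (provided < required)
{
    // No point in continuing without the required support
    if (rank == 0)
        cout << "Error: this program needs MPI_THREAD_SERIALIZED or better." << endl;
    MPI_Abort(MPI_COMM_WORLD, 1);
}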
Compilation
When compiling, make certain that the MPI libraries are linked and that the OpenMP* directives/pragmas are processed. Most MPI implementations include a compiler wrapper that will automatically add the appropriate compiler flags. You will need to make certain that your PATH and LD_LIBRARY_PATH environment variables are set correctly for your system and implementation. If possible, make certain that you are using a multithreaded MPI library; in the Intel® MPI Library, this is done by adding the -mt_mpi command line option. So, for Intel® Parallel Studio XE, here is my compile line for C++:
mpiicpc -mt_mpi -qopenmp hybrid_hello.cpp -o hybrid_hello
For the other languages, simply replace the source file and change the MPI compiler script to match, as shown below. If all goes as expected, everything should compile and link with no problems.
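For instance, assuming the C and Fortran sources are named hybrid_hello.c and hybrid_hello.f90 (the names are mine), the corresponding lines would be:

mpiicc -mt_mpi -qopenmp hybrid_hello.c -o hybrid_hello
mpiifort -mt_mpi -qopenmp hybrid_hello.f90 -o hybrid_hello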
Running
At this point, you can run your program just like any other MPI program. Remember that if you need to set any OpenMP* environment variables, that should be done either on the command line (using -genv) or in the environment before running. In this run, I did not set any environment variables controlling the behavior of MPI or OpenMP*, using the defaults for everything.
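(If you did want to control the thread count explicitly, a hypothetical invocation would look like mpirun -genv OMP_NUM_THREADS 6 -n 4 ./hybrid_hello.)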
mpirun -n 4 ./hybrid_hello
Your output should look similar to this:
Hello from thread 0 of 6 in rank 0 of 4 on localhost
Hello from thread 1 of 6 in rank 0 of 4 on localhost
Hello from thread 2 of 6 in rank 0 of 4 on localhost
Hello from thread 3 of 6 in rank 0 of 4 on localhost
Hello from thread 4 of 6 in rank 0 of 4 on localhost
Hello from thread 5 of 6 in rank 0 of 4 on localhost
Hello from thread 0 of 6 in rank 1 of 4 on localhost
Hello from thread 1 of 6 in rank 1 of 4 on localhost
Hello from thread 4 of 6 in rank 1 of 4 on localhost
Hello from thread 5 of 6 in rank 1 of 4 on localhost
Hello from thread 2 of 6 in rank 1 of 4 on localhost
Hello from thread 3 of 6 in rank 1 of 4 on localhost
Hello from thread 0 of 6 in rank 2 of 4 on localhost
Hello from thread 1 of 6 in rank 2 of 4 on localhost
Hello from thread 2 of 6 in rank 2 of 4 on localhost
Hello from thread 3 of 6 in rank 2 of 4 on localhost
Hello from thread 4 of 6 in rank 2 of 4 on localhost
Hello from thread 5 of 6 in rank 2 of 4 on localhost
Hello from thread 0 of 6 in rank 3 of 4 on localhost
Hello from thread 1 of 6 in rank 3 of 4 on localhost
Hello from thread 2 of 6 in rank 3 of 4 on localhost
Hello from thread 3 of 6 in rank 3 of 4 on localhost
Hello from thread 4 of 6 in rank 3 of 4 on localhost
Hello from thread 5 of 6 in rank 3 of 4 on localhost
The hostname should match your system(s), and the process and thread numbers will vary based on how many cores are available on the system(s) you are using and how many processes you specify.
Conclusions
Hopefully, this will help you get started with hybrid MPI/OpenMP* programming. There is a lot of potential here, but there are also a lot of pitfalls.