Measuring Communication and Computation Overlap
The semantics of nonblocking collective operations enable you to run inter-process communication in the background while performing computations. However, the actual overlap depends on the particular MPI library implementation. You can measure the potential overlap of communication and computation using the IMB-NBC benchmarks. The general benchmark flow is as follows:
1. Measure the time needed for a pure communication call.
2. Start a nonblocking collective operation.
3. Start computation using the IMB_cpu_exploit function, as described in the IMB-IO Nonblocking Benchmarks chapter. To ensure correct measurement conditions, the computation time used by the benchmark is close to the pure communication time measured at step 1.
4. Wait for communication to finish using the MPI_Wait function.
Displaying Results
The timing values used to interpret the overlap potential are as follows:
t_pure is the time of a pure communication operation, non-overlapping with CPU activity.
t_CPU is the time the IMB_cpu_exploit function takes to complete when run concurrently with the nonblocking communication operation.
t_ovrl is the time the nonblocking communication operation takes to complete when run concurrently with CPU activity.
If t_ovrl = max(t_pure,t_CPU), the processes are running with a perfect overlap.
If t_ovrl = t_pure+t_CPU, the processes are running with no overlap.
Since different processes in a collective operation may have different execution times, the timing values are taken from the process with the largest t_ovrl value. The IMB-NBC result tables report the timings t_ovrl, t_pure, and t_CPU, and the estimated overlap in percent, calculated by the following formula:
overlap = 100.*max(0, min(1, (t_pure+t_CPU-t_ovrl) / min(t_pure, t_CPU)))
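The formula can be sketched as a small C function, which may help when checking values from a result table by hand. This is an illustrative sketch, not the benchmark's own source code: the names overlap_percent, min_d, and max_d are hypothetical, and returning 0 when the shorter of the two times is not positive is an assumption made here to avoid division by zero.

```c
/* Illustrative helpers (hypothetical names, not part of IMB). */
static double min_d(double a, double b) { return a < b ? a : b; }
static double max_d(double a, double b) { return a > b ? a : b; }

/*
 * Estimated overlap in percent, following the formula above.
 * All three timings must be given in the same unit.
 */
double overlap_percent(double t_pure, double t_cpu, double t_ovrl)
{
    double shorter = min_d(t_pure, t_cpu);

    /* Assumption of this sketch: report 0 if the shorter phase has
       no measurable duration, instead of dividing by zero. */
    if (shorter <= 0.0)
        return 0.0;

    /* Time saved by overlapping, relative to the shorter phase. */
    double ratio = (t_pure + t_cpu - t_ovrl) / shorter;

    /* Clamp to [0, 1] and convert to percent. */
    return 100.0 * max_d(0.0, min_d(1.0, ratio));
}
```

For example, with t_pure = 10, t_CPU = 10, and t_ovrl = 15 (all in the same unit), the function returns 50. With t_ovrl = max(t_pure, t_CPU) it returns 100, and with t_ovrl = t_pure + t_CPU it returns 0, matching the perfect-overlap and no-overlap cases described above.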