Sample 2 - IMB-MPI1 PingPing Allreduce
The following example shows the results of running the PingPing and Allreduce benchmarks with the command line shown below.
<..> -np 6 IMB-MPI1 pingping allreduce -map 2x3 -msglen Lengths -multi 0

Lengths file:
0
100
1000
10000
100000
1000000

#---------------------------------------------------
# Intel(R) MPI Benchmark Suite V3.2.2, MPI1 part
#---------------------------------------------------
# Date                   : Thu Sep 4 13:26:03 2008
# Machine                : x86_64
# System                 : Linux
# Release                : 2.6.9-42.ELsmp
# Version                : #1 SMP Wed Jul 12 23:32:02 EDT 2006
# MPI Version            : 2.0
# MPI Thread Environment : MPI_THREAD_SINGLE

# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# SECS_PER_SAMPLE (=> IMB_settings.h)
# or through the flag => -time

# Calling sequence was:
# IMB-MPI1 pingping allreduce -map 3x2 -msglen Lengths
# -multi 0

# Message lengths were user-defined
#
# MPI_Datatype                : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op                      : MPI_SUM
#
#

# List of Benchmarks to run:

# (Multi-)PingPing
# (Multi-)Allreduce

#--------------------------------------------------------------
# Benchmarking Multi-PingPing
# ( 3 groups of 2 processes each running simultaneously )
# Group  0:    0    3
#
# Group  1:    1    4
#
# Group  2:    2    5
#
#--------------------------------------------------------------
       #bytes #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]   Mbytes/sec
            0         1000           ..           ..           ..           ..
          100         1000
         1000         1000
        10000         1000
       100000          419
      1000000           41

#--------------------------------------------------------------
# Benchmarking Multi-Allreduce
# ( 3 groups of 2 processes each running simultaneously )
# Group  0:    0    3
#
# Group  1:    1    4
#
# Group  2:    2    5
#
#--------------------------------------------------------------
       #bytes #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
            0         1000           ..           ..           ..
          100         1000
         1000         1000
        10000         1000
       100000          419
      1000000           41

#--------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 4; rank order (rowwise):
#    0    3
#
#    1    4
#
# ( 2 additional processes waiting in MPI_Barrier)
#--------------------------------------------------------------
       #bytes #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
            0         1000           ..           ..           ..
          100         1000
         1000         1000
        10000         1000
       100000          419
      1000000           41

#--------------------------------------------------------------
# Benchmarking Allreduce
# #processes = 6; rank order (rowwise):
#    0    3
#
#    1    4
#
#    2    5
#
#--------------------------------------------------------------
       #bytes #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
            0         1000           ..           ..           ..
          100         1000
         1000         1000
        10000         1000
       100000          419
      1000000           41

# All processes entering MPI_Finalize
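To reproduce a run like this, create the Lengths file and start the benchmark with your MPI launcher. The sketch below is only an illustration: the <..> placeholder in the sample stands for whatever launch command your installation provides, and mpirun is an assumed substitute for it (use mpiexec, mpiexec.hydra, or your site's equivalent).

    # Write the user-defined message lengths, one value in bytes per line
    printf '%s\n' 0 100 1000 10000 100000 1000000 > Lengths

    # Start 6 ranks; mpirun is assumed here in place of the <..> placeholder
    mpirun -np 6 IMB-MPI1 pingping allreduce -map 2x3 -msglen Lengths -multi 0

The -multi flag makes the benchmarks run in multi mode, and the group listings in the output ({0, 3}, {1, 4}, {2, 5}) show how the -map option arranged the six ranks into three pairs.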