Sample 1 - IMB-MPI1 PingPong Allreduce
The following example shows the output of the PingPong and Allreduce benchmarks run on two processes:
<..> np 2 IMB-MPI1 PingPong Allreduce
#---------------------------------------------------
# Intel(R) MPI Benchmark Suite V3.2, MPI1 part
#---------------------------------------------------
# Date                   : Thu Sep 4 13:20:07 2008
# Machine                : x86_64
# System                 : Linux
# Release                : 2.6.9-42.ELsmp
# Version                : #1 SMP Wed Jul 12 23:32:02 EDT 2006
# MPI Version            : 2.0
# MPI Thread Environment : MPI_THREAD_SINGLE

# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# SECS_PER_SAMPLE (=> IMB_settings.h)
# or through the flag => -time

# Calling sequence was:
# ./IMB-MPI1 PingPong Allreduce

# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype                : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op                      : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
# Allreduce

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
    #bytes  #repetitions     t[μsec]  Mbytes/sec
         0          1000          ..          ..
         1          1000
         2          1000
         4          1000
         8          1000
        16          1000
        32          1000
        64          1000
       128          1000
       256          1000
       512          1000
      1024          1000
      2048          1000
      4096          1000
      8192          1000
     16384          1000
     32768          1000
     65536           640
    131072           320
    262144           160
    524288            80
   1048576            40
   2097152            20
   4194304            10

#-------------------------------------------------------
# Benchmarking Allreduce
# ( #processes = 2 )
#-------------------------------------------------------
    #bytes  #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
         0          1000           ..           ..           ..
         4          1000
         8          1000
        16          1000
        32          1000
        64          1000
       128          1000
       256          1000
       512          1000
      1024          1000
      2048          1000
      4096          1000
      8192          1000
     16384          1000
     32768          1000
     65536           640
    131072           320
    262144           160
    524288            80
   1048576            40
   2097152            20
   4194304            10

# All processes entering MPI_Finalize
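The #repetitions column in both tables stays at 1000 up to 32768 bytes and then halves each time the message size doubles. That pattern is consistent with a fixed cap on the total volume sent per message size. The following is a minimal sketch that reproduces the column under that assumption. The names `MAX_REPETITIONS`, `VOLUME_CAP_BYTES`, `repetitions`, and `message_sizes` are hypothetical, and the 40 MiB cap is inferred from the sample output, not read from the benchmark source. The sketch does not model the run-time limit (SECS_PER_SAMPLE or the -time flag) described in the output header.

```python
# Hypothetical constants chosen to match the sample output;
# they are assumptions inferred from the table, not values read from IMB.
MAX_REPETITIONS = 1000                 # count shown for small messages
VOLUME_CAP_BYTES = 40 * 1024 * 1024    # assumed per-size volume cap (40 MiB)


def repetitions(message_bytes: int) -> int:
    """Return the repetition count for one message size under the assumed cap."""
    if message_bytes == 0:
        # A zero-byte message transfers no data, so only the count cap applies.
        return MAX_REPETITIONS
    return max(1, min(MAX_REPETITIONS, VOLUME_CAP_BYTES // message_bytes))


def message_sizes(max_bytes: int = 4194304) -> list[int]:
    """Return 0 followed by the powers of two up to max_bytes (PingPong rows)."""
    sizes = [0]
    size = 1
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


if __name__ == "__main__":
    # Print the #bytes and #repetitions columns in the layout of the sample.
    for size in message_sizes():
        print(f"{size:>10} {repetitions(size):>13}")
```

With these assumed constants, the computed counts agree with the sample for every row, for example 640 repetitions at 65536 bytes and 10 at 4194304 bytes. On a system where the run-time limit takes effect, the real counts may be lower than this sketch predicts.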