Sample 1 - IMB-MPI1 PingPong Allreduce
The following example shows the results of the PingPong and Allreduce benchmarks:
<..> -np 2 IMB-MPI1 PingPong Allreduce
#---------------------------------------------------
# Intel(R) MPI Benchmark Suite V3.2, MPI1 part
#---------------------------------------------------
# Date : Thu Sep 4 13:20:07 2008
# Machine : x86_64
# System : Linux
# Release : 2.6.9-42.ELsmp
# Version : #1 SMP Wed Jul 12 23:32:02 EDT 2006
# MPI Version : 2.0
# MPI Thread Environment: MPI_THREAD_SINGLE
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# SECS_PER_SAMPLE (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# ./IMB-MPI1 PingPong Allreduce
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
# Allreduce
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
#bytes #repetitions t[μsec] Mbytes/sec
0 1000 .. ..
1 1000
2 1000
4 1000
8 1000
16 1000
32 1000
64 1000
128 1000
256 1000
512 1000
1024 1000
2048 1000
4096 1000
8192 1000
16384 1000
32768 1000
65536 640
131072 320
262144 160
524288 80
1048576 40
2097152 20
4194304 10
#-------------------------------------------------------
# Benchmarking Allreduce
# ( #processes = 2 )
#-------------------------------------------------------
#bytes #repetitions t_min[μsec] t_max[μsec] t_avg[μsec]
0 1000 .. .. ..
4 1000
8 1000
16 1000
32 1000
64 1000
128 1000
256 1000
512 1000
1024 1000
2048 1000
4096 1000
8192 1000
16384 1000
32768 1000
65536 640
131072 320
262144 160
524288 80
1048576 40
2097152 20
4194304 10
# All processes entering MPI_Finalize
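In the command line above, <..> stands for the MPI launcher of your installation. For reference, a minimal launch sketch is shown below, assuming the mpirun wrapper of the Intel MPI Library is on the PATH and IMB-MPI1 has been built in the current directory; the -time value of 10 seconds is only an illustration of the run-time control mentioned in the output header.
# Launch both benchmarks on two processes (assumes the mpirun launcher is available):
mpirun -np 2 ./IMB-MPI1 PingPong Allreduce
# Optionally override the per-sample run-time limit (in seconds) that otherwise
# comes from SECS_PER_SAMPLE in IMB_settings.h:
mpirun -np 2 ./IMB-MPI1 -time 10 PingPong Allreduce
Note that the number of repetitions in both tables drops from 1000 down to 10 for the largest message sizes. This is the dynamic cut-down of iterations per message size described in the output header: it keeps the run time of each message-size sample within the configured limit.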