Sample 2 - IMB-MPI1 PingPing Allreduce
The following example shows the results of the PingPing and Allreduce benchmarks, run with the command line below:
<..> -np 6 IMB-MPI1 pingping allreduce -map 2x3 -msglen Lengths -multi 0
Lengths file:
0
100
1000
10000
100000
1000000
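A minimal sketch of how this run can be reproduced, assuming a generic mpirun launcher (the original command line elides the launcher as <..>, so substitute the launcher used at your site):

# Create the user-defined message-length file passed via -msglen
# (one message size in bytes per line, as listed above).
cat > Lengths << 'EOF'
0
100
1000
10000
100000
1000000
EOF

# mpirun -np 6 is an assumption here; the original elides the launcher as <..>.
# -map 2x3        arranges the 6 ranks into the process grid shown in the
#                 Group / rank order listings of the output below
# -msglen Lengths reads the message sizes from the file created above
# -multi 0        switches on multi mode, so several process groups run the
#                 benchmark concurrently
mpirun -np 6 IMB-MPI1 pingping allreduce -map 2x3 -msglen Lengths -multi 0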
#---------------------------------------------------
# Intel(R) MPI Benchmark Suite V3.2.2, MPI1 part
#---------------------------------------------------
# Date : Thu Sep 4 13:26:03 2008
# Machine : x86_64
# System : Linux
# Release : 2.6.9-42.ELsmp
# Version : #1 SMP Wed Jul 12 23:32:02 EDT 2006
# MPI Version : 2.0
# MPI Thread Environment: MPI_THREAD_SINGLE
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# SECS_PER_SAMPLE (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# IMB-MPI1 pingping allreduce -map 3x2 -msglen Lengths
# -multi 0
# Message lengths were user-defined
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# (Multi-)PingPing
# (Multi-)Allreduce
#--------------------------------------------------------------
# Benchmarking Multi-PingPing
# ( 3 groups of 2 processes each running simultaneously )
# Group 0: 0 3
#
# Group 1: 1 4
#
# Group 2: 2 5
#
#--------------------------------------------------------------
    #bytes  #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]  Mbytes/sec
         0          1000           ..           ..           ..          ..
       100          1000           ..           ..           ..          ..
      1000          1000           ..           ..           ..          ..
     10000          1000           ..           ..           ..          ..
    100000           419           ..           ..           ..          ..
   1000000            41           ..           ..           ..          ..
#--------------------------------------------------------------
# Benchmarking Multi-Allreduce
# ( 3 groups of 2 processes each running simultaneously )
# Group 0: 0 3
#
# Group 1: 1 4
#
# Group 2: 2 5
#
#--------------------------------------------------------------
    #bytes  #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
         0          1000           ..           ..           ..
       100          1000           ..           ..           ..
      1000          1000           ..           ..           ..
     10000          1000           ..           ..           ..
    100000           419           ..           ..           ..
   1000000            41           ..           ..           ..
#--------------------------------------------------------------
# Benchmarking Allreduce
#
# processes = 4; rank order (rowwise):
# 0 3
#
# 1 4
#
# ( 2 additional processes waiting in MPI_Barrier)
#--------------------------------------------------------------
    #bytes  #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
         0          1000           ..           ..           ..
       100          1000           ..           ..           ..
      1000          1000           ..           ..           ..
     10000          1000           ..           ..           ..
    100000           419           ..           ..           ..
   1000000            41           ..           ..           ..
#--------------------------------------------------------------
# Benchmarking Allreduce
#
# processes = 6; rank order (rowwise):
# 0 3
#
# 1 4
#
# 2 5
#
#--------------------------------------------------------------
    #bytes  #repetitions  t_min[μsec]  t_max[μsec]  t_avg[μsec]
         0          1000           ..           ..           ..
       100          1000           ..           ..           ..
      1000          1000           ..           ..           ..
     10000          1000           ..           ..           ..
    100000           419           ..           ..           ..
   1000000            41           ..           ..           ..
# All processes entering MPI_Finalize