Tutorial

Using MPI Tuner for Intel® MPI Library on Linux* OS

ID 768652
Date 2/19/2016
Public

Task 2: Include Missing Values in the Default Parameters Grid during Cluster Tuning

The mpitune utility scans a predefined range of values for each variable. If you know that your applications use atypical layouts or data sizes, you can override the mpitune defaults and run with a customized set of values.

Ensure you have write access to the <installdir>/<arch>/<etc> directory.

The mpitune utility reads its configuration from the *.xml files in <installdir>/<arch>/<etc>. In the cluster-specific mode, two main configuration files are used: options.xml describes what is tuned, and Benchmarks/imb.xml describes how tuning is performed.

For example, to customize the tuning of the I_MPI_EAGER_THRESHOLD variable, edit the corresponding entries in both files. The excerpts below show the relevant settings.

options.xml:
...
    <option name="I_MPI_EAGER_THRESHOLD" type="global" group="collective" weight="1.0">
        <actions>
            <step order="1" storage="first">
                <additive>
                    <env name="I_MPI_FALLBACK_DEVICE" type="global" value="disable" />
                </additive>
                <range name="range_vars">int_range(8192:524288:*:2)</range> <!-- explicit range from 8k to 512k in powers of 2 -->
                <format>@range_vars()</format>
                <result format="[msg_size]" limit="1" separator="" />
            </step>
        </actions>
        <requirements>
            <param name="hosts" value="2:2" /> <!-- use 2 hosts -->
            <param name="perhost" value="1:1" /> <!-- with 1 process per host -->
            <param name="processes" value="2:2" /> <!-- and 2 processes total -->
            <param name="devices" value="shm:dapl,shm:tmi" /> <!-- for the shm:dapl and shm:tmi fabrics (I_MPI_FABRICS) -->
        </requirements>
        <!-- internal format description -->
        <result
            format="#first#"
            quotes="no"
            quotesInline="no"
        />
    </option>
...
Benchmarks/imb.xml:
    <test title="IMB Sendrecv" weight="1.0">
        <description>Sendrecv test from IMB benchmark for OUTPUT mode</description>
        <executable>"IMB-MPI1" -npmin %proc% -iter 5 -msglen @msglen_file() Sendrecv</executable>
        <function title="msglen_file">range_file(768:1536:+:256;"value[endl]")</function> <!-- msg len file of IMB with range: 768, 1024, 1280 and 1536 bytes -->
        <launch_line>%mpiexec%%globals%%locals%%executable%</launch_line>
        <requirements> <!-- values for requirements section are calculated as intersection with the same block from options.xml file. Results are in the mpitune schedule -->
            <param name="hosts" value="1:-1" />
            <param name="perhost" value="1:-1" />
            <param name="processes" value="2:-1" />
            <param name="devices" value="rdssm,rdma,shm,ssm,sock,shm:dapl,shm:tcp,dapl,tcp,shm,shm:ofa,shm:tmi,ofa,tmi" />
        </requirements>
        <options_filter filter="exclusive"> <!--this section enumerates options to tune by this benchmark-->
            <option type="global" name="I_MPI_EAGER_THRESHOLD" />
            <option type="global" name="I_MPI_INTRANODE_EAGER_THRESHOLD" />
        </options_filter>
        <!-- format to parse benchmark output -->
        <result
            source="brtime"
            paramGroup="4"
            paramTitle="t[usec]"
            paramTarget="min"
            paramLeftMarginGroup="2"
            paramRightMarginGroup="3"
            paramChooseMode="heaviest"
            paramDiffDelta="0.001"
            msgGroup="0"
            msgTitle="Bytes"
            iterationCompare="min"
            startline=".*(\#bytes\s+\#repetitions).*"
            dataline="\s+(\d+)\s+(\d+)\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)"
            solidatalines="1"
        />
    </test>
...
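To check which values a custom range covers before you start a tuning run, you can expand the range expression by hand. The following Python sketch is not part of mpitune. The helper name expand_range is hypothetical, and the start:end:operator:step semantics (inclusive end, + for additive steps, * for multiplicative steps) are an assumption inferred from the comments in the excerpts above. How mpitune samples the range in practice can also depend on the test->result->source setting.

```python
def expand_range(spec: str) -> list[int]:
    """Expand a 'start:end:op:step' expression into the values it denotes.

    Assumed semantics (inferred from the comments in the XML excerpts):
    '+' adds the step, '*' multiplies by the step, and 'end' is inclusive.
    """
    start_s, end_s, op, step_s = spec.split(":")
    start, end, step = int(start_s), int(end_s), int(step_s)

    if op == "+":
        if step <= 0:
            raise ValueError("additive step must be positive")
    elif op == "*":
        if start <= 0 or step <= 1:
            raise ValueError("multiplicative range needs start > 0 and step > 1")
    else:
        raise ValueError(f"unsupported operator: {op!r}")

    values = []
    current = start
    while current <= end:
        values.append(current)
        current = current + step if op == "+" else current * step
    return values


if __name__ == "__main__":
    # Range from the options.xml excerpt: 8k to 512k, doubling each step.
    print(expand_range("8192:524288:*:2"))
    # Range from the Benchmarks/imb.xml excerpt: 768 to 1536 in steps of 256.
    print(expand_range("768:1536:+:256"))
```

Under these assumptions, the first expression yields seven message sizes and the second yields the four sizes listed in the imb.xml comment.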
NOTE:

When you define a custom range for tuning the option, take the following parameter into account:

test->result->source

This parameter is set in the configuration file of the benchmark you use. For example, when the value is thtime, your explicitly defined range is used as is. With brtime, however, you set only the lower and upper boundaries, and all intermediate values of the range are calculated automatically.