Developer Guide for Intel® oneAPI Math Kernel Library Linux*

ID 766690
Date 6/24/2024
Public

Getting Started with Intel® GPU Optimized HPCG

To start working with the benchmark:

  1. On a cluster file system, unpack the Intel® GPU Optimized HPCG package to a directory accessible by all nodes. Read and accept the license as indicated in the readme.txt file included in the package.

  2. Change to the hpcg/hpcg_gpu/bin directory.

  3. Choose the prebuilt version of the benchmark that best fits your system, or follow the instructions in README.md to build a version for your MPI implementation. To build and run, you can modify and use one of the provided example_*_runscript.sh files.

  4. Ensure that the Intel® oneAPI Math Kernel Library (oneMKL), Intel SYCL C++ Compiler, and MPI runtime environments are set up correctly, either with the vars.sh scripts included in those distributions or with the oneAPI HPCG toolkit vars.sh scripts.

  5. Run the chosen version of the benchmark.

    For Intel® Data Center GPU Max Series, the PVC-optimized versions perform best with one MPI process per GPU tile. The reference code (used for verifying correctness and convergence) uses OpenMP* threads, which should be divided evenly among the MPI ranks on each node. For example, consider a 128-node cluster in which each node has four Intel® Data Center GPU Max Series cards with two tiles each (eight MPI ranks per node) and two 4th Gen Intel® Xeon® Platinum 8480+ processors with 56 cores per socket and 2 simultaneous multithreading (SMT) threads per core. Each node then has 224 hardware threads, or 28 OpenMP threads per rank. Run the executable as follows:

    #> export OMP_NUM_THREADS=28; export KMP_AFFINITY=granularity=fine,compact; export SYCL_QUEUE_THREAD_POOL_SIZE=26; mpiexec.hydra --genvall -np 1024 --ppn 8 -f ${nodefile} ./bin/xhpcg_impi_pvc --nx=320 --ny=320 --nz=320 --run-real-ref=1 --affinity-per-node=compact

  6. When the benchmark completes execution, which usually takes a few minutes, find the YAML file with official results in the current directory. The performance rating of the benchmarked system is in the last section of the file:

    HPCG result is VALID with a GFLOP/s rating of: [GFLOP/s]
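
The rank and thread counts in the example command of step 5 follow from the hardware description. The sketch below reproduces that arithmetic so it can be adapted to a different cluster. The variable names are illustrative and not part of the benchmark package, and the two-tiles-per-card value is an assumption chosen to match the --ppn 8 setting in the example.

```shell
#!/bin/sh
# Sketch only: derive -np, --ppn, and OMP_NUM_THREADS for the example system.
# Variable names are illustrative, not part of the benchmark package.

nodes=128              # cluster nodes
gpus_per_node=4        # GPU cards per node
tiles_per_gpu=2        # assumption: two tiles per card, consistent with --ppn 8
sockets=2              # CPU sockets per node
cores_per_socket=56    # cores per socket
smt_per_core=2         # SMT threads per core

# One MPI rank per GPU tile.
ranks_per_node=$((gpus_per_node * tiles_per_gpu))
total_ranks=$((nodes * ranks_per_node))

# Divide the node's hardware threads evenly among its ranks.
hw_threads=$((sockets * cores_per_socket * smt_per_core))
omp_threads=$((hw_threads / ranks_per_node))

echo "ppn=${ranks_per_node} np=${total_ranks} OMP_NUM_THREADS=${omp_threads}"
```

For the example system this yields the values used in the command: 8 ranks per node, 1024 ranks in total, and 28 OpenMP threads per rank.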
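
To pick the rating out of the results file described in step 6 without opening it, a small helper along the following lines may be convenient. It is a sketch under two assumptions: the rating line has the form shown above, with the number after a ':' or '=' separator, and the results file has a .yaml extension. The function name hpcg_rating is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the GFLOP/s value from an HPCG results file.
# Assumes the rating line looks like the one shown in step 6, with the
# number following a ':' or '=' separator.
hpcg_rating() {
    grep 'HPCG result is VALID with a GFLOP/s rating of' "$1" |
        tail -n 1 |
        sed 's/.*rating of[:=] *//'
}

# Report the rating for each YAML file in the current directory.
# The .yaml extension is an assumption; adjust the pattern if needed.
for f in ./*.yaml; do
    [ -e "$f" ] || continue    # skip when the pattern matches nothing
    printf '%s: %s\n' "$f" "$(hpcg_rating "$f")"
done
```

If a run did not produce a VALID result, the helper prints nothing for that file, which is a cue to inspect the file by hand.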

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Notice revision #20201201