Getting Started with Intel Optimized HPCG

Developer Guide for Intel® oneAPI Math Kernel Library Linux*

ID 766690

Date 12/16/2022

To start working with the benchmark:

1. On a cluster file system, unpack the Intel Optimized HPCG package to a directory accessible by all nodes. Read and accept the license as indicated in the readme.txt file included in the package.
2. Change the directory to hpcg/bin.
3. Determine the prebuilt version of the benchmark that is best for your system, or follow the QUICKSTART instructions to build a version of the benchmark for your MPI implementation.
4. Ensure that the Intel® oneAPI Math Kernel Library, Intel C/C++ Compiler, and MPI run-time environments are set up properly. You can do this using the scripts vars.sh, compilervars.sh, and mpivars.sh that are included in those distributions.
5. Run the chosen version of the benchmark.
- The Intel AVX and Intel AVX2 optimized versions perform best with one MPI process per socket and one OpenMP* thread per core, skipping simultaneous multithreading (SMT) threads. Set the affinity as KMP_AFFINITY=granularity=fine,compact,1,0. For example, on a 128-node cluster with two Intel® Xeon® Processor E5-2697 v4 per node (256 MPI processes in total, 18 threads each), run the executable as follows:
```
#> mpiexec.hydra -n 256 -ppn 2 env OMP_NUM_THREADS=18 \
       KMP_AFFINITY=granularity=fine,compact,1,0 \
       ./bin/xhpcg_avx2 -n192
```
- The Intel® Xeon® Phi processor optimized version performs best with four MPI processes per processor and two threads per processor core, with SMT turned on. For example, on a 128-node cluster with one Intel® Xeon® Phi processor 7250 per node (512 MPI processes in total), run the executable as follows:
```
#> mpiexec.hydra -n 512 -ppn 4 env OMP_NUM_THREADS=34 \
       KMP_AFFINITY=granularity=fine,compact,1,0 \
       ./bin/xhpcg_knl -n160
```
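The process and thread counts in these commands follow from the cluster shape: total MPI processes = nodes × processes per node, and OpenMP threads per process = (cores per node ÷ processes per node) × threads per core. The sketch below is a minimal illustration of that arithmetic. The function name `hpcg_launch_line` and its argument order are hypothetical and not part of the HPCG package, and the core counts used in the sample calls (36 per dual-socket node, 68 per Intel® Xeon® Phi node) are the values implied by the thread settings described above. The function only prints a command line; it does not launch anything.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Intel Optimized HPCG): print an
# mpiexec.hydra command line derived from the cluster shape.
# Usage: hpcg_launch_line NODES RANKS_PER_NODE CORES_PER_NODE THREADS_PER_CORE BINARY LOCAL_DIM
hpcg_launch_line() {
    nodes=$1; ppn=$2; cores=$3; tpc=$4; binary=$5; dim=$6
    # Total MPI processes across the cluster.
    total_ranks=$((nodes * ppn))
    # Each process gets an equal share of the node's cores,
    # with tpc OpenMP threads placed on every core.
    threads=$((cores / ppn * tpc))
    echo "mpiexec.hydra -n $total_ranks -ppn $ppn env OMP_NUM_THREADS=$threads KMP_AFFINITY=granularity=fine,compact,1,0 $binary -n$dim"
}

# 128 nodes, one process per socket (2 per node), 36 cores per node, no SMT.
hpcg_launch_line 128 2 36 1 ./bin/xhpcg_avx2 192
# 128 nodes, four processes per processor, 68 cores per node, 2 threads per core.
hpcg_launch_line 128 4 68 2 ./bin/xhpcg_knl 160
```

The first call prints a command with `-n 256 -ppn 2` and `OMP_NUM_THREADS=18`; the second prints `-n 512 -ppn 4` and `OMP_NUM_THREADS=34`. Review the printed line before running it on a real cluster.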
When the benchmark completes execution, which usually takes a few minutes, find the YAML file with official results in the current directory. The performance rating of the benchmarked system is in the last section of the file:

HPCG result is VALID with a GFLOP/s rating of: [GFLOP/s]
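To pull the rating out of the results file without opening it, a short script along the following lines may help. It is a sketch under two assumptions: that the result files in the current directory have a `.yaml` extension, and that the rating line has the form shown above with a plain decimal number after the colon. The function name `extract_rating` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric GFLOP/s rating from an HPCG result file.
# Assumes the line format "... GFLOP/s rating of: <number>" shown above.
extract_rating() {
    # $1: path to a result file. Prints the last matching rating, or nothing.
    sed -n 's/.*GFLOP\/s rating of: *\([0-9.][0-9.]*\).*/\1/p' "$1" | tail -n 1
}

# Assumption: result files in the current directory end in .yaml.
for f in ./*.yaml; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    printf '%s: %s GFLOP/s\n' "$f" "$(extract_rating "$f")"
done
```

If the output shows an empty rating for a file, the run may not have produced a valid result, or the file may use a different line format; inspect the last section of the file directly in that case.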

Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201
