Getting Started with Intel Optimized HPCG
To start working with the benchmark:
1. On a cluster file system, unpack the Intel Optimized HPCG package to a directory accessible by all nodes. Read and accept the license as described in the readme.txt file included in the package.
2. Change the directory to hpcg/bin.
3. Determine which prebuilt version of the benchmark is best for your system, or follow the QUICKSTART instructions to build a version of the benchmark for your MPI implementation.
4. Ensure that the Intel® oneAPI Math Kernel Library (oneMKL), Intel C/C++ Compiler, and MPI run-time environments are set up properly. You can do this by sourcing the scriptsvars.sh, compilervars.sh, and mpivars.sh scripts included in those distributions.
5. Run the chosen version of the benchmark.
- The Intel AVX and Intel AVX2 optimized versions perform best with one MPI process per socket and one OpenMP* thread per core, skipping simultaneous multithreading (SMT) threads; set the affinity as KMP_AFFINITY=granularity=fine,compact,1,0. For example, for a 128-node cluster with two Intel® Xeon® Processor E5-2697 v4 (18 cores each) per node, run the executable as follows:
#> mpiexec.hydra -n 256 -ppn 2 env OMP_NUM_THREADS=18 KMP_AFFINITY=granularity=fine,compact,1,0 ./bin/xhpcg_avx2 -n192
- The Intel® Xeon® Phi processor optimized version performs best with four MPI processes per processor and two OpenMP* threads per processor core, with SMT turned on. For example, for a 128-node cluster with one Intel® Xeon® Phi processor 7250 (68 cores) per node, run the executable as follows:
#> mpiexec.hydra -n 512 -ppn 4 env OMP_NUM_THREADS=34 KMP_AFFINITY=granularity=fine,compact,1,0 ./bin/xhpcg_knl -n160
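The rank and thread counts in these launch lines follow from simple arithmetic: total ranks = nodes × ranks per node, and OpenMP threads per rank = (cores per node × threads used per core) / ranks per node. The sketch below is a hypothetical POSIX shell helper, `hpcg_cmd`, which is not part of the Intel Optimized HPCG package; it only prints a command line and hard-codes the KMP_AFFINITY setting from the text. The inputs in the usage lines (18 cores per E5-2697 v4, 68 cores per Xeon Phi 7250, four ranks per Xeon Phi node following the "four MPI processes per processor" guidance) are assumptions to adjust for your own nodes.

```shell
# Hypothetical helper, not part of the Intel Optimized HPCG package: print an
# mpiexec.hydra command line whose rank and thread counts follow from the
# cluster shape. It only prints the command; it does not launch anything.
#   $1 nodes            number of compute nodes
#   $2 ranks_per_node   MPI processes per node
#   $3 cores_per_node   physical cores per node
#   $4 threads_per_core 1 = skip SMT threads, 2 = use two SMT threads per core
#   $5 binary           benchmark executable, e.g. ./bin/xhpcg_avx2
#   $6 size             local problem size, passed as -n<size>
hpcg_cmd() {
    nodes=$1
    ranks_per_node=$2
    cores_per_node=$3
    threads_per_core=$4
    binary=$5
    size=$6
    # Total MPI ranks across the cluster.
    total_ranks=$((nodes * ranks_per_node))
    # Divide each node's usable hardware threads evenly among its ranks
    # (integer division: any remainder is left idle).
    omp_threads=$((cores_per_node * threads_per_core / ranks_per_node))
    echo "mpiexec.hydra -n $total_ranks -ppn $ranks_per_node env" \
        "OMP_NUM_THREADS=$omp_threads" \
        "KMP_AFFINITY=granularity=fine,compact,1,0 $binary -n$size"
}

# Two E5-2697 v4 per node: 2 ranks/node, 2 x 18 = 36 cores, SMT threads skipped.
hpcg_cmd 128 2 36 1 ./bin/xhpcg_avx2 192
# One Xeon Phi 7250 per node: 4 ranks/node, 68 cores, 2 threads per core.
hpcg_cmd 128 4 68 2 ./bin/xhpcg_knl 160
```

The first call yields 256 ranks with 18 threads each and the second 512 ranks with 34 threads each. Printing rather than executing keeps the helper safe to try on a login node; review the line before running it.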
When the benchmark completes, which usually takes a few minutes, find the YAML file with the official results in the current directory. The performance rating of the benchmarked system is in the last section of the file:
HPCG result is VALID with a GFLOP/s rating of: [GFLOP/s]
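To pick up the rating in a script, a small grep/sed filter is enough. This is a hedged sketch: `hpcg_rating` is a hypothetical helper, and it assumes only that the rating line contains the text shown above followed by `:` or `=` and the number. Because the exact separator and the results file name may differ between HPCG versions, the file path is passed in explicitly rather than guessed.

```shell
# Hypothetical helper: print the GFLOP/s rating recorded in an HPCG results file.
# Prints nothing if the file has no "HPCG result is VALID" line.
hpcg_rating() {
    # Keep the last matching line, then strip everything up to the separator
    # (':' or '=') and any surrounding whitespace, leaving just the number.
    grep 'HPCG result is VALID with a GFLOP/s rating of' "$1" |
        tail -n 1 |
        sed -e 's/.*rating of[:=][[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Usage (the file name is a placeholder for the YAML file the run produced):
# hpcg_rating ./results.yaml
```

An empty result is a useful signal in automation: it means the run did not report a VALID rating.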
Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201