Configuring Parameters

Developer Guide

Developer Guide for Intel® oneAPI Math Kernel Library Linux*

ID 766690

Date 6/24/2024

The most significant parameters in HPL.dat are P, Q, NB, and N. Specify them as follows:

P and Q – the number of rows and columns in the process grid, respectively:
- P*Q must equal the number of MPI processes that HPL uses.
- Choose P ≤ Q.
N – the problem size:
- For homogeneous runs, choose N divisible by NB*LCM(P,Q), where LCM is the least common multiple of the two numbers.
- For heterogeneous runs, see Heterogeneous Support in the Intel® Distribution for LINPACK* Benchmark for how to choose N.
NOTE:

Increasing N usually increases performance, but N is bounded by the available memory. In general, the memory that each MPI process needs to store its share of the matrix (not counting internal buffers) is 8*N*N/(P*Q) bytes, where N is the problem size and P and Q are the process grid dimensions in HPL.dat. As a general rule, choose a problem size that fills 80% of memory.
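
The rules above for P, Q, and N can be sketched as a small helper. This is only an illustration, not part of the benchmark package: the function names `choose_grid` and `choose_n` are hypothetical, picking the most nearly square grid is an assumption (the text only requires P*Q to match the process count and P ≤ Q), and the memory figure is assumed to be the amount available to each MPI process. The sketch applies to homogeneous runs and requires Python 3.9 or later.

```python
import math


def choose_grid(num_procs: int) -> tuple[int, int]:
    """Return (P, Q) with P * Q == num_procs and P <= Q.

    Assumption of this sketch: prefer the most nearly square grid.
    The guide itself only requires P * Q == num_procs and P <= Q.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be a positive integer")
    p = math.isqrt(num_procs)
    # Walk down from floor(sqrt(num_procs)) to the nearest divisor.
    while num_procs % p != 0:
        p -= 1
    return p, num_procs // p


def choose_n(mem_per_process_bytes: int, p: int, q: int, nb: int,
             fill: float = 0.80) -> int:
    """Largest N divisible by NB * LCM(P, Q) such that
    8 * N * N / (P * Q) <= fill * mem_per_process_bytes.
    """
    # Rearranged bound: N * N <= fill * mem_per_process * P * Q / 8.
    budget = int(fill * mem_per_process_bytes * p * q) // 8
    n_max = math.isqrt(budget)
    step = nb * math.lcm(p, q)
    # Round down to a multiple of NB * LCM(P, Q); 0 means nothing fits.
    return (n_max // step) * step


if __name__ == "__main__":
    p, q = choose_grid(4)
    n = choose_n(16 * 2**30, p, q, nb=384)
    print(p, q, n)
```

With 4 MPI processes, 16 GiB per process, and NB = 384, this gives a 2 x 2 grid and N = 82176, which is a multiple of 384*LCM(2,2) = 768. Internal buffers are not counted, so a real run may need a smaller fill fraction.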

NB – the block size of the data distribution.

The table below shows the recommended values of NB and element sizes for the CPU version:

Processors	Intel® Distribution for LINPACK* Benchmark	Intel® Optimized HPL-AI* Benchmark
Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2) instructions	192	192
Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions	384	384
Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions with Intel® Deep Learning Boost and bfloat16	384	768
Intel® Xeon Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions with Intel® AMX bfloat16	384	1536
Element size	8 bytes	4 bytes

The table below shows the recommended values of NB and element sizes for the GPU version:

Processors	Intel® Distribution for LINPACK* Benchmark	Intel® Optimized HPL-AI* Benchmark
Intel® Data Center GPU Series	384	1152 or 1536
Element size	8 bytes	2 bytes
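
The two tables can be transcribed into a lookup structure so that a script can pick NB automatically. The sketch below is an illustration only: the dictionary keys (`avx2`, `hpl_ai`, and so on) are hypothetical labels invented here, and the function `matrix_bytes_per_process` generalizes the 8*N*N/(P*Q) formula by substituting the table's element size for 8, which is an assumption rather than something the guide states.

```python
# Recommended NB values transcribed from the CPU and GPU tables above.
# The short keys are hypothetical labels chosen for this sketch.
RECOMMENDED_NB = {
    "cpu": {
        "avx2": {"linpack": 192, "hpl_ai": 192},
        "avx512": {"linpack": 384, "hpl_ai": 384},
        "avx512_dlboost_bf16": {"linpack": 384, "hpl_ai": 768},
        "avx512_amx_bf16": {"linpack": 384, "hpl_ai": 1536},
    },
    "gpu": {
        # The GPU table lists two candidate values for HPL-AI.
        "data_center_gpu": {"linpack": 384, "hpl_ai": (1152, 1536)},
    },
}

# Element sizes in bytes, from the last row of each table.
ELEMENT_SIZE_BYTES = {
    "cpu": {"linpack": 8, "hpl_ai": 4},
    "gpu": {"linpack": 8, "hpl_ai": 2},
}


def matrix_bytes_per_process(n: int, p: int, q: int,
                             element_size: int = 8) -> float:
    """Per-process matrix storage, excluding internal buffers.

    With element_size=8 this is the guide's 8*N*N/(P*Q) formula.
    Using other element sizes is an assumption of this sketch.
    """
    return element_size * n * n / (p * q)


if __name__ == "__main__":
    nb = RECOMMENDED_NB["cpu"]["avx512"]["linpack"]
    size = ELEMENT_SIZE_BYTES["cpu"]["linpack"]
    print(nb, matrix_bytes_per_process(82176, 2, 2, size))
```

Keeping the values in one table-shaped structure makes it easy to update them when a newer version of the guide changes the recommendations.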
