Overview of the Intel Optimized HPCG
The Intel® Optimized High Performance Conjugate Gradient Benchmark (Intel® Optimized HPCG) provides CPU- and GPU-optimized implementations of the HPCG benchmark (http://hpcg-benchmark.org). The CPU version is optimized for Intel® Xeon® processors with Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512) support. The GPU version is optimized for the Intel® Data Center GPU Max Series.
The HPCG Benchmark is intended to complement the High Performance LINPACK benchmark used in the TOP500 (http://www.top500.org) system ranking by providing a metric that better aligns with a broader set of important cluster applications.
The HPCG benchmark implementation is based on a 3-dimensional (3D) regular 27-point discretization of an elliptic partial differential equation. The implementation calls for a 3D domain to fill a 3D virtual process grid for all the available MPI ranks. The HPCG benchmark uses the preconditioned conjugate gradient (CG) method to solve the intermediate systems of equations and incorporates a local, symmetric Gauss-Seidel preconditioning step that requires a triangular forward solve and a backward solve. A synthetic multi-grid V-cycle is used on each preconditioning step to make the benchmark better fit real-world applications. The HPCG benchmark implements sparse matrix-vector multiplication locally, with an initial halo exchange between neighboring processes. The benchmark exhibits the irregular memory accesses and fine-grain recursive computations that dominate many scientific workloads.
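The numerical core described above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not the benchmark code: it leaves out MPI, the halo exchange, and the multi-grid V-cycle, and the function names (`build_27pt`, `spmv`, `symgs`, `pcg`) are hypothetical. The stencil values (26 on the diagonal, -1 off the diagonal) are assumed here to mirror the HPCG reference problem generator.

```python
import math

def build_27pt(nx, ny, nz):
    """Rows of (col, value) pairs for a 27-point stencil on an nx*ny*nz grid."""
    rows = []
    for iz in range(nz):
        for iy in range(ny):
            for ix in range(nx):
                me = (iz * ny + iy) * nx + ix
                row = []
                for dz in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            jx, jy, jz = ix + dx, iy + dy, iz + dz
                            if 0 <= jx < nx and 0 <= jy < ny and 0 <= jz < nz:
                                col = (jz * ny + jy) * nx + jx
                                # Assumed values: 26 on the diagonal, -1 elsewhere.
                                row.append((col, 26.0 if col == me else -1.0))
                rows.append(row)
    return rows

def spmv(rows, x):
    """Sparse matrix-vector product y = A x."""
    return [sum(v * x[c] for c, v in row) for row in rows]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def symgs(rows, r):
    """One symmetric Gauss-Seidel pass (forward sweep, then backward sweep) from z = 0."""
    n = len(rows)
    z = [0.0] * n
    for sweep in (range(n), range(n - 1, -1, -1)):
        for i in sweep:
            diag, acc = 0.0, r[i]
            for c, v in rows[i]:
                if c == i:
                    diag = v
                else:
                    acc -= v * z[c]
            z[i] = acc / diag
    return z

def pcg(rows, b, tol=1e-10, max_iter=200):
    """Conjugate gradient preconditioned with symgs; returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = symgs(rows, r)
    p = list(z)
    rz = dot(r, z)
    norm0 = math.sqrt(dot(r, r))
    for it in range(1, max_iter + 1):
        ap = spmv(rows, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * norm0:
            return x, it
        z = symgs(rows, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

rows = build_27pt(4, 4, 4)
b = spmv(rows, [1.0] * len(rows))  # right-hand side whose exact solution is all ones
x, iters = pcg(rows, b)
print(iters, max(abs(xi - 1.0) for xi in x))
```

The forward and backward sweeps in `symgs` are the two triangular solves the text refers to. Each row update depends on rows updated earlier in the same sweep, which is the fine-grain recursive dependence that makes this kernel hard to parallelize.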
Intel® CPU Optimized HPCG Benchmark
The Intel® Optimized HPCG for CPUs benchmark contains source code of the HPCG v3.1 reference implementation with necessary modifications to include:
- Intel® architecture optimizations
- Prebuilt benchmark executables that link to Intel® oneAPI Math Kernel Library (oneMKL)
- Inspector-executor Sparse BLAS kernels for:
  - Sparse matrix-vector multiplication (SpMV)
  - Sparse triangular solve (SpTRSV)
  - Symmetric Gauss-Seidel smoother (SYMGS)
The Intel® oneAPI Math Kernel Library Inspector-executor Sparse BLAS kernels for SpMV, SpTRSV, and SYMGS work in two stages. The inspection step chooses the best algorithm for the input matrix and converts the matrix to a special internal representation. The execution step then operates on that representation to achieve high performance.
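The two-stage idea can be illustrated with a toy sketch. The code below is not the oneMKL API: `inspect`, `execute_spmv`, the dictionary "handle", and the 1.5x padding heuristic are all assumptions made for illustration. The inspector looks at a CSR matrix once and decides whether to keep CSR or convert to a padded ELLPACK-style layout; the executor then runs SpMV on whichever layout was chosen.

```python
def inspect(indptr, indices, data):
    """Inspection step (hypothetical): analyze the CSR matrix once and pick a layout."""
    n = len(indptr) - 1
    nnz = indptr[-1]
    lengths = [indptr[i + 1] - indptr[i] for i in range(n)]
    width = max(lengths) if lengths else 0
    # Assumed heuristic: convert to a padded layout only when padding is cheap.
    if n and width * n <= 1.5 * nnz:
        cols = [[0] * width for _ in range(n)]
        vals = [[0.0] * width for _ in range(n)]  # padding entries stay 0.0
        for i in range(n):
            for k, p in enumerate(range(indptr[i], indptr[i + 1])):
                cols[i][k] = indices[p]
                vals[i][k] = data[p]
        return {"format": "ell", "cols": cols, "vals": vals}
    return {"format": "csr", "indptr": indptr, "indices": indices, "data": data}

def execute_spmv(handle, x):
    """Execution step (hypothetical): y = A x using the layout chosen by inspect."""
    if handle["format"] == "ell":
        return [sum(v * x[c] for c, v in zip(cr, vr))
                for cr, vr in zip(handle["cols"], handle["vals"])]
    indptr, indices, data = handle["indptr"], handle["indices"], handle["data"]
    return [sum(data[p] * x[indices[p]] for p in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]

# A 4x4 tridiagonal matrix: row lengths are nearly uniform, so padding is cheap.
tri = inspect([0, 2, 5, 8, 10],
              [0, 1, 0, 1, 2, 1, 2, 3, 2, 3],
              [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
y_tri = execute_spmv(tri, [1.0, 2.0, 3.0, 4.0])

# A 3x3 matrix with one long row: padding would be wasteful, so CSR is kept.
skew = inspect([0, 3, 4, 5], [0, 1, 2, 1, 2], [1.0, 1.0, 1.0, 1.0, 1.0])
y_skew = execute_spmv(skew, [1.0, 2.0, 3.0])
```

The inspection cost is paid once, while the execution step runs many times inside the CG iteration, which is why a comparatively expensive analysis and conversion can pay off.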
Intel® GPU Optimized HPCG Benchmark
The Intel® GPU Optimized HPCG benchmark contains source code of the HPCG v3.1 reference implementation with necessary modifications to include:
- SYCL and C++ code for efficient host and device scheduling of kernels
- Intel® GPU architecture optimizations
- A symmetric permutation of the sparse matrix to enable more task parallelism in some of the key computation kernels, such as the symmetric Gauss-Seidel smoother
- Conversion of the sparse matrix to the ELLPACK Sparse Block (ESB) matrix format for more efficient vectorizable loads on the GPU hardware
- Core computation kernels written in SYCL and using the "Explicit SIMD" (ESIMD) SYCL extension for lower-level Intel GPU programming:
  - Sparse matrix-vector multiplication (SpMV)
  - Sparse triangular solve (SpTRSV)
  - Symmetric Gauss-Seidel smoother (SYMGS)
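To see why a symmetric permutation can expose parallelism in a Gauss-Seidel sweep, consider multi-coloring, one common reordering technique. The source does not state which permutation the package actually uses, so the sketch below is only an illustration under that assumption, with hypothetical function names. Rows that share a color have no couplings between them, so after the rows are grouped by color, all rows in a group can be updated independently.

```python
def greedy_coloring(adj):
    """Give each row the smallest color not used by an already colored neighbor.

    adj[i] is the set of rows coupled to row i through off-diagonal nonzeros.
    """
    color = [-1] * len(adj)
    for i in range(len(adj)):
        used = {color[j] for j in adj[i] if color[j] >= 0}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return color

def color_permutation(color):
    """Row order that groups rows by color (ties keep their original order)."""
    return sorted(range(len(color)), key=lambda i: (color[i], i))

# Coupling pattern of a 6x6 tridiagonal matrix: row i touches rows i-1 and i+1.
adj = [{j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)]
color = greedy_coloring(adj)
perm = color_permutation(color)
print(color)  # [0, 1, 0, 1, 0, 1]
print(perm)   # [0, 2, 4, 1, 3, 5]
```

Applying `perm` to both the rows and the columns of the matrix keeps it symmetric. For this chain the result is the classic red-black ordering: rows 0, 2, and 4 can be processed in parallel, then rows 1, 3, and 5. A 27-point stencil needs more colors because each row is coupled to up to 26 neighbors.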
Use this package to evaluate the performance of distributed-memory systems based on the Intel® Data Center GPU Max Series family.
Product and Performance Information |
---|
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201 |