Benchmarking GEMM on Intel® Architecture Processors

ID 736724
Updated 7/5/2022
Version Latest
Public

An earlier version of the benchmark had a bug: all matrix entries were erroneously initialized to the constant value -0.5 instead of random floating-point values. This has been fixed.

Introduction

Math libraries, such as the Intel® Math Kernel Library (Intel® MKL) and the BLIS* framework, provide fast implementations of many frequently used math routines. In this article, we show how to measure the performance of SGEMM/DGEMM (single- and double-precision floating-point GEMM) using the implementations provided by Intel® MKL and the BLIS* framework.

 

Prerequisites

Intel® C Compiler

The instructions below assume that you have obtained and installed a version of the Intel® C Compiler, and that you have run the script that sets up the required environment variables:

source <INTEL_COMPILER_INSTALL_DIR>/linux/bin/compilervars.sh intel64

Intel® MKL

For Intel® MKL, we assume that you have obtained and installed a version of the library, and that you have run the script that sets up the required environment variables (such as $MKLROOT):

source <INTEL_MKL_INSTALL_DIR>/linux/mkl/bin/mklvars.sh intel64

Intel® MKL is available for download here: Intel® MKL

BLIS* Framework

The latest version of the BLIS* framework is available for download on GitHub*: BLIS* Framework

During the installation process for the BLIS* framework, you can choose from a preset range of configurations that target specific CPUs (such as "sandybridge", "haswell", and "knl"). Note that the primary branch of BLIS* does not contain a configuration tailored to the Intel® microarchitecture code name Skylake. However, you can experiment with the "skx" branch of the code, which provides a configuration better suited to Skylake (e.g., it sets the -xCORE-AVX512 compilation flag). For Intel® microarchitecture code name Haswell, you may use the "haswell" configuration.

Below, we show an example that uses the "skx" configuration from the branch mentioned above. For Haswell, simply replace "skx" with "haswell"; the Haswell configuration is also available in the primary branch of BLIS*.

You can either download the code as a zip file and uncompress it, or clone the git repository from GitHub*.

Then, run configure and make as follows:

./configure --enable-threading=omp CC=icc skx
make

These commands enable multithreading using OpenMP*, use the Intel® C Compiler, and select the "skx" configuration folder from the BLIS* framework installation directory. The library libblis.a is generated under <BLIS_DIR>/lib/skx (where <BLIS_DIR> is the path to the main folder of the BLIS* installation directory).

 

GEMM Benchmark

Description

The benchmark we use is available as an attachment with this article. It contains a source file, a Makefile, and a script to automate running the benchmark. We discuss these in more detail in the following sections in the context of building and running the benchmark.

Our benchmark is effectively a simple wrapper around repeated calls to SGEMM or DGEMM. Depending on the choices you make at compile time, it uses:

  • The Intel® MKL or BLIS* framework version of the GEMM kernel.
  • Single-precision or double-precision GEMM (SGEMM/DGEMM).
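
One common way to make such a compile-time choice in C is with a preprocessor macro. The sketch below is an assumption about how this could look, not the mechanism used by the attached source or Makefile: the macro USE_DOUBLE, the type real_t, the string GEMM_NAME, and the helper matrices_bytes are all hypothetical names.

```c
#include <stddef.h>

/* Hypothetical macro: build with -DUSE_DOUBLE to select double precision. */
#ifdef USE_DOUBLE
typedef double real_t;
#define GEMM_NAME "DGEMM"
#else
typedef float real_t;
#define GEMM_NAME "SGEMM"
#endif

/* Bytes needed to hold the three n x n matrices A, B, and C. */
size_t matrices_bytes(size_t n)
{
    return 3 * n * n * sizeof(real_t);
}
```

Compiling with -DUSE_DOUBLE selects the double-precision variant; without it, the single-precision variant is built. The same macro could also choose between a library's single- and double-precision GEMM routines.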

Here is a high-level overview of what the benchmark code does:

  1. Takes as its only parameter the problem size N.
  2. Allocates matrices A, B, and C of size N x N, and initializes them with random data.
  3. Calls GEMM (sgemm/dgemm) once for initialization, and then loops over consecutive calls to GEMM a preset number of times (default = 4).
  4. Measures the execution time of the above loop, then calculates and reports the performance (GFLOPS).
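
The steps above can be sketched in C as follows. This is a minimal illustration, not the code from the attachment: the function names (fill_random, naive_dgemm, gemm_gflops, run_benchmark) are hypothetical, the value range of the random data is an assumption, and a naive triple loop stands in for the library's dgemm call so that the sketch builds without Intel® MKL or BLIS*. It also assumes the usual count of 2*N^3 floating-point operations per N x N GEMM.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <stdlib.h>
#include <time.h>

/* Fill an n x n row-major matrix with pseudo-random values in [-0.5, 0.5)
 * (the range is an assumption made for this sketch). */
void fill_random(double *m, size_t n)
{
    for (size_t i = 0; i < n * n; i++)
        m[i] = (double)rand() / ((double)RAND_MAX + 1.0) - 0.5;
}

/* Naive C = A * B (alpha = 1, beta = 0); a stand-in for the library GEMM. */
void naive_dgemm(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* GFLOPS for `calls` GEMMs of size n, assuming 2*n^3 operations per call. */
double gemm_gflops(size_t n, int calls, double seconds)
{
    double flops = 2.0 * (double)n * (double)n * (double)n * (double)calls;
    return flops / seconds / 1e9;
}

/* Runs one untimed call, then times `calls` consecutive calls.
 * Returns the measured GFLOPS, or -1.0 if allocation fails. */
double run_benchmark(size_t n, int calls)
{
    double *a = malloc(n * n * sizeof *a);
    double *b = malloc(n * n * sizeof *b);
    double *c = malloc(n * n * sizeof *c);
    if (!a || !b || !c) {
        free(a); free(b); free(c);
        return -1.0;
    }
    fill_random(a, n);
    fill_random(b, n);

    naive_dgemm(n, a, b, c);               /* initialization call, untimed */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < calls; i++)
        naive_dgemm(n, a, b, c);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    volatile double sink = c[0];           /* keep the result observable */
    (void)sink;

    double seconds = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(a); free(b); free(c);
    return gemm_gflops(n, calls, seconds);
}
```

For example, run_benchmark(1000, 4) performs one untimed call followed by four timed calls on 1000 x 1000 matrices and returns the measured GFLOPS. In a real measurement, the call to naive_dgemm would be replaced by the library's GEMM routine.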

How to Build

To build the benchmark, simply use the provided Makefile.

Keep in mind:

  • The Intel® MKL version expects the $MKLROOT environment variable to point to the Intel® MKL folder.
  • The BLIS* framework version expects the $BLISLIB environment variable to point to the folder where the BLIS* framework library (libblis.a) is located (e.g., <BLIS_DIR>/lib/skx for a build that uses the "skx" configuration).
  • We build the benchmark with the Intel® C Compiler (icc), so make sure it is available in your environment.

(For more details on all of the above, please refer to the "Prerequisites" section of this article.)

To build the Intel® MKL version:

make mkl

To build the BLIS* framework version:

make blis

To build both the Intel® MKL and BLIS* framework versions:

make all

Running the make command as shown above will create the corresponding executables in the current folder (sgemmbench.mkl, dgemmbench.mkl, sgemmbench.blis, dgemmbench.blis).

How to Run

Once you have built the benchmark version you desire (Intel® MKL, BLIS* framework, or both), you can use the provided run-script (run.sh).

First, make run.sh executable:

chmod +x run.sh

The run-script runs the GEMM benchmark with a specified number of threads, problem size, and math library.

./run.sh <NUM_THREADS> <SIZE_N> <MATH_LIBRARY>

Where:

  • NUM_THREADS: Number of threads to run the benchmark with. For example, for a dual-socket Intel® Xeon® Gold 6148 system with a total of 40 cores, you can select "40". The script applies the appropriate affinity (one thread per core).
  • SIZE_N: The problem size with which to run the benchmark (e.g., 10000). If you want to run with a range of preselected values, you can use the word "all".
  • MATH_LIBRARY: Here you can select "mkl" for the Intel® MKL or "blis" for BLIS* framework.

Examples:

  • Run with 20 threads, use all the predefined problem sizes, and use the Intel® MKL version of GEMM.
./run.sh 20 all mkl
  • Run with 28 threads, use a problem size of 10000 (10000 x 10000 matrices), and use the BLIS* framework version of GEMM.
./run.sh 28 10000 blis

 

Attachment            Size
gemm-benchmark.zip    9 KB