Measuring Performance with oneMKL Support Functions

Using oneMKL for Matrix Multiplication - C

ID 758506

Date 10/31/2024

oneMKL provides support functions for measuring performance. In this tutorial, they are used to quantify the performance improvement gained by using oneMKL routines.

Measure Performance of dgemm

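Use the dsecnd routine, which returns elapsed time in seconds, to time the computation: call it before and after the code being measured and subtract the two values.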

NOTE:

Because dgemm executes so quickly, a single call is difficult to time accurately, even for an operation on a large matrix. For this reason, the exercises perform the multiplication LOOP_COUNT times and report the average time per call. Set the value of the LOOP_COUNT constant so that the total execution time is about one second.

/* C source code is found in dgemm_with_timing.c */

    printf (" Making the first run of matrix product using Intel(R) MKL dgemm function \n"
            " via CBLAS interface to get stable run time measurements \n\n");
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 
                m, n, k, alpha, A, k, B, n, beta, C, n);

    printf (" Measuring performance of matrix product using Intel(R) MKL dgemm function \n"
            " via CBLAS interface \n\n");
    s_initial = dsecnd();
    for (r = 0; r < LOOP_COUNT; r++) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 
                    m, n, k, alpha, A, k, B, n, beta, C, n);
    }
    s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

    printf (" == Matrix multiplication using Intel(R) MKL dgemm completed == \n"
            " == at %.5f milliseconds == \n\n", (s_elapsed * 1000));

Measure Performance Without Using dgemm

To show the improvement that dgemm provides, perform the same measurement using a triply nested loop to multiply the matrices.

/* C source code is found in matrix_multiplication.c */

    printf (" Making the first run of matrix product using triple nested loop\n"
            " to get stable run time measurements \n\n");
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (l = 0; l < k; l++)
                sum += A[k*i+l] * B[n*l+j];
            C[n*i+j] = sum;
        }
    }

    printf (" Measuring performance of matrix product using triple nested loop \n\n");
    s_initial = dsecnd();
    for (r = 0; r < LOOP_COUNT; r++) {
        for (i = 0; i < m; i++) {
            for (j = 0; j < n; j++) {
                sum = 0.0;
                for (l = 0; l < k; l++)
                    sum += A[k*i+l] * B[n*l+j];
                C[n*i+j] = sum;
            }
        }
    }
    s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;
    
    printf (" == Matrix multiplication using triple nested loop completed == \n"
            " == at %.5f milliseconds == \n\n", (s_elapsed * 1000));

Compare the time per multiplication reported by the first exercise, which uses dgemm, with the time reported by the second exercise, which does not.

You can find more information about measuring oneMKL performance in the article "A simple example to measure the performance of an oneMKL function" in the Intel® oneAPI Math Kernel Library Knowledge Base.

Product and Performance Information
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex. Notice revision #20201201
