Measuring Effect of Threading on dgemm

Using oneMKL for Matrix Multiplication - C

ID 758506

Date 7/13/2023

By default, oneMKL uses as many threads as there are physical cores on the system. This exercise restricts the number of threads and measures the resulting change in dgemm performance to show how threading affects performance.

Limit the Number of Cores Used for dgemm

This exercise uses mkl_set_num_threads to override the default number of threads and mkl_get_max_threads to determine the maximum number of threads.

/* C source code is found in dgemm_threading_effect_example.c */

    /* Query the maximum number of threads oneMKL can use. */
    printf (" Finding max number of threads Intel(R) MKL can use for parallel runs \n\n");
    max_threads = mkl_get_max_threads();

    printf (" Running Intel(R) MKL from 1 to %i threads \n\n", max_threads);
    for (i = 1; i <= max_threads; i++) {
        /* Reset C so that every thread count starts from the same state. */
        for (j = 0; j < (m*n); j++)
            C[j] = 0.0;

        /* Override the default thread count for subsequent oneMKL calls. */
        printf (" Requesting Intel(R) MKL to use %i thread(s) \n\n", i);
        mkl_set_num_threads(i);

        /* Warm-up call, excluded from timing so that one-time start-up
           costs do not skew the measurement. */
        printf (" Making the first run of matrix product using Intel(R) MKL dgemm function \n"
                " via CBLAS interface to get stable run time measurements \n\n");
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    m, n, k, alpha, A, k, B, n, beta, C, n);

        /* Time LOOP_COUNT calls and report the mean time per call;
           dsecnd() returns elapsed time in seconds. */
        printf (" Measuring performance of matrix product using Intel(R) MKL dgemm function \n"
                " via CBLAS interface on %i thread(s) \n\n", i);
        s_initial = dsecnd();
        for (r = 0; r < LOOP_COUNT; r++) {
            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        m, n, k, alpha, A, k, B, n, beta, C, n);
        }
        s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;

        printf (" == Matrix multiplication using Intel(R) MKL dgemm completed ==\n"
                " == at %.5f milliseconds using %d thread(s) ==\n\n", (s_elapsed * 1000), i);
    }

Examine the output of the example and note that the time to multiply the matrices decreases as the number of threads increases. If you run this exercise with more threads than the number returned by mkl_get_max_threads, performance might degrade, because you are then using more threads than there are physical cores.
