Developer Reference for Intel® oneAPI Math Kernel Library for C

ID 766684
Date 3/22/2024
Public



Examples of Using OpenMP* Threading for FFT Computation

The following sample program shows how to employ internal OpenMP* threading in Intel® oneAPI Math Kernel Library (oneMKL) for FFT computation.

To specify the number of threads that oneMKL uses internally, set the MKL_NUM_THREADS environment variable, for example:

set MKL_NUM_THREADS=1 for one-threaded mode;

set MKL_NUM_THREADS=4 for multi-threaded mode.

Using oneMKL Internal Threading Mode (C Example)

 
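/* C99 example */
#include "mkl_dfti.h"

float data[200][100];                  /* real input, 200 x 100 */
MKL_Complex8 result[200][51];          /* CCE output: 200 x (100/2 + 1) */
MKL_LONG dim_sizes[2] = {200, 100};
MKL_LONG out_strides[3] = {0, 51, 1};  /* row length of the packed output */

int main(void)
{
    DFTI_DESCRIPTOR_HANDLE fft = NULL;

    /* ...put values into data[i][j] 0<=i<=199, 0<=j<=99 */

    DftiCreateDescriptor(&fft, DFTI_SINGLE, DFTI_REAL, 2, dim_sizes);
    /* data has no room for the complex result, so compute out of place */
    DftiSetValue(fft, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    DftiSetValue(fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    DftiSetValue(fft, DFTI_PACKED_FORMAT, DFTI_CCE_FORMAT);
    DftiSetValue(fft, DFTI_OUTPUT_STRIDES, out_strides);
    DftiCommitDescriptor(fft);
    DftiComputeForward(fft, data, result);
    DftiFreeDescriptor(&fft);
    return 0;
}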
 

The following examples, “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region” and “Using Parallel Mode with Multiple Descriptors Initialized in One Thread”, illustrate a parallel customer program in which each descriptor instance is used by only one thread.

For Example “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region”, specify the numbers of threads like this:

set MKL_NUM_THREADS=1 for Intel® oneAPI Math Kernel Library (oneMKL) to work in the single-threaded mode (recommended);

set OMP_NUM_THREADS=4 for the customer program to work in the multi-threaded mode.

Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region

Note that this example can be rewritten so that the customer program is single-threaded while oneMKL works in parallel mode internally. To do this, use one descriptor for all four transforms: set the parameter DFTI_NUMBER_OF_TRANSFORMS = 4, and set the corresponding parameter DFTI_INPUT_DISTANCE = 5000, the number of elements (50*100) between the starts of consecutive transforms.

/* C99 example */
#include "mkl_dfti.h"
#include <omp.h>
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

// 4 OMP threads, each does 2D FFT 50x100 points
MKL_Complex8 data[4][50][100];
int nth = ARRAY_LEN(data);
MKL_LONG dim_sizes[2] = {
    ARRAY_LEN(data[0]),
    ARRAY_LEN(data[0][0])
};  /* {50, 100} */

int main(void)
{
    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    // assume data is initialized and do 2D FFTs
    #pragma omp parallel for shared(dim_sizes, data)
    for (int th = 0; th < nth; ++th)
    {
        /* each iteration creates, uses, and frees its own descriptor */
        DFTI_DESCRIPTOR_HANDLE myFFT = NULL;

        DftiCreateDescriptor(&myFFT, DFTI_SINGLE, DFTI_COMPLEX, 2, dim_sizes);
        DftiCommitDescriptor(myFFT);
        DftiComputeForward(myFFT, data[th]);
        DftiFreeDescriptor(&myFFT);
    }
    return 0;
}

For Example “Using Parallel Mode with Multiple Descriptors Initialized in One Thread”, specify the numbers of threads like this:

set MKL_NUM_THREADS=1 for Intel® oneAPI Math Kernel Library (oneMKL) to work in the single-threaded mode (required);

set OMP_NUM_THREADS=4 for the customer program to work in the multi-threaded mode.

Using Parallel Mode with Multiple Descriptors Initialized in One Thread

/* C99 example */
#include "mkl_dfti.h"
#include <omp.h>
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

// 4 OMP threads, each does 2D FFT 50x100 points
MKL_Complex8 data[4][50][100];
int nth = ARRAY_LEN(data);
MKL_LONG dim_sizes[2] = {
    ARRAY_LEN(data[0]),
    ARRAY_LEN(data[0][0])
};  /* {50, 100} */
DFTI_DESCRIPTOR_HANDLE FFT[ARRAY_LEN(data)];

int main(void)
{
    int th;

    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    /* create and commit all descriptors in one thread */
    for (th = 0; th < nth; ++th)
        DftiCreateDescriptor(&FFT[th], DFTI_SINGLE, DFTI_COMPLEX, 2, dim_sizes);
    for (th = 0; th < nth; ++th)
        DftiCommitDescriptor(FFT[th]);

    // assume data is initialized and do 2D FFTs
    #pragma omp parallel for shared(FFT, data)
    for (th = 0; th < nth; ++th)
        DftiComputeForward(FFT[th], data[th]);

    for (th = 0; th < nth; ++th)
        DftiFreeDescriptor(&FFT[th]);
    return 0;
}

Using Parallel Mode with a Common Descriptor

The following Example “Using Parallel Mode with a Common Descriptor” illustrates a parallel customer program with a common descriptor used in several threads.

/* C99 example */
#include "mkl_dfti.h"
#include <omp.h>
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

// 4 OMP threads, each does 2D FFT 50x100 points
MKL_Complex8 data[4][50][100];
int nth = ARRAY_LEN(data);
MKL_LONG len[2] = {ARRAY_LEN(data[0]), ARRAY_LEN(data[0][0])};
DFTI_DESCRIPTOR_HANDLE FFT;

int main(void)
{
    int th;

    /* ...put values into data[i][j][k] 0<=i<=3, 0<=j<=49, 0<=k<=99 */

    /* one descriptor, committed once, shared by all threads */
    DftiCreateDescriptor(&FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len);
    DftiCommitDescriptor(FFT);

    // assume data is initialized and do 2D FFTs
    #pragma omp parallel for shared(FFT, data)
    for (th = 0; th < nth; ++th)
        DftiComputeForward(FFT, data[th]);
    DftiFreeDescriptor(&FFT);
    return 0;
}