Developer Reference for Intel® oneAPI Math Kernel Library for Fortran

ID 766686
Date 10/31/2024

Examples of Using OpenMP* Threading for FFT Computation

The following sample program shows how to employ internal OpenMP* threading in Intel® oneAPI Math Kernel Library (oneMKL) for FFT computation.

To specify the number of threads inside Intel® oneAPI Math Kernel Library (oneMKL), use the following settings:

set the environment variable MKL_NUM_THREADS to 1 for single-threaded mode;

set the environment variable MKL_NUM_THREADS to 4 for multi-threaded mode.
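
As an alternative to the environment variable, the thread count can also be requested from inside the program. The following is a minimal sketch, not part of the original sample: it assumes the oneMKL support routines mkl_set_num_threads and mkl_get_max_threads are called through implicit (external) interfaces with default integers, as with the LP64 interface. The program name is hypothetical. Such a call is expected to take precedence over MKL_NUM_THREADS; verify this against the oneMKL threading documentation for your version.

```fortran
program mkl_threads_sketch
  implicit none

! oneMKL support routines, declared external here so that the sketch
! does not depend on a particular include file or module
  external :: mkl_set_num_threads
  integer, external :: mkl_get_max_threads

! request 4 threads for oneMKL internal threading
! (use 1 for single-threaded mode)
  call mkl_set_num_threads (4)

! report how many threads oneMKL is prepared to use
  print *, 'oneMKL max threads: ', mkl_get_max_threads ()
end
```

The reported value can be lower than the requested one, for example when the sequential oneMKL library is linked.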

Using oneMKL Internal Threading Mode (C Example)

 
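/* C99 example */
#include <stdio.h>
#include "mkl_dfti.h"

static float data[200][100];

int main(void)
{
    DFTI_DESCRIPTOR_HANDLE fft = NULL;
    MKL_LONG dim_sizes[2] = {200, 100};
    MKL_LONG status;

    /* ...put values into data[i][j] 0<=i<=199, 0<=j<=99 */
    /* Depending on the conjugate-even storage and packed format settings,
       an in-place real transform can need a larger buffer than the real
       input; check the DFTI configuration reference before sizing data. */

    /* Create, commit, and compute a 2D single-precision real FFT in place */
    status = DftiCreateDescriptor(&fft, DFTI_SINGLE, DFTI_REAL, 2, dim_sizes);
    if (status == DFTI_NO_ERROR) status = DftiCommitDescriptor(fft);
    if (status == DFTI_NO_ERROR) status = DftiComputeForward(fft, data);
    if (status != DFTI_NO_ERROR) printf("FFT failed: %s\n", DftiErrorMessage(status));
    DftiFreeDescriptor(&fft);
    return status == DFTI_NO_ERROR ? 0 : 1;
}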
 

The following examples, “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region” and “Using Parallel Mode with Multiple Descriptors Initialized in One Thread”, illustrate a parallel customer program in which each descriptor instance is used by only one thread.

Specify the number of threads for Example “Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region” as follows:

set the environment variable MKL_NUM_THREADS to 1 so that Intel® oneAPI Math Kernel Library (oneMKL) works in single-threaded mode (recommended);

set the environment variable OMP_NUM_THREADS to 4 so that the customer program works in multi-threaded mode.

Using Parallel Mode with Multiple Descriptors Initialized in a Parallel Region

Note that this example can be restructured so that the program is single-threaded at the customer level while Intel® oneAPI Math Kernel Library (oneMKL) runs in parallel mode internally. To achieve this, set the parameter DFTI_NUMBER_OF_TRANSFORMS = 4 and the corresponding parameter DFTI_INPUT_DISTANCE = 5000 (the 50 x 100 points of one transform).

program fft2d_private_descr_main
  use mkl_dfti

  integer nth, len(2)
! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)

  type(dfti_descriptor), pointer :: myFFT
  integer th, myStatus

! assume x is initialized and do 2D FFTs
! (len is a named constant, so it needs no data-sharing clause)
!$OMP PARALLEL DO SHARED(x) PRIVATE(myFFT, myStatus)
  do th = 1, nth
    myStatus = DftiCreateDescriptor (myFFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
    myStatus = DftiCommitDescriptor (myFFT)
    myStatus = DftiComputeForward (myFFT, x(:, th))
    myStatus = DftiFreeDescriptor (myFFT)
  end do
!$OMP END PARALLEL DO
end
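
The transformation mentioned in the note before this example can be sketched as follows. This is an illustration under stated assumptions, not part of the original sample: the program name is hypothetical, x is declared as a rank-1 array so that all four data sets sit back to back, and DFTI_OUTPUT_DISTANCE is set alongside DFTI_INPUT_DISTANCE only to make the in-place layout explicit. For this variant, leave MKL_NUM_THREADS unset or set it to 4 so that oneMKL can thread the batch internally.

```fortran
program fft2d_batched_descr_main
  use mkl_dfti
  implicit none

  integer nth, len(2)
! 4 transforms of 50x100 points, all computed through one descriptor
  parameter (nth = 4, len = (/50, 100/))
! assumption of this sketch: x is rank-1, with data set k
! starting at element (k-1)*5000 + 1
  complex x(len(2)*len(1)*nth)

  type(dfti_descriptor), pointer :: FFT
  integer status

! placeholder data so that the sketch runs as written
  x = (1.0, 0.0)

  status = DftiCreateDescriptor (FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
! one call to DftiComputeForward computes nth transforms
  status = DftiSetValue (FFT, DFTI_NUMBER_OF_TRANSFORMS, nth)
! consecutive data sets are 50*100 = 5000 elements apart
  status = DftiSetValue (FFT, DFTI_INPUT_DISTANCE, len(1)*len(2))
  status = DftiSetValue (FFT, DFTI_OUTPUT_DISTANCE, len(1)*len(2))
  status = DftiCommitDescriptor (FFT)
  status = DftiComputeForward (FFT, x)
  if (status /= DFTI_NO_ERROR) print *, 'FFT failed, status = ', status
  status = DftiFreeDescriptor (FFT)
end
```

No OpenMP directives are needed at the customer level in this form; any parallelism comes from oneMKL itself.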

Specify the number of threads for Example “Using Parallel Mode with Multiple Descriptors Initialized in One Thread” as follows:

set the environment variable MKL_NUM_THREADS to 1 so that Intel® oneAPI Math Kernel Library (oneMKL) works in single-threaded mode (required);

set the environment variable OMP_NUM_THREADS to 4 so that the customer program works in multi-threaded mode.

Using Parallel Mode with Multiple Descriptors Initialized in One Thread

program fft2d_array_descr_main
  use mkl_dfti

  integer nth, len(2)
! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)

  type thread_data
    type(dfti_descriptor), pointer :: FFT
  end type thread_data
  type(thread_data) :: workload(nth)

  integer th, status, myStatus

  do th = 1, nth
    status = DftiCreateDescriptor (workload(th)%FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
    status = DftiCommitDescriptor (workload(th)%FFT)
  end do
! assume x is initialized and do 2D FFTs
! (len is a named constant, so it needs no data-sharing clause)
!$OMP PARALLEL DO SHARED(x, workload) PRIVATE(myStatus)
  do th = 1, nth
    myStatus = DftiComputeForward (workload(th)%FFT, x(:, th))
  end do
!$OMP END PARALLEL DO
  do th = 1, nth
    status = DftiFreeDescriptor (workload(th)%FFT)
  end do
end

Using Parallel Mode with a Common Descriptor

The following example illustrates a parallel customer program in which several threads share one common descriptor.

program fft2d_shared_descr_main
  use mkl_dfti

  integer nth, len(2)
! 4 OMP threads, each does 2D FFT 50x100 points
  parameter (nth = 4, len = (/50, 100/))
  complex x(len(2)*len(1), nth)
  type(dfti_descriptor), pointer :: FFT

  integer th, status, myStatus

  status = DftiCreateDescriptor (FFT, DFTI_SINGLE, DFTI_COMPLEX, 2, len)
  status = DftiCommitDescriptor (FFT)
! assume x is initialized and do 2D FFTs
! (len is a named constant, so it needs no data-sharing clause)
!$OMP PARALLEL DO SHARED(x, FFT) PRIVATE(myStatus)
  do th = 1, nth
    myStatus = DftiComputeForward (FFT, x(:, th))
  end do
!$OMP END PARALLEL DO
  status = DftiFreeDescriptor (FFT)
end