Intel® Fortran Compiler Classic and Intel® Fortran Compiler Developer Guide and Reference

ID 767251
Date 6/24/2024
Public

Intel® Compiler Extension Routines to OpenMP*

The Intel® compiler implements the following groups of routines as extensions to the OpenMP* runtime library:

  • Get and set the execution environment

  • Get and set the stack size for parallel threads

  • Memory allocation

  • Get and set the thread sleep time for the throughput execution mode

  • Target memory allocation

The Intel® extension routines described in this section can be used for low-level tuning to verify that the library code and application are functioning as intended. These routines are generally not recognized by other OpenMP-compliant compilers, so the link stage may fail when the program is built with another compiler. To execute these OpenMP routines, use the /Qopenmp-stubs (Windows*) or -qopenmp-stubs (Linux*) option.

In most cases, environment variables can be used in place of the extension library routines. For example, the stack size of the parallel threads may be set using the OMP_STACKSIZE environment variable rather than the KMP_SET_STACKSIZE_S() library routine.

NOTE:

A runtime call to an Intel extension routine takes precedence over the corresponding environment variable setting.

Execution Environment

Function or Subroutine

Description

SUBROUTINE KMP_SET_DEFAULTS(STRING)
CHARACTER*(*) STRING

Sets OpenMP environment variables defined as a list of variables separated by "|" in the argument.

SUBROUTINE KMP_SET_LIBRARY_THROUGHPUT()

Sets execution mode to throughput, which is the default. Allows the application to determine the runtime environment. Use in multi-user environments.

SUBROUTINE KMP_SET_LIBRARY_TURNAROUND()

Sets execution mode to turnaround. Use in dedicated parallel (single user) environments.

SUBROUTINE KMP_SET_LIBRARY_SERIAL()

Sets execution mode to serial.

SUBROUTINE KMP_SET_LIBRARY(LIBNUM)
INTEGER (KIND=OMP_INTEGER_KIND) LIBNUM

Sets the execution mode indicated by the value passed to the subroutine. Valid values are:

  • 1 - serial mode

  • 2 - turnaround mode

  • 3 - throughput mode

Call this routine before the first parallel region is executed.

FUNCTION KMP_GET_LIBRARY()
INTEGER (KIND=OMP_INTEGER_KIND) KMP_GET_LIBRARY

Returns a value corresponding to the current execution mode:

  • 1 - serial

  • 2 - turnaround

  • 3 - throughput
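
As a minimal sketch of the execution-environment routines above, the following program selects turnaround mode before any parallel region runs and then reads the mode back. It assumes that the kmp_* interfaces and the OMP_INTEGER_KIND constant are made available by USE OMP_LIB and that the program is built with OpenMP enabled; the program name and the choice of mode are illustrative only.

```fortran
program set_library_mode
  use omp_lib   ! assumption: exposes the kmp_* interfaces and omp_integer_kind
  implicit none
  integer(kind=omp_integer_kind) :: mode

  ! Select turnaround mode (value 2 in the list above) before the
  ! first parallel region is executed.
  call kmp_set_library(2_omp_integer_kind)

  ! Read the mode back; per the list above, 2 corresponds to turnaround.
  mode = kmp_get_library()
  print *, 'execution mode =', mode

  !$omp parallel
  !$omp single
  print *, 'threads in team =', omp_get_num_threads()
  !$omp end single
  !$omp end parallel
end program set_library_mode
```

Calling KMP_SET_LIBRARY_TURNAROUND() instead of KMP_SET_LIBRARY(2) would express the same intent without a numeric code.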

Stack Size

Function or Subroutine

Description

FUNCTION KMP_GET_STACKSIZE_S()
INTEGER (KIND=KMP_SIZE_T_KIND) &
KMP_GET_STACKSIZE_S

Returns the number of bytes that will be allocated for each parallel thread to use as its private stack. This value can be changed with the KMP_SET_STACKSIZE_S() routine prior to the first parallel region, or via the KMP_STACKSIZE environment variable.

FUNCTION KMP_GET_STACKSIZE()
INTEGER KMP_GET_STACKSIZE

Provided for backward compatibility only. Use the KMP_GET_STACKSIZE_S() routine for compatibility across different families of Intel® processors.

SUBROUTINE KMP_SET_STACKSIZE_S(size)
INTEGER (KIND=KMP_SIZE_T_KIND) size

Sets the number of bytes allocated for each parallel thread's private stack to size. This value can also be set via the KMP_STACKSIZE environment variable. For KMP_SET_STACKSIZE_S() to have an effect, it must be called before the beginning of the first (dynamically executed) parallel region in the program.

SUBROUTINE KMP_SET_STACKSIZE(size)
INTEGER size

Provided for backward compatibility only. Use KMP_SET_STACKSIZE_S(size) for compatibility across different families of Intel® processors.
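
The ordering requirement for the stack-size routines can be sketched as follows: the size is queried and changed before the first parallel region starts. This is only an illustration under the assumption that USE OMP_LIB provides the kmp_* interfaces and KMP_SIZE_T_KIND; the 16 MiB request is an arbitrary example value, not a recommendation.

```fortran
program set_stack_size
  use omp_lib   ! assumption: exposes kmp_get_stacksize_s, kmp_set_stacksize_s, kmp_size_t_kind
  implicit none
  integer(kind=kmp_size_t_kind) :: old_size, new_size, request

  ! Query the current per-thread stack size in bytes.
  old_size = kmp_get_stacksize_s()

  ! Request 16 MiB per thread (illustrative value). This call must come
  ! before the first parallel region to have an effect.
  request = 16_kmp_size_t_kind * 1024 * 1024
  call kmp_set_stacksize_s(request)

  new_size = kmp_get_stacksize_s()
  print *, 'stack size before:', old_size, ' after:', new_size

  !$omp parallel
  ! Threads created for this region use the stack size in effect above.
  !$omp end parallel
end program set_stack_size
```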

Memory Allocation

The Intel® compiler implements a group of memory allocation routines as an extension to the OpenMP runtime library to enable threads to allocate memory from a heap local to each thread. These routines are: KMP_MALLOC(), KMP_CALLOC(), and KMP_REALLOC().

The memory allocated by these routines must also be freed by the KMP_FREE() routine. While you can allocate memory in one thread and then free that memory in a different thread, this mode of operation incurs a slight performance penalty.

Working with the local heap might lead to improved application performance because synchronization is not required.

Function or Subroutine

Description

FUNCTION KMP_MALLOC(size)
INTEGER(KIND=KMP_POINTER_KIND) KMP_MALLOC
INTEGER(KIND=KMP_SIZE_T_KIND) size

Allocates memory block of size bytes from thread-local heap.

FUNCTION KMP_CALLOC(nelem, elsize)
INTEGER(KIND=KMP_POINTER_KIND) KMP_CALLOC
INTEGER(KIND=KMP_SIZE_T_KIND) nelem
INTEGER(KIND=KMP_SIZE_T_KIND) elsize

Allocates array of nelem elements of size elsize from thread-local heap.

FUNCTION KMP_REALLOC(ptr, size)
INTEGER(KIND=KMP_POINTER_KIND) KMP_REALLOC
INTEGER(KIND=KMP_POINTER_KIND) ptr
INTEGER(KIND=KMP_SIZE_T_KIND) size

Reallocates the memory block at address ptr to size bytes from the thread-local heap.

SUBROUTINE KMP_FREE(ptr)
INTEGER (KIND=KMP_POINTER_KIND) ptr

Frees memory block at address ptr from thread-local heap.

The memory must have been previously allocated with KMP_MALLOC(), KMP_CALLOC(), or KMP_REALLOC().
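
The following sketch shows one possible way to use the thread-local heap from Fortran: each thread allocates a scratch buffer with KMP_MALLOC(), maps the returned address onto a Fortran array pointer, and frees the buffer in the same thread with KMP_FREE(). Several details are assumptions rather than documented behavior: that USE OMP_LIB provides the interfaces and kind constants, that a zero address indicates failure, and that converting the integer address to TYPE(C_PTR) with TRANSFER is acceptable on the target platform. The element count n is illustrative.

```fortran
program thread_local_heap
  use omp_lib   ! assumption: exposes kmp_malloc, kmp_free, and the kmp_*_kind constants
  use, intrinsic :: iso_c_binding, only : c_ptr, c_f_pointer, c_double, c_sizeof
  implicit none
  integer, parameter :: n = 1000                 ! illustrative element count
  integer(kind=kmp_pointer_kind) :: addr
  integer(kind=kmp_size_t_kind)  :: nbytes
  type(c_ptr) :: cp
  real(c_double), pointer :: work(:)

  ! Bytes needed for n double-precision elements.
  nbytes = int(n, kmp_size_t_kind) * int(c_sizeof(0.0_c_double), kmp_size_t_kind)

  !$omp parallel private(addr, cp, work)
  ! Each thread allocates from its own heap.
  addr = kmp_malloc(nbytes)
  if (addr /= 0) then                 ! assumption: zero means the request failed
     cp = transfer(addr, cp)          ! reinterpret the integer address as a C pointer
     call c_f_pointer(cp, work, [n])  ! view the block as a rank-1 array
     work = real(omp_get_thread_num(), c_double)
     ! Free in the allocating thread to avoid the cross-thread penalty.
     call kmp_free(addr)
  end if
  !$omp end parallel
end program thread_local_heap
```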

Thread Sleep Time

In the throughput execution mode of the OpenMP* support libraries, threads wait for new parallel work at the end of a parallel region and go to sleep after a specified period of time. This time interval can be set with the KMP_BLOCKTIME environment variable or the KMP_SET_BLOCKTIME() routine.

Function

Description

FUNCTION KMP_GET_BLOCKTIME()
INTEGER KMP_GET_BLOCKTIME

Returns the number of milliseconds that a thread should wait, after completing the execution of a parallel region, before sleeping, as set either by the KMP_BLOCKTIME environment variable or by KMP_SET_BLOCKTIME().

SUBROUTINE KMP_SET_BLOCKTIME(msec)
INTEGER msec

Sets the number of milliseconds that a thread should wait, after completing the execution of a parallel region, before sleeping. This routine affects the block time setting for the calling thread and any OpenMP team threads formed by the calling thread. The routine does not affect the block time for any other threads.
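
A minimal sketch of adjusting the block time at run time is shown below. It assumes that USE OMP_LIB declares KMP_GET_BLOCKTIME() as a function and KMP_SET_BLOCKTIME() as a subroutine taking a default INTEGER (as in the LLVM OpenMP runtime's omp_lib module); the value 0, which asks threads to sleep without spinning first, is an illustrative choice.

```fortran
program tune_blocktime
  use omp_lib   ! assumption: exposes kmp_get_blocktime and kmp_set_blocktime
  implicit none
  integer :: before, after

  ! Block time currently in effect for the calling thread, in milliseconds.
  before = kmp_get_blocktime()

  ! Ask waiting threads to go to sleep immediately after a parallel region.
  ! This affects the calling thread and the teams it forms.
  call kmp_set_blocktime(0)

  after = kmp_get_blocktime()
  print *, 'block time (ms) before:', before, ' after:', after

  !$omp parallel
  ! Work for the team goes here.
  !$omp end parallel
end program tune_blocktime
```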

Target Memory Allocation

This feature is only available for ifx.

Function Description

FUNCTION ompx_target_aligned_alloc (align, size, &
                     device_num) 
USE,INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, &
                     C_SIZE_T, C_INT
TYPE(C_PTR) ::  ompx_target_aligned_alloc
INTEGER(C_SIZE_T)  :: align
INTEGER(C_SIZE_T)  :: size
INTEGER(C_INT) :: device_num

Allocates device memory that is aligned to the specified alignment argument align for the specified device device_num. The returned memory can be accessed only by the specified device.

FUNCTION ompx_target_aligned_alloc_device (align, &
                    size, device_num) 
USE,INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, &
                    C_SIZE_T, C_INT
TYPE(C_PTR) ::  ompx_target_aligned_alloc_device
INTEGER(C_SIZE_T)  :: align
INTEGER(C_SIZE_T)  :: size
INTEGER(C_INT) :: device_num

Allocates device memory that is aligned to the specified alignment argument align for the specified device device_num. The returned memory can be accessed only by the specified device.

FUNCTION ompx_target_aligned_alloc_host (align, &
                    size, device_num) 
USE,INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, &
                    C_SIZE_T, C_INT
TYPE(C_PTR) ::  ompx_target_aligned_alloc_host
INTEGER(C_SIZE_T)  :: align
INTEGER(C_SIZE_T)  :: size
INTEGER(C_INT) :: device_num

Allocates device memory that is aligned to the specified alignment argument align for the specified device device_num. The returned memory can be accessed by the host and all supported devices.

FUNCTION ompx_target_aligned_alloc_shared (align, &
                    size, device_num) 
USE,INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, &
                    C_SIZE_T, C_INT
TYPE(C_PTR) ::  ompx_target_aligned_alloc_shared
INTEGER(C_SIZE_T)  :: align
INTEGER(C_SIZE_T)  :: size
INTEGER(C_INT) :: device_num

Allocates device memory that is aligned to the specified alignment argument align for the specified device device_num. The returned memory can be accessed by the host and the specified device.

FUNCTION &
   ompx_target_aligned_alloc_shared_with_hint (align, & 
                    size, access_hint, device_num) 
USE,INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, &
                    C_SIZE_T, C_INT
USE OMP_LIB_KINDS
TYPE(C_PTR) ::  ompx_target_aligned_alloc_shared_with_hint
INTEGER(C_SIZE_T)  :: align
INTEGER(C_SIZE_T)  :: size
INTEGER(OMP_INTEGER_KIND) :: access_hint
INTEGER(OMP_INTEGER_KIND) :: device_num

Allocates device memory that is aligned to the specified alignment argument align for the specified device device_num with the specified access_hint. The returned memory can be accessed by the host and the specified device.

The following named constants are allowed for access_hint:

  • ompx_mem_hint_read_mostly

  • ompx_mem_hint_prefer_device

  • ompx_mem_hint_non_atomic_mostly

  • ompx_mem_hint_cached

  • ompx_mem_hint_uncached

FUNCTION omp_target_alloc_host (size, device_num)
USE, INTRINSIC :: ISO_C_BINDING 
TYPE(C_PTR)    :: omp_target_alloc_host
INTEGER(c_size_t) :: size 
INTEGER(c_int)    :: device_num 

Returns the address of a storage location that is size bytes in length allocated in host memory. The same pointer can be used to access the memory on the host and all supported devices.

If the allocation request fails, a null pointer is returned.

FUNCTION omp_target_alloc_device (size, device_num)
USE, INTRINSIC :: ISO_C_BINDING 
TYPE(C_PTR)    :: omp_target_alloc_device
INTEGER(c_size_t) :: size 
INTEGER(c_int)    :: device_num 

Returns the address of a storage allocation that is size bytes in length. The allocation is owned by the device specified by device_num and resides in device memory, if present. Generally, the allocation can be accessed only by the device, but its contents can be copied to another device allocation or to host-allocated memory.

If the allocation was not successful, a null pointer is returned.

FUNCTION omp_target_alloc_shared (size, device_num)
USE, INTRINSIC :: ISO_C_BINDING 
TYPE(C_PTR)    :: omp_target_alloc_shared
INTEGER(c_size_t) :: size 
INTEGER(c_int)    :: device_num 

Returns the address of a storage allocation that is size bytes in length. The same pointer may be used to access the memory on the host and the specified device. Shared allocations are shared by the host and the specified device, and are intended to migrate between the host and the device.

If the allocation was not successful, a null pointer is returned.

FUNCTION ompx_target_realloc (ptr, size, device_num)
USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, &
                    C_SIZE_T, C_INT
TYPE (C_PTR)   :: ompx_target_realloc
TYPE (C_PTR),VALUE :: ptr
INTEGER(C_SIZE_T)  :: size
INTEGER(C_INT)     :: device_num

Deallocates the device memory specified with ptr and allocates a new device memory with the specified size in bytes for the given device device_num. The returned memory can be accessed only by the specified device.

The contents of the new memory object are the same as those of the old object prior to deallocation, up to the lesser of the old allocation size and the size argument.

FUNCTION ompx_target_realloc_device (ptr, size, &
                    device_num) 
USE,INTRINSIC :: ISO_C_BINDING, ONLY: C_PTR,  &
                    C_SIZE_T, C_INT
TYPE(C_PTR) :: ompx_target_realloc_device
TYPE(C_PTR) :: ptr 
INTEGER(C_SIZE_T) :: size   
INTEGER(C_INT)    :: device_num

Deallocates the device memory specified with ptr and allocates a new device memory with the specified size in bytes for the given device device_num. The returned memory can be accessed only by the specified device.

The contents of the new memory object are the same as those of the old object prior to deallocation, up to the lesser of the old allocation size and the size argument.

FUNCTION ompx_target_realloc_host (ptr, size,  &
                    device_num)
USE,INTRINSIC :: ISO_C_BINDING, ONLY: C_PTR,  &
                    C_SIZE_T, C_INT 
TYPE(C_PTR) :: ompx_target_realloc_host
TYPE(C_PTR) :: ptr 
INTEGER(C_SIZE_T) :: size
INTEGER(C_INT)     :: device_num         

Deallocates the device memory specified with ptr and allocates a new device memory with the specified size in bytes for the given device device_num. The returned memory can be accessed by the host and all supported devices.

The contents of the new memory object are the same as those of the old object prior to deallocation, up to the lesser of the old allocation size and the size argument.

FUNCTION ompx_target_realloc_shared (ptr, size, &
                   device_num)
USE,INTRINSIC :: ISO_C_BINDING, ONLY: C_PTR,  &
                   C_SIZE_T, C_INT 
TYPE(C_PTR) ::  ompx_target_realloc_shared
TYPE(C_PTR) :: ptr 
INTEGER(C_SIZE_T) :: size   
INTEGER(C_INT)     :: device_num                  

Deallocates the device memory specified with ptr and allocates a new device memory with the specified size in bytes for the given device device_num. The returned memory can be accessed by the host and the specified device.

The contents of the new memory object are the same as those of the old object prior to deallocation, up to the lesser of the old allocation size and the size argument.
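
As a rough sketch of the target memory allocation routines, the program below requests a device-only buffer with omp_target_alloc_device, checks for the documented null-pointer failure result, and releases the buffer. It assumes compilation with ifx and OpenMP offload enabled, that USE OMP_LIB provides the interface, and that the standard OpenMP routine omp_target_free is the appropriate way to release such an allocation (this section does not say so). The buffer size is illustrative.

```fortran
program alloc_device_buffer
  use omp_lib   ! assumption: exposes omp_target_alloc_device and omp_target_free
  use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t, c_int, c_associated
  implicit none
  type(c_ptr)       :: buf
  integer(c_size_t) :: nbytes
  integer(c_int)    :: dev

  dev    = omp_get_default_device()
  nbytes = 1024_c_size_t * 8_c_size_t   ! room for 1024 8-byte values (illustrative)

  ! Device-only allocation; a null pointer signals failure.
  buf = omp_target_alloc_device(nbytes, dev)

  if (.not. c_associated(buf)) then
     print *, 'device allocation failed'
  else
     print *, 'allocated', nbytes, 'bytes on device', dev
     ! Assumption: memory from this routine is released with omp_target_free.
     call omp_target_free(buf, dev)
  end if
end program alloc_device_buffer
```

Substituting omp_target_alloc_shared or omp_target_alloc_host changes only which side can dereference the returned pointer, as described in the entries above.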

Target Offload

This feature is only available for ifx.

Function or Subroutine Description

FUNCTION omp_get_device_from_ptr (ptr)
USE OMP_LIB_KINDS
USE,INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR
INTEGER(OMP_INTEGER_KIND) :: omp_get_device_from_ptr
TYPE(C_PTR) :: ptr

Returns the OpenMP device number of the device on which the specified device pointer ptr is allocated. The function returns a valid OpenMP device number if successful; otherwise, it returns a negative number.

FUNCTION ompx_get_num_subdevices (device_num, level)
USE,INTRINSIC :: ISO_C_BINDING, ONLY : C_INT
INTEGER(C_INT) ompx_get_num_subdevices
INTEGER(C_INT) :: device_num
INTEGER(C_INT) :: level

Returns the number of subdevices supported by the given device ID (device_num) at the specified level.

FUNCTION ompx_target_register_host_pointer (ptr, size, &
                    device_num)  
USE omp_lib_kinds
USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR, C_SIZE_T
INTEGER(omp_integer_kind) ompx_target_register_host_pointer
TYPE(C_PTR)  :: ptr
INTEGER(C_SIZE_T) :: size
INTEGER(omp_integer_kind) :: device_num

Registers the specified host pointer ptr for efficient memory copy between ptr and a device pointer allocated for device_num. The function returns a non-zero value if successful; otherwise, zero.

NOTE:

This is only available for Linux.

SUBROUTINE ompx_target_unregister_host_pointer (ptr, & 
                      device) 
USE omp_lib_kinds
USE,INTRINSIC  :: ISO_C_BINDING, ONLY : C_PTR
TYPE(C_PTR) :: ptr
INTEGER(omp_integer_kind)  :: device

Unregisters the specified host pointer ptr.

NOTE:

This is only available for Linux.
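
The subdevice query in this section might be used as in the following sketch, which asks how many subdevices the default device reports. It assumes compilation with ifx, that USE OMP_LIB provides the ompx_get_num_subdevices interface, and that level 0 is a meaningful level on the target hardware; the level value is an illustrative choice, not taken from this section.

```fortran
program count_subdevices
  use omp_lib   ! assumption: exposes ompx_get_num_subdevices
  use, intrinsic :: iso_c_binding, only : c_int
  implicit none
  integer(c_int) :: dev, level, nsub

  dev   = omp_get_default_device()
  level = 0_c_int   ! illustrative level; valid levels depend on the device

  ! Number of subdevices the device reports at the requested level.
  nsub = ompx_get_num_subdevices(dev, level)
  print *, 'device', dev, 'reports', nsub, 'subdevices at level', level
end program count_subdevices
```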