Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023
Public

Intrinsics for FP Gather and Scatter Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


Intrinsic Name

Operation

Corresponding Intel® AVX-512 Instruction

_mm512_i32gather_pd, _mm512_mask_i32gather_pd

Gathers double-precision (64-bit) floating-point elements from memory with 32-bit integer indices.

VGATHERDPD

_mm512_i32gather_ps, _mm512_mask_i32gather_ps

Gathers single-precision (32-bit) floating-point elements from memory with 32-bit integer indices.

VGATHERDPS

_mm512_i32extgather_ps, _mm512_mask_i32extgather_ps

Up-converts and gathers single-precision (32-bit) floating-point elements from memory with 32-bit integer indices.

VGATHERDPS

_mm512_i64gather_pd, _mm512_mask_i64gather_pd

Gathers double-precision (64-bit) floating-point elements from memory with 64-bit integer indices.

VGATHERQPD

_mm512_i64gather_ps, _mm512_mask_i64gather_ps

Gathers single-precision (32-bit) floating-point elements from memory with 64-bit integer indices.

VGATHERQPS

_mm512_prefetch_i32gather_pd, _mm512_mask_prefetch_i32gather_pd

Prefetches double-precision (64-bit) floating-point elements for a gather with 32-bit integer indices.

VGATHERPF0DPD, VGATHERPF1DPD

_mm512_prefetch_i32gather_ps, _mm512_mask_prefetch_i32gather_ps

Prefetches single-precision (32-bit) floating-point elements for a gather with 32-bit integer indices.

VGATHERPF0DPS, VGATHERPF1DPS

_mm512_prefetch_i64gather_pd, _mm512_mask_prefetch_i64gather_pd

Prefetches double-precision (64-bit) floating-point elements for a gather with 64-bit integer indices.

VGATHERPF0QPD, VGATHERPF1QPD

_mm512_prefetch_i64gather_ps, _mm512_mask_prefetch_i64gather_ps

Prefetches single-precision (32-bit) floating-point elements for a gather with 64-bit integer indices.

VGATHERPF0QPS, VGATHERPF1QPS

_mm512_i32scatter_pd, _mm512_mask_i32scatter_pd

Scatters double-precision (64-bit) floating-point elements to memory with 32-bit integer indices.

VSCATTERDPD

_mm512_i32scatter_ps, _mm512_mask_i32scatter_ps

Scatters single-precision (32-bit) floating-point elements to memory with 32-bit integer indices.

VSCATTERDPS

_mm512_i32extscatter_ps, _mm512_mask_i32extscatter_ps

Down-converts and scatters single-precision (32-bit) floating-point elements to memory with 32-bit integer indices.

VSCATTERDPS

_mm512_i64scatter_pd, _mm512_mask_i64scatter_pd

Scatters double-precision (64-bit) floating-point elements to memory with 64-bit integer indices.

VSCATTERQPD

_mm512_i64scatter_ps, _mm512_mask_i64scatter_ps

Scatters single-precision (32-bit) floating-point elements to memory with 64-bit integer indices.

VSCATTERQPS

_mm512_prefetch_i32scatter_pd, _mm512_mask_prefetch_i32scatter_pd

Prefetches double-precision (64-bit) floating-point elements for a scatter (with intent to write) with 32-bit integer indices.

VSCATTERPF0DPD, VSCATTERPF1DPD

_mm512_prefetch_i32scatter_ps, _mm512_mask_prefetch_i32scatter_ps

Prefetches single-precision (32-bit) floating-point elements for a scatter (with intent to write) with 32-bit integer indices.

VSCATTERPF0DPS, VSCATTERPF1DPS

_mm512_prefetch_i64scatter_pd, _mm512_mask_prefetch_i64scatter_pd

Prefetches double-precision (64-bit) floating-point elements for a scatter (with intent to write) with 64-bit integer indices.

VSCATTERPF0QPD, VSCATTERPF1QPD

_mm512_prefetch_i64scatter_ps, _mm512_mask_prefetch_i64scatter_ps

Prefetches single-precision (32-bit) floating-point elements for a scatter (with intent to write) with 64-bit integer indices.

VSCATTERPF0QPS, VSCATTERPF1QPS


Variable

Definition

vindex

a vector of indices

base_addr

a pointer to the base address in memory

scale

a compile-time literal constant used to scale the vector indices. Possible values are 1, 2, 4, or 8.

k

mask used as a selector

a

first source vector element

src

source element to use based on the mask result

upconv

the up-conversion to apply, of type _MM_UPCONV_PS_ENUM, which is the following:

  • _MM_UPCONV_PS_NONE - no conversion

index

a vector of indices into the memory starting at mv

downconv

the down-conversion to apply, of type _MM_DOWNCONV_PS_ENUM, which is the following:

  • _MM_DOWNCONV_PS_NONE - no conversion

hint

Indicates which cache level to bring values into. _MM_HINT_ENUM is the following:

  • _MM_HINT_NONE 0x0 - Off
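
The way vindex, base_addr, and scale combine into an address can be illustrated with a scalar model. This is a minimal sketch and not Intel code: the function name model_gather_pd and its lanes parameter are hypothetical, and it assumes that each 32-bit index is sign-extended before it is multiplied by scale and added to base_addr as a byte offset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference model (illustrative only): lane i of a gather reads
   one double located at base_addr + vindex[i] * scale bytes. */
void model_gather_pd(const void *base_addr, const int32_t *vindex,
                     int scale, int lanes, double *dst)
{
    /* scale must be one of the values the intrinsics accept */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    for (int i = 0; i < lanes; ++i) {
        /* widen the index first so the byte offset cannot overflow int32 */
        const char *p = (const char *)base_addr + (int64_t)vindex[i] * scale;
        memcpy(&dst[i], p, sizeof(double)); /* load without assuming alignment */
    }
}
```

With scale set to 8 the indices count doubles; with scale set to 1 the same elements are reached by passing byte offsets (index 3 becomes offset 24).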


_mm512_i32gather_pd

__m512d _mm512_i32gather_pd (__m256i vindex, void const* base_addr, int scale)

Gathers double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i32gather_pd

__m512d _mm512_mask_i32gather_pd (__m512d src, __mmask8 k, __m256i vindex, void const* base_addr, int scale)

Gathers double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
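
The merge behavior of the masked form can be sketched as follows. The helper names (masked_gather8 and its two back ends) are hypothetical. The sketch also assumes a GCC- or Clang-compatible compiler: the target attribute and __builtin_cpu_supports are extensions of those compilers, used here so that the AVX-512 path is only taken on processors that support it, and other compilers may need an equivalent option instead.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* AVX-512 path. The target attribute (GCC/Clang extension, an assumption
   of this sketch) enables AVX-512F for this one function. */
__attribute__((target("avx512f")))
static void masked_gather8_avx512(const double *table, const int32_t *idx,
                                  uint8_t mask, double fill, double *out)
{
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx); /* 8 x int32 */
    __m512d src = _mm512_set1_pd(fill); /* value kept where the mask bit is 0 */
    /* scale = 8 because idx counts doubles, not bytes */
    __m512d v = _mm512_mask_i32gather_pd(src, (__mmask8)mask, vindex, table, 8);
    _mm512_storeu_pd(out, v);
}

/* Portable fallback that produces the same result lane by lane. */
static void masked_gather8_scalar(const double *table, const int32_t *idx,
                                  uint8_t mask, double fill, double *out)
{
    for (int i = 0; i < 8; ++i)
        out[i] = ((mask >> i) & 1) ? table[idx[i]] : fill;
}

/* Hypothetical entry point: picks the AVX-512 path when available. */
void masked_gather8(const double *table, const int32_t *idx,
                    uint8_t mask, double fill, double *out)
{
    assert(table && idx && out);
    if (__builtin_cpu_supports("avx512f"))
        masked_gather8_avx512(table, idx, mask, fill, out);
    else
        masked_gather8_scalar(table, idx, mask, fill, out);
}
```

Calling masked_gather8 with mask 0x0F gathers lanes 0 to 3 from table and leaves lanes 4 to 7 equal to fill, which plays the role of src.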


_mm512_i32gather_ps

__m512 _mm512_i32gather_ps (__m512i vindex, void const* base_addr, int scale)

Gathers single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i32gather_ps

__m512 _mm512_mask_i32gather_ps (__m512 src, __mmask16 k, __m512i vindex, void const* base_addr, int scale)

Gathers single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
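
One common use of the masked form is a loop remainder, where fewer than 16 indices are left. The following is a sketch under stated assumptions: gather_tail and its helpers are hypothetical names, the mask is built from a count n between 0 and 16, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions rather than part of this guide.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* AVX-512 path: gathers only the first n lanes (0 <= n <= 16). */
__attribute__((target("avx512f")))
static void gather_tail_avx512(const float *table, const int32_t *idx,
                               int n, float *out)
{
    __mmask16 k = (__mmask16)((1u << n) - 1u);         /* low n bits set */
    __m512i vindex = _mm512_maskz_loadu_epi32(k, idx); /* unselected lanes are zeroed, not read */
    /* scale = 4 because idx counts floats */
    __m512 v = _mm512_mask_i32gather_ps(_mm512_setzero_ps(), k, vindex, table, 4);
    _mm512_mask_storeu_ps(out, k, v);                  /* writes only the first n floats */
}

/* Portable fallback with the same observable result. */
static void gather_tail_scalar(const float *table, const int32_t *idx,
                               int n, float *out)
{
    for (int i = 0; i < n; ++i)
        out[i] = table[idx[i]];
}

/* Hypothetical entry point for a remainder of n elements. */
void gather_tail(const float *table, const int32_t *idx, int n, float *out)
{
    assert(n >= 0 && n <= 16);
    if (__builtin_cpu_supports("avx512f"))
        gather_tail_avx512(table, idx, n, out);
    else
        gather_tail_scalar(table, idx, n, out);
}
```

Because the same mask is used for the index load, the gather, and the store, the sketch touches only the first n entries of idx and out.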


_mm512_i32extgather_ps

__m512 _mm512_i32extgather_ps (__m512i index, void const * mv, _MM_UPCONV_PS_ENUM upconv, int scale, int hint)

Up-converts 16 memory locations starting at location mv using packed 32-bit integer indices stored in index scaled by scale using upconv to single-precision (32-bit) floating-point elements and stores them in dst.


_mm512_mask_i32extgather_ps

__m512 _mm512_mask_i32extgather_ps (__m512 src, __mmask16 k, __m512i index, void const * mv, _MM_UPCONV_PS_ENUM upconv, int scale, int hint)

Up-converts 16 single-precision memory locations starting at location mv at packed 32-bit integer indices stored in index scaled by scale using upconv to single-precision (32-bit) floating-point elements and merges them with src using mask k. When the corresponding mask bit is not set, elements are copied from src.


_mm512_i64gather_pd

__m512d _mm512_i64gather_pd (__m512i vindex, void const* base_addr, int scale)

Gathers double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i64gather_pd

__m512d _mm512_mask_i64gather_pd (__m512d src, __mmask8 k, __m512i vindex, void const* base_addr, int scale)

Gathers double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.


_mm512_i64gather_ps

__m256 _mm512_i64gather_ps (__m512i vindex, void const* base_addr, int scale)

Gathers single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i64gather_ps

__m256 _mm512_mask_i64gather_ps (__m256 src, __mmask8 k, __m512i vindex, void const* base_addr, int scale)

Gathers single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
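
With 64-bit indices a 512-bit index vector holds only eight indices, so the single-precision forms return a 256-bit __m256 with eight floats. A minimal sketch of the unmasked form follows. The names gather8f_i64 and its helpers are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed for illustration.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* AVX-512 path: eight 64-bit indices produce eight floats (__m256). */
__attribute__((target("avx512f")))
static void gather8f_i64_avx512(const float *table, const int64_t *idx,
                                float *out)
{
    __m512i vindex = _mm512_loadu_si512(idx);         /* 8 x int64 */
    __m256 v = _mm512_i64gather_ps(vindex, table, 4); /* scale = sizeof(float) */
    _mm256_storeu_ps(out, v);
}

/* Portable fallback with the same observable result. */
static void gather8f_i64_scalar(const float *table, const int64_t *idx,
                                float *out)
{
    for (int i = 0; i < 8; ++i)
        out[i] = table[idx[i]];
}

/* Hypothetical entry point: picks the AVX-512 path when available. */
void gather8f_i64(const float *table, const int64_t *idx, float *out)
{
    assert(table && idx && out);
    if (__builtin_cpu_supports("avx512f"))
        gather8f_i64_avx512(table, idx, out);
    else
        gather8f_i64_scalar(table, idx, out);
}
```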


_mm512_prefetch_i32gather_pd

void _mm512_prefetch_i32gather_pd (__m256i vindex, void const* base_addr, int scale, int hint)

Prefetches double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i32gather_pd

void _mm512_mask_prefetch_i32gather_pd (__m256i vindex, __mmask8 mask, void const* base_addr, int scale, int hint)

Prefetches double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Elements are brought into cache only when their corresponding bits in mask are set.


_mm512_prefetch_i32gather_ps

void _mm512_prefetch_i32gather_ps (__m512i index, void const* mv, int scale, int hint)

Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location mv at packed 32-bit integer indices stored in index (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i32gather_ps

void _mm512_mask_prefetch_i32gather_ps (__m512i vindex, __mmask16 mask, void const* base_addr, int scale, int hint)

Prefetches single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Elements are brought into cache only when their corresponding bits in mask are set.


_mm512_prefetch_i64gather_pd

void _mm512_prefetch_i64gather_pd (__m512i vindex, void const* base_addr, int scale, int hint)

Prefetches double-precision (64-bit) floating-point elements from memory into cache level specified by hint using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i64gather_pd

void _mm512_mask_prefetch_i64gather_pd (__m512i vindex, __mmask8 mask, void const* base_addr, int scale, int hint)

Prefetches double-precision (64-bit) floating-point elements from memory into cache level specified by hint using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Elements are brought into cache only when their corresponding bits in mask are set.


_mm512_prefetch_i64gather_ps

void _mm512_prefetch_i64gather_ps (__m512i vindex, void const* base_addr, int scale, int hint)

Prefetches single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i64gather_ps

void _mm512_mask_prefetch_i64gather_ps (__m512i vindex, __mmask8 mask, void const* base_addr, int scale, int hint)

Prefetches single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Elements are brought into cache only when their corresponding bits in mask are set.


_mm512_i32scatter_pd

void _mm512_i32scatter_pd (void* base_addr, __m256i vindex, __m512d a, int scale)

Scatters double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i32scatter_pd

void _mm512_mask_i32scatter_pd (void* base_addr, __mmask8 k, __m256i vindex, __m512d a, int scale)

Scatters double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are stored only when their corresponding mask bits are set.
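
A masked scatter can be sketched as follows. The names masked_scatter8 and its helpers are hypothetical, the sketch assumes that the selected indices are distinct, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed for illustration.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* AVX-512 path: stores vals[i] to dest[idx[i]] for each set mask bit. */
__attribute__((target("avx512f")))
static void masked_scatter8_avx512(double *dest, const int32_t *idx,
                                   uint8_t mask, const double *vals)
{
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx); /* 8 x int32 */
    __m512d a = _mm512_loadu_pd(vals);                         /* 8 x double */
    /* scale = 8 because idx counts doubles, not bytes */
    _mm512_mask_i32scatter_pd(dest, (__mmask8)mask, vindex, a, 8);
}

/* Portable fallback with the same observable result. */
static void masked_scatter8_scalar(double *dest, const int32_t *idx,
                                   uint8_t mask, const double *vals)
{
    for (int i = 0; i < 8; ++i)
        if ((mask >> i) & 1)
            dest[idx[i]] = vals[i];
}

/* Hypothetical entry point: picks the AVX-512 path when available. */
void masked_scatter8(double *dest, const int32_t *idx,
                     uint8_t mask, const double *vals)
{
    assert(dest && idx && vals);
    if (__builtin_cpu_supports("avx512f"))
        masked_scatter8_avx512(dest, idx, mask, vals);
    else
        masked_scatter8_scalar(dest, idx, mask, vals);
}
```

Destinations whose mask bit is clear are left unchanged in memory.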


_mm512_i32scatter_ps

void _mm512_i32scatter_ps (void* base_addr, __m512i vindex, __m512 a, int scale)

Scatters single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i32scatter_ps

void _mm512_mask_i32scatter_ps (void* base_addr, __mmask16 k, __m512i vindex, __m512 a, int scale)

Scatters single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are stored only when their corresponding mask bits are set.


_mm512_i32extscatter_ps

void _mm512_i32extscatter_ps (void * mv, __m512i index, __m512 v1, _MM_DOWNCONV_PS_ENUM downconv, int scale, int hint)

Down-converts 16 packed single-precision (32-bit) floating-point elements in v1 and stores them in memory locations starting at location mv at packed 32-bit integer indices stored in index scaled by scale using downconv.


_mm512_mask_i32extscatter_ps

void _mm512_mask_i32extscatter_ps (void * mv, __mmask16 k, __m512i index, __m512 v1, _MM_DOWNCONV_PS_ENUM downconv, int scale, int hint)

Down-converts 16 packed single-precision (32-bit) floating-point elements in v1 according to downconv and stores them in memory locations starting at location mv at packed 32-bit integer indices stored in index scaled by scale using mask k. Elements are stored only when their corresponding mask bits are set.


_mm512_i64scatter_pd

void _mm512_i64scatter_pd (void* base_addr, __m512i vindex, __m512d a, int scale)

Scatters double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
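
Gather and scatter are often paired to update selected elements in place. The following sketch multiplies eight elements of an array by a factor using 64-bit indices. The names scale_at and its helpers are hypothetical, the indices are assumed to be distinct, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed for illustration.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* AVX-512 path: data[idx[i]] *= factor for eight distinct indices. */
__attribute__((target("avx512f")))
static void scale_at_avx512(double *data, const int64_t *idx, double factor)
{
    __m512i vindex = _mm512_loadu_si512(idx);         /* 8 x int64 */
    __m512d v = _mm512_i64gather_pd(vindex, data, 8); /* read selected doubles */
    v = _mm512_mul_pd(v, _mm512_set1_pd(factor));
    _mm512_i64scatter_pd(data, vindex, v, 8);         /* write them back */
}

/* Portable fallback with the same observable result. */
static void scale_at_scalar(double *data, const int64_t *idx, double factor)
{
    for (int i = 0; i < 8; ++i)
        data[idx[i]] *= factor;
}

/* Hypothetical entry point: picks the AVX-512 path when available. */
void scale_at(double *data, const int64_t *idx, double factor)
{
    assert(data && idx);
    if (__builtin_cpu_supports("avx512f"))
        scale_at_avx512(data, idx, factor);
    else
        scale_at_scalar(data, idx, factor);
}
```

If two indices were equal, the vector path would apply the factor once while the scalar loop would apply it twice, which is why the sketch requires distinct indices.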


_mm512_mask_i64scatter_pd

void _mm512_mask_i64scatter_pd (void* base_addr, __mmask8 k, __m512i vindex, __m512d a, int scale)

Scatters double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are stored only when their corresponding mask bits are set.


_mm512_i64scatter_ps

void _mm512_i64scatter_ps (void* base_addr, __m512i vindex, __m256 a, int scale)

Scatters single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_i64scatter_ps

void _mm512_mask_i64scatter_ps (void* base_addr, __mmask8 k, __m512i vindex, __m256 a, int scale)

Scatters single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are stored only when their corresponding mask bits are set.


_mm512_prefetch_i32scatter_pd

void _mm512_prefetch_i32scatter_pd (void* base_addr, __m256i vindex, int scale, int hint)

Prefetches double-precision (64-bit) floating-point elements with intent to write using 32-bit indices. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i32scatter_pd

void _mm512_mask_prefetch_i32scatter_pd (void* base_addr, __mmask8 k, __m256i vindex, int scale, int hint)

Prefetches double-precision (64-bit) floating-point elements with intent to write using 32-bit indices. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are brought into cache only when their corresponding mask bits are set.


_mm512_prefetch_i32scatter_ps

void _mm512_prefetch_i32scatter_ps (void* mv, __m512i index, int scale, int hint)

Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location mv at packed 32-bit integer indices stored in index scaled by scale.


_mm512_mask_prefetch_i32scatter_ps

void _mm512_mask_prefetch_i32scatter_ps (void* mv, __mmask16 k, __m512i index, int scale, int hint)

Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location mv at packed 32-bit integer indices stored in index scaled by scale. Elements are brought into cache only when their corresponding mask bits in mask k are set.


_mm512_prefetch_i64scatter_pd

void _mm512_prefetch_i64scatter_pd (void* base_addr, __m512i vindex, int scale, int hint)

Prefetches double-precision (64-bit) floating-point elements with intent to write into memory using 64-bit indices. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i64scatter_pd

void _mm512_mask_prefetch_i64scatter_pd (void* base_addr, __mmask8 mask, __m512i vindex, int scale, int hint)

Prefetches double-precision (64-bit) floating-point elements with intent to write into memory using 64-bit indices. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask. Elements are brought into cache only when their corresponding bits in mask are set.


_mm512_prefetch_i64scatter_ps

void _mm512_prefetch_i64scatter_ps (void* base_addr, __m512i vindex, int scale, int hint)

Prefetches single-precision (32-bit) floating-point elements with intent to write into memory using 64-bit indices. 32-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).


_mm512_mask_prefetch_i64scatter_ps

void _mm512_mask_prefetch_i64scatter_ps (void* base_addr, __mmask8 mask, __m512i vindex, int scale, int hint)

Prefetches single-precision (32-bit) floating-point elements with intent to write into memory using 64-bit indices. 32-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask. Elements are brought into cache only when their corresponding bits in mask are set.