Intrinsics for FP Gather and Scatter Operations
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
Intrinsic Name | Operation | Corresponding Instruction |
---|---|---|
_mm512_i32gather_pd, _mm512_mask_i32gather_pd | Gathers double-precision (64-bit) floating-point elements from memory with 32-bit integer indices. | VGATHERDPD |
_mm512_i32gather_ps, _mm512_mask_i32gather_ps | Gathers single-precision (32-bit) floating-point elements from memory with 32-bit integer indices. | VGATHERDPS |
_mm512_i32extgather_ps, _mm512_mask_i32extgather_ps | Gathers elements from memory with 32-bit integer indices and up-converts them to single-precision (32-bit) floating-point elements. | VGATHERDPS |
_mm512_i64gather_pd, _mm512_mask_i64gather_pd | Gathers double-precision (64-bit) floating-point elements from memory with 64-bit integer indices. | VGATHERQPD |
_mm512_i64gather_ps, _mm512_mask_i64gather_ps | Gathers single-precision (32-bit) floating-point elements from memory with 64-bit integer indices. | VGATHERQPS |
_mm512_prefetch_i32gather_pd, _mm512_mask_prefetch_i32gather_pd | Prefetches double-precision (64-bit) floating-point elements for a gather with 32-bit integer indices. | VGATHERPF0DPD, VGATHERPF1DPD |
_mm512_prefetch_i32gather_ps, _mm512_mask_prefetch_i32gather_ps | Prefetches single-precision (32-bit) floating-point elements for a gather with 32-bit integer indices. | VGATHERPF0DPS, VGATHERPF1DPS |
_mm512_prefetch_i64gather_pd, _mm512_mask_prefetch_i64gather_pd | Prefetches double-precision (64-bit) floating-point elements for a gather with 64-bit integer indices. | VGATHERPF0QPD, VGATHERPF1QPD |
_mm512_prefetch_i64gather_ps, _mm512_mask_prefetch_i64gather_ps | Prefetches single-precision (32-bit) floating-point elements for a gather with 64-bit integer indices. | VGATHERPF0QPS, VGATHERPF1QPS |
_mm512_i32scatter_pd, _mm512_mask_i32scatter_pd | Scatters double-precision (64-bit) floating-point elements to memory with 32-bit integer indices. | VSCATTERDPD |
_mm512_i32scatter_ps, _mm512_mask_i32scatter_ps | Scatters single-precision (32-bit) floating-point elements to memory with 32-bit integer indices. | VSCATTERDPS |
_mm512_i32extscatter_ps, _mm512_mask_i32extscatter_ps | Down-converts single-precision (32-bit) floating-point elements and scatters them to memory with 32-bit integer indices. | VSCATTERDPS |
_mm512_i64scatter_pd, _mm512_mask_i64scatter_pd | Scatters double-precision (64-bit) floating-point elements to memory with 64-bit integer indices. | VSCATTERQPD |
_mm512_i64scatter_ps, _mm512_mask_i64scatter_ps | Scatters single-precision (32-bit) floating-point elements to memory with 64-bit integer indices. | VSCATTERQPS |
_mm512_prefetch_i32scatter_pd, _mm512_mask_prefetch_i32scatter_pd | Prefetches double-precision (64-bit) floating-point elements, with intent to write, for a scatter with 32-bit integer indices. | VSCATTERPF0DPD, VSCATTERPF1DPD |
_mm512_prefetch_i32scatter_ps, _mm512_mask_prefetch_i32scatter_ps | Prefetches single-precision (32-bit) floating-point elements, with intent to write, for a scatter with 32-bit integer indices. | VSCATTERPF0DPS, VSCATTERPF1DPS |
_mm512_prefetch_i64scatter_pd, _mm512_mask_prefetch_i64scatter_pd | Prefetches double-precision (64-bit) floating-point elements, with intent to write, for a scatter with 64-bit integer indices. | VSCATTERPF0QPD, VSCATTERPF1QPD |
_mm512_prefetch_i64scatter_ps, _mm512_mask_prefetch_i64scatter_ps | Prefetches single-precision (32-bit) floating-point elements, with intent to write, for a scatter with 64-bit integer indices. | VSCATTERPF0QPS, VSCATTERPF1QPS |
variable | definition |
---|---|
vindex | a vector of indices |
base_addr | a pointer to the base address in memory |
scale | a compile-time literal constant by which each vector index is multiplied. Possible values are 1, 2, 4, or 8. |
k | mask used as a selector |
a | first source vector element |
src | source element to use based on the mask result |
upconv | the up-conversion to apply, a value of type _MM_UPCONV_PS_ENUM |
index | a vector of indices into the memory at mv |
downconv | the down-conversion to apply, a value of type _MM_DOWNCONV_PS_ENUM |
hint | indicates which cache level to bring values into, a value of type _MM_HINT_ENUM |
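Because scale multiplies each index to form a byte offset from base_addr, a scale of 1 lets the indices be arbitrary byte offsets. The following is a minimal sketch of that idea, not a definitive implementation: it gathers one field out of an array of structures. The type point3 and the helper gather8_y are hypothetical names introduced only for illustration, and the intrinsic path is compiled only when the compiler defines __AVX512F__ (an assumption about the build); otherwise a scalar loop models the same address arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical record type: three doubles, typically 24 bytes. */
typedef struct { double x, y, z; } point3;

/* Hypothetical helper: out[i] = pts[which[i]].y for i = 0..7. */
void gather8_y(const point3 *pts, const int32_t which[8], double out[8])
{
    /* Address of the first y field, viewed as a byte pointer. */
    const char *base = (const char *)pts + offsetof(point3, y);
    int32_t byte_off[8];

    assert(pts != NULL);
    /* Turn element numbers into byte offsets between records. */
    for (int i = 0; i < 8; ++i)
        byte_off[i] = which[i] * (int32_t)sizeof(point3);

#ifdef __AVX512F__
    __m256i vindex = _mm256_loadu_si256((const __m256i *)byte_off);
    /* scale = 1: every index is already a byte offset from base. */
    __m512d v = _mm512_i32gather_pd(vindex, base, 1);
    _mm512_storeu_pd(out, v);
#else
    /* Scalar model of the addressing: base + index * scale. */
    for (int i = 0; i < 8; ++i)
        memcpy(&out[i], base + byte_off[i], sizeof(double));
#endif
}
```

When the data is a plain array of elements, passing the element size as scale (4 for float, 8 for double) lets the indices count whole elements instead.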
_mm512_i32gather_pd
__m512d _mm512_i32gather_pd (__m256i vindex, void const* base_addr, int scale)
Gathers double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
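As a rough illustration of this intrinsic, the sketch below performs a table lookup of eight doubles. The helper name gather8_pd is hypothetical, and the code assumes the intrinsic path is enabled by the compiler defining __AVX512F__; without it, a scalar loop that mirrors the described addressing is used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = table[idx[i]] for i = 0..7. */
void gather8_pd(const double *table, const int32_t idx[8], double out[8])
{
    assert(table != NULL);
#ifdef __AVX512F__
    /* Eight 32-bit indices occupy one 256-bit register. */
    __m256i vindex = _mm256_loadu_si256((const __m256i *)idx);
    /* scale = 8 = sizeof(double), so indices count whole elements. */
    __m512d v = _mm512_i32gather_pd(vindex, table, 8);
    _mm512_storeu_pd(out, v);
#else
    /* Scalar model: load from base_addr + index * scale. */
    for (int i = 0; i < 8; ++i) {
        const char *addr = (const char *)table + (ptrdiff_t)idx[i] * 8;
        out[i] = *(const double *)addr;
    }
#endif
}
```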
_mm512_mask_i32gather_pd
__m512d _mm512_mask_i32gather_pd (__m512d src, __mmask8 k, __m256i vindex, void const* base_addr, int scale)
Gathers double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
_mm512_i32gather_ps
__m512 _mm512_i32gather_ps (__m512i vindex, void const* base_addr, int scale)
Gathers single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i32gather_ps
__m512 _mm512_mask_i32gather_ps (__m512 src, __mmask16 k, __m512i vindex, void const* base_addr, int scale)
Gathers single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
_mm512_i32extgather_ps
__m512 _mm512_i32extgather_ps (__m512i index, void const * mv, _MM_UPCONV_PS_ENUM upconv, int scale, int hint)
Up-converts 16 elements, read from memory starting at location mv at the packed 32-bit integer indices stored in index (each scaled by scale), to single-precision (32-bit) floating-point elements according to upconv, and stores them in dst.
_mm512_mask_i32extgather_ps
__m512 _mm512_mask_i32extgather_ps (__m512 src, __mmask16 k, __m512i index, void const * mv, _MM_UPCONV_PS_ENUM upconv, int scale, int hint)
Up-converts 16 elements, read from memory starting at location mv at the packed 32-bit integer indices stored in index (each scaled by scale), to single-precision (32-bit) floating-point elements according to upconv, and merges them with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
_mm512_i64gather_pd
__m512d _mm512_i64gather_pd (__m512i vindex, void const* base_addr, int scale)
Gathers double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i64gather_pd
__m512d _mm512_mask_i64gather_pd (__m512d src, __mmask8 k, __m512i vindex, void const* base_addr, int scale)
Gathers double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
_mm512_i64gather_ps
__m256 _mm512_i64gather_ps (__m512i vindex, void const* base_addr, int scale)
Gathers single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i64gather_ps
__m256 _mm512_mask_i64gather_ps (__m256 src, __mmask8 k, __m512i vindex, void const* base_addr, int scale)
Gathers single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged with src using mask k. When the corresponding mask bit is not set, elements are copied from src.
_mm512_prefetch_i32gather_pd
void _mm512_prefetch_i32gather_pd (__m256i vindex, void const* base_addr, int scale, int hint)
Prefetches double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i32gather_pd
void _mm512_mask_prefetch_i32gather_pd (__m256i vindex, __mmask8 mask, void const* base_addr, int scale, int hint)
Prefetches double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Elements are brought into cache only when their corresponding bits in mask are set.
_mm512_prefetch_i32gather_ps
void _mm512_prefetch_i32gather_ps (__m512i index, void const* mv, int scale, int hint)
Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location mv at packed 32-bit integer indices stored in index (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i32gather_ps
void _mm512_mask_prefetch_i32gather_ps (__m512i vindex, __mmask16 mask, void const* base_addr, int scale, int hint)
Prefetches single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Elements are brought into cache only when their corresponding bits in mask are set.
_mm512_prefetch_i64gather_pd
void _mm512_prefetch_i64gather_pd (__m512i vindex, void const* base_addr, int scale, int hint)
Prefetches double-precision (64-bit) floating-point elements from memory into cache level specified by hint using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i64gather_pd
void _mm512_mask_prefetch_i64gather_pd (__m512i vindex, __mmask8 mask, void const* base_addr, int scale, int hint)
Prefetches double-precision (64-bit) floating-point elements from memory into cache level specified by hint using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Elements are brought into cache only when their corresponding bits in mask are set.
_mm512_prefetch_i64gather_ps
void _mm512_prefetch_i64gather_ps (__m512i vindex, void const* base_addr, int scale, int hint)
Prefetches single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i64gather_ps
void _mm512_mask_prefetch_i64gather_ps (__m512i vindex, __mmask8 mask, void const* base_addr, int scale, int hint)
Prefetches single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Elements are brought into cache only when their corresponding bits in mask are set.
_mm512_i32scatter_pd
void _mm512_i32scatter_pd (void* base_addr, __m256i vindex, __m512d a, int scale)
Scatters double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i32scatter_pd
void _mm512_mask_i32scatter_pd (void* base_addr, __mmask8 k, __m256i vindex, __m512d a, int scale)
Scatters double-precision (64-bit) floating-point elements from a into memory using 32-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are stored only when their corresponding mask bits are set.
_mm512_i32scatter_ps
void _mm512_i32scatter_ps (void* base_addr, __m512i vindex, __m512 a, int scale)
Scatters single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i32scatter_ps
void _mm512_mask_i32scatter_ps (void* base_addr, __mmask16 k, __m512i vindex, __m512 a, int scale)
Scatters single-precision (32-bit) floating-point elements from a into memory using 32-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are stored only when their corresponding mask bits are set.
_mm512_i32extscatter_ps
void _mm512_i32extscatter_ps (void * mv, __m512i index, __m512 v1, _MM_DOWNCONV_PS_ENUM downconv, int scale, int hint)
Down-converts 16 packed single-precision (32-bit) floating-point elements in v1 and stores them in memory locations starting at location mv at packed 32-bit integer indices stored in index scaled by scale using downconv.
_mm512_mask_i32extscatter_ps
void _mm512_mask_i32extscatter_ps (void * mv, __mmask16 k, __m512i index, __m512 v1, _MM_DOWNCONV_PS_ENUM downconv, int scale, int hint)
Down-converts 16 packed single-precision (32-bit) floating-point elements in v1 according to downconv and stores them in memory locations starting at location mv at packed 32-bit integer indices stored in index scaled by scale using mask k. Elements are stored only when their corresponding mask bits are set.
_mm512_i64scatter_pd
void _mm512_i64scatter_pd (void* base_addr, __m512i vindex, __m512d a, int scale)
Scatters double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i64scatter_pd
void _mm512_mask_i64scatter_pd (void* base_addr, __mmask8 k, __m512i vindex, __m512d a, int scale)
Scatters double-precision (64-bit) floating-point elements from a into memory using 64-bit indices. 64-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are stored only when their corresponding mask bits are set.
_mm512_i64scatter_ps
void _mm512_i64scatter_ps (void* base_addr, __m512i vindex, __m256 a, int scale)
Scatters single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_i64scatter_ps
void _mm512_mask_i64scatter_ps (void* base_addr, __mmask8 k, __m512i vindex, __m256 a, int scale)
Scatters single-precision (32-bit) floating-point elements from a into memory using 64-bit indices. 32-bit elements are stored at addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are stored only when their corresponding mask bits are set.
_mm512_prefetch_i32scatter_pd
void _mm512_prefetch_i32scatter_pd (void* base_addr, __m256i vindex, int scale, int hint)
Prefetches double-precision (64-bit) floating-point elements with intent to write using 32-bit indices. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i32scatter_pd
void _mm512_mask_prefetch_i32scatter_pd (void* base_addr, __mmask8 k, __m256i vindex, int scale, int hint)
Prefetches double-precision (64-bit) floating-point elements with intent to write using 32-bit indices. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale) subject to mask k. Elements are brought into cache only when their corresponding mask bits are set.
_mm512_prefetch_i32scatter_ps
void _mm512_prefetch_i32scatter_ps (void* mv, __m512i index, int scale, int hint)
Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location mv at packed 32-bit integer indices stored in index scaled by scale.
_mm512_mask_prefetch_i32scatter_ps
void _mm512_mask_prefetch_i32scatter_ps (void* mv, __mmask16 k, __m512i index, int scale, int hint)
Prefetches 16 single-precision (32-bit) floating-point elements in memory starting at location mv at packed 32-bit integer indices stored in index scaled by scale. Elements are brought into cache only when their corresponding mask bits in mask k are set.
_mm512_prefetch_i64scatter_pd
void _mm512_prefetch_i64scatter_pd (void* base_addr, __m512i vindex, int scale, int hint)
Prefetches double-precision (64-bit) floating-point elements with intent to write into memory using 64-bit indices. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i64scatter_pd
void _mm512_mask_prefetch_i64scatter_pd (void* base_addr, __mmask8 mask, __m512i vindex, int scale, int hint)
Prefetches double-precision (64-bit) floating-point elements with intent to write into memory using 64-bit indices. 64-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask. Elements are brought into cache only when their corresponding bits in mask are set.
_mm512_prefetch_i64scatter_ps
void _mm512_prefetch_i64scatter_ps (void* base_addr, __m512i vindex, int scale, int hint)
Prefetches single-precision (32-bit) floating-point elements with intent to write into memory using 64-bit indices. 32-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale).
_mm512_mask_prefetch_i64scatter_ps
void _mm512_mask_prefetch_i64scatter_ps (void* base_addr, __mmask8 mask, __m512i vindex, int scale, int hint)
Prefetches single-precision (32-bit) floating-point elements with intent to write into memory using 64-bit indices. 32-bit elements are brought into cache from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale) subject to mask. Elements are brought into cache only when their corresponding bits in mask are set.