Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 3/31/2023
Public


Intrinsics for Integer Load and Store Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


| Intrinsic Name | Operation | Corresponding Intel® AVX-512 Instruction |
|---|---|---|
| _mm512_load_epi32, _mm512_mask_load_epi32, _mm512_maskz_load_epi32 | Load packed int32 elements from memory | VMOVDQA32 |
| _mm512_load_epi64, _mm512_mask_load_epi64, _mm512_maskz_load_epi64 | Load packed int64 elements from memory | VMOVDQA64 |
| _mm512_loadu_si512 | Unaligned load of 512 bits of integer data | VMOVDQU32 |
| _mm512_mask_loadu_epi32, _mm512_maskz_loadu_epi32 | Unaligned load of packed int32 elements | VMOVDQU32 |
| _mm512_mask_loadu_epi64, _mm512_maskz_loadu_epi64 | Unaligned load of packed int64 elements | VMOVDQU64 |
| _mm512_stream_load_si512 | Load 512 bits of integer data using a non-temporal aligned hint | VMOVNTDQA |
| _mm512_mask_storeu_epi64 | Store unaligned packed int64 elements | VMOVDQU64 |
| _mm512_stream_si512 | Store packed integer values using a non-temporal hint | VMOVNTDQ |


| Variable | Definition |
|---|---|
| k | writemask used as a selector |
| a | first source vector element |
| mem_addr | pointer to base address in memory |
| src | source element to use based on writemask result |


_mm512_load_si512

extern __m512i __cdecl _mm512_load_si512(void const* mem_addr);

Load 512 bits of integer data from memory into destination.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_loadu_si512

extern __m512i __cdecl _mm512_loadu_si512(void const* mem_addr);

Load 512 bits of integer data from memory into destination.

mem_addr does not need to be aligned on any particular boundary.
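
The alignment difference between _mm512_load_si512 and _mm512_loadu_si512 can be sketched as a pair of 64-byte copy helpers. This is a minimal illustration, not a reference implementation: the names copy64_aligned and copy64_unaligned are hypothetical, the store intrinsics used are the ones described later in this section, and the memcpy branch is an assumed fallback for builds where AVX-512F is not enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: copy one 64-byte block between 64-byte-aligned buffers. */
void copy64_aligned(void *dst, const void *src)
{
    /* The aligned intrinsics fault on misaligned addresses, so check first. */
    assert((uintptr_t)dst % 64 == 0 && (uintptr_t)src % 64 == 0);
#if defined(__AVX512F__)
    __m512i v = _mm512_load_si512(src);
    _mm512_store_si512(dst, v);
#else
    memcpy(dst, src, 64); /* assumed fallback when AVX-512F is not enabled */
#endif
}

/* Hypothetical helper: the same copy with no alignment requirement. */
void copy64_unaligned(void *dst, const void *src)
{
#if defined(__AVX512F__)
    __m512i v = _mm512_loadu_si512(src);
    _mm512_storeu_si512(dst, v);
#else
    memcpy(dst, src, 64); /* assumed fallback when AVX-512F is not enabled */
#endif
}
```

Buffers passed to copy64_aligned can be declared with a 64-byte alignment specifier (for example, _Alignas(64) in C11); copy64_unaligned accepts any address.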



_mm512_load_epi32

extern __m512i __cdecl _mm512_load_epi32(void const* mem_addr);

Load 512 bits (composed of sixteen packed 32-bit integers) from memory into destination.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_load_epi32

extern __m512i __cdecl _mm512_mask_load_epi32(__m512i src, __mmask16 k, void const* mem_addr);

Load packed int32 elements from memory into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_maskz_load_epi32

extern __m512i __cdecl _mm512_maskz_load_epi32(__mmask16 k, void const* mem_addr);

Load packed int32 elements from memory into destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
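
The difference between the writemask and zeromask forms can be sketched as follows. The helper names load_merge_epi32 and load_zero_epi32 are hypothetical, _mm512_set1_epi32 is used only to build an illustrative src vector, and the scalar loops are an assumed fallback (for builds without AVX-512F) that also spell out the per-element behavior.

```c
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: writemask (merge) form. mem must be 64-byte aligned.
   out[i] = mem[i] where bit i of k is set, otherwise fill. */
void load_merge_epi32(int32_t out[16], const int32_t *mem, uint16_t k, int32_t fill)
{
#if defined(__AVX512F__)
    __m512i src = _mm512_set1_epi32(fill); /* kept where the mask bit is clear */
    __m512i v = _mm512_mask_load_epi32(src, (__mmask16)k, mem);
    _mm512_storeu_si512(out, v);
#else
    for (int i = 0; i < 16; ++i) /* assumed scalar fallback */
        out[i] = ((k >> i) & 1) ? mem[i] : fill;
#endif
}

/* Hypothetical helper: zeromask form. mem must be 64-byte aligned.
   out[i] = mem[i] where bit i of k is set, otherwise 0. */
void load_zero_epi32(int32_t out[16], const int32_t *mem, uint16_t k)
{
#if defined(__AVX512F__)
    __m512i v = _mm512_maskz_load_epi32((__mmask16)k, mem);
    _mm512_storeu_si512(out, v);
#else
    for (int i = 0; i < 16; ++i) /* assumed scalar fallback */
        out[i] = ((k >> i) & 1) ? mem[i] : 0;
#endif
}
```

With k = 0x00F0, both helpers take elements 4 through 7 from memory; the remaining elements come from fill in the first helper and are zero in the second.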



_mm512_load_epi64

extern __m512i __cdecl _mm512_load_epi64(void const* mem_addr);

Load 512 bits (composed of eight packed int64 elements) from memory into destination.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_load_epi64

extern __m512i __cdecl _mm512_mask_load_epi64(__m512i src, __mmask8 k, void const* mem_addr);

Load packed int64 elements from memory into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_maskz_load_epi64

extern __m512i __cdecl _mm512_maskz_load_epi64(__mmask8 k, void const* mem_addr);

Load packed int64 elements from memory into destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_loadu_epi32

extern __m512i __cdecl _mm512_mask_loadu_epi32(__m512i src, __mmask16 k, void const* mem_addr);

Load packed int32 elements from memory into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).

mem_addr does not need to be aligned on any particular boundary.



_mm512_maskz_loadu_epi32

extern __m512i __cdecl _mm512_maskz_loadu_epi32(__mmask16 k, void const* mem_addr);

Load packed int32 elements from memory into destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

mem_addr does not need to be aligned on any particular boundary.
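
A common use of the unaligned zeromask load is reading a partial vector at the end of an array. The sketch below assumes a hypothetical helper, load_tail_epi32, that builds a mask with the low n bits set. It relies on memory faults being suppressed for elements whose mask bit is clear, so the load should not touch data past the n valid elements. The scalar loop is an assumed fallback for builds without AVX-512F.

```c
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: load the first n (0..16) int32 values at p into out
   and zero the remaining elements. p may have any alignment. */
void load_tail_epi32(int32_t out[16], const int32_t *p, unsigned n)
{
    if (n > 16)
        n = 16;
#if defined(__AVX512F__)
    __mmask16 k = (__mmask16)((1u << n) - 1u); /* low n bits set */
    __m512i v = _mm512_maskz_loadu_epi32(k, p);
    _mm512_storeu_si512(out, v);
#else
    for (unsigned i = 0; i < 16; ++i) /* assumed scalar fallback */
        out[i] = (i < n) ? p[i] : 0;
#endif
}
```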



_mm512_mask_loadu_epi64

extern __m512i __cdecl _mm512_mask_loadu_epi64(__m512i src, __mmask8 k, void const* mem_addr);

Load packed int64 elements from memory into destination using writemask k (elements are copied from src when the corresponding mask bit is not set).

mem_addr does not need to be aligned on any particular boundary.



_mm512_stream_load_si512

extern __m512i __cdecl _mm512_stream_load_si512(void * mem_addr);

Load 512 bits of integer data from memory into destination using a non-temporal memory hint.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_store_epi32

extern void __cdecl _mm512_store_epi32(void* mem_addr, __m512i a);

Store 512 bits (composed of sixteen packed 32-bit integers) from a into memory.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_store_epi32

extern void __cdecl _mm512_mask_store_epi32(void* mem_addr, __mmask16 k, __m512i a);

Store packed int32 elements from a into memory using writemask k (elements are not stored when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.
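
The effect of the writemask on a store can be sketched as follows: only the memory locations whose mask bit is set are written, and the others keep their previous contents. The helper name store_selected_epi32 is hypothetical, _mm512_set1_epi32 is used only to build an illustrative source vector, and the scalar loop is an assumed fallback for builds without AVX-512F.

```c
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: write value into the elements of dst selected by k.
   dst must be 64-byte aligned and hold at least 16 int32 values. */
void store_selected_epi32(int32_t *dst, uint16_t k, int32_t value)
{
#if defined(__AVX512F__)
    __m512i a = _mm512_set1_epi32(value);
    _mm512_mask_store_epi32(dst, (__mmask16)k, a);
#else
    for (int i = 0; i < 16; ++i) /* assumed scalar fallback */
        if ((k >> i) & 1)
            dst[i] = value;
#endif
}
```

With k = 0x8001, only dst[0] and dst[15] are overwritten.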



_mm512_store_si512

extern void __cdecl _mm512_store_si512(void* mem_addr, __m512i a);

Store 512 bits of integer data from a into memory.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_store_epi64

extern void __cdecl _mm512_store_epi64(void* mem_addr, __m512i a);

Store 512 bits (composed of eight packed int64 elements) from a into memory.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_store_epi64

extern void __cdecl _mm512_mask_store_epi64(void* mem_addr, __mmask8 k, __m512i a);

Store packed int64 elements from a into memory using writemask k (elements are not stored when the corresponding mask bit is not set).

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.



_mm512_mask_storeu_epi32

extern void __cdecl _mm512_mask_storeu_epi32(void* mem_addr, __mmask16 k, __m512i a);

Store packed int32 elements from a into memory using writemask k (elements are not stored when the corresponding mask bit is not set).

mem_addr does not need to be aligned on any particular boundary.



_mm512_mask_storeu_epi64

extern void __cdecl _mm512_mask_storeu_epi64(void* mem_addr, __mmask8 k, __m512i a);

Store packed int64 elements from a into memory using writemask k (elements are not stored when the corresponding mask bit is not set).

mem_addr does not need to be aligned on any particular boundary.
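
The unaligned masked load and store can be combined to copy an array whose length is not a multiple of eight. The sketch below uses a hypothetical helper, copy_i64, that processes full vectors with _mm512_loadu_si512 and _mm512_storeu_si512 and handles the remaining one to seven elements with a mask. The scalar loop is an assumed fallback for builds without AVX-512F.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: copy n int64 values; no alignment requirement. */
void copy_i64(int64_t *dst, const int64_t *src, size_t n)
{
#if defined(__AVX512F__)
    size_t i = 0;
    for (; i + 8 <= n; i += 8) { /* full 512-bit vectors */
        __m512i v = _mm512_loadu_si512(src + i);
        _mm512_storeu_si512(dst + i, v);
    }
    if (i < n) { /* 1..7 elements left */
        __mmask8 k = (__mmask8)((1u << (n - i)) - 1u);
        __m512i v = _mm512_maskz_loadu_epi64(k, src + i);
        _mm512_mask_storeu_epi64(dst + i, k, v);
    }
#else
    for (size_t i = 0; i < n; ++i) /* assumed scalar fallback */
        dst[i] = src[i];
#endif
}
```

Because the tail store is masked, elements of dst beyond index n - 1 are left unchanged.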



_mm512_storeu_si512

extern void __cdecl _mm512_storeu_si512(void* mem_addr, __m512i a);

Store 512 bits of integer data from a into memory.

mem_addr does not need to be aligned on any particular boundary.



_mm512_stream_si512

extern void __cdecl _mm512_stream_si512(void* mem_addr, __m512i a);

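Store 512 bits of integer data from a into memory using a non-temporal memory hint.

mem_addr must be aligned on a 64-byte boundary or a general-protection exception will be generated.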
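
A block copy using the two non-temporal intrinsics might look like the sketch below. The helper name stream_copy64 is hypothetical. Non-temporal stores are weakly ordered, so the sketch issues _mm_sfence() after the loop; whether the hints improve performance depends on the memory type and access pattern, and the streaming load hint is generally only meaningful for write-combining memory. The pointer cast on the store is an assumption made for portability, since some compilers declare the first parameter as __m512i* rather than void*. The memcpy branch is an assumed fallback for builds without AVX-512F.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: copy nblocks 64-byte blocks using non-temporal hints.
   Both pointers must be 64-byte aligned. src is non-const to match the
   _mm512_stream_load_si512 prototype shown in this section. */
void stream_copy64(void *dst, void *src, size_t nblocks)
{
    assert((uintptr_t)dst % 64 == 0 && (uintptr_t)src % 64 == 0);
#if defined(__AVX512F__)
    char *d = (char *)dst;
    char *s = (char *)src;
    for (size_t i = 0; i < nblocks; ++i) {
        __m512i v = _mm512_stream_load_si512(s + 64 * i);
        _mm512_stream_si512((__m512i *)(d + 64 * i), v);
    }
    _mm_sfence(); /* make the non-temporal stores visible before later stores */
#else
    memcpy(dst, src, 64 * nblocks); /* assumed fallback without AVX-512F */
#endif
}
```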