Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 3/31/2023
Public



Intrinsics for FP Insert and Extract Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


Intrinsic Name

Operation

Corresponding Intel® AVX-512 Instruction

_mm512_extractf32x4_ps, _mm512_mask_extractf32x4_ps, _mm512_maskz_extractf32x4_ps

Extract float32 values.

VEXTRACTF32X4

_mm512_extractf64x4_pd, _mm512_mask_extractf64x4_pd, _mm512_maskz_extractf64x4_pd

Extract float64 values.

VEXTRACTF64X4

_mm_extract_ps

Extract a single float32 element, returned as its 32-bit integer bit pattern.

EXTRACTPS

_mm512_getmant_pd, _mm512_mask_getmant_pd, _mm512_maskz_getmant_pd

_mm512_getmant_round_pd, _mm512_mask_getmant_round_pd, _mm512_maskz_getmant_round_pd

Extract a float64 vector of normalized mantissas from a float64 vector.

VGETMANTPD

_mm512_getmant_ps, _mm512_mask_getmant_ps, _mm512_maskz_getmant_ps

_mm512_getmant_round_ps, _mm512_mask_getmant_round_ps, _mm512_maskz_getmant_round_ps

Extract a float32 vector of normalized mantissas from a float32 vector.

VGETMANTPS

_mm_getmant_ss, _mm_mask_getmant_ss, _mm_maskz_getmant_ss

_mm_getmant_round_ss, _mm_mask_getmant_round_ss, _mm_maskz_getmant_round_ss

Extract the normalized mantissa of a scalar float32 value (the lower vector element).

VGETMANTSS

_mm_getmant_sd, _mm_mask_getmant_sd, _mm_maskz_getmant_sd

_mm_getmant_round_sd, _mm_mask_getmant_round_sd, _mm_maskz_getmant_round_sd

Extract the normalized mantissa of a scalar float64 value (the lower vector element).

VGETMANTSD

_mm512_insertf32x4, _mm512_mask_insertf32x4, _mm512_maskz_insertf32x4

Insert float32 values.

VINSERTF32X4

_mm512_insertf64x4, _mm512_mask_insertf64x4, _mm512_maskz_insertf64x4

Insert float64 values.

VINSERTF64X4

_mm_insert_ps

Insert a single float32 element, optionally zeroing destination elements.

VINSERTPS/INSERTPS
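_mm_extract_ps and _mm_insert_ps are SSE4.1 intrinsics, and their interfaces differ from the 512-bit entries: _mm_extract_ps returns the selected element's bit pattern as an int, and the _mm_insert_ps immediate packs a source element index (bits 7:6), a destination element index (bits 5:4), and a zero mask (bits 3:0). A minimal sketch, assuming GCC or Clang (the `target` attribute enables SSE4.1 for these functions only); the helper names are hypothetical:

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: returns element 2 of in[] via _mm_extract_ps.
   Assumption: the GCC/Clang target attribute enables SSE4.1 here. */
__attribute__((target("sse4.1")))
float extract_element2(const float in[4])
{
    __m128 v = _mm_loadu_ps(in);
    int bits = _mm_extract_ps(v, 2);       /* raw IEEE-754 bits of element 2 */
    float value;
    memcpy(&value, &bits, sizeof value);   /* reinterpret the bits as a float */
    return value;
}

/* Hypothetical helper: copies a_in[] to out[] with element 1 replaced by x. */
__attribute__((target("sse4.1")))
void insert_into_element1(const float a_in[4], float x, float out[4])
{
    __m128 a = _mm_loadu_ps(a_in);
    __m128 b = _mm_set_ss(x);              /* x is element 0 of b */
    /* imm: bits 7:6 = source element of b (0), bits 5:4 = destination
       element of a (1), bits 3:0 = zero mask (no elements zeroed). */
    __m128 r = _mm_insert_ps(a, b, (0 << 6) | (1 << 4) | 0x0);
    _mm_storeu_ps(out, r);
}
```

The memcpy is the portable way to turn the returned int back into a float without violating strict aliasing.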


Variable definitions
k

writemask used as a selector

a

first source vector

b

second source vector

src

source vector whose elements are copied to the destination when the corresponding writemask bit is not set

imm

8-bit immediate integer (a compile-time constant) that selects the lane or element to extract, or the position at which to insert

tmp

temporary storage location used during operation

interval

Normalization interval, of type _MM_MANTISSA_NORM_ENUM; one of the following:

  • _MM_MANT_NORM_1_2 - interval [1, 2)
  • _MM_MANT_NORM_p5_2 - interval [0.5, 2)
  • _MM_MANT_NORM_p5_1 - interval [0.5, 1)
  • _MM_MANT_NORM_p75_1p5 - interval [0.75, 1.5)

sign

Sign control, of type _MM_MANTISSA_SIGN_ENUM; one of the following:

  • _MM_MANT_SIGN_src - sign = sign(SRC)
  • _MM_MANT_SIGN_zero - sign = 0
  • _MM_MANT_SIGN_nan - DEST = NaN if sign(SRC) = 1; otherwise sign = 0
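The interval and sign controls can be illustrated with a portable scalar model. This is not the intrinsic itself: it is a sketch that handles finite, nonzero inputs only, uses hypothetical enum and function names, and omits _MM_MANT_NORM_p5_2, because for [0.5, 2) both the [1, 2) mantissa m and m/2 always lie inside the interval, and the hardware's choice between them is defined by the instruction pseudocode in the Intel SDM.

```c
#include <math.h>

/* Hypothetical names mirroring three of the four interval choices. */
typedef enum { MODEL_NORM_1_2, MODEL_NORM_P5_1, MODEL_NORM_P75_1P5 } model_interval;
typedef enum { MODEL_SIGN_SRC, MODEL_SIGN_ZERO, MODEL_SIGN_NAN } model_sign;

/* Scalar model of mantissa normalization; valid for finite, nonzero x only. */
double getmant_model(double x, model_interval interval, model_sign sc)
{
    if (sc == MODEL_SIGN_NAN && signbit(x))
        return NAN;                          /* negative source with sign_nan */

    int e;
    double m = 2.0 * frexp(fabs(x), &e);     /* |x| = m * 2^(e-1), m in [1, 2) */

    switch (interval) {
    case MODEL_NORM_1_2:                     /* [1, 2): m as is */
        break;
    case MODEL_NORM_P5_1:                    /* [0.5, 1): always halve */
        m *= 0.5;
        break;
    case MODEL_NORM_P75_1P5:                 /* [0.75, 1.5): halve if m >= 1.5 */
        if (m >= 1.5)
            m *= 0.5;
        break;
    }

    /* Only _MM_MANT_SIGN_src keeps a negative source sign. */
    return (sc == MODEL_SIGN_SRC && signbit(x)) ? -m : m;
}
```

For example, 12.0 = 1.5 × 2^3 yields 1.5 for [1, 2), 0.75 for [0.5, 1), and 0.75 for [0.75, 1.5).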

round

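Rounding and exception control, given as an int constant: one of the following rounding modes, optionally combined by bitwise OR with _MM_FROUND_NO_EXC to suppress all exceptions (SAE):

  • _MM_FROUND_TO_NEAREST_INT - rounds to nearest (ties to even)
  • _MM_FROUND_TO_NEG_INF - rounds toward negative infinity
  • _MM_FROUND_TO_POS_INF - rounds toward positive infinity
  • _MM_FROUND_TO_ZERO - rounds toward zero
  • _MM_FROUND_CUR_DIRECTION - uses the rounding mode currently set in the MXCSR register

The getmant operations return exact results, so for the getmant_round intrinsics only exception suppression matters: pass _MM_FROUND_NO_EXC to suppress exceptions, or _MM_FROUND_CUR_DIRECTION for the default behavior.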


_mm512_extractf32x4_ps

extern __m128 __cdecl _mm512_extractf32x4_ps(__m512 a, int imm);

Extracts 128 bits (composed of four packed float32 elements) from a, selected with imm, and stores the result.
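A minimal sketch of _mm512_extractf32x4_ps, assuming GCC or Clang (the `target` attribute enables AVX-512F for this one function, so no global compiler flag is needed) and a CPU with AVX-512F, which the caller should check at run time, for example with `__builtin_cpu_supports("avx512f")`. The helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: copies elements 8..11 (128-bit lane 2) of in[] to out[].
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void extract_lane2_ps(const float in[16], float out[4])
{
    __m512 a = _mm512_loadu_ps(in);
    __m128 lane = _mm512_extractf32x4_ps(a, 2);  /* imm: constant lane index 0..3 */
    _mm_storeu_ps(out, lane);
}
```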


_mm512_mask_extractf32x4_ps

extern __m128 __cdecl _mm512_mask_extractf32x4_ps(__m128 src, __mmask8 k, __m512 a, int imm);

Extracts 128 bits (composed of four packed float32 elements) from a, selected with imm, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).


_mm512_maskz_extractf32x4_ps

extern __m128 __cdecl _mm512_maskz_extractf32x4_ps(__mmask8 k, __m512 a, int imm);

Extracts 128 bits (composed of four packed float32 elements) from a, selected with imm, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
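The difference between merge masking and zero masking is easiest to see side by side. This sketch extracts lane 1 (elements 4..7) with mask 0x5, so elements 0 and 2 of the result come from the lane and elements 1 and 3 come from src or are zeroed. It assumes GCC or Clang for the `target` attribute and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: extracts lane 1 of in[] with merge and zero masking.
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void extract_lane1_masked(const float in[16], const float src_in[4],
                          float merged[4], float zeroed[4])
{
    __m512 a = _mm512_loadu_ps(in);
    __m128 src = _mm_loadu_ps(src_in);
    __mmask8 k = 0x5;  /* bits 0 and 2 set: take elements 0 and 2 from the lane */

    /* Unselected elements are copied from src. */
    _mm_storeu_ps(merged, _mm512_mask_extractf32x4_ps(src, k, a, 1));
    /* Unselected elements are set to zero. */
    _mm_storeu_ps(zeroed, _mm512_maskz_extractf32x4_ps(k, a, 1));
}
```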



_mm512_extractf64x4_pd

extern __m256d __cdecl _mm512_extractf64x4_pd(__m512d a, int imm);

Extracts 256 bits (composed of four packed float64 elements) from a, selected with imm, and stores the result.


_mm512_mask_extractf64x4_pd

extern __m256d __cdecl _mm512_mask_extractf64x4_pd(__m256d src, __mmask8 k, __m512d a, int imm);

Extracts 256 bits (composed of four packed float64 elements) from a, selected with imm, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).


_mm512_maskz_extractf64x4_pd

extern __m256d __cdecl _mm512_maskz_extractf64x4_pd(__mmask8 k, __m512d a, int imm);

Extracts 256 bits (composed of four packed float64 elements) from a, selected with imm, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
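One common use of _mm512_extractf64x4_pd is splitting a 512-bit vector into 256-bit halves, for example as the first step of a horizontal sum. A sketch assuming GCC or Clang for the `target` attribute and an AVX-512F CPU checked by the caller; the helper name is hypothetical, and _mm512_castpd512_pd256 supplies the lower half without an extract:

```c
#include <immintrin.h>

/* Hypothetical helper: out[i] = in[i] + in[i + 4] for i = 0..3.
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void add_halves_pd(const double in[8], double out[4])
{
    __m512d v  = _mm512_loadu_pd(in);
    __m256d lo = _mm512_castpd512_pd256(v);     /* elements 0..3 (a cast) */
    __m256d hi = _mm512_extractf64x4_pd(v, 1);  /* elements 4..7 */
    _mm256_storeu_pd(out, _mm256_add_pd(lo, hi));
}
```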



_mm512_insertf32x4

extern __m512 __cdecl _mm512_insertf32x4(__m512 a, __m128 b, int imm);

Copies a to destination, then inserts 128 bits (composed of four packed float32 elements) from b into destination at the location specified by imm.
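A minimal sketch of _mm512_insertf32x4 that overwrites lane 3 (elements 12..15) and leaves the rest of a unchanged. It assumes GCC or Clang for the `target` attribute and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: out[] = a_in[] with elements 12..15 replaced by b_in[].
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void replace_lane3_ps(const float a_in[16], const float b_in[4], float out[16])
{
    __m512 a = _mm512_loadu_ps(a_in);
    __m128 b = _mm_loadu_ps(b_in);
    _mm512_storeu_ps(out, _mm512_insertf32x4(a, b, 3));  /* imm: constant lane 0..3 */
}
```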


_mm512_mask_insertf32x4

extern __m512 __cdecl _mm512_mask_insertf32x4(__m512 src, __mmask16 k, __m512 a, __m128 b, int imm);

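Copies a to tmp, then inserts 128 bits (composed of four packed float32 elements) from b into tmp at the location specified by imm. Stores tmp to destination using writemask k (elements are copied from src when the corresponding mask bit is not set).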


_mm512_maskz_insertf32x4

extern __m512 __cdecl _mm512_maskz_insertf32x4(__mmask16 k, __m512 a, __m128 b, int imm);

Copies a to tmp, then inserts 128 bits (composed of four packed float32 elements) from b into tmp at the location specified by imm. Stores tmp to destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
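For the masked insert forms, the 16-bit mask applies to the whole 512-bit temporary, not just the inserted lane. In this sketch the mask 0x00FF keeps elements 0..7 of tmp (a's lane 0 plus the inserted lane 1) and takes elements 8..15 from src, or zeroes them. It assumes GCC or Clang for the `target` attribute and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: inserts b_in[] as lane 1 of a_in[], then applies
   mask 0x00FF with merge masking (from src_in[]) and with zero masking.
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void insert_lane1_masked(const float src_in[16], const float a_in[16],
                         const float b_in[4], float merged[16], float zeroed[16])
{
    __m512 src = _mm512_loadu_ps(src_in);
    __m512 a   = _mm512_loadu_ps(a_in);
    __m128 b   = _mm_loadu_ps(b_in);
    __mmask16 k = 0x00FF;  /* bits 0..7 set: keep tmp elements 0..7 */

    _mm512_storeu_ps(merged, _mm512_mask_insertf32x4(src, k, a, b, 1));
    _mm512_storeu_ps(zeroed, _mm512_maskz_insertf32x4(k, a, b, 1));
}
```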



_mm512_insertf64x4

extern __m512d __cdecl _mm512_insertf64x4(__m512d a, __m256d b, int imm);

Copies a to destination, then inserts 256 bits (composed of four packed float64 elements) from b into destination at the location specified by imm.



_mm512_mask_insertf64x4

extern __m512d __cdecl _mm512_mask_insertf64x4(__m512d src, __mmask8 k, __m512d a, __m256d b, int imm);

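Copies a to tmp, then inserts 256 bits (composed of four packed float64 elements) from b into tmp at the location specified by imm. Stores tmp to destination using writemask k (elements are copied from src when the corresponding mask bit is not set).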



_mm512_maskz_insertf64x4

extern __m512d __cdecl _mm512_maskz_insertf64x4(__mmask8 k, __m512d a, __m256d b, int imm);

Copies a to tmp, then inserts 256 bits (composed of four packed float64 elements) from b into tmp at the location specified by imm. Stores tmp to destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
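A typical use of _mm512_insertf64x4 is assembling a 512-bit vector from two 256-bit halves. _mm512_castpd256_pd512 places the lower half (its upper 256 bits are undefined) and the insert then fills the upper half. A sketch assuming GCC or Clang for the `target` attribute and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: out[0..3] = lo_in[], out[4..7] = hi_in[].
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void combine_halves_pd(const double lo_in[4], const double hi_in[4], double out[8])
{
    __m256d lo = _mm256_loadu_pd(lo_in);
    __m256d hi = _mm256_loadu_pd(hi_in);
    /* Upper 256 bits after the cast are undefined; the insert overwrites them. */
    __m512d v = _mm512_insertf64x4(_mm512_castpd256_pd512(lo), hi, 1);
    _mm512_storeu_pd(out, v);
}
```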



_mm512_getmant_pd

extern __m512d __cdecl _mm512_getmant_pd(__m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float64 elements in a and stores the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.
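A minimal sketch of _mm512_getmant_pd with the [1, 2) interval and the source sign kept, so that, for example, 12.0 = 1.5 × 2^3 maps to 1.5 and -5.0 = -1.25 × 2^2 maps to -1.25. It assumes GCC or Clang for the `target` attribute and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: normalizes each element of in[] to +/-[1, 2).
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void getmant_1_2_pd(const double in[8], double out[8])
{
    __m512d a = _mm512_loadu_pd(in);
    __m512d m = _mm512_getmant_pd(a, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src);
    _mm512_storeu_pd(out, m);
}
```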



_mm512_mask_getmant_pd

extern __m512d __cdecl _mm512_mask_getmant_pd(__m512d src, __mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float64 elements in a and stores the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm512_maskz_getmant_pd

extern __m512d __cdecl _mm512_maskz_getmant_pd(__mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float64 elements in a and stores the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm512_getmant_round_pd

extern __m512d __cdecl _mm512_getmant_round_pd(__m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float64 elements in a and stores the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.
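A sketch of _mm512_getmant_round_pd combining the [0.5, 1) interval, _MM_MANT_SIGN_nan (negative inputs become NaN), and _MM_FROUND_NO_EXC, which suppresses floating-point exceptions such as the invalid-operation exception the NaN case may otherwise signal. It assumes GCC or Clang for the `target` attribute and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: positive inputs map to [0.5, 1); negative inputs
   map to NaN, with exceptions suppressed.
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void getmant_nan_sign_pd(const double in[8], double out[8])
{
    __m512d a = _mm512_loadu_pd(in);
    __m512d m = _mm512_getmant_round_pd(a, _MM_MANT_NORM_p5_1,
                                        _MM_MANT_SIGN_nan, _MM_FROUND_NO_EXC);
    _mm512_storeu_pd(out, m);
}
```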



_mm512_mask_getmant_round_pd

extern __m512d __cdecl _mm512_mask_getmant_round_pd(__m512d src, __mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float64 elements in a and stores the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm512_maskz_getmant_round_pd

extern __m512d __cdecl _mm512_maskz_getmant_round_pd(__mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float64 elements in a and stores the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm512_getmant_ps

extern __m512 __cdecl _mm512_getmant_ps(__m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float32 elements in a and stores the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.
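Paired with _mm512_getexp_ps (which returns floor(log2|x|) as a float) and _mm512_scalef_ps (which computes a × 2^floor(b)), _mm512_getmant_ps with the [1, 2) interval and source sign gives a vector analogue of frexp/ldexp: for finite, nonzero x, x equals mantissa × 2^exponent exactly. A sketch assuming GCC or Clang for the `target` attribute and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: splits in[] into mantissa and exponent vectors and
   rebuilds the input from them (valid for finite, nonzero inputs).
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void split_and_rebuild_ps(const float in[16], float mant[16], float expo[16],
                          float rebuilt[16])
{
    __m512 x = _mm512_loadu_ps(in);
    __m512 m = _mm512_getmant_ps(x, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src); /* +/-[1, 2) */
    __m512 e = _mm512_getexp_ps(x);                      /* floor(log2(|x|)) */
    _mm512_storeu_ps(mant, m);
    _mm512_storeu_ps(expo, e);
    _mm512_storeu_ps(rebuilt, _mm512_scalef_ps(m, e));   /* m * 2^e */
}
```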



_mm512_mask_getmant_ps

extern __m512 __cdecl _mm512_mask_getmant_ps(__m512 src, __mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float32 elements in a and stores the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm512_maskz_getmant_ps

extern __m512 __cdecl _mm512_maskz_getmant_ps(__mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float32 elements in a and stores the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm512_getmant_round_ps

extern __m512 __cdecl _mm512_getmant_round_ps(__m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float32 elements in a and stores the results. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm512_mask_getmant_round_ps

extern __m512 __cdecl _mm512_mask_getmant_round_ps(__m512 src, __mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float32 elements in a and stores the results using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm512_maskz_getmant_round_ps

extern __m512 __cdecl _mm512_maskz_getmant_round_ps(__mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float32 elements in a and stores the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_getmant_round_sd

extern __m128d __cdecl _mm_getmant_round_sd(__m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissa of the lower float64 element in b, stores the result in the lower destination element, and copies the upper element from a to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_mask_getmant_round_sd

extern __m128d __cdecl _mm_mask_getmant_round_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissa of the lower float64 element in b, stores the result in the lower destination element using writemask k (the element is copied from src when mask bit 0 is not set), and copies the upper element from a to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_maskz_getmant_round_sd

extern __m128d __cdecl _mm_maskz_getmant_round_sd(__mmask8 k, __m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissa of the lower float64 element in b, stores the result in the lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies the upper element from a to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_getmant_sd

extern __m128d __cdecl _mm_getmant_sd(__m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissa of the lower float64 element in b, stores the result in the lower destination element, and copies the upper element from a to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.
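The following sketch makes the operand roles of the scalar form visible. It assumes the operand order documented in the Intel Intrinsics Guide for _mm_getmant_sd (the lower element of b is normalized and the upper element comes from a), GCC or Clang for the `target` attribute, and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper: out[0] = normalized lower element of b,
   out[1] = upper element of a.
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void getmant_sd_demo(double out[2])
{
    __m128d a = _mm_set_pd(7.0, 99.0);   /* upper = 7.0, lower = 99.0 */
    __m128d b = _mm_set_pd(5.0, 12.0);   /* upper = 5.0, lower = 12.0 */
    __m128d r = _mm_getmant_sd(a, b, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src);
    _mm_storeu_pd(out, r);               /* out[0] = lower, out[1] = upper */
}
```

Because 12.0 = 1.5 × 2^3, the expected result is {1.5, 7.0}.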



_mm_mask_getmant_sd

extern __m128d __cdecl _mm_mask_getmant_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissa of the lower float64 element in b, stores the result in the lower destination element using writemask k (the element is copied from src when mask bit 0 is not set), and copies the upper element from a to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_maskz_getmant_sd

extern __m128d __cdecl _mm_maskz_getmant_sd(__mmask8 k, __m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissa of the lower float64 element in b, stores the result in the lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies the upper element from a to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_getmant_round_ss

extern __m128 __cdecl _mm_getmant_round_ss(__m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissa of the lower float32 element in b, stores the result in the lower destination element, and copies the upper three packed elements from a to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_mask_getmant_round_ss

extern __m128 __cdecl _mm_mask_getmant_round_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissa of the lower float32 element in b, stores the result in the lower destination element using writemask k (the element is copied from src when mask bit 0 is not set), and copies the upper three packed elements from a to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_maskz_getmant_round_ss

extern __m128 __cdecl _mm_maskz_getmant_round_ss(__mmask8 k, __m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissa of the lower float32 element in b, stores the result in the lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies the upper three packed elements from a to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_getmant_ss

extern __m128 __cdecl _mm_getmant_ss(__m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissa of the lower float32 element in b, stores the result in the lower destination element, and copies the upper three packed elements from a to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_mask_getmant_ss

extern __m128 __cdecl _mm_mask_getmant_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissa of the lower float32 element in b, stores the result in the lower destination element using writemask k (the element is copied from src when mask bit 0 is not set), and copies the upper three packed elements from a to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.



_mm_maskz_getmant_ss

extern __m128 __cdecl _mm_maskz_getmant_ss(__mmask8 k, __m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissa of the lower float32 element in b, stores the result in the lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies the upper three packed elements from a to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.
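For the masked scalar forms, only mask bit 0 matters, and the upper three elements come from a regardless of the mask. This sketch runs the merge-masked form with bit 0 set and cleared, and the zero-masked form with bit 0 cleared. It assumes the Intel Intrinsics Guide operand order (lower element of b normalized, upper elements from a), GCC or Clang for the `target` attribute, and an AVX-512F CPU checked by the caller; the helper name is hypothetical:

```c
#include <immintrin.h>

/* Hypothetical helper comparing merge and zero masking for _mm_*_getmant_ss.
   Assumption: GCC/Clang target attribute; caller verifies AVX-512F support. */
__attribute__((target("avx512f")))
void getmant_ss_masks(float merged_on[4], float merged_off[4], float zeroed_off[4])
{
    __m128 src = _mm_setr_ps(-4.0f, -5.0f, -6.0f, -7.0f);  /* fallback for bit 0 */
    __m128 a   = _mm_setr_ps(9.0f, 1.0f, 2.0f, 3.0f);      /* upper three used */
    __m128 b   = _mm_setr_ps(12.0f, 0.0f, 0.0f, 0.0f);     /* lower element normalized */

    _mm_storeu_ps(merged_on,
                  _mm_mask_getmant_ss(src, 1, a, b, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src));
    _mm_storeu_ps(merged_off,
                  _mm_mask_getmant_ss(src, 0, a, b, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src));
    _mm_storeu_ps(zeroed_off,
                  _mm_maskz_getmant_ss(0, a, b, _MM_MANT_NORM_1_2, _MM_MANT_SIGN_src));
}
```

With 12.0 = 1.5 × 2^3 in b, the expected results are {1.5, 1, 2, 3}, {-4, 1, 2, 3}, and {0, 1, 2, 3}.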