Intrinsics for FP Insert and Extract Operations

Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 7/13/2023

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

Intrinsic Name	Operation	Corresponding Intel® AVX-512 Instruction
`_mm512_extractf32x4_ps`, `_mm512_mask_extractf32x4_ps`, `_mm512_maskz_extractf32x4_ps`	Extract float32 values.	`VEXTRACTF32X4`
`_mm512_extractf64x4_pd`, `_mm512_mask_extractf64x4_pd`, `_mm512_maskz_extractf64x4_pd`	Extract float64 values.	`VEXTRACTF64X4`
`_mm_extract_ps`	Extract packed float32 values.	`EXTRACTPS`
`_mm512_getmant_pd`, `_mm512_mask_getmant_pd`, `_mm512_maskz_getmant_pd`, `_mm512_getmant_round_pd`, `_mm512_mask_getmant_round_pd`, `_mm512_maskz_getmant_round_pd`	Extract float64 vector of normalized mantissas from float64 vector.	`VGETMANTPD`
`_mm512_getmant_ps`, `_mm512_mask_getmant_ps`, `_mm512_maskz_getmant_ps`, `_mm512_getmant_round_ps`, `_mm512_mask_getmant_round_ps`, `_mm512_maskz_getmant_round_ps`	Extract float32 vector of normalized mantissas from float32 vector.	`VGETMANTPS`
`_mm_getmant_ss`, `_mm_mask_getmant_ss`, `_mm_maskz_getmant_ss`, `_mm_getmant_round_ss`, `_mm_mask_getmant_round_ss`, `_mm_maskz_getmant_round_ss`	Extract normalized mantissa from float32 scalar.	`VGETMANTSS`
`_mm_getmant_sd`, `_mm_mask_getmant_sd`, `_mm_maskz_getmant_sd`, `_mm_getmant_round_sd`, `_mm_mask_getmant_round_sd`, `_mm_maskz_getmant_round_sd`	Extract normalized mantissa from float64 scalar.	`VGETMANTSD`
`_mm512_insertf32x4`, `_mm512_mask_insertf32x4`, `_mm512_maskz_insertf32x4`	Insert float32 values.	`VINSERTF32X4`
`_mm512_insertf64x4`, `_mm512_mask_insertf64x4`, `_mm512_maskz_insertf64x4`	Insert float64 values.	`VINSERTF64X4`
`_mm_insert_ps`	Insert scalar float32 values.	`VINSERTPS`/`INSERTPS`

Variable	Definition
`k`	writemask used as a selector
`a`	first source vector element
`b`	second source vector element
`src`	source element to use based on writemask result
`imm`	8-bit immediate integer that selects the block to extract from the source, or the position in the destination where a block is inserted
`tmp`	temporary storage location used during operation
`interval`	An `_MM_MANTISSA_NORM_ENUM` value, one of the following: `_MM_MANT_NORM_1_2` - interval [1, 2); `_MM_MANT_NORM_p5_2` - interval [0.5, 2); `_MM_MANT_NORM_p5_1` - interval [0.5, 1); `_MM_MANT_NORM_p75_1p5` - interval [0.75, 1.5)
`sign`	An `_MM_MANTISSA_SIGN_ENUM` value, one of the following: `_MM_MANT_SIGN_src` - sign = sign(SRC); `_MM_MANT_SIGN_zero` - sign = 0; `_MM_MANT_SIGN_nan` - DEST = NaN if sign(SRC) = 1
`round`	Rounding control value, one of the following (optionally combined with the `sae` suppress-all-exceptions flag): `_MM_FROUND_TO_NEAREST_INT` - rounds to nearest even; `_MM_FROUND_TO_NEG_INF` - rounds to negative infinity; `_MM_FROUND_TO_POS_INF` - rounds to positive infinity; `_MM_FROUND_TO_ZERO` - rounds to zero; `_MM_FROUND_CUR_DIRECTION` - rounds using default from MXCSR register

_mm512_extractf32x4_ps

extern __m128 __cdecl _mm512_extractf32x4_ps(__m512 a, int imm);

Extracts 128 bits (composed of four packed float32 elements) from a, selected with imm, and stores the result.
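As a rough illustration of the unmasked form, the sketch below pulls the second 128-bit block (elements 4 to 7) out of a 16-element float array. The helper name `extract_lane1` is hypothetical and not part of the guide. The intrinsic path is compiled only when the compiler defines `__AVX512F__`; otherwise a plain-C fallback that should produce the same result is used.

```c
#include <assert.h>
#include <string.h>
#ifdef __AVX512F__
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the guide): copies 128-bit block 1,
   that is elements 4..7 of in[], into out[0..3]. */
void extract_lane1(const float in[16], float out[4])
{
    assert(in != NULL && out != NULL);
#ifdef __AVX512F__
    __m512 v = _mm512_loadu_ps(in);              /* unaligned 512-bit load  */
    __m128 lane = _mm512_extractf32x4_ps(v, 1);  /* imm is a constant here  */
    _mm_storeu_ps(out, lane);                    /* unaligned 128-bit store */
#else
    /* Portable fallback used when AVX-512F code generation is not enabled. */
    memcpy(out, in + 4, 4 * sizeof(float));
#endif
}
```

The immediate is written as a literal because compilers generally require it to be a compile-time constant.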

_mm512_mask_extractf32x4_ps

extern __m128 __cdecl _mm512_mask_extractf32x4_ps(__m128 src, __mmask8 k, __m512 a, int imm);

Extracts 128 bits (composed of four packed float32 elements) from a, selected with imm, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm512_maskz_extractf32x4_ps

extern __m128 __cdecl _mm512_maskz_extractf32x4_ps(__mmask8 k, __m512 a, int imm);

Extracts 128 bits (composed of four packed float32 elements) from a, selected with imm, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
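The difference between the writemask and zeromask forms can be sketched with a scalar reference model. This is plain C that mimics the described behavior on arrays, not the intrinsic itself; the function name `model_extractf32x4` and the convention of passing a null `src` to mean "zeromask" are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model (an illustration, not the intrinsic): selects 128-bit
   block imm of a[16], then applies the low four bits of mask k.
   src != NULL models the writemask form; src == NULL models the
   zeromask form. k == 0x0F reproduces the unmasked form. */
void model_extractf32x4(float dst[4], const float *src, uint8_t k,
                        const float a[16], int imm)
{
    assert(imm >= 0 && imm <= 3);
    for (int i = 0; i < 4; ++i) {
        if ((k >> i) & 1)
            dst[i] = a[4 * imm + i];        /* mask bit set: take extracted value */
        else
            dst[i] = src ? src[i] : 0.0f;   /* mask bit clear: src or zero */
    }
}
```

With `a` holding 0 to 15, `imm` equal to 2, and `k` equal to 0x5, the model keeps elements 8 and 10 and fills the other two positions from `src` or with zeros.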

_mm512_extractf64x4_pd

extern __m256d __cdecl _mm512_extractf64x4_pd(__m512d a, int imm);

Extracts 256 bits (composed of four packed float64 elements) from a, selected with imm, and stores the result.

_mm512_mask_extractf64x4_pd

extern __m256d __cdecl _mm512_mask_extractf64x4_pd(__m256d src, __mmask8 k, __m512d a, int imm);

Extracts 256 bits (composed of four packed float64 elements) from a, selected with imm, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm512_maskz_extractf64x4_pd

extern __m256d __cdecl _mm512_maskz_extractf64x4_pd(__mmask8 k, __m512d a, int imm);

Extracts 256 bits (composed of four packed float64 elements) from a, selected with imm, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_insertf32x4

extern __m512 __cdecl _mm512_insertf32x4(__m512 a, __m128 b, int imm);

Copies a to destination, then inserts 128 bits (composed of four packed float32 elements) from b into destination at the location specified by imm.

_mm512_mask_insertf32x4

extern __m512 __cdecl _mm512_mask_insertf32x4(__m512 src, __mmask16 k, __m512 a, __m128 b, int imm);

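Copies a to tmp, then inserts 128 bits (composed of four packed float32 elements) from b into tmp at the location specified by imm. Stores tmp to destination using writemask k (elements are copied from src when the corresponding mask bit is not set).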

_mm512_maskz_insertf32x4

extern __m512 __cdecl _mm512_maskz_insertf32x4(__mmask16 k, __m512 a, __m128 b, int imm);

Copies a to tmp, then inserts 128 bits (composed of four packed float32 elements) from b into tmp at the location specified by imm. Stores tmp to destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
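The copy, insert, then mask sequence used by the insert intrinsics can be sketched as a scalar reference model. This is a plain-C illustration rather than the intrinsic; the name `model_insertf32x4` and the use of a null `src` to select zeromask behavior are assumptions. It follows the usual AVX-512 convention: a writemask keeps the `src` element where the mask bit is clear, and a zeromask writes zero there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model (an illustration, not the intrinsic).
   src != NULL models the writemask form, src == NULL the zeromask form,
   and k == 0xFFFF reproduces the unmasked form. */
void model_insertf32x4(float dst[16], const float *src, uint16_t k,
                       const float a[16], const float b[4], int imm)
{
    float tmp[16];
    assert(imm >= 0 && imm <= 3);
    memcpy(tmp, a, sizeof tmp);                    /* copy a to tmp            */
    memcpy(tmp + 4 * imm, b, 4 * sizeof(float));   /* overwrite block imm by b */
    for (int i = 0; i < 16; ++i) {
        if ((k >> i) & 1)
            dst[i] = tmp[i];                       /* mask bit set             */
        else
            dst[i] = src ? src[i] : 0.0f;          /* mask bit clear           */
    }
}
```

Note that the mask is applied to all 16 result elements, not only to the four inserted ones, which is why the 32x4 forms take an `__mmask16`.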

_mm512_insertf64x4

extern __m512d __cdecl _mm512_insertf64x4(__m512d a, __m256d b, int imm);

Copies a to destination, then inserts 256 bits (composed of four packed float64 elements) from b into destination at the location specified by imm.

_mm512_mask_insertf64x4

extern __m512d __cdecl _mm512_mask_insertf64x4(__m512d src, __mmask8 k, __m512d a, __m256d b, int imm);

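Copies a to tmp, then inserts 256 bits (composed of four packed float64 elements) from b into tmp at the location specified by imm. Stores tmp to destination using writemask k (elements are copied from src when the corresponding mask bit is not set).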

_mm512_maskz_insertf64x4

extern __m512d __cdecl _mm512_maskz_insertf64x4(__mmask8 k, __m512d a, __m256d b, int imm);

Copies a to tmp, then inserts 256 bits (composed of four packed float64 elements) from b into tmp at the location specified by imm. Stores tmp to destination using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
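Extract and insert are often paired to rearrange the halves of a vector. The sketch below models the unmasked float64 forms on plain arrays and uses them to swap the two 256-bit halves of an 8-element vector. All three function names are hypothetical, and the code is a scalar illustration of the described behavior rather than a use of the intrinsics.

```c
#include <assert.h>
#include <string.h>

/* Scalar model of the unmasked extract: imm selects the lower (0)
   or upper (1) 256-bit half of a[8]. */
void model_extractf64x4(double dst[4], const double a[8], int imm)
{
    assert(imm == 0 || imm == 1);
    memcpy(dst, a + 4 * imm, 4 * sizeof(double));
}

/* Scalar model of the unmasked insert: copy a, then overwrite half imm with b. */
void model_insertf64x4(double dst[8], const double a[8],
                       const double b[4], int imm)
{
    double tmp[8];
    assert(imm == 0 || imm == 1);
    memcpy(tmp, a, sizeof tmp);
    memcpy(tmp + 4 * imm, b, 4 * sizeof(double));
    memcpy(dst, tmp, sizeof tmp);
}

/* Swaps the two halves of v by extracting both and reinserting them crossed. */
void swap_halves(double v[8])
{
    double lo[4], hi[4];
    model_extractf64x4(lo, v, 0);
    model_extractf64x4(hi, v, 1);
    model_insertf64x4(v, v, hi, 0);
    model_insertf64x4(v, v, lo, 1);
}
```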

_mm512_getmant_pd

extern __m512d __cdecl _mm512_getmant_pd(__m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float64 elements in a, and stores the result. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.
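To make the formula concrete, here is a scalar sketch of the per-element computation for two of the intervals, [1, 2) and [0.5, 1), where the exponent k in the formula is 0 and -1 respectively. It uses the standard C `frexp` function rather than the intrinsic. The enum and function names are hypothetical stand-ins for the real `_MM_MANTISSA_NORM_ENUM` and `_MM_MANTISSA_SIGN_ENUM` values. The model covers only finite, nonzero inputs and makes no claim about how the instruction treats zeros, infinities, or NaNs.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-ins for the real enum values in immintrin.h. */
enum model_interval { MODEL_NORM_1_2, MODEL_NORM_P5_1 };
enum model_sign { MODEL_SIGN_SRC, MODEL_SIGN_ZERO, MODEL_SIGN_NAN };

/* Scalar model of one element (an illustration, not the intrinsic). */
double model_getmant(double x, enum model_interval interval,
                     enum model_sign sign)
{
    int exponent;
    double m;
    assert(isfinite(x) && x != 0.0);   /* special values are out of scope */

    /* frexp returns m with 0.5 <= |m| < 1 and x == m * 2^exponent. */
    m = frexp(x, &exponent);
    if (m < 0.0)
        m = -m;                        /* work with the magnitude */
    if (interval == MODEL_NORM_1_2)
        m *= 2.0;                      /* rescale [0.5, 1) to [1, 2) */

    if (sign == MODEL_SIGN_NAN && signbit(x))
        return NAN;                    /* negative source yields NaN */
    if (sign == MODEL_SIGN_SRC && signbit(x))
        return -m;                     /* keep the source sign */
    return m;                          /* sign forced to zero, or x > 0 */
}
```

For example, 10.0 equals 1.25 * 2^3, so the model returns 1.25 for the [1, 2) interval and 0.625 for [0.5, 1).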

_mm512_mask_getmant_pd

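extern __m512d __cdecl _mm512_mask_getmant_pd(__m512d src, __mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float64 elements in a, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.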

_mm512_maskz_getmant_pd

extern __m512d __cdecl _mm512_maskz_getmant_pd(__mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float64 elements in a, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm512_getmant_round_pd

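extern __m512d __cdecl _mm512_getmant_round_pd(__m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float64 elements in a, and stores the result. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.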

_mm512_mask_getmant_round_pd

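extern __m512d __cdecl _mm512_mask_getmant_round_pd(__m512d src, __mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float64 elements in a, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.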

_mm512_maskz_getmant_round_pd

extern __m512d __cdecl _mm512_maskz_getmant_round_pd(__mmask8 k, __m512d a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float64 elements in a, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm512_getmant_ps

extern __m512 __cdecl _mm512_getmant_ps(__m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float32 elements in a, and stores the result. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm512_mask_getmant_ps

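extern __m512 __cdecl _mm512_mask_getmant_ps(__m512 src, __mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float32 elements in a, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.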

_mm512_maskz_getmant_ps

extern __m512 __cdecl _mm512_maskz_getmant_ps(__mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of packed float32 elements in a, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm512_getmant_round_ps

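extern __m512 __cdecl _mm512_getmant_round_ps(__m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float32 elements in a, and stores the result. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.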

_mm512_mask_getmant_round_ps

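extern __m512 __cdecl _mm512_mask_getmant_round_ps(__m512 src, __mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float32 elements in a, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.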

_mm512_maskz_getmant_round_ps

extern __m512 __cdecl _mm512_maskz_getmant_round_ps(__mmask16 k, __m512 a, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of packed float32 elements in a, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set). This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm_getmant_round_sd

extern __m128d __cdecl _mm_getmant_round_sd(__m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of the lower float64 element in a, stores the result in the lower destination element, and copies the upper element from b to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm_mask_getmant_round_sd

extern __m128d __cdecl _mm_mask_getmant_round_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of the lower float64 element in a, stores the result in the lower destination element using writemask k (the element is copied from src when mask bit 0 is not set), and copies the upper element from b to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm_maskz_getmant_round_sd

extern __m128d __cdecl _mm_maskz_getmant_round_sd(__mmask8 k, __m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of the lower float64 element in a, stores the result in the lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies the upper element from b to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm_getmant_sd

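extern __m128d __cdecl _mm_getmant_sd(__m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of the lower float64 element in a, stores the result in the lower destination element, and copies the upper element from b to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.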

_mm_mask_getmant_sd

extern __m128d __cdecl _mm_mask_getmant_sd(__m128d src, __mmask8 k, __m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of the lower float64 element in a, stores the result in the lower destination element using writemask k (the element is copied from src when mask bit 0 is not set), and copies the upper element from b to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm_maskz_getmant_sd

extern __m128d __cdecl _mm_maskz_getmant_sd(__mmask8 k, __m128d a, __m128d b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of the lower float64 element in a, stores the result in the lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies the upper element from b to the upper destination element. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm_getmant_round_ss

extern __m128 __cdecl _mm_getmant_round_ss(__m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of the lower float32 element in a, stores the result in the lower destination element, and copies the upper three packed elements from b to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm_mask_getmant_round_ss

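extern __m128 __cdecl _mm_mask_getmant_round_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of the lower float32 element in a, stores the result in the lower destination element using writemask k (the element is copied from src when mask bit 0 is not set), and copies the upper three packed elements from b to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.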

_mm_maskz_getmant_round_ss

extern __m128 __cdecl _mm_maskz_getmant_round_ss(__mmask8 k, __m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign, int round);

Normalizes the mantissas of the lower float32 element in a, stores the result in the lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies the upper three packed elements from b to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.

_mm_getmant_ss

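extern __m128 __cdecl _mm_getmant_ss(__m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of the lower float32 element in a, stores the result in the lower destination element, and copies the upper three packed elements from b to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.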

_mm_mask_getmant_ss

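extern __m128 __cdecl _mm_mask_getmant_ss(__m128 src, __mmask8 k, __m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of the lower float32 element in a, stores the result in the lower destination element using writemask k (the element is copied from src when mask bit 0 is not set), and copies the upper three packed elements from b to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.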

_mm_maskz_getmant_ss

extern __m128 __cdecl _mm_maskz_getmant_ss(__mmask8 k, __m128 a, __m128 b, _MM_MANTISSA_NORM_ENUM interval, _MM_MANTISSA_SIGN_ENUM sign);

Normalizes the mantissas of the lower float32 element in a, stores the result in the lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies the upper three packed elements from b to the upper destination elements. This intrinsic essentially calculates ±(2^k)*|x.significand|, where k depends on the interval range defined by interval and the sign depends on sign and the source sign.
