Intrinsics for FP Fused Multiply-Add (FMA) Operations
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
Intrinsic Name | Operation | Corresponding Intel® AVX-512 Instruction |
---|---|---|
_mm512_fmadd_pd, _mm512_mask3_fmadd_pd, _mm512_mask_fmadd_pd, _mm512_maskz_fmadd_pd, _mm512_fmadd_round_pd, _mm512_mask3_fmadd_round_pd, _mm512_mask_fmadd_round_pd, _mm512_maskz_fmadd_round_pd | Multiplies packed float64 vector elements, then adds the intermediate result to float64 vector elements. | VFMADD132PD |
_mm512_fmadd_ps, _mm512_mask3_fmadd_ps, _mm512_mask_fmadd_ps, _mm512_maskz_fmadd_ps, _mm512_fmadd_round_ps, _mm512_mask3_fmadd_round_ps, _mm512_mask_fmadd_round_ps, _mm512_maskz_fmadd_round_ps | Multiplies packed float32 vector elements, then adds the intermediate result to float32 vector elements. | VFMADD132PS |
_mm_mask3_fmadd_sd, _mm_mask_fmadd_sd, _mm_maskz_fmadd_sd, _mm_mask3_fmadd_round_sd, _mm_mask_fmadd_round_sd, _mm_maskz_fmadd_round_sd | Multiplies scalar float64 vector elements, then adds the intermediate result to float64 vector elements. | VFMADD132SD |
_mm_mask3_fmadd_ss, _mm_mask_fmadd_ss, _mm_maskz_fmadd_ss, _mm_mask3_fmadd_round_ss, _mm_mask_fmadd_round_ss, _mm_maskz_fmadd_round_ss | Multiplies scalar float32 vector elements, then adds the intermediate result to float32 vector elements. | VFMADD132SS |
_mm512_fmaddsub_pd, _mm512_mask3_fmaddsub_pd, _mm512_mask_fmaddsub_pd, _mm512_maskz_fmaddsub_pd, _mm512_fmaddsub_round_pd, _mm512_mask3_fmaddsub_round_pd, _mm512_mask_fmaddsub_round_pd, _mm512_maskz_fmaddsub_round_pd | Multiplies packed float64 vector elements, then alternately adds and subtracts float64 vector elements to/from the intermediate result. | VFMADDSUB132PD |
_mm512_fmaddsub_ps, _mm512_mask3_fmaddsub_ps, _mm512_mask_fmaddsub_ps, _mm512_maskz_fmaddsub_ps, _mm512_fmaddsub_round_ps, _mm512_mask3_fmaddsub_round_ps, _mm512_mask_fmaddsub_round_ps, _mm512_maskz_fmaddsub_round_ps | Multiplies packed float32 vector elements, then alternately adds and subtracts float32 vector elements to/from the intermediate result. | VFMADDSUB132PS |
_mm512_fmsub_pd, _mm512_mask3_fmsub_pd, _mm512_mask_fmsub_pd, _mm512_maskz_fmsub_pd, _mm512_fmsub_round_pd, _mm512_mask3_fmsub_round_pd, _mm512_mask_fmsub_round_pd, _mm512_maskz_fmsub_round_pd | Multiplies packed float64 vector elements, then subtracts float64 vector elements from the intermediate result. | VFMSUB132PD |
_mm512_fmsub_ps, _mm512_mask3_fmsub_ps, _mm512_mask_fmsub_ps, _mm512_maskz_fmsub_ps, _mm512_fmsub_round_ps, _mm512_mask3_fmsub_round_ps, _mm512_mask_fmsub_round_ps, _mm512_maskz_fmsub_round_ps | Multiplies packed float32 vector elements, then subtracts float32 vector elements from the intermediate result. | VFMSUB132PS |
_mm_mask3_fmsub_sd, _mm_mask_fmsub_sd, _mm_maskz_fmsub_sd, _mm_mask3_fmsub_round_sd, _mm_mask_fmsub_round_sd, _mm_maskz_fmsub_round_sd | Multiplies scalar float64 vector elements, then subtracts float64 vector elements from the intermediate result. | VFMSUB132SD |
_mm_mask3_fmsub_ss, _mm_mask_fmsub_ss, _mm_maskz_fmsub_ss, _mm_mask3_fmsub_round_ss, _mm_mask_fmsub_round_ss, _mm_maskz_fmsub_round_ss | Multiplies scalar float32 vector elements, then subtracts float32 vector elements from the intermediate result. | VFMSUB132SS |
_mm512_fmsubadd_pd, _mm512_mask3_fmsubadd_pd, _mm512_mask_fmsubadd_pd, _mm512_maskz_fmsubadd_pd, _mm512_fmsubadd_round_pd, _mm512_mask3_fmsubadd_round_pd, _mm512_mask_fmsubadd_round_pd, _mm512_maskz_fmsubadd_round_pd | Multiplies packed float64 vector elements, then alternately subtracts and adds float64 vector elements from/to the intermediate result. | VFMSUBADD132PD |
_mm512_fmsubadd_ps, _mm512_mask3_fmsubadd_ps, _mm512_mask_fmsubadd_ps, _mm512_maskz_fmsubadd_ps, _mm512_fmsubadd_round_ps, _mm512_mask3_fmsubadd_round_ps, _mm512_mask_fmsubadd_round_ps, _mm512_maskz_fmsubadd_round_ps | Multiplies packed float32 vector elements, then alternately subtracts and adds float32 vector elements from/to the intermediate result. | VFMSUBADD132PS |
_mm512_fnmadd_pd, _mm512_mask3_fnmadd_pd, _mm512_mask_fnmadd_pd, _mm512_maskz_fnmadd_pd, _mm512_fnmadd_round_pd, _mm512_mask3_fnmadd_round_pd, _mm512_mask_fnmadd_round_pd, _mm512_maskz_fnmadd_round_pd | Multiplies packed float64 vector elements, then adds the negated intermediate result to float64 vector elements. | VFNMADD132PD |
_mm512_fnmadd_ps, _mm512_mask3_fnmadd_ps, _mm512_maskz_fnmadd_ps, _mm512_mask_fnmadd_ps, _mm512_fnmadd_round_ps, _mm512_mask3_fnmadd_round_ps, _mm512_mask_fnmadd_round_ps, _mm512_maskz_fnmadd_round_ps | Multiplies packed float32 vector elements, then adds the negated intermediate result to float32 vector elements. | VFNMADD132PS |
_mm_mask3_fnmadd_round_sd, _mm_mask_fnmadd_round_sd, _mm_maskz_fnmadd_round_sd, _mm_maskz_fnmadd_sd, _mm_mask_fnmadd_sd, _mm_mask3_fnmadd_sd | Multiplies scalar float64 vector elements, then adds the negated intermediate result to float64 vector elements. | VFNMADD132SD |
_mm_mask3_fnmadd_ss, _mm_mask_fnmadd_ss, _mm_maskz_fnmadd_ss, _mm_mask3_fnmadd_round_ss, _mm_mask_fnmadd_round_ss, _mm_maskz_fnmadd_round_ss | Multiplies scalar float32 vector elements, then adds the negated intermediate result to float32 vector elements. | VFNMADD132SS |
_mm512_fnmsub_pd, _mm512_mask3_fnmsub_pd, _mm512_mask_fnmsub_pd, _mm512_maskz_fnmsub_pd, _mm512_fnmsub_round_pd, _mm512_mask3_fnmsub_round_pd, _mm512_mask_fnmsub_round_pd, _mm512_maskz_fnmsub_round_pd | Multiplies packed float64 vector elements, then subtracts float64 vector elements from the negated intermediate result. | VFNMSUB132PD |
_mm512_fnmsub_ps, _mm512_mask3_fnmsub_ps, _mm512_maskz_fnmsub_ps, _mm512_mask_fnmsub_ps, _mm512_fnmsub_round_ps, _mm512_mask3_fnmsub_round_ps, _mm512_maskz_fnmsub_round_ps, _mm512_mask_fnmsub_round_ps | Multiplies packed float32 vector elements, then subtracts float32 vector elements from the negated intermediate result. | VFNMSUB132PS |
_mm_maskz_fnmsub_round_sd, _mm_mask_fnmsub_round_sd, _mm_mask3_fnmsub_round_sd, _mm_mask_fnmsub_sd, _mm_mask3_fnmsub_sd, _mm_maskz_fnmsub_sd | Multiplies scalar float64 vector elements, then subtracts float64 vector elements from the negated intermediate result. | VFNMSUB132SD |
_mm_maskz_fnmsub_round_ss, _mm_mask_fnmsub_round_ss, _mm_mask3_fnmsub_round_ss, _mm_mask_fnmsub_ss, _mm_maskz_fnmsub_ss, _mm_mask3_fnmsub_ss | Multiplies scalar float32 vector elements, then subtracts float32 vector elements from the negated intermediate result. | VFNMSUB132SS |
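The operation families in the table differ mainly in the signs applied to the product and to the addend. As a rough illustration, the following portable C sketch models a single element of the fmadd, fmsub, fnmadd, and fnmsub forms with ordinary scalar arithmetic. It does not call the intrinsics; the `model_*` and `check_sign_forms` names are hypothetical, and the single rounding step of a real fused operation is not modeled.

```c
#include <assert.h>

/* Hypothetical one-element models; these names are illustrative only
   and are not part of any Intel header. */
static double model_fmadd(double a, double b, double c)  { return  (a * b) + c; }
static double model_fmsub(double a, double b, double c)  { return  (a * b) - c; }
static double model_fnmadd(double a, double b, double c) { return -(a * b) + c; }
static double model_fnmsub(double a, double b, double c) { return -(a * b) - c; }

/* Self-check using small values that are exactly representable. */
static void check_sign_forms(void)
{
    assert(model_fmadd(2.0, 3.0, 1.0)  ==  7.0);  /*  6 + 1 */
    assert(model_fmsub(2.0, 3.0, 1.0)  ==  5.0);  /*  6 - 1 */
    assert(model_fnmadd(2.0, 3.0, 1.0) == -5.0);  /* -6 + 1 */
    assert(model_fnmsub(2.0, 3.0, 1.0) == -7.0);  /* -6 - 1 */
}
```

The fmaddsub and fmsubadd families apply the fmadd and fmsub forms to alternating elements rather than to every element.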
variable | definition |
---|---|
k | writemask used as a selector |
a | first source vector element |
b | second source vector element |
c | third source vector element |
src | source element to use based on writemask result |
round | Rounding control values; these can be one of the following (along with the sae suppress all exceptions flag):
|
_mm512_fmadd_pd
extern __m512d __cdecl _mm512_fmadd_pd(__m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result.
_mm512_mask_fmadd_pd
extern __m512d __cdecl _mm512_mask_fmadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmadd_pd
extern __m512d __cdecl _mm512_mask3_fmadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmadd_pd
extern __m512d __cdecl _mm512_maskz_fmadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
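The three masked forms above differ only in what is written to an element whose mask bit is clear. The sketch below models that selection in portable C for eight float64 lanes. It is a reference model under stated assumptions, not the intrinsics themselves: `model_masked_fmadd_pd`, `enum passthru`, and `LANES` are hypothetical names, bit i of `k` is taken to control element i, and plain `a*b + c` stands in for the fused operation, so single rounding is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8  /* a 512-bit vector holds eight float64 elements */

/* Hypothetical selector: where an element comes from when its mask bit is clear. */
enum passthru { FROM_A, FROM_C, ZERO };

/* Element-by-element model of the _mask, _mask3, and _maskz fmadd_pd forms. */
static void model_masked_fmadd_pd(double dst[LANES], const double a[LANES],
                                  const double b[LANES], const double c[LANES],
                                  uint8_t k, enum passthru mode)
{
    for (int i = 0; i < LANES; i++) {
        if ((k >> i) & 1u) {
            dst[i] = a[i] * b[i] + c[i];  /* mask bit set: compute the result */
        } else if (mode == FROM_A) {
            dst[i] = a[i];                /* _mask form: copy from a */
        } else if (mode == FROM_C) {
            dst[i] = c[i];                /* _mask3 form: copy from c */
        } else {
            dst[i] = 0.0;                 /* _maskz form: zero the element */
        }
    }
}
```

With `k = 0x0F`, for instance, the model computes the lower four elements and fills the upper four from `a`, from `c`, or with zero, depending on the chosen mode.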
_mm512_fmadd_round_pd
extern __m512d __cdecl _mm512_fmadd_round_pd(__m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result.
_mm512_mask_fmadd_round_pd
extern __m512d __cdecl _mm512_mask_fmadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmadd_round_pd
extern __m512d __cdecl _mm512_mask3_fmadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);
Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmadd_round_pd
extern __m512d __cdecl _mm512_maskz_fmadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmadd_ps
extern __m512 __cdecl _mm512_fmadd_ps(__m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result.
_mm512_mask_fmadd_ps
extern __m512 __cdecl _mm512_mask_fmadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmadd_ps
extern __m512 __cdecl _mm512_mask3_fmadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmadd_ps
extern __m512 __cdecl _mm512_maskz_fmadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmadd_round_ps
extern __m512 __cdecl _mm512_fmadd_round_ps(__m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result.
_mm512_mask_fmadd_round_ps
extern __m512 __cdecl _mm512_mask_fmadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmadd_round_ps
extern __m512 __cdecl _mm512_mask3_fmadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);
Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmadd_round_ps
extern __m512 __cdecl _mm512_maskz_fmadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, adds the intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fmadd_sd
extern __m128d __cdecl _mm_mask_fmadd_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);
Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask3_fmadd_sd
extern __m128d __cdecl _mm_mask3_fmadd_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);
Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.
_mm_maskz_fmadd_sd
extern __m128d __cdecl _mm_maskz_fmadd_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);
Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
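For the scalar forms, only mask bit 0 matters, and the upper element passes through unchanged. The following portable C sketch models the `_mask` and `_maskz` variants described above. The `vec2d` type and the `model_*` function names are hypothetical stand-ins, not Intel definitions, and plain `a*b + c` is used in place of a fused operation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-element stand-in for a 128-bit vector of float64 values. */
typedef struct { double lo, hi; } vec2d;

/* Models _mm_mask_fmadd_sd: lower element is computed, or copied from a. */
static vec2d model_mask_fmadd_sd(vec2d a, uint8_t k, vec2d b, vec2d c)
{
    vec2d dst;
    dst.lo = (k & 1u) ? a.lo * b.lo + c.lo : a.lo;
    dst.hi = a.hi;  /* upper element is copied from a */
    return dst;
}

/* Models _mm_maskz_fmadd_sd: lower element is computed, or zeroed. */
static vec2d model_maskz_fmadd_sd(uint8_t k, vec2d a, vec2d b, vec2d c)
{
    vec2d dst;
    dst.lo = (k & 1u) ? a.lo * b.lo + c.lo : 0.0;
    dst.hi = a.hi;  /* upper element is copied from a */
    return dst;
}
```

Bits of `k` other than bit 0 are ignored by this model, which matches the descriptions above where only mask bit 0 is consulted.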
_mm_mask_fmadd_round_sd
extern __m128d __cdecl _mm_mask_fmadd_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int round);
Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask3_fmadd_round_sd
extern __m128d __cdecl _mm_mask3_fmadd_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int round);
Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.
_mm_maskz_fmadd_round_sd
extern __m128d __cdecl _mm_maskz_fmadd_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int round);
Multiplies lower float64 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask_fmadd_ss
extern __m128 __cdecl _mm_mask_fmadd_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);
Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask3_fmadd_ss
extern __m128 __cdecl _mm_mask3_fmadd_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);
Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.
_mm_maskz_fmadd_ss
extern __m128 __cdecl _mm_maskz_fmadd_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);
Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask_fmadd_round_ss
extern __m128 __cdecl _mm_mask_fmadd_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int round);
Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask3_fmadd_round_ss
extern __m128 __cdecl _mm_mask3_fmadd_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int round);
Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.
_mm_maskz_fmadd_round_ss
extern __m128 __cdecl _mm_maskz_fmadd_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int round);
Multiplies lower float32 elements in a and b, and adds the intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm512_fmaddsub_pd
extern __m512d __cdecl _mm512_fmaddsub_pd(__m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result.
_mm512_mask_fmaddsub_pd
extern __m512d __cdecl _mm512_mask_fmaddsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_pd
extern __m512d __cdecl _mm512_mask3_fmaddsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_pd
extern __m512d __cdecl _mm512_maskz_fmaddsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmaddsub_round_pd
extern __m512d __cdecl _mm512_fmaddsub_round_pd(__m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result.
_mm512_mask_fmaddsub_round_pd
extern __m512d __cdecl _mm512_mask_fmaddsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_round_pd
extern __m512d __cdecl _mm512_mask3_fmaddsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);
Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_round_pd
extern __m512d __cdecl _mm512_maskz_fmaddsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmaddsub_ps
extern __m512 __cdecl _mm512_fmaddsub_ps(__m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result.
_mm512_mask_fmaddsub_ps
extern __m512 __cdecl _mm512_mask_fmaddsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_ps
extern __m512 __cdecl _mm512_mask3_fmaddsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_ps
extern __m512 __cdecl _mm512_maskz_fmaddsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmaddsub_round_ps
extern __m512 __cdecl _mm512_fmaddsub_round_ps(__m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result.
_mm512_mask_fmaddsub_round_ps
extern __m512 __cdecl _mm512_mask_fmaddsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmaddsub_round_ps
extern __m512 __cdecl _mm512_mask3_fmaddsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);
Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmaddsub_round_ps
extern __m512 __cdecl _mm512_maskz_fmaddsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, alternately adds and subtracts packed elements in c to/from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
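The fmaddsub descriptions do not say which elements are added and which are subtracted. The sketch below assumes the usual definition of the underlying VFMADDSUB instructions, in which even-indexed elements compute `a*b - c` and odd-indexed elements compute `a*b + c`. It is a portable C model for eight float64 lanes, not the intrinsic itself; `model_fmaddsub_pd` is a hypothetical name, and the single rounding of a fused operation is not modeled.

```c
#include <assert.h>

/* Hypothetical model of the unmasked fmaddsub_pd form.
   Assumption: even-indexed elements subtract c, odd-indexed elements add c. */
static void model_fmaddsub_pd(double dst[8], const double a[8],
                              const double b[8], const double c[8])
{
    for (int i = 0; i < 8; i++) {
        double prod = a[i] * b[i];
        dst[i] = (i % 2 == 0) ? prod - c[i]   /* even element: subtract */
                              : prod + c[i];  /* odd element: add */
    }
}
```

Under the same assumption, the fmsubadd family swaps the two cases, adding on even-indexed elements and subtracting on odd-indexed ones.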
_mm512_fmsub_pd
extern __m512d __cdecl _mm512_fmsub_pd(__m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result.
_mm512_mask_fmsub_pd
extern __m512d __cdecl _mm512_mask_fmsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmsub_pd
extern __m512d __cdecl _mm512_mask3_fmsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmsub_pd
extern __m512d __cdecl _mm512_maskz_fmsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmsub_round_pd
extern __m512d __cdecl _mm512_fmsub_round_pd(__m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result.
_mm512_mask_fmsub_round_pd
extern __m512d __cdecl _mm512_mask_fmsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmsub_round_pd
extern __m512d __cdecl _mm512_mask3_fmsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmsub_round_pd
extern __m512d __cdecl _mm512_maskz_fmsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmsub_ps
extern __m512 __cdecl _mm512_fmsub_ps(__m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result.
_mm512_mask_fmsub_ps
extern __m512 __cdecl _mm512_mask_fmsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmsub_ps
extern __m512 __cdecl _mm512_mask3_fmsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmsub_ps
extern __m512 __cdecl _mm512_maskz_fmsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmsub_round_ps
extern __m512 __cdecl _mm512_fmsub_round_ps(__m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result.
_mm512_mask_fmsub_round_ps
extern __m512 __cdecl _mm512_mask_fmsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmsub_round_ps
extern __m512 __cdecl _mm512_mask3_fmsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmsub_round_ps
extern __m512 __cdecl _mm512_maskz_fmsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fmsub_sd
extern __m128d __cdecl _mm_mask_fmsub_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask3_fmsub_sd
extern __m128d __cdecl _mm_mask3_fmsub_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.
_mm_maskz_fmsub_sd
extern __m128d __cdecl _mm_maskz_fmsub_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask_fmsub_round_sd
extern __m128d __cdecl _mm_mask_fmsub_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int round);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask3_fmsub_round_sd
extern __m128d __cdecl _mm_mask3_fmsub_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int round);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.
_mm_maskz_fmsub_round_sd
extern __m128d __cdecl _mm_maskz_fmsub_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int round);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask_fmsub_ss
extern __m128 __cdecl _mm_mask_fmsub_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask3_fmsub_ss
extern __m128 __cdecl _mm_mask3_fmsub_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.
_mm_maskz_fmsub_ss
extern __m128 __cdecl _mm_maskz_fmsub_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask_fmsub_round_ss
extern __m128 __cdecl _mm_mask_fmsub_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int round);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask3_fmsub_round_ss
extern __m128 __cdecl _mm_mask3_fmsub_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int round);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.
_mm_maskz_fmsub_round_ss
extern __m128 __cdecl _mm_maskz_fmsub_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int round);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
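The masked scalar forms above differ in what the lower element holds when mask bit 0 is not set. The following is a minimal sketch of that difference, not part of the original reference. The helper names are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here so that the code builds without extra flags and is skipped on processors without Intel® AVX-512 Foundation support.

```c
#include <assert.h>
#include <immintrin.h>

/* Runs the mask, mask3, and maskz forms on the same inputs.
   Each output array receives four floats. */
__attribute__((target("avx512f")))
static void fmsub_ss_forms_avx512(__mmask8 k, const float *a, const float *b,
                                  const float *c, float *out_mask,
                                  float *out_mask3, float *out_maskz)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_loadu_ps(c);

    /* With bit 0 of k set, the lower element is a[0]*b[0] - c[0] in all
       three forms. With bit 0 clear it is a[0], c[0], or 0.0f respectively. */
    _mm_storeu_ps(out_mask,  _mm_mask_fmsub_ss(va, k, vb, vc));
    _mm_storeu_ps(out_mask3, _mm_mask3_fmsub_ss(va, vb, vc, k));
    _mm_storeu_ps(out_maskz, _mm_maskz_fmsub_ss(k, va, vb, vc));
}

/* Hypothetical wrapper: returns 1 if the intrinsics ran, or 0 if the CPU
   does not report AVX-512F support (outputs are then left untouched). */
int fmsub_ss_forms(unsigned char k, const float *a, const float *b,
                   const float *c, float *out_mask, float *out_mask3,
                   float *out_maskz)
{
    assert(a && b && c && out_mask && out_mask3 && out_maskz);
    if (!__builtin_cpu_supports("avx512f"))
        return 0;
    fmsub_ss_forms_avx512((__mmask8)k, a, b, c, out_mask, out_mask3, out_maskz);
    return 1;
}
```

For example, with a[0] = 2, b[0] = 3, and c[0] = 1, all three forms produce 5 in the lower element when k is 1. When k is 0 they produce 2, 1, and 0 respectively.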
_mm512_fmsubadd_pd
extern __m512d __cdecl _mm512_fmsubadd_pd(__m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result.
_mm512_mask_fmsubadd_pd
extern __m512d __cdecl _mm512_mask_fmsubadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_pd
extern __m512d __cdecl _mm512_mask3_fmsubadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_pd
extern __m512d __cdecl _mm512_maskz_fmsubadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
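The alternating pattern of the fmsubadd intrinsics above can be pictured element by element: c is added in even-indexed elements and subtracted in odd-indexed elements. The sketch below is an illustration under stated assumptions, not part of the original reference. The function names are hypothetical, the scalar model does not reproduce the single rounding of a fused operation (it matches only for exactly representable results), and the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <assert.h>
#include <immintrin.h>

/* Scalar reference model (illustrative assumption): even-indexed elements
   compute a*b + c, odd-indexed elements compute a*b - c. */
static void fmsubadd_pd_model(const double *a, const double *b,
                              const double *c, double *dst)
{
    for (int i = 0; i < 8; ++i)
        dst[i] = (i % 2 == 0) ? a[i] * b[i] + c[i]
                              : a[i] * b[i] - c[i];
}

__attribute__((target("avx512f")))
static void fmsubadd_pd_avx512(const double *a, const double *b,
                               const double *c, double *dst)
{
    __m512d r = _mm512_fmsubadd_pd(_mm512_loadu_pd(a),
                                   _mm512_loadu_pd(b),
                                   _mm512_loadu_pd(c));
    _mm512_storeu_pd(dst, r);
}

/* Hypothetical wrapper: uses the intrinsic when the CPU reports AVX-512F
   support, otherwise falls back to the scalar model.
   All arrays hold eight doubles. */
void fmsubadd8(const double *a, const double *b, const double *c, double *dst)
{
    assert(a && b && c && dst);
    if (__builtin_cpu_supports("avx512f"))
        fmsubadd_pd_avx512(a, b, c, dst);
    else
        fmsubadd_pd_model(a, b, c, dst);
}
```

With a = {1, 2, ..., 8}, every element of b equal to 2, and every element of c equal to 10, the result is {12, -6, 16, -2, 20, 2, 24, 6}.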
_mm512_fmsubadd_round_pd
extern __m512d __cdecl _mm512_fmsubadd_round_pd(__m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result.
_mm512_mask_fmsubadd_round_pd
extern __m512d __cdecl _mm512_mask_fmsubadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_round_pd
extern __m512d __cdecl _mm512_mask3_fmsubadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);
Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_round_pd
extern __m512d __cdecl _mm512_maskz_fmsubadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmsubadd_ps
extern __m512 __cdecl _mm512_fmsubadd_ps(__m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result.
_mm512_mask_fmsubadd_ps
extern __m512 __cdecl _mm512_mask_fmsubadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_ps
extern __m512 __cdecl _mm512_mask3_fmsubadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_ps
extern __m512 __cdecl _mm512_maskz_fmsubadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fmsubadd_round_ps
extern __m512 __cdecl _mm512_fmsubadd_round_ps(__m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result.
_mm512_mask_fmsubadd_round_ps
extern __m512 __cdecl _mm512_mask_fmsubadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fmsubadd_round_ps
extern __m512 __cdecl _mm512_mask3_fmsubadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);
Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fmsubadd_round_ps
extern __m512 __cdecl _mm512_maskz_fmsubadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, alternately subtracts and adds packed elements in c from/to the intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
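The _round forms above take an extra round argument that selects the rounding direction for that one operation. The sketch below is an illustration, not part of the original reference. It assumes the _MM_FROUND_* macros provided by immintrin.h, hypothetical helper names, and the GCC/Clang target attribute and __builtin_cpu_supports extensions. It squares 1 + FLT_EPSILON, whose exact square lies between two adjacent float values, so the chosen direction is visible in the result.

```c
#include <assert.h>
#include <float.h>
#include <immintrin.h>

__attribute__((target("avx512f")))
static void square_rounded_avx512(float x, float *nearest, float *upward)
{
    __m512 vx = _mm512_set1_ps(x);
    __m512 zero = _mm512_setzero_ps();
    float lanes[16];

    /* c is zero, so every element holds x*x rounded once in the requested
       direction. The rounding argument must be a compile-time constant. */
    _mm512_storeu_ps(lanes, _mm512_fmsubadd_round_ps(
        vx, vx, zero, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC));
    *nearest = lanes[0];

    _mm512_storeu_ps(lanes, _mm512_fmsubadd_round_ps(
        vx, vx, zero, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC));
    *upward = lanes[0];
}

/* Hypothetical wrapper: returns 1 if the intrinsics ran, or 0 if the CPU
   does not report AVX-512F support (outputs are then left untouched). */
int square_rounded(float x, float *nearest, float *upward)
{
    assert(nearest && upward);
    if (!__builtin_cpu_supports("avx512f"))
        return 0;
    square_rounded_avx512(x, nearest, upward);
    return 1;
}
```

For x = 1 + FLT_EPSILON, rounding to nearest gives 1 + 2 * FLT_EPSILON, and rounding toward positive infinity gives 1 + 3 * FLT_EPSILON.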
_mm512_fnmadd_pd
extern __m512d __cdecl _mm512_fnmadd_pd(__m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result.
_mm512_mask_fnmadd_pd
extern __m512d __cdecl _mm512_mask_fnmadd_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_pd
extern __m512d __cdecl _mm512_mask3_fnmadd_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fnmadd_pd
extern __m512d __cdecl _mm512_maskz_fnmadd_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
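The writemask and zeromask forms of fnmadd above compute the same value, c - a*b, in selected elements and differ only in the elements whose mask bit is not set. The following sketch puts the three forms side by side. It is an illustration rather than part of the original reference: the function names are hypothetical, the scalar model ignores fused rounding, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <assert.h>
#include <immintrin.h>

/* Scalar model (illustrative assumption): selected elements get c - a*b;
   unselected elements get a (mask), c (mask3), or 0.0 (maskz). */
static void fnmadd_forms_model(unsigned k, const double *a, const double *b,
                               const double *c, double *m, double *m3,
                               double *mz)
{
    for (int i = 0; i < 8; ++i) {
        int on = (int)((k >> i) & 1u);
        double r = c[i] - a[i] * b[i];
        m[i]  = on ? r : a[i];
        m3[i] = on ? r : c[i];
        mz[i] = on ? r : 0.0;
    }
}

__attribute__((target("avx512f")))
static void fnmadd_forms_avx512(__mmask8 k, const double *a, const double *b,
                                const double *c, double *m, double *m3,
                                double *mz)
{
    __m512d va = _mm512_loadu_pd(a);
    __m512d vb = _mm512_loadu_pd(b);
    __m512d vc = _mm512_loadu_pd(c);
    _mm512_storeu_pd(m,  _mm512_mask_fnmadd_pd(va, k, vb, vc));
    _mm512_storeu_pd(m3, _mm512_mask3_fnmadd_pd(va, vb, vc, k));
    _mm512_storeu_pd(mz, _mm512_maskz_fnmadd_pd(k, va, vb, vc));
}

/* Hypothetical wrapper; every array holds eight doubles. */
void fnmadd_forms(unsigned char k, const double *a, const double *b,
                  const double *c, double *m, double *m3, double *mz)
{
    assert(a && b && c && m && m3 && mz);
    if (__builtin_cpu_supports("avx512f"))
        fnmadd_forms_avx512((__mmask8)k, a, b, c, m, m3, mz);
    else
        fnmadd_forms_model(k, a, b, c, m, m3, mz);
}
```

With k = 0x0F only elements 0 through 3 are computed. Elements 4 through 7 hold the values of a, the values of c, or zero, depending on the form.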
_mm512_fnmadd_round_pd
extern __m512d __cdecl _mm512_fnmadd_round_pd(__m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result.
_mm512_mask_fnmadd_round_pd
extern __m512d __cdecl _mm512_mask_fnmadd_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_round_pd
extern __m512d __cdecl _mm512_mask3_fnmadd_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);
Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fnmadd_round_pd
extern __m512d __cdecl _mm512_maskz_fnmadd_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fnmadd_ps
extern __m512 __cdecl _mm512_fnmadd_ps(__m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result.
_mm512_mask_fnmadd_ps
extern __m512 __cdecl _mm512_mask_fnmadd_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_ps
extern __m512 __cdecl _mm512_mask3_fnmadd_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fnmadd_ps
extern __m512 __cdecl _mm512_maskz_fnmadd_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fnmadd_round_ps
extern __m512 __cdecl _mm512_fnmadd_round_ps(__m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result.
_mm512_mask_fnmadd_round_ps
extern __m512 __cdecl _mm512_mask_fnmadd_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fnmadd_round_ps
extern __m512 __cdecl _mm512_mask3_fnmadd_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);
Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fnmadd_round_ps
extern __m512 __cdecl _mm512_maskz_fnmadd_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, adds the negated intermediate result to packed elements in c, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
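One way the float32 fnmadd intrinsics above might be used is to update an array in place as y[i] = y[i] - a[i]*x[i], with the zeromask form handling a final partial vector. This is a sketch under stated assumptions, not a recommended implementation: the function names are hypothetical, the masked load and store intrinsics come from immintrin.h rather than from this section, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Scalar fallback: y[i] = y[i] - a[i]*x[i] (not fused). */
static void subtract_products_scalar(const float *a, const float *x,
                                     float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] -= a[i] * x[i];
}

__attribute__((target("avx512f")))
static void subtract_products_avx512(const float *a, const float *x,
                                     float *y, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vx = _mm512_loadu_ps(x + i);
        __m512 vy = _mm512_loadu_ps(y + i);
        _mm512_storeu_ps(y + i, _mm512_fnmadd_ps(va, vx, vy));
    }
    if (i < n) {
        /* Tail of 1 to 15 elements: one mask bit per remaining element. */
        __mmask16 k = (__mmask16)((1u << (n - i)) - 1u);
        __m512 va = _mm512_maskz_loadu_ps(k, a + i);
        __m512 vx = _mm512_maskz_loadu_ps(k, x + i);
        __m512 vy = _mm512_maskz_loadu_ps(k, y + i);
        _mm512_mask_storeu_ps(y + i, k, _mm512_maskz_fnmadd_ps(k, va, vx, vy));
    }
}

/* Hypothetical wrapper: picks the AVX-512 path when the CPU reports
   support for it, otherwise the scalar loop. */
void subtract_products(const float *a, const float *x, float *y, size_t n)
{
    assert(a && x && y);
    if (__builtin_cpu_supports("avx512f"))
        subtract_products_avx512(a, x, y, n);
    else
        subtract_products_scalar(a, x, y, n);
}
```

The masked load and store touch only the elements selected by k, so elements past the end of the arrays are neither read nor written.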
_mm_mask_fnmadd_sd
extern __m128d __cdecl _mm_mask_fnmadd_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);
Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask3_fnmadd_sd
extern __m128d __cdecl _mm_mask3_fnmadd_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);
Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.
_mm_maskz_fnmadd_sd
extern __m128d __cdecl _mm_maskz_fnmadd_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);
Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask_fnmadd_round_sd
extern __m128d __cdecl _mm_mask_fnmadd_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int round);
Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask3_fnmadd_round_sd
extern __m128d __cdecl _mm_mask3_fnmadd_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int round);
Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.
_mm_maskz_fnmadd_round_sd
extern __m128d __cdecl _mm_maskz_fnmadd_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int round);
Multiplies lower float64 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask_fnmadd_ss
extern __m128 __cdecl _mm_mask_fnmadd_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);
Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask3_fnmadd_ss
extern __m128 __cdecl _mm_mask3_fnmadd_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);
Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.
_mm_maskz_fnmadd_ss
extern __m128 __cdecl _mm_maskz_fnmadd_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);
Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask_fnmadd_round_ss
extern __m128 __cdecl _mm_mask_fnmadd_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int round);
Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask3_fnmadd_round_ss
extern __m128 __cdecl _mm_mask3_fnmadd_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int round);
Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.
_mm_maskz_fnmadd_round_ss
extern __m128 __cdecl _mm_maskz_fnmadd_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int round);
Multiplies lower float32 elements in a and b, and adds the negated intermediate result to lower element in c. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm512_fnmsub_pd
extern __m512d __cdecl _mm512_fnmsub_pd(__m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result.
_mm512_mask_fnmsub_pd
extern __m512d __cdecl _mm512_mask_fnmsub_pd(__m512d a, __mmask8 k, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_pd
extern __m512d __cdecl _mm512_mask3_fnmsub_pd(__m512d a, __m512d b, __m512d c, __mmask8 k);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_pd
extern __m512d __cdecl _mm512_maskz_fnmsub_pd(__mmask8 k, __m512d a, __m512d b, __m512d c);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
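For the fnmsub intrinsics above, each selected element holds -(a*b) - c. The sketch below illustrates the zeromask form with a mask that selects every other element. It is not part of the original reference: the function names are hypothetical, the scalar model ignores fused rounding, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <assert.h>
#include <immintrin.h>

/* Scalar model (illustrative assumption): selected elements get
   -(a*b) - c; unselected elements are zeroed. */
static void fnmsub_maskz_model(unsigned k, const double *a, const double *b,
                               const double *c, double *dst)
{
    for (int i = 0; i < 8; ++i)
        dst[i] = ((k >> i) & 1u) ? -(a[i] * b[i]) - c[i] : 0.0;
}

__attribute__((target("avx512f")))
static void fnmsub_maskz_avx512(__mmask8 k, const double *a, const double *b,
                                const double *c, double *dst)
{
    __m512d r = _mm512_maskz_fnmsub_pd(k, _mm512_loadu_pd(a),
                                       _mm512_loadu_pd(b),
                                       _mm512_loadu_pd(c));
    _mm512_storeu_pd(dst, r);
}

/* Hypothetical wrapper; every array holds eight doubles. */
void fnmsub_maskz8(unsigned char k, const double *a, const double *b,
                   const double *c, double *dst)
{
    assert(a && b && c && dst);
    if (__builtin_cpu_supports("avx512f"))
        fnmsub_maskz_avx512((__mmask8)k, a, b, c, dst);
    else
        fnmsub_maskz_model(k, a, b, c, dst);
}
```

With a = {1, 2, ..., 8}, every element of b equal to 2, every element of c equal to 10, and k = 0x55, the result is {-12, 0, -16, 0, -20, 0, -24, 0}.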
_mm512_fnmsub_round_pd
extern __m512d __cdecl _mm512_fnmsub_round_pd(__m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result.
_mm512_mask_fnmsub_round_pd
extern __m512d __cdecl _mm512_mask_fnmsub_round_pd(__m512d a, __mmask8 k, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_round_pd
extern __m512d __cdecl _mm512_mask3_fnmsub_round_pd(__m512d a, __m512d b, __m512d c, __mmask8 k, int round);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_round_pd
extern __m512d __cdecl _mm512_maskz_fnmsub_round_pd(__mmask8 k, __m512d a, __m512d b, __m512d c, int round);
Multiplies packed float64 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fnmsub_ps
extern __m512 __cdecl _mm512_fnmsub_ps(__m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result.
_mm512_mask_fnmsub_ps
extern __m512 __cdecl _mm512_mask_fnmsub_ps(__m512 a, __mmask16 k, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_ps
extern __m512 __cdecl _mm512_mask3_fnmsub_ps(__m512 a, __m512 b, __m512 c, __mmask16 k);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_ps
extern __m512 __cdecl _mm512_maskz_fnmsub_ps(__mmask16 k, __m512 a, __m512 b, __m512 c);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_fnmsub_round_ps
extern __m512 __cdecl _mm512_fnmsub_round_ps(__m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result.
_mm512_mask_fnmsub_round_ps
extern __m512 __cdecl _mm512_mask_fnmsub_round_ps(__m512 a, __mmask16 k, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_mask3_fnmsub_round_ps
extern __m512 __cdecl _mm512_mask3_fnmsub_round_ps(__m512 a, __m512 b, __m512 c, __mmask16 k, int round);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm512_maskz_fnmsub_round_ps
extern __m512 __cdecl _mm512_maskz_fnmsub_round_ps(__mmask16 k, __m512 a, __m512 b, __m512 c, int round);
Multiplies packed float32 elements in a and b, subtracts packed elements in c from the negated intermediate result, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fnmsub_sd
extern __m128d __cdecl _mm_mask_fnmsub_sd(__m128d a, __mmask8 k, __m128d b, __m128d c);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask3_fnmsub_sd
extern __m128d __cdecl _mm_mask3_fnmsub_sd(__m128d a, __m128d b, __m128d c, __mmask8 k);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.
_mm_maskz_fnmsub_sd
extern __m128d __cdecl _mm_maskz_fnmsub_sd(__mmask8 k, __m128d a, __m128d b, __m128d c);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask_fnmsub_ss
extern __m128 __cdecl _mm_mask_fnmsub_ss(__m128 a, __mmask8 k, __m128 b, __m128 c);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask3_fnmsub_ss
extern __m128 __cdecl _mm_mask3_fnmsub_ss(__m128 a, __m128 b, __m128 c, __mmask8 k);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.
_mm_maskz_fnmsub_ss
extern __m128 __cdecl _mm_maskz_fnmsub_ss(__mmask8 k, __m128 a, __m128 b, __m128 c);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask_fnmsub_round_ss
extern __m128 __cdecl _mm_mask_fnmsub_round_ss(__m128 a, __mmask8 k, __m128 b, __m128 c, int round);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask3_fnmsub_round_ss
extern __m128 __cdecl _mm_mask3_fnmsub_round_ss(__m128 a, __m128 b, __m128 c, __mmask8 k, int round);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper three packed elements from c to upper destination elements.
_mm_maskz_fnmsub_round_ss
extern __m128 __cdecl _mm_maskz_fnmsub_round_ss(__mmask8 k, __m128 a, __m128 b, __m128 c, int round);
Multiplies lower float32 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper three packed elements from a to upper destination elements.
_mm_mask_fnmsub_round_sd
extern __m128d __cdecl _mm_mask_fnmsub_round_sd(__m128d a, __mmask8 k, __m128d b, __m128d c, int round);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from a when mask bit 0 is not set), and copies upper element from a to upper destination element.
_mm_mask3_fnmsub_round_sd
extern __m128d __cdecl _mm_mask3_fnmsub_round_sd(__m128d a, __m128d b, __m128d c, __mmask8 k, int round);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using writemask k (the element is copied from c when mask bit 0 is not set), and copies upper element from c to upper destination element.
_mm_maskz_fnmsub_round_sd
extern __m128d __cdecl _mm_maskz_fnmsub_round_sd(__mmask8 k, __m128d a, __m128d b, __m128d c, int round);
Multiplies lower float64 elements in a and b, and subtracts lower element in c from the negated intermediate result. Stores the result in lower destination element using zeromask k (the element is zeroed out when mask bit 0 is not set), and copies upper element from a to upper destination element.
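The zeromask scalar form of fnmsub described above can be exercised with plain double values, as in the following sketch. It is an illustration, not part of the original reference: the function names and the enable parameter are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <assert.h>
#include <immintrin.h>

__attribute__((target("avx512f")))
static void fnmsub_sd_avx512(int enable, double a_lo, double a_hi,
                             double b, double c, double *out)
{
    __m128d va = _mm_set_pd(a_hi, a_lo);  /* arguments: upper, then lower */
    __m128d vb = _mm_set_sd(b);           /* lower = b, upper = 0.0 */
    __m128d vc = _mm_set_sd(c);           /* lower = c, upper = 0.0 */
    __mmask8 k = (__mmask8)(enable ? 1 : 0);
    _mm_storeu_pd(out, _mm_maskz_fnmsub_sd(k, va, vb, vc));
}

/* Hypothetical wrapper: out[0] receives the lower result element and
   out[1] the upper one. Falls back to a scalar model of the description
   when the CPU does not report AVX-512F support. */
void fnmsub_sd_zeromask(int enable, double a_lo, double a_hi,
                        double b, double c, double out[2])
{
    assert(out);
    if (__builtin_cpu_supports("avx512f")) {
        fnmsub_sd_avx512(enable, a_lo, a_hi, b, c, out);
    } else {
        out[0] = enable ? -(a_lo * b) - c : 0.0;
        out[1] = a_hi;
    }
}
```

With a_lo = 3, a_hi = 7, b = 4, and c = 5, the lower element is -17 when the mask bit is set and 0 when it is not. The upper element is 7 in both cases.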