Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4FMAPS Instructions
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4FMAPS instruction intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
_mm512_4fmadd_ps
__m512 _mm512_4fmadd_ps (__m512 c, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
Instructions: v4fmaddps zmm1, zmm2+3, m128
Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b and accumulates the result in c.
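The accumulation can be pictured with a portable scalar model. The following is a minimal sketch, not the intrinsic itself: the name ref_4fmadd_ps is hypothetical, plain float arrays stand in for the __m512 arguments and for the __m128 that b points to, and it assumes that step n multiplies every element of an by the n-th float at b.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 16 };  /* a 512-bit vector holds 16 floats */

/* ref_4fmadd_ps is a hypothetical name; this is a model, not an Intel API.
 * c is the accumulator, a0..a3 the source block, and b the four floats
 * that the real intrinsic reads through its __m128 pointer. */
void ref_4fmadd_ps(float c[LANES],
                   const float a0[LANES], const float a1[LANES],
                   const float a2[LANES], const float a3[LANES],
                   const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(c != NULL && b != NULL);

    for (int j = 0; j < 4; ++j) {          /* step j pairs a[j] with b[j] */
        assert(a[j] != NULL);
        for (int i = 0; i < LANES; ++i) {  /* b[j] is shared by all lanes */
            /* Product and sum are formed in double, then rounded to float.
             * Hardware fuses each step, so the last bit may differ. */
            c[i] = (float)((double)c[i] + (double)a[j][i] * (double)b[j]);
        }
    }
}
```

For example, with c set to 10 in every lane, a0[i] = i, a1 = 1, a2 = 2, a3 = 0.5 and b = {1, 2, 3, 4}, lane i of c ends at 20 + i.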
_mm512_mask_4fmadd_ps
__m512 _mm512_mask_4fmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: v4fmaddps zmm1 {k}, zmm2+3, m128
Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} using mask k by floating-point values pointed to by b and accumulates the result in c. Elements are copied from c when the corresponding mask bit is not set.
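Merge masking can be sketched the same way in portable C. This is an illustrative model under stated assumptions, not the intrinsic: ref_mask_4fmadd_ps is a hypothetical name, float arrays stand in for the vector arguments, a uint16_t stands in for __mmask16, and bit i of k is assumed to control element i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the merge-masked form; not an Intel API. */
void ref_mask_4fmadd_ps(float c[16], uint16_t k,
                        const float a0[16], const float a1[16],
                        const float a2[16], const float a3[16],
                        const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(c != NULL && b != NULL);

    for (int i = 0; i < 16; ++i) {
        if (((k >> i) & 1u) == 0)
            continue;                /* mask bit clear: lane keeps c[i] */
        for (int j = 0; j < 4; ++j)  /* mask bit set: four accumulate steps */
            /* Rounded via double; hardware fuses each step instead. */
            c[i] = (float)((double)c[i] + (double)a[j][i] * (double)b[j]);
    }
}
```

With every input set to 1 and k = 0x00FF, elements 0 to 7 become 5 and elements 8 to 15 stay at 1.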
_mm512_maskz_4fmadd_ps
__m512 _mm512_maskz_4fmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: v4fmaddps zmm1 {k}, zmm2+3, m128
Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} using mask k by floating-point values pointed to by b and accumulates the result in c. Elements are zeroed out when the corresponding mask bit is not set.
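Zero masking differs only in what happens to unselected elements. The sketch below is a portable illustration rather than the intrinsic: ref_maskz_4fmadd_ps is a hypothetical name, float arrays and a uint16_t stand in for the vector and mask types, and bit i of k is assumed to control element i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the zero-masked form; not an Intel API. */
void ref_maskz_4fmadd_ps(float c[16], uint16_t k,
                         const float a0[16], const float a1[16],
                         const float a2[16], const float a3[16],
                         const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(c != NULL && b != NULL);

    for (int i = 0; i < 16; ++i) {
        if (((k >> i) & 1u) == 0) {
            c[i] = 0.0f;             /* mask bit clear: lane is zeroed */
            continue;
        }
        for (int j = 0; j < 4; ++j)  /* mask bit set: four accumulate steps */
            /* Rounded via double; hardware fuses each step instead. */
            c[i] = (float)((double)c[i] + (double)a[j][i] * (double)b[j]);
    }
}
```

With every input set to 1 and k = 0xF00F, elements 0 to 3 and 12 to 15 become 5, and elements 4 to 11 become 0.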
_mm512_4fnmadd_ps
__m512 _mm512_4fnmadd_ps (__m512 c, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
Instructions: v4fnmaddps zmm1, zmm2+3, m128
Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b, negates each product, and accumulates the result in c.
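In the negated form each product is subtracted from the accumulator rather than added. A minimal portable sketch, assuming that reading: ref_4fnmadd_ps is a hypothetical name, float arrays stand in for the vector arguments, and step n is assumed to pair an with the n-th float at b.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the negated form; not an Intel API. */
void ref_4fnmadd_ps(float c[16],
                    const float a0[16], const float a1[16],
                    const float a2[16], const float a3[16],
                    const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(c != NULL && b != NULL);

    for (int j = 0; j < 4; ++j) {
        for (int i = 0; i < 16; ++i) {
            /* c[i] - a[j][i] * b[j], rounded via double; hardware fuses
             * each step, so the last bit may differ. */
            c[i] = (float)((double)c[i] - (double)a[j][i] * (double)b[j]);
        }
    }
}
```

With c set to 100 in every lane, a0[i] = i, a1 = a2 = a3 = 1 and b = {2, 3, 4, 5}, lane i of c ends at 88 - 2i.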
_mm512_mask_4fnmadd_ps
__m512 _mm512_mask_4fnmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: v4fnmaddps zmm1 {k}, zmm2+3, m128
Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} using mask k by floating-point values pointed to by b, negates each product, and accumulates the result in c. Elements are copied from c when the corresponding mask bit is not set.
_mm512_maskz_4fnmadd_ps
__m512 _mm512_maskz_4fnmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: v4fnmaddps zmm1 {k}, zmm2+3, m128
Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} using mask k by floating-point values pointed to by b, negates each product, and accumulates the result in c. Elements are zeroed out when the corresponding mask bit is not set.
_mm_4fmadd_ss
__m128 _mm_4fmadd_ss (__m128 c, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
Instructions: v4fmaddss xmm1, xmm2+3, m128
Multiplies the lower packed scalar single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b and accumulates the lower element result in c.
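The scalar form touches only the lowest element. The following portable sketch illustrates that reading and is not the intrinsic: ref_4fmadd_ss is a hypothetical name, four-element float arrays stand in for the __m128 arguments, and the upper three elements of the result are assumed to pass through from c unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of the _ss form; not an Intel API.
 * Only element 0 of each source vector takes part. */
void ref_4fmadd_ss(float c[4],
                   const float a0[4], const float a1[4],
                   const float a2[4], const float a3[4],
                   const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(c != NULL && b != NULL);

    for (int j = 0; j < 4; ++j)  /* step j pairs a[j][0] with b[j] */
        /* Rounded via double; hardware fuses each step instead. */
        c[0] = (float)((double)c[0] + (double)a[j][0] * (double)b[j]);

    /* c[1], c[2] and c[3] are deliberately left as they were. */
}
```

With c = {1, 7, 8, 9}, lowest elements 2, 3, 4 and 5 in a0 to a3, and b = {1, 10, 100, 1000}, the result is {5433, 7, 8, 9}.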
_mm_mask_4fmadd_ss
__m128 _mm_mask_4fmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: v4fmaddss xmm1 {k}, xmm2+3, m128
Multiplies the lower packed scalar single-precision floating-point values from source register block {a0, a1, a2, a3} using mask k by floating-point values pointed to by b and accumulates the lower element result in c. Elements are copied from c when the corresponding mask bit is not set.
_mm_maskz_4fmadd_ss
__m128 _mm_maskz_4fmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: v4fmaddss xmm1 {k}, xmm2+3, m128
Multiplies the lower packed scalar single-precision floating-point values from source register block {a0, a1, a2, a3} using mask k by floating-point values pointed to by b and accumulates the lower element result in c. Elements are zeroed out when the corresponding mask bit is not set.
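For the masked scalar forms, only bit 0 of k matters. The sketch below contrasts merge masking and zero masking in portable C. It is an illustration under assumptions, not the intrinsics: ref_masked_4fmadd_ss and its zeroing flag are hypothetical, a uint8_t stands in for __mmask8, and the upper three elements are assumed to pass through from c in both forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model covering both masked _ss forms; not an Intel API.
 * zeroing == 0 models the mask form, zeroing != 0 the maskz form. */
void ref_masked_4fmadd_ss(float c[4], uint8_t k, int zeroing,
                          const float a0[4], const float a1[4],
                          const float a2[4], const float a3[4],
                          const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(c != NULL && b != NULL);

    if (k & 1u) {
        for (int j = 0; j < 4; ++j)  /* bit 0 set: four accumulate steps */
            /* Rounded via double; hardware fuses each step instead. */
            c[0] = (float)((double)c[0] + (double)a[j][0] * (double)b[j]);
    } else if (zeroing) {
        c[0] = 0.0f;                 /* maskz form: lowest element zeroed */
    }
    /* Otherwise (mask form, bit 0 clear) c[0] is kept as is. */
}
```

With c = {1, 7, 8, 9} and every other input set to 1, the lowest element becomes 5 when bit 0 of k is set, stays 1 in the merge case, and becomes 0 in the zeroing case. Elements 1 to 3 stay at 7, 8 and 9 throughout.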
_mm_4fnmadd_ss
__m128 _mm_4fnmadd_ss (__m128 c, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
Instructions: v4fnmaddss xmm1, xmm2+3, m128
Multiplies the lower packed scalar single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b, negates each product, and accumulates the lower element result in c.
_mm_mask_4fnmadd_ss
__m128 _mm_mask_4fnmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: v4fnmaddss xmm1 {k}, xmm2+3, m128
Multiplies the lower packed scalar single-precision floating-point values from source register block {a0, a1, a2, a3} using mask k by floating-point values pointed to by b, negates each product, and accumulates the lower element result in c. Elements are copied from c when the corresponding mask bit is not set.
_mm_maskz_4fnmadd_ss
__m128 _mm_maskz_4fnmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block: four vectors |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: v4fnmaddss xmm1 {k}, xmm2+3, m128
Multiplies the lower packed scalar single-precision floating-point values from source register block {a0, a1, a2, a3} using mask k by floating-point values pointed to by b, negates each product, and accumulates the lower element result in c. Elements are zeroed out when the corresponding mask bit is not set.