Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023
Public

Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4FMAPS Instructions

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4FMAPS instruction intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


_mm512_4fmadd_ps

__m512 _mm512_4fmadd_ps (__m512 c, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator

Instructions: v4fmaddps zmm1, zmm2+3, m128

Multiplies the packed single-precision floating-point values in each register of the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, and adds the products to c. Returns the accumulated result.
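The per-lane arithmetic can be sketched as a scalar reference model in plain C. This is an illustration, not the intrinsic itself: the name model_4fmadd_ps and the use of float arrays in place of __m512 and __m128 values are assumptions made for this sketch. It also assumes that element j of the block at b multiplies every lane of aj, and it uses a separate multiply and add where the hardware instruction performs fused operations, so the last bit of a result may differ.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 16 /* a 512-bit register holds 16 single-precision lanes */

/* Hypothetical scalar model of _mm512_4fmadd_ps (not an Intel API).
   dst[i] = c[i] + a0[i]*b[0] + a1[i]*b[1] + a2[i]*b[2] + a3[i]*b[3] */
static void model_4fmadd_ps(float dst[LANES], const float c[LANES],
                            const float a0[LANES], const float a1[LANES],
                            const float a2[LANES], const float a3[LANES],
                            const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(dst != NULL && c != NULL && b != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        float acc = c[i];              /* start from the accumulator lane */
        for (size_t j = 0; j < 4; ++j)
            acc += a[j][i] * b[j];     /* the same scalar b[j] is used in every lane */
        dst[i] = acc;
    }
}
```

Such a model is mainly useful for checking the output of intrinsic-based code on hardware that supports the 4FMAPS instructions.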



_mm512_mask_4fmadd_ps

__m512 _mm512_mask_4fmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator
k                 mask used as a selector

Instructions: v4fmaddps zmm1 {k}, zmm2+3, m128

Multiplies the packed single-precision floating-point values in each register of the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, and adds the products to c, using mask k to select which elements are computed. Result elements are copied from c when the corresponding mask bit is not set.
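A minimal scalar sketch of the merge-masking behavior follows. The function name model_mask_4fmadd_ps, the float-array arguments, and the uint16_t stand-in for __mmask16 are assumptions for illustration only; bit i of k is taken to control lane i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* 16 single-precision lanes, one mask bit per lane */

/* Hypothetical scalar model of _mm512_mask_4fmadd_ps (not an Intel API). */
static void model_mask_4fmadd_ps(float dst[LANES], const float c[LANES],
                                 uint16_t k,
                                 const float a0[LANES], const float a1[LANES],
                                 const float a2[LANES], const float a3[LANES],
                                 const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(dst != NULL && c != NULL && b != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        float acc = c[i];
        if ((k >> i) & 1u) {               /* lane selected by the mask */
            for (size_t j = 0; j < 4; ++j)
                acc += a[j][i] * b[j];
        }
        dst[i] = acc;                      /* unselected lanes keep c[i] */
    }
}
```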



_mm512_maskz_4fmadd_ps

__m512 _mm512_maskz_4fmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator
k                 mask used as a selector

Instructions: v4fmaddps zmm1 {k}{z}, zmm2+3, m128

Multiplies the packed single-precision floating-point values in each register of the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, and adds the products to c, using mask k to select which elements are computed. Result elements are zeroed when the corresponding mask bit is not set.
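The difference from merge masking is only in what an unselected lane receives. The scalar sketch below illustrates this under the same assumptions as any reference model: model_maskz_4fmadd_ps is a hypothetical name, float arrays stand in for the vector types, and bit i of k is taken to control lane i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* 16 single-precision lanes, one mask bit per lane */

/* Hypothetical scalar model of _mm512_maskz_4fmadd_ps (not an Intel API). */
static void model_maskz_4fmadd_ps(float dst[LANES], const float c[LANES],
                                  uint16_t k,
                                  const float a0[LANES], const float a1[LANES],
                                  const float a2[LANES], const float a3[LANES],
                                  const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(dst != NULL && c != NULL && b != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        if ((k >> i) & 1u) {
            float acc = c[i];
            for (size_t j = 0; j < 4; ++j)
                acc += a[j][i] * b[j];
            dst[i] = acc;
        } else {
            dst[i] = 0.0f;                 /* zero masking: lane is cleared */
        }
    }
}
```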



_mm512_4fnmadd_ps

__m512 _mm512_4fnmadd_ps (__m512 c, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator

Instructions: v4fnmaddps zmm1, zmm2+3, m128

Multiplies the packed single-precision floating-point values in each register of the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, negates the products, and adds them to c. Returns the accumulated result.
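In scalar terms, the negated form subtracts each product from the accumulator. The sketch below is a reference model only; model_4fnmadd_ps is a hypothetical name, float arrays stand in for the vector types, and separate multiply and subtract operations replace the fused hardware operations, so rounding may differ.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 16 /* a 512-bit register holds 16 single-precision lanes */

/* Hypothetical scalar model of _mm512_4fnmadd_ps (not an Intel API).
   dst[i] = c[i] - a0[i]*b[0] - a1[i]*b[1] - a2[i]*b[2] - a3[i]*b[3] */
static void model_4fnmadd_ps(float dst[LANES], const float c[LANES],
                             const float a0[LANES], const float a1[LANES],
                             const float a2[LANES], const float a3[LANES],
                             const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(dst != NULL && c != NULL && b != NULL);
    for (size_t i = 0; i < LANES; ++i) {
        float acc = c[i];
        for (size_t j = 0; j < 4; ++j)
            acc -= a[j][i] * b[j];     /* negated product is accumulated */
        dst[i] = acc;
    }
}
```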



_mm512_mask_4fnmadd_ps

__m512 _mm512_mask_4fnmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator
k                 mask used as a selector

Instructions: v4fnmaddps zmm1 {k}, zmm2+3, m128

Multiplies the packed single-precision floating-point values in each register of the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, negates the products, and adds them to c, using mask k to select which elements are computed. Result elements are copied from c when the corresponding mask bit is not set.



_mm512_maskz_4fnmadd_ps

__m512 _mm512_maskz_4fnmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator
k                 mask used as a selector

Instructions: v4fnmaddps zmm1 {k}{z}, zmm2+3, m128

Multiplies the packed single-precision floating-point values in each register of the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, negates the products, and adds them to c, using mask k to select which elements are computed. Result elements are zeroed when the corresponding mask bit is not set.



_mm_4fmadd_ss

__m128 _mm_4fmadd_ss (__m128 c, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator

Instructions: v4fmaddss xmm1, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each register in the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, and adds the products to the lower element of c.
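The scalar form touches only element 0 of each source register. The following reference sketch shows the arithmetic; model_4fmadd_ss is a hypothetical name and float[4] arrays stand in for __m128 values. The pass-through of the upper three elements from c is an assumption of this sketch, since the text above describes only the lower element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of _mm_4fmadd_ss (not an Intel API).
   dst[0] = c[0] + a0[0]*b[0] + a1[0]*b[1] + a2[0]*b[2] + a3[0]*b[3] */
static void model_4fmadd_ss(float dst[4], const float c[4],
                            const float a0[4], const float a1[4],
                            const float a2[4], const float a3[4],
                            const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(dst != NULL && c != NULL && b != NULL);
    float acc = c[0];
    for (size_t j = 0; j < 4; ++j)
        acc += a[j][0] * b[j];         /* only the lower element of each aj is read */
    dst[0] = acc;
    for (size_t i = 1; i < 4; ++i)
        dst[i] = c[i];                 /* assumed: upper elements pass through from c */
}
```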



_mm_mask_4fmadd_ss

__m128 _mm_mask_4fmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator
k                 mask used as a selector

Instructions: v4fmaddss xmm1 {k}, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each register in the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, and adds the products to the lower element of c, using mask k. The lower element of the result is copied from c when bit 0 of k is not set.
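For the scalar forms, only the lowest mask bit matters. A minimal sketch under stated assumptions: model_mask_4fmadd_ss is a hypothetical name, uint8_t stands in for __mmask8, float[4] arrays stand in for __m128 values, and the upper three elements are assumed to pass through from c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_mask_4fmadd_ss (not an Intel API). */
static void model_mask_4fmadd_ss(float dst[4], const float c[4], uint8_t k,
                                 const float a0[4], const float a1[4],
                                 const float a2[4], const float a3[4],
                                 const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(dst != NULL && c != NULL && b != NULL);
    float acc = c[0];
    if (k & 1u) {                      /* bit 0 selects the lower element */
        for (size_t j = 0; j < 4; ++j)
            acc += a[j][0] * b[j];
    }
    dst[0] = acc;                      /* bit 0 clear: lower element stays c[0] */
    for (size_t i = 1; i < 4; ++i)
        dst[i] = c[i];                 /* assumed: upper elements pass through from c */
}
```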



_mm_maskz_4fmadd_ss

__m128 _mm_maskz_4fmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator
k                 mask used as a selector

Instructions: v4fmaddss xmm1 {k}{z}, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each register in the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, and adds the products to the lower element of c, using mask k. The lower element of the result is zeroed when bit 0 of k is not set.



_mm_4fnmadd_ss

__m128 _mm_4fnmadd_ss (__m128 c, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator

Instructions: v4fnmaddss xmm1, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each register in the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, negates the products, and adds them to the lower element of c.



_mm_mask_4fnmadd_ss

__m128 _mm_mask_4fnmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator
k                 mask used as a selector

Instructions: v4fnmaddss xmm1 {k}, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each register in the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, negates the products, and adds them to the lower element of c, using mask k. The lower element of the result is copied from c when bit 0 of k is not set.



_mm_maskz_4fnmadd_ss

__m128 _mm_maskz_4fnmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

Variable          Definition
a0, a1, a2, a3    first source block (4 vectors)
b                 pointer to the second source block
c                 third source; accumulator
k                 mask used as a selector

Instructions: v4fnmaddss xmm1 {k}{z}, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each register in the source block {a0, a1, a2, a3} by the corresponding single-precision floating-point value pointed to by b, negates the products, and adds them to the lower element of c, using mask k. The lower element of the result is zeroed when bit 0 of k is not set.
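The zero-masked, negated scalar form can be modeled in the same way. In this sketch model_maskz_4fnmadd_ss is a hypothetical name, uint8_t stands in for __mmask8, float[4] arrays stand in for __m128 values, and the upper three elements are assumed to pass through from c; separate multiply and subtract operations replace the fused hardware operations, so rounding may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of _mm_maskz_4fnmadd_ss (not an Intel API). */
static void model_maskz_4fnmadd_ss(float dst[4], const float c[4], uint8_t k,
                                   const float a0[4], const float a1[4],
                                   const float a2[4], const float a3[4],
                                   const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };
    assert(dst != NULL && c != NULL && b != NULL);
    if (k & 1u) {                      /* bit 0 selects the lower element */
        float acc = c[0];
        for (size_t j = 0; j < 4; ++j)
            acc -= a[j][0] * b[j];     /* negated product is accumulated */
        dst[0] = acc;
    } else {
        dst[0] = 0.0f;                 /* zero masking clears the lower element */
    }
    for (size_t i = 1; i < 4; ++i)
        dst[i] = c[i];                 /* assumed: upper elements pass through from c */
}
```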