Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 7/13/2023

Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4FMAPS Instructions

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4FMAPS instruction intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

_mm512_4fmadd_ps

__m512 _mm512_4fmadd_ps (__m512 c, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator

Instructions: v4fmaddps zmm1, zmm2+3, m128

Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b and accumulates the result in c.
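The accumulation described above can be sketched in plain C. This is a minimal scalar model, not Intel code: the function name ref_4fmadd_ps and the array-based parameters are hypothetical stand-ins for the vector types, and the model assumes that each of the four floats at b is multiplied with every lane of its matching source vector.

```c
#include <stddef.h>

/* Hypothetical scalar reference model of _mm512_4fmadd_ps (not an Intel API).
 * c      : 16-lane accumulator, updated in place
 * a0..a3 : the four 16-lane source vectors
 * b      : the four floats held in the 128-bit memory operand
 * Note: the hardware fuses each multiply-add; this model may round twice. */
void ref_4fmadd_ps(float c[16],
                   const float a0[16], const float a1[16],
                   const float a2[16], const float a3[16],
                   const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };

    for (size_t m = 0; m < 4; ++m) {       /* one step per source register */
        for (size_t j = 0; j < 16; ++j) {  /* one step per 32-bit lane */
            c[j] += a[m][j] * b[m];        /* b[m] is shared by all lanes */
        }
    }
}
```

A model like this can serve as an oracle when checking results from hardware that supports the instruction.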

_mm512_mask_4fmadd_ps

__m512 _mm512_mask_4fmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator
k	mask used as a selector

Instructions: v4fmaddps zmm1 {k}, zmm2+3, m128

Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b and accumulates the result in c, using mask k as a write mask. Elements are copied from c when the corresponding mask bit is not set.
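The merge-masking behavior can be illustrated with a scalar sketch. The function name ref_mask_4fmadd_ps and the use of uint16_t for the mask are assumptions made for illustration; the model treats bit j of k as the selector for lane j.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference model of _mm512_mask_4fmadd_ps (not an Intel API).
 * Lanes whose bit in k is clear keep the value they already had in c. */
void ref_mask_4fmadd_ps(float c[16], uint16_t k,
                        const float a0[16], const float a1[16],
                        const float a2[16], const float a3[16],
                        const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };

    for (size_t j = 0; j < 16; ++j) {
        if (((k >> j) & 1u) == 0) {
            continue;                      /* mask bit clear: lane copied from c */
        }
        for (size_t m = 0; m < 4; ++m) {
            c[j] += a[m][j] * b[m];        /* mask bit set: accumulate all four steps */
        }
    }
}
```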

_mm512_maskz_4fmadd_ps

__m512 _mm512_maskz_4fmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator
k	mask used as a selector

Instructions: v4fmaddps zmm1 {k}{z}, zmm2+3, m128

Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b and accumulates the result in c, using mask k as a zeroing mask. Elements are zeroed out when the corresponding mask bit is not set.
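Zero-masking differs from merge-masking only in what happens to unselected lanes. The sketch below is a scalar model under the same assumptions as before; ref_maskz_4fmadd_ps is a hypothetical name, and the result is written back into the c array for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference model of _mm512_maskz_4fmadd_ps (not an Intel API).
 * Lanes whose bit in k is clear are set to zero in the result. */
void ref_maskz_4fmadd_ps(float c[16], uint16_t k,
                         const float a0[16], const float a1[16],
                         const float a2[16], const float a3[16],
                         const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };

    for (size_t j = 0; j < 16; ++j) {
        if (((k >> j) & 1u) == 0) {
            c[j] = 0.0f;                   /* mask bit clear: lane zeroed */
            continue;
        }
        for (size_t m = 0; m < 4; ++m) {
            c[j] += a[m][j] * b[m];        /* mask bit set: accumulate all four steps */
        }
    }
}
```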

_mm512_4fnmadd_ps

__m512 _mm512_4fnmadd_ps (__m512 c, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator

Instructions: v4fnmaddps zmm1, zmm2+3, m128

Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b, negates each product, and accumulates the result in c.
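The negated variant can be modeled the same way, with each product subtracted from the accumulator instead of added. As before, this is a scalar sketch with a hypothetical name (ref_4fnmadd_ps) and array parameters standing in for the vector types.

```c
#include <stddef.h>

/* Hypothetical scalar reference model of _mm512_4fnmadd_ps (not an Intel API).
 * Each step computes c = -(a * b) + c, so the products are subtracted.
 * Note: the hardware fuses each step; this model may round twice. */
void ref_4fnmadd_ps(float c[16],
                    const float a0[16], const float a1[16],
                    const float a2[16], const float a3[16],
                    const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };

    for (size_t m = 0; m < 4; ++m) {       /* one step per source register */
        for (size_t j = 0; j < 16; ++j) {  /* one step per 32-bit lane */
            c[j] = -(a[m][j] * b[m]) + c[j];
        }
    }
}
```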

_mm512_mask_4fnmadd_ps

__m512 _mm512_mask_4fnmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator
k	mask used as a selector

Instructions: v4fnmaddps zmm1 {k}, zmm2+3, m128

Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b, negates each product, and accumulates the result in c, using mask k as a write mask. Elements are copied from c when the corresponding mask bit is not set.

_mm512_maskz_4fnmadd_ps

__m512 _mm512_maskz_4fnmadd_ps (__m512 c, __mmask16 k, __m512 a0, __m512 a1, __m512 a2, __m512 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator
k	mask used as a selector

Instructions: v4fnmaddps zmm1 {k}{z}, zmm2+3, m128

Multiplies packed single-precision floating-point values from source register block {a0, a1, a2, a3} by floating-point values pointed to by b, negates each product, and accumulates the result in c, using mask k as a zeroing mask. Elements are zeroed out when the corresponding mask bit is not set.

_mm_4fmadd_ss

__m128 _mm_4fmadd_ss (__m128 c, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator

Instructions: v4fmaddss xmm1, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each vector in source register block {a0, a1, a2, a3} by the corresponding floating-point value pointed to by b and accumulates the result in the lower element of c.
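The scalar form touches only the lowest element. The sketch below models that in plain C; ref_4fmadd_ss is a hypothetical name, and the model assumes the upper three elements of the result pass through unchanged from c.

```c
#include <stddef.h>

/* Hypothetical scalar reference model of _mm_4fmadd_ss (not an Intel API).
 * Only element 0 is accumulated. Elements 1..3 of c are left untouched,
 * which is an assumption of this sketch about the upper result elements. */
void ref_4fmadd_ss(float c[4],
                   const float a0[4], const float a1[4],
                   const float a2[4], const float a3[4],
                   const float b[4])
{
    const float *a[4] = { a0, a1, a2, a3 };

    for (size_t m = 0; m < 4; ++m) {   /* one step per source register */
        c[0] += a[m][0] * b[m];        /* lower element only */
    }
}
```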

_mm_mask_4fmadd_ss

__m128 _mm_mask_4fmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator
k	mask used as a selector

Instructions: v4fmaddss xmm1 {k}, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each vector in source register block {a0, a1, a2, a3} by the corresponding floating-point value pointed to by b and accumulates the result in the lower element of c, using mask k as a write mask. The lower element is copied from c when the corresponding mask bit is not set.

_mm_maskz_4fmadd_ss

__m128 _mm_maskz_4fmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator
k	mask used as a selector

Instructions: v4fmaddss xmm1 {k}{z}, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each vector in source register block {a0, a1, a2, a3} by the corresponding floating-point value pointed to by b and accumulates the result in the lower element of c, using mask k as a zeroing mask. The lower element is zeroed out when the corresponding mask bit is not set.

_mm_4fnmadd_ss

__m128 _mm_4fnmadd_ss (__m128 c, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator

Instructions: v4fnmaddss xmm1, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each vector in source register block {a0, a1, a2, a3} by the corresponding floating-point value pointed to by b, negates each product, and accumulates the result in the lower element of c.

_mm_mask_4fnmadd_ss

__m128 _mm_mask_4fnmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator
k	mask used as a selector

Instructions: v4fnmaddss xmm1 {k}, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each vector in source register block {a0, a1, a2, a3} by the corresponding floating-point value pointed to by b, negates each product, and accumulates the result in the lower element of c, using mask k as a write mask. The lower element is copied from c when the corresponding mask bit is not set.

_mm_maskz_4fnmadd_ss

__m128 _mm_maskz_4fnmadd_ss (__m128 c, __mmask8 k, __m128 a0, __m128 a1, __m128 a2, __m128 a3, __m128 * b)

variable	definition
a0, a1, a2, a3	first source block: four vectors
b	pointer to the second source block
c	third source; accumulator
k	mask used as a selector

Instructions: v4fnmaddss xmm1 {k}{z}, xmm2+3, m128

Multiplies the lower single-precision floating-point element of each vector in source register block {a0, a1, a2, a3} by the corresponding floating-point value pointed to by b, negates each product, and accumulates the result in the lower element of c, using mask k as a zeroing mask. The lower element is zeroed out when the corresponding mask bit is not set.
