Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023
Public

Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4VNNIW Instructions

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4VNNIW instruction intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


_mm512_4dpwssd_epi32

__m512i _mm512_4dpwssd_epi32 (__m512i c, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable definition
a0..a3 first source block of 4 vectors
b pointer to the second source block (128-bit memory operand)
c third source; accumulator

Instructions: vp4dpwssd zmm1, zmm2+3, m128

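Computes four sequential dot-product steps on signed word operands and accumulates the results into the doubleword elements of c. In step n (n = 0 to 3), each pair of adjacent signed words in source vector a0 through a3, taken in order, is multiplied by the pair of signed words in doubleword n of the 128-bit memory operand at b, and both products are added to the corresponding doubleword of c.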
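The four steps can be illustrated with a scalar model. The following is a minimal sketch, not the compiler's implementation: the names words512 and ref_4dpwssd are hypothetical, and the lane layout (step m pairs words 2i and 2i+1 of source vector m with words 2m and 2m+1 of the memory operand) is an assumption based on the description of the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: one 512-bit vector viewed as 32 signed words. */
typedef struct { int16_t w[32]; } words512;

/* Scalar sketch of the unmasked, non-saturating form (hypothetical name).
   acc : the 16 doubleword accumulators (operand c), updated in place
   a   : the source block a0..a3
   b   : the 128-bit memory operand viewed as 8 signed words */
static void ref_4dpwssd(int32_t acc[16], const words512 a[4], const int16_t b[8])
{
    assert(acc && a && b);
    for (int i = 0; i < 16; ++i) {            /* one doubleword lane at a time */
        uint32_t sum = (uint32_t)acc[i];      /* unsigned arithmetic wraps without UB */
        for (int m = 0; m < 4; ++m) {         /* step m: vector a[m], doubleword m of b */
            int32_t p1 = (int32_t)a[m].w[2 * i]     * b[2 * m];
            int32_t p2 = (int32_t)a[m].w[2 * i + 1] * b[2 * m + 1];
            sum += (uint32_t)p1 + (uint32_t)p2;
        }
        /* Conversion back to signed assumes two's complement, as on mainstream compilers. */
        acc[i] = (int32_t)sum;
    }
}
```

With every word of a0..a3 set to 1, each lane gains the sum of the eight words of b, because each step contributes one doubleword (two words) of the memory operand.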



_mm512_mask_4dpwssd_epi32

__m512i _mm512_mask_4dpwssd_epi32 (__m512i c, __mmask16 k, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable definition
a0..a3 first source block of 4 vectors
b pointer to the second source block (128-bit memory operand)
c third source; accumulator
k mask used as a selector

Instructions: vp4dpwssd zmm1 {k}, zmm2+3, m128

Computes four sequential dot-product steps on signed word operands with doubleword accumulation in c, under writemask k. Each of the four steps selects the next doubleword of the 128-bit memory operand at b. Result elements whose mask bit is not set are copied from c.
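The merge-masking rule can be sketched in scalar form. This is an illustration only; merge_mask_epi32 is a hypothetical helper, and the assumption is that bit i of k controls doubleword lane i, with computed holding the lane results the unmasked operation would produce.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: merge-masking over 16 doubleword lanes.
   computed : lane results of the unmasked operation
   c        : original accumulator values
   Bit i of k selects computed[i]; a clear bit keeps c[i]. */
static void merge_mask_epi32(int32_t dst[16], uint16_t k,
                             const int32_t computed[16], const int32_t c[16])
{
    assert(dst && computed && c);
    for (int i = 0; i < 16; ++i)
        dst[i] = ((k >> i) & 1u) ? computed[i] : c[i];
}
```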



_mm512_maskz_4dpwssd_epi32

__m512i _mm512_maskz_4dpwssd_epi32 (__m512i c, __mmask16 k, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable definition
a0..a3 first source block of 4 vectors
b pointer to the second source block (128-bit memory operand)
c third source; accumulator
k mask used as a selector

Instructions: vp4dpwssd zmm1 {k}{z}, zmm2+3, m128

Computes four sequential dot-product steps on signed word operands with doubleword accumulation in c, under zeromask k. Each of the four steps selects the next doubleword of the 128-bit memory operand at b. Result elements whose mask bit is not set are zeroed.
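Zero-masking differs from merge-masking only in what an unselected lane receives. A minimal scalar sketch follows; zero_mask_epi32 is a hypothetical helper, and it assumes that bit i of k controls doubleword lane i.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: zero-masking over 16 doubleword lanes.
   computed : lane results of the unmasked operation
   Bit i of k selects computed[i]; a clear bit yields 0,
   regardless of the previous accumulator value. */
static void zero_mask_epi32(int32_t dst[16], uint16_t k, const int32_t computed[16])
{
    assert(dst && computed);
    for (int i = 0; i < 16; ++i)
        dst[i] = ((k >> i) & 1u) ? computed[i] : 0;
}
```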



_mm512_4dpwssds_epi32

__m512i _mm512_4dpwssds_epi32 (__m512i c, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable definition
a0..a3 first source block of 4 vectors
b pointer to the second source block (128-bit memory operand)
c third source; accumulator

Instructions: vp4dpwssds zmm1, zmm2+3, m128

Computes four sequential dot-product steps on signed word operands with doubleword accumulation in c, using signed saturation. Each of the four steps selects the next doubleword of the 128-bit memory operand at b.
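Signed saturation clamps a result to the signed doubleword range instead of letting it wrap. The sketch below models one step for one lane. The helper names are hypothetical, and it assumes that the clamp is applied to the exact sum of the accumulator and the two word products after each of the four steps, rather than once at the end.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a wide value to the signed doubleword range. */
static int32_t saturate_i32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* One saturating step for one doubleword lane (hypothetical name).
   a_lo, a_hi : the two adjacent signed words of the source vector
   b_lo, b_hi : the two signed words of the selected memory doubleword */
static int32_t sat_dot_step(int32_t acc, int16_t a_lo, int16_t a_hi,
                            int16_t b_lo, int16_t b_hi)
{
    int64_t p1 = (int64_t)a_lo * b_lo;   /* products computed exactly in 64 bits */
    int64_t p2 = (int64_t)a_hi * b_hi;
    return saturate_i32((int64_t)acc + p1 + p2);
}
```

Under this assumption, a lane that already holds the largest signed doubleword value stays there when a positive step is added.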



_mm512_mask_4dpwssds_epi32

__m512i _mm512_mask_4dpwssds_epi32 (__m512i c, __mmask16 k, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable definition
a0..a3 first source block of 4 vectors
b pointer to the second source block (128-bit memory operand)
c third source; accumulator
k mask used as a selector

Instructions: vp4dpwssds zmm1 {k}, zmm2+3, m128

Computes four sequential dot-product steps on signed word operands with doubleword accumulation in c, using signed saturation, under writemask k. Each of the four steps selects the next doubleword of the 128-bit memory operand at b. Result elements whose mask bit is not set are copied from c.



_mm512_maskz_4dpwssds_epi32

__m512i _mm512_maskz_4dpwssds_epi32 (__m512i c, __mmask16 k, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable definition
a0..a3 first source block of 4 vectors
b pointer to the second source block (128-bit memory operand)
c third source; accumulator
k mask used as a selector

Instructions: vp4dpwssds zmm1 {k}{z}, zmm2+3, m128

Computes four sequential dot-product steps on signed word operands with doubleword accumulation in c, using signed saturation, under zeromask k. Each of the four steps selects the next doubleword of the 128-bit memory operand at b. Result elements whose mask bit is not set are zeroed.