Intrinsics for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4VNNIW Instructions
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) 4VNNIW instruction intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
_mm512_4dpwssd_epi32
__m512i _mm512_4dpwssd_epi32 (__m512i c, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block (four vectors) |
b | pointer to the second source block |
c | third source; accumulator |
Instructions: vp4dpwssd zmm1, zmm2+3, m128
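Computes four iterations of a dot product of signed word operands with doubleword accumulation in c. In each iteration, every pair of adjacent signed words in one vector of the source block (a0, a1, a2, a3, taken in order) is multiplied by the two signed words of one doubleword of the 128-bit memory operand at b, and both products are added to the corresponding doubleword of the accumulator. The doublewords of the memory operand are selected sequentially, one per iteration.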
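The arithmetic described above can be modeled in portable scalar C, which may help when checking results on a machine without 4VNNIW hardware. This is only a sketch, not Intel's implementation: `v512_words`, `wrap_to_i32`, and `ref_4dpwssd` are hypothetical names, plain arrays stand in for `__m512i` and the 128-bit memory operand, and the low-word-first element layout and wrap-around (non-saturating) accumulation are assumptions drawn from the description.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one __m512i viewed as 32 signed words. */
typedef struct { int16_t w[32]; } v512_words;

/* Reinterpret a 32-bit pattern as signed (int32_t is two's complement). */
static int32_t wrap_to_i32(uint32_t u)
{
    int32_t s;
    memcpy(&s, &u, sizeof s);
    return s;
}

/* Hypothetical scalar model of _mm512_4dpwssd_epi32.
   c: 16 doubleword accumulators, updated in place.
   a: the four source vectors a0..a3.
   b: the 128-bit memory operand viewed as 8 signed words. */
static void ref_4dpwssd(int32_t c[16], const v512_words a[4], const int16_t b[8])
{
    assert(c != NULL && a != NULL && b != NULL);
    for (int lane = 0; lane < 16; ++lane) {
        uint32_t acc = (uint32_t)c[lane];          /* unsigned so overflow wraps */
        for (int m = 0; m < 4; ++m) {              /* step m uses a[m] and dword m of b */
            int32_t p1 = (int32_t)a[m].w[2 * lane]     * b[2 * m];
            int32_t p2 = (int32_t)a[m].w[2 * lane + 1] * b[2 * m + 1];
            acc += (uint32_t)p1 + (uint32_t)p2;
        }
        c[lane] = wrap_to_i32(acc);
    }
}
```

With every word of a0 through a3 set to 1, 2, 3, and 4 respectively and b holding the words 1 through 8, each accumulator in this model grows by 3 + 14 + 33 + 60 = 110.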
_mm512_mask_4dpwssd_epi32
__m512i _mm512_mask_4dpwssd_epi32 (__m512i c, __mmask16 k, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block (four vectors) |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: vp4dpwssd zmm1 {k}, zmm2+3, m128
Computes four iterations of a dot product of signed word operands with doubleword accumulation in c, under writemask k. The doublewords of the memory operand at b are selected sequentially, one per iteration. Result elements are copied from c when the corresponding mask bit is not set.
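The merge-masking rule can be illustrated with a scalar model in which bit i of k gates the update of accumulator i. As a hedged sketch only: `v512_words`, `dp4_lane`, and `ref_mask_4dpwssd` are hypothetical names, a `uint16_t` stands in for `__mmask16`, and the element layout and wrap-around accumulation are assumptions rather than a statement of how the hardware is implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one __m512i viewed as 32 signed words. */
typedef struct { int16_t w[32]; } v512_words;

/* One lane: c_lane plus four 2-element dot products, wrapping modulo 2^32. */
static int32_t dp4_lane(int32_t c_lane, const v512_words a[4],
                        const int16_t b[8], int lane)
{
    uint32_t acc = (uint32_t)c_lane;
    for (int m = 0; m < 4; ++m) {
        int32_t p1 = (int32_t)a[m].w[2 * lane]     * b[2 * m];
        int32_t p2 = (int32_t)a[m].w[2 * lane + 1] * b[2 * m + 1];
        acc += (uint32_t)p1 + (uint32_t)p2;
    }
    int32_t out;
    memcpy(&out, &acc, sizeof out);   /* reinterpret the bits as signed */
    return out;
}

/* Hypothetical scalar model of _mm512_mask_4dpwssd_epi32 (merge masking). */
static void ref_mask_4dpwssd(int32_t c[16], uint16_t k,
                             const v512_words a[4], const int16_t b[8])
{
    assert(c != NULL && a != NULL && b != NULL);
    for (int lane = 0; lane < 16; ++lane) {
        if ((k >> lane) & 1u)
            c[lane] = dp4_lane(c[lane], a, b, lane);
        /* mask bit clear: c[lane] keeps its original value */
    }
}
```

For example, with k equal to 0x8001 only accumulators 0 and 15 change in this model; the other fourteen pass through untouched.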
_mm512_maskz_4dpwssd_epi32
__m512i _mm512_maskz_4dpwssd_epi32 (__m512i c, __mmask16 k, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block (four vectors) |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: vp4dpwssd zmm1 {k}{z}, zmm2+3, m128
Computes four iterations of a dot product of signed word operands with doubleword accumulation in c, under zeromask k. The doublewords of the memory operand at b are selected sequentially, one per iteration. Result elements are set to zero when the corresponding mask bit is not set.
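Zero-masking differs from merge-masking only in what an unselected element receives. The scalar sketch below writes to a separate `dst` array to make that visible: unselected elements become 0 regardless of what c held. `v512_words` and `ref_maskz_4dpwssd` are hypothetical names, a `uint16_t` stands in for `__mmask16`, and the element layout and wrap-around accumulation are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one __m512i viewed as 32 signed words. */
typedef struct { int16_t w[32]; } v512_words;

/* Hypothetical scalar model of _mm512_maskz_4dpwssd_epi32 (zero masking).
   dst receives the 16 result doublewords; c is only read. */
static void ref_maskz_4dpwssd(int32_t dst[16], const int32_t c[16], uint16_t k,
                              const v512_words a[4], const int16_t b[8])
{
    assert(dst != NULL && c != NULL && a != NULL && b != NULL);
    for (int lane = 0; lane < 16; ++lane) {
        if (!((k >> lane) & 1u)) {
            dst[lane] = 0;                      /* mask bit clear: zero the element */
            continue;
        }
        uint32_t acc = (uint32_t)c[lane];       /* unsigned so overflow wraps */
        for (int m = 0; m < 4; ++m) {
            int32_t p1 = (int32_t)a[m].w[2 * lane]     * b[2 * m];
            int32_t p2 = (int32_t)a[m].w[2 * lane + 1] * b[2 * m + 1];
            acc += (uint32_t)p1 + (uint32_t)p2;
        }
        memcpy(&dst[lane], &acc, sizeof acc);   /* reinterpret the bits as signed */
    }
}
```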
_mm512_4dpwssds_epi32
__m512i _mm512_4dpwssds_epi32 (__m512i c, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block (four vectors) |
b | pointer to the second source block |
c | third source; accumulator |
Instructions: vp4dpwssds zmm1, zmm2+3, m128
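Computes four iterations of a dot product of signed word operands with doubleword accumulation in c and signed saturation. In each iteration, every pair of adjacent signed words in one vector of the source block (a0, a1, a2, a3, taken in order) is multiplied by the two signed words of one doubleword of the 128-bit memory operand at b, and both products are added to the corresponding doubleword of the accumulator with signed saturation. The doublewords of the memory operand are selected sequentially, one per iteration.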
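The effect of signed saturation can be sketched in scalar C by clamping the accumulator instead of letting it wrap. This is an illustrative model under stated assumptions, not a definitive description of the hardware: `v512_words`, `sat_i32`, and `ref_4dpwssds` are hypothetical names, and the sketch assumes the clamp is applied after each of the four iterations rather than once at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one __m512i viewed as 32 signed words. */
typedef struct { int16_t w[32]; } v512_words;

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat_i32(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Hypothetical scalar model of _mm512_4dpwssds_epi32.
   Assumption: saturation is applied after every iteration. */
static void ref_4dpwssds(int32_t c[16], const v512_words a[4], const int16_t b[8])
{
    assert(c != NULL && a != NULL && b != NULL);
    for (int lane = 0; lane < 16; ++lane) {
        int32_t acc = c[lane];
        for (int m = 0; m < 4; ++m) {
            /* 64-bit products so the sum cannot overflow before clamping */
            int64_t p1 = (int64_t)a[m].w[2 * lane]     * b[2 * m];
            int64_t p2 = (int64_t)a[m].w[2 * lane + 1] * b[2 * m + 1];
            acc = sat_i32((int64_t)acc + p1 + p2);
        }
        c[lane] = acc;
    }
}
```

Under the per-iteration assumption, an accumulator that starts at INT32_MAX, gains 10 in the first iteration, and loses 10 in the second ends at INT32_MAX - 10, because the first addition is clamped before the subtraction takes place.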
_mm512_mask_4dpwssds_epi32
__m512i _mm512_mask_4dpwssds_epi32 (__m512i c, __mmask16 k, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block (four vectors) |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: vp4dpwssds zmm1 {k}, zmm2+3, m128
Computes four iterations of a dot product of signed word operands with doubleword accumulation in c and signed saturation, under writemask k. The doublewords of the memory operand at b are selected sequentially, one per iteration. Result elements are copied from c when the corresponding mask bit is not set.
_mm512_maskz_4dpwssds_epi32
__m512i _mm512_maskz_4dpwssds_epi32 (__m512i c, __mmask16 k, __m512i a0, __m512i a1, __m512i a2, __m512i a3, __m128i * b)
variable | definition |
---|---|
a0, a1, a2, a3 | first source block (four vectors) |
b | pointer to the second source block |
c | third source; accumulator |
k | mask used as a selector |
Instructions: vp4dpwssds zmm1 {k}{z}, zmm2+3, m128
Computes four iterations of a dot product of signed word operands with doubleword accumulation in c and signed saturation, under zeromask k. The doublewords of the memory operand at b are selected sequentially, one per iteration. Result elements are set to zero when the corresponding mask bit is not set.