Intrinsics for Arithmetic Operations
The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.
To use these intrinsics, include the immintrin.h file as follows:
#include <immintrin.h>
| Variable | Definition |
|---|---|
| src | source element to use based on writemask result |
| k | writemask used as a selector |
| a | first source vector element |
| b | second source vector element |
| c | third source vector element |
_mm_mask_add_pd
__m128d _mm_mask_add_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vaddpd
Add packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_add_pd
__m128d _mm_maskz_add_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vaddpd
Add packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
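The difference between the writemask and zeromask forms can be illustrated with a scalar reference model. This is only a sketch of the per-lane behavior described above, not the intrinsics themselves; the helper names `model_mask_add_pd` and `model_maskz_add_pd` are hypothetical, and plain arrays of two doubles stand in for `__m128d`.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference model (hypothetical helpers, NOT the intrinsics).
   Two double lanes stand in for one __m128d value. */
enum { LANES_128D = 2 };

/* Models _mm_mask_add_pd(src, k, a, b):
   lane i receives a[i] + b[i] when bit i of k is set, otherwise src[i]. */
static void model_mask_add_pd(double *dst, const double *src, uint8_t k,
                              const double *a, const double *b)
{
    assert(dst && src && a && b);
    for (int i = 0; i < LANES_128D; ++i)
        dst[i] = ((k >> i) & 1u) ? a[i] + b[i] : src[i];
}

/* Models _mm_maskz_add_pd(k, a, b):
   same selection rule, but unselected lanes are set to 0.0. */
static void model_maskz_add_pd(double *dst, uint8_t k,
                               const double *a, const double *b)
{
    assert(dst && a && b);
    for (int i = 0; i < LANES_128D; ++i)
        dst[i] = ((k >> i) & 1u) ? a[i] + b[i] : 0.0;
}
```

With `k = 0x1`, the model of the writemask form adds only lane 0 and keeps `src` in lane 1; with `k = 0x2`, the model of the zeromask form adds only lane 1 and zeroes lane 0.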
_mm256_mask_add_pd
__m256d _mm256_mask_add_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vaddpd
Add packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_pd
__m256d _mm256_maskz_add_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vaddpd
Add packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_add_ps
__m128 _mm_mask_add_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vaddps
Add packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_add_ps
__m128 _mm_maskz_add_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vaddps
Add packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_add_ps
__m256 _mm256_mask_add_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vaddps
Add packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_ps
__m256 _mm256_maskz_add_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vaddps
Add packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
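All of the single-precision forms above take a `__mmask8`, but a `__m128` holds four 32-bit lanes while a `__m256` holds eight, so only the low four mask bits select lanes in the 128-bit forms. The sketch below models this with a scalar loop; `model_mask_add_ps` and its `lanes` parameter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference model (hypothetical helper, NOT the intrinsic).
   lanes = 4 models __m128 (_mm_mask_add_ps),
   lanes = 8 models __m256 (_mm256_mask_add_ps). */
static void model_mask_add_ps(float *dst, const float *src, uint8_t k,
                              const float *a, const float *b, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int i = 0; i < lanes; ++i) {
        /* Only bits 0..lanes-1 of k are ever inspected. */
        dst[i] = ((k >> i) & 1u) ? a[i] + b[i] : src[i];
    }
}
```

Passing the same mask value, for example `0xF5`, with `lanes = 4` touches only lanes 0 and 2, because bits 4 through 7 have no corresponding lane in the 128-bit model.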
_mm_mask_div_pd
__m128d _mm_mask_div_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vdivpd
Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_div_pd
__m128d _mm_maskz_div_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vdivpd
Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_div_pd
__m256d _mm256_mask_div_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vdivpd
Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_div_pd
__m256d _mm256_maskz_div_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vdivpd
Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_div_ps
__m128 _mm_mask_div_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vdivps
Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_div_ps
__m128 _mm_maskz_div_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vdivps
Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_div_ps
__m256 _mm256_mask_div_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vdivps
Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_div_ps
__m256 _mm256_maskz_div_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vdivps
Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
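One possible use of the masked divide is to skip lanes whose divisor is zero and keep a caller-chosen value from src there instead. The following scalar sketch models that pattern under stated assumptions: `nonzero_mask` and `model_mask_div_pd` are hypothetical helpers, and with the real intrinsics the mask would come from a vector compare rather than a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: set bit i of the result when b[i] is nonzero. */
static uint8_t nonzero_mask(const double *b, int lanes)
{
    uint8_t k = 0;
    assert(lanes >= 1 && lanes <= 8);
    for (int i = 0; i < lanes; ++i)
        if (b[i] != 0.0)
            k |= (uint8_t)(1u << i);
    return k;
}

/* Scalar model of _mm256_mask_div_pd(src, k, a, b) when lanes = 4
   (or _mm_mask_div_pd when lanes = 2): selected lanes get a[i] / b[i],
   unselected lanes are copied from src. */
static void model_mask_div_pd(double *dst, const double *src, uint8_t k,
                              const double *a, const double *b, int lanes)
{
    assert(lanes == 2 || lanes == 4);
    for (int i = 0; i < lanes; ++i)
        dst[i] = ((k >> i) & 1u) ? a[i] / b[i] : src[i];
}
```

For `a = {8, 6, 4, 2}` and `b = {2, 0, 4, 0}`, `nonzero_mask` yields `0x5`, so the model divides lanes 0 and 2 and leaves the `src` value in lanes 1 and 3.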
_mm_mask_fmadd_pd
__m128d _mm_mask_fmadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fmadd_pd
__m128d _mm_mask3_fmadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fmadd_pd
__m128d _mm_maskz_fmadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fmadd_pd
__m256d _mm256_mask_fmadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fmadd_pd
__m256d _mm256_mask3_fmadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fmadd_pd
__m256d _mm256_maskz_fmadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132pd, vfmadd213pd, vfmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fmadd_ps
__m128 _mm_mask_fmadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fmadd_ps
__m128 _mm_mask3_fmadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fmadd_ps
__m128 _mm_maskz_fmadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fmadd_ps
__m256 _mm256_mask_fmadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fmadd_ps
__m256 _mm256_mask3_fmadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fmadd_ps
__m256 _mm256_maskz_fmadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmadd132ps, vfmadd213ps, vfmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
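The three fused multiply-add forms compute the same value in selected lanes and differ only in what an unselected lane receives: a for the mask form, c for the mask3 form, and zero for the maskz form. A scalar sketch of that rule follows. The `fallback_t` enum and `model_fmadd_pd` are hypothetical names, and the model computes `a * b + c` with ordinary C arithmetic, so it does not reproduce the single rounding of the fused hardware operation.

```c
#include <assert.h>
#include <stdint.h>

/* Which value an unselected lane receives (hypothetical enum). */
typedef enum {
    FALLBACK_A,    /* models _mm*_mask_fmadd_pd  */
    FALLBACK_C,    /* models _mm*_mask3_fmadd_pd */
    FALLBACK_ZERO  /* models _mm*_maskz_fmadd_pd */
} fallback_t;

/* Scalar reference model (NOT the intrinsic). lanes = 2 models __m128d,
   lanes = 4 models __m256d. Rounding may differ from the fused instruction. */
static void model_fmadd_pd(double *dst, const double *a, const double *b,
                           const double *c, uint8_t k, int lanes,
                           fallback_t fb)
{
    assert(lanes == 2 || lanes == 4);
    for (int i = 0; i < lanes; ++i) {
        if ((k >> i) & 1u)
            dst[i] = a[i] * b[i] + c[i];
        else if (fb == FALLBACK_A)
            dst[i] = a[i];
        else if (fb == FALLBACK_C)
            dst[i] = c[i];
        else
            dst[i] = 0.0;
    }
}
```

With `a = {2, 3}`, `b = {4, 5}`, `c = {1, 1}`, and `k = 0x1`, lane 0 is 9 in all three cases, while lane 1 is 3, 1, or 0 depending on the fallback.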
_mm_mask_fmaddsub_pd
__m128d _mm_mask_fmaddsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fmaddsub_pd
__m128d _mm_mask3_fmaddsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fmaddsub_pd
__m128d _mm_maskz_fmaddsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fmaddsub_pd
__m256d _mm256_mask_fmaddsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fmaddsub_pd
__m256d _mm256_mask3_fmaddsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fmaddsub_pd
__m256d _mm256_maskz_fmaddsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132pd, vfmaddsub213pd, vfmaddsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fmaddsub_ps
__m128 _mm_mask_fmaddsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fmaddsub_ps
__m128 _mm_mask3_fmaddsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fmaddsub_ps
__m128 _mm_maskz_fmaddsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fmaddsub_ps
__m256 _mm256_mask_fmaddsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fmaddsub_ps
__m256 _mm256_mask3_fmaddsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fmaddsub_ps
__m256 _mm256_maskz_fmaddsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmaddsub132ps, vfmaddsub213ps, vfmaddsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately add and subtract packed elements in c to/from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
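For the fmaddsub family, the add/subtract pattern depends on the lane index: the underlying vfmaddsub instructions subtract c in even-indexed lanes and add c in odd-indexed lanes. The scalar sketch below models the zeromask form for single precision; `model_maskz_fmaddsub_ps` is a hypothetical helper, and it uses ordinary C arithmetic rather than a fused operation.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference model (hypothetical helper, NOT the intrinsic).
   lanes = 4 models _mm_maskz_fmaddsub_ps,
   lanes = 8 models _mm256_maskz_fmaddsub_ps. */
static void model_maskz_fmaddsub_ps(float *dst, uint8_t k, const float *a,
                                    const float *b, const float *c, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int i = 0; i < lanes; ++i) {
        if (!((k >> i) & 1u)) {
            dst[i] = 0.0f;                    /* zeromask: lane cleared */
        } else if ((i & 1) == 0) {
            dst[i] = a[i] * b[i] - c[i];      /* even lane: subtract c */
        } else {
            dst[i] = a[i] * b[i] + c[i];      /* odd lane: add c */
        }
    }
}
```

For `a = {1, 2, 3, 4}`, `b = {10, 10, 10, 10}`, `c = {1, 1, 1, 1}`, and `k = 0xF`, the model produces `{9, 21, 29, 41}`.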
_mm_mask_fmsub_pd
__m128d _mm_mask_fmsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fmsub_pd
__m128d _mm_mask3_fmsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fmsub_pd
__m128d _mm_maskz_fmsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fmsub_pd
__m256d _mm256_mask_fmsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fmsub_pd
__m256d _mm256_mask3_fmsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fmsub_pd
__m256d _mm256_maskz_fmsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132pd, vfmsub213pd, vfmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fmsub_ps
__m128 _mm_mask_fmsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fmsub_ps
__m128 _mm_mask3_fmsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fmsub_ps
__m128 _mm_maskz_fmsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fmsub_ps
__m256 _mm256_mask_fmsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fmsub_ps
__m256 _mm256_mask3_fmsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fmsub_ps
__m256 _mm256_maskz_fmsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsub132ps, vfmsub213ps, vfmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
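Because the mask3 forms fall back to c, they behave like a conditional in-place update of the third operand. A scalar sketch of that reading for fmsub follows; `model_mask3_fmsub_ps_inplace` is a hypothetical helper that overwrites `c` directly, whereas the real intrinsic returns a new vector that the caller would typically assign back to c.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference model (hypothetical helper, NOT the intrinsic).
   Models c = _mm_mask3_fmsub_ps(a, b, c, k) for lanes = 4, or the
   _mm256 form for lanes = 8: selected lanes become a[i] * b[i] - c[i],
   unselected lanes keep their old c[i]. Not fused, so rounding may differ. */
static void model_mask3_fmsub_ps_inplace(float *c, const float *a,
                                         const float *b, uint8_t k, int lanes)
{
    assert(lanes == 4 || lanes == 8);
    for (int i = 0; i < lanes; ++i)
        if ((k >> i) & 1u)
            c[i] = a[i] * b[i] - c[i];
}
```

With `a = {2, 2, 2, 2}`, `b = {3, 3, 3, 3}`, `c = {1, 2, 3, 4}`, and `k = 0x9`, only lanes 0 and 3 change, giving `c = {5, 2, 3, 2}`.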
_mm_mask_fmsubadd_pd
__m128d _mm_mask_fmsubadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fmsubadd_pd
__m128d _mm_mask3_fmsubadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fmsubadd_pd
__m128d _mm_maskz_fmsubadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fmsubadd_pd
__m256d _mm256_mask_fmsubadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fmsubadd_pd
__m256d _mm256_mask3_fmsubadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fmsubadd_pd
__m256d _mm256_maskz_fmsubadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132pd, vfmsubadd213pd, vfmsubadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fmsubadd_ps
__m128 _mm_mask_fmsubadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fmsubadd_ps
__m128 _mm_mask3_fmsubadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fmsubadd_ps
__m128 _mm_maskz_fmsubadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fmsubadd_ps
__m256 _mm256_mask_fmsubadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fmsubadd_ps
__m256 _mm256_mask3_fmsubadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fmsubadd_ps
__m256 _mm256_maskz_fmsubadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfmsubadd132ps, vfmsubadd213ps, vfmsubadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, alternately subtract and add packed elements in c from/to the intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
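The fmsubadd family is the mirror image of fmaddsub: the vfmsubadd instructions add c in even-indexed lanes and subtract c in odd-indexed lanes. The scalar sketch below models both zeromask forms for double precision side by side, so the relationship is visible: fmsubadd applied to c gives the same result as fmaddsub applied to the negated c. Both helper names are hypothetical, and the models are not fused.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference models (hypothetical helpers, NOT the intrinsics).
   lanes = 2 models the __m128d forms, lanes = 4 the __m256d forms. */

/* Models _mm*_maskz_fmsubadd_pd: even lanes add c, odd lanes subtract c. */
static void model_maskz_fmsubadd_pd(double *dst, uint8_t k, const double *a,
                                    const double *b, const double *c, int lanes)
{
    assert(lanes == 2 || lanes == 4);
    for (int i = 0; i < lanes; ++i) {
        double prod = a[i] * b[i];
        double val = ((i & 1) == 0) ? prod + c[i] : prod - c[i];
        dst[i] = ((k >> i) & 1u) ? val : 0.0;
    }
}

/* Models _mm*_maskz_fmaddsub_pd: even lanes subtract c, odd lanes add c. */
static void model_maskz_fmaddsub_pd(double *dst, uint8_t k, const double *a,
                                    const double *b, const double *c, int lanes)
{
    assert(lanes == 2 || lanes == 4);
    for (int i = 0; i < lanes; ++i) {
        double prod = a[i] * b[i];
        double val = ((i & 1) == 0) ? prod - c[i] : prod + c[i];
        dst[i] = ((k >> i) & 1u) ? val : 0.0;
    }
}
```

For `a = {1, 2, 3, 4}`, `b = {10, 10, 10, 10}`, `c = {1, 1, 1, 1}`, and `k = 0xF`, the fmsubadd model yields `{11, 19, 31, 39}`, and the fmaddsub model called with `{-1, -1, -1, -1}` in place of `c` yields the same values.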
_mm_mask_fnmadd_pd
__m128d _mm_mask_fnmadd_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fnmadd_pd
__m128d _mm_mask3_fnmadd_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fnmadd_pd
__m128d _mm_maskz_fnmadd_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fnmadd_pd
__m256d _mm256_mask_fnmadd_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fnmadd_pd
__m256d _mm256_mask3_fnmadd_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fnmadd_pd
__m256d _mm256_maskz_fnmadd_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132pd, vfnmadd213pd, vfnmadd231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fnmadd_ps
__m128 _mm_mask_fnmadd_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fnmadd_ps
__m128 _mm_mask3_fnmadd_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fnmadd_ps
__m128 _mm_maskz_fnmadd_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fnmadd_ps
__m256 _mm256_mask_fnmadd_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fnmadd_ps
__m256 _mm256_mask3_fnmadd_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fnmadd_ps
__m256 _mm256_maskz_fnmadd_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmadd132ps, vfnmadd213ps, vfnmadd231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, add the negated intermediate result to packed elements in c, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
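The mask, mask3, and maskz forms of fnmadd differ only in what fills a lane whose mask bit is clear: a, c, or zero. The following is a minimal sketch of that difference for the 128-bit single-precision variants. The helper fnmadd4 and the enum passthrough are hypothetical names used only for illustration. The scalar branch is a stand-in for builds that do not target AVX512F and AVX512VL; it is not fused, so it may round differently from the instruction for inputs that are not exactly representable.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical selector: which value a lane takes when its mask bit is clear. */
enum passthrough { FROM_A, FROM_C, ZEROED };

/* Hypothetical helper: out[i] = -(a[i] * b[i]) + c[i] where bit i of k is set. */
static void fnmadd4(float out[4], const float a[4], const float b[4],
                    const float c[4], uint8_t k, enum passthrough mode)
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m128 va = _mm_loadu_ps(a), vb = _mm_loadu_ps(b), vc = _mm_loadu_ps(c);
    __m128 r;
    if (mode == FROM_A)
        r = _mm_mask_fnmadd_ps(va, k, vb, vc);   /* clear lanes keep a */
    else if (mode == FROM_C)
        r = _mm_mask3_fnmadd_ps(va, vb, vc, k);  /* clear lanes keep c */
    else
        r = _mm_maskz_fnmadd_ps(k, va, vb, vc);  /* clear lanes become 0 */
    _mm_storeu_ps(out, r);
#else
    /* Scalar stand-in (unfused) for builds without AVX512F + AVX512VL. */
    for (int i = 0; i < 4; ++i) {
        if ((k >> i) & 1)
            out[i] = -(a[i] * b[i]) + c[i];
        else
            out[i] = (mode == FROM_A) ? a[i] : (mode == FROM_C) ? c[i] : 0.0f;
    }
#endif
}
```

With a = {1, 2, 3, 4}, b = {2, 2, 2, 2}, c = {10, 10, 10, 10}, and k = 0x5, the mask form yields {8, 2, 4, 4}, the mask3 form yields {8, 10, 4, 10}, and the maskz form yields {8, 0, 4, 0}.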
_mm_mask_fnmsub_pd
__m128d _mm_mask_fnmsub_pd(__m128d a, __mmask8 k, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fnmsub_pd
__m128d _mm_mask3_fnmsub_pd(__m128d a, __m128d b, __m128d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fnmsub_pd
__m128d _mm_maskz_fnmsub_pd(__mmask8 k, __m128d a, __m128d b, __m128d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fnmsub_pd
__m256d _mm256_mask_fnmsub_pd(__m256d a, __mmask8 k, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fnmsub_pd
__m256d _mm256_mask3_fnmsub_pd(__m256d a, __m256d b, __m256d c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fnmsub_pd
__m256d _mm256_maskz_fnmsub_pd(__mmask8 k, __m256d a, __m256d b, __m256d c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132pd, vfnmsub213pd, vfnmsub231pd
Multiply packed double-precision (64-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_fnmsub_ps
__m128 _mm_mask_fnmsub_ps(__m128 a, __mmask8 k, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_mask3_fnmsub_ps
__m128 _mm_mask3_fnmsub_ps(__m128 a, __m128 b, __m128 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm_maskz_fnmsub_ps
__m128 _mm_maskz_fnmsub_ps(__mmask8 k, __m128 a, __m128 b, __m128 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_fnmsub_ps
__m256 _mm256_mask_fnmsub_ps(__m256 a, __mmask8 k, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_mask3_fnmsub_ps
__m256 _mm256_mask3_fnmsub_ps(__m256 a, __m256 b, __m256 c, __mmask8 k)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using writemask k (elements are copied from c when the corresponding mask bit is not set).
_mm256_maskz_fnmsub_ps
__m256 _mm256_maskz_fnmsub_ps(__mmask8 k, __m256 a, __m256 b, __m256 c)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vfnmsub132ps, vfnmsub213ps, vfnmsub231ps
Multiply packed single-precision (32-bit) floating-point elements in a and b, subtract packed elements in c from the negated intermediate result, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
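Because the mask3 form passes c through on clear lanes, it suits an in-place update of c. A minimal sketch using the 256-bit double-precision variant follows; the helper name fnmsub_into_c is hypothetical, and the scalar branch is an unfused stand-in for builds without AVX512F and AVX512VL.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: c[i] = -(a[i] * b[i]) - c[i] where bit i of k is set;
   lanes with a clear bit leave c[i] unchanged (mask3 behavior). */
static void fnmsub_into_c(double c[4], const double a[4], const double b[4],
                          uint8_t k)
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vc = _mm256_loadu_pd(c);
    _mm256_storeu_pd(c, _mm256_mask3_fnmsub_pd(va, vb, vc, k));
#else
    /* Scalar stand-in (unfused) for builds without AVX512F + AVX512VL. */
    for (int i = 0; i < 4; ++i)
        if ((k >> i) & 1)
            c[i] = -(a[i] * b[i]) - c[i];
#endif
}
```

With a = {1, 2, 3, 4}, b = {1, 1, 1, 1}, c = {5, 5, 5, 5}, and k = 0xA, c becomes {5, -7, 5, -9}.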
_mm_mask_max_pd
__m128d _mm_mask_max_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmaxpd
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_pd
__m128d _mm_maskz_max_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmaxpd
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_max_pd
__m256d _mm256_mask_max_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmaxpd
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_pd
__m256d _mm256_maskz_max_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmaxpd
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_max_ps
__m128 _mm_mask_max_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmaxps
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_ps
__m128 _mm_maskz_max_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmaxps
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_max_ps
__m256 _mm256_mask_max_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmaxps
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_ps
__m256 _mm256_maskz_max_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmaxps
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
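A minimal sketch of the writemask form of max for four single-precision lanes follows. The helper name masked_max4 is hypothetical. The scalar stand-in uses a > b ? a : b, which returns b when the comparison is false; whether that matches the instruction for NaN or signed-zero inputs is an assumption to check against the vmaxps instruction reference.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = max(a[i], b[i]) where bit i of k is set,
   otherwise out[i] = src[i]. */
static void masked_max4(float out[4], const float src[4], uint8_t k,
                        const float a[4], const float b[4])
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m128 r = _mm_mask_max_ps(_mm_loadu_ps(src), k,
                               _mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
#else
    /* Scalar stand-in for builds without AVX512F + AVX512VL. */
    for (int i = 0; i < 4; ++i)
        out[i] = ((k >> i) & 1) ? (a[i] > b[i] ? a[i] : b[i]) : src[i];
#endif
}
```

With src = {-1, -1, -1, -1}, a = {1, 5, 3, 7}, b = {4, 2, 6, 0}, and k = 0x3, the result is {4, 5, -1, -1}.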
_mm_mask_min_pd
__m128d _mm_mask_min_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vminpd
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_pd
__m128d _mm_maskz_min_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vminpd
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_min_pd
__m256d _mm256_mask_min_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vminpd
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_pd
__m256d _mm256_maskz_min_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vminpd
Compare packed double-precision (64-bit) floating-point elements in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_min_ps
__m128 _mm_mask_min_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vminps
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_ps
__m128 _mm_maskz_min_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vminps
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_min_ps
__m256 _mm256_mask_min_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vminps
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_ps
__m256 _mm256_maskz_min_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vminps
Compare packed single-precision (32-bit) floating-point elements in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
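One possible use of the zeromask form of min is to clamp selected lanes to an upper limit while zeroing the rest. The sketch below assumes eight single-precision lanes; the helper name clamp_upper8 and the limit parameter are hypothetical, and the scalar branch stands in for builds without AVX512F and AVX512VL.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = min(a[i], limit) where bit i of k is set,
   otherwise out[i] = 0. */
static void clamp_upper8(float out[8], uint8_t k, const float a[8], float limit)
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m256 r = _mm256_maskz_min_ps(k, _mm256_loadu_ps(a), _mm256_set1_ps(limit));
    _mm256_storeu_ps(out, r);
#else
    /* Scalar stand-in for builds without AVX512F + AVX512VL. */
    for (int i = 0; i < 8; ++i)
        out[i] = ((k >> i) & 1) ? (a[i] < limit ? a[i] : limit) : 0.0f;
#endif
}
```

With a = {1, 2, 3, 4, 5, 6, 7, 8}, limit = 4.5, and k = 0x3C, the result is {0, 0, 3, 4, 4.5, 4.5, 0, 0}.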
_mm_mask_mul_pd
__m128d _mm_mask_mul_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmulpd
Multiply packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mul_pd
__m128d _mm_maskz_mul_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmulpd
Multiply packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mul_pd
__m256d _mm256_mask_mul_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmulpd
Multiply packed double-precision (64-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mul_pd
__m256d _mm256_maskz_mul_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmulpd
Multiply packed double-precision (64-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_mul_ps
__m128 _mm_mask_mul_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmulps
Multiply packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mul_ps
__m128 _mm_maskz_mul_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmulps
Multiply packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mul_ps
__m256 _mm256_mask_mul_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmulps
Multiply packed single-precision (32-bit) floating-point elements in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mul_ps
__m256 _mm256_maskz_mul_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vmulps
Multiply packed single-precision (32-bit) floating-point elements in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
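Passing the same vector as both src and a turns the writemask form of mul into an in-place update of selected lanes. A minimal sketch for four double-precision lanes follows; the helper name scale_selected and the factor parameter are hypothetical, and the scalar branch stands in for builds without AVX512F and AVX512VL.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: v[i] *= factor where bit i of k is set;
   other lanes are left unchanged. */
static void scale_selected(double v[4], double factor, uint8_t k)
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m256d vv = _mm256_loadu_pd(v);
    /* src and a are the same vector, so clear lanes keep their old value. */
    _mm256_storeu_pd(v, _mm256_mask_mul_pd(vv, k, vv, _mm256_set1_pd(factor)));
#else
    /* Scalar stand-in for builds without AVX512F + AVX512VL. */
    for (int i = 0; i < 4; ++i)
        if ((k >> i) & 1)
            v[i] *= factor;
#endif
}
```

With v = {1, 2, 3, 4}, factor = 10, and k = 0x6, v becomes {1, 20, 30, 4}.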
_mm_mask_rcp14_pd
__m128d _mm_mask_rcp14_pd(__m128d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14pd
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm_maskz_rcp14_pd
__m128d _mm_maskz_rcp14_pd(__mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14pd
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm_rcp14_pd
__m128d _mm_rcp14_pd(__m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14pd
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 2^-14.
_mm256_mask_rcp14_pd
__m256d _mm256_mask_rcp14_pd(__m256d src, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14pd
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm256_maskz_rcp14_pd
__m256d _mm256_maskz_rcp14_pd(__mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14pd
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm256_rcp14_pd
__m256d _mm256_rcp14_pd(__m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14pd
Compute the approximate reciprocal of packed double-precision (64-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 2^-14.
_mm_mask_rcp14_ps
__m128 _mm_mask_rcp14_ps(__m128 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14ps
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm_maskz_rcp14_ps
__m128 _mm_maskz_rcp14_ps(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14ps
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm_rcp14_ps
__m128 _mm_rcp14_ps(__m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14ps
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 2^-14.
_mm256_mask_rcp14_ps
__m256 _mm256_mask_rcp14_ps(__m256 src, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14ps
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm256_maskz_rcp14_ps
__m256 _mm256_maskz_rcp14_ps(__mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14ps
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm256_rcp14_ps
__m256 _mm256_rcp14_ps(__m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrcp14ps
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 2^-14.
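A minimal sketch of the writemask form of rcp14 for four single-precision lanes follows. The helper name rcp14_masked4 is hypothetical. The scalar branch, used when the build does not target AVX512F and AVX512VL, performs an exact division; it stays within the 2^-14 relative error bound but is not bit-identical to the instruction's approximation.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] ~= 1 / a[i] where bit i of k is set,
   otherwise out[i] = src[i]. */
static void rcp14_masked4(float out[4], const float src[4], uint8_t k,
                          const float a[4])
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m128 r = _mm_mask_rcp14_ps(_mm_loadu_ps(src), k, _mm_loadu_ps(a));
    _mm_storeu_ps(out, r);
#else
    /* Scalar stand-in: exact division instead of the hardware approximation. */
    for (int i = 0; i < 4; ++i)
        out[i] = ((k >> i) & 1) ? 1.0f / a[i] : src[i];
#endif
}
```

Callers should compare results against a tolerance rather than an exact value, for example by checking that out[i] * a[i] - 1 lies within 2^-14 of zero on the selected lanes.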
_mm_mask_rsqrt14_pd
__m128d _mm_mask_rsqrt14_pd(__m128d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrsqrt14pd
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm_maskz_rsqrt14_pd
__m128d _mm_maskz_rsqrt14_pd(__mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrsqrt14pd
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm256_mask_rsqrt14_pd
__m256d _mm256_mask_rsqrt14_pd(__m256d src, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrsqrt14pd
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm256_maskz_rsqrt14_pd
__m256d _mm256_maskz_rsqrt14_pd(__mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrsqrt14pd
Compute the approximate reciprocal square root of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm_mask_rsqrt14_ps
__m128 _mm_mask_rsqrt14_ps(__m128 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrsqrt14ps
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm_maskz_rsqrt14_ps
__m128 _mm_maskz_rsqrt14_ps(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrsqrt14ps
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm256_mask_rsqrt14_ps
__m256 _mm256_mask_rsqrt14_ps(__m256 src, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrsqrt14ps
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm256_maskz_rsqrt14_ps
__m256 _mm256_maskz_rsqrt14_ps(__mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vrsqrt14ps
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set). The maximum relative error for this approximation is less than 2^-14.
_mm_mask_sqrt_pd
__m128d _mm_mask_sqrt_pd(__m128d src, __mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsqrtpd
Compute the square root of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_sqrt_pd
__m128d _mm_maskz_sqrt_pd(__mmask8 k, __m128d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsqrtpd
Compute the square root of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_sqrt_pd
__m256d _mm256_mask_sqrt_pd(__m256d src, __mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsqrtpd
Compute the square root of packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_sqrt_pd
__m256d _mm256_maskz_sqrt_pd(__mmask8 k, __m256d a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsqrtpd
Compute the square root of packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_sqrt_ps
__m128 _mm_mask_sqrt_ps(__m128 src, __mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsqrtps
Compute the square root of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_sqrt_ps
__m128 _mm_maskz_sqrt_ps(__mmask8 k, __m128 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsqrtps
Compute the square root of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_sqrt_ps
__m256 _mm256_mask_sqrt_ps(__m256 src, __mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsqrtps
Compute the square root of packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_sqrt_ps
__m256 _mm256_maskz_sqrt_ps(__mmask8 k, __m256 a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsqrtps
Compute the square root of packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_sub_pd
__m128d _mm_mask_sub_pd(__m128d src, __mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsubpd
Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_sub_pd
__m128d _mm_maskz_sub_pd(__mmask8 k, __m128d a, __m128d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsubpd
Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_sub_pd
__m256d _mm256_mask_sub_pd(__m256d src, __mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsubpd
Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_sub_pd
__m256d _mm256_maskz_sub_pd(__mmask8 k, __m256d a, __m256d b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsubpd
Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_sub_ps
__m128 _mm_mask_sub_ps(__m128 src, __mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsubps
Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_sub_ps
__m128 _mm_maskz_sub_ps(__mmask8 k, __m128 a, __m128 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsubps
Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_sub_ps
__m256 _mm256_mask_sub_ps(__m256 src, __mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsubps
Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_sub_ps
__m256 _mm256_maskz_sub_ps(__mmask8 k, __m256 a, __m256 b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vsubps
Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
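The operand order matters for sub: the result is a minus b. The sketch below uses the 128-bit double-precision zeromask form, which has only two lanes, so only the low two bits of the __mmask8 argument select lanes. The helper name sub2_maskz is hypothetical, and the scalar branch stands in for builds without AVX512F and AVX512VL.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] - b[i] where bit i of k is set,
   otherwise out[i] = 0. Only bits 0 and 1 of k are relevant. */
static void sub2_maskz(double out[2], uint8_t k, const double a[2],
                       const double b[2])
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m128d r = _mm_maskz_sub_pd(k, _mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, r);
#else
    /* Scalar stand-in for builds without AVX512F + AVX512VL. */
    for (int i = 0; i < 2; ++i)
        out[i] = ((k >> i) & 1) ? a[i] - b[i] : 0.0;
#endif
}
```

With a = {10, 20}, b = {1, 2}, and k = 0xFE, the result is {0, 18}: bit 0 is clear, bit 1 is set, and the upper six bits have no lane to select.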
_mm_mask_abs_epi8
__m128i _mm_mask_abs_epi8(__m128i src, __mmask16 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpabsb
Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_abs_epi8
__m128i _mm_maskz_abs_epi8(__mmask16 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpabsb
Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_abs_epi8
__m256i _mm256_mask_abs_epi8(__m256i src, __mmask32 k, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpabsb
Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_abs_epi8
__m256i _mm256_maskz_abs_epi8(__mmask32 k, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpabsb
Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_abs_epi8
__m512i _mm512_abs_epi8(__m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpabsb
Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value.
_mm512_mask_abs_epi8
__m512i _mm512_mask_abs_epi8(__m512i src, __mmask64 k, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpabsb
Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_abs_epi8
__m512i _mm512_maskz_abs_epi8(__mmask64 k, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpabsb
Compute the absolute value of packed 8-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
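For byte elements the mask type grows with the lane count, as the prototypes above show: __mmask16 for 128 bits, __mmask32 for 256 bits, and __mmask64 for 512 bits. A minimal sketch of the 128-bit writemask form follows. The helper name abs8_masked is hypothetical, and the scalar branch stands in for builds without AVX512BW and AVX512VL. The output is typed as unsigned because the absolute value of -128 is 128, which does not fit in a signed 8-bit integer.

```c
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = |a[i]| where bit i of k is set,
   otherwise out[i] = src[i]. */
static void abs8_masked(uint8_t out[16], const uint8_t src[16], uint16_t k,
                        const int8_t a[16])
{
#if defined(__AVX512BW__) && defined(__AVX512VL__)
    __m128i vsrc = _mm_loadu_si128((const __m128i *)src);
    __m128i va   = _mm_loadu_si128((const __m128i *)a);
    _mm_storeu_si128((__m128i *)out, _mm_mask_abs_epi8(vsrc, k, va));
#else
    /* Scalar stand-in for builds without AVX512BW + AVX512VL. */
    for (int i = 0; i < 16; ++i) {
        int v = a[i];                       /* widen so -(-128) is representable */
        out[i] = ((k >> i) & 1) ? (uint8_t)(v < 0 ? -v : v) : src[i];
    }
#endif
}
```

With k = 0x00FF, the low eight lanes receive absolute values and the high eight lanes are copied from src.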
_mm_mask_abs_epi32
__m128i _mm_mask_abs_epi32(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsd
Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_abs_epi32
__m128i _mm_maskz_abs_epi32(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsd
Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_abs_epi32
__m256i _mm256_mask_abs_epi32(__m256i src, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsd
Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_abs_epi32
__m256i _mm256_maskz_abs_epi32(__mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsd
Compute the absolute value of packed 32-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
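The "unsigned results" wording matters at the most negative input: the magnitude of INT32_MIN is 2147483648, which only fits when the lane is read as unsigned. A minimal sketch of the 256-bit zeromask form follows. The helper name abs32_maskz is hypothetical, and the scalar branch, used when the build does not target AVX512F and AVX512VL, negates in unsigned arithmetic to avoid signed overflow.

```c
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = |a[i]| where bit i of k is set,
   otherwise out[i] = 0. */
static void abs32_maskz(uint32_t out[8], uint8_t k, const int32_t a[8])
{
#if defined(__AVX512F__) && defined(__AVX512VL__)
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    _mm256_storeu_si256((__m256i *)out, _mm256_maskz_abs_epi32(k, va));
#else
    /* Scalar stand-in for builds without AVX512F + AVX512VL. */
    for (int i = 0; i < 8; ++i) {
        uint32_t u = (uint32_t)a[i];
        uint32_t mag = (a[i] < 0) ? 0u - u : u;  /* well defined for INT32_MIN */
        out[i] = ((k >> i) & 1) ? mag : 0u;
    }
#endif
}
```

With a = {INT32_MIN, -1, 0, 1, -100, 100, -7, 7} and k = 0x1F, the result is {2147483648, 1, 0, 1, 100, 0, 0, 0}.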
_mm_abs_epi64
__m128i _mm_abs_epi64(__m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsq
Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value.
_mm_mask_abs_epi64
__m128i _mm_mask_abs_epi64(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsq
Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_abs_epi64
__m128i _mm_maskz_abs_epi64(__mmask8 k, __m128i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsq
Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_abs_epi64
__m256i _mm256_abs_epi64(__m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsq
Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value.
_mm256_mask_abs_epi64
__m256i _mm256_mask_abs_epi64(__m256i src, __mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsq
Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_abs_epi64
__m256i _mm256_maskz_abs_epi64(__mmask8 k, __m256i a)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpabsq
Compute the absolute value of packed 64-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
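The writemask and zeromask forms above differ only in what an unselected lane receives. The following is a minimal sketch, not a definitive implementation: a scalar model of _mm256_mask_abs_epi64 and _mm256_maskz_abs_epi64, plus an intrinsic path that is compiled only when the compiler defines __AVX512F__ and __AVX512VL__. All helper names (mask_abs_epi64_ref, maskz_abs_epi64_ref, mask_abs_epi64_simd, check_mask_abs_epi64) are illustrative assumptions and are not declared in any header. The sketch also shows why the results are described as unsigned: the most negative input keeps its bit pattern, which reads as 2^63 when interpreted as an unsigned value.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512F__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Scalar model of _mm256_mask_abs_epi64 (helper name is illustrative).
   Lane i receives |a[i]| when bit i of k is set, otherwise src[i].
   Only the low 4 bits of k are used because there are 4 lanes. */
void mask_abs_epi64_ref(uint64_t dst[4], const uint64_t src[4],
                        uint8_t k, const int64_t a[4])
{
    for (int i = 0; i < 4; ++i) {
        /* Negate in unsigned arithmetic so INT64_MIN wraps to 2^63
           instead of overflowing a signed type. */
        uint64_t u = (uint64_t)a[i];
        uint64_t mag = (a[i] < 0) ? (uint64_t)0 - u : u;
        dst[i] = ((k >> i) & 1u) ? mag : src[i];
    }
}

/* Scalar model of _mm256_maskz_abs_epi64: unselected lanes become zero. */
void maskz_abs_epi64_ref(uint64_t dst[4], uint8_t k, const int64_t a[4])
{
    const uint64_t zero[4] = {0, 0, 0, 0};
    mask_abs_epi64_ref(dst, zero, k, a);
}

#if defined(__AVX512F__) && defined(__AVX512VL__)
/* Same operation through the intrinsic; needs AVX512F and AVX512VL. */
void mask_abs_epi64_simd(uint64_t dst[4], const uint64_t src[4],
                         uint8_t k, const int64_t a[4])
{
    __m256i vsrc = _mm256_loadu_si256((const __m256i *)src);
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i r = _mm256_mask_abs_epi64(vsrc, (__mmask8)k, va);
    _mm256_storeu_si256((__m256i *)dst, r);
}
#endif

/* Self-check of the scalar model. */
void check_mask_abs_epi64(void)
{
    const uint64_t src[4] = {100, 200, 300, 400};
    const int64_t a[4] = {-1, 2, -3, INT64_MIN};
    uint64_t dst[4];

    mask_abs_epi64_ref(dst, src, 0x0B, a); /* lanes 0, 1 and 3 selected */
    assert(dst[0] == 1 && dst[1] == 2 && dst[2] == 300);
    assert(dst[3] == UINT64_C(0x8000000000000000));

    maskz_abs_epi64_ref(dst, 0x0B, a);     /* lane 2 is zeroed */
    assert(dst[2] == 0);
}
```

Calling check_mask_abs_epi64() exercises the scalar model on any platform; the intrinsic path is optional and depends on the compiler flags in use.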
_mm_mask_abs_epi16
__m128i _mm_mask_abs_epi16(__m128i src, __mmask8 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpabsw
Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_abs_epi16
__m128i _mm_maskz_abs_epi16(__mmask8 k, __m128i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpabsw
Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_abs_epi16
__m256i _mm256_mask_abs_epi16(__m256i src, __mmask16 k, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpabsw
Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_abs_epi16
__m256i _mm256_maskz_abs_epi16(__mmask16 k, __m256i a)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpabsw
Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_abs_epi16
__m512i _mm512_abs_epi16(__m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpabsw
Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value.
_mm512_mask_abs_epi16
__m512i _mm512_mask_abs_epi16(__m512i src, __mmask32 k, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpabsw
Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_abs_epi16
__m512i _mm512_maskz_abs_epi16(__mmask32 k, __m512i a)
CPUID Flags: AVX512BW
Instruction(s): vpabsw
Compute the absolute value of packed 16-bit integers in a, and store the unsigned results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_add_epi8
__m128i _mm_mask_add_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddb
Add packed 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_add_epi8
__m128i _mm_maskz_add_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddb
Add packed 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_add_epi8
__m256i _mm256_mask_add_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddb
Add packed 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_epi8
__m256i _mm256_maskz_add_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddb
Add packed 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_add_epi8
__m512i _mm512_add_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddb
Add packed 8-bit integers in a and b, and return the results.
_mm512_mask_add_epi8
__m512i _mm512_mask_add_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddb
Add packed 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_add_epi8
__m512i _mm512_maskz_add_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddb
Add packed 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
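The 512-bit byte forms take a 64-bit mask, one bit per lane. The sketch below is a scalar model of _mm512_mask_add_epi8 under the assumption that the non-saturating add wraps modulo 256; it also includes a small helper that builds a mask selecting the first n lanes, which is one common way such a mask might be produced. The names first_n_lanes, mask_add_epi8_ref, mask_add_epi8_simd and check_mask_add_epi8 are hypothetical and exist only for this illustration. The intrinsic path is compiled only when __AVX512BW__ is defined.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

/* Build a 64-bit mask that selects the first n byte lanes
   (illustrative helper, not part of any header). */
uint64_t first_n_lanes(unsigned n)
{
    return (n >= 64) ? ~UINT64_C(0) : ((UINT64_C(1) << n) - 1);
}

/* Scalar model of _mm512_mask_add_epi8: sums wrap modulo 256,
   and lanes whose mask bit is clear are copied from src. */
void mask_add_epi8_ref(uint8_t dst[64], const uint8_t src[64], uint64_t k,
                       const uint8_t a[64], const uint8_t b[64])
{
    for (int i = 0; i < 64; ++i) {
        uint8_t sum = (uint8_t)(a[i] + b[i]);
        dst[i] = ((k >> i) & 1u) ? sum : src[i];
    }
}

#if defined(__AVX512BW__)
/* Same operation through the intrinsic; needs AVX512BW. */
void mask_add_epi8_simd(uint8_t dst[64], const uint8_t src[64], uint64_t k,
                        const uint8_t a[64], const uint8_t b[64])
{
    __m512i vsrc = _mm512_loadu_si512(src);
    __m512i va = _mm512_loadu_si512(a);
    __m512i vb = _mm512_loadu_si512(b);
    _mm512_storeu_si512(dst, _mm512_mask_add_epi8(vsrc, (__mmask64)k, va, vb));
}
#endif

/* Self-check of the scalar model. */
void check_mask_add_epi8(void)
{
    uint8_t src[64], a[64], b[64], dst[64];
    for (int i = 0; i < 64; ++i) {
        src[i] = 7;
        a[i] = 200;
        b[i] = 100;
    }
    mask_add_epi8_ref(dst, src, first_n_lanes(3), a, b);
    assert(dst[0] == 44 && dst[2] == 44); /* 300 wraps to 44 */
    assert(dst[3] == 7 && dst[63] == 7);  /* copied from src */
}
```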
_mm_mask_add_epi32
__m128i _mm_mask_add_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpaddd
Add packed 32-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_add_epi32
__m128i _mm_maskz_add_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpaddd
Add packed 32-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_add_epi32
__m256i _mm256_mask_add_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpaddd
Add packed 32-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_epi32
__m256i _mm256_maskz_add_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpaddd
Add packed 32-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_add_epi64
__m128i _mm_mask_add_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpaddq
Add packed 64-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_add_epi64
__m128i _mm_maskz_add_epi64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpaddq
Add packed 64-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_add_epi64
__m256i _mm256_mask_add_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpaddq
Add packed 64-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_epi64
__m256i _mm256_maskz_add_epi64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpaddq
Add packed 64-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_adds_epi8
__m128i _mm_mask_adds_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddsb
Add packed signed 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_adds_epi8
__m128i _mm_maskz_adds_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddsb
Add packed signed 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_adds_epi8
__m256i _mm256_mask_adds_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddsb
Add packed signed 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_adds_epi8
__m256i _mm256_maskz_adds_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddsb
Add packed signed 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_adds_epi8
__m512i _mm512_adds_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddsb
Add packed signed 8-bit integers in a and b using saturation, and return the results.
_mm512_mask_adds_epi8
__m512i _mm512_mask_adds_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddsb
Add packed signed 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_adds_epi8
__m512i _mm512_maskz_adds_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddsb
Add packed signed 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
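For the vpaddsb forms, saturation means that a sum outside the signed 8-bit range is clamped to the nearest representable value rather than wrapped. The sketch below models _mm_maskz_adds_epi8 in scalar code under that reading. The helper names (adds_i8, maskz_adds_epi8_ref, maskz_adds_epi8_simd, check_maskz_adds_epi8) are assumptions made for this example. The intrinsic path is compiled only when __AVX512BW__ and __AVX512VL__ are defined.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Signed saturating add of one 8-bit lane: clamp to [-128, 127]. */
int8_t adds_i8(int8_t x, int8_t y)
{
    int s = x + y;
    if (s > INT8_MAX) s = INT8_MAX;
    if (s < INT8_MIN) s = INT8_MIN;
    return (int8_t)s;
}

/* Scalar model of _mm_maskz_adds_epi8 (helper name is illustrative):
   lanes whose mask bit is clear are zeroed. */
void maskz_adds_epi8_ref(int8_t dst[16], uint16_t k,
                         const int8_t a[16], const int8_t b[16])
{
    for (int i = 0; i < 16; ++i)
        dst[i] = ((k >> i) & 1u) ? adds_i8(a[i], b[i]) : (int8_t)0;
}

#if defined(__AVX512BW__) && defined(__AVX512VL__)
/* Same operation through the intrinsic; needs AVX512BW and AVX512VL. */
void maskz_adds_epi8_simd(int8_t dst[16], uint16_t k,
                          const int8_t a[16], const int8_t b[16])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_maskz_adds_epi8((__mmask16)k, va, vb));
}
#endif

/* Self-check of the scalar model. */
void check_maskz_adds_epi8(void)
{
    int8_t a[16], b[16], dst[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = 100;
        b[i] = 100;
    }
    maskz_adds_epi8_ref(dst, 0x0001, a, b);
    assert(dst[0] == 127); /* 200 clamps to INT8_MAX */
    assert(dst[1] == 0);   /* mask bit clear, lane zeroed */
    assert(adds_i8(-100, -100) == -128);
    assert(adds_i8(5, -3) == 2);
}
```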
_mm_mask_adds_epi16
__m128i _mm_mask_adds_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddsw
Add packed signed 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_adds_epi16
__m128i _mm_maskz_adds_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddsw
Add packed signed 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_adds_epi16
__m256i _mm256_mask_adds_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddsw
Add packed signed 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_adds_epi16
__m256i _mm256_maskz_adds_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddsw
Add packed signed 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_adds_epi16
__m512i _mm512_adds_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddsw
Add packed signed 16-bit integers in a and b using saturation, and return the results.
_mm512_mask_adds_epi16
__m512i _mm512_mask_adds_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddsw
Add packed signed 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_adds_epi16
__m512i _mm512_maskz_adds_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddsw
Add packed signed 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_adds_epu8
__m128i _mm_mask_adds_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddusb
Add packed unsigned 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_adds_epu8
__m128i _mm_maskz_adds_epu8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddusb
Add packed unsigned 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_adds_epu8
__m256i _mm256_mask_adds_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddusb
Add packed unsigned 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_adds_epu8
__m256i _mm256_maskz_adds_epu8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddusb
Add packed unsigned 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_adds_epu8
__m512i _mm512_adds_epu8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddusb
Add packed unsigned 8-bit integers in a and b using saturation, and return the results.
_mm512_mask_adds_epu8
__m512i _mm512_mask_adds_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddusb
Add packed unsigned 8-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_adds_epu8
__m512i _mm512_maskz_adds_epu8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddusb
Add packed unsigned 8-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_adds_epu16
__m128i _mm_mask_adds_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddusw
Add packed unsigned 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_adds_epu16
__m128i _mm_maskz_adds_epu16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddusw
Add packed unsigned 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_adds_epu16
__m256i _mm256_mask_adds_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddusw
Add packed unsigned 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_adds_epu16
__m256i _mm256_maskz_adds_epu16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddusw
Add packed unsigned 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_adds_epu16
__m512i _mm512_adds_epu16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddusw
Add packed unsigned 16-bit integers in a and b using saturation, and return the results.
_mm512_mask_adds_epu16
__m512i _mm512_mask_adds_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddusw
Add packed unsigned 16-bit integers in a and b using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_adds_epu16
__m512i _mm512_maskz_adds_epu16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddusw
Add packed unsigned 16-bit integers in a and b using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
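The epu16 and epi16 saturating adds can give different results for the same bit patterns, because the clamp limits depend on whether a lane is read as unsigned or signed. The following sketch models _mm_mask_adds_epu16 in scalar code and includes a signed lane helper purely for contrast. The names adds_u16, adds_i16, mask_adds_epu16_ref, mask_adds_epu16_simd and check_mask_adds_epu16 are illustrative assumptions. The intrinsic path is compiled only when __AVX512BW__ and __AVX512VL__ are defined.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Unsigned saturating add of one 16-bit lane: clamp to 65535. */
uint16_t adds_u16(uint16_t x, uint16_t y)
{
    uint32_t s = (uint32_t)x + y;
    return (s > UINT16_MAX) ? (uint16_t)UINT16_MAX : (uint16_t)s;
}

/* Signed saturating add of one 16-bit lane, shown for contrast. */
int16_t adds_i16(int16_t x, int16_t y)
{
    int32_t s = (int32_t)x + y;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* Scalar model of _mm_mask_adds_epu16 (helper name is illustrative). */
void mask_adds_epu16_ref(uint16_t dst[8], const uint16_t src[8], uint8_t k,
                         const uint16_t a[8], const uint16_t b[8])
{
    for (int i = 0; i < 8; ++i)
        dst[i] = ((k >> i) & 1u) ? adds_u16(a[i], b[i]) : src[i];
}

#if defined(__AVX512BW__) && defined(__AVX512VL__)
/* Same operation through the intrinsic; needs AVX512BW and AVX512VL. */
void mask_adds_epu16_simd(uint16_t dst[8], const uint16_t src[8], uint8_t k,
                          const uint16_t a[8], const uint16_t b[8])
{
    __m128i vsrc = _mm_loadu_si128((const __m128i *)src);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i r = _mm_mask_adds_epu16(vsrc, (__mmask8)k, va, vb);
    _mm_storeu_si128((__m128i *)dst, r);
}
#endif

/* Self-check of the scalar model. */
void check_mask_adds_epu16(void)
{
    /* Bit pattern 0xFFFF: 65535 unsigned, -1 signed. */
    assert(adds_u16(0xFFFF, 1) == 0xFFFF); /* clamps at the unsigned limit */
    assert(adds_i16(-1, 1) == 0);          /* ordinary signed sum */
    /* Bit pattern 0x7FFF: 32767 in both views. */
    assert(adds_u16(0x7FFF, 1) == 0x8000); /* fits in unsigned range */
    assert(adds_i16(0x7FFF, 1) == 0x7FFF); /* clamps at the signed limit */

    uint16_t src[8], a[8], b[8], dst[8];
    for (int i = 0; i < 8; ++i) {
        src[i] = 5;
        a[i] = 60000;
        b[i] = 10000;
    }
    mask_adds_epu16_ref(dst, src, 0x81, a, b); /* lanes 0 and 7 selected */
    assert(dst[0] == 65535 && dst[7] == 65535);
    assert(dst[1] == 5);
}
```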
_mm_mask_add_epi16
__m128i _mm_mask_add_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddw
Add packed 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_add_epi16
__m128i _mm_maskz_add_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddw
Add packed 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_add_epi16
__m256i _mm256_mask_add_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddw
Add packed 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_add_epi16
__m256i _mm256_maskz_add_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpaddw
Add packed 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_add_epi16
__m512i _mm512_add_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddw
Add packed 16-bit integers in a and b, and return the results.
_mm512_mask_add_epi16
__m512i _mm512_mask_add_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddw
Add packed 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_add_epi16
__m512i _mm512_maskz_add_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpaddw
Add packed 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_avg_epu8
__m128i _mm_mask_avg_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpavgb
Average packed unsigned 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_avg_epu8
__m128i _mm_maskz_avg_epu8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpavgb
Average packed unsigned 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_avg_epu8
__m256i _mm256_mask_avg_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpavgb
Average packed unsigned 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_avg_epu8
__m256i _mm256_maskz_avg_epu8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpavgb
Average packed unsigned 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_avg_epu8
__m512i _mm512_avg_epu8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpavgb
Average packed unsigned 8-bit integers in a and b, and return the results.
_mm512_mask_avg_epu8
__m512i _mm512_mask_avg_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpavgb
Average packed unsigned 8-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_avg_epu8
__m512i _mm512_maskz_avg_epu8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpavgb
Average packed unsigned 8-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_avg_epu16
__m128i _mm_mask_avg_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpavgw
Average packed unsigned 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_avg_epu16
__m128i _mm_maskz_avg_epu16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpavgw
Average packed unsigned 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_avg_epu16
__m256i _mm256_mask_avg_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpavgw
Average packed unsigned 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_avg_epu16
__m256i _mm256_maskz_avg_epu16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpavgw
Average packed unsigned 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_avg_epu16
__m512i _mm512_avg_epu16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpavgw
Average packed unsigned 16-bit integers in a and b, and return the results.
_mm512_mask_avg_epu16
__m512i _mm512_mask_avg_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpavgw
Average packed unsigned 16-bit integers in a and b, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_avg_epu16
__m512i _mm512_maskz_avg_epu16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpavgw
Average packed unsigned 16-bit integers in a and b, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
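The descriptions above say only "average". For the underlying vpavgb and vpavgw instructions, the per-lane result is (a + b + 1) >> 1, computed without intermediate overflow, so halves round up. The sketch below models that behaviour for one lane and for _mm_mask_avg_epu8. The helper names (avg_u8, avg_u16, mask_avg_epu8_ref, mask_avg_epu8_simd, check_mask_avg_epu8) are hypothetical. The intrinsic path is compiled only when __AVX512BW__ and __AVX512VL__ are defined.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Rounding average of one unsigned 8-bit lane. The operands are
   promoted to int, so the sum cannot overflow. */
uint8_t avg_u8(uint8_t x, uint8_t y)
{
    return (uint8_t)((x + y + 1) >> 1);
}

/* Rounding average of one unsigned 16-bit lane. */
uint16_t avg_u16(uint16_t x, uint16_t y)
{
    return (uint16_t)(((uint32_t)x + y + 1u) >> 1);
}

/* Scalar model of _mm_mask_avg_epu8 (helper name is illustrative). */
void mask_avg_epu8_ref(uint8_t dst[16], const uint8_t src[16], uint16_t k,
                       const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; ++i)
        dst[i] = ((k >> i) & 1u) ? avg_u8(a[i], b[i]) : src[i];
}

#if defined(__AVX512BW__) && defined(__AVX512VL__)
/* Same operation through the intrinsic; needs AVX512BW and AVX512VL. */
void mask_avg_epu8_simd(uint8_t dst[16], const uint8_t src[16], uint16_t k,
                        const uint8_t a[16], const uint8_t b[16])
{
    __m128i vsrc = _mm_loadu_si128((const __m128i *)src);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i r = _mm_mask_avg_epu8(vsrc, (__mmask16)k, va, vb);
    _mm_storeu_si128((__m128i *)dst, r);
}
#endif

/* Self-check of the scalar model. */
void check_mask_avg_epu8(void)
{
    assert(avg_u8(1, 2) == 2);       /* 1.5 rounds up */
    assert(avg_u8(0, 1) == 1);       /* 0.5 rounds up */
    assert(avg_u8(255, 255) == 255); /* no overflow at the top of range */
    assert(avg_u16(65535, 65534) == 65535);

    uint8_t src[16], a[16], b[16], dst[16];
    for (int i = 0; i < 16; ++i) {
        src[i] = 9;
        a[i] = 10;
        b[i] = 13;
    }
    mask_avg_epu8_ref(dst, src, 0x0003, a, b); /* lanes 0 and 1 selected */
    assert(dst[0] == 12 && dst[1] == 12);      /* (10 + 13 + 1) >> 1 */
    assert(dst[2] == 9);
}
```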
_mm_mask_maddubs_epi16
__m128i _mm_mask_maddubs_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaddubsw
Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_maddubs_epi16
__m128i _mm_maskz_maddubs_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaddubsw
Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_maddubs_epi16
__m256i _mm256_mask_maddubs_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaddubsw
Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_maddubs_epi16
__m256i _mm256_maskz_maddubs_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaddubsw
Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_maddubs_epi16
__m512i _mm512_maddubs_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaddubsw
Vertically multiply each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value.
_mm512_mask_maddubs_epi16
__m512i _mm512_mask_maddubs_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaddubsw
Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_maddubs_epi16
__m512i _mm512_maskz_maddubs_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaddubsw
Multiply packed unsigned 8-bit integers in a by packed signed 8-bit integers in b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
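The vpmaddubsw forms treat the bytes of a as unsigned and the bytes of b as signed, then add each adjacent pair of products with signed 16-bit saturation. A scalar sketch of _mm_mask_maddubs_epi16 under that description follows. The helper names (sat_i16, mask_maddubs_epi16_ref, mask_maddubs_epi16_simd, check_mask_maddubs_epi16) are illustrative assumptions. The intrinsic path is compiled only when __AVX512BW__ and __AVX512VL__ are defined.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Clamp a 32-bit value to the signed 16-bit range. */
int16_t sat_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of _mm_mask_maddubs_epi16 (helper name is illustrative).
   a holds 16 unsigned bytes, b holds 16 signed bytes; each of the 8
   result lanes is the saturated sum of one adjacent pair of products. */
void mask_maddubs_epi16_ref(int16_t dst[8], const int16_t src[8], uint8_t k,
                            const uint8_t a[16], const int8_t b[16])
{
    for (int i = 0; i < 8; ++i) {
        int32_t sum = (int32_t)a[2 * i] * b[2 * i]
                    + (int32_t)a[2 * i + 1] * b[2 * i + 1];
        dst[i] = ((k >> i) & 1u) ? sat_i16(sum) : src[i];
    }
}

#if defined(__AVX512BW__) && defined(__AVX512VL__)
/* Same operation through the intrinsic; needs AVX512BW and AVX512VL. */
void mask_maddubs_epi16_simd(int16_t dst[8], const int16_t src[8], uint8_t k,
                             const uint8_t a[16], const int8_t b[16])
{
    __m128i vsrc = _mm_loadu_si128((const __m128i *)src);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i r = _mm_mask_maddubs_epi16(vsrc, (__mmask8)k, va, vb);
    _mm_storeu_si128((__m128i *)dst, r);
}
#endif

/* Self-check of the scalar model. */
void check_mask_maddubs_epi16(void)
{
    const uint8_t a[16] = {2, 3, 255, 255, 255, 255};
    const int8_t b[16] = {4, -5, 127, 127, -128, -128};
    const int16_t src[8] = {11, 11, 11, 11, 11, 11, 11, 11};
    int16_t dst[8];

    mask_maddubs_epi16_ref(dst, src, 0x07, a, b); /* lanes 0..2 selected */
    assert(dst[0] == -7);     /* 2*4 + 3*(-5) */
    assert(dst[1] == 32767);  /* 64770 clamps to INT16_MAX */
    assert(dst[2] == -32768); /* -65280 clamps to INT16_MIN */
    assert(dst[3] == 11);     /* copied from src */
}
```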
_mm_mask_madd_epi16
__m128i _mm_mask_madd_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaddwd
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_madd_epi16
__m128i _mm_maskz_madd_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaddwd
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_madd_epi16
__m256i _mm256_mask_madd_epi16(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaddwd
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_madd_epi16
__m256i _mm256_maskz_madd_epi16(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaddwd
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_madd_epi16
__m512i _mm512_madd_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaddwd
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in the return value.
_mm512_mask_madd_epi16
__m512i _mm512_mask_madd_epi16(__m512i src, __mmask16 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaddwd
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_madd_epi16
__m512i _mm512_maskz_madd_epi16(__mmask16 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaddwd
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
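For the vpmaddwd forms, each 32-bit result lane is the sum of two signed 16-bit by 16-bit products. That sum fits in 32 bits for every input except one: when all four inputs are -32768 the sum is 2^31, and the instruction wraps it to 0x80000000 rather than clamping. The scalar sketch below models _mm_maskz_madd_epi16 on that basis. The helper names (madd_pair, maskz_madd_epi16_ref, maskz_madd_epi16_simd, check_maskz_madd_epi16) are assumptions for illustration. The intrinsic path is compiled only when __AVX512BW__ and __AVX512VL__ are defined.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Sum of two signed 16x16 products for one 32-bit result lane. */
int32_t madd_pair(int16_t a0, int16_t b0, int16_t a1, int16_t b1)
{
    int64_t s = (int64_t)a0 * b0 + (int64_t)a1 * b1;
    /* The only sum outside the int32_t range is 2^31 (all four inputs
       equal to -32768); it wraps to INT32_MIN instead of clamping. */
    return (s > INT32_MAX) ? INT32_MIN : (int32_t)s;
}

/* Scalar model of _mm_maskz_madd_epi16 (helper name is illustrative).
   Eight 16-bit inputs per operand give four 32-bit result lanes, so
   only the low 4 bits of k are used. */
void maskz_madd_epi16_ref(int32_t dst[4], uint8_t k,
                          const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 4; ++i) {
        int32_t v = madd_pair(a[2 * i], b[2 * i], a[2 * i + 1], b[2 * i + 1]);
        dst[i] = ((k >> i) & 1u) ? v : 0;
    }
}

#if defined(__AVX512BW__) && defined(__AVX512VL__)
/* Same operation through the intrinsic; needs AVX512BW and AVX512VL. */
void maskz_madd_epi16_simd(int32_t dst[4], uint8_t k,
                           const int16_t a[8], const int16_t b[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_maskz_madd_epi16((__mmask8)k, va, vb));
}
#endif

/* Self-check of the scalar model. */
void check_maskz_madd_epi16(void)
{
    const int16_t a[8] = {1, 2, -32768, -32768, 32767, 32767, 5, 5};
    const int16_t b[8] = {3, 4, -32768, -32768, 32767, 32767, 5, 5};
    int32_t dst[4];

    maskz_madd_epi16_ref(dst, 0x07, a, b); /* lanes 0..2 selected */
    assert(dst[0] == 11);                  /* 1*3 + 2*4 */
    assert(dst[1] == INT32_MIN);           /* 2^31 wraps */
    assert(dst[2] == 2147352578);          /* 2 * 32767^2, no clamp needed */
    assert(dst[3] == 0);                   /* mask bit clear, lane zeroed */
}
```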
_mm_mask_max_epi8
__m128i _mm_mask_max_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxsb
Compare packed signed 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_epi8
__m128i _mm_maskz_max_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxsb
Compare packed signed 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_max_epi8
__m256i _mm256_mask_max_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxsb
Compare packed signed 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_epi8
__m256i _mm256_maskz_max_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxsb
Compare packed signed 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_max_epi8
__m512i _mm512_mask_max_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxsb
Compare packed signed 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_max_epi8
__m512i _mm512_maskz_max_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxsb
Compare packed signed 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_max_epi8
__m512i _mm512_max_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxsb
Compare packed signed 8-bit integers in a and b, and store packed maximum values in the return value.
_mm_mask_max_epi32
__m128i _mm_mask_max_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsd
Compare packed signed 32-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_epi32
__m128i _mm_maskz_max_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsd
Compare packed signed 32-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_max_epi32
__m256i _mm256_mask_max_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsd
Compare packed signed 32-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_epi32
__m256i _mm256_maskz_max_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsd
Compare packed signed 32-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_max_epi64
__m128i _mm_mask_max_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsq
Compare packed signed 64-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_epi64
__m128i _mm_maskz_max_epi64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsq
Compare packed signed 64-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_max_epi64
__m128i _mm_max_epi64(__m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsq
Compare packed signed 64-bit integers in a and b, and store packed maximum values in the return value.
_mm256_mask_max_epi64
__m256i _mm256_mask_max_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsq
Compare packed signed 64-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_epi64
__m256i _mm256_maskz_max_epi64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsq
Compare packed signed 64-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_max_epi64
__m256i _mm256_max_epi64(__m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxsq
Compare packed signed 64-bit integers in a and b, and store packed maximum values in the return value.
_mm_mask_max_epi16
__m128i _mm_mask_max_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxsw
Compare packed signed 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_epi16
__m128i _mm_maskz_max_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxsw
Compare packed signed 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_max_epi16
__m256i _mm256_mask_max_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxsw
Compare packed signed 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_epi16
__m256i _mm256_maskz_max_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxsw
Compare packed signed 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_max_epi16
__m512i _mm512_mask_max_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxsw
Compare packed signed 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_max_epi16
__m512i _mm512_maskz_max_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxsw
Compare packed signed 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_max_epi16
__m512i _mm512_max_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxsw
Compare packed signed 16-bit integers in a and b, and store packed maximum values in the return value.
_mm_mask_max_epu8
__m128i _mm_mask_max_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxub
Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_epu8
__m128i _mm_maskz_max_epu8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxub
Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_max_epu8
__m256i _mm256_mask_max_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxub
Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_epu8
__m256i _mm256_maskz_max_epu8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxub
Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_max_epu8
__m512i _mm512_mask_max_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxub
Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_max_epu8
__m512i _mm512_maskz_max_epu8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxub
Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_max_epu8
__m512i _mm512_max_epu8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxub
Compare packed unsigned 8-bit integers in a and b, and store packed maximum values in the return value.
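The mask type tracks the lane count: a 512-bit vector of 8-bit elements has 64 lanes, so these forms take a `__mmask64`. A minimal scalar sketch of `_mm512_mask_max_epu8` follows. `model_mm512_mask_max_epu8` and `lane_range_mask` are hypothetical helpers, and a plain `uint64_t` stands in for `__mmask64`. Building mask bits from a 64-bit constant matters here, because shifting a plain `int` by 32 or more is undefined in C.

```c
#include <stdint.h>

/* Scalar model of _mm512_mask_max_epu8: 64 unsigned byte lanes, one mask
 * bit per lane. Hypothetical helper, not part of any Intel header. */
static void model_mm512_mask_max_epu8(uint8_t dst[64], const uint8_t src[64],
                                      uint64_t k, const uint8_t a[64],
                                      const uint8_t b[64])
{
    for (int i = 0; i < 64; ++i) {
        uint8_t m = a[i] > b[i] ? a[i] : b[i]; /* unsigned comparison */
        dst[i] = ((k >> i) & 1u) ? m : src[i];
    }
}

/* Build a mask selecting lanes first..last inclusive.
 * Assumes 0 <= first <= last <= 63. */
static uint64_t lane_range_mask(unsigned first, unsigned last)
{
    uint64_t upto_last = (last == 63) ? UINT64_MAX
                                      : ((UINT64_C(1) << (last + 1)) - 1);
    uint64_t below_first = (UINT64_C(1) << first) - 1;
    return upto_last & ~below_first;
}
```

For example, `lane_range_mask(32, 47)` yields `0x0000FFFF00000000`, which selects only byte lanes 32 through 47. Note that the byte value 200 compares greater than 16 here. Under the signed `epi8` forms the same bit pattern would be negative and compare smaller.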
_mm_mask_max_epu32
__m128i _mm_mask_max_epu32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxud
Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_epu32
__m128i _mm_maskz_max_epu32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxud
Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_max_epu32
__m256i _mm256_mask_max_epu32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxud
Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_epu32
__m256i _mm256_maskz_max_epu32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxud
Compare packed unsigned 32-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_max_epu64
__m128i _mm_mask_max_epu64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxuq
Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_epu64
__m128i _mm_maskz_max_epu64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxuq
Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_max_epu64
__m128i _mm_max_epu64(__m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxuq
Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value.
_mm256_mask_max_epu64
__m256i _mm256_mask_max_epu64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxuq
Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_epu64
__m256i _mm256_maskz_max_epu64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxuq
Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_max_epu64
__m256i _mm256_max_epu64(__m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmaxuq
Compare packed unsigned 64-bit integers in a and b, and store packed maximum values in the return value.
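The `epi64` and `epu64` forms differ only in how each lane's bit pattern is interpreted. A minimal per-lane sketch in portable C illustrates this. The helper names are hypothetical, and no AVX-512 hardware is needed to run it.

```c
#include <stdint.h>

/* One 64-bit lane of a signed maximum (the epi64 forms, vpmaxsq).
 * Hypothetical helper for illustration. */
static int64_t model_max_epi64_lane(int64_t a, int64_t b)
{
    return a > b ? a : b;
}

/* One 64-bit lane of an unsigned maximum (the epu64 forms, vpmaxuq).
 * Hypothetical helper for illustration. */
static uint64_t model_max_epu64_lane(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}
```

A lane with all bits set is -1 when read as signed, so the signed maximum of that lane and 1 is 1. Read as unsigned, the same lane is the largest 64-bit value, so the unsigned maximum is the all-ones lane.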
_mm_mask_max_epu16
__m128i _mm_mask_max_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxuw
Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_max_epu16
__m128i _mm_maskz_max_epu16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxuw
Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_max_epu16
__m256i _mm256_mask_max_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxuw
Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_max_epu16
__m256i _mm256_maskz_max_epu16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmaxuw
Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_max_epu16
__m512i _mm512_mask_max_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxuw
Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_max_epu16
__m512i _mm512_maskz_max_epu16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxuw
Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_max_epu16
__m512i _mm512_max_epu16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmaxuw
Compare packed unsigned 16-bit integers in a and b, and store packed maximum values in the return value.
_mm_mask_min_epi8
__m128i _mm_mask_min_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminsb
Compare packed signed 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_epi8
__m128i _mm_maskz_min_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminsb
Compare packed signed 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_min_epi8
__m256i _mm256_mask_min_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminsb
Compare packed signed 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_epi8
__m256i _mm256_maskz_min_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminsb
Compare packed signed 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_min_epi8
__m512i _mm512_mask_min_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminsb
Compare packed signed 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_min_epi8
__m512i _mm512_maskz_min_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminsb
Compare packed signed 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_min_epi8
__m512i _mm512_min_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminsb
Compare packed signed 8-bit integers in a and b, and store packed minimum values in the return value.
_mm_mask_min_epi32
__m128i _mm_mask_min_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsd
Compare packed signed 32-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_epi32
__m128i _mm_maskz_min_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsd
Compare packed signed 32-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_min_epi32
__m256i _mm256_mask_min_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsd
Compare packed signed 32-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_epi32
__m256i _mm256_maskz_min_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsd
Compare packed signed 32-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_min_epi64
__m128i _mm_mask_min_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsq
Compare packed signed 64-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_epi64
__m128i _mm_maskz_min_epi64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsq
Compare packed signed 64-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_min_epi64
__m128i _mm_min_epi64(__m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsq
Compare packed signed 64-bit integers in a and b, and store packed minimum values in the return value.
_mm256_mask_min_epi64
__m256i _mm256_mask_min_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsq
Compare packed signed 64-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_epi64
__m256i _mm256_maskz_min_epi64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsq
Compare packed signed 64-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_min_epi64
__m256i _mm256_min_epi64(__m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminsq
Compare packed signed 64-bit integers in a and b, and store packed minimum values in the return value.
_mm_mask_min_epi16
__m128i _mm_mask_min_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminsw
Compare packed signed 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_epi16
__m128i _mm_maskz_min_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminsw
Compare packed signed 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_min_epi16
__m256i _mm256_mask_min_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminsw
Compare packed signed 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_epi16
__m256i _mm256_maskz_min_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminsw
Compare packed signed 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_min_epi16
__m512i _mm512_mask_min_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminsw
Compare packed signed 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_min_epi16
__m512i _mm512_maskz_min_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminsw
Compare packed signed 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_min_epi16
__m512i _mm512_min_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminsw
Compare packed signed 16-bit integers in a and b, and store packed minimum values in the return value.
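One use of the maximum and minimum intrinsics together is clamping, which in intrinsic form would be `_mm512_min_epi16(_mm512_max_epi16(x, lo), hi)` with `lo` and `hi` broadcast to every lane. The scalar sketch below models that composition for the 32 signed 16-bit lanes of a 512-bit vector. `model_clamp_epi16` is a hypothetical name, and the code assumes `lo <= hi`.

```c
#include <stdint.h>

/* Scalar model of min(max(x, lo), hi) over 32 signed 16-bit lanes.
 * Hypothetical helper for illustration; assumes lo <= hi. */
static void model_clamp_epi16(int16_t dst[32], const int16_t x[32],
                              int16_t lo, int16_t hi)
{
    for (int i = 0; i < 32; ++i) {
        int16_t raised = x[i] > lo ? x[i] : lo; /* max step */
        dst[i] = raised < hi ? raised : hi;     /* min step */
    }
}
```

Values below `lo` are raised to `lo`, values above `hi` are lowered to `hi`, and values already in range pass through unchanged.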
_mm_mask_min_epu8
__m128i _mm_mask_min_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminub
Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_epu8
__m128i _mm_maskz_min_epu8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminub
Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_min_epu8
__m256i _mm256_mask_min_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminub
Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_epu8
__m256i _mm256_maskz_min_epu8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminub
Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_min_epu8
__m512i _mm512_mask_min_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminub
Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_min_epu8
__m512i _mm512_maskz_min_epu8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminub
Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_min_epu8
__m512i _mm512_min_epu8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminub
Compare packed unsigned 8-bit integers in a and b, and store packed minimum values in the return value.
_mm_mask_min_epu32
__m128i _mm_mask_min_epu32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminud
Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_epu32
__m128i _mm_maskz_min_epu32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminud
Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_min_epu32
__m256i _mm256_mask_min_epu32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminud
Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_epu32
__m256i _mm256_maskz_min_epu32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminud
Compare packed unsigned 32-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_min_epu64
__m128i _mm_mask_min_epu64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminuq
Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_epu64
__m128i _mm_maskz_min_epu64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminuq
Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_min_epu64
__m128i _mm_min_epu64(__m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminuq
Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value.
_mm256_mask_min_epu64
__m256i _mm256_mask_min_epu64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminuq
Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_epu64
__m256i _mm256_maskz_min_epu64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminuq
Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_min_epu64
__m256i _mm256_min_epu64(__m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpminuq
Compare packed unsigned 64-bit integers in a and b, and store packed minimum values in the return value.
_mm_mask_min_epu16
__m128i _mm_mask_min_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminuw
Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_min_epu16
__m128i _mm_maskz_min_epu16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminuw
Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_min_epu16
__m256i _mm256_mask_min_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminuw
Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_min_epu16
__m256i _mm256_maskz_min_epu16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpminuw
Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_min_epu16
__m512i _mm512_mask_min_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminuw
Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_min_epu16
__m512i _mm512_maskz_min_epu16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminuw
Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_min_epu16
__m512i _mm512_min_epu16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpminuw
Compare packed unsigned 16-bit integers in a and b, and store packed minimum values in the return value.
_mm_mask_mul_epi32
__m128i _mm_mask_mul_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmuldq
Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mul_epi32
__m128i _mm_maskz_mul_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmuldq
Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mul_epi32
__m256i _mm256_mask_mul_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmuldq
Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mul_epi32
__m256i _mm256_maskz_mul_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmuldq
Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and store the signed 64-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
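The lane layout of the `mul_epi32` forms is easy to misread: only the low 32 bits of each 64-bit element take part, and each product is a full 64-bit value. A minimal scalar sketch of `_mm_mask_mul_epi32` follows. `model_mm_mask_mul_epi32` is a hypothetical helper, and the element indexing assumes the little-endian layout used on x86.

```c
#include <stdint.h>

/* Scalar model of _mm_mask_mul_epi32 (two 64-bit lanes in a 128-bit vector).
 * a and b are viewed as four int32_t each. On little-endian x86 the low half
 * of 64-bit lane i is 32-bit element 2*i, and the odd elements are ignored.
 * Hypothetical helper for illustration, not an Intel API. */
static void model_mm_mask_mul_epi32(int64_t dst[2], const int64_t src[2],
                                    uint8_t k, const int32_t a[4],
                                    const int32_t b[4])
{
    for (int i = 0; i < 2; ++i) {
        /* Widen before multiplying so the product keeps all 64 bits. */
        int64_t prod = (int64_t)a[2 * i] * (int64_t)b[2 * i];
        dst[i] = ((k >> i) & 1) ? prod : src[i];
    }
}
```

Because there are only two result lanes in the 128-bit form, only the two low bits of the mask are meaningful. The product 65536 * 65536 is representable in the 64-bit result even though it does not fit in 32 bits.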
_mm_mask_mulhrs_epi16
__m128i _mm_mask_mulhrs_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhrsw
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mulhrs_epi16
__m128i _mm_maskz_mulhrs_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhrsw
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mulhrs_epi16
__m256i _mm256_mask_mulhrs_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhrsw
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mulhrs_epi16
__m256i _mm256_maskz_mulhrs_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhrsw
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_mulhrs_epi16
__m512i _mm512_mask_mulhrs_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhrsw
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_mulhrs_epi16
__m512i _mm512_maskz_mulhrs_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhrsw
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mulhrs_epi16
__m512i _mm512_mulhrs_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhrsw
Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to the return value.
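The truncate, round, and extract steps in the `mulhrs` descriptions above can be followed one at a time in scalar code. The sketch below models a single 16-bit lane. `model_mulhrs_lane` is a hypothetical helper, and the code uses unsigned shifts so that it does not depend on how a particular compiler shifts negative values.

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of the mulhrs operation, following the
 * description step by step. Hypothetical helper, not an Intel API. */
static int16_t model_mulhrs_lane(int16_t a, int16_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;   /* signed 32-bit product */
    uint32_t top18 = (uint32_t)prod >> 14;    /* 18 most significant bits */
    uint32_t rounded = top18 + 1u;            /* round by adding 1 */
    uint32_t bits = (rounded >> 1) & 0xFFFFu; /* bits [16:1] */

    /* Reinterpret the 16-bit pattern as a signed value without relying on
     * implementation-defined narrowing conversions. */
    return bits >= 0x8000u ? (int16_t)((int32_t)bits - 0x10000)
                           : (int16_t)bits;
}
```

One way to read the result is as a rounded Q15 fixed-point product: 16384 represents 0.5, so `model_mulhrs_lane(16384, 16384)` gives 8192, which represents 0.25. The single input pair -32768 and -32768 wraps to -32768, because the exact product (+1.0 in Q15) is not representable in 16 bits.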
_mm_mask_mulhi_epu16
__m128i _mm_mask_mulhi_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhuw
Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mulhi_epu16
__m128i _mm_maskz_mulhi_epu16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhuw
Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mulhi_epu16
__m256i _mm256_mask_mulhi_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhuw
Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mulhi_epu16
__m256i _mm256_maskz_mulhi_epu16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhuw
Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_mulhi_epu16
__m512i _mm512_mask_mulhi_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhuw
Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_mulhi_epu16
__m512i _mm512_maskz_mulhi_epu16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhuw
Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mulhi_epu16
__m512i _mm512_mulhi_epu16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhuw
Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value.
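The difference between the writemask and zeromask forms of the unsigned high multiply can be sketched with a scalar model. The helper names below are illustrative and are not part of any Intel header; the loop only mirrors the per-lane behavior described above.

```c
#include <stdint.h>

/* High 16 bits of the 32-bit product of two unsigned 16-bit lanes.
   Illustrative helper name, not an Intel intrinsic. */
uint16_t mulhi_epu16_lane(uint16_t a, uint16_t b)
{
    uint32_t product = (uint32_t)a * (uint32_t)b;
    return (uint16_t)(product >> 16);
}

/* Writemask form over n lanes (n <= 32): lane i is computed when bit i
   of k is set, otherwise it is copied from src. */
void mask_mulhi_epu16_model(uint16_t *dst, const uint16_t *src, uint32_t k,
                            const uint16_t *a, const uint16_t *b, int n)
{
    for (int i = 0; i < n; i++) {
        dst[i] = ((k >> i) & 1u) ? mulhi_epu16_lane(a[i], b[i]) : src[i];
    }
}

/* Zeromask form over n lanes (n <= 32): lane i is computed when bit i
   of k is set, otherwise it is zeroed. */
void maskz_mulhi_epu16_model(uint16_t *dst, uint32_t k,
                             const uint16_t *a, const uint16_t *b, int n)
{
    for (int i = 0; i < n; i++) {
        dst[i] = ((k >> i) & 1u) ? mulhi_epu16_lane(a[i], b[i]) : 0;
    }
}
```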
_mm_mask_mulhi_epi16
__m128i _mm_mask_mulhi_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhw
Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mulhi_epi16
__m128i _mm_maskz_mulhi_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhw
Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mulhi_epi16
__m256i _mm256_mask_mulhi_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhw
Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mulhi_epi16
__m256i _mm256_maskz_mulhi_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmulhw
Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_mulhi_epi16
__m512i _mm512_mask_mulhi_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhw
Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_mulhi_epi16
__m512i _mm512_maskz_mulhi_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhw
Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mulhi_epi16
__m512i _mm512_mulhi_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmulhw
Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in the return value.
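Because vpmulhw treats its inputs as signed, the same bit patterns give a different high half than the unsigned multiply. A minimal scalar sketch of one lane follows; `mulhi_epi16_lane` is an illustrative name, and the conversion back to `int16_t` assumes two's-complement narrowing.

```c
#include <stdint.h>

/* High 16 bits of the signed 32-bit product of two 16-bit lanes.
   Illustrative helper name, not an Intel intrinsic. */
int16_t mulhi_epi16_lane(int16_t a, int16_t b)
{
    /* Form the signed product, then take its upper half as raw bits so
       the shift does not depend on how negative values are shifted. */
    uint32_t bits = (uint32_t)((int32_t)a * (int32_t)b);
    return (int16_t)(uint16_t)(bits >> 16);
}
```

For example, the bit pattern 0xFFFF is -1 as a signed lane, so `mulhi_epi16_lane(-1, -1)` is 0 (the product is 1), whereas the unsigned interpretation 0xFFFF × 0xFFFF = 0xFFFE0001 has a high half of 0xFFFE.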
_mm_mask_mullo_epi32
__m128i _mm_mask_mullo_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmulld
Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mullo_epi32
__m128i _mm_maskz_mullo_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmulld
Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mullo_epi32
__m256i _mm256_mask_mullo_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmulld
Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mullo_epi32
__m256i _mm256_maskz_mullo_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmulld
Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
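Keeping only the low 32 bits means the result wraps modulo 2^32. The sketch below models one lane and the zeromask form for the 128-bit variant, which has four 32-bit lanes, so only the low four bits of the 8-bit mask take part. The function names are illustrative, not Intel intrinsics.

```c
#include <stdint.h>

/* Low 32 bits of the 64-bit product of two 32-bit lanes.
   Illustrative helper name, not an Intel intrinsic. */
int32_t mullo_epi32_lane(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;   /* intermediate 64-bit integer */
    return (int32_t)(uint32_t)(uint64_t)product; /* keep the low 32 bits */
}

/* Zeromask model of the 128-bit form: four lanes, selected by bits 0..3 of k. */
void maskz_mullo_epi32_model(int32_t dst[4], uint8_t k,
                             const int32_t a[4], const int32_t b[4])
{
    for (int i = 0; i < 4; i++) {
        dst[i] = ((k >> i) & 1u) ? mullo_epi32_lane(a[i], b[i]) : 0;
    }
}
```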
_mm_mask_mullo_epi64
__m128i _mm_mask_mullo_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mullo_epi64
__m128i _mm_maskz_mullo_epi64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mullo_epi64
__m128i _mm_mullo_epi64(__m128i a, __m128i b)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value.
_mm256_mask_mullo_epi64
__m256i _mm256_mask_mullo_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mullo_epi64
__m256i _mm256_maskz_mullo_epi64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mullo_epi64
__m256i _mm256_mullo_epi64(__m256i a, __m256i b)
CPUID Flags: AVX512DQ, AVX512VL
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value.
_mm512_mask_mullo_epi64
__m512i _mm512_mask_mullo_epi64(__m512i src, __mmask8 k, __m512i a, __m512i b)
CPUID Flags: AVX512DQ
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_mullo_epi64
__m512i _mm512_maskz_mullo_epi64(__mmask8 k, __m512i a, __m512i b)
CPUID Flags: AVX512DQ
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mullo_epi64
__m512i _mm512_mullo_epi64(__m512i a, __m512i b)
CPUID Flags: AVX512DQ
Instruction(s): vpmullq
Multiply the packed 64-bit integers in a and b, producing intermediate 128-bit integers, and store the low 64 bits of the intermediate integers in the return value.
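For the 64-bit case, the low 64 bits of the 128-bit product are exactly what an unsigned 64-bit multiply in C produces, since unsigned arithmetic wraps modulo 2^64. A one-lane sketch, with an illustrative helper name and assuming two's-complement conversion back to `int64_t`:

```c
#include <stdint.h>

/* Low 64 bits of the 128-bit product of two 64-bit lanes.
   Illustrative helper name, not an Intel intrinsic. */
int64_t mullo_epi64_lane(int64_t a, int64_t b)
{
    /* Unsigned multiplication wraps modulo 2^64, which discards the
       upper 64 bits of the full product. */
    uint64_t low = (uint64_t)a * (uint64_t)b;
    return (int64_t)low;
}
```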
_mm_mask_mullo_epi16
__m128i _mm_mask_mullo_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmullw
Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mullo_epi16
__m128i _mm_maskz_mullo_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmullw
Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mullo_epi16
__m256i _mm256_mask_mullo_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmullw
Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mullo_epi16
__m256i _mm256_maskz_mullo_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpmullw
Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_mullo_epi16
__m512i _mm512_mask_mullo_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmullw
Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_mullo_epi16
__m512i _mm512_maskz_mullo_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmullw
Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mullo_epi16
__m512i _mm512_mullo_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpmullw
Multiply the packed 16-bit integers in a and b, producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in the return value.
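The low half kept by vpmullw and the high half of the same signed product together make up the full 32-bit result. The sketch below shows this for a single lane; all three helper names are illustrative, and the conversions to signed types assume two's-complement narrowing.

```c
#include <stdint.h>

/* Low 16 bits of the 32-bit product (the part vpmullw keeps per lane).
   Illustrative helper name, not an Intel intrinsic. */
int16_t mullo_epi16_lane(int16_t a, int16_t b)
{
    uint32_t bits = (uint32_t)((int32_t)a * (int32_t)b);
    return (int16_t)(uint16_t)(bits & 0xFFFFu);
}

/* High 16 bits of the same signed product. Illustrative helper name. */
int16_t product_high16(int16_t a, int16_t b)
{
    uint32_t bits = (uint32_t)((int32_t)a * (int32_t)b);
    return (int16_t)(uint16_t)(bits >> 16);
}

/* Rejoin the two halves into the full signed 32-bit product. */
int32_t join_product_halves(int16_t hi, int16_t lo)
{
    uint32_t bits = ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
    return (int32_t)bits;
}
```

For instance, 300 × 300 = 90000 does not fit in 16 bits, so the low half alone is 90000 mod 65536 = 24464.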
_mm_mask_mul_epu32
__m128i _mm_mask_mul_epu32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmuludq
Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_mul_epu32
__m128i _mm_maskz_mul_epu32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmuludq
Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_mul_epu32
__m256i _mm256_mask_mul_epu32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmuludq
Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in the return value using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_mul_epu32
__m256i _mm256_maskz_mul_epu32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpmuludq
Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and store the unsigned 64-bit results in the return value using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
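Unlike the other multiplies in this group, vpmuludq works on 64-bit elements: the upper 32 bits of each input element are ignored and the mask has one bit per 64-bit element. A scalar sketch under those assumptions, with illustrative helper names:

```c
#include <stdint.h>

/* Product of the low unsigned 32 bits of two 64-bit elements.
   Illustrative helper name, not an Intel intrinsic. */
uint64_t mul_epu32_lane(uint64_t a, uint64_t b)
{
    uint64_t a_low = a & UINT64_C(0xFFFFFFFF);
    uint64_t b_low = b & UINT64_C(0xFFFFFFFF);
    return a_low * b_low;   /* fits in 64 bits, so nothing is lost */
}

/* Writemask form over n 64-bit elements (n <= 8): element i is computed
   when bit i of k is set, otherwise it is copied from src. */
void mask_mul_epu32_model(uint64_t *dst, const uint64_t *src, uint8_t k,
                          const uint64_t *a, const uint64_t *b, int n)
{
    for (int i = 0; i < n; i++) {
        dst[i] = ((k >> i) & 1u) ? mul_epu32_lane(a[i], b[i]) : src[i];
    }
}
```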
_mm_mask_sub_epi8
__m128i _mm_mask_sub_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubb
Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_sub_epi8
__m128i _mm_maskz_sub_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubb
Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_sub_epi8
__m256i _mm256_mask_sub_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubb
Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_sub_epi8
__m256i _mm256_maskz_sub_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubb
Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_sub_epi8
__m512i _mm512_mask_sub_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubb
Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_sub_epi8
__m512i _mm512_maskz_sub_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubb
Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_sub_epi8
__m512i _mm512_sub_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubb
Subtract packed 8-bit integers in b from packed 8-bit integers in a, and return the results.
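The non-saturating subtract wraps modulo 2^8, and the 512-bit form takes a 64-bit mask, one bit per byte lane. The following scalar sketch models both points; the helper names are illustrative, and the conversion to `int8_t` assumes two's-complement narrowing.

```c
#include <stdint.h>

/* Wrapping 8-bit subtraction of one lane (a - b modulo 256).
   Illustrative helper name, not an Intel intrinsic. */
int8_t sub_epi8_lane(int8_t a, int8_t b)
{
    return (int8_t)(uint8_t)((uint8_t)a - (uint8_t)b);
}

/* Writemask model of the 512-bit form: 64 byte lanes, one mask bit each.
   Lane i is computed when bit i of k is set, otherwise copied from src. */
void mask_sub_epi8_model(int8_t dst[64], const int8_t src[64], uint64_t k,
                         const int8_t a[64], const int8_t b[64])
{
    for (int i = 0; i < 64; i++) {
        dst[i] = ((k >> i) & 1u) ? sub_epi8_lane(a[i], b[i]) : src[i];
    }
}
```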
_mm_mask_sub_epi32
__m128i _mm_mask_sub_epi32(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpsubd
Subtract packed 32-bit integers in b from packed 32-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_sub_epi32
__m128i _mm_maskz_sub_epi32(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpsubd
Subtract packed 32-bit integers in b from packed 32-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_sub_epi32
__m256i _mm256_mask_sub_epi32(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpsubd
Subtract packed 32-bit integers in b from packed 32-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_sub_epi32
__m256i _mm256_maskz_sub_epi32(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpsubd
Subtract packed 32-bit integers in b from packed 32-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_sub_epi64
__m128i _mm_mask_sub_epi64(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpsubq
Subtract packed 64-bit integers in b from packed 64-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_sub_epi64
__m128i _mm_maskz_sub_epi64(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpsubq
Subtract packed 64-bit integers in b from packed 64-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_sub_epi64
__m256i _mm256_mask_sub_epi64(__m256i src, __mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpsubq
Subtract packed 64-bit integers in b from packed 64-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_sub_epi64
__m256i _mm256_maskz_sub_epi64(__mmask8 k, __m256i a, __m256i b)
CPUID Flags: AVX512F, AVX512VL
Instruction(s): vpsubq
Subtract packed 64-bit integers in b from packed 64-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_mask_subs_epi8
__m128i _mm_mask_subs_epi8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubsb
Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_subs_epi8
__m128i _mm_maskz_subs_epi8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubsb
Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_subs_epi8
__m256i _mm256_mask_subs_epi8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubsb
Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_subs_epi8
__m256i _mm256_maskz_subs_epi8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubsb
Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_subs_epi8
__m512i _mm512_mask_subs_epi8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubsb
Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_subs_epi8
__m512i _mm512_maskz_subs_epi8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubsb
Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_subs_epi8
__m512i _mm512_subs_epi8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubsb
Subtract packed 8-bit integers in b from packed 8-bit integers in a using saturation, and return the results.
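Signed saturation clamps the exact difference to the range of the lane type instead of wrapping. A one-lane sketch for the 8-bit case, using an illustrative helper name:

```c
#include <stdint.h>

/* Signed saturating 8-bit subtraction of one lane.
   Illustrative helper name, not an Intel intrinsic. */
int8_t subs_epi8_lane(int8_t a, int8_t b)
{
    int diff = (int)a - (int)b;             /* exact difference, in [-255, 255] */
    if (diff > INT8_MAX) diff = INT8_MAX;   /* clamp to 127 */
    if (diff < INT8_MIN) diff = INT8_MIN;   /* clamp to -128 */
    return (int8_t)diff;
}
```

With wrapping subtraction, -128 minus 1 would come out as 127; with saturation it stays at -128.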
_mm_mask_subs_epi16
__m128i _mm_mask_subs_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubsw
Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_subs_epi16
__m128i _mm_maskz_subs_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubsw
Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_subs_epi16
__m256i _mm256_mask_subs_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubsw
Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_subs_epi16
__m256i _mm256_maskz_subs_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubsw
Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_subs_epi16
__m512i _mm512_mask_subs_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubsw
Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_subs_epi16
__m512i _mm512_maskz_subs_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubsw
Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_subs_epi16
__m512i _mm512_subs_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubsw
Subtract packed 16-bit integers in b from packed 16-bit integers in a using saturation, and return the results.
_mm_mask_subs_epu8
__m128i _mm_mask_subs_epu8(__m128i src, __mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubusb
Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_subs_epu8
__m128i _mm_maskz_subs_epu8(__mmask16 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubusb
Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_subs_epu8
__m256i _mm256_mask_subs_epu8(__m256i src, __mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubusb
Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_subs_epu8
__m256i _mm256_maskz_subs_epu8(__mmask32 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubusb
Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_subs_epu8
__m512i _mm512_mask_subs_epu8(__m512i src, __mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubusb
Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_subs_epu8
__m512i _mm512_maskz_subs_epu8(__mmask64 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubusb
Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_subs_epu8
__m512i _mm512_subs_epu8(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubusb
Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation, and return the results.
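Unsigned saturation clamps at zero, so the result is a - b when a is larger and 0 otherwise. The sketch below models one lane and shows a common use of that property: OR-ing the two one-sided differences gives the absolute difference. Both helper names are illustrative, not Intel intrinsics.

```c
#include <stdint.h>

/* Unsigned saturating 8-bit subtraction of one lane: max(a - b, 0).
   Illustrative helper name, not an Intel intrinsic. */
uint8_t subs_epu8_lane(uint8_t a, uint8_t b)
{
    return (a > b) ? (uint8_t)(a - b) : 0;
}

/* Absolute difference built from two saturating subtractions: at most
   one of the two terms is nonzero, so OR combines them. */
uint8_t absdiff_epu8_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(subs_epu8_lane(a, b) | subs_epu8_lane(b, a));
}
```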
_mm_mask_subs_epu16
__m128i _mm_mask_subs_epu16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubusw
Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_subs_epu16
__m128i _mm_maskz_subs_epu16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubusw
Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_subs_epu16
__m256i _mm256_mask_subs_epu16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubusw
Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_subs_epu16
__m256i _mm256_maskz_subs_epu16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubusw
Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_subs_epu16
__m512i _mm512_mask_subs_epu16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubusw
Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_subs_epu16
__m512i _mm512_maskz_subs_epu16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubusw
Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_subs_epu16
__m512i _mm512_subs_epu16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubusw
Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation, and return the results.
_mm_mask_sub_epi16
__m128i _mm_mask_sub_epi16(__m128i src, __mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubw
Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm_maskz_sub_epi16
__m128i _mm_maskz_sub_epi16(__mmask8 k, __m128i a, __m128i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubw
Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_mask_sub_epi16
__m256i _mm256_mask_sub_epi16(__m256i src, __mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubw
Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm256_maskz_sub_epi16
__m256i _mm256_maskz_sub_epi16(__mmask16 k, __m256i a, __m256i b)
CPUID Flags: AVX512BW, AVX512VL
Instruction(s): vpsubw
Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_mask_sub_epi16
__m512i _mm512_mask_sub_epi16(__m512i src, __mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubw
Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using writemask k (elements are copied from src when the corresponding mask bit is not set).
_mm512_maskz_sub_epi16
__m512i _mm512_maskz_sub_epi16(__mmask32 k, __m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubw
Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_sub_epi16
__m512i _mm512_sub_epi16(__m512i a, __m512i b)
CPUID Flags: AVX512BW
Instruction(s): vpsubw
Subtract packed 16-bit integers in b from packed 16-bit integers in a, and return the results.
_mm_madd52hi_epu64
__m128i _mm_madd52hi_epu64(__m128i a, __m128i b, __m128i c)
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52huq
Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result.
_mm_mask_madd52hi_epu64
__m128i _mm_mask_madd52hi_epu64(__m128i a, __mmask8 k, __m128i b, __m128i c)
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52huq
Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_maskz_madd52hi_epu64
__m128i _mm_maskz_madd52hi_epu64(__mmask8 k, __m128i a, __m128i b, __m128i c)
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52huq
Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_madd52hi_epu64
__m256i _mm256_madd52hi_epu64(__m256i a, __m256i b, __m256i c)
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52huq
Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result.
_mm256_mask_madd52hi_epu64
__m256i _mm256_mask_madd52hi_epu64(__m256i a, __mmask8 k, __m256i b, __m256i c)
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52huq
Multiply packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52-bit unsigned integer from the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_maskz_madd52hi_epu64
__m256i _mm256_maskz_madd52hi_epu64(__mmask8 k, __m256i a, __m256i b, __m256i c);
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52huq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_madd52hi_epu64
__m512i _mm512_madd52hi_epu64(__m512i a, __m512i b, __m512i c);
CPUID Flags: AVX512IFMA52
Instruction(s): vpmadd52huq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result.
_mm512_mask_madd52hi_epu64
__m512i _mm512_mask_madd52hi_epu64(__m512i a, __mmask8 k, __m512i b, __m512i c);
CPUID Flags: AVX512IFMA52
Instruction(s): vpmadd52huq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_maskz_madd52hi_epu64
__m512i _mm512_maskz_madd52hi_epu64(__mmask8 k, __m512i a, __m512i b, __m512i c);
CPUID Flags: AVX512IFMA52
Instruction(s): vpmadd52huq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the high 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm_madd52lo_epu64
__m128i _mm_madd52lo_epu64(__m128i a, __m128i b, __m128i c);
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result.
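The low-half variant can be modeled per 64-bit lane in a few lines of scalar C. This is a sketch for illustration only; mul52_lo and madd52lo_model are hypothetical names, and the sketch assumes the 52-bit operands occupy the low 52 bits of each element.

```c
#include <stdint.h>

#define MASK52 ((UINT64_C(1) << 52) - 1)

/* Hypothetical helper: lower 52 bits of the 104-bit product of the
   low 52 bits of b and c. A wrapping 64-bit multiply already yields
   the exact low 64 bits of the product, so masking is sufficient. */
static uint64_t mul52_lo(uint64_t b, uint64_t c)
{
    b &= MASK52;
    c &= MASK52;
    return (b * c) & MASK52;
}

/* Scalar model of one lane: a + low52(b * c), wrapping modulo 2^64. */
static uint64_t madd52lo_model(uint64_t a, uint64_t b, uint64_t c)
{
    return a + mul52_lo(b, c);
}
```

Note that the sum is a full 64-bit value: madd52lo_model(7, 3, 5) yields 22, while adding a 52-bit product to an accumulator that is already 2^52 - 1 carries into bit 52.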
_mm_mask_madd52lo_epu64
__m128i _mm_mask_madd52lo_epu64(__m128i a, __mmask8 k, __m128i b, __m128i c);
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm_maskz_madd52lo_epu64
__m128i _mm_maskz_madd52lo_epu64(__mmask8 k, __m128i a, __m128i b, __m128i c);
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm256_madd52lo_epu64
__m256i _mm256_madd52lo_epu64(__m256i a, __m256i b, __m256i c);
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result.
_mm256_mask_madd52lo_epu64
__m256i _mm256_mask_madd52lo_epu64(__m256i a, __mmask8 k, __m256i b, __m256i c);
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm256_maskz_madd52lo_epu64
__m256i _mm256_maskz_madd52lo_epu64(__mmask8 k, __m256i a, __m256i b, __m256i c);
CPUID Flags: AVX512IFMA52, AVX512VL
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
_mm512_madd52lo_epu64
__m512i _mm512_madd52lo_epu64(__m512i a, __m512i b, __m512i c);
CPUID Flags: AVX512IFMA52
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result.
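As a usage sketch, the unmasked 512-bit low and high forms can be paired with a zero accumulator to obtain both halves of eight 104-bit products. The wrapper name mul52_wide_x8 is hypothetical. The intrinsic path is compiled only when the compiler defines __AVX512F__ and __AVX512IFMA__ (an assumption about how the build is configured); otherwise a scalar fallback with the same per-lane arithmetic is used.

```c
#include <stdint.h>

#if defined(__AVX512F__) && defined(__AVX512IFMA__)
#include <immintrin.h>
#endif

#define MASK52 ((UINT64_C(1) << 52) - 1)
#define MASK26 ((UINT64_C(1) << 26) - 1)

/* Hypothetical wrapper: for each of 8 lanes, split the 104-bit product
   of the low 52 bits of b[i] and c[i] into lo[i] (bits 0..51) and
   hi[i] (bits 52..103). */
static void mul52_wide_x8(uint64_t lo[8], uint64_t hi[8],
                          const uint64_t b[8], const uint64_t c[8])
{
#if defined(__AVX512F__) && defined(__AVX512IFMA__)
    __m512i vb = _mm512_loadu_si512(b);
    __m512i vc = _mm512_loadu_si512(c);
    __m512i zero = _mm512_setzero_si512();
    /* With a zero accumulator, each intrinsic returns just its product half. */
    _mm512_storeu_si512(lo, _mm512_madd52lo_epu64(zero, vb, vc));
    _mm512_storeu_si512(hi, _mm512_madd52hi_epu64(zero, vb, vc));
#else
    /* Scalar fallback: build the product from 26-bit halves. */
    for (int i = 0; i < 8; i++) {
        uint64_t x = b[i] & MASK52, y = c[i] & MASK52;
        uint64_t x0 = x & MASK26, x1 = x >> 26;
        uint64_t y0 = y & MASK26, y1 = y >> 26;
        uint64_t mid = x0 * y1 + x1 * y0;
        uint64_t low = x0 * y0 + ((mid & MASK26) << 26);
        lo[i] = low & MASK52;
        hi[i] = x1 * y1 + (mid >> 26) + (low >> 52);
    }
#endif
}
```

In multi-precision code built on 52-bit limbs, the accumulator argument would normally carry a running partial sum instead of zero; the zero accumulator here only isolates the two product halves.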
_mm512_mask_madd52lo_epu64
__m512i _mm512_mask_madd52lo_epu64(__m512i a, __mmask8 k, __m512i b, __m512i c);
CPUID Flags: AVX512IFMA52
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using writemask k (elements are copied from a when the corresponding mask bit is not set).
_mm512_maskz_madd52lo_epu64
__m512i _mm512_maskz_madd52lo_epu64(__mmask8 k, __m512i a, __m512i b, __m512i c);
CPUID Flags: AVX512IFMA52
Instruction(s): vpmadd52luq
Multiply the packed unsigned 52-bit integers in each 64-bit element of b and c to form a 104-bit intermediate result. Add the low 52 bits of the intermediate result to the corresponding unsigned 64-bit integer in a, and return the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).