Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023

Intrinsics for FP Reduction Operations

The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>


Intrinsic Name                                   Operation                                   Corresponding Intel® AVX-512 Instruction

_mm512_reduce_add_pd, _mm512_mask_reduce_add_pd  Reduce float64 elements by addition.        None.
_mm512_reduce_add_ps, _mm512_mask_reduce_add_ps  Reduce float32 elements by addition.        None.
_mm512_reduce_max_pd, _mm512_mask_reduce_max_pd  Reduce float64 elements by maximum.         None.
_mm512_reduce_max_ps, _mm512_mask_reduce_max_ps  Reduce float32 elements by maximum.         None.
_mm512_reduce_min_pd, _mm512_mask_reduce_min_pd  Reduce float64 elements by minimum.         None.
_mm512_reduce_min_ps, _mm512_mask_reduce_min_ps  Reduce float32 elements by minimum.         None.
_mm512_reduce_mul_pd, _mm512_mask_reduce_mul_pd  Reduce float64 elements by multiplication.  None.
_mm512_reduce_mul_ps, _mm512_mask_reduce_mul_ps  Reduce float32 elements by multiplication.  None.

variable  definition

k         writemask
a         first source vector element


_mm512_reduce_add_pd

extern double __cdecl _mm512_reduce_add_pd(__m512d a);

Reduces packed float64 elements in a by addition.

Returns the sum of all elements in a.
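
As an illustration, the following minimal sketch sums eight doubles with _mm512_reduce_add_pd. The helper name sum8_pd, the use of _mm512_loadu_pd to fill the vector, and the scalar fallback selected by the __AVX512F__ macro are assumptions of this example and are not part of the reference above. The order in which the elements are combined is not stated on this page, so results may differ in rounding from a sequential loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: sum exactly eight doubles starting at p. */
double sum8_pd(const double *p)
{
    assert(p != NULL);                /* caller must supply 8 readable doubles */
#if defined(__AVX512F__)
    __m512d a = _mm512_loadu_pd(p);   /* unaligned load of 8 x float64 */
    return _mm512_reduce_add_pd(a);   /* sum of all 8 elements */
#else
    /* Scalar fallback for builds without AVX-512F (an assumption of this
       sketch, so the example also compiles on other targets). */
    double s = 0.0;
    for (size_t i = 0; i < 8; ++i)
        s += p[i];
    return s;
#endif
}
```

For example, calling sum8_pd on the values 1.0 through 8.0 is expected to return 36.0, since every partial sum is exactly representable.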



_mm512_mask_reduce_add_pd

extern double __cdecl _mm512_mask_reduce_add_pd(__mmask8 k, __m512d a);

Reduces packed float64 elements in a by addition, using writemask k.

Returns the sum of all active elements in a.
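
One plausible use of the masked form is the final partial vector of an array whose length is not a multiple of eight. The sketch below accumulates full vectors with _mm512_add_pd and handles the remaining elements with _mm512_mask_reduce_add_pd. The helper name sum_pd, the mask construction, and the use of _mm512_maskz_loadu_pd are assumptions of this example; the scalar branch exists only so the code compiles without AVX-512F.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: sum n doubles, masking the final partial vector. */
double sum_pd(const double *p, size_t n)
{
    assert(p != NULL || n == 0);
    double total = 0.0;
#if defined(__AVX512F__)
    __m512d acc = _mm512_setzero_pd();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)                        /* full vectors */
        acc = _mm512_add_pd(acc, _mm512_loadu_pd(p + i));
    total = _mm512_reduce_add_pd(acc);

    size_t rem = n - i;                               /* 0..7 elements left */
    if (rem > 0) {
        __mmask8 k = (__mmask8)((1u << rem) - 1u);    /* low rem bits set */
        __m512d a = _mm512_maskz_loadu_pd(k, p + i);  /* inactive lanes zeroed */
        total += _mm512_mask_reduce_add_pd(k, a);     /* active lanes only */
    }
#else
    /* Scalar fallback (assumption of this sketch). */
    for (size_t i = 0; i < n; ++i)
        total += p[i];
#endif
    return total;
}
```

Accumulating in a vector and reducing once after the loop keeps the reduction sequence out of the hot loop, which is a design choice of this sketch rather than a requirement of the intrinsic.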



_mm512_reduce_add_ps

extern float __cdecl _mm512_reduce_add_ps(__m512 a);

Reduces packed float32 elements in a by addition.

Returns the sum of all elements in a.



_mm512_mask_reduce_add_ps

extern float __cdecl _mm512_mask_reduce_add_ps(__mmask16 k, __m512 a);

Reduces packed float32 elements in a by addition, using writemask k.

Returns the sum of all active elements in a.
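
The mask does not have to be a constant; it can come from a vector comparison. As a hedged sketch, the following sums only the strictly positive elements of sixteen floats by building k with _mm512_cmp_ps_mask. The helper name sum_positive_ps, the choice of the _CMP_GT_OQ predicate, and the scalar fallback are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: sum the strictly positive values among 16 floats. */
float sum_positive_ps(const float *p)
{
    assert(p != NULL);
#if defined(__AVX512F__)
    __m512 a = _mm512_loadu_ps(p);
    /* Bit i of k is set when a[i] > 0 (ordered, non-signaling compare). */
    __mmask16 k = _mm512_cmp_ps_mask(a, _mm512_setzero_ps(), _CMP_GT_OQ);
    return _mm512_mask_reduce_add_ps(k, a);   /* sum of active elements */
#else
    /* Scalar fallback (assumption of this sketch). */
    float s = 0.0f;
    for (size_t i = 0; i < 16; ++i)
        if (p[i] > 0.0f)
            s += p[i];
    return s;
#endif
}
```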



_mm512_reduce_max_pd

extern double __cdecl _mm512_reduce_max_pd(__m512d a);

Reduces packed float64 elements in a by maximum.

Returns the maximum of all elements in a.



_mm512_mask_reduce_max_pd

extern double __cdecl _mm512_mask_reduce_max_pd(__mmask8 k, __m512d a);

Reduces packed float64 elements in a by maximum, using writemask k.

Returns the maximum of all active elements in a.
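
The following sketch shows why the mask matters for a maximum: after a zero-filling masked load, the inactive lanes hold 0.0, which would be wrong as a candidate when every real value is negative. Passing the same mask to _mm512_mask_reduce_max_pd restricts the result to the active elements. The helper name max_pd, the mask construction, the scalar fallback, and the assumption that the input contains no NaN values are specific to this example.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: largest of n doubles (n >= 1, no NaNs assumed). */
double max_pd(const double *p, size_t n)
{
    assert(p != NULL && n >= 1);
    double best = p[0];
#if defined(__AVX512F__)
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {                      /* full vectors */
        double m = _mm512_reduce_max_pd(_mm512_loadu_pd(p + i));
        if (m > best) best = m;
    }
    size_t rem = n - i;                               /* 0..7 elements left */
    if (rem > 0) {
        __mmask8 k = (__mmask8)((1u << rem) - 1u);    /* low rem bits set */
        __m512d a = _mm512_maskz_loadu_pd(k, p + i);  /* inactive lanes = 0.0 */
        double m = _mm512_mask_reduce_max_pd(k, a);   /* active lanes only */
        if (m > best) best = m;
    }
#else
    /* Scalar fallback (assumption of this sketch). */
    for (size_t i = 1; i < n; ++i)
        if (p[i] > best) best = p[i];
#endif
    return best;
}
```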



_mm512_reduce_max_ps

extern float __cdecl _mm512_reduce_max_ps(__m512 a);

Reduces packed float32 elements in a by maximum.

Returns the maximum of all elements in a.



_mm512_mask_reduce_max_ps

extern float __cdecl _mm512_mask_reduce_max_ps(__mmask16 k, __m512 a);

Reduces packed float32 elements in a by maximum, using writemask k.

Returns the maximum of all active elements in a.



_mm512_reduce_min_pd

extern double __cdecl _mm512_reduce_min_pd(__m512d a);

Reduces packed float64 elements in a by minimum.

Returns the minimum of all elements in a.



_mm512_mask_reduce_min_pd

extern double __cdecl _mm512_mask_reduce_min_pd(__mmask8 k, __m512d a);

Reduces packed float64 elements in a by minimum, using writemask k.

Returns the minimum of all active elements in a.



_mm512_reduce_min_ps

extern float __cdecl _mm512_reduce_min_ps(__m512 a);

Reduces packed float32 elements in a by minimum.

Returns the minimum of all elements in a.
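
As a small illustration of the unmasked float32 forms, the sketch below computes the range (maximum minus minimum) of sixteen floats by combining _mm512_reduce_min_ps with _mm512_reduce_max_ps on the same vector. The helper name range16_ps and the scalar fallback are assumptions of this example, and the input is assumed to contain no NaN values.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: max minus min over exactly 16 floats. */
float range16_ps(const float *p)
{
    assert(p != NULL);
#if defined(__AVX512F__)
    __m512 a = _mm512_loadu_ps(p);    /* unaligned load of 16 x float32 */
    return _mm512_reduce_max_ps(a) - _mm512_reduce_min_ps(a);
#else
    /* Scalar fallback (assumption of this sketch). */
    float lo = p[0], hi = p[0];
    for (size_t i = 1; i < 16; ++i) {
        if (p[i] < lo) lo = p[i];
        if (p[i] > hi) hi = p[i];
    }
    return hi - lo;
#endif
}
```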



_mm512_mask_reduce_min_ps

extern float __cdecl _mm512_mask_reduce_min_ps(__mmask16 k, __m512 a);

Reduces packed float32 elements in a by minimum, using writemask k.

Returns the minimum of all active elements in a.



_mm512_reduce_mul_pd

extern double __cdecl _mm512_reduce_mul_pd(__m512d a);

Reduces packed float64 elements in a by multiplication.

Returns the product of all elements in a.



_mm512_mask_reduce_mul_pd

extern double __cdecl _mm512_mask_reduce_mul_pd(__mmask8 k, __m512d a);

Reduces packed float64 elements in a by multiplication, using writemask k.

Returns the product of all active elements in a.
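
A masked product can skip elements that would otherwise zero out the result. The sketch below multiplies only the nonzero elements of eight doubles, with the mask produced by _mm512_cmp_pd_mask. The helper name product_nonzero_pd, the _CMP_NEQ_UQ predicate (chosen to mirror the C != operator), and the scalar fallback are assumptions of this example. This page does not describe the result when no element is active, so the example does not rely on that case; the fallback's value of 1.0 for an all-zero input is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: product of the nonzero values among 8 doubles. */
double product_nonzero_pd(const double *p)
{
    assert(p != NULL);
#if defined(__AVX512F__)
    __m512d a = _mm512_loadu_pd(p);
    /* Bit i of k is set when a[i] != 0.0. */
    __mmask8 k = _mm512_cmp_pd_mask(a, _mm512_setzero_pd(), _CMP_NEQ_UQ);
    return _mm512_mask_reduce_mul_pd(k, a);   /* product of active elements */
#else
    /* Scalar fallback (assumption of this sketch). */
    double prod = 1.0;
    for (size_t i = 0; i < 8; ++i)
        if (p[i] != 0.0)
            prod *= p[i];
    return prod;
#endif
}
```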



_mm512_reduce_mul_ps

extern float __cdecl _mm512_reduce_mul_ps(__m512 a);

Reduces packed float32 elements in a by multiplication.

Returns the product of all elements in a.



_mm512_mask_reduce_mul_ps

extern float __cdecl _mm512_mask_reduce_mul_ps(__mmask16 k, __m512 a);

Reduces packed float32 elements in a by multiplication, using writemask k.

Returns the product of all active elements in a.