Intrinsics for FP Reduction Operations

Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 3/31/2023

The prototypes for the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h header file, which makes these prototypes available, as follows:

#include <immintrin.h>

Intrinsic Name	Operation	Corresponding Intel® AVX-512 Instruction
`_mm512_reduce_add_pd`, `_mm512_mask_reduce_add_pd`	Reduce float64 elements by addition.	None.
`_mm512_reduce_add_ps`, `_mm512_mask_reduce_add_ps`	Reduce float32 elements by addition.	None.
`_mm512_reduce_max_pd`, `_mm512_mask_reduce_max_pd`	Reduce float64 elements by maximum.	None.
`_mm512_reduce_max_ps`, `_mm512_mask_reduce_max_ps`	Reduce float32 elements by maximum.	None.
`_mm512_reduce_min_pd`, `_mm512_mask_reduce_min_pd`	Reduce float64 elements by minimum.	None.
`_mm512_reduce_min_ps`, `_mm512_mask_reduce_min_ps`	Reduce float32 elements by minimum.	None.
`_mm512_reduce_mul_pd`, `_mm512_mask_reduce_mul_pd`	Reduce float64 elements by multiplication.	None.
`_mm512_reduce_mul_ps`, `_mm512_mask_reduce_mul_ps`	Reduce float32 elements by multiplication.	None.

Variable	Definition
`k`	writemask; bit i of `k` selects whether element i of `a` is active (included in the reduction)
`a`	source vector
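As an illustration of how the writemask is interpreted, the following scalar sketch models `_mm512_mask_reduce_add_pd`: bit i of `k` decides whether element i of `a` takes part in the sum. The function name `ref_mask_reduce_add_f64` is hypothetical and is not part of the intrinsics API. The sketch is only a reference model; the sequence the compiler generates may add the elements in a different order, so results can differ in the last bits for inputs that are not exactly representable.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_mask_reduce_add_pd (illustration only).
   a holds the 8 float64 elements of the source vector; k is the 8-bit mask. */
double ref_mask_reduce_add_f64(uint8_t k, const double a[8])
{
    double sum = 0.0;
    for (int i = 0; i < 8; ++i) {
        if ((k >> i) & 1u)      /* element i is active when bit i of k is set */
            sum += a[i];
    }
    return sum;
}
```

In this model, `k = 0x0F` sums only elements 0 through 3, and `k = 0xFF` behaves like the unmasked `_mm512_reduce_add_pd`.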

_mm512_reduce_add_pd

extern double __cdecl _mm512_reduce_add_pd(__m512d a);

Reduces packed float64 elements in `a` by addition.

Returns the sum of all elements in `a`.

_mm512_mask_reduce_add_pd

extern double __cdecl _mm512_mask_reduce_add_pd(__mmask8 k, __m512d a);

Reduces packed float64 elements in `a` by addition, using writemask `k`.

Returns the sum of all active elements in `a`.

_mm512_reduce_add_ps

extern float __cdecl _mm512_reduce_add_ps(__m512 a);

Reduces packed float32 elements in `a` by addition.

Returns the sum of all elements in `a`.

_mm512_mask_reduce_add_ps

extern float __cdecl _mm512_mask_reduce_add_ps(__mmask16 k, __m512 a);

Reduces packed float32 elements in `a` by addition, using writemask `k`.

Returns the sum of all active elements in `a`.
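A common use of the addition reductions is collapsing a vector accumulator to a scalar at the end of a loop, with the masked form handling a partial final vector. The following is a minimal sketch under stated assumptions: the helper name `sum_f64` is hypothetical, the loads use `_mm512_loadu_pd` and `_mm512_maskz_loadu_pd`, and a scalar path is used when AVX-512F is not enabled at compile time.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not part of the intrinsics API): sums n doubles. */
double sum_f64(const double *data, size_t n)
{
    double total = 0.0;
    size_t i = 0;
#if defined(__AVX512F__)
    __m512d acc = _mm512_setzero_pd();
    /* Accumulate 8 elements per iteration; reduce once after the loop. */
    for (; i + 8 <= n; i += 8)
        acc = _mm512_add_pd(acc, _mm512_loadu_pd(data + i));
    total = _mm512_reduce_add_pd(acc);
    /* Tail of 1 to 7 elements: load and sum only the active elements. */
    if (i < n) {
        __mmask8 k = (__mmask8)((1u << (n - i)) - 1u);
        __m512d tail = _mm512_maskz_loadu_pd(k, data + i);
        total += _mm512_mask_reduce_add_pd(k, tail);
        i = n;
    }
#endif
    /* Scalar path: handles everything when AVX-512F is not enabled. */
    for (; i < n; ++i)
        total += data[i];
    return total;
}
```

Reducing once after the loop, rather than once per iteration, keeps the multi-instruction reduction sequence out of the hot path. Because the vector path adds elements in a different order than a sequential loop, the two paths may round differently for inputs that are not exactly representable.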

_mm512_reduce_max_pd

extern double __cdecl _mm512_reduce_max_pd(__m512d a);

Reduces packed float64 elements in `a` by maximum.

Returns the maximum of all elements in `a`.

_mm512_mask_reduce_max_pd

extern double __cdecl _mm512_mask_reduce_max_pd(__mmask8 k, __m512d a);

Reduces packed float64 elements in `a` by maximum, using writemask `k`.

Returns the maximum of all active elements in `a`.

_mm512_reduce_max_ps

extern float __cdecl _mm512_reduce_max_ps(__m512 a);

Reduces packed float32 elements in `a` by maximum.

Returns the maximum of all elements in `a`.

_mm512_mask_reduce_max_ps

extern float __cdecl _mm512_mask_reduce_max_ps(__mmask16 k, __m512 a);

Reduces packed float32 elements in `a` by maximum, using writemask `k`.

Returns the maximum of all active elements in `a`.
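The writemask for a masked maximum does not have to be a constant; it can come from a vector comparison. The sketch below finds the largest of 16 floats that lies strictly below a limit. The helper name `max_below_f32` is hypothetical, and the mask is built with `_mm512_cmp_ps_mask`. Since this section does not state what is returned when no element is active, the sketch checks for an empty mask instead of relying on that result.

```c
#include <stdbool.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: stores in *out the largest of the 16 floats in v that
   is strictly below limit. Returns false when no element qualifies. */
bool max_below_f32(const float v[16], float limit, float *out)
{
#if defined(__AVX512F__)
    __m512 a = _mm512_loadu_ps(v);
    /* Bit i of k is set when v[i] < limit (ordered, non-signaling compare). */
    __mmask16 k = _mm512_cmp_ps_mask(a, _mm512_set1_ps(limit), _CMP_LT_OQ);
    if (k == 0)
        return false;   /* do not rely on the result for an empty mask */
    *out = _mm512_mask_reduce_max_ps(k, a);
    return true;
#else
    /* Scalar path used when AVX-512F is not enabled at compile time. */
    bool found = false;
    for (int i = 0; i < 16; ++i) {
        if (v[i] < limit && (!found || v[i] > *out)) {
            *out = v[i];
            found = true;
        }
    }
    return found;
#endif
}
```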

_mm512_reduce_min_pd

extern double __cdecl _mm512_reduce_min_pd(__m512d a);

Reduces packed float64 elements in `a` by minimum.

Returns the minimum of all elements in `a`.

_mm512_mask_reduce_min_pd

extern double __cdecl _mm512_mask_reduce_min_pd(__mmask8 k, __m512d a);

Reduces packed float64 elements in `a` by minimum, using writemask `k`.

Returns the minimum of all active elements in `a`.

_mm512_reduce_min_ps

extern float __cdecl _mm512_reduce_min_ps(__m512 a);

Reduces packed float32 elements in `a` by minimum.

Returns the minimum of all elements in `a`.

_mm512_mask_reduce_min_ps

extern float __cdecl _mm512_mask_reduce_min_ps(__mmask16 k, __m512 a);

Reduces packed float32 elements in `a` by minimum, using writemask `k`.

Returns the minimum of all active elements in `a`.
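One way the masked minimum might be used is to exclude NaN elements from the reduction. In the sketch below, the helper name `min_ignoring_nan_f64` is hypothetical; the mask is built by comparing the vector with itself using `_mm512_cmp_pd_mask` and the `_CMP_ORD_Q` predicate, which is true only for elements that are not NaN. An all-NaN input produces an empty mask, which the sketch reports instead of calling the reduction.

```c
#include <math.h>
#include <stdbool.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: stores in *out the smallest non-NaN value among the
   8 doubles in v. Returns false when every element is NaN. */
bool min_ignoring_nan_f64(const double v[8], double *out)
{
#if defined(__AVX512F__)
    __m512d a = _mm512_loadu_pd(v);
    /* An element compares ordered with itself only when it is not NaN. */
    __mmask8 k = _mm512_cmp_pd_mask(a, a, _CMP_ORD_Q);
    if (k == 0)
        return false;   /* no active elements: nothing to reduce */
    *out = _mm512_mask_reduce_min_pd(k, a);
    return true;
#else
    /* Scalar path used when AVX-512F is not enabled at compile time. */
    bool found = false;
    for (int i = 0; i < 8; ++i) {
        if (!isnan(v[i]) && (!found || v[i] < *out)) {
            *out = v[i];
            found = true;
        }
    }
    return found;
#endif
}
```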

_mm512_reduce_mul_pd

extern double __cdecl _mm512_reduce_mul_pd(__m512d a);

Reduces packed float64 elements in `a` by multiplication.

Returns the product of all elements in `a`.

_mm512_mask_reduce_mul_pd

extern double __cdecl _mm512_mask_reduce_mul_pd(__mmask8 k, __m512d a);

Reduces packed float64 elements in `a` by multiplication, using writemask `k`.

Returns the product of all active elements in `a`.

_mm512_reduce_mul_ps

extern float __cdecl _mm512_reduce_mul_ps(__m512 a);

Reduces packed float32 elements in `a` by multiplication.

Returns the product of all elements in `a`.

_mm512_mask_reduce_mul_ps

extern float __cdecl _mm512_mask_reduce_mul_ps(__mmask16 k, __m512 a);

Reduces packed float32 elements in `a` by multiplication, using writemask `k`.

Returns the product of all active elements in `a`.
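The masked multiplication reduction can be sketched on a short array that fills only part of a vector. The helper name `product_first_n_f32` is hypothetical, and the sketch assumes at most 16 elements. The same mask is used for the load and for the reduction, so the zeroed inactive elements of the loaded vector do not take part in the product.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: product of the first n floats in data, for n <= 16.
   Returns 1.0f (the empty product) when n is 0. */
float product_first_n_f32(const float *data, size_t n)
{
    if (n == 0)
        return 1.0f;
    if (n > 16)
        n = 16;     /* this sketch handles at most one vector */
#if defined(__AVX512F__)
    /* Set the low n bits of the mask; n == 16 selects every element. */
    __mmask16 k = (__mmask16)((1u << n) - 1u);
    /* Load only the active elements; inactive elements are zeroed. */
    __m512 a = _mm512_maskz_loadu_ps(k, data);
    return _mm512_mask_reduce_mul_ps(k, a);
#else
    /* Scalar path used when AVX-512F is not enabled at compile time. */
    float prod = 1.0f;
    for (size_t i = 0; i < n; ++i)
        prod *= data[i];
    return prod;
#endif
}
```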
