Intrinsics for FP Shuffle Operations

Intel® C++ Compiler Classic Developer Guide and Reference


ID 767249

Date 3/31/2023


The prototypes for Intel® Advanced Vector Extensions 512 (Intel® AVX-512) intrinsics are located in the zmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

Intrinsic Name	Operation	Corresponding Intel® AVX-512 Instruction
`_mm512_shuffle_pd`, `_mm512_mask_shuffle_pd`, `_mm512_maskz_shuffle_pd`	Shuffle float64 values.	`VSHUFPD`
`_mm512_shuffle_ps`, `_mm512_mask_shuffle_ps`, `_mm512_maskz_shuffle_ps`	Shuffle float32 values.	`VSHUFPS`
`_mm512_shuffle_f64x2`, `_mm512_mask_shuffle_f64x2`, `_mm512_maskz_shuffle_f64x2`	Shuffle 128-bit blocks, each holding two float64 values.	`VSHUFF64X2`
`_mm512_shuffle_f32x4`, `_mm512_mask_shuffle_f32x4`, `_mm512_maskz_shuffle_f32x4`	Shuffle 128-bit blocks, each holding four float32 values.	`VSHUFF32X4`

variable	definition
`k`	writemask used as a selector
`a`	first source vector
`b`	second source vector
`src`	source vector whose elements are copied to the result where the writemask bit is not set
`imm`	8-bit immediate that selects the elements or 128-bit blocks to shuffle

_mm512_shuffle_f32x4

extern __m512 __cdecl _mm512_shuffle_f32x4(__m512 a, __m512 b, const int imm);

Shuffles 128-bit blocks (each composed of four float32 elements) from a and b, selected by imm, and stores the result.
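The block selection can be illustrated with a scalar model. The following is a minimal sketch, not Intel code: the names shuffle_f32x4_ref and upper_a_lower_b are hypothetical, and the model assumes the VSHUFF32X4 layout in which each pair of imm bits selects one 128-bit block, with result blocks 0 and 1 taken from a and result blocks 2 and 3 taken from b. The intrinsic call is compiled only when the compiler targets Intel® AVX-512.

```c
#include <string.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical scalar model of _mm512_shuffle_f32x4 (illustration only).
   A 512-bit vector is viewed as four 128-bit blocks of four floats each. */
void shuffle_f32x4_ref(float dst[16], const float a[16], const float b[16], int imm)
{
    float tmp[16];                              /* temporary, so dst may alias a or b */
    for (int blk = 0; blk < 4; ++blk) {
        const float *src = (blk < 2) ? a : b;   /* result blocks 0-1 read a, 2-3 read b */
        int sel = (imm >> (2 * blk)) & 3;       /* 2-bit source block index */
        memcpy(&tmp[4 * blk], &src[4 * sel], 4 * sizeof(float));
    }
    memcpy(dst, tmp, sizeof tmp);
}

#if defined(__AVX512F__)
/* imm is a compile-time constant. 0x4E is 01 00 11 10 in binary, so the result
   holds blocks 2 and 3 of a followed by blocks 0 and 1 of b. */
static inline __m512 upper_a_lower_b(__m512 a, __m512 b)
{
    return _mm512_shuffle_f32x4(a, b, 0x4E);
}
#endif
```

Calling shuffle_f32x4_ref with imm set to 0x4E reproduces, element by element, the block order that the wrapper requests from the intrinsic.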

_mm512_mask_shuffle_f32x4

extern __m512 __cdecl _mm512_mask_shuffle_f32x4(__m512 src, __mmask16 k, __m512 a, __m512 b, const int imm);

Shuffles 128-bit blocks (each composed of four float32 elements) from a and b, selected by imm, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm512_maskz_shuffle_f32x4

extern __m512 __cdecl _mm512_maskz_shuffle_f32x4(__mmask16 k, __m512 a, __m512 b, const int imm);

Shuffles 128-bit blocks (each composed of four float32 elements) from a and b, selected by imm, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
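The difference between the writemask and zeromask forms can be sketched in scalar C. This is an illustrative model only: apply_writemask_f32 and apply_zeromask_f32 are hypothetical names, and shuffled stands for the result the unmasked shuffle would produce. Note that the mask has one bit per float32 element, not one bit per 128-bit block.

```c
#include <stdint.h>

/* Hypothetical model of the writemask (_mask_) form: bit i of k decides
   whether result element i takes the shuffled value or keeps src[i]. */
void apply_writemask_f32(float dst[16], const float src[16],
                         uint16_t k, const float shuffled[16])
{
    for (int i = 0; i < 16; ++i)
        dst[i] = ((k >> i) & 1) ? shuffled[i] : src[i];
}

/* Hypothetical model of the zeromask (_maskz_) form: elements whose
   mask bit is clear are set to zero instead of being copied from src. */
void apply_zeromask_f32(float dst[16], uint16_t k, const float shuffled[16])
{
    for (int i = 0; i < 16; ++i)
        dst[i] = ((k >> i) & 1) ? shuffled[i] : 0.0f;
}
```

With k set to 0x00F0, for example, only elements 4 through 7 receive shuffled values; the remaining elements come from src in the writemask form and are zero in the zeromask form.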

_mm512_shuffle_f64x2

extern __m512d __cdecl _mm512_shuffle_f64x2(__m512d a, __m512d b, const int imm);

Shuffles 128-bit blocks (each composed of two float64 elements) from a and b, selected by imm, and stores the result.
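Building imm from four block indices is easier to read than a raw hexadecimal constant. The sketch below assumes the VSHUFF64X2 encoding, in which two bits of imm select the source block for each result block. BLOCK_IMM and shuffle_f64x2_ref are hypothetical names introduced for illustration; they are not defined in any Intel header.

```c
#include <string.h>

/* Hypothetical helper: pack four 2-bit block indices into imm.
   r0 selects the source block for result block 0, and so on. */
#define BLOCK_IMM(r0, r1, r2, r3) \
    (((r3) << 6) | ((r2) << 4) | ((r1) << 2) | (r0))

/* Hypothetical scalar model of _mm512_shuffle_f64x2 (illustration only).
   A 512-bit vector is viewed as four 128-bit blocks of two doubles each. */
void shuffle_f64x2_ref(double dst[8], const double a[8], const double b[8], int imm)
{
    double tmp[8];                              /* temporary, so dst may alias a or b */
    for (int blk = 0; blk < 4; ++blk) {
        const double *src = (blk < 2) ? a : b;  /* result blocks 0-1 read a, 2-3 read b */
        int sel = (imm >> (2 * blk)) & 3;       /* 2-bit source block index */
        memcpy(&tmp[2 * blk], &src[2 * sel], 2 * sizeof(double));
    }
    memcpy(dst, tmp, sizeof tmp);
}
```

Because BLOCK_IMM expands to a constant expression when its arguments are constants, the same macro could also supply the imm argument of the intrinsic itself.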

_mm512_mask_shuffle_f64x2

extern __m512d __cdecl _mm512_mask_shuffle_f64x2(__m512d src, __mmask8 k, __m512d a, __m512d b, const int imm);

Shuffles 128-bit blocks (each composed of two float64 elements) from a and b, selected by imm, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm512_maskz_shuffle_f64x2

extern __m512d __cdecl _mm512_maskz_shuffle_f64x2(__mmask8 k, __m512d a, __m512d b, const int imm);

Shuffles 128-bit blocks (each composed of two float64 elements) from a and b, selected by imm, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_shuffle_pd

extern __m512d __cdecl _mm512_shuffle_pd(__m512d a, __m512d b, const int imm);

Shuffles float64 elements from vectors a and b within 128-bit lanes using the control in imm, and stores the result.
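The in-lane behavior can be sketched with a scalar model. The function name shuffle_pd_ref is hypothetical, and the model assumes the VSHUFPD encoding: bit i of imm controls result element i, even result elements are taken from a, odd result elements are taken from b, and the bit chooses the low or high element of the same 128-bit lane.

```c
/* Hypothetical scalar model of _mm512_shuffle_pd (illustration only).
   Each 128-bit lane holds two doubles; no element crosses a lane boundary. */
void shuffle_pd_ref(double dst[8], const double a[8], const double b[8], int imm)
{
    double tmp[8];                              /* temporary, so dst may alias a or b */
    for (int i = 0; i < 8; ++i) {
        const double *src = (i & 1) ? b : a;    /* even results read a, odd results read b */
        int base = i & ~1;                      /* index of the first element in this lane */
        int pick = (imm >> i) & 1;              /* 0 = low element, 1 = high element */
        tmp[i] = src[base + pick];
    }
    for (int i = 0; i < 8; ++i)
        dst[i] = tmp[i];
}
```

Under this model, an imm of 0x00 interleaves the low elements of each lane of a and b, and an imm of 0xFF interleaves the high elements.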

_mm512_mask_shuffle_pd

extern __m512d __cdecl _mm512_mask_shuffle_pd(__m512d src, __mmask8 k, __m512d a, __m512d b, const int imm);

Shuffles float64 elements from vectors a and b within 128-bit lanes using the control in imm, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm512_maskz_shuffle_pd

extern __m512d __cdecl _mm512_maskz_shuffle_pd(__mmask8 k, __m512d a, __m512d b, const int imm);

Shuffles float64 elements from vectors a and b within 128-bit lanes using the control in imm, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).

_mm512_shuffle_ps

extern __m512 __cdecl _mm512_shuffle_ps(__m512 a, __m512 b, const int imm);

Shuffles float32 elements from vectors a and b within 128-bit lanes using the control in imm, and stores the result.
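A scalar model makes the per-lane selection explicit. The name shuffle_ps_ref is hypothetical, and the sketch assumes the VSHUFPS encoding: the same imm is applied to each of the four 128-bit lanes, the two low result elements of a lane are selected from a, and the two high result elements are selected from b, with two bits of imm per result element.

```c
/* Hypothetical scalar model of _mm512_shuffle_ps (illustration only).
   Each 128-bit lane holds four floats; no element crosses a lane boundary. */
void shuffle_ps_ref(float dst[16], const float a[16], const float b[16], int imm)
{
    float tmp[16];                                  /* temporary, so dst may alias a or b */
    for (int lane = 0; lane < 4; ++lane) {
        for (int j = 0; j < 4; ++j) {
            const float *src = (j < 2) ? a : b;     /* results 0-1 read a, 2-3 read b */
            int sel = (imm >> (2 * j)) & 3;         /* 2-bit element index within the lane */
            tmp[4 * lane + j] = src[4 * lane + sel];
        }
    }
    for (int i = 0; i < 16; ++i)
        dst[i] = tmp[i];
}
```

For example, an imm of 0x1B (00 01 10 11 in binary) places elements 3 and 2 of each lane of a in the low half of the result lane and elements 1 and 0 of the matching lane of b in the high half.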

_mm512_mask_shuffle_ps

extern __m512 __cdecl _mm512_mask_shuffle_ps(__m512 src, __mmask16 k, __m512 a, __m512 b, const int imm);

Shuffles float32 elements from vectors a and b within 128-bit lanes using the control in imm, and stores the result using writemask k (elements are copied from src when the corresponding mask bit is not set).

_mm512_maskz_shuffle_ps

extern __m512 __cdecl _mm512_maskz_shuffle_ps(__mmask16 k, __m512 a, __m512 b, const int imm);

Shuffles float32 elements from vectors a and b within 128-bit lanes using the control in imm, and stores the result using zeromask k (elements are zeroed out when the corresponding mask bit is not set).
