Integer Intrinsics

Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 7/13/2023

The prototypes for Intel® Streaming SIMD Extensions (Intel® SSE) intrinsics for integer operations are in the xmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

The result of each intrinsic operation is placed in a register. The tables below, in the detailed explanation of each intrinsic, show what is placed in each element of the result: R denotes a scalar result, and R0, R1, ..., R7 denote the elements (words or bytes) of a 64-bit result, with R0 the least significant. Similarly, a0, a1, ... and b0, b1, ... denote the elements of the operands a and b.

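These intrinsics use the MMX™ technology registers, which alias the x87 floating-point register stack. After using them, and before any x87 floating-point code runs, you must empty the multimedia state with _mm_empty(), which generates the EMMS instruction. See The EMMS Instruction: Why You Need It for more details.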

Intrinsic Name	Operation	Corresponding Intel® SSE Instruction
_mm_extract_pi16	Extract one of four words	PEXTRW
_mm_insert_pi16	Insert word	PINSRW
_mm_max_pi16	Compute maximum	PMAXSW
_mm_max_pu8	Compute maximum, unsigned	PMAXUB
_mm_min_pi16	Compute minimum	PMINSW
_mm_min_pu8	Compute minimum, unsigned	PMINUB
_mm_movemask_pi8	Create eight-bit mask	PMOVMSKB
_mm_mulhi_pu16	Multiply, return high bits	PMULHUW
_mm_shuffle_pi16	Return a combination of four words	PSHUFW
_mm_maskmove_si64	Conditional store	MASKMOVQ
_mm_avg_pu8	Compute rounded average	PAVGB
_mm_avg_pu16	Compute rounded average	PAVGW
_mm_sad_pu8	Compute sum of absolute differences	PSADBW

_mm_extract_pi16

int _mm_extract_pi16(__m64 a, int n);

Extracts one of the four words of a. The selector n must be an immediate.

R
(n==0) ? a0 : ( (n==1) ? a1 : ( (n==2) ? a2 : a3 ) )
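A minimal sketch, assuming an x86-64 target built with GCC or Clang; extract_all_words is a hypothetical helper, not part of the compiler's API. Because n must be an immediate, each lane is read with a literal selector. PEXTRW zero-extends the selected word, so a lane holding -1 comes back as 65535.

```c
#include <immintrin.h>  /* brings in the MMX/SSE __m64 intrinsics */

/* Hypothetical helper: reads all four words of v into out[0..3].
   The selector must be an immediate, so each lane uses a literal 0..3. */
static void extract_all_words(__m64 v, int out[4])
{
    out[0] = _mm_extract_pi16(v, 0);
    out[1] = _mm_extract_pi16(v, 1);
    out[2] = _mm_extract_pi16(v, 2);
    out[3] = _mm_extract_pi16(v, 3);  /* word is zero-extended to int */
    _mm_empty();  /* clear the MMX state (EMMS) before any x87 code */
}
```

For example, extract_all_words(_mm_setr_pi16(10, 20, 30, -1), w) fills w with 10, 20, 30 and 65535.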

_mm_insert_pi16

__m64 _mm_insert_pi16(__m64 a, int d, int n);

Inserts the low 16 bits of d into the word of a selected by n; the other three words are copied from a. The selector n must be an immediate.

R0	R1	R2	R3
(n==0) ? d : a0;	(n==1) ? d : a1;	(n==2) ? d : a2;	(n==3) ? d : a3;
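The sketch below replaces one lane and reads the vector back, assuming an x86-64 GCC or Clang build; insert_word2 is a hypothetical name. memcpy is used to copy the result out because element 0 sits at the lowest address.

```c
#include <immintrin.h>
#include <string.h>  /* memcpy */

/* Hypothetical helper: replaces word 2 of v with the low 16 bits of d and
   copies the four resulting words to out (out[0] is word 0). */
static void insert_word2(__m64 v, int d, short out[4])
{
    __m64 r = _mm_insert_pi16(v, d, 2);  /* selector 2 is an immediate */
    memcpy(out, &r, sizeof r);           /* words are stored lowest first */
    _mm_empty();                         /* clear MMX state (EMMS) */
}
```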

_mm_max_pi16

__m64 _mm_max_pi16(__m64 a, __m64 b);

Computes the element-wise maximum of the words in a and b.

R0	R1	R2	R3
max(a0, b0)	max(a1, b1)	max(a2, b2)	max(a3, b3)
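As a rough illustration (x86-64, GCC or Clang assumed; max_words is hypothetical), the comparison is signed, so -5 beats -6 and 32767 beats -300:

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: element-wise signed maximum of four 16-bit words. */
static void max_words(__m64 a, __m64 b, short out[4])
{
    __m64 r = _mm_max_pi16(a, b);  /* PMAXSW compares as signed shorts */
    memcpy(out, &r, sizeof r);
    _mm_empty();                   /* clear MMX state (EMMS) */
}
```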

_mm_max_pu8

__m64 _mm_max_pu8(__m64 a, __m64 b);

Computes the element-wise maximum of the unsigned bytes in a and b.

R0	R1	...	R7
max(a0, b0)	max(a1, b1)	...	max(a7, b7)
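A minimal sketch, assuming an x86-64 GCC or Clang build; max_bytes_u8 is a hypothetical helper that loads plain unsigned char arrays with memcpy. Because the comparison is unsigned, 200 wins over 100 here, which a signed byte comparison would not give.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: element-wise unsigned maximum of eight bytes. */
static void max_bytes_u8(const unsigned char a[8], const unsigned char b[8],
                         unsigned char out[8])
{
    __m64 va, vb;
    memcpy(&va, a, 8);               /* byte i of the array becomes element i */
    memcpy(&vb, b, 8);
    __m64 r = _mm_max_pu8(va, vb);   /* PMAXUB treats bytes as 0..255 */
    memcpy(out, &r, 8);
    _mm_empty();                     /* clear MMX state (EMMS) */
}
```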

_mm_min_pi16

__m64 _mm_min_pi16(__m64 a, __m64 b);

Computes the element-wise minimum of the words in a and b.

R0	R1	R2	R3
min(a0, b0)	min(a1, b1)	min(a2, b2)	min(a3, b3)
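One common pattern is clamping. The sketch below, assuming x86-64 with GCC or Clang, combines _mm_max_pi16 and _mm_min_pi16 to limit each signed word to [lo, hi]; clamp_words is hypothetical, and _mm_set1_pi16 is the MMX broadcast intrinsic from mmintrin.h.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: clamps each signed word of x to [lo, hi].
   max raises values below lo; min then caps values above hi. */
static void clamp_words(__m64 x, short lo, short hi, short out[4])
{
    __m64 vlo = _mm_set1_pi16(lo);   /* lo in all four lanes */
    __m64 vhi = _mm_set1_pi16(hi);   /* hi in all four lanes */
    __m64 r = _mm_min_pi16(_mm_max_pi16(x, vlo), vhi);
    memcpy(out, &r, sizeof r);
    _mm_empty();                     /* clear MMX state (EMMS) */
}
```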

_mm_min_pu8

__m64 _mm_min_pu8(__m64 a, __m64 b);

Computes the element-wise minimum of the unsigned bytes in a and b.

R0	R1	...	R7
min(a0, b0)	min(a1, b1)	...	min(a7, b7)
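A sketch of per-byte absolute difference, assuming an x86-64 GCC or Clang build; absdiff_u8 is hypothetical, and _mm_sub_pi8 is the MMX byte subtraction from mmintrin.h rather than one of the intrinsics above. Since max minus min is never negative, the wrapping subtraction gives the exact result.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: per-byte |a[i] - b[i]| for unsigned bytes. */
static void absdiff_u8(const unsigned char a[8], const unsigned char b[8],
                       unsigned char out[8])
{
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    /* max(a, b) - min(a, b) stays within 0..255, so PSUBB cannot wrap */
    __m64 r = _mm_sub_pi8(_mm_max_pu8(va, vb), _mm_min_pu8(va, vb));
    memcpy(out, &r, 8);
    _mm_empty();   /* clear MMX state (EMMS) */
}
```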

_mm_movemask_pi8

int _mm_movemask_pi8(__m64 a);

Creates an 8-bit mask from the most significant bits of the bytes in a.

R
sign(a7)<<7 | sign(a6)<<6 | ... | sign(a0)
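A minimal sketch, assuming x86-64 with GCC or Clang (sign_mask is a hypothetical name): bytes 0, 2 and 7 below are negative, so bits 0, 2 and 7 of the mask are set, giving 0x85.

```c
#include <immintrin.h>

/* Hypothetical helper: returns an 8-bit mask whose bit i is the most
   significant (sign) bit of byte i of v. */
static int sign_mask(__m64 v)
{
    int m = _mm_movemask_pi8(v);  /* PMOVMSKB */
    _mm_empty();                  /* clear MMX state (EMMS) */
    return m;
}
```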

_mm_mulhi_pu16

__m64 _mm_mulhi_pu16(__m64 a, __m64 b);

Multiplies the unsigned words in a and b, returning the upper 16 bits of the 32-bit intermediate results.

R0	R1	R2	R3
hiword(a0 * b0)	hiword(a1 * b1)	hiword(a2 * b2)	hiword(a3 * b3)
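Keeping only the high word is handy for fixed-point scaling. The sketch below treats frac as a fraction with 16 fractional bits, so each output is floor(x * frac / 65536); it assumes an x86-64 GCC or Clang build, and scale_u16 is hypothetical. With frac = 32768 (one half), 65535 becomes 32767 and 3 becomes 1.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: out[i] = (x[i] * frac) >> 16 for unsigned words. */
static void scale_u16(const unsigned short x[4], unsigned short frac,
                      unsigned short out[4])
{
    unsigned short f[4] = {frac, frac, frac, frac};
    __m64 vx, vf;
    memcpy(&vx, x, sizeof vx);
    memcpy(&vf, f, sizeof vf);
    __m64 r = _mm_mulhi_pu16(vx, vf);  /* PMULHUW: bits 31..16 of each product */
    memcpy(out, &r, sizeof r);
    _mm_empty();                       /* clear MMX state (EMMS) */
}
```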

_mm_shuffle_pi16

__m64 _mm_shuffle_pi16(__m64 a, int n);

Returns a combination of the four words of a. The selector n must be an immediate.

R0	R1	R2	R3
word (n&0x3) of a	word ((n>>2)&0x3) of a	word ((n>>4)&0x3) of a	word ((n>>6)&0x3) of a
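The following sketch, assuming x86-64 with GCC or Clang, shows two common selectors; reverse_words and broadcast_word0 are hypothetical helpers. _MM_SHUFFLE is the selector-building macro from xmmintrin.h, and _MM_SHUFFLE(0, 1, 2, 3) equals 0x1B, so R0 takes word 3 and R3 takes word 0.

```c
#include <immintrin.h>  /* also defines the _MM_SHUFFLE macro */
#include <string.h>

/* Hypothetical helper: reverses the order of the four words of v. */
static void reverse_words(__m64 v, short out[4])
{
    __m64 r = _mm_shuffle_pi16(v, _MM_SHUFFLE(0, 1, 2, 3));  /* n == 0x1B */
    memcpy(out, &r, sizeof r);
    _mm_empty();   /* clear MMX state (EMMS) */
}

/* Hypothetical helper: copies word 0 of v into all four lanes (n == 0). */
static void broadcast_word0(__m64 v, short out[4])
{
    __m64 r = _mm_shuffle_pi16(v, 0);
    memcpy(out, &r, sizeof r);
    _mm_empty();
}
```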

_mm_maskmove_si64

void _mm_maskmove_si64(__m64 d, __m64 n, char *p);

Conditionally stores the bytes of d to address p. The high bit of each byte in the selector n determines whether the corresponding byte of d is stored.

if (sign(n0))	if (sign(n1))	...	if (sign(n7))
p[0] := d0	p[1] := d1	...	p[7] := d7
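A minimal sketch, assuming an x86-64 GCC or Clang build; store_selected is hypothetical. Only bytes whose selector byte has its high bit set (here -128 and -1, but not 127) are written, and the rest of the buffer is left as it was.

```c
#include <immintrin.h>

/* Hypothetical helper: writes byte i of d to p[i] only where byte i of
   mask has its high bit set; other bytes of p are left untouched. */
static void store_selected(__m64 d, __m64 mask, char *p)
{
    _mm_maskmove_si64(d, mask, p);  /* MASKMOVQ, issued with a non-temporal hint */
    _mm_empty();                    /* clear MMX state (EMMS) */
}
```

With a buffer of eight '.' characters, data "ABCDEFGH" and the selector bytes -128, 0, -1, 0, 0, 127, 0, -1, the buffer becomes "A.C....H".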

_mm_avg_pu8

__m64 _mm_avg_pu8(__m64 a, __m64 b);

Computes the (rounded) averages of the unsigned bytes in a and b.

R0	R1	...	R7
(t + 1) >> 1, where t = (unsigned char)a0 + (unsigned char)b0	(t + 1) >> 1, where t = (unsigned char)a1 + (unsigned char)b1	...	(t + 1) >> 1, where t = (unsigned char)a7 + (unsigned char)b7
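A sketch that checks the rounding behavior, assuming x86-64 with GCC or Clang; avg_u8 is a hypothetical helper. Halves round up (2 and 3 average to 3), and because the sum is formed with an extra bit, 255 and 255 average to 255 without overflow.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: per-byte rounded average (a[i] + b[i] + 1) >> 1. */
static void avg_u8(const unsigned char a[8], const unsigned char b[8],
                   unsigned char out[8])
{
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    __m64 r = _mm_avg_pu8(va, vb);  /* PAVGB */
    memcpy(out, &r, 8);
    _mm_empty();                    /* clear MMX state (EMMS) */
}
```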

_mm_avg_pu16

__m64 _mm_avg_pu16(__m64 a, __m64 b);

Computes the (rounded) averages of the unsigned words in a and b.

R0	R1	R2	R3
(t + 1) >> 1, where t = (unsigned int)a0 + (unsigned int)b0	(t + 1) >> 1, where t = (unsigned int)a1 + (unsigned int)b1	(t + 1) >> 1, where t = (unsigned int)a2 + (unsigned int)b2	(t + 1) >> 1, where t = (unsigned int)a3 + (unsigned int)b3
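The same rounding applies to 16-bit lanes. This is a minimal sketch assuming an x86-64 GCC or Clang build, with avg_u16 as a hypothetical helper; 65535 and 65535 average to 65535, and 1 and 2 round up to 2.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: per-word rounded average (a[i] + b[i] + 1) >> 1. */
static void avg_u16(const unsigned short a[4], const unsigned short b[4],
                    unsigned short out[4])
{
    __m64 va, vb;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    __m64 r = _mm_avg_pu16(va, vb);  /* PAVGW */
    memcpy(out, &r, sizeof r);
    _mm_empty();                     /* clear MMX state (EMMS) */
}
```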

_mm_sad_pu8

__m64 _mm_sad_pu8(__m64 a, __m64 b);

Computes the sum of the absolute differences of the unsigned bytes in a and b, returning the value in the lower word. The upper three words are cleared.

R0	R1	R2	R3
abs(a0-b0) +... + abs(a7-b7)	0	0	0
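A sketch of reading the sum back, assuming x86-64 with GCC or Clang; sad_u8 is hypothetical, and _mm_cvtsi64_si32 (from mmintrin.h) returns the low 32 bits of an __m64. Since the upper three words are cleared, those 32 bits hold exactly the 16-bit sum.

```c
#include <immintrin.h>
#include <string.h>

/* Hypothetical helper: sum of |a[i] - b[i]| over eight unsigned bytes. */
static int sad_u8(const unsigned char a[8], const unsigned char b[8])
{
    __m64 va, vb;
    memcpy(&va, a, 8);
    memcpy(&vb, b, 8);
    int sum = _mm_cvtsi64_si32(_mm_sad_pu8(va, vb));  /* PSADBW, low word */
    _mm_empty();                                      /* clear MMX state (EMMS) */
    return sum;
}
```

The largest possible result is 8 × 255 = 2040, which fits comfortably in the low word.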
