Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023
Public

Packed Arithmetic Intrinsics (MMX™ technology)

This topic summarizes the MMX™ technology packed arithmetic intrinsics.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

Intrinsic Name     Operation          Corresponding MMX™ Instruction
_mm_add_pi8        Addition           PADDB
_mm_add_pi16       Addition           PADDW
_mm_add_pi32       Addition           PADDD
_mm_adds_pi8       Addition           PADDSB
_mm_adds_pi16      Addition           PADDSW
_mm_adds_pu8       Addition           PADDUSB
_mm_adds_pu16      Addition           PADDUSW
_mm_sub_pi8        Subtraction        PSUBB
_mm_sub_pi16       Subtraction        PSUBW
_mm_sub_pi32       Subtraction        PSUBD
_mm_subs_pi8       Subtraction        PSUBSB
_mm_subs_pi16      Subtraction        PSUBSW
_mm_subs_pu8       Subtraction        PSUBUSB
_mm_subs_pu16      Subtraction        PSUBUSW
_mm_madd_pi16      Multiply and add   PMADDWD
_mm_mulhi_pi16     Multiplication     PMULHW
_mm_mullo_pi16     Multiplication     PMULLW

_mm_add_pi8

__m64 _mm_add_pi8(__m64 m1, __m64 m2);

Add the eight 8-bit values in m1 to the eight 8-bit values in m2.

_mm_add_pi16

__m64 _mm_add_pi16(__m64 m1, __m64 m2);

Add the four 16-bit values in m1 to the four 16-bit values in m2.

_mm_add_pi32

__m64 _mm_add_pi32(__m64 m1, __m64 m2);

Add the two 32-bit values in m1 to the two 32-bit values in m2.
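The three non-saturating additions differ only in lane width, and a carry out of one lane never reaches its neighbor. The following is a minimal sketch, assuming an x86 target whose compiler still provides the MMX intrinsics; add_lanes is a hypothetical helper name, not part of the intrinsics API.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the intrinsics API): adds two packed
 * 64-bit values lane by lane. lane_bits selects 8, 16, or (by default) 32. */
static uint64_t add_lanes(uint64_t a, uint64_t b, int lane_bits)
{
    __m64 m1, m2, sum;
    uint64_t result;

    /* memcpy moves the raw 64 bits into and out of __m64. */
    memcpy(&m1, &a, sizeof m1);
    memcpy(&m2, &b, sizeof m2);

    switch (lane_bits) {
    case 8:  sum = _mm_add_pi8(m1, m2);  break;  /* PADDB */
    case 16: sum = _mm_add_pi16(m1, m2); break;  /* PADDW */
    default: sum = _mm_add_pi32(m1, m2); break;  /* PADDD */
    }

    memcpy(&result, &sum, sizeof result);
    _mm_empty();  /* clear MMX state before any later x87 floating-point code */
    return result;
}
```

With these definitions, add_lanes(0x00FF, 0x0001, 8) yields 0x0000 because the low byte wraps around without carrying into the next byte, whereas add_lanes(0x00FF, 0x0001, 16) yields 0x0100.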

_mm_adds_pi8

__m64 _mm_adds_pi8(__m64 m1, __m64 m2);

Add the eight signed 8-bit values in m1 to the eight signed 8-bit values in m2 using saturating arithmetic.

_mm_adds_pi16

__m64 _mm_adds_pi16(__m64 m1, __m64 m2);

Add the four signed 16-bit values in m1 to the four signed 16-bit values in m2 using saturating arithmetic.

_mm_adds_pu8

__m64 _mm_adds_pu8(__m64 m1, __m64 m2);

Add the eight unsigned 8-bit values in m1 to the eight unsigned 8-bit values in m2 using saturating arithmetic.

_mm_adds_pu16

__m64 _mm_adds_pu16(__m64 m1, __m64 m2);

Add the four unsigned 16-bit values in m1 to the four unsigned 16-bit values in m2 using saturating arithmetic.
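Saturating addition clamps each lane to the limits of its type instead of wrapping, so the same bit patterns can give different results under the signed and unsigned forms. The sketch below illustrates this under the assumption of an x86 target with MMX support; load64, store64, and the adds_* wrappers are hypothetical names introduced only for this example.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: move raw bits into and out of an __m64. */
static __m64 load64(uint64_t bits)
{
    __m64 m;
    memcpy(&m, &bits, sizeof m);
    return m;
}

static uint64_t store64(__m64 m)
{
    uint64_t bits;
    memcpy(&bits, &m, sizeof bits);
    return bits;
}

/* Signed 8-bit lanes clamp to [-128, 127] (PADDSB). */
static uint64_t adds_i8(uint64_t a, uint64_t b)
{
    uint64_t r = store64(_mm_adds_pi8(load64(a), load64(b)));
    _mm_empty();
    return r;
}

/* Unsigned 8-bit lanes clamp to [0, 255] (PADDUSB). */
static uint64_t adds_u8(uint64_t a, uint64_t b)
{
    uint64_t r = store64(_mm_adds_pu8(load64(a), load64(b)));
    _mm_empty();
    return r;
}

/* Signed 16-bit lanes clamp to [-32768, 32767] (PADDSW). */
static uint64_t adds_i16(uint64_t a, uint64_t b)
{
    uint64_t r = store64(_mm_adds_pi16(load64(a), load64(b)));
    _mm_empty();
    return r;
}

/* Unsigned 16-bit lanes clamp to [0, 65535] (PADDUSW). */
static uint64_t adds_u16(uint64_t a, uint64_t b)
{
    uint64_t r = store64(_mm_adds_pu16(load64(a), load64(b)));
    _mm_empty();
    return r;
}
```

For example, adds_u8(0xFF, 0x01) stays at 0xFF (255 + 1 clamps to 255), while adds_i8(0xFF, 0x01) returns 0x00, because the signed form reads 0xFF as -1.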

_mm_sub_pi8

__m64 _mm_sub_pi8(__m64 m1, __m64 m2);

Subtract the eight 8-bit values in m2 from the eight 8-bit values in m1.

_mm_sub_pi16

__m64 _mm_sub_pi16(__m64 m1, __m64 m2);

Subtract the four 16-bit values in m2 from the four 16-bit values in m1.

_mm_sub_pi32

__m64 _mm_sub_pi32(__m64 m1, __m64 m2);

Subtract the two 32-bit values in m2 from the two 32-bit values in m1.
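Operand order matters for the subtraction intrinsics: each result lane is the m1 lane minus the m2 lane, and the non-saturating forms wrap on overflow. As a rough illustration using the 16-bit variant, assuming a little-endian x86 target with MMX support (sub4_i16 is a hypothetical helper name):

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: out[i] = m1_vals[i] - m2_vals[i] for four 16-bit
 * lanes, with wraparound on overflow. */
static void sub4_i16(const int16_t m1_vals[4], const int16_t m2_vals[4],
                     int16_t out[4])
{
    __m64 m1, m2, diff;

    /* On little-endian x86, array element 0 lands in the lowest lane. */
    memcpy(&m1, m1_vals, sizeof m1);
    memcpy(&m2, m2_vals, sizeof m2);

    diff = _mm_sub_pi16(m1, m2);  /* PSUBW: m1 - m2 in each lane */

    memcpy(out, &diff, sizeof diff);
    _mm_empty();  /* clear MMX state before any later x87 floating-point code */
}
```

Subtracting {3, 5, 1, 1} from {10, -5, -32768, 0} this way gives {7, -10, 32767, -1}; the third lane wraps because -32769 does not fit in 16 bits.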

_mm_subs_pi8

__m64 _mm_subs_pi8(__m64 m1, __m64 m2);

Subtract the eight signed 8-bit values in m2 from the eight signed 8-bit values in m1 using saturating arithmetic.

_mm_subs_pi16

__m64 _mm_subs_pi16(__m64 m1, __m64 m2);

Subtract the four signed 16-bit values in m2 from the four signed 16-bit values in m1 using saturating arithmetic.

_mm_subs_pu8

__m64 _mm_subs_pu8(__m64 m1, __m64 m2);

Subtract the eight unsigned 8-bit values in m2 from the eight unsigned 8-bit values in m1 using saturating arithmetic.

_mm_subs_pu16

__m64 _mm_subs_pu16(__m64 m1, __m64 m2);

Subtract the four unsigned 16-bit values in m2 from the four unsigned 16-bit values in m1 using saturating arithmetic.
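Because unsigned saturating subtraction clamps negative results to zero, one common use is computing a per-byte absolute difference: subtract in both directions and OR the results, since at most one direction is nonzero in each lane. The sketch below assumes an x86 target with MMX support; subs_u8 and absdiff_u8 are hypothetical helper names, and _mm_or_si64 is the MMX bitwise OR intrinsic.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: per-byte a - b, clamped at 0 (PSUBUSB). */
static uint64_t subs_u8(uint64_t a, uint64_t b)
{
    __m64 m1, m2, diff;
    uint64_t result;

    memcpy(&m1, &a, sizeof m1);
    memcpy(&m2, &b, sizeof m2);
    diff = _mm_subs_pu8(m1, m2);
    memcpy(&result, &diff, sizeof result);
    _mm_empty();
    return result;
}

/* Hypothetical helper: per-byte |a - b| for eight unsigned bytes. */
static uint64_t absdiff_u8(uint64_t a, uint64_t b)
{
    __m64 m1, m2, diff;
    uint64_t result;

    memcpy(&m1, &a, sizeof m1);
    memcpy(&m2, &b, sizeof m2);

    /* In each lane, one of the two saturated differences is zero. */
    diff = _mm_or_si64(_mm_subs_pu8(m1, m2), _mm_subs_pu8(m2, m1));

    memcpy(&result, &diff, sizeof result);
    _mm_empty();  /* clear MMX state before any later x87 floating-point code */
    return result;
}
```

For instance, subs_u8(0x0A, 0x14) returns 0x00 (10 - 20 clamps to 0), while absdiff_u8(0x0A, 0x14) returns 0x0A.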

_mm_madd_pi16

__m64 _mm_madd_pi16(__m64 m1, __m64 m2);

Multiply four signed 16-bit values in m1 by four signed 16-bit values in m2, producing four 32-bit intermediate results, which are then summed in adjacent pairs to produce two 32-bit results.
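The multiply-and-add pattern maps naturally onto a short dot product: the intrinsic leaves two partial sums, which the caller adds together. The following is a minimal sketch, assuming a little-endian x86 target with MMX support; dot4_i16 is a hypothetical helper name.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: dot product of two vectors of four signed
 * 16-bit values, returned as a 32-bit integer. */
static int32_t dot4_i16(const int16_t a[4], const int16_t b[4])
{
    __m64 m1, m2, pair_sums;
    int32_t sums[2];

    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);

    /* PMADDWD: low 32 bits  = a[0]*b[0] + a[1]*b[1],
     *          high 32 bits = a[2]*b[2] + a[3]*b[3]. */
    pair_sums = _mm_madd_pi16(m1, m2);

    memcpy(sums, &pair_sums, sizeof sums);
    _mm_empty();  /* clear MMX state before any later x87 floating-point code */

    return sums[0] + sums[1];
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8}, the two partial sums are 17 and 53, so dot4_i16 returns 70.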

_mm_mulhi_pi16

__m64 _mm_mulhi_pi16(__m64 m1, __m64 m2);

Multiply four signed 16-bit values in m1 by four signed 16-bit values in m2 and produce the high 16 bits of each of the four 32-bit products.

_mm_mullo_pi16

__m64 _mm_mullo_pi16(__m64 m1, __m64 m2);

Multiply four 16-bit values in m1 by four 16-bit values in m2 and produce the low 16 bits of each of the four 32-bit products.
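The high and low halves can be recombined to recover the full 32-bit signed products. One way to do this, sketched here under the assumption of a little-endian x86 target with MMX support, is to interleave the two results with the MMX unpack intrinsics _mm_unpacklo_pi16 and _mm_unpackhi_pi16; mul4_i16_to_i32 is a hypothetical helper name.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: out[i] = (int32_t)a[i] * b[i] for four signed
 * 16-bit lanes, built from the separate low and high product halves. */
static void mul4_i16_to_i32(const int16_t a[4], const int16_t b[4],
                            int32_t out[4])
{
    __m64 m1, m2, lo, hi, p01, p23;

    memcpy(&m1, a, sizeof m1);
    memcpy(&m2, b, sizeof m2);

    lo = _mm_mullo_pi16(m1, m2);  /* PMULLW: low 16 bits of each product  */
    hi = _mm_mulhi_pi16(m1, m2);  /* PMULHW: high 16 bits of each product */

    /* Interleave words so each 32-bit lane holds hi:lo of one product. */
    p01 = _mm_unpacklo_pi16(lo, hi);  /* products of lanes 0 and 1 */
    p23 = _mm_unpackhi_pi16(lo, hi);  /* products of lanes 2 and 3 */

    memcpy(&out[0], &p01, sizeof p01);
    memcpy(&out[2], &p23, sizeof p23);
    _mm_empty();  /* clear MMX state before any later x87 floating-point code */
}
```

Multiplying {300, -300, -32768, 7} by {200, 200, -32768, -3} this way gives {60000, -60000, 1073741824, -21}, none of which would fit in a single 16-bit lane.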