Arithmetic Intrinsics

Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 3/31/2023

The prototypes for Intel® Streaming SIMD Extensions (Intel® SSE) intrinsics for arithmetic operations are in the xmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

Each intrinsic places its result in a register, shown for each intrinsic as R0-R3. R0, R1, R2, and R3 each represent one of the four 32-bit pieces of the result register, with R0 being the lowest 32 bits.
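
To make the R0-R3 notation concrete, here is a minimal sketch of how the four pieces map onto a float array. The helper names make_lanes and store_lanes are hypothetical; _mm_set_ps and _mm_storeu_ps are SSE intrinsics that this section does not describe and are assumed to be available through the same header.

```c
#include <immintrin.h>

/* Builds a vector whose pieces R0..R3 hold r0..r3.
   _mm_set_ps takes its arguments from the highest piece down to the lowest. */
__m128 make_lanes(float r0, float r1, float r2, float r3) {
    return _mm_set_ps(r3, r2, r1, r0);
}

/* Copies the four 32-bit pieces of v to memory; out[0] receives R0
   and out[3] receives R3. The unaligned store needs no special alignment. */
void store_lanes(__m128 v, float out[4]) {
    _mm_storeu_ps(out, v);
}
```

With these helpers, store_lanes(make_lanes(1.0f, 2.0f, 3.0f, 4.0f), out) should leave out holding 1, 2, 3, 4 in that order.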

Intrinsic Name	Operation	Corresponding Intel® SSE Instruction
_mm_add_ss	Addition	ADDSS
_mm_add_ps	Addition	ADDPS
_mm_sub_ss	Subtraction	SUBSS
_mm_sub_ps	Subtraction	SUBPS
_mm_mul_ss	Multiplication	MULSS
_mm_mul_ps	Multiplication	MULPS
_mm_div_ss	Division	DIVSS
_mm_div_ps	Division	DIVPS
_mm_sqrt_ss	Square Root	SQRTSS
_mm_sqrt_ps	Square Root	SQRTPS
_mm_rcp_ss	Reciprocal	RCPSS
_mm_rcp_ps	Reciprocal	RCPPS
_mm_rsqrt_ss	Reciprocal Square Root	RSQRTSS
_mm_rsqrt_ps	Reciprocal Square Root	RSQRTPS
_mm_min_ss	Computes Minimum	MINSS
_mm_min_ps	Computes Minimum	MINPS
_mm_max_ss	Computes Maximum	MAXSS
_mm_max_ps	Computes Maximum	MAXPS

_mm_add_ss

__m128 _mm_add_ss(__m128 a, __m128 b);

Adds the lower single-precision, floating-point (FP) values of a and b; the upper three single-precision FP values are passed through from a.

R0	R1	R2	R3
a0 + b0	a1	a2	a3

_mm_add_ps

__m128 _mm_add_ps(__m128 a, __m128 b);

Adds the four single-precision FP values of a and b.

R0	R1	R2	R3
a0 + b0	a1 + b1	a2 + b2	a3 + b3
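
The difference between the scalar (_ss) and packed (_ps) forms of addition can be sketched as follows. The function name add_demo is hypothetical, and the loads and stores use _mm_loadu_ps and _mm_storeu_ps, which are assumed here and not described in this section.

```c
#include <immintrin.h>

/* Writes _mm_add_ss(a, b) to out_ss and _mm_add_ps(a, b) to out_ps. */
void add_demo(const float a[4], const float b[4],
              float out_ss[4], float out_ps[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    /* Scalar form: only R0 is a sum; R1..R3 are copied from a. */
    _mm_storeu_ps(out_ss, _mm_add_ss(va, vb));

    /* Packed form: all four pieces are sums. */
    _mm_storeu_ps(out_ps, _mm_add_ps(va, vb));
}
```

For a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the scalar result should be {11, 2, 3, 4} and the packed result {11, 22, 33, 44}.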

_mm_sub_ss

__m128 _mm_sub_ss(__m128 a, __m128 b);

Subtracts the lower single-precision FP value of b from the lower single-precision FP value of a; the upper three single-precision FP values are passed through from a.

R0	R1	R2	R3
a0 - b0	a1	a2	a3

_mm_sub_ps

__m128 _mm_sub_ps(__m128 a, __m128 b);

Subtracts the four single-precision FP values of b from the corresponding values of a.

R0	R1	R2	R3
a0 - b0	a1 - b1	a2 - b2	a3 - b3
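
Because subtraction is not commutative, operand order matters both for the difference and for which operand supplies the passed-through values. The sketch below illustrates this; sub_demo is a hypothetical name, and _mm_loadu_ps and _mm_storeu_ps are assumed helpers not covered in this section.

```c
#include <immintrin.h>

/* out_ps = a - b in all four pieces.
   out_ss swaps the operands: R0 = b0 - a0, and R1..R3 come from b,
   because the first argument supplies the upper three values. */
void sub_demo(const float a[4], const float b[4],
              float out_ps[4], float out_ss[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out_ps, _mm_sub_ps(va, vb));
    _mm_storeu_ps(out_ss, _mm_sub_ss(vb, va));
}
```

For a = {10, 20, 30, 40} and b = {1, 2, 3, 4}, out_ps should be {9, 18, 27, 36} and out_ss should be {-9, 2, 3, 4}.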

_mm_mul_ss

__m128 _mm_mul_ss(__m128 a, __m128 b);

Multiplies the lower single-precision FP values of a and b; the upper three single-precision FP values are passed through from a.

R0	R1	R2	R3
a0 * b0	a1	a2	a3

_mm_mul_ps

__m128 _mm_mul_ps(__m128 a, __m128 b);

Multiplies the four single-precision FP values of a and b.

R0	R1	R2	R3
a0 * b0	a1 * b1	a2 * b2	a3 * b3
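
A common use of packed multiplication is scaling four values by one factor. The sketch below assumes _mm_set1_ps (which copies one float into all four pieces) together with _mm_loadu_ps and _mm_storeu_ps; none of these are described in this section, and the function names are hypothetical.

```c
#include <immintrin.h>

/* Multiplies all four values of v by s. */
void scale4(const float v[4], float s, float out[4]) {
    __m128 factor = _mm_set1_ps(s);  /* s in R0..R3 */
    _mm_storeu_ps(out, _mm_mul_ps(_mm_loadu_ps(v), factor));
}

/* Multiplies only the lowest value of v by s; R1..R3 are copied from v. */
void scale_lowest(const float v[4], float s, float out[4]) {
    _mm_storeu_ps(out, _mm_mul_ss(_mm_loadu_ps(v), _mm_set1_ps(s)));
}
```

With v = {1, 2, 3, 4} and s = 2.5, scale4 should produce {2.5, 5, 7.5, 10} and scale_lowest should produce {2.5, 2, 3, 4}.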

_mm_div_ss

__m128 _mm_div_ss(__m128 a, __m128 b);

Divides the lower single-precision FP value of a by the lower single-precision FP value of b; the upper three single-precision FP values are passed through from a.

R0	R1	R2	R3
a0 / b0	a1	a2	a3

_mm_div_ps

__m128 _mm_div_ps(__m128 a, __m128 b);

Divides the four single-precision FP values of a by the corresponding values of b.

R0	R1	R2	R3
a0 / b0	a1 / b1	a2 / b2	a3 / b3
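
As a small illustration of the division intrinsics, the following sketch divides numerators by denominators. The function name div_demo is hypothetical, and _mm_loadu_ps and _mm_storeu_ps are assumed for moving data between arrays and registers.

```c
#include <immintrin.h>

/* out_ps = num / den in all four pieces.
   out_ss = num0 / den0 in R0, with R1..R3 copied from num. */
void div_demo(const float num[4], const float den[4],
              float out_ps[4], float out_ss[4]) {
    __m128 n = _mm_loadu_ps(num);
    __m128 d = _mm_loadu_ps(den);
    _mm_storeu_ps(out_ps, _mm_div_ps(n, d));
    _mm_storeu_ps(out_ss, _mm_div_ss(n, d));
}
```

For num = {1, 9, 10, 7} and den = {4, 3, 5, 2}, out_ps should be {0.25, 3, 2, 3.5} and out_ss should be {0.25, 9, 10, 7}.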

_mm_sqrt_ss

__m128 _mm_sqrt_ss(__m128 a);

Computes the square root of the lower single-precision FP value of a; the upper three single-precision FP values are passed through from a.

R0	R1	R2	R3
sqrt(a0)	a1	a2	a3

_mm_sqrt_ps

__m128 _mm_sqrt_ps(__m128 a);

Computes the square roots of the four single-precision FP values of a.

R0	R1	R2	R3
sqrt(a0)	sqrt(a1)	sqrt(a2)	sqrt(a3)
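
The two square-root forms can be compared with a short sketch. The name sqrt_demo is hypothetical; _mm_loadu_ps and _mm_storeu_ps are assumed helpers outside this section.

```c
#include <immintrin.h>

/* out_ps holds the square root of every value of a.
   out_ss holds sqrt(a0) in R0 and the unchanged a1..a3 in R1..R3. */
void sqrt_demo(const float a[4], float out_ps[4], float out_ss[4]) {
    __m128 va = _mm_loadu_ps(a);
    _mm_storeu_ps(out_ps, _mm_sqrt_ps(va));
    _mm_storeu_ps(out_ss, _mm_sqrt_ss(va));
}
```

For a = {4, 9, 16, 25}, out_ps should be {2, 3, 4, 5} and out_ss should be {2, 9, 16, 25}.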

_mm_rcp_ss

__m128 _mm_rcp_ss(__m128 a);

Computes the approximation of the reciprocal of the lower single-precision FP value of a; the upper three single-precision FP values are passed through from a.

R0	R1	R2	R3
recip(a0)	a1	a2	a3

_mm_rcp_ps

__m128 _mm_rcp_ps(__m128 a);

Computes the approximations of reciprocals of the four single-precision FP values of a.

R0	R1	R2	R3
recip(a0)	recip(a1)	recip(a2)	recip(a3)
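
Since the reciprocal intrinsics return approximations, their results generally should not be compared for exact equality with 1/x. The sketch below places the approximate and exact results side by side; rcp_demo is a hypothetical name, and _mm_set1_ps, _mm_loadu_ps, and _mm_storeu_ps are assumed helpers not described in this section.

```c
#include <immintrin.h>

/* approx receives _mm_rcp_ps(a), the fast approximate reciprocals.
   exact receives 1.0f / a computed with full-precision division.
   approx_ss receives _mm_rcp_ss(a): approximate 1/a0 in R0, a1..a3 in R1..R3. */
void rcp_demo(const float a[4], float approx[4], float exact[4],
              float approx_ss[4]) {
    __m128 va = _mm_loadu_ps(a);
    _mm_storeu_ps(approx, _mm_rcp_ps(va));
    _mm_storeu_ps(exact, _mm_div_ps(_mm_set1_ps(1.0f), va));
    _mm_storeu_ps(approx_ss, _mm_rcp_ss(va));
}
```

A caller would typically compare approx and exact within a tolerance; the exact size of the error depends on the processor.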

_mm_rsqrt_ss

__m128 _mm_rsqrt_ss(__m128 a);

Computes the approximation of the reciprocal of the square root of the lower single-precision FP value of a; the upper three single-precision FP values are passed through.

R0	R1	R2	R3
recip(sqrt(a0))	a1	a2	a3

_mm_rsqrt_ps

__m128 _mm_rsqrt_ps(__m128 a);

Computes the approximations of the reciprocals of the square roots of the four single-precision FP values of a.

R0	R1	R2	R3
recip(sqrt(a0))	recip(sqrt(a1))	recip(sqrt(a2))	recip(sqrt(a3))
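
One way to see how the approximate reciprocal square root relates to the exact operations is to compute both. In this sketch the name rsqrt_demo is hypothetical, the exact path combines _mm_sqrt_ps and _mm_div_ps, and _mm_set1_ps, _mm_loadu_ps, and _mm_storeu_ps are assumed helpers.

```c
#include <immintrin.h>

/* approx receives _mm_rsqrt_ps(a), an approximation of 1/sqrt(a).
   exact receives 1.0f / sqrt(a) built from full-precision sqrt and divide. */
void rsqrt_demo(const float a[4], float approx[4], float exact[4]) {
    __m128 va = _mm_loadu_ps(a);
    _mm_storeu_ps(approx, _mm_rsqrt_ps(va));
    _mm_storeu_ps(exact, _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(va)));
}

/* Approximates 1/sqrt(a0) in R0; R1..R3 are copied from a. */
void rsqrt_lowest(const float a[4], float out[4]) {
    _mm_storeu_ps(out, _mm_rsqrt_ss(_mm_loadu_ps(a)));
}
```

For a = {4, 16, 64, 100}, the exact values are 0.5, 0.25, 0.125, and 0.1, and the approximations should be close to them without necessarily matching bit for bit.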

_mm_min_ss

__m128 _mm_min_ss(__m128 a, __m128 b);

Computes the minimum of the lower single-precision FP values of a and b; the upper three single-precision FP values are passed through from a.

R0	R1	R2	R3
min(a0, b0)	a1	a2	a3

_mm_min_ps

__m128 _mm_min_ps(__m128 a, __m128 b);

Computes the minimum of the four single-precision FP values of a and b.

R0	R1	R2	R3
min(a0, b0)	min(a1, b1)	min(a2, b2)	min(a3, b3)
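
A minimal sketch of the minimum intrinsics on ordinary (non-NaN) inputs follows. The name min_demo is hypothetical, and _mm_loadu_ps and _mm_storeu_ps are assumed helpers not described in this section.

```c
#include <immintrin.h>

/* out_ps holds the smaller of a and b in each of the four positions.
   out_ss holds min(a0, b0) in R0 and a1..a3 in R1..R3. */
void min_demo(const float a[4], const float b[4],
              float out_ps[4], float out_ss[4]) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out_ps, _mm_min_ps(va, vb));
    _mm_storeu_ps(out_ss, _mm_min_ss(va, vb));
}
```

For a = {3, 5, -3, 8} and b = {2, 4, -7, 9}, out_ps should be {2, 4, -7, 8} and out_ss should be {2, 5, -3, 8}.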

_mm_max_ss

__m128 _mm_max_ss(__m128 a, __m128 b);

Computes the maximum of the lower single-precision FP values of a and b; the upper three single-precision FP values are passed through from a.

R0	R1	R2	R3
max(a0, b0)	a1	a2	a3

_mm_max_ps

__m128 _mm_max_ps(__m128 a, __m128 b);

Computes the maximum of the four single-precision FP values of a and b.

R0	R1	R2	R3
max(a0, b0)	max(a1, b1)	max(a2, b2)	max(a3, b3)
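
Combining _mm_max_ps with _mm_min_ps gives a simple way to clamp four values to a range, sketched below for ordinary (non-NaN) inputs. The names clamp4 and max_lowest are hypothetical, and _mm_set1_ps, _mm_loadu_ps, and _mm_storeu_ps are assumed helpers not described in this section.

```c
#include <immintrin.h>

/* Clamps each of the four values of v to the range [lo, hi]. */
void clamp4(const float v[4], float lo, float hi, float out[4]) {
    __m128 x = _mm_loadu_ps(v);
    x = _mm_max_ps(x, _mm_set1_ps(lo));  /* raise values below lo up to lo */
    x = _mm_min_ps(x, _mm_set1_ps(hi));  /* lower values above hi down to hi */
    _mm_storeu_ps(out, x);
}

/* R0 = max(a0, b0); R1..R3 are copied from a. */
void max_lowest(const float a[4], const float b[4], float out[4]) {
    _mm_storeu_ps(out, _mm_max_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

For v = {-2, 0.5, 3, 1} with lo = 0 and hi = 1, clamp4 should produce {0, 0.5, 1, 1}.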
