Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 3/31/2023
Public

Load Intrinsics

The prototypes for Intel® Streaming SIMD Extensions (Intel® SSE) intrinsics for load operations are in the xmmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

The result of each intrinsic operation is placed in a register, which is illustrated for each intrinsic with R0-R3. R0, R1, R2, and R3 each represent one of the four 32-bit pieces of the result register, with R0 the lowest 32 bits and R3 the highest.
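The R0-R3 notation can be inspected from C by storing a result register to an array. The sketch below is illustrative only: `lanes_to_array` is a hypothetical helper name, and `_mm_storeu_ps` is an Intel® SSE store intrinsic that is not described in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: copies the four 32-bit pieces of r to out[0..3].
   out[0] receives R0 (lowest 32 bits) and out[3] receives R3 (highest). */
void lanes_to_array(__m128 r, float out[4])
{
    assert(out != NULL);
    _mm_storeu_ps(out, r);  /* unaligned store: out needs no special alignment */
}
```

After a call such as `lanes_to_array(r, out)`, the element `out[i]` holds the value shown under `Ri` in the tables for each intrinsic.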

| Intrinsic Name | Operation | Corresponding Intel® SSE Instruction |
|---|---|---|
| _mm_loadh_pi | Load high | MOVHPS reg, mem |
| _mm_loadl_pi | Load low | MOVLPS reg, mem |
| _mm_load_ss | Load the low value and clear the three high values | MOVSS |
| _mm_load1_ps | Load one value into all four words | MOVSS + Shuffling |
| _mm_load_ps | Load four values, address aligned | MOVAPS |
| _mm_loadu_ps | Load four values, address unaligned | MOVUPS |
| _mm_loadr_ps | Load four values in reverse | MOVAPS + Shuffling |

_mm_loadh_pi

__m128 _mm_loadh_pi(__m128 a, __m64 const *p);

Sets the upper two single-precision floating-point (SP FP) values to the 64 bits of data loaded from the address p; the lower two values are passed through from a.

| R0 | R1 | R2 | R3 |
|---|---|---|---|
| a0 | a1 | *p0 | *p1 |
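As a minimal sketch of the behavior above, the following function replaces only the upper two values of a register with a pair of floats from memory. The name `replace_upper_pair` is hypothetical, and the pointer cast from `const float *` to `const __m64 *` is an assumption about how the 64 bits of data are stored (two adjacent floats).

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: returns a with R2 and R3 replaced by hi[0] and hi[1].
   R0 and R1 are passed through from a unchanged. */
__m128 replace_upper_pair(__m128 a, const float hi[2])
{
    assert(hi != NULL);
    /* The intrinsic takes a __m64 pointer; the cast reinterprets the two
       adjacent floats as the 64 bits to load. */
    return _mm_loadh_pi(a, (const __m64 *)hi);
}
```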

_mm_loadl_pi

__m128 _mm_loadl_pi(__m128 a, __m64 const *p);

Sets the lower two SP FP values with 64 bits of data loaded from the address p; the upper two values are passed through from a.

| R0 | R1 | R2 | R3 |
|---|---|---|---|
| *p0 | *p1 | a2 | a3 |
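One possible use of the two half-register loads is assembling a full register from two separate pairs of floats. The sketch below assumes the pairs are stored as adjacent floats; `load_two_pairs` is a hypothetical name, and `_mm_setzero_ps` is an Intel® SSE intrinsic that is not described in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: builds a register from two independent float pairs. */
__m128 load_two_pairs(const float lo[2], const float hi[2])
{
    assert(lo != NULL && hi != NULL);
    __m128 r = _mm_setzero_ps();
    r = _mm_loadl_pi(r, (const __m64 *)lo);  /* R0, R1 = lo[0], lo[1] */
    r = _mm_loadh_pi(r, (const __m64 *)hi);  /* R2, R3 = hi[0], hi[1] */
    return r;
}
```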

_mm_load_ss

__m128 _mm_load_ss(float const *p);

Loads an SP FP value into the low word and clears the upper three words.

| R0 | R1 | R2 | R3 |
|---|---|---|---|
| *p | 0.0 | 0.0 | 0.0 |
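The clearing of the upper three words can be observed by storing the result back to memory. This is a sketch only: `load_low_only` is a hypothetical name, and `_mm_storeu_ps` is an Intel® SSE store intrinsic that is not described in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: loads *p with _mm_load_ss and writes all four
   words of the result register to out. */
void load_low_only(const float *p, float out[4])
{
    assert(p != NULL && out != NULL);
    __m128 r = _mm_load_ss(p);  /* R0 = *p; R1, R2, R3 = 0.0 */
    _mm_storeu_ps(out, r);
}
```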

_mm_load1_ps

__m128 _mm_load1_ps(float const *p);

Loads an SP FP value, copying it into all four words.

| R0 | R1 | R2 | R3 |
|---|---|---|---|
| *p | *p | *p | *p |
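A common reason to copy one value into all four words is to apply the same scalar to four values at once. The following sketch scales four floats by one factor; `scale4` is a hypothetical name, and `_mm_mul_ps` and `_mm_storeu_ps` are Intel® SSE intrinsics that are not described in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: multiplies v[0..3] in place by *scale. */
void scale4(float v[4], const float *scale)
{
    assert(v != NULL && scale != NULL);
    __m128 s = _mm_load1_ps(scale);  /* R0..R3 = *scale */
    _mm_storeu_ps(v, _mm_mul_ps(_mm_loadu_ps(v), s));
}
```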

_mm_load_ps

__m128 _mm_load_ps(float const *p);

Loads four SP FP values. The address must be 16-byte-aligned.

| R0 | R1 | R2 | R3 |
|---|---|---|---|
| p[0] | p[1] | p[2] | p[3] |
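Because the address must be 16-byte-aligned, callers typically have to arrange the alignment themselves. The sketch below checks the alignment before loading; `is_aligned16` and `add4_aligned` are hypothetical names, and `_mm_add_ps` and `_mm_store_ps` are Intel® SSE intrinsics that are not described in this section.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: returns nonzero if p is 16-byte-aligned. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3.
   All three pointers must be 16-byte-aligned. */
void add4_aligned(const float *a, const float *b, float *out)
{
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    _mm_store_ps(out, _mm_add_ps(_mm_load_ps(a), _mm_load_ps(b)));
}
```

One way to obtain suitably aligned storage, assuming a C11 compiler, is to declare the arrays with `_Alignas(16)`, for example `_Alignas(16) float a[4];`.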

_mm_loadu_ps

__m128 _mm_loadu_ps(float const *p);

Loads four SP FP values. The address need not be 16-byte-aligned.

| R0 | R1 | R2 | R3 |
|---|---|---|---|
| p[0] | p[1] | p[2] | p[3] |
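The unaligned form is useful when four values start at an arbitrary element of a larger buffer. The sketch below adds two such windows element by element; `add_windows4` is a hypothetical name, and `_mm_add_ps` and `_mm_storeu_ps` are Intel® SSE intrinsics that are not described in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: out[k] = buf[i + k] + buf[j + k] for k = 0..3.
   The caller must ensure buf holds at least i + 4 and j + 4 elements. */
void add_windows4(const float *buf, size_t i, size_t j, float out[4])
{
    assert(buf != NULL && out != NULL);
    __m128 a = _mm_loadu_ps(buf + i);  /* no alignment requirement */
    __m128 b = _mm_loadu_ps(buf + j);
    _mm_storeu_ps(out, _mm_add_ps(a, b));
}
```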

_mm_loadr_ps

__m128 _mm_loadr_ps(float const *p);

Loads four SP FP values in reverse order. The address must be 16-byte-aligned.

| R0 | R1 | R2 | R3 |
|---|---|---|---|
| p[3] | p[2] | p[1] | p[0] |
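As an illustration of the reversed ordering, the following sketch copies four floats in reverse order. The name `reverse4` is hypothetical, and `_mm_storeu_ps` is an Intel® SSE store intrinsic that is not described in this section; only the source pointer needs 16-byte alignment here.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: dst[0..3] = src[3], src[2], src[1], src[0].
   src must be 16-byte-aligned; dst may have any alignment. */
void reverse4(const float *src, float *dst)
{
    assert(((uintptr_t)src & 15u) == 0);
    _mm_storeu_ps(dst, _mm_loadr_ps(src));  /* R0 = src[3], ..., R3 = src[0] */
}
```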