Load Intrinsics

Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 7/13/2023

Public

This topic lists the Intel® Streaming SIMD Extensions 2 (Intel® SSE2) intrinsics for double-precision floating-point (DP FP) load operations. The prototypes for Intel® SSE2 intrinsics are in the emmintrin.h header file.

To use these intrinsics, include the immintrin.h file as follows:

#include <immintrin.h>

The load and set operations are similar in that both initialize __m128d data. However, the set operations take double arguments and are intended for initialization with constants, while the load operations take a pointer-to-double argument and are intended to mimic the instructions for loading data from memory.
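
The difference between the two families can be illustrated with a minimal sketch. The helper names pair_from_values, pair_from_memory, and pairs_equal are hypothetical and exist only for this illustration; _mm_set_pd, _mm_cmpeq_pd, and _mm_movemask_pd are other Intel® SSE2 intrinsics that are not described in this topic.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: build {lo, hi} from two scalar values with a set
   intrinsic. Note that _mm_set_pd takes the upper element first. */
static __m128d pair_from_values(double lo, double hi) {
    return _mm_set_pd(hi, lo);
}

/* Hypothetical helper: build {p[0], p[1]} by reading memory with a load
   intrinsic. _mm_loadu_pd is used so p need not be 16-byte aligned. */
static __m128d pair_from_memory(const double *p) {
    assert(p != NULL);
    return _mm_loadu_pd(p);
}

/* Hypothetical helper: returns 1 if both elements of a and b compare equal. */
static int pairs_equal(__m128d a, __m128d b) {
    return _mm_movemask_pd(_mm_cmpeq_pd(a, b)) == 0x3;
}
```

With an array holding {1.5, -2.0}, pair_from_memory on that array and pair_from_values(1.5, -2.0) should produce the same register contents.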

Each intrinsic places its result in a 128-bit register. The tables in the detailed explanation for each intrinsic show what the result register holds: R0 represents the lower 64 bits of the register (the lower DP FP value) and R1 represents the upper 64 bits (the upper DP FP value).
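
As a rough illustration of the R0 and R1 notation, the sketch below reads each half of a __m128d value back into a scalar. The names get_r0, get_r1, and check_layout are hypothetical; _mm_cvtsd_f64, _mm_unpackhi_pd, and _mm_set_pd are Intel® SSE2 intrinsics outside this topic.

```c
#include <assert.h>
#include <immintrin.h>

/* R0: the lower DP FP value of the register. */
static double get_r0(__m128d v) {
    return _mm_cvtsd_f64(v);
}

/* R1: the upper DP FP value; copy it into the lower position, then extract. */
static double get_r1(__m128d v) {
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));
}

/* Self-check of the layout: _mm_set_pd(upper, lower) gives R0 = lower. */
static void check_layout(void) {
    __m128d v = _mm_set_pd(2.0, 1.0);
    assert(get_r0(v) == 1.0);
    assert(get_r1(v) == 2.0);
}
```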

Intrinsic Name	Operation	Corresponding Intel® SSE2 Instruction
_mm_load_pd	Loads two DP FP values from a 16-byte aligned address	MOVAPD
_mm_load1_pd	Loads a single DP FP value, copying to both elements	MOVSD + shuffling
_mm_loadr_pd	Loads two DP FP values in reverse order	MOVAPD + shuffling
_mm_loadu_pd	Loads two DP FP values from an address that need not be aligned	MOVUPD
_mm_load_sd	Loads a DP FP value, sets upper DP FP to zero	MOVSD
_mm_loadh_pd	Loads a DP FP value as the upper DP FP value of the result	MOVHPD
_mm_loadl_pd	Loads a DP FP value as the lower DP FP value of the result	MOVLPD

_mm_load_pd

__m128d _mm_load_pd(double const *p);

Loads two DP FP values. The address p must be 16-byte aligned.

R0	R1
p[0]	p[1]
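
A minimal usage sketch, assuming the caller can guarantee 16-byte alignment. The helper name sum_aligned_pair is hypothetical, and the assert only documents the alignment precondition in debug builds.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: returns p[0] + p[1]; p must be 16-byte aligned. */
static double sum_aligned_pair(const double *p) {
    assert(((uintptr_t)p & 15u) == 0);     /* precondition of _mm_load_pd */
    __m128d v  = _mm_load_pd(p);           /* R0 = p[0], R1 = p[1] */
    __m128d hi = _mm_unpackhi_pd(v, v);    /* {p[1], p[1]} */
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));
}
```

One way to obtain suitably aligned storage in C11 is a declaration such as _Alignas(16) double buf[2].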

_mm_load1_pd

__m128d _mm_load1_pd(double const *p);

Loads a single DP FP value, copying to both elements. The address p need not be 16-byte aligned.

R0	R1
*p	*p
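
A typical use of a broadcast load is applying one scalar to every element of an array. The sketch below is one possible way to do that; the helper name scale_in_place is hypothetical, and unaligned loads and stores are used so that no alignment is assumed for the data.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: multiplies n doubles in data by *factor, in place. */
static void scale_in_place(double *data, size_t n, const double *factor) {
    assert(data != NULL && factor != NULL);
    __m128d f = _mm_load1_pd(factor);      /* R0 = R1 = *factor */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i);
        _mm_storeu_pd(data + i, _mm_mul_pd(v, f));
    }
    if (i < n) {
        data[i] *= *factor;                /* scalar tail for odd n */
    }
}
```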

_mm_loadr_pd

__m128d _mm_loadr_pd(double const *p);

Loads two DP FP values in reverse order. The address p must be 16-byte aligned.

R0	R1
p[1]	p[0]
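
The reversed element order can be seen by storing the result straight back to memory. In this sketch the helper name reverse_pair is hypothetical; the source must be 16-byte aligned, while the destination is written with an unaligned store.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: writes {src[1], src[0]} to dst.
   src must be 16-byte aligned; dst has no alignment requirement. */
static void reverse_pair(const double *src, double *dst) {
    assert(((uintptr_t)src & 15u) == 0);   /* precondition of _mm_loadr_pd */
    __m128d v = _mm_loadr_pd(src);         /* R0 = src[1], R1 = src[0] */
    _mm_storeu_pd(dst, v);
}
```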

_mm_loadu_pd

__m128d _mm_loadu_pd(double const *p);

Loads two DP FP values. The address p need not be 16-byte aligned.

R0	R1
p[0]	p[1]
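
Unaligned loads are useful when two overlapping windows of the same array are needed, because both windows cannot be 16-byte aligned at once. The following sketch computes two adjacent differences; the helper name adjacent_diff2 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: reads x[0..2] and writes
   out[0] = x[1] - x[0], out[1] = x[2] - x[1]. */
static void adjacent_diff2(const double *x, double *out) {
    assert(x != NULL && out != NULL);
    __m128d lo = _mm_loadu_pd(x);          /* {x[0], x[1]} */
    __m128d hi = _mm_loadu_pd(x + 1);      /* {x[1], x[2]}, 8 bytes further on */
    _mm_storeu_pd(out, _mm_sub_pd(hi, lo));
}
```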

_mm_load_sd

__m128d _mm_load_sd(double const *p);

Loads a DP FP value. The upper DP FP is set to zero. The address p need not be 16-byte aligned.

R0	R1
*p	0.0
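
The zeroing of the upper element can be observed by storing the whole register. This is a small sketch; the helper name load_scalar is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: loads *p with _mm_load_sd and writes both
   elements of the result register to out[0] and out[1]. */
static void load_scalar(const double *p, double out[2]) {
    assert(p != NULL && out != NULL);
    __m128d v = _mm_load_sd(p);            /* R0 = *p, R1 = 0.0 */
    _mm_storeu_pd(out, v);
}
```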

_mm_loadh_pd

__m128d _mm_loadh_pd(__m128d a, double const *p);

Loads a DP FP value as the upper DP FP value of the result. The lower DP FP value is passed through from a. The address p need not be 16-byte aligned.

R0	R1
a0	*p
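
One common pattern, sketched here under the assumption that the two values are not adjacent in memory, is to combine _mm_load_sd with _mm_loadh_pd to assemble a register from two separate addresses. The helper name gather2 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: builds {*lo, *hi} from two unrelated addresses. */
static __m128d gather2(const double *lo, const double *hi) {
    assert(lo != NULL && hi != NULL);
    __m128d v = _mm_load_sd(lo);           /* R0 = *lo, R1 = 0.0 */
    return _mm_loadh_pd(v, hi);            /* R0 = *lo (from v), R1 = *hi */
}
```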

_mm_loadl_pd

__m128d _mm_loadl_pd(__m128d a, double const *p);

Loads a DP FP value as the lower DP FP value of the result. The upper DP FP value is passed through from a. The address p need not be 16-byte aligned.

R0	R1
*p	a1
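
As a minimal sketch, the following replaces only the lower element of an existing value and leaves the upper element untouched. The helper name replace_low is hypothetical; _mm_set_pd is used in the usage note only to create a starting value.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: returns a with its lower element replaced by *p. */
static __m128d replace_low(__m128d a, const double *p) {
    assert(p != NULL);
    return _mm_loadl_pd(a, p);             /* R0 = *p, R1 = a1 */
}
```

For example, starting from _mm_set_pd(2.0, 1.0) and loading 9.0 through replace_low should give a register whose lower value is 9.0 and whose upper value is still 2.0.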
