C++ Classes and SIMD Operations

Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253

Date 3/22/2024

C++ classes for SIMD operations let you operate on arrays or vectors of data in a single operation. Consider the addition of two vectors, A and B, where each vector contains four elements. Without SIMD, the elements A[i] and B[i] are summed one pair at a time in a loop, as in the following snippet.


int a[4], b[4], c[4];
for (int i = 0; i < 4; i++) /* needs four iterations */
    c[i] = a[i] + b[i]; /* computes c[0], c[1], c[2], c[3] */

The following example produces the same results with a single operation, using the Is16vec4 integer vector class (an Ivec class) to add all four elements at once.


Is16vec4 ivecA, ivecB, ivecC;
ivecC = ivecA + ivecB; /* one operation computes ivecC0, ivecC1, ivecC2, ivecC3 */
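
To illustrate what happens behind an expression such as ivecC = ivecA + ivecB, here is a minimal sketch of a vector class whose operator+ maps to one SIMD instruction. `Short8` is a hypothetical stand-in written for this illustration, not one of the Ivec classes. It assumes an SSE2-capable x86 target (eight 16-bit lanes in a 128-bit register, comparable in layout to Is16vec8) and falls back to a scalar loop elsewhere.

```cpp
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical stand-in for an Ivec-style class; not part of ivec.h or dvec.h.
struct Short8 {
    std::int16_t lane[8];  // eight signed 16-bit elements
};

// Adds all eight lanes. On the SSE2 path this is a single packed add.
inline Short8 operator+(const Short8& a, const Short8& b) {
    Short8 r;
#if SKETCH_HAS_SSE2
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.lane));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.lane));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(r.lane), _mm_add_epi16(va, vb));
#else
    // Scalar fallback (assumption: used only when SSE2 is unavailable).
    for (int i = 0; i < 8; ++i)
        r.lane[i] = static_cast<std::int16_t>(a.lane[i] + b.lane[i]);
#endif
    return r;
}
```

With a = {1, 2, 3, 4, 5, 6, 7, 8} and b = {10, 20, 30, 40, 50, 60, 70, 80}, the expression a + b yields the lanes 11, 22, 33, 44, 55, 66, 77, 88 with no explicit loop on the SSE2 path.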

Available Classes

The C++ SIMD classes provide data parallelism that is not easily expressed with standard C++ mechanisms. The following table lists the available classes by instruction set, along with each class's signedness, element data type, element size in bits, number of elements, and header file.

SIMD Vector Classes

Instruction Set	Class	Signedness	Data Type	Size (bits)	Elements	Header File
MMX™ Technology	I64vec1	unspecified	__m64	64	1	ivec.h
	I32vec2	unspecified	int	32	2	ivec.h
	Is32vec2	signed	int	32	2	ivec.h
	Iu32vec2	unsigned	int	32	2	ivec.h
	I16vec4	unspecified	short	16	4	ivec.h
	Is16vec4	signed	short	16	4	ivec.h
	Iu16vec4	unsigned	short	16	4	ivec.h
	I8vec8	unspecified	char	8	8	ivec.h
	Is8vec8	signed	char	8	8	ivec.h
	Iu8vec8	unsigned	char	8	8	ivec.h
Intel® Streaming SIMD Extensions (Intel® SSE)	F32vec4	unspecified	float	32	4	fvec.h
Intel® Streaming SIMD Extensions (Intel® SSE)	F32vec1	unspecified	float	32	1	fvec.h
Intel® Streaming SIMD Extensions 2 (Intel® SSE2)	F64vec2	unspecified	double	64	2	dvec.h
	I128vec1	unspecified	__m128i	128	1	dvec.h
	I64vec2	unspecified	long int	64	2	dvec.h
	I32vec4	unspecified	int	32	4	dvec.h
	Is32vec4	signed	int	32	4	dvec.h
	Iu32vec4	unsigned	int	32	4	dvec.h
	I16vec8	unspecified	short	16	8	dvec.h
	Is16vec8	signed	short	16	8	dvec.h
	Iu16vec8	unsigned	short	16	8	dvec.h
	I8vec16	unspecified	char	8	16	dvec.h
	Is8vec16	signed	char	8	16	dvec.h
	Iu8vec16	unsigned	char	8	16	dvec.h
Intel® Advanced Vector Extensions (Intel® AVX)	F32vec8	unspecified	float	32	8	dvec.h
Intel® Advanced Vector Extensions (Intel® AVX)	F64vec4	unspecified	double	64	4	dvec.h
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation	F32vec16	unspecified	float	32	16	dvec.h
	F64vec8	unspecified	double	64	8	dvec.h
	M512vec	unspecified	__m512i	512	1	dvec.h
	I32vec16	unspecified	int	32	16	dvec.h
	Is32vec16	signed	int	32	16	dvec.h
	Iu32vec16	unsigned	int	32	16	dvec.h
	I64vec8	unspecified	long int	64	8	dvec.h
	Is64vec8	signed	long int	64	8	dvec.h
	Iu64vec8	unsigned	long int	64	8	dvec.h
Intel® AVX-512 Byte and Word Instructions (BWI)	I16vec32	unspecified	short	16	32	dvec.h
	Is16vec32	signed	short	16	32	dvec.h
	Iu16vec32	unsigned	short	16	32	dvec.h
	I8vec64	unspecified	char	8	64	dvec.h
	Is8vec64	signed	char	8	64	dvec.h
	Iu8vec64	unsigned	char	8	64	dvec.h

Most classes offer similar functionality across all data types and expose the available intrinsics for those types. However, some operations cannot be carried over from one data type to another without a performance penalty, so they are omitted from individual classes.

NOTE:

Intrinsics that take immediate values and cannot be expressed easily in classes are not implemented. For example:

_mm_shuffle_ps
_mm_shuffle_pi16
_mm_extract_pi16
_mm_insert_pi16
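
One reason these intrinsics are awkward to wrap is that the immediate operand must be a compile-time constant, so it cannot be passed as an ordinary run-time argument to an overloaded operator or member function. Code that needs such an operation can call the intrinsic directly. The sketch below shows one possible way to do that. `shuffle4` is a hypothetical helper, not part of the class library. It takes the mask as a template argument, assumes an SSE-capable x86 target, and uses a scalar fallback elsewhere.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper, not part of fvec.h. The mask is a template argument
// because _mm_shuffle_ps requires an 8-bit compile-time constant.
// Result: out = { a[m0], a[m1], b[m2], b[m3] }, where m0..m3 are the four
// 2-bit fields of Mask, lowest bits first.
template <int Mask>
inline void shuffle4(const float* a, const float* b, float* out) {
    static_assert(Mask >= 0 && Mask <= 255, "mask must fit in 8 bits");
#if SKETCH_HAS_SSE
    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_shuffle_ps(va, vb, Mask));
#else
    // Scalar fallback (assumption: used only when SSE is unavailable).
    const float r[4] = { a[Mask & 3], a[(Mask >> 2) & 3],
                         b[(Mask >> 4) & 3], b[(Mask >> 6) & 3] };
    for (int i = 0; i < 4; ++i) out[i] = r[i];
#endif
}
```

For example, `shuffle4<0x1B>(v, v, out)` with v = {1, 2, 3, 4} writes {4, 3, 2, 1} to out, reversing the element order.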

Access to Classes Using Header Files

The required class header files are installed in the include directory with the Intel® oneAPI DPC++/C++ Compiler. To enable the classes, use the #include directive in your program file as shown in the table that follows.

Include Directives for Enabling Classes

Instruction Set Extension	Include Directive
MMX™ Technology	`#include <ivec.h>`
Intel® SSE	`#include <fvec.h>`
Intel® SSE2	`#include <dvec.h>`
Intel® Streaming SIMD Extensions 3 (Intel® SSE3)	`#include <dvec.h>`
Intel® Streaming SIMD Extensions 4 (Intel® SSE4)	`#include <dvec.h>`
Intel® AVX	`#include <dvec.h>`

Each header file in the table includes the one listed before it, so including a later file also enables the earlier classes. To use both the Ivec and Fvec classes, you only need to include fvec.h. Similarly, to use all the classes, including those for Intel® SSE2, you only need to include dvec.h.

Usage Precautions

When using the C++ classes, follow the general guidelines below. More detailed usage rules for each class are listed in Integer Vector Classes and Floating-point Vector Classes.

Clear MMX™ Technology Registers

If you use both the Ivec and Fvec classes in the same program, the program can mix MMX™ Technology instructions, issued by Ivec classes, with Intel® architecture (x87) floating-point instructions, issued by Fvec classes. x87 floating-point instructions are used by the following Fvec functions:

Fvec constructors
debug functions (cout and element access)
rsqrt_nr

NOTE:

MMX™ Technology registers are aliased on the floating-point registers, so you should clear the MMX™ Technology state with the EMMS instruction intrinsic before issuing an x87 floating-point instruction.

Example	Usage
`ivecA = ivecA & ivecB;`	An `Ivec` logical operation that uses MMX™ Technology instructions.
`empty();`	Clears the MMX™ Technology state (the EMMS step described in the note above).
`cout << f32vec4a;`	An `F32vec4` operation that uses x87 floating-point instructions.

CAUTION:

Failure to clear the MMX™ Technology registers can result in incorrect execution or poor performance due to an incorrect register state.
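
The sequence in the table can also be sketched with plain intrinsics. The example below is an illustration under stated assumptions, not the class library's implementation: `add4_then_average` is a hypothetical helper, `_mm_add_pi16` stands in for an Ivec operation, `_mm_empty()` is the EMMS intrinsic, and the `long double` arithmetic stands in for x87 code (whether it actually compiles to x87 instructions depends on the compiler and target). On targets without MMX support the helper falls back to scalar code.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__MMX__)
#include <mmintrin.h>  // MMX intrinsics, including _mm_empty()
#define SKETCH_HAS_MMX 1
#else
#define SKETCH_HAS_MMX 0
#endif

// Hypothetical helper: adds four 16-bit lanes, clears the MMX state,
// then averages the four sums in floating point.
inline double add4_then_average(const std::int16_t* a, const std::int16_t* b) {
    std::int16_t sum[4];
#if SKETCH_HAS_MMX
    __m64 va, vb;
    std::memcpy(&va, a, sizeof va);         // load four 16-bit elements
    std::memcpy(&vb, b, sizeof vb);
    const __m64 vc = _mm_add_pi16(va, vb);  // MMX packed add
    std::memcpy(sum, &vc, sizeof vc);
    _mm_empty();  // EMMS: clear the MMX state before floating-point code
#else
    // Scalar fallback (assumption: used only when MMX is unavailable).
    for (int i = 0; i < 4; ++i)
        sum[i] = static_cast<std::int16_t>(a[i] + b[i]);
#endif
    long double total = 0.0L;  // floating-point work happens after the clear
    for (int i = 0; i < 4; ++i) total += sum[i];
    return static_cast<double>(total / 4.0L);
}
```

The ordering is the point of the sketch: all MMX work finishes, the state is cleared once, and only then does the floating-point code run.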

Parent topic: Intel® C++ Class Libraries
