C++ Classes and SIMD Operations
Using C++ classes for SIMD operations lets you operate on arrays or vectors of data in a single operation. Consider the addition of two vectors, A and B, where each vector contains four elements. The typical method sums the elements A[i] and B[i] one pair at a time in a loop, as in the following example.
int a[4], b[4], c[4];
for (int i = 0; i < 4; i++)  /* needs four iterations */
    c[i] = a[i] + b[i];      /* computes c[0], c[1], c[2], c[3] */
The following example produces the same result in one operation by using an Ivec integer vector class, which is the SIMD method of adding the elements.
Is16vec4 ivecA, ivecB, ivecC; /* needs one iteration */
ivecC = ivecA + ivecB; /*computes ivecC0, ivecC1, ivecC2, ivecC3 */
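To illustrate how a class can turn an overloaded operator into a single SIMD instruction, here is a minimal sketch that does not depend on the Intel headers. The class name SketchI32vec4 is hypothetical and is not part of ivec.h or dvec.h; it uses four 32-bit elements rather than the 16-bit elements of Is16vec4, assumes an x86 target with SSE2, and falls back to a scalar loop elsewhere. The real Ivec classes are considerably more complete.

```cpp
#include <array>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical stand-in for an Ivec-style class; not the real Is32vec4.
class SketchI32vec4 {
public:
    SketchI32vec4(std::int32_t e0, std::int32_t e1,
                  std::int32_t e2, std::int32_t e3)
        : elems_{{e0, e1, e2, e3}} {}

    // Adds all four lanes; a single SIMD add when SSE2 is available.
    friend SketchI32vec4 operator+(const SketchI32vec4& a,
                                   const SketchI32vec4& b) {
        SketchI32vec4 r(0, 0, 0, 0);
#if SKETCH_HAS_SSE2
        __m128i va = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(a.elems_.data()));
        __m128i vb = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(b.elems_.data()));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(r.elems_.data()),
                         _mm_add_epi32(va, vb));
#else
        // Scalar fallback for targets without SSE2.
        for (int i = 0; i < 4; ++i) r.elems_[i] = a.elems_[i] + b.elems_[i];
#endif
        return r;
    }

    // Read-only element access, for inspection.
    std::int32_t operator[](int i) const { return elems_[i]; }

private:
    std::array<std::int32_t, 4> elems_;
};
```

With this sketch, writing `SketchI32vec4 c = a + b;` reads like scalar code while the four element additions happen together, which is the same idea the Ivec example above expresses.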
Available Classes
The C++ SIMD classes provide data parallelism that is not easily expressed with the typical mechanisms of C++. The following table lists the available classes together with their instruction set, signedness, element data type, element size, number of elements, and header file.
SIMD Vector Classes
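Instruction Set | Class | Signedness | Data Type | Size (bits) | Elements | Header File |
---|---|---|---|---|---|---|
MMX™ Technology | I64vec1 | unspecified | __m64 | 64 | 1 | ivec.h |
MMX™ Technology | I32vec2 | unspecified | int | 32 | 2 | ivec.h |
MMX™ Technology | Is32vec2 | signed | int | 32 | 2 | ivec.h |
MMX™ Technology | Iu32vec2 | unsigned | int | 32 | 2 | ivec.h |
MMX™ Technology | I16vec4 | unspecified | short | 16 | 4 | ivec.h |
MMX™ Technology | Is16vec4 | signed | short | 16 | 4 | ivec.h |
MMX™ Technology | Iu16vec4 | unsigned | short | 16 | 4 | ivec.h |
MMX™ Technology | I8vec8 | unspecified | char | 8 | 8 | ivec.h |
MMX™ Technology | Is8vec8 | signed | char | 8 | 8 | ivec.h |
MMX™ Technology | Iu8vec8 | unsigned | char | 8 | 8 | ivec.h |
Intel® Streaming SIMD Extensions (Intel® SSE) | F32vec4 | unspecified | float | 32 | 4 | fvec.h |
Intel® Streaming SIMD Extensions (Intel® SSE) | F32vec1 | unspecified | float | 32 | 1 | fvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | F64vec2 | unspecified | double | 64 | 2 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | I128vec1 | unspecified | __m128i | 128 | 1 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | I64vec2 | unspecified | long int | 64 | 2 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | I32vec4 | unspecified | int | 32 | 4 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | Is32vec4 | signed | int | 32 | 4 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | Iu32vec4 | unsigned | int | 32 | 4 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | I16vec8 | unspecified | short | 16 | 8 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | Is16vec8 | signed | short | 16 | 8 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | Iu16vec8 | unsigned | short | 16 | 8 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | I8vec16 | unspecified | char | 8 | 16 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | Is8vec16 | signed | char | 8 | 16 | dvec.h |
Intel® Streaming SIMD Extensions 2 (Intel® SSE2) | Iu8vec16 | unsigned | char | 8 | 16 | dvec.h |
Intel® Advanced Vector Extensions (Intel® AVX) | F32vec8 | unspecified | float | 32 | 8 | dvec.h |
Intel® Advanced Vector Extensions (Intel® AVX) | F64vec4 | unspecified | double | 64 | 4 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | F32vec16 | unspecified | float | 32 | 16 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | F64vec8 | unspecified | double | 64 | 8 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | M512vec | unspecified | __m512i | 512 | 1 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | I32vec16 | unspecified | int | 32 | 16 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | Is32vec16 | signed | int | 32 | 16 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | Iu32vec16 | unsigned | int | 32 | 16 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | I64vec8 | unspecified | long int | 64 | 8 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | Is64vec8 | signed | long int | 64 | 8 | dvec.h |
Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Foundation | Iu64vec8 | unsigned | long int | 64 | 8 | dvec.h |
Intel® AVX-512 Byte and Word Instructions (BWI) | I16vec32 | unspecified | short | 16 | 32 | dvec.h |
Intel® AVX-512 Byte and Word Instructions (BWI) | Is16vec32 | signed | short | 16 | 32 | dvec.h |
Intel® AVX-512 Byte and Word Instructions (BWI) | Iu16vec32 | unsigned | short | 16 | 32 | dvec.h |
Intel® AVX-512 Byte and Word Instructions (BWI) | I8vec64 | unspecified | char | 8 | 64 | dvec.h |
Intel® AVX-512 Byte and Word Instructions (BWI) | Is8vec64 | signed | char | 8 | 64 | dvec.h |
Intel® AVX-512 Byte and Word Instructions (BWI) | Iu8vec64 | unsigned | char | 8 | 64 | dvec.h |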
Most classes contain similar functionality for all data types and are represented by all available intrinsics. However, some capabilities do not translate from one data type to another without suffering from poor performance, and are therefore excluded from individual classes. For example, the following intrinsics, which take immediate operands, have no equivalent in the classes:
- _mm_shuffle_ps
- _mm_shuffle_pi16
- _mm_extract_pi16
- _mm_insert_pi16
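When an operation such as a shuffle is not offered by a class, one option is to call the intrinsic directly on the underlying register type. The following sketch does this for _mm_shuffle_ps on raw __m128 values. The helper name reverse_lanes is hypothetical, and an x86 target with Intel® SSE is assumed (a scalar fallback is used elsewhere). How a particular vector class converts to and from __m128 should be checked against its header file.

```cpp
#include <array>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // SSE intrinsics, including _mm_shuffle_ps
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: returns the four floats in reverse order.
std::array<float, 4> reverse_lanes(const std::array<float, 4>& in) {
    std::array<float, 4> out{};
#if SKETCH_HAS_SSE
    __m128 v = _mm_loadu_ps(in.data());
    // The shuffle mask must be a compile-time constant (an immediate).
    __m128 r = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
    _mm_storeu_ps(out.data(), r);
#else
    // Scalar fallback for targets without SSE.
    for (int i = 0; i < 4; ++i) out[i] = in[3 - i];
#endif
    return out;
}
```

Because the mask is fixed at compile time, each distinct shuffle pattern needs its own call site or template instantiation, which is one reason such intrinsics do not map neatly onto an overloaded operator.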
Access to Classes Using Header Files
The required class header files are installed in the include directory with the Intel® oneAPI DPC++/C++ Compiler. To enable the classes, use the #include directive in your program file as shown in the table that follows.
Include Directives for Enabling Classes
Instruction Set Extension | Include Directive |
---|---|
MMX™ Technology | #include <ivec.h> |
Intel® SSE | #include <fvec.h> |
Intel® SSE2 | #include <dvec.h> |
Intel® Streaming SIMD Extensions 3 (Intel® SSE3) | #include <dvec.h> |
Intel® Streaming SIMD Extensions 4 (Intel® SSE4) | #include <dvec.h> |
Intel® AVX | #include <dvec.h> |
Each succeeding header file, from the top of the table down, includes the preceding one. To use both the Ivec and Fvec classes, you only need to include fvec.h. Similarly, to use all the classes, including those for Intel® SSE2, you only need to include dvec.h.
Usage Precautions
When using the C++ classes, follow these general guidelines. More detailed usage rules for each class are listed in Integer Vector Classes and Floating-point Vector Classes.
Clear MMX™ Technology Registers
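If you use both the Ivec and Fvec classes at the same time, your program could mix MMX™ Technology instructions, called by Ivec classes, with Intel® architecture x87 floating-point instructions, called by Fvec classes. Because the MMX™ Technology registers are aliased onto the x87 floating-point registers, clear the MMX™ Technology state by calling empty() before an x87 floating-point instruction executes. x87 floating-point instructions exist in the following Fvec functions: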
- Fvec constructors
- debug functions (cout and element access)
- rsqrt_nr
Example | Usage |
---|---|
ivecA = ivecA & ivecB; | An Ivec logical operation that uses MMX™ Technology instructions. |
empty(); | Clears the MMX™ Technology state. |
cout << f32vec4a; | An F32vec4 operation that uses x87 floating-point instructions. |
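The same ordering can be sketched with the underlying intrinsics instead of the classes: MMX™ Technology work first, then _mm_empty(), then floating-point work. The helper name masked_sqrt is hypothetical, and the sketch assumes a compiler that defines __MMX__ and provides mmintrin.h (a scalar fallback is used otherwise). On 64-bit targets, floating-point code typically uses SSE registers rather than x87, so the call matters most for x87 code paths.

```cpp
#include <cmath>

#if defined(__MMX__)
#include <mmintrin.h>  // MMX intrinsics, including _mm_empty (EMMS)
#define SKETCH_HAS_MMX 1
#else
#define SKETCH_HAS_MMX 0
#endif

// Hypothetical helper: ANDs two 32-bit values with MMX instructions,
// then performs floating-point work on the result.
double masked_sqrt(int a, int b) {
#if SKETCH_HAS_MMX
    __m64 va = _mm_cvtsi32_si64(a);
    __m64 vb = _mm_cvtsi32_si64(b);
    int masked = _mm_cvtsi64_si32(_mm_and_si64(va, vb));
    _mm_empty();  // clear the MMX state before any floating-point code
#else
    int masked = a & b;  // scalar fallback without MMX
#endif
    return std::sqrt(static_cast<double>(masked));
}
```

Placing the clearing call once, after the last MMX™ Technology operation and before the first floating-point operation, avoids paying for it inside inner loops.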