Intel® oneAPI DPC++/C++ Compiler Developer Guide and Reference

ID 767253
Date 3/22/2024
Public



Capabilities of C++ SIMD Classes

The fundamental capabilities of each C++ SIMD class include:

  • Computation

  • Horizontal data support

  • Branch compression/elimination

  • Caching hints

Understanding each of these capabilities and how they interact is crucial to achieving desired results.

Computation

The SIMD C++ classes contain vertical operator support for most arithmetic operations, including shifting and saturation.
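The sketch below makes "vertical" and "saturation" concrete. It uses raw SSE2 intrinsics from <emmintrin.h> instead of the C++ classes, so it builds with any x86-64 compiler. The helper names add_wrap, add_sat and lane0_i16 are hypothetical. Each operation combines lane i of one vector only with lane i of the other. The saturating form clamps to the 16-bit range, while the ordinary form wraps around.

```cpp
#include <emmintrin.h>  // SSE2: __m128i and 16-bit integer arithmetic

// Vertical add: lane i of the result = a[i] + b[i], wrapping on overflow.
inline __m128i add_wrap(__m128i a, __m128i b) { return _mm_add_epi16(a, b); }

// Vertical saturating add: the result is clamped to [-32768, 32767].
inline __m128i add_sat(__m128i a, __m128i b) { return _mm_adds_epi16(a, b); }

// Hypothetical helper: read lane 0 as a signed 16-bit value.
inline short lane0_i16(__m128i v) {
    short out[8];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
    return out[0];
}
```

For example, adding 10000 to 30000 in every lane gives -25536 with add_wrap, because the 16-bit result wraps around. With add_sat, the same addition gives 32767.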

Computation operations include: +, -, *, /, reciprocal (rcp and rcp_nr), square root (sqrt), and reciprocal square root (rsqrt and rsqrt_nr).

The rcp and rsqrt operations are approximating instructions with very short latencies that produce results with at least 12 bits of accuracy. Results may differ on non-Intel processors. The rcp_nr and rsqrt_nr operations use a software refinement step to improve the accuracy of these approximations, with minimal impact on performance. (The nr stands for Newton-Raphson, a numerical method that refines an approximate result into a more accurate one.)
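The following sketch illustrates the idea behind the _nr variants with raw SSE intrinsics. It takes the hardware estimate from _mm_rcp_ps or _mm_rsqrt_ps and applies one Newton-Raphson step. It is not taken from the class library source, and the helper names are hypothetical. The exact refinement used by rcp_nr and rsqrt_nr may differ.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_rcp_ps, _mm_rsqrt_ps

// Hardware estimates (roughly 12 bits of accuracy).
inline __m128 rcp_estimate(__m128 a)   { return _mm_rcp_ps(a); }
inline __m128 rsqrt_estimate(__m128 a) { return _mm_rsqrt_ps(a); }

// One Newton-Raphson step for 1/a:  x1 = x0 * (2 - a * x0)
inline __m128 rcp_refined(__m128 a) {
    const __m128 x0 = _mm_rcp_ps(a);
    return _mm_mul_ps(x0, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(a, x0)));
}

// One Newton-Raphson step for 1/sqrt(a):  y1 = 0.5 * y0 * (3 - a * y0 * y0)
inline __m128 rsqrt_refined(__m128 a) {
    const __m128 y0 = _mm_rsqrt_ps(a);
    const __m128 ay0y0 = _mm_mul_ps(_mm_mul_ps(a, y0), y0);
    return _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.5f), y0),
                      _mm_sub_ps(_mm_set1_ps(3.0f), ay0y0));
}

// Relative error of lane 0 of x compared with target.
inline float rel_error(__m128 x, float target) {
    const float e = (_mm_cvtss_f32(x) - target) / target;
    return e < 0.0f ? -e : e;
}
```

Each Newton-Raphson step roughly squares the relative error, which doubles the number of correct bits. That is why a single step brings a 12-bit estimate close to full single precision.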

Horizontal Data Support

The C++ SIMD classes provide horizontal support for some arithmetic operations. The term horizontal indicates computation across the elements of one vector, as opposed to the vertical, element-by-element operations on two different vectors.

The add_horizontal, unpack_low, and pack_sat functions are examples of horizontal data support. This support helps certain algorithms that cannot exploit the full potential of SIMD instructions with vertical operations alone.
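As an illustration of a horizontal operation, the sketch below sums the four lanes of a single __m128 vector. This is the kind of reduction that add_horizontal performs for F32vec4. The sketch uses a common shuffle-and-add idiom with raw SSE intrinsics and does not claim to match how add_horizontal is implemented. The names horizontal_sum and dot4 are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_shuffle_ps, _mm_movehl_ps

// Reduce the four lanes of v to one float: v0 + v1 + v2 + v3.
inline float horizontal_sum(__m128 v) {
    // Swap adjacent pairs: (v1, v0, v3, v2)
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    // (v0+v1, v0+v1, v2+v3, v2+v3)
    __m128 sums = _mm_add_ps(v, shuf);
    // Move the upper pair down: lane 0 now holds v2+v3
    shuf = _mm_movehl_ps(shuf, sums);
    // Lane 0: (v0+v1) + (v2+v3)
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}

// A dot product needs both kinds of operation:
// a vertical multiply followed by a horizontal sum.
inline float dot4(__m128 x, __m128 y) {
    return horizontal_sum(_mm_mul_ps(x, y));
}
```

The dot product shows why horizontal support matters. The multiplications are purely vertical, but the final accumulation must combine elements within one vector.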

Shuffle intrinsics are another example of horizontal data flow. They are not expressed in the C++ classes because they take immediate arguments: the shuffle control must be a compile-time constant. However, the C++ class implementation lets you mix shuffle intrinsics with the other C++ class functions. For example:

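#include <fvec.h>                           // F32vec4 (Intel C++ SIMD classes)
F32vec4 fveca(4.0f, 3.0f, 2.0f, 1.0f);      // initialize before use
F32vec4 fvecb(8.0f, 7.0f, 6.0f, 5.0f), fvecd;
fveca += fvecb;                             // C++ class operator (vertical add)
fvecd = _mm_shuffle_ps(fveca, fvecb, 0);    // F32vec4 converts to and from __m128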
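The mask in this example is the immediate argument that keeps shuffles out of the C++ classes, because it must be a compile-time constant. The sketch below uses only <xmmintrin.h>. It shows what mask 0 selects and how the standard _MM_SHUFFLE macro builds other masks. The helper names are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE: _mm_shuffle_ps, _MM_SHUFFLE

// _MM_SHUFFLE(d, c, b, a) builds the immediate mask. Result lanes 0 and 1
// come from the first operand (indices a, b). Lanes 2 and 3 come from the
// second operand (indices c, d).

// Mask 0 == _MM_SHUFFLE(0, 0, 0, 0): result = (x0, x0, y0, y0)
inline __m128 shuffle_mask0(__m128 x, __m128 y) {
    return _mm_shuffle_ps(x, y, 0);
}

// Reverse the lanes of one vector: result = (v3, v2, v1, v0)
inline __m128 reverse_lanes(__m128 v) {
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));
}

// Copy the four lanes out so that they can be inspected.
inline void store4(__m128 v, float out[4]) { _mm_storeu_ps(out, v); }
```

In the F32vec4 example, mask 0 makes fvecd hold element 0 of fveca twice, followed by element 0 of fvecb twice.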

Branch Compression and Elimination

Branching in SIMD architectures can be complicated and expensive. The SIMD C++ classes provide functions to eliminate branches, using logical operations, max and min functions, conditional selects, and compares. Consider the following example:

short a[4], b[4], c[4];
for (int i = 0; i < 4; i++)
    c[i] = a[i] > b[i] ? a[i] : b[i];   // data-dependent choice per element

Each iteration is independent of the others. For each i, the result is either a[i] or b[i], depending on the actual values. A simple way to remove the branch altogether is to use the select_gt function, as follows:

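#include <ivec.h>            // Is16vec4: four signed 16-bit elements
Is16vec4 a, b, c;
c = select_gt(a, b, a, b);   // per element: c = (a > b) ? a : b, with no branch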
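The same branch-free selection can be written with SSE2 intrinsics on eight 16-bit lanes. This shows what select_gt does conceptually. A compare produces an all-ones or all-zeros mask for each lane, and logical AND, ANDNOT, and OR then merge the two candidate values. This is a sketch with hypothetical helper names, not the ivec.h implementation. Because this particular select computes a maximum, SSE2's _mm_max_epi16 gives the same result, which corresponds to the max/min route to branch elimination.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, 16-bit compares, max

// Branch-free per-lane select: result[i] = (a[i] > b[i]) ? c[i] : d[i]
inline __m128i select_gt_epi16(__m128i a, __m128i b, __m128i c, __m128i d) {
    const __m128i mask = _mm_cmpgt_epi16(a, b);  // 0xFFFF where a > b, else 0
    return _mm_or_si128(_mm_and_si128(mask, c),      // keep c where mask is set
                        _mm_andnot_si128(mask, d));  // keep d where mask is clear
}

// The loop from the text, eight shorts at a time, using the select.
inline void max8_select(const short* a, const short* b, short* c) {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(c),
                     select_gt_epi16(va, vb, va, vb));
}

// The same loop using the dedicated max instruction.
inline void max8_builtin(const short* a, const short* b, short* c) {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(c), _mm_max_epi16(va, vb));
}
```

The mask-and-merge form generalizes to any condition and any pair of result values. The max form applies only when the result is one of the compared operands.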

Caching Hints

Intel® Streaming SIMD Extensions provide prefetching and streaming hints. Prefetching data can minimize the effects of memory latency. Streaming hints allow you to indicate that certain data should not be cached.
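The sketch below uses both hints through the SSE intrinsics _mm_prefetch (a prefetch hint) and _mm_stream_ps (a non-temporal, or streaming, store). Several details are assumptions of this sketch: the function name scale_stream, the prefetch distance of 64 floats, and the requirements that n be a multiple of 4 and dst be 16-byte aligned. The best prefetch distance depends on the processor and the access pattern.

```cpp
#include <xmmintrin.h>  // _mm_prefetch, _mm_stream_ps, _mm_sfence
#include <cstddef>      // std::size_t

// Hypothetical prefetch distance in floats (256 bytes ahead); tune per target.
constexpr std::size_t kPrefetchAhead = 64;

// dst[i] = k * src[i]. Assumes n % 4 == 0 and a 16-byte aligned dst,
// because _mm_stream_ps requires an aligned address.
inline void scale_stream(const float* src, float* dst, std::size_t n, float k) {
    const __m128 vk = _mm_set1_ps(k);
    for (std::size_t i = 0; i < n; i += 4) {
        // Prefetch hint: request a later part of src into the cache hierarchy.
        if (i + kPrefetchAhead < n)
            _mm_prefetch(reinterpret_cast<const char*>(src + i + kPrefetchAhead),
                         _MM_HINT_T0);
        const __m128 v = _mm_loadu_ps(src + i);
        // Streaming hint: write the result while avoiding caching dst.
        _mm_stream_ps(dst + i, _mm_mul_ps(v, vk));
    }
    // Order the non-temporal stores before any later stores.
    _mm_sfence();
}
```

Streaming stores suit output that is written once and not read again soon. For data that will be reused shortly, ordinary stores are usually the better choice.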