simd

Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249

Date 12/16/2022

Enforces vectorization of loops.

Syntax

#pragma simd [clause[ [,] clause]...]

Arguments

clause

Can be any of the following:

vectorlength (`n1`[, `n2`]...)	Where `n` is a vector length (VL). It must be an integer that is a power of 2: 2, 4, 8, or 16. If you specify more than one `n`, the vectorizer chooses the VL from the values specified. Causes each iteration of the vector loop to execute the computation equivalent to `n` iterations of the scalar loop. Multiple `vectorlength` clauses are merged as a union.
vectorlengthfor (`data type`)	Where `data type` must be one of the built-in integer types (8-, 16-, 32-, or 64-bit), a pointer type (treated as a pointer-sized integer), a floating-point type (32- or 64-bit), or a complex type (64- or 128-bit). Otherwise, behavior is undefined. Causes each iteration of the vector loop to execute the computation equivalent to `n` iterations of the scalar loop, where `n` is `size_of_vector_register`/`sizeof(data type)`. For example, `vectorlengthfor(float)` results in `n=4` for Intel® Streaming SIMD Extensions 2 (Intel® SSE2) through Intel SSE4.2 targets (packed float operations on 128-bit XMM registers) and `n=8` for an Intel® Advanced Vector Extensions (Intel® AVX) target (packed float operations on 256-bit YMM registers). `vectorlengthfor(int)` results in `n=4` for Intel SSE2 through Intel AVX targets (Intel AVX has no 256-bit packed integer operations). The `vectorlength()` and `vectorlengthfor()` clauses are mutually exclusive: they cannot be used together on the same pragma. Behavior for multiple `vectorlengthfor` clauses is undefined.
private (`var1`[, `var2`]...)	Where `var` is a scalar variable. Makes each variable private to each iteration of the loop. Each iteration's copy starts with an undefined value (use `firstprivate` to initialize it), and the variable's value on exit from the loop is undefined (use `lastprivate` to keep it). Multiple `private` clauses are merged as a union. NOTE: Execution of the SIMD loop with `firstprivate`/`lastprivate` clauses may differ from serial execution of the same code, even if the loop fails to vectorize. A variable in a `private` clause cannot appear in a `linear`, `reduction`, `firstprivate`, or `lastprivate` clause.
firstprivate (`var1`[, `var2`]...)	Provides a superset of the functionality of the `private` clause: variables in a `firstprivate` list are subject to `private` clause semantics and, in addition, each variable's value on entry to the SIMD loop is broadcast to all of its private instances. A variable in a `firstprivate` clause can also appear in a `lastprivate` clause. It cannot appear in a `linear`, `reduction`, or `private` clause.
lastprivate (`var1`[, `var2`]...)	Provides a superset of the functionality of the `private` clause: variables in a `lastprivate` list are subject to `private` clause semantics and, in addition, on exit from the SIMD loop each variable holds the value from the sequentially last iteration (which may be undefined if that iteration does not assign to the variable). A variable in a `lastprivate` clause can also appear in a `firstprivate` clause. It cannot appear in a `linear`, `reduction`, or `private` clause.
linear (`var1:step1` [`,var2:step2`]...)	Where `var` is a scalar variable and `step` is a positive compile-time integer constant expression. For each iteration of the scalar loop, `var1` is incremented by `step1`, `var2` by `step2`, and so on. Therefore, each iteration of the vector loop increments the variables by `VL*step1`, `VL*step2`, …, `VL*stepN`, respectively. If more than one step is specified for a `var`, a compile-time error occurs. Multiple `linear` clauses are merged as a union. A variable in a `linear` clause cannot appear in a `reduction`, `private`, `firstprivate`, or `lastprivate` clause.
reduction (`oper:var1` [,`var2`]…)	Where `oper` is a reduction operator and `var` is a scalar variable. Applies the vector reduction indicated by `oper` to `var1`, `var2`, …, `varN`. The simd pragma may have multiple reduction clauses with the same or different operators. If more than one reduction operator is associated with a `var`, a compile-time error occurs. A variable in a `reduction` clause cannot appear in a `linear`, `private`, `firstprivate`, or `lastprivate` clause.
[no]assert	Directs the compiler whether to report an error (`assert`) or not (`noassert`) when vectorization fails. The default is `noassert`. Specifying this clause more than once causes a compile-time error.
[no]vecremainder	Instructs the compiler to vectorize or not to vectorize the remainder loop when the original loop is vectorized. See the description of the vector pragma for more information.
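The sketch below shows several of these clauses on one loop. The function `square_even_elements` is hypothetical (not from this guide): `j` is a secondary induction variable that the loop body itself advances by 2 per iteration, so it is declared `linear(j:2)`, and `t` is a per-iteration temporary, so it is declared `private`. The pragma is specific to the Intel C++ Compiler Classic; compilers such as GCC or Clang ignore unknown pragmas (possibly with a `-Wunknown-pragmas` warning) and run the loop serially, which gives the same result here.

```c
/* Hypothetical helper (not from this guide): dst[i] = src[2*i] squared.
   Precondition: src holds at least 2*n - 1 elements. */
void square_even_elements(float *dst, const float *src, int n)
{
    int i;      /* signed loop control variable */
    int j = 0;  /* secondary induction variable, +2 per scalar iteration */
    float t;    /* per-iteration temporary */

    /* vectorlength(4, 8): the vectorizer may choose VL 4 or 8.
       linear(j:2):        j grows by 2 per scalar iteration
                           (2*VL per vector iteration).
       private(t):         each iteration works on its own copy of t.
       vecremainder:       also vectorize the remainder loop. */
#pragma simd vectorlength(4, 8) linear(j:2) private(t) vecremainder
    for (i = 0; i < n; i++) {
        t = src[j];
        dst[i] = t * t;
        j += 2;
    }
}
```

Because every iteration writes `t` before reading it, the private copies never need an initial value, and the serial and vector versions of the loop compute the same `dst`.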

Description

The simd pragma is used to guide the compiler to vectorize more loops. Vectorization using the simd pragma complements (but does not replace) the fully automatic approach.

Without explicit vectorlength() or vectorlengthfor() clauses, the compiler chooses a vector length using its own cost model. Misclassifying variables into private, firstprivate, lastprivate, linear, and reduction clauses, or failing to classify variables that need it, may cause unintended consequences such as runtime failures and/or incorrect results.
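As an illustration of classifying a variable correctly, the hypothetical `dot` function below accumulates into `sum`, which every iteration both reads and updates, so it belongs in a `reduction` clause; declaring it `private` instead would leave its value on exit from the loop undefined. Compilers other than the Intel C++ Compiler Classic ignore the pragma and run the loop serially.

```c
/* Hypothetical helper: dot product of x[0..n-1] and y[0..n-1]. */
float dot(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    int i;

    /* sum is read and written by every iteration: classify it as a
       reduction with the + operator. */
#pragma simd reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

A vectorized floating-point reduction adds partial sums in a different order than the serial loop, so in general its result can differ from the serial result in the last bits.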

A given variable can appear in at most one private, linear, or reduction clause.
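To illustrate the `lastprivate` classification, here is a minimal sketch assuming `n > 0` (the function name is hypothetical): `x` is assigned in every iteration before it is read, so it appears only in a `lastprivate` clause and, after the loop, holds the value from the sequentially last iteration. Compilers other than the Intel C++ Compiler Classic ignore the pragma.

```c
/* Hypothetical helper: b[i] = 2*a[i] + 1; returns 2*a[n-1].
   Precondition: n > 0. */
float double_plus_one(float *b, const float *a, int n)
{
    float x = 0.0f;
    int i;

    /* x is written before it is read in every iteration, so no initial
       value is needed inside the loop; lastprivate(x) makes the value
       from the sequentially last iteration (i == n-1) visible after it. */
#pragma simd lastprivate(x)
    for (i = 0; i < n; i++) {
        x = 2.0f * a[i];
        b[i] = x + 1.0f;
    }
    return x;
}
```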

If the compiler is unable to vectorize a loop, a warning will be emitted (use the assert clause to make it an error).
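The sketch below (hypothetical function `scale_in_place`) combines `vectorlengthfor(float)`, which requests the vector length that fits `float` elements in a vector register, with `assert`, so that a vectorization failure becomes a compile-time error rather than a warning. Other compilers ignore the pragma and run the loop serially.

```c
/* Hypothetical helper: multiplies a[0..n-1] by s in place. */
void scale_in_place(float *a, float s, int n)
{
    int i;

    /* vectorlengthfor(float): VL = vector register size / sizeof(float),
       e.g. 16 / 4 = 4 on SSE2-SSE4.2 targets, 32 / 4 = 8 on AVX targets.
       assert: report an error, not a warning, if the loop is not
       vectorized. */
#pragma simd vectorlengthfor(float) assert
    for (i = 0; i < n; i++) {
        a[i] *= s;
    }
}
```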

If the vectorizer has to stop vectorizing a loop for some reason, the fast floating-point model is used for the SIMD loop.

The vectorization performed on this loop by the simd pragma overrides any setting you may specify for options -fp-model (Linux* and macOS) and /fp (Windows*) for this loop.

Note that the simd pragma may not affect all auto-vectorizable loops. Some of these loops do not have a way to describe the SIMD vector semantics.

The following restrictions apply to the simd pragma:

The countable loop for the simd pragma must conform to the canonical for-loop form of an OpenMP worksharing loop construct. Additionally, the loop control variable must be of a signed integer type.
The vector values must be signed 8-, 16-, 32-, or 64-bit integers, single- or double-precision floating-point numbers, or single- or double-precision complex numbers.
A SIMD loop may contain another loop (for, while, or do-while). A goto out of such an inner loop is not supported; break and continue are supported.
NOTE: Inlining can create such an inner loop, which may not be obvious at the source level.
A SIMD loop performs memory references unconditionally. Therefore, all address computations must result in valid memory addresses, even for locations that would not be accessed if the loop were executed sequentially.
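The following hypothetical sketch illustrates the last restriction. Writing `b[i] = (i < n - 1) ? a[i + 1] : 0.0f` would make the final iteration reference `a[n]`, one past the end of the array; a serial loop never loads it, but a SIMD loop may perform the load unconditionally. Clamping the index keeps every load inside `a[0]` to `a[n - 1]`. The loop also follows the other rules: a signed `int` control variable in canonical `for` form. Other compilers ignore the pragma and run the loop serially.

```c
/* Hypothetical helper: b[i] = a[i + 1] for i < n - 1, and b[n-1] = 0. */
void shift_left(float *b, const float *a, int n)
{
    int i;  /* signed int control variable, canonical for-loop form */

#pragma simd
    for (i = 0; i < n; i++) {
        /* Clamp the index so the load below always addresses a[0..n-1],
           even though its value is discarded when i == n - 1. */
        int j = (i < n - 1) ? i + 1 : i;
        float v = a[j];
        b[i] = (i < n - 1) ? v : 0.0f;
    }
}
```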

To disable transformations that enable more vectorization, specify the -vec -no-simd (Linux* and macOS) or /Qvec /Qno-simd (Windows*) options.

User-mandated vectorization, also called SIMD vectorization, can either assert (report an error) or not when a #pragma simd annotated loop fails to vectorize. By default, the simd pragma is set to noassert, and the compiler issues a warning if the loop fails to vectorize. To make the compiler report an error instead, add the assert clause to the simd pragma. If a simd pragma annotated loop is not vectorized by the compiler, the loop retains its serial semantics.

Examples

This example shows how to use the simd pragma:


void add_floats(float *a, float *b, float *c, float *d, float *e, int n){
  int i;
#pragma simd
  for (i=0; i<n; i++){
    a[i] = a[i] + b[i] + c[i] + d[i] + e[i];
  }
}

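In this example, the function add_floats() uses too many unknown pointers for the compiler's automatic runtime independence check optimization to kick in. The programmer can enforce vectorization of this loop with the simd pragma and thereby avoid the overhead of the runtime check. Because that check is skipped, the programmer is responsible for ensuring that the arrays do not overlap in a way that creates a loop-carried dependence.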
