Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 7/13/2023
Public
Document Table of Contents

Tuning Performance

This section describes several programming guidelines that can help you improve the performance of floating-point applications, including:

Handling Floating-point Array Operations in a Loop Body

Following the guidelines below will help auto-vectorization of the loop.

  • Statements within the loop body may contain float or double operations (typically on arrays). The following arithmetic operations are supported: addition, subtraction, multiplication, division, negation, square root, MAX, MIN, and mathematical functions such as SIN and COS.
  • Writing to a single-precision scalar/array and a double scalar/array within the same loop decreases the chance of auto-vectorization due to the differences in the vector length (that is, the number of elements in the vector register) between float and double types. If auto-vectorization fails, try to avoid using mixed data types.
NOTE:

The special __m64, __m128, and __m256 datatypes are not vectorizable. The loop body cannot contain any function calls. Use of the Intel® Streaming SIMD Extensions (Intel® SSE) and Intel® Advanced Vector Extensions (Intel® AVX) intrinsics (for example, mm_add_ps) is not allowed.

Reducing the Impact of Denormal Exceptions

Denormalized floating-point values are those that are too small to be represented in the normal manner; that is, the mantissa cannot be left-justified. Denormal values require hardware or operating system interventions to handle the computation, so floating-point computations that result in denormal values may have an adverse impact on performance.

There are several ways to handle denormals to increase the performance of your application:

  • Scale the values into the normalized range

  • Use a higher precision data type with a larger range

  • Flush denormals to zero

For example, you can translate them to normalized numbers by multiplying them using a large scalar number, doing the remaining computations in the normal space, then scaling back down to the denormal range. Consider using this method when the small denormal values benefit the program design.

Consider using a higher precision data type with a larger range; for example, by converting variables declared as float to be declared as double. Understand that making the change can potentially slow down your program. Storage requirements will increase, which will increase the amount of time for loading and storing data from memory. Higher precision data types can also decrease the potential throughput of Intel® Streaming SIMD Extensions (Intel® SSE) and Intel® Advanced Vector Extensions (Intel® AVX) operations.

If you change the type declaration of a variable, you might also need to change associated library calls, unless these are generic; ; for example, cos() instead of cosf().. Another strategy that might result in increased performance is to increase the amount of precision of intermediate values using the -fp-model [double|extended] option. However, this strategy might not eliminate all denormal exceptions, so you must experiment with the performance of your application. You should verify that the gain in performance from eliminating denormals is greater than the overhead of using a data type with higher precision and greater dynamic range.

In many cases, denormal numbers can be treated safely as zero without adverse effects on program results. Depending on the target architecture, use flush-to-zero (FTZ) options.

IA-32 and Intel® 64 Architectures

IA-32 and Intel® 64 architectures take advantage of the FTZ (flush-to-zero) and DAZ (denormals-are-zero) capabilities of Intel® Streaming SIMD Extensions (Intel® SSE) instructions.

By default, the Intel® C++ Compiler inserts code into the main routine to enable FTZ and DAZ at optimization levels higher than O0. To enable FTZ and DAZ at O0, compile the source file containing main() PROGRAM using compiler option [Q]ftz. When the [Q]ftz option is used on IA-32-based systems with the option –mia32 (Linux*) or /arch:IA32 (Windows*), the compiler inserts code to conditionally enable FTZ and DAZ flags based on a run-time processor check. IA-32 is not available on macOS*.

NOTE:

After using flush-to-zero, ensure that your program still gives correct results when treating denormal values as zero.

Avoiding Mixed Data Type Arithmetic Expressions

Avoid mixing integer and floating-point (float, double, or long double) data in the same computation. Expressing all numbers in a floating-point arithmetic expression (assignment statement) as floating-point values eliminates the need to convert data between fixed and floating-point formats. Expressing all numbers in an integer arithmetic expression as integer values also achieves this. This improves run-time performance.

For example, assuming that I and J are both int variables, expressing a constant number (2.0) as an integer value (2) eliminates the need to convert the data. The following examples demonstrate inefficient and efficient code.

Inefficient code:

int I, J;
  I = J / 2.0
;

Efficient code:

int I, J;
  I = J / 2;

Using Efficient Data Types

In cases where more than one data type can be used for a variable, consider selecting the data types based on the following hierarchy, listed from most to least efficient:

  • char

  • short

  • int

  • long

  • long long

  • float

  • double

  • long double

NOTE:

In an arithmetic expression, you should avoid mixing integer and floating-point data.

You can use integer data types (int, int long, etc.) in loops to improve floating point performance. Convert the data type to integer data types, process the data, then convert the data to the old type.

See Also