Intel® C++ Compiler Classic Developer Guide and Reference

ID 767249
Date 3/31/2023
Public


IEEE Floating-point Operations

Understanding the IEEE Standard for Floating-point Arithmetic, IEEE 754-2008

This version of the compiler uses a close approximation to the IEEE Standard for Floating-point Arithmetic, version IEEE 754-2008, unless otherwise stated. This standard is common to many microcomputer-based systems due to the availability of fast processors that implement the required characteristics.

This section outlines the characteristics of the IEEE 754-2008 standard and its implementation in the compiler. Except as noted, the description refers to both the IEEE 754-2008 standard and the compiler implementation.

Floating-point Formats

The IEEE 754-2008 standard specifies formats and methods for floating-point representation in computer systems and recommends formats for data interchange. It also defines the exception conditions and the standard handling of those conditions. The decimal floating-point exception functions are defined in the fenv.h header file; their binary counterparts are described in ISO C99. The compiler supports decimal floating-point types in C and C++, using the decimal floating-point formats defined in the IEEE 754-2008 standard.

In C, these decimal floating types are supported:

  • _Decimal32
  • _Decimal64
  • _Decimal128

In C++ for Windows and Linux, these decimal classes are supported:

  • decimal32
  • decimal64
  • decimal128

NOTE: To use this feature in C++ on Linux, GCC 4.5 or later is required.

Decimal floating-point is not supported in C++ for macOS.

To ensure correct decimal floating-point behavior, you must define __STDC_WANT_DEC_FP__ before any standard headers are included. The definition is required for the decimal macros and library functions to be declared, which in turn ensures correct decimal floating-point results at run time.

Example: Linux

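// Define this macro before any standard header is included so that
// the decimal macros and library functions are declared.
#define __STDC_WANT_DEC_FP__
#include <iostream>
#include <decimal/decimal>
// Map the C decimal type names onto the C++ decimal classes.
typedef std::decimal::decimal32 _Decimal32;
typedef std::decimal::decimal64 _Decimal64;
typedef std::decimal::decimal128 _Decimal128;
#include <dfp754.h>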

using namespace std;
using namespace std::decimal;

int main() {
    std::decimal::decimal32 d = 4.7df;
    std::cout << decimal_to_long_double(d) << std::endl; 
    return 0; 
}

Example: Windows

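// Define this macro before any standard header is included so that
// the decimal macros and library functions are declared.
#define __STDC_WANT_DEC_FP__
#include <iostream>
#include <decimal>
#include <dfp754.h>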

using namespace std;
using namespace std::decimal;

int main() {
    std::decimal::decimal32 d = 4.7df;
    std::cout << decimal_to_long_double(d) << std::endl; 
    return 0; 
}

Functions to Check Decimal Floating-point Status

Use these floating-point exception functions to detect exceptions that occur during decimal floating-point arithmetic:

Floating-point Functions

Function                    Brief Description
fe_dec_feclearexcept()      Clears the supported floating-point exceptions.
fe_dec_fegetexceptflag()    Stores an implementation-defined representation of the states of the floating-point status flags.
fe_dec_feraiseexcept()      Raises the supported floating-point exceptions.
fe_dec_fesetexceptflag()    Sets the floating-point status flags.
fe_dec_fetestexcept()       Determines which of a specified subset of the floating-point exception flags are currently set.

Special Values

The following list provides a brief description of the special values that the Intel® C++ Compiler supports.

  • Signed Zero: Zero carries a sign, just as a nonzero number does. Comparisons consider +0 to be equal to -0. A signed zero is useful in certain numerical analysis algorithms, but in most applications the sign of zero is invisible.
  • Denormalized Numbers: Denormalized numbers (denormals) fill the gap between the smallest positive normalized number and the negative normalized number closest to zero; without them, only (+/-) 0 would occur in that interval. Denormalized numbers extend the range of computable results by allowing for gradual underflow.

    Systems based on the IA-32 architecture support a Denormal Operand status flag. When this flag is set, at least one of the input operands to a floating-point operation is a denormal. The Underflow status flag is set when a number loses precision and becomes a denormal.

  • Signed Infinity: Infinities are the result of arithmetic in the limiting case of operands with arbitrarily large magnitude. They provide a way to continue when an overflow occurs. The sign of an infinity is the sign that the same operation would produce for a finite operand as that operand grows without bound.

    By retrieving the status flags, you can differentiate between an infinity that results from an overflow and one that results from division by zero. The compiler treats infinity as signed by default. The output value of infinity is +Infinity or -Infinity.

  • Not a Number: Not a Number (NaN) may result from an invalid operation. For example, 0/0 and sqrt(-1) result in NaN. In general, an operation involving a NaN produces another NaN. Because the fraction of a NaN is unspecified, there are many possible NaNs.

    The compiler treats all NaNs identically, but there are two classes of NaNs:

    • Signaling NaNs: Have an initial mantissa bit of 0. They usually raise an invalid exception when used in an operation.

    • Quiet NaNs: Have an initial mantissa bit of 1.

    The floating-point hardware usually converts a signaling NaN into a quiet NaN during computational operations. In that case, an invalid exception is raised and the resulting floating-point value is a quiet NaN.