DSP Builder for Intel® FPGAs (Advanced Blockset): Handbook

ID 683337
Date 12/12/2022
Public


11.3. DSP Builder Round-Off Errors

Every mathematical operation on floating-point data incurs a round-off error.

For the fundamental operations (add, subtract, multiply, divide), this error is determined by the rounding mode:

  • Correct. A typical relative error is half the magnitude of the LSB in the mantissa.
  • Faithful. A typical relative error is equal to the magnitude of the LSB in the mantissa.

The relative error for float16_m10 is approximately 0.1% for faithful rounding, and 0.05% for correct rounding. The rounding mode is a configurable mask parameter.
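The figures quoted for float16_m10 follow directly from the mantissa width. The sketch below is a plain Python calculation, not part of DSP Builder; the function name `rounding_relative_error` and its parameters are hypothetical and chosen only for illustration. It assumes that the relative error is one mantissa LSB for faithful rounding and half an LSB for correct rounding, as described above.

```python
def rounding_relative_error(mantissa_bits: int, mode: str = "faithful") -> float:
    """Relative round-off error implied by the mantissa width.

    Hypothetical helper for illustration only (not a DSP Builder API).
    Assumes one LSB of relative error for faithful rounding and
    half an LSB for correct rounding.
    """
    lsb = 2.0 ** -mantissa_bits  # weight of the mantissa LSB relative to 1.0
    if mode == "faithful":
        return lsb
    if mode == "correct":
        return lsb / 2.0
    raise ValueError(f"unknown rounding mode: {mode!r}")


# float16_m10 has a 10-bit mantissa, so the LSB is 2**-10.
faithful_pct = 100.0 * rounding_relative_error(10, "faithful")  # about 0.1%
correct_pct = 100.0 * rounding_relative_error(10, "correct")    # about 0.05%
```

The same calculation applies to any other format once you know its mantissa width.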

The elementary mathematical functions conform to the error tolerances specified in the OpenCL standard. In practice, the relative error exhibited by the DSP Builder mathematical library lies comfortably within the specified tolerances.

Bit cancellation can occur when you subtract two floating-point numbers that are very close in value, which can introduce very large relative errors. To prevent bit cancellation, take the same precautions with floating-point designs as you would with numerical software.
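As a minimal illustration of bit cancellation and one common precaution, the Python sketch below evaluates sqrt(x + 1) - sqrt(x) in two algebraically equivalent ways. It runs in double precision on a host machine and does not model any DSP Builder block; the function names are hypothetical. The same effect appears at much smaller values of x in narrower formats such as float16_m10.

```python
import math


def naive_diff(x: float) -> float:
    """sqrt(x + 1) - sqrt(x): subtracts two nearly equal values when x is large."""
    return math.sqrt(x + 1.0) - math.sqrt(x)


def stable_diff(x: float) -> float:
    """Same quantity rewritten as 1 / (sqrt(x + 1) + sqrt(x)).

    The rewrite replaces the subtraction of close values with an addition,
    so no leading bits cancel.
    """
    return 1.0 / (math.sqrt(x + 1.0) + math.sqrt(x))


x = 1.0e12
naive = naive_diff(x)
stable = stable_diff(x)

# Relative disagreement between the two forms. The stable form is accurate
# to roughly double-precision round-off, so this is dominated by the
# cancellation error in the naive form.
rel_err = abs(naive - stable) / stable
print(f"naive  = {naive:.17e}")
print(f"stable = {stable:.17e}")
print(f"relative error of naive form = {rel_err:.3e}")
```

In the naive form, the two square roots agree in most of their leading bits, so the subtraction leaves only a few significant bits and the rounding error of each square root dominates the result. Restructuring the expression so that close values are never subtracted is the kind of precaution that carries over from numerical software to floating-point hardware designs.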