2.2.6. Exception Handling for Floating-point Arithmetic
The Agilex™ 5 floating-point arithmetic supports exception handling for the multiplier and adder blocks.
Floating-point Format | Exception Flag | Width | Description |
---|---|---|---|
Single precision, multiplication | fp32_mult_overflow | 1 | This signal indicates if the multiplier result is larger than the maximum representable value. 1: If the multiplier result is larger than the maximum representable value and the result is cast to infinity. 0: If the multiplier result is not larger than the maximum representable value. This signal is not available in Adder or Subtract Mode. |
Single precision, multiplication | fp32_mult_underflow | 1 | This signal indicates if the multiplier result is smaller than the minimum representable value. 1: If the multiplier result is smaller than the minimum representable non-zero absolute value and the result is flushed to zero. 0: If the multiplier result is not smaller than the minimum representable value. This signal is not available in Adder or Subtract Mode. |
Single precision, multiplication | fp32_mult_inexact | 1 | This signal indicates if the multiplier result is not accurately represented. 1: If the multiplier result is a rounded value, an overflow cast to infinity, or an underflow flushed to zero. 0: If the multiplier result does not meet any of the criteria above. This signal is not available in Adder or Subtract Mode. |
Single precision, multiplication | fp32_mult_invalid | 1 | This signal indicates if the multiplier operation is ill-defined and produces an invalid result. 1: If the multiplier result is invalid and cast to qNaN. 0: If the multiplier result is not an invalid number. This signal is not available in Adder or Subtract Mode. |
Single precision, addition | fp32_adder_overflow | 1 | This signal indicates if the adder result is larger than the maximum representable value. 1: If the adder result is larger than the maximum representable value and the result is cast to infinity. 0: If the adder result is not larger than the maximum representable value. This signal is not available in Multiplication Mode. |
Single precision, addition | fp32_adder_underflow | 1 | This signal indicates if the adder result is smaller than the minimum representable value. 1: If the adder result is smaller than the minimum representable non-zero absolute value and the result is flushed to zero. 0: If the adder result is not smaller than the minimum representable value. This signal is not available in Multiplication Mode. |
Single precision, addition | fp32_adder_inexact | 1 | This signal indicates if the adder result is not accurately represented. 1: If the adder result is a rounded value, an overflow cast to infinity, or an underflow flushed to zero. 0: If the adder result does not meet any of the criteria above. This signal is not available in Multiplication Mode. |
Single precision, addition | fp32_adder_invalid | 1 | This signal indicates if the adder operation is ill-defined and produces an invalid result. 1: If the adder result is invalid and cast to qNaN. 0: If the adder result is not an invalid number. This signal is not available in Multiplication Mode. |
Half precision, multiplication | fp16_mult_top_overflow, fp16_mult_bot_overflow | 1 | This signal indicates if the top or bottom multiplier result is larger than the maximum representable value. 1: If the multiplier result is larger than the maximum representable value and the result is cast to infinity. 0: If the multiplier result is not larger than the maximum representable value. This signal is not available in Adder or Subtract Mode and Extended format. |
Half precision, multiplication | fp16_mult_top_underflow, fp16_mult_bot_underflow | 1 | This signal indicates if the top or bottom multiplier result is smaller than the minimum representable value. 1: If the multiplier result is smaller than the minimum representable value and the result is flushed to zero. 0: If the multiplier result is not smaller than the minimum representable value. This signal is not available in Adder or Subtract Mode and Extended format. |
Half precision, multiplication | fp16_mult_top_inexact, fp16_mult_bot_inexact | 1 | This signal indicates if the top or bottom multiplier result is not an exact representation. 1: If the multiplier result is a rounded value, an overflow cast to infinity, or an underflow flushed to zero. 0: If the multiplier result does not meet any of the criteria above. This signal is not available in Adder or Subtract Mode. |
Half precision, multiplication | fp16_mult_top_invalid, fp16_mult_bot_invalid | 1 | This signal indicates if the multiplier operation is ill-defined and produces an invalid result. 1: If the multiplier result is invalid and cast to qNaN. 0: If the multiplier result is not an invalid number. This signal is not available in Adder or Subtract Mode. |
Half precision, multiplication | fp16_mult_top_infinite, fp16_mult_bot_infinite | 1 | This signal indicates if the top or bottom multiplier result is a positive or negative infinity. 1: If the result is infinite. 0: If the result is a normalized float or in the appropriate infinity range. This signal is only available for Extended format. |
Half precision, multiplication | fp16_mult_top_zero, fp16_mult_bot_zero | 1 | This signal indicates if the top or bottom multiplier result is a positive or negative zero. 1: If the result is zero. 0: If the result is not zero. This signal is only available for Extended format. |
Half precision, addition | fp16_adder_overflow | 1 | This signal indicates if the adder result is larger than the maximum representable value. 1: If the adder result is larger than the maximum representable value and the result is cast to infinity. 0: If the adder result is not larger than the maximum representable value. This signal is not available in Multiplication Mode and Extended format. |
Half precision, addition | fp16_adder_underflow | 1 | This signal indicates if the adder result is smaller than the minimum representable value. 1: If the adder result is smaller than the minimum representable value and the result is flushed to zero. 0: If the adder result is not smaller than the minimum representable value. This signal is not available in Multiplication Mode and Extended format. |
Half precision, addition | fp16_adder_inexact | 1 | This signal indicates if the adder result is not an exact representation. 1: If the adder result is a rounded value, an overflow cast to infinity, or an underflow flushed to zero. 0: If the adder result does not meet any of the criteria above. This signal is not available in Multiplication Mode. |
Half precision, addition | fp16_adder_invalid | 1 | This signal indicates if the adder operation is ill-defined and produces an invalid result. 1: If the adder result is invalid and cast to qNaN. 0: If the adder result is not an invalid number. This signal is not available in Multiplication Mode. |
Half precision, addition | fp16_adder_infinite | 1 | This signal indicates if the adder result is a positive or negative infinity. 1: If the result is infinite. 0: If the result is a normalized float or in the appropriate infinity range. This signal is only available for Extended format. |
Half precision, addition | fp16_adder_zero | 1 | This signal indicates if the adder result is a positive or negative zero. 1: If the result is zero. 0: If the result is not zero. This signal is only available for Extended format. |

Input A | Input B | Result | 4 Flags Overflow/Underflow/Inexact/Invalid |
---|---|---|---|
Normalized | Normalized | Normalized value | 0/0/0/0 |
Normalized | Normalized | Normalized (rounded) value | 0/0/1/0 |
Normalized | Normalized | Positive/negative infinity value | 1/0/1/0 |
Normalized | Normalized | Subnormal (denormal) value | 0/1/1/0 |
0 or Subnormal (denormal) | Normalized | 0 value | 0/0/0/0 |
Positive/negative infinity | Normalized | Positive/negative infinity value | 0/0/0/0 |
Quiet Not A Number (qNaN) | Normalized | qNaN value | 0/0/0/0 |
0 or Subnormal (denormal) | 0 or Subnormal (denormal) | 0 value | 0/0/0/0 |
Positive/negative infinity | 0 or Subnormal (denormal) | qNaN value | 0/0/0/1 |
Quiet Not A Number (qNaN) | 0 or Subnormal (denormal) | qNaN value | 0/0/0/0 |
Positive/negative infinity | Positive/negative infinity | Positive/negative infinity value | 0/0/0/0 |
Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/0 |
Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/0 |
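
The rows above can be read as a small decision procedure. The following Python sketch is a behavioral model only, not the hardware implementation. It assumes the table describes multiplication (infinity times zero produces qNaN), models single precision, assumes round-to-nearest-even, and assumes underflow is detected before rounding. The names `fp32_mult_model`, `flush_subnormal`, and `round_to_fp32` are hypothetical helpers introduced for illustration.

```python
import math
import struct

FP32_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest finite fp32 value
FP32_MIN_NORMAL = 2.0 ** -126                # smallest normalized fp32 value


def flush_subnormal(x):
    """Treat a subnormal (denormal) input as a zero of the same sign."""
    if math.isfinite(x) and abs(x) < FP32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def round_to_fp32(x):
    """Round a Python float to fp32; values beyond the fp32 range become infinity."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def fp32_mult_model(a, b):
    """Return (result, (overflow, underflow, inexact, invalid)).

    Inputs are assumed to be values representable in fp32.
    """
    a, b = flush_subnormal(a), flush_subnormal(b)
    if math.isnan(a) or math.isnan(b):
        return math.nan, (0, 0, 0, 0)            # qNaN in, qNaN out, no flag
    if math.isinf(a) or math.isinf(b):
        if a == 0.0 or b == 0.0:
            return math.nan, (0, 0, 0, 1)        # infinity times zero is invalid
        return a * b, (0, 0, 0, 0)               # signed infinity
    exact = a * b                                # exact in double for fp32 inputs
    if exact == 0.0:
        return exact, (0, 0, 0, 0)
    if abs(exact) < FP32_MIN_NORMAL:
        return math.copysign(0.0, exact), (0, 1, 1, 0)   # flushed to zero
    rounded = round_to_fp32(exact)
    if math.isinf(rounded):
        return rounded, (1, 0, 1, 0)             # cast to infinity
    return rounded, (0, 0, int(rounded != exact), 0)
```

For example, `fp32_mult_model(FP32_MAX, 2.0)` follows the overflow row and `fp32_mult_model(math.inf, 0.0)` follows the invalid row. The real block may differ in rounding mode and in how it detects underflow, so treat the model as a reading aid for the table rather than a bit-accurate reference.
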
Input A | Input B | Result | 4 Flags Overflow/Underflow/Inexact/Invalid |
---|---|---|---|
Normalized | Normalized | Normalized value | 0/0/0/0 |
Normalized | Normalized | Normalized (rounded) value | 0/0/1/0 |
Normalized | Normalized | Positive/negative infinity value | 1/0/1/0 |
Normalized | Normalized | 0 value (sign bit = 0) | 0/0/0/0 |
Normalized | Normalized | Subnormal (denormal) value (the sign is preserved) | 0/1/1/0 |
0 or Subnormal (denormal) | Normalized | Input B | 0/0/0/0 |
Positive/negative infinity | Normalized | Positive/negative infinity value | 0/0/0/0 |
Quiet Not A Number (qNaN) | Normalized | qNaN value | 0/0/0/0 |
0 or Subnormal (denormal) | 0 or Subnormal (denormal) | 0 value. For (-0) + (-0), sign bit = 1. For any other equation, sign bit = 0. | 0/0/0/0 |
Positive/negative infinity | 0 or Subnormal (denormal) | Positive/negative infinity value | 0/0/0/0 |
Quiet Not A Number (qNaN) | 0 or Subnormal (denormal) | qNaN value | 0/0/0/0 |
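Positive/negative infinity | Positive/negative infinity | qNaN value for invalid cases; positive/negative infinity value for valid cases | 0/0/0/1 for invalid cases; 0/0/0/0 for valid cases. Valid cases are those in which both infinities have the same effective sign, such as (+infinity) + (+infinity) and (-infinity) + (-infinity). |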
Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/0 |
Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/0 |
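
The addition rows above can be sketched the same way. This is a behavioral model under stated assumptions, not the hardware implementation: it assumes the table describes single-precision addition, round-to-nearest-even, and underflow detection on the exact sum before rounding. `fp32_add_model` and its helpers are hypothetical names. Exact rational arithmetic from `fractions` is used only to decide whether rounding changed the value.

```python
import math
import struct
from fractions import Fraction

FP32_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest finite fp32 value
FP32_MIN_NORMAL = 2.0 ** -126                # smallest normalized fp32 value


def flush_subnormal(x):
    """Treat a subnormal (denormal) input as a zero of the same sign."""
    if math.isfinite(x) and abs(x) < FP32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x


def round_to_fp32(x):
    """Round a Python float to fp32; values beyond the fp32 range become infinity."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def fp32_add_model(a, b):
    """Return (result, (overflow, underflow, inexact, invalid)).

    Inputs are assumed to be values representable in fp32.
    """
    a, b = flush_subnormal(a), flush_subnormal(b)
    if math.isnan(a) or math.isnan(b):
        return math.nan, (0, 0, 0, 0)
    if math.isinf(a) and math.isinf(b):
        if a == b:                               # same sign: valid
            return a, (0, 0, 0, 0)
        return math.nan, (0, 0, 0, 1)            # opposite signs: invalid
    if math.isinf(a) or math.isinf(b):
        return (a if math.isinf(a) else b), (0, 0, 0, 0)
    if a == 0.0 or b == 0.0:
        # Zero plus x returns x; (-0) + (-0) is -0, other zero sums are +0.
        return a + b, (0, 0, 0, 0)
    exact = Fraction(a) + Fraction(b)            # exact rational sum
    if exact == 0:
        return 0.0, (0, 0, 0, 0)                 # exact cancellation, sign bit = 0
    if abs(exact) < Fraction(FP32_MIN_NORMAL):
        flushed = -0.0 if exact < 0 else 0.0     # the sign is preserved
        return flushed, (0, 1, 1, 0)
    rounded = round_to_fp32(a + b)
    if math.isinf(rounded):
        return rounded, (1, 0, 1, 0)
    return rounded, (0, 0, int(Fraction(rounded) != exact), 0)
```

For instance, `fp32_add_model(math.inf, -math.inf)` returns a NaN with the invalid flag set, while `fp32_add_model(math.inf, math.inf)` returns infinity with no flags, matching the valid and invalid infinity cases in the table.
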
Input A | Input B | Result | 4 Flags Infinite/Zero/Inexact/Invalid |
---|---|---|---|
Normalized/Subnormalized | Normalized/Subnormalized | Normalized/Subnormalized | 0/0/x/0 |
0 value | Normalized/Subnormalized | 0 value | 0/1/0/0 |
Positive/negative infinity | Normalized/Subnormalized | Positive/negative infinity value | 1/0/0/0 |
Quiet Not A Number (qNaN) | Normalized/Subnormalized | qNaN value | 0/0/0/1 Mantissa = {100...00} |
0 value | 0 value | 0 value | 0/1/0/0 |
Positive/negative infinity | 0 value | qNaN value | 0/0/0/1 Mantissa = {100...00} |
Quiet Not A Number (qNaN) | 0 value | qNaN value | 0/0/0/1 Mantissa = {100...00} |
Positive/negative infinity | Positive/negative infinity | Positive/negative infinity value | 1/0/0/0 |
Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/1 Mantissa = {100...00} |
Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/1 Mantissa = {100...00} |
Input A | Input B | Result | 4 Flags Infinite/Zero/Inexact/Invalid |
---|---|---|---|
Normalized/Subnormalized | Normalized/Subnormalized | Normalized/Subnormalized | 0/0/x/0 |
Normalized/Subnormalized | Normalized/Subnormalized | 0 value (sign bit = 0) | 0/0/0/0 |
0 value | Normalized/Subnormalized | Input B | 0/0/0/0 |
Positive/negative infinity | Normalized/Subnormalized | Positive/negative infinity value | 1/0/0/0 |
Quiet Not A Number (qNaN) | Normalized/Subnormalized | qNaN value | 0/0/0/1 Mantissa = {100...00} |
0 value | 0 value | 0 value. For (-0) + (-0), sign bit = 1. For any other equation, sign bit = 0. | 0/0/0/0 |
Positive/negative infinity | 0 value | Positive/negative infinity value | 1/0/0/0 |
Quiet Not A Number (qNaN) | 0 value | qNaN value | 0/0/0/1 Mantissa = {100...00} |
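Positive/negative infinity | Positive/negative infinity | qNaN value for invalid cases; positive/negative infinity value for valid cases | 0/0/0/1 for invalid cases (Mantissa = {100...00}); 1/0/0/0 for valid cases. Valid cases are those in which both infinities have the same effective sign, such as (+infinity) + (+infinity) and (-infinity) + (-infinity). |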
Quiet Not A Number (qNaN) | Positive/negative infinity | qNaN value | 0/0/0/1 Mantissa = {100...00} |
Quiet Not A Number (qNaN) | Quiet Not A Number (qNaN) | qNaN value | 0/0/0/1 Mantissa = {100...00} |