2.3.4.1. Fixed-point to 32-bit Floating-point Conversion Examples

Variable Precision DSP Blocks User Guide: Agilex™ 5 FPGAs and SoCs


ID 813968

Date 9/20/2024


A newer version of this document is available.


The following examples walk through the algorithm used by the fixed-point to floating-point converter.

Example 1: Convert the 20-bit fixed-point dot product result 00010011111010101010 (decimal 81578) to 32-bit floating-point. The exponent is adjusted by the shared_exponent[7:0] value. In this example, the shared_exponent[7:0] value is 0.

The most significant bit of the 20-bit fixed-point dot product result represents the sign of the number. In this case, the bit is 0, so the number is positive.
In floating-point form, the number is 00010011111010101010 × 2⁰, or equivalently 0.0010011111010101010 × 2¹⁹ with the binary point placed after the most significant bit.
Next, the value is shifted left by 3 bits because the 20-bit vector has three leading 0s, and the exponent is reduced by the same amount (19 - 3 = 16). The normalized result is 1.0011111010101010000 × 2¹⁶.
The result is then converted into 32-bit floating-point format as follows, where bias is the single-precision exponent bias of 127:
1. Exponent = (19 - left shift value) + shared_exponent[7:0] value + bias = (19 - 3) + 0 + 127 = 143 (8'b10001111)
2. Mantissa = {0011111010101010000, 4'b0000}, that is, the 19 bits after the implicit leading 1, padded with four 0s to the 23-bit mantissa width
3. 32-bit floating-point operand (sign_exponent_mantissa) = 0_10001111_00111110101010100000000 (32'h479F5500)
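The steps above can be modeled in software to cross-check a conversion by hand. The following Python sketch is illustrative only and is not Intel's implementation: the function names `fixed20_to_fp32`, `fmt`, and `as_float` are hypothetical, shared_exponent[7:0] is assumed to be an unbiased offset added to the exponent, and zero input, rounding, overflow, and underflow are not modeled. It follows the sequence in Example 1: sign extraction, leading-zero count, normalization, exponent calculation with a bias of 127, and padding of the 19 fraction bits with 4'b0000.

```python
import struct

BIAS = 127      # IEEE 754 single-precision exponent bias
WIDTH = 20      # width of the fixed-point dot product result
MASK = (1 << WIDTH) - 1


def fixed20_to_fp32(vec: int, shared_exponent: int = 0) -> int:
    """Return the 32-bit float bit pattern for a 20-bit two's-complement input.

    Hypothetical model of the steps in the examples. Assumes shared_exponent
    is an unbiased offset added to the exponent; zero input, rounding,
    overflow, and underflow are not modeled.
    """
    vec &= MASK
    sign = vec >> (WIDTH - 1)                          # MSB is the sign bit
    mag = ((~vec + 1) & MASK) if sign else vec         # invert and add 1 if negative
    if mag == 0:
        raise ValueError("zero input is not modeled in this sketch")
    shift = WIDTH - mag.bit_length()                   # leading 0s in the 20-bit word
    exponent = (WIDTH - 1 - shift) + shared_exponent + BIAS
    fraction = (mag << shift) & ((1 << (WIDTH - 1)) - 1)  # 19 bits after the implicit 1
    mantissa = fraction << 4                           # {fraction, 4'b0000}: 23 bits
    return (sign << 31) | (exponent << 23) | mantissa


def fmt(bits: int) -> str:
    """Render a 32-bit pattern as sign_exponent_mantissa, as in the examples."""
    return f"{bits >> 31}_{(bits >> 23) & 0xFF:08b}_{bits & 0x7FFFFF:023b}"


def as_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    bits = fixed20_to_fp32(0b00010011111010101010)     # Example 1 input
    print(fmt(bits))        # 0_10001111_00111110101010100000000
    print(f"{bits:#010x}")  # 0x479f5500
    print(as_float(bits))   # 81578.0
```

The sketch counts leading zeros with Python's `int.bit_length()`; in hardware this step corresponds to a leading-zero count over the 20-bit word. Passing the Example 2 input, 0b11111010101010000000, exercises the negative path and yields 1_10001101_01010110000000000000000.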

Example 2: Convert the 20-bit fixed-point dot product result 11111010101010000000 (decimal -21888) to 32-bit floating-point. The exponent is adjusted by the shared_exponent[7:0] value. In this example, the shared_exponent[7:0] value is 0.

The most significant bit of the 20-bit fixed-point dot product result represents the sign of the number. In this case, the bit is 1, so the number is negative.
The DSP block converts the number to sign-magnitude format using the bitwise inversion method: all bits are inverted and 1 is added, which yields the magnitude 00000101010110000000 × 2⁰, or equivalently 0.0000101010110000000 × 2¹⁹.
Next, the value is shifted left by 5 bits because it has five leading 0s, and the exponent is reduced by the same amount (19 - 5 = 14). The normalized result is 1.0101011000000000000 × 2¹⁴.
The result is then converted into 32-bit floating-point format as follows, where bias is the single-precision exponent bias of 127:
1. Exponent = (19 - left shift value) + shared_exponent[7:0] value + bias = (19 - 5) + 0 + 127 = 141 (8'b10001101)
2. Mantissa = {0101011000000000000, 4'b0000}, that is, the 19 bits after the implicit leading 1, padded with four 0s to the 23-bit mantissa width
3. 32-bit floating-point operand (sign_exponent_mantissa) = 1_10001101_01010110000000000000000 (32'hC6AB0000)
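A minimal Python sketch of the negative-number path, useful for checking Example 2 by hand. The helper names `to_sign_magnitude` and `decode_fp32` are hypothetical. The first assumes the inversion step also adds 1, since inversion alone would give a magnitude one lower than the value shown above; the second uses Python's standard `struct` module to reinterpret the final bit pattern as an IEEE 754 single-precision value. The shared_exponent[7:0] value of 0 and the bias of 127 are taken from the example.

```python
import struct

WIDTH = 20
MASK = (1 << WIDTH) - 1


def to_sign_magnitude(vec: int) -> tuple[int, int]:
    """Split a 20-bit two's-complement word into (sign, magnitude).

    Assumes the inversion step also adds 1, which is what produces the
    magnitude shown in Example 2.
    """
    vec &= MASK
    sign = vec >> (WIDTH - 1)
    magnitude = ((~vec + 1) & MASK) if sign else vec
    return sign, magnitude


def decode_fp32(pattern: str) -> float:
    """Interpret a sign_exponent_mantissa bit string as an IEEE 754 single."""
    bits = int(pattern.replace("_", ""), 2)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


vec = 0b11111010101010000000                      # Example 2 input (-21888)
sign, mag = to_sign_magnitude(vec)
shift = WIDTH - mag.bit_length()                  # leading 0s in the 20-bit magnitude
exponent = (WIDTH - 1 - shift) + 0 + 127          # shared_exponent = 0, bias = 127

print(sign, f"{mag:020b}")                         # 1 00000101010110000000
print(shift, exponent, f"{exponent:08b}")          # 5 141 10001101
print((~vec) & MASK)                               # 21887: inversion alone is off by one
print(decode_fp32("1_10001101_01010110000000000000000"))  # -21888.0
```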
