3.3.3. Tensor Floating-point Mode
- Data input feed
- Side input feed
A signed 20-bit fixed-point DOT product vector is calculated using the preloaded weights and data_in_{1..10} inputs.
Each DOT product sums 10 signed 8x8 multiplications, one per data_in input.
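The following Python sketch models one column of the DOT product in software to show why 20 bits are enough for the result. It is a behavioral illustration only, not the hardware implementation; the function names `to_signed8` and `dot_product_column` are hypothetical, and the weights are assumed to be signed 8-bit values like the data inputs.

```python
def to_signed8(value):
    """Interpret the low 8 bits of value as a two's-complement integer."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def dot_product_column(data_in, weights):
    """Sum of 10 signed 8x8 products for one column (behavioral model)."""
    if len(data_in) != 10 or len(weights) != 10:
        raise ValueError("expected 10 data inputs and 10 weights")
    total = sum(to_signed8(d) * to_signed8(w) for d, w in zip(data_in, weights))
    # The largest magnitude is 10 * (-128 * -128) = 163840, which lies
    # inside the signed 20-bit range [-524288, 524287].
    assert -(1 << 19) <= total < (1 << 19)
    return total
```

For example, `dot_product_column([0x80] * 10, [0x80] * 10)` returns 163840, the largest value the sum can reach.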
Next, the fixed-point to 32-bit floating-point converter converts the output of each DOT product into a 32-bit floating-point value, which is adjusted by shared_exponent_data[7:0] and the shared exponent values preloaded in the buffer.
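As a rough software illustration of this conversion step, the sketch below converts a signed 20-bit integer to FP32 and applies a power-of-two scale. This section does not state how the hardware combines shared_exponent_data[7:0] with the preloaded shared exponent, so the `net_exponent` parameter is a hypothetical signed stand-in for their combined effect, and the function names are likewise assumptions.

```python
import math
import struct


def round_to_fp32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def dot_to_fp32(dot_value, net_exponent=0):
    """Convert a signed 20-bit dot product to FP32, scaled by 2**net_exponent.

    net_exponent is a hypothetical parameter; the real adjustment derived
    from the two shared exponent inputs is not modeled here.
    """
    if not -(1 << 19) <= dot_value < (1 << 19):
        raise ValueError("dot product outside the signed 20-bit range")
    # A 20-bit integer fits in the 24-bit FP32 significand, so the integer
    # to floating-point conversion loses no precision.
    return round_to_fp32(math.ldexp(float(dot_value), net_exponent))
```

Under these assumptions, `dot_to_fp32(-3, -2)` returns -0.75 and `dot_to_fp32(5, 3)` returns 40.0.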
Then, depending on the dynamic inputs acc_en and zero_en, the accumulator adds or subtracts either cascade_data_in_col_{1..2} or the previous cycle's accumulation value.
Whether the accumulator adds or subtracts is an IP configuration option.
The accumulator outputs the data in FP32 format to the core fabric as fp32_col_{1..2}[31:0], to the next DSP block in the chain through the cascade_data_out_col_{1..2}[31:0] buses, or both.
zero_en | acc_en | fp32_col_1[31:0] | fp32_col_2[31:0] |
---|---|---|---|
0 | 0 | 32-bit floating-point conversion of (data_in_1[7:0]*B1C1 + data_in_2[7:0]*B2C1 + data_in_3[7:0]*B3C1 + data_in_4[7:0]*B4C1 + data_in_5[7:0]*B5C1 + data_in_6[7:0]*B6C1 + data_in_7[7:0]*B7C1 + data_in_8[7:0]*B8C1 + data_in_9[7:0]*B9C1 + data_in_10[7:0]*B10C1) +/- cascade_data_in_col_1[31:0] | 32-bit floating-point conversion of (data_in_1[7:0]*B1C2 + data_in_2[7:0]*B2C2 + data_in_3[7:0]*B3C2 + data_in_4[7:0]*B4C2 + data_in_5[7:0]*B5C2 + data_in_6[7:0]*B6C2 + data_in_7[7:0]*B7C2 + data_in_8[7:0]*B8C2 + data_in_9[7:0]*B9C2 + data_in_10[7:0]*B10C2) +/- cascade_data_in_col_2[31:0] |
0 | 1 | 32-bit floating-point conversion of (data_in_1[7:0]*B1C1 + data_in_2[7:0]*B2C1 + data_in_3[7:0]*B3C1 + data_in_4[7:0]*B4C1 + data_in_5[7:0]*B5C1 + data_in_6[7:0]*B6C1 + data_in_7[7:0]*B7C1 + data_in_8[7:0]*B8C1 + data_in_9[7:0]*B9C1 + data_in_10[7:0]*B10C1) +/- fp32_col_1[31:0] | 32-bit floating-point conversion of (data_in_1[7:0]*B1C2 + data_in_2[7:0]*B2C2 + data_in_3[7:0]*B3C2 + data_in_4[7:0]*B4C2 + data_in_5[7:0]*B5C2 + data_in_6[7:0]*B6C2 + data_in_7[7:0]*B7C2 + data_in_8[7:0]*B8C2 + data_in_9[7:0]*B9C2 + data_in_10[7:0]*B10C2) +/- fp32_col_2[31:0] |
1 | NA | 32-bit floating-point conversion of (data_in_1[7:0]*B1C1 + data_in_2[7:0]*B2C1 + data_in_3[7:0]*B3C1 + data_in_4[7:0]*B4C1 + data_in_5[7:0]*B5C1 + data_in_6[7:0]*B6C1 + data_in_7[7:0]*B7C1 + data_in_8[7:0]*B8C1 + data_in_9[7:0]*B9C1 + data_in_10[7:0]*B10C1) | 32-bit floating-point conversion of (data_in_1[7:0]*B1C2 + data_in_2[7:0]*B2C2 + data_in_3[7:0]*B3C2 + data_in_4[7:0]*B4C2 + data_in_5[7:0]*B5C2 + data_in_6[7:0]*B6C2 + data_in_7[7:0]*B7C2 + data_in_8[7:0]*B8C2 + data_in_9[7:0]*B9C2 + data_in_10[7:0]*B10C2) |
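The three rows of the table can be summarized as an operand selection followed by an add or subtract. The Python sketch below is a behavioral model for one column under stated assumptions: the function and parameter names are hypothetical, the `subtract` argument stands in for the IP configuration option, the `+/-` in the table is read literally as the converted DOT product plus or minus the selected operand, and rounding is modeled as round-to-nearest single precision, which this section does not confirm for the hardware.

```python
import struct


def round_to_fp32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def accumulate(dot_fp32, cascade_in, previous_out, zero_en, acc_en, subtract=False):
    """Return the next fp32_col value for one column (behavioral model)."""
    if zero_en:
        # zero_en = 1: acc_en is ignored and only the converted
        # DOT product is output.
        return round_to_fp32(dot_fp32)
    # zero_en = 0: acc_en selects the previous output (1) or the
    # cascade input (0) as the second operand.
    operand = previous_out if acc_en else cascade_in
    result = dot_fp32 - operand if subtract else dot_fp32 + operand
    return round_to_fp32(result)
```

With a converted DOT product of 1.5, a cascade input of 2.0, and a previous output of 10.0, the model returns 3.5 for zero_en = 0, acc_en = 0; 11.5 for zero_en = 0, acc_en = 1; and 1.5 whenever zero_en = 1.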
The output signals fp32_col_{1..2}_flag[3:0] accompany the floating-point output and indicate the exception type. The following table shows the encoding of these signals.
Flag bit | Exception type |
---|---|
bit 3 | Overflow |
bit 2 | Underflow |
bit 1 | Inexact |
bit 0 | Invalid (NaN) |
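In testbench or host-side software, a captured flag value can be decoded with a small lookup like the Python sketch below. The names `FLAG_BITS` and `decode_flags` are hypothetical, and the sketch assumes that a bit value of 1 means the corresponding exception occurred, which this section does not state explicitly.

```python
# Bit positions follow the encoding table above.
FLAG_BITS = {3: "overflow", 2: "underflow", 1: "inexact", 0: "invalid"}


def decode_flags(flag):
    """Return the names of the exceptions set in a 4-bit flag value."""
    if not 0 <= flag <= 0xF:
        raise ValueError("flag must be a 4-bit value")
    # Assumption: a set bit (1) indicates that the exception occurred.
    return [name for bit, name in FLAG_BITS.items() if flag & (1 << bit)]
```

For example, `decode_flags(0b1010)` returns `["overflow", "inexact"]`, and a flag value of 0 returns an empty list.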