3.3.1. Tensor Floating-point Mode
- Data input feed
- Side input feed
The ping-pong buffers load the data into the two DOT product vector engines, which calculate the signed 20-bit fixed-point DOT product vectors simultaneously. Each DOT product supports ten signed 8x8 multiplications. Next, the fixed-point to 32-bit floating-point converter converts the output of each DOT product into a 32-bit floating-point operand that is adjusted by the shared_exponent[7:0] value. Then, for each column, the accumulator adds the 32-bit floating-point value to either the data input from the cascade_data_in[63:0] bus or the previous cycle’s accumulation value. The accumulator outputs the data in FP32 format to the core fabric or to the next DSP block in the chain through the cascade_data_out[63:0] bus.
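The datapath described above can be sketched behaviorally for a single column. This is a minimal model, not the hardware implementation: the function names are hypothetical, and it assumes the shared exponent acts as an unbiased power-of-two scale on the DOT product, which the text does not specify. It also checks that ten signed 8x8 products always fit in a signed 20-bit result.

```python
import struct


def dot10(data, weights):
    """Signed 8x8 DOT product of ten elements (hypothetical helper)."""
    assert len(data) == len(weights) == 10
    for value in list(data) + list(weights):
        assert -128 <= value <= 127, "operands must be signed 8-bit"
    result = sum(d * b for d, b in zip(data, weights))
    # Worst case is 10 * (-128 * -128) = 163840, inside the signed 20-bit range.
    assert -(1 << 19) <= result < (1 << 19)
    return result


def to_fp32(x):
    """Round a Python float to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fixed_to_fp32(dot, exponent):
    # ASSUMPTION: the exponent is an unbiased power-of-two scale.
    # The real encoding of shared_exponent[7:0] is not given in the text.
    return to_fp32(dot * 2.0 ** exponent)


def column_step(data, weights, exponent, addend):
    """One column: addend is either the cascade input or the previous sum."""
    return to_fp32(addend + fixed_to_fp32(dot10(data, weights), exponent))
```

Passing the cascade input as `addend` models the cascade-enabled case; passing the value returned by the previous call models the accumulator-enabled case.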
Input Operands | Cascade Input Enabled | Accumulator Enabled |
---|---|---|
10-element signed 8x8 | Column one = column one cascade_data_in[31:0] + 32-bit floating-point conversion of (data_in_1[7:0]*b1 + data_in_2[7:0]*b2 + data_in_3[7:0]*b3 + data_in_4[7:0]*b4 + data_in_5[7:0]*b5 + data_in_6[7:0]*b6 + data_in_7[7:0]*b7 + data_in_8[7:0]*b8 + data_in_9[7:0]*b9 + data_in_10[7:0]*b10, (shared_exponent_in[7:0] + shared_exponent_data[7:0])). b1 through b10 are loaded into the loading buffer through the same inputs, data_in_1[7:0] … data_in_10[7:0]. Note: shared_exponent_data is loaded into the loading buffer through shared_exponent_in[7:0]. | Column one = column one accumulator result + 32-bit floating-point conversion of (data_in_1[7:0]*b1 + data_in_2[7:0]*b2 + data_in_3[7:0]*b3 + data_in_4[7:0]*b4 + data_in_5[7:0]*b5 + data_in_6[7:0]*b6 + data_in_7[7:0]*b7 + data_in_8[7:0]*b8 + data_in_9[7:0]*b9 + data_in_10[7:0]*b10, shared_exponent_data[7:0]) |
10-element signed 8x8 | Column two = column two cascade_data_in[63:32] + 32-bit floating-point conversion of (data_in_1[7:0]*b1 + data_in_2[7:0]*b2 + data_in_3[7:0]*b3 + data_in_4[7:0]*b4 + data_in_5[7:0]*b5 + data_in_6[7:0]*b6 + data_in_7[7:0]*b7 + data_in_8[7:0]*b8 + data_in_9[7:0]*b9 + data_in_10[7:0]*b10, (shared_exponent_in[7:0] + shared_exponent_data[7:0])). b1 through b10 are loaded into the loading buffer through the same inputs, data_in_1[7:0] … data_in_10[7:0]. Note: shared_exponent_data is loaded into the loading buffer through shared_exponent_in[7:0]. | Column two = column two accumulator result + 32-bit floating-point conversion of (data_in_1[7:0]*b1 + data_in_2[7:0]*b2 + data_in_3[7:0]*b3 + data_in_4[7:0]*b4 + data_in_5[7:0]*b5 + data_in_6[7:0]*b6 + data_in_7[7:0]*b7 + data_in_8[7:0]*b8 + data_in_9[7:0]*b9 + data_in_10[7:0]*b10, shared_exponent_data[7:0]) |
Input Operands | Cascade Input Enabled | Accumulator Enabled |
---|---|---|
10-element signed 8x8 | Column one = column one cascade_data_in[31:0] + 32-bit floating-point conversion of (data_in_1[7:0]*b1 + data_in_2[7:0]*b2 + data_in_3[7:0]*b3 + data_in_4[7:0]*b4 + data_in_5[7:0]*b5 + data_in_6[7:0]*b6 + data_in_7[7:0]*b7 + data_in_8[7:0]*b8 + data_in_9[7:0]*b9 + data_in_10[7:0]*b10, (side_in_2[7:0] + shared_exponent_data[7:0])). b1, b2, b3, b4, b5 are fed in by shifting side_in_1[7:0]. b6, b7, b8, b9, b10 are fed in by shifting side_in_2[7:0]. | Column one = column one accumulator result + 32-bit floating-point conversion of (data_in_1[7:0]*b1 + data_in_2[7:0]*b2 + data_in_3[7:0]*b3 + data_in_4[7:0]*b4 + data_in_5[7:0]*b5 + data_in_6[7:0]*b6 + data_in_7[7:0]*b7 + data_in_8[7:0]*b8 + data_in_9[7:0]*b9 + data_in_10[7:0]*b10, (side_in_2[7:0] + shared_exponent_data[7:0])). b1, b2, b3, b4, b5 are fed in by shifting side_in_1[7:0]. b6, b7, b8, b9, b10 are fed in by shifting side_in_2[7:0]. |
10-element signed 8x8 | Column two = column two cascade_data_in[63:32] + 32-bit floating-point conversion of (data_in_1[7:0]*b1 + data_in_2[7:0]*b2 + data_in_3[7:0]*b3 + data_in_4[7:0]*b4 + data_in_5[7:0]*b5 + data_in_6[7:0]*b6 + data_in_7[7:0]*b7 + data_in_8[7:0]*b8 + data_in_9[7:0]*b9 + data_in_10[7:0]*b10, (side_in_2[7:0] + shared_exponent_data[7:0])). b1, b2, b3, b4, b5 are fed in by shifting side_in_1[7:0]. b6, b7, b8, b9, b10 are fed in by shifting side_in_2[7:0]. | Column two = column two accumulator result + 32-bit floating-point conversion of (data_in_1[7:0]*b1 + data_in_2[7:0]*b2 + data_in_3[7:0]*b3 + data_in_4[7:0]*b4 + data_in_5[7:0]*b5 + data_in_6[7:0]*b6 + data_in_7[7:0]*b7 + data_in_8[7:0]*b8 + data_in_9[7:0]*b9 + data_in_10[7:0]*b10, (side_in_2[7:0] + shared_exponent_data[7:0])). b1, b2, b3, b4, b5 are fed in by shifting side_in_1[7:0]. b6, b7, b8, b9, b10 are fed in by shifting side_in_2[7:0]. |
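The shifting of b1 through b10 in the side input feed can be pictured as two five-stage shift chains, one per side input. The sketch below is illustrative only: the class name is hypothetical, and it assumes each new byte enters at the low-numbered end of its chain (b1 or b6) while older bytes move toward b5 or b10. The text does not state the shift direction, and the sketch does not model the shared exponent that also arrives on side_in_2[7:0].

```python
from collections import deque


class SideInputLoader:
    """Hypothetical model of loading b1..b10 through two side inputs."""

    def __init__(self):
        # chain_1 holds b1..b5 and chain_2 holds b6..b10, all cleared to 0.
        self.chain_1 = deque([0] * 5, maxlen=5)
        self.chain_2 = deque([0] * 5, maxlen=5)

    def shift(self, side_in_1, side_in_2):
        # ASSUMPTION: the new byte enters at b1 / b6; the oldest byte
        # falls off the far end (b5 / b10) because maxlen is 5.
        self.chain_1.appendleft(side_in_1)
        self.chain_2.appendleft(side_in_2)

    def weights(self):
        """Return [b1, ..., b10] as currently loaded."""
        return list(self.chain_1) + list(self.chain_2)


loader = SideInputLoader()
for i in range(1, 6):
    loader.shift(i, 10 + i)  # five shifts fill both chains
print(loader.weights())
```

Under this assumption, five shift cycles are enough to load all ten operands, since both chains advance in parallel.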