R-Tile Avalon® Streaming Intel® FPGA IP for PCI Express* User Guide

ID 683501
Date 10/07/2024
Public

4.3.1.4. Avalon® Streaming TX Interface

The Application Layer transfers data to the Transaction Layer of the R-Tile PCI Express IP core over the Avalon®-ST TX interface. The R-Tile PCI Express IP core must assert pX_tx_st_ready_o before transmission begins.

If the R-Tile PCI Express IP core is configured in Configuration Mode 0 (1x16) with a double-width configuration, the interface provides four segments, each with a 256-bit data width, which allows multiple TLPs per cycle. Consequently, there are four pX_tx_stN_sop_i signals and four pX_tx_stN_eop_i signals for Configuration Mode 0 (1x16).

Unlike the behavior specified by the Avalon Interface Specifications, this interface does not maintain a fixed latency between the pX_tx_st_ready_o and pX_tx_stN_dvalid_i signals.
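Because the ready-to-valid latency is not fixed, it can help to check simulation traces against the bound given in the pX_tx_st_ready_o description in Table 58 (valid must be driven low within a maximum of 16 clock cycles after ready is driven low). The following Python sketch is one way to do that. The function name `check_backpressure`, the list-based trace format, and the counting convention (the first cycle in which ready is sampled low counts as cycle 1) are assumptions made for illustration and are not part of the IP.

```python
MAX_READY_LATENCY = 16  # clock cycles, taken from the pX_tx_st_ready_o description


def check_backpressure(ready, valid, max_latency=MAX_READY_LATENCY):
    """Return the index of the first violating cycle, or None if the trace is legal.

    ready, valid: equal-length sequences of 0/1 samples, one per clock cycle.
    Assumption: the first cycle in which ready is low counts as cycle 1, and
    valid may stay high only while that count is <= max_latency.
    """
    low_cycles = 0  # consecutive cycles for which ready has been low
    for cycle, (rdy, vld) in enumerate(zip(ready, valid)):
        low_cycles = 0 if rdy else low_cycles + 1
        if vld and low_cycles > max_latency:
            return cycle
    return None


# Hypothetical traces: ready is high for 2 cycles, then low for 20 cycles.
ready_trace = [1] * 2 + [0] * 20
legal_valid = [1] * 2 + [1] * 16 + [0] * 4   # valid drops after 16 low cycles
late_valid = [1] * 2 + [1] * 17 + [0] * 3    # valid drops one cycle too late

first_violation_legal = check_backpressure(ready_trace, legal_valid)
first_violation_late = check_backpressure(ready_trace, late_valid)
```

With these hypothetical traces, the first call reports no violation and the second reports the cycle in which valid is still high after the allowed window.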

In Configuration Mode 0 (1x16) with a double-width configuration, the R-Tile PCI Express core provides four segments, each with 256 bits of data (pX_tx_stN_data_i[255:0]), 128 bits of header (pX_tx_stN_hdr_i[127:0]), and 32 bits of TLP prefix (pX_tx_stN_prefix_i[31:0]). All four segments are used in this mode, so the data bus is effectively 1024 bits wide, consisting of pX_tx_st0_data_i[255:0], pX_tx_st1_data_i[255:0], pX_tx_st2_data_i[255:0], and pX_tx_st3_data_i[255:0].
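The segment arithmetic can be sketched in a few lines of Python. The segment counts per link width follow the pX_tx_stN_data_i description in Table 58. The helper names and, importantly, the bit ordering (segment 0 carrying the least-significant 256 bits) are assumptions for illustration only; this guide section does not state how segments map onto a wider logical word.

```python
SEGMENT_BITS = 256

# Segments that form one data bus, per the pX_tx_stN_data_i description.
SEGMENTS_PER_MODE = {"x16": 4, "x8": 2, "x4": 1}


def bus_width(mode):
    """Effective data bus width in bits for the given link width."""
    return SEGMENTS_PER_MODE[mode] * SEGMENT_BITS


def split_segments(word, mode="x16"):
    """Split an integer bus value into per-segment 256-bit integers.

    Assumption (not from the guide): segment 0 holds the least-significant
    256 bits, segment 1 the next 256 bits, and so on.
    """
    count = SEGMENTS_PER_MODE[mode]
    if word < 0 or word >> (count * SEGMENT_BITS):
        raise ValueError("word does not fit on the bus for this mode")
    mask = (1 << SEGMENT_BITS) - 1
    return [(word >> (i * SEGMENT_BITS)) & mask for i in range(count)]


# Hypothetical 1024-bit value whose segment N simply contains the number N.
example_word = (3 << 768) | (2 << 512) | (1 << 256) | 0
example_segments = split_segments(example_word, "x16")
```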

Parity generation is done via a 32:1 XOR (i.e. there is one parity bit for every 32 data, header or prefix bits).
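As an illustration of the 32:1 XOR rule, the following Python sketch computes one parity bit per 32-bit group. The function name `parity_bits` is hypothetical, and the polarity is an assumption: the sketch uses a plain XOR of the 32 bits with no inversion. Under this rule a 256-bit data segment yields 8 parity bits, a 128-bit header yields 4 (matching pX_tx_stN_hdr_par_i[3:0]), and a 32-bit prefix yields 1.

```python
def parity_bits(value, width):
    """Return the parity vector for `value` as an integer.

    Bit k of the result is the XOR of value[32*k+31 : 32*k], so bit [0]
    covers bits [31:0], bit [1] covers bits [63:32], and so on.
    Assumption: plain XOR, with no inversion of the result.
    """
    if width % 32 != 0:
        raise ValueError("width must be a multiple of 32")
    if value < 0 or value >> width:
        raise ValueError("value does not fit in the given width")
    result = 0
    for k in range(width // 32):
        chunk = (value >> (32 * k)) & 0xFFFFFFFF
        bit = bin(chunk).count("1") & 1  # XOR of all 32 bits in the group
        result |= bit << k
    return result


# Hypothetical header value: dword 0 has two bits set, dword 1 has one bit set.
header_parity = parity_bits((1 << 32) | 0x3, 128)
```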

Table 58.  Avalon Streaming TX Interface Signals
Signal Name Direction Description EP/RP/BP Clock Domain
pX_tx_stN_data_i[255:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Application Layer data for transmission. The data bus is organized in multiple 256-bit segments. In x16 mode, all four segments are used to effectively form a 1024-bit data bus. In x8 mode, two segments are used to form a 512-bit data bus. In x4 mode, each 256-bit segment is an independent data bus.

The Application Layer must provide a properly formatted TLP on the TX interface. The data is valid when the corresponding tx_stN_dvalid_i signal is asserted.

The mapping of message TLPs is the same as the mapping of Transaction Layer TLPs with 4-dword headers. The number of data cycles must be correct for the length and address fields in the header. Issuing a packet with an incorrect number of data cycles results in the TX interface hanging and becoming unable to accept further requests.

Note: There must be no idle cycles between the tx_stN_sop_i and tx_stN_eop_i cycles unless tx_st_ready_o is deasserted to apply backpressure.
EP/RP/BP coreclkout_hip
pX_tx_stN_hdr_i[127:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input This is the header to be transmitted, which follows the TLP header format of the PCIe specifications. Consider the following guidelines:
  • When the R-Tile Avalon® Streaming Intel FPGA IP for PCIe is configured in EP mode, it automatically calculates the Completer/Requester ID and the Application logic does not need to provide this information as part of the TLP header being transmitted. Note that this guideline does not apply when the IP is configured in TL Bypass mode.
  • When the R-Tile Avalon® Streaming Intel FPGA IP for PCIe is configured in RP mode, the Requester/Completer ID must be set to 0x0 on the Avalon Streaming TX interface. This applies to all the Avalon Streaming TX interfaces for each of the ports when using bifurcation. The R-Tile IP translates the BDF as follows:
    Port   AVST TX Req/Cpl ID   PCIe Link Req/Cpl ID   B:D.F
    0      0x0000               0x0000                 00:00.0
    1      0x0000               0x0008                 00:01.0
    2      0x0000               0x0010                 00:02.0
    3      0x0000               0x0018                 00:03.0

    The RP BDF is hardcoded and does not change with enumeration, i.e. the enumeration software cannot assign different bus/device numbers to the ports.

  • When the R-Tile Avalon® Streaming Intel FPGA IP for PCIe is configured in EP mode and SR-IOV is enabled, follow the guidelines stated in the BDF Assignments section of SR-IOV Support Implementation.

These signals are valid when the corresponding tx_stN_sop_i signal is asserted.

EP/RP/BP coreclkout_hip
pX_tx_stN_prefix_i[31:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

This is the TLP prefix to be transmitted, which follows the TLP prefix format of the PCIe specifications. PASID is supported.

These signals are valid when the corresponding tx_stN_sop_i signal is asserted.

The TLP prefix uses a Big Endian implementation (i.e. the Fmt field is in bits [31:29] and the Type field is in bits [28:24]).

If no prefix is present for a given TLP, that dword, including the Fmt field, is all zeros.

EP/RP/BP coreclkout_hip
pX_tx_stN_sop_i where

X = 0,1,2,3 (IP core number)

N = 0,2 (segment number)

Input Indicates the first cycle of a TLP when asserted in conjunction with the corresponding bit of tx_stN_valid_i. For the x16 configuration:
  • tx_st3_sop_i: When asserted, indicates the start of a TLP in tx_st3_data_i[255:0].
  • tx_st2_sop_i: When asserted, indicates the start of a TLP in tx_st2_data_i[255:0].
  • tx_st1_sop_i: When asserted, indicates the start of a TLP in tx_st1_data_i[255:0].
  • tx_st0_sop_i: When asserted, indicates the start of a TLP in tx_st0_data_i[255:0].

These signals are asserted for one clock cycle per TLP. They also qualify the corresponding tx_stN_hdr_i and tx_stN_prefix_i signals.

Note: pX_tx_stN_sop_i pulses can only be sent on segments 0 or 2 (st0 or st2).
EP/RP/BP coreclkout_hip
pX_tx_stN_eop_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input Indicates the last cycle of a TLP when asserted in conjunction with the corresponding bit of tx_stN_valid_i. For the x16 configuration:
  • tx_st3_eop_i: When asserted, indicates the end of a TLP in tx_st3_data_i[255:0].
  • tx_st2_eop_i: When asserted, indicates the end of a TLP in tx_st2_data_i[255:0].
  • tx_st1_eop_i: When asserted, indicates the end of a TLP in tx_st1_data_i[255:0].
  • tx_st0_eop_i: When asserted, indicates the end of a TLP in tx_st0_data_i[255:0].

These signals are asserted for one clock cycle per TLP.

EP/RP/BP coreclkout_hip
pX_tx_stN_dvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the data of the corresponding segment of tx_stN_data_i into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_dvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_hvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the header of the corresponding segment of tx_stN_hdr_i into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_hvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_pvalid_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Qualifies the prefix of the corresponding segment of tx_stN_prefix_i into the IP core on ready cycles.

To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_stN_pvalid_i signals.

EP/RP/BP coreclkout_hip
pX_tx_stN_data_par_i[Z:0] where

X = 0,1,2,3 (IP core number) and Z varies based on the core.

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_data_i. Bit [0] corresponds to tx_stN_data_i[31:0], bit [1] corresponds to tx_stN_data_i[63:32], and so on.

By default, the PCIe Hard IP generates the parity for the TX data.

EP/RP/BP coreclkout_hip
pX_tx_stN_hdr_par_i[3:0] where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_hdr_i.

By default, the PCIe Hard IP generates the parity for the TX header.

EP/RP/BP coreclkout_hip
pX_tx_stN_prefix_par_i where

X = 0,1,2,3 (IP core number)

N = 0,1,2,3 (segment number)

Input

Parity for tx_stN_prefix_i.

By default, the PCIe Hard IP generates the parity for the TX TLP prefix.

EP/RP/BP coreclkout_hip
pX_tx_st_ready_o where

X = 0,1,2,3 (IP core number)

Output

Indicates that the PCIe Hard IP is ready to accept data.

When the R-Tile Avalon Streaming Intel FPGA IP for PCIe drives tx_st_ready_o high, the Application logic may assert the tx_stN_valid_i signal and transfer data.

When the R-Tile Avalon Streaming Intel FPGA IP for PCIe drives tx_st_ready_o low, the Application logic must deassert the tx_stN_valid_i signal within a maximum of 16 clock cycles.

Refer to Avalon Streaming TX Interface pX_tx_st_ready_o Behavior for additional information.

The pX_tx_st_ready_o signal can be deasserted under the following conditions:
  • The LTSSM is not in L0 state.
  • A TLP retry is in progress.
  • The R-Tile Avalon-ST IP is busy sending internally generated TLPs.
  • The internal R-Tile TX FIFO is full.
EP/RP/BP coreclkout_hip
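Two of the encodings described in Table 58 reduce to simple bit arithmetic, sketched here in Python. The first part reproduces the RP-mode Requester/Completer ID translation using the standard PCIe ID layout (bus in bits [15:8], device in bits [7:3], function in bits [2:0]). The second part extracts the Fmt and Type fields from a TLP prefix dword at the bit positions given in the pX_tx_stN_prefix_i description. All function names are hypothetical, and the example prefix value is arbitrary.

```python
def pack_bdf(bus, device, function):
    """Pack bus/device/function into a 16-bit PCIe Requester/Completer ID."""
    return (bus << 8) | (device << 3) | function


def rp_link_id(port):
    """ID seen on the PCIe link for RP port 0..3, per the translation table:
    port N appears as bus 0, device N, function 0."""
    if port not in range(4):
        raise ValueError("port must be 0, 1, 2 or 3")
    return pack_bdf(0, port, 0)


def format_bdf(rid):
    """Render a 16-bit ID in B:D.F notation."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7:x}"


def prefix_fields(prefix):
    """Return (fmt, type) from a 32-bit TLP prefix dword, or None when the
    dword is all zeros (no prefix present for the TLP)."""
    if prefix == 0:
        return None
    fmt = (prefix >> 29) & 0x7     # Fmt field, bits [31:29]
    typ = (prefix >> 24) & 0x1F    # Type field, bits [28:24]
    return fmt, typ


link_ids = [rp_link_id(port) for port in range(4)]
example_fields = prefix_fields(0x91000000)  # arbitrary example dword
```

In RP mode the Application logic still drives 0x0000 on the Avalon Streaming TX interface; `rp_link_id` only models the value that the IP places on the link.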

As an example, the figure Avalon® Streaming TX Interface Timings below shows the behavior of the Avalon Streaming TX interface when back-to-back TLPs are sent with data spanning multiple segments. The following steps describe the waveforms clock cycle by clock cycle:

  1. Clock cycle 1: The R-Tile Intel FPGA IP for PCI Express asserts the p0_tx_st_ready_o signal, indicating that the Hard IP is ready to accept TLPs from the Application logic.
  2. Clock cycle 2:
    1. The start of the first TLP (T0) is in segment 0, indicated by the assertion of p0_tx_st0_sop_i.
    2. The signal p0_tx_st0_hvalid_i is asserted to validate the header of this first TLP (T0H0) in the p0_tx_st0_hdr_i bus.
    3. The signal p0_tx_st0_dvalid_i is asserted to validate the data of this first TLP (T0D0) in the p0_tx_st0_data_i bus.
    4. The signal p0_tx_st1_dvalid_i is asserted to validate the next portion of the data of this first TLP (T0D1) in the p0_tx_st1_data_i bus.
    5. The signal p0_tx_st2_dvalid_i is asserted to validate the next portion of the data of this first TLP (T0D2) in the p0_tx_st2_data_i bus.
    6. The signal p0_tx_st3_dvalid_i is asserted to validate the final portion of the data of this first TLP (T0D3) in the p0_tx_st3_data_i bus.
    7. The end of this first TLP (T0) is in segment 3, denoted by the assertion of p0_tx_st3_eop_i.
  3. Clock cycle 3:
    1. The next TLP (T1) arrives in segment 0, as denoted by p0_tx_st0_sop_i staying high.
    2. The signal p0_tx_st0_hvalid_i is asserted to validate the header of this TLP (T1H0) in the p0_tx_st0_hdr_i bus.
    3. The signal p0_tx_st0_dvalid_i is asserted to validate the data of this TLP (T1D0) in the p0_tx_st0_data_i bus.
    4. The signal p0_tx_st1_dvalid_i is asserted to validate the next portion of the data of this TLP (T1D1) in the p0_tx_st1_data_i bus.
    5. The signal p0_tx_st2_dvalid_i is asserted to validate the next portion of the data of this TLP (T1D2) in the p0_tx_st2_data_i bus.
    6. The signal p0_tx_st3_dvalid_i is asserted to validate the final portion of the data of this TLP (T1D3) in the p0_tx_st3_data_i bus.
    7. The end of this TLP (T1) is in segment 3, denoted by p0_tx_st3_eop_i staying high.
Figure 32.  Avalon® Streaming TX Interface Timings
Note: For Configuration Mode 0 (1x16), the start of a TLP (pX_tx_stN_sop_i) can only happen on segment 0 (st0) or segment 2 (st2).
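The walkthrough and the note above can be captured in a small testbench-style model. The following Python sketch represents one clock cycle as a mapping from segment number to the set of asserted control signals, and checks only the rule that a TLP may start on segment 0 or segment 2. The data structure and the name `check_beat` are assumptions for illustration; a real testbench would check considerably more.

```python
ALLOWED_SOP_SEGMENTS = {0, 2}  # from the note for Configuration Mode 0 (1x16)


def check_beat(beat):
    """Return a list of problems found in one clock cycle.

    beat: dict mapping segment number (0..3) to the set of control signals
    asserted on that segment, using the names 'sop', 'eop', 'hvalid',
    'dvalid' and 'pvalid'.
    """
    problems = []
    for segment, flags in sorted(beat.items()):
        if "sop" in flags and segment not in ALLOWED_SOP_SEGMENTS:
            problems.append(f"segment {segment}: sop is only allowed on st0 or st2")
    return problems


# Clock cycle 2 of the walkthrough: T0 starts on segment 0, carries data on
# all four segments, and ends on segment 3.
cycle_2 = {
    0: {"sop", "hvalid", "dvalid"},
    1: {"dvalid"},
    2: {"dvalid"},
    3: {"dvalid", "eop"},
}

# Hypothetical illegal beat: a TLP that starts on segment 1.
bad_beat = {1: {"sop", "hvalid", "dvalid", "eop"}}

cycle_2_problems = check_beat(cycle_2)
bad_beat_problems = check_beat(bad_beat)
```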