Intel® FPGA P-Tile Avalon® Streaming IP for PCI Express* Design Example User Guide

ID 683038
Date 12/15/2021
Public

1.3. Functional Description for the Performance Design Example

The performance design example showcases the performance of the P-Tile Avalon-ST for PCI Express Hard IP. You can configure it to send memory write-only TLPs, memory read-only TLPs, or both memory write and memory read TLPs for throughput measurement. The throughput counter is implemented in the FPGA application logic to minimize software overhead. To measure throughput, the software application running on the host issues a memory read to fetch the throughput counter value from the control register, then prints the throughput figure on the system terminal. At the end of the test, the software application must issue a memory write to the control register to stop the traffic.
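
The host-side sequence described above (read the throughput counter, print it, then write the control register to stop traffic) can be sketched as follows. This is only an illustration under stated assumptions: the register offsets, the stop value, and the MockBar class are hypothetical and are not taken from this guide. A real application would access the control register through the memory-mapped BAR of the endpoint.

```python
import time

# Hypothetical register offsets and stop value; the actual control
# register map is not described in this section.
REG_CONTROL = 0x00
REG_THROUGHPUT = 0x04
STOP_TRAFFIC = 0x0


class MockBar:
    """Stand-in for a memory-mapped BAR (assumption for illustration)."""

    def __init__(self):
        self.regs = {REG_CONTROL: 0x1, REG_THROUGHPUT: 0}

    def read32(self, offset):
        # Models a memory read TLP issued by the host.
        return self.regs[offset]

    def write32(self, offset, value):
        # Models a memory write TLP issued by the host.
        self.regs[offset] = value & 0xFFFFFFFF


def run_measurement(bar, samples, interval_s=0.0):
    """Poll the throughput counter, print each value, then stop traffic."""
    readings = []
    for _ in range(samples):
        time.sleep(interval_s)
        value = bar.read32(REG_THROUGHPUT)
        readings.append(value)
        print(f"throughput counter: {value}")
    # The test must end with a write that stops the traffic.
    bar.write32(REG_CONTROL, STOP_TRAFFIC)
    return readings
```

Because the counter lives in the FPGA application logic, the host only performs occasional register reads, so the polling interval does not affect the measured traffic.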

The performance design example automatically creates the files necessary to simulate and compile in the Intel® Quartus® Prime software. It supports the Gen4 x16, 512-bit interface Hard IP mode. For the Intel® Agilex™ device family, this design example supports a PLD clock frequency of up to 500 MHz. For the Intel® Stratix® 10 DX device family, it supports a PLD clock frequency of up to 400 MHz.
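
To put these figures in context, the short calculation below compares the raw bandwidth of a 512-bit interface at each PLD clock frequency with the raw bandwidth of a Gen4 x16 link. It is a back-of-the-envelope sketch that assumes one 512-bit word is transferred per PLD clock cycle and accounts only for the 128b/130b line encoding of Gen4 (16 GT/s per lane). TLP header and flow-control overhead are not modeled, so measured throughput will be lower than these ceilings.

```python
def interface_bandwidth_gbps(width_bits, clock_mhz):
    """Raw interface bandwidth, assuming one word per clock cycle."""
    return width_bits * clock_mhz / 1000.0


def pcie_gen4_link_gbps(lanes):
    """Raw Gen4 link bandwidth per direction after 128b/130b encoding."""
    gt_per_s_per_lane = 16.0
    return gt_per_s_per_lane * lanes * 128 / 130


if __name__ == "__main__":
    for family, mhz in (("Agilex", 500), ("Stratix 10 DX", 400)):
        gbps = interface_bandwidth_gbps(512, mhz)
        print(f"{family}: 512-bit @ {mhz} MHz = {gbps:.1f} Gbps")
    print(f"Gen4 x16 link: {pcie_gen4_link_gbps(16):.2f} Gbps per direction")
```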

This design example includes the following components:
  • The generated P-Tile Avalon Streaming Hard IP Endpoint variant (DUT) with the parameters you specified. This component interacts with the root complex/switch at the other end of the PCIe link and translates the data from the PCIe link into Avalon-ST data format, and vice versa.
  • The PCIe Perf_ED (perf0) component generates the requested data traffic for throughput measurement. It consists of the following sub-modules:
    • The pioperf_multitlp_adapter module converts the 2-segment data of the Avalon-ST interface into two single-segment streams of Avalon-ST data.
    • The pioperf_rx_diverter module diverts memory write or memory read TLPs from the host and completion TLPs to their respective destinations for further processing.
    • The pioperf_rx_intf module decodes the TLP headers and data from the pioperf_rx_diverter module and converts them into the Avalon-MM format. It also extracts the information needed to construct the TLP header of the completion data, such as the requester ID, tag, attribute, TC, and byte count, and then passes this information to pioperf_tx_intf for further processing.
    • The pioperf_tx_intf module converts the requests from the pioperf_rx_intf and pioperf_dma_top modules into TLPs and sends them to the P-Tile Avalon Streaming Hard IP. It transmits TLPs in a simplified weighted round-robin manner with this priority scheme: completion TLP > memory read TLP > memory write TLP.
    • The pioperf_dma_top module generates memory write and memory read requests and passes them to the pioperf_tx_intf module based on the information in the control register. Every memory read request is tagged, and its completion data is expected to return before a timeout, to ensure data completeness. The issued memory write and memory read TLPs build up traffic on the TX and RX interfaces of the PCIe link. A throughput counter is included to analyze the overall throughput of the system.

  • Reset Release IP: This IP holds the control circuit in reset until the device has fully entered user mode. The FPGA asserts the INIT_DONE output to signal that the device is in user mode. The Reset Release IP generates an inverted version of the internal INIT_DONE signal to create the nINIT_DONE output that you can use in your design. The nINIT_DONE signal is high until the entire device enters user mode. After nINIT_DONE asserts (goes low), all logic is in user mode and operates normally. You can use the nINIT_DONE signal in one of the following ways:
    • To gate an external or internal reset.
    • To gate the reset input to the transceiver and I/O PLLs.
    • To gate the write enable of design blocks such as embedded memory blocks, state machines, and shift registers.
    • To synchronously drive register reset input ports in your design.
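
The transmit arbitration described for the pioperf_tx_intf module (weighted round-robin with the priority completion TLP > memory read TLP > memory write TLP) can be illustrated with a small software model. This is a sketch of one common credit-based weighted round-robin scheme, not the RTL of the design example. The weights (4, 2, 1) and the class and method names are hypothetical, since this guide does not state them.

```python
from collections import deque

# Priority order from the guide: completion > memory read > memory write.
PRIORITY = ("completion", "mem_read", "mem_write")
# Hypothetical weights; the actual values are not given in this guide.
WEIGHTS = {"completion": 4, "mem_read": 2, "mem_write": 1}


class WrrArbiter:
    """Credit-based weighted round-robin arbiter (illustrative model)."""

    def __init__(self, weights=WEIGHTS):
        self.weights = dict(weights)
        self.credits = dict(weights)
        self.queues = {kind: deque() for kind in PRIORITY}

    def push(self, kind, tlp):
        self.queues[kind].append(tlp)

    def pop(self):
        """Return (kind, tlp) for the next TLP to send, or None if idle."""
        if not any(self.queues.values()):
            return None
        for _ in range(2):
            # Serve the highest-priority class that has data and credit.
            for kind in PRIORITY:
                if self.queues[kind] and self.credits[kind] > 0:
                    self.credits[kind] -= 1
                    return kind, self.queues[kind].popleft()
            # No pending class has credit left: start a new round.
            self.credits = dict(self.weights)
        return None
```

With these assumed weights, a backlog of all three TLP types is served in rounds of up to four completions, two memory reads, and one memory write, so lower-priority memory writes still make progress while completions are pending.
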
Figure 9. Block Diagram for the Platform Designer PCIe 1x16 Performance Design Example
Figure 10. Platform Designer System Contents for P-Tile Avalon-ST for PCI Express 1x16 Performance Design Example
Note: Hardware testing support for the PCIe Performance design example is not available in the 21.4 release of Intel® Quartus® Prime.