Low Latency 100G Ethernet Intel® FPGA IP Core User Guide: For Intel® Stratix® 10 Devices

ID 683100
Date 2/16/2022
Public

8. Debugging the Link

Use the Ethernet Link Inspector (ELI) tool to debug your link.

The ELI is an inspection tool that can continuously monitor an Ethernet link that contains an Ethernet IP. The monitored information includes Ethernet lane alignment status, clock data recovery (CDR) lock, media access controller (MAC) statistics, Forward Error Correction (FEC) statistics, and others. If needed, the ELI can capture an event with the help of the Signal Tap Logic Analyzer to further examine the link behavior during Auto-negotiation (AN), Link Training (LT), or any other event during link operation. The ELI also provides a graphical user interface (GUI) to represent the link behavior and is available in the Intel® Quartus® Prime Pro software.

To use ELI, turn on the Enable JTAG to Avalon Master Bridge feature in the IP. For more information, refer to the IP Core Parameters.

Begin debugging your link at the most basic level, with word lock. Then, consider higher level issues.

The following steps should help you identify and resolve common problems that occur when bringing up a Low Latency 100G Ethernet Intel FPGA IP core link:

  1. Establish word lock—The RX lanes should be able to achieve word lock even in the presence of extreme bit error rates. If the IP core is unable to achieve word lock, check the transceiver clocking and data rate configuration. Check for cabling errors such as the reversal of the TX and RX lanes. Check the clock frequency monitors in the Control and Status registers.

    To check for word lock: Clear the FRM_ERR register by writing 1 and then 0 to the SCLR_FRM_ERR register at offset 0x324. Then read the FRM_ERR register at offset 0x323. If the value is zero, the core has word lock. If the value is non-zero, the status is indeterminate.

  2. If you have problems with word lock, check the EIO_FREQ_LOCK register at address 0x321. The values in this register indicate the status of the recovered clock. In normal operation, all the bits should be asserted. A deasserted (value 0) or toggling bit for any lane indicates a clock recovery problem. Clock recovery difficulties are typically caused by the following problems:
    • A high bit error rate (BER)
    • Failure to establish the link
    • Incorrect clock inputs to the IP core
  3. Check the PMA FIFO levels by selecting the appropriate bits in the EIO_FLAG_SEL register and reading the values in the EIO_FLAGS register. During normal operation, the TX and RX FIFOs should be nominally filled. A TX FIFO that is either empty or full typically indicates a problem with clock frequencies. The RX FIFO should never be full, although an empty RX FIFO can be tolerated.
  4. Establish lane integrity—When operating properly, the lanes should not experience bit errors at a rate greater than roughly one per day. Bit errors within data packets are identified as FCS errors. Bit errors in control information, including IDLE frames, generally cause errors in XL/CGMII decoding.
  5. If the IP core acquires word lock but the link is still not established, check the AM_LOCK register at offset 0x328 by reading it repeatedly. If it is deasserted or toggling, check the cables and connections.
  6. If the IP core acquires alignment marker lock on all virtual lanes (bit [0] of the AM_LOCK register consistently reads 1), but the link is still not established, check the LANE_DESKEWED register at offset 0x329. If this register remains at the value of 0, the skew is greater than the deskew limit.
  7. Verify packet traffic—The Ethernet protocol includes automatic lane reordering so the higher levels should follow the PCS. If the PCS is locked, but higher level traffic is corrupted, there may be a problem with the remote transmitter virtual lane tags.
  8. Tuning—You can adjust analog parameters to improve the bit error rate. IDLE traffic is representative for analog purposes.
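
The register checks in steps 1, 2, 5, and 6 can be sketched as a short script. This is a minimal sketch under stated assumptions, not Intel tooling: `read_reg` and `write_reg` are hypothetical callables standing in for whatever register access path you use (for example, a script driving the JTAG to Avalon Master Bridge), and the `num_lanes` default and the one-bit-per-lane layout of EIO_FREQ_LOCK are assumptions to confirm against the register map. The register offsets are the ones named in the steps above.

```python
from typing import Callable, Dict

# Register offsets taken from the debugging steps above.
EIO_FREQ_LOCK = 0x321
FRM_ERR = 0x323
SCLR_FRM_ERR = 0x324
AM_LOCK = 0x328
LANE_DESKEWED = 0x329

# Hypothetical access functions: read_reg(offset) -> value, write_reg(offset, value).
ReadFn = Callable[[int], int]
WriteFn = Callable[[int, int], None]


def check_link(read_reg: ReadFn, write_reg: WriteFn,
               num_lanes: int = 4, samples: int = 16) -> Dict[str, bool]:
    """Run the register checks from steps 1, 2, 5 and 6 in order."""
    # Step 1: pulse SCLR_FRM_ERR (write 1, then 0), then read FRM_ERR.
    # Zero means word lock; non-zero is indeterminate, not a proven failure.
    write_reg(SCLR_FRM_ERR, 1)
    write_reg(SCLR_FRM_ERR, 0)
    word_lock = read_reg(FRM_ERR) == 0

    # Step 2: every lane bit must stay asserted across repeated reads.
    # ASSUMPTION: one bit per lane, starting at bit 0.
    lane_mask = (1 << num_lanes) - 1
    freq_lock = all((read_reg(EIO_FREQ_LOCK) & lane_mask) == lane_mask
                    for _ in range(samples))

    # Step 5: AM_LOCK bit 0 must read 1 on every sample (no toggling).
    am_lock = all((read_reg(AM_LOCK) & 1) == 1 for _ in range(samples))

    # Step 6: LANE_DESKEWED remaining at 0 means skew exceeds the deskew limit.
    deskewed = any(read_reg(LANE_DESKEWED) != 0 for _ in range(samples))

    return {
        "word_lock": word_lock,
        "freq_lock": freq_lock,
        "am_lock": am_lock,
        "deskewed": deskewed,
    }
```

Sampling each status register several times, rather than once, is what distinguishes a toggling bit from a stable one. The first `False` entry in the returned dictionary points to the step at which to start investigating.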

In addition, your IP core can experience loss of signal on the Ethernet link after it is established. In this case, the TX functionality is unaffected, but the RX functionality is disrupted. The following symptoms indicate a loss of signal on the Ethernet link:

  • The IP core deasserts the rx_pcs_ready signal, indicating the IP core has lost alignment marker lock.
  • The IP core deasserts the RX PCS fully aligned status bit (bit [0]) of the RX_PCS_FULLY_ALIGNED_S register at offset 0x326. This change is linked to the change in value of the rx_pcs_ready signal.
  • If Enable link fault generation is turned on, the IP core sets local_fault_status to the value of 1.
  • The IP core asserts the Local Fault Status bit (bit [0]) of the Link_Fault register at offset 0x508. This change is linked to the change in value of the local_fault_status signal.
  • The IP core triggers the RX digital reset process by asserting soft_rxp_rst.
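
The two register-visible symptoms above can be polled in software. The following is a minimal sketch, assuming a hypothetical `read_reg` callable that returns the value at a given register offset; the offsets and bit positions are the ones listed above, and the function name and returned strings are illustrative only.

```python
from typing import Callable, List

# Register offsets taken from the symptom list above.
RX_PCS_FULLY_ALIGNED_S = 0x326
LINK_FAULT = 0x508


def loss_of_signal_symptoms(read_reg: Callable[[int], int]) -> List[str]:
    """Return the register-visible loss-of-signal symptoms currently present."""
    symptoms = []
    # Bit [0] deasserted mirrors a deasserted rx_pcs_ready signal.
    if (read_reg(RX_PCS_FULLY_ALIGNED_S) & 1) == 0:
        symptoms.append("RX PCS fully aligned status deasserted")
    # Bit [0] asserted mirrors local_fault_status, which the IP core sets
    # when Enable link fault generation is turned on.
    if (read_reg(LINK_FAULT) & 1) == 1:
        symptoms.append("Local Fault Status asserted")
    return symptoms
```

An empty list means neither register shows a symptom. It does not cover the signal-level symptoms (rx_pcs_ready, soft_rxp_rst), which must be observed with a logic analyzer or equivalent.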