The transceiver is made up of two sub-layers of the PHY, namely the physical coding sub-layer (PCS) and the physical medium attachment (PMA).
The purpose of the PMA is to convert the digital data into an analog stream or the reverse. The PMA could also connect directly to your physical communication medium. An example of a PMA block would be the parallel-to-serial converter or the serial-to-parallel converter.
The PCS is the digital portion of the PHY layer. In the transmitter, its job is to prepare the parallel data for transmission across the physical medium. Examples of this would be encoding or scrambling the data in some way to guarantee a certain number of transitions in the serial stream. In the receiver, the job of the PCS is to return the transmitted data back to its original form. This would include decoding or descrambling the data stream as well as locating the byte or word boundaries.
The MAC is responsible for managing the transactions through the PHY. It receives data or information from the upper layers and assembles the packets to be given to the PCS for transmission across the link. It also disassembles any packets received over the link. Any errors or faults generated during transmission over the link are decoded and handled by the MAC in order to maintain communication between the endpoints. In a sense, the exact data content the MAC transmits or receives is not important, just as long as the link continues to function correctly. The upper layers then are more concerned with where the data is actually going and what data is being sent or received.
Altera offers midrange FPGAs with embedded transceivers in the Arria series of FPGAs. The core architecture of these families borrows from the Stratix line, but is optimized for lower-power, more cost-sensitive applications. The Arria families support maximum data rates from 6 Gbps up to 28.1 Gbps.
Altera also offers low-cost FPGA families with embedded transceivers, called the Cyclone series of FPGAs. These devices are designed for more high-volume, cost-sensitive applications with higher I/O bandwidth requirements. They support maximum data rates from 3 Gbps up to 5 Gbps.
Click on any device family on this page to be taken to a website where you can learn more about that device family and which high-speed protocols and specifications it supports.
Please use the links from the previous slide for specifics on the transceiver architecture of your particular target device.
One of the Hard IP for PCI Express blocks has support for Configuration via Protocol (CvP). CvP allows the FPGA fabric to be configured via a PCIe link without a host reboot or FPGA full chip re-initialization. Refer to the Arria 10 Transceiver PHY User Guide for specifics on transceiver architecture and packaging information.
With dynamic reconfiguration, you can reconfigure parts of the transceiver or the entire transmitter or receiver paths on the fly without cycling power to the FPGA or reconfiguring the entire FPGA. Thus, you can simply change analog buffer settings, transmit data rates, transmitter PLL settings or switch from one protocol implementation to another.
This is useful for debugging and bringing up your system or to build adaptable or reconfigurable systems to support different environments.
We will now look at these paths in more detail in the next two sections.
Since the transmitter path has really only two responsibilities, its construction will be simpler and use fewer blocks when compared to the receiver.
Hardening means that a dedicated block is built in silicon to perform the function versus using general FPGA resources to perform the task. Dedicated blocks almost always have better performance when compared to using general FPGA resources to do the same thing.
This configuration is found in all Altera transceiver devices.
Using the phase compensation FIFO, you make sure that your parallel data gets synchronized into the transmitter's own clock domain.
The FIFO accepts data in various widths from 8 up to 40 bits wide, depending on what is supported by the device family.
In some transceivers, this FIFO can operate in additional modes to accommodate additional protocols like IEEE 1588.
So, if your parallel data needs to run faster than the supported maximum FPGA-transmitter interface rate in order to achieve your desired line rate, then the byte serializer doubles the parallel FPGA-transmitter interface data width so you can provide twice the data at one time. The byte serializer then converts the data back down to the single or double width data used by the rest of the transmitter. So, if you want to use single width mode, then your FPGA interface will be 16 or 20 bits and the byte serializer converts it to 8 or 10 bits. If you use double width mode, then the FPGA interface will be 32 or 40 bits and the byte serializer will convert it to 16 or 20 bits. On the output of the byte serializer, the least significant byte is transmitted first.
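As a rough behavioral sketch of the width halving described above (not Altera's implementation; the function name `byte_serialize` and its arguments are hypothetical), the following Python splits one FPGA-side word into two half-width words and emits the least significant half first:

```python
def byte_serialize(word, in_width):
    """Split one in_width-bit word into two half-width words, low half first.

    Hypothetical behavioral model: in_width is the FPGA-side width
    (16, 20, 32 or 40 in the text above).
    """
    if in_width % 2 != 0:
        raise ValueError("in_width must be even")
    half = in_width // 2
    mask = (1 << half) - 1
    low = word & mask              # least significant half is sent first
    high = (word >> half) & mask   # most significant half follows
    return [low, high]


# Single width example: one 16-bit FPGA word becomes two 8-bit words.
print([hex(w) for w in byte_serialize(0xABCD, 16)])  # ['0xcd', '0xab']
```

The FPGA side therefore runs at half the clock rate of the half-width side for the same throughput.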
PRBS 9 is used to test channels that use an 8B/10B encoding/decoding scheme. PRBS 15 is mostly used for jitter measurements. PRBS 23 is used for channels that don’t use an 8B/10B encoding scheme. PRBS 31 is the recommended pattern for 10GBASE-R, 10GBASE-KR and other applications that use forward error correction (FEC).
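A PRBS pattern is the output of a linear feedback shift register. The sketch below assumes the polynomials commonly associated with these pattern names (x^9 + x^5 + 1, x^15 + x^14 + 1, x^23 + x^18 + 1 and x^31 + x^28 + 1). The seed, bit ordering and any output inversion used by a particular device are not modeled, and the names `PRBS_TAPS` and `prbs_bits` are illustrative only.

```python
# Feedback tap positions (assumed, see lead-in): order -> (n, m) for x^n + x^m + 1
PRBS_TAPS = {9: (9, 5), 15: (15, 14), 23: (23, 18), 31: (31, 28)}


def prbs_bits(order, count, seed=1):
    """Return `count` bits of a PRBS of the given order from a Fibonacci LFSR."""
    n, m = PRBS_TAPS[order]
    mask = (1 << n) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR never leaves zero")
    out = []
    for _ in range(count):
        # XOR the two tapped stages to form the next bit
        new_bit = ((state >> (n - 1)) ^ (state >> (m - 1))) & 1
        state = ((state << 1) | new_bit) & mask
        out.append(new_bit)
    return out


# PRBS 9 repeats every 2**9 - 1 = 511 bits.
bits = prbs_bits(9, 1022)
print(bits[:511] == bits[511:])  # True
```

A verifier on the receive side can run the same register and compare its prediction against each incoming bit.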
The blocks in the 10G PCS are the transmit FIFO, the frame generator, the CRC-32 generator, the 64B/66B encoder, the scrambler, the disparity generator and the transmitter gearbox. Not all blocks are required for all 10G protocols, so they are enabled and disabled as needed for the particular target protocol implementation.
In some transceivers, the TX FIFO can also operate in register mode. This mode is for CPRI and IEEE 1588 applications that require deterministic latency. In register mode, the TX FIFO incurs one cycle of latency of the parallel low speed PCS clock.
Having this operation performed in hardware saves resources and allows for faster transmission.
For data rates above 5 Gbps, other encoding methods are used, such as the 64B/66B encoding scheme. Unlike 8B/10B, which works on individual bytes, this encoding method transforms every 64 bits (8 bytes) of data into 66 bits for transmission across the link, so less overhead is incurred. Still, it allows DC balancing and disparity to be controlled, and it provides transitions so that the embedded clock can be recovered. The output is typically sent to a scrambler to further ensure that the data being sent has enough transitions for clock recovery.
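The overhead difference can be checked with simple arithmetic. The helper names below are hypothetical, and the 10.3125 Gbps figure is the 10GBASE-R line rate, used here only as an example:

```python
def overhead_percent(data_bits, coded_bits):
    """Extra bits sent on the link, as a percentage of the payload bits."""
    return 100.0 * (coded_bits - data_bits) / data_bits


def payload_rate_gbps(line_rate_gbps, data_bits, coded_bits):
    """Payload throughput left after removing the encoding overhead."""
    return line_rate_gbps * data_bits / coded_bits


print(overhead_percent(8, 10))             # 25.0
print(overhead_percent(64, 66))            # 3.125
print(payload_rate_gbps(10.3125, 64, 66))  # 10.0
```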
For Interlaken, the scrambler operates in frame synchronous mode where a synchronizing word-pattern is used in each frame to maintain synchronization between the transmitter and receiver. This unique pattern is always sent as the first word of the Meta Frame and is sent unscrambled. Also, the second word of the Meta Frame sent by the transmitter is the current state of the scrambler polynomial, which is used by the receiver to decode the rest of the Meta Frame. The scrambler state word is also sent unscrambled. Using this method, the receiver always knows where the start of the Meta Frame is and from where to obtain the scrambler state.
For 10GBASE-R, the scrambler operates in self synchronous mode. In this mode, unlike the frame synchronous mode, once the scrambler is synchronized or seeded, it then uses its polynomial to continuously scramble any words after that point (excluding the sync header bits) and is not continuously re-synchronized at regular intervals.
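The self-synchronous behavior can be illustrated with a bit-level model. This sketch assumes the 10GBASE-R scrambler polynomial 1 + x^39 + x^58 and processes payload bits only (sync header bits are assumed to be handled elsewhere). The function names and the simplified bit ordering are illustrative, not a description of the hardware.

```python
MASK58 = (1 << 58) - 1


def scramble(bits, state=MASK58):
    """Scramble payload bits; the register holds previously SENT bits."""
    out = []
    for b in bits:
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | s) & MASK58  # feed back the scrambled bit
        out.append(s)
    return out


def descramble(bits, state=0):
    """Descramble; the register holds previously RECEIVED bits."""
    out = []
    for s in bits:
        b = s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | s) & MASK58  # shift in the received bit
        out.append(b)
    return out


payload = [0] * 200                  # a long run of zeros
tx = scramble(payload)
rx = descramble(tx, state=0)         # receiver starts with the wrong state
print(any(tx))                       # True: the zeros no longer look like zeros
print(rx[58:] == payload[58:])       # True: in sync after 58 received bits
```

Because the descrambler register is filled from the received stream itself, it needs no seed exchange, which is the contrast with the frame synchronous mode used for Interlaken.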
Note that the clocking of the gearbox and PCS has to be matched to support the target line rate. In other words, if the data is being removed from the gearbox in 40-bit parallel words and is being placed into the gearbox in 66 or 67-bit words, the clocking of the two sides of the gearbox must be matched so as to:
Never overflow or starve the gearbox
Maintain the perceived target line rate to the link
These clock rates will be determined and provided by the transceiver configuration tools when building the system.
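As a hedged illustration of what "matched" means, both sides of the gearbox must move the same number of bits per second, so each parallel clock is the line rate divided by that side's word width. The function name is hypothetical, and the 10.3125 Gbps line rate is an assumed example rather than a value taken from the configuration tools:

```python
def parallel_clock_mhz(line_rate_gbps, word_bits):
    """Parallel clock (MHz) needed to move word_bits per cycle at the line rate."""
    return line_rate_gbps * 1000.0 / word_bits


line_rate = 10.3125  # Gbps, assumed example
print(parallel_clock_mhz(line_rate, 40))  # 257.8125 (PMA side, 40-bit words)
print(parallel_clock_mhz(line_rate, 66))  # 156.25   (PCS side, 66-bit words)
```

The ratio of the two clocks is 66:40, the inverse of the width ratio, which is why neither side gains or loses bits over time.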
The transmitter gearbox can also reverse the parallel word so that the default behavior of sending the LSB first changes to send the MSB first, for protocols that require this.
Since the Enhanced PCS shares the PRBS generator with the standard PCS, its behavior was discussed earlier in the standard PCS section. You may use the link on this page if you wish to go back and review that information.
The PRP generator generates various pseudo-random test patterns depending on the initial value loaded into the scrambler, or seed, and the data pattern selected. It is designed specifically for use in 10GBASE-R and IEEE 1588 applications.
You cannot enable both generators at once.
The KR FEC block is part of the Enhanced PCS and is designed in accordance with the 10G-KRFEC and 40G-KRFEC sections of the IEEE 802.3 specification.
Most data transmission systems, such as Ethernet, have minimum requirements for the bit error rate (BER). However, due to channel distortion or noise in the channel, the required BER may not be achievable. In these cases, adding forward error correction can improve the BER performance of the system by adding redundancy into the transmitted data. This redundancy allows for errors to be detected at the receiver and possibly corrected without having to retransmit the data.
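To show how redundancy lets a receiver correct an error without retransmission, here is a toy example using a Hamming(7,4) code. This is deliberately NOT the code used by the KR FEC block; it is a much smaller textbook code swapped in purely for illustration, and the function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits into 7 bits by adding 3 parity bits."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    # Codeword positions 1..7: p1 p2 d1 p4 d2 d3 d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # 0 means no error detected
    if position:
        c[position - 1] ^= 1          # flip the bit the syndrome points at
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
codeword[4] ^= 1                          # the channel corrupts one bit
print(hamming74_decode(codeword) == data)  # True
```

The cost is visible in the widths: 7 bits are sent for every 4 bits of data, which is the redundancy the paragraph above refers to.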
The FEC block is optional and can be bypassed.
For more details on the KR FEC block for Arria 10 devices, refer to the Arria 10 Transceiver PHY User Guide.
The first block is the phase compensation FIFO. Like the FIFOs in the other 2 PCS configurations, this one compensates for phase offsets between the FPGA clock domain and the transmitter path clock domain, ensuring reliable transfers between them.
The encoder block encodes a 128-bit data word according to the PCIe Gen3 specification, converting it into a 130-bit word by appending a 2-bit sync header to the MSBs. It also indicates to the scrambler which packets should be scrambled and which should not be scrambled. All data packets get scrambled, while ordered sets, packets used for synchronization or link management, are not.
The scrambler block, like the scrambler in the 10G PCS, ensures there are enough transitions in the outgoing data for the RX PLL to remain locked, by scrambling data words according to the PCIe Gen3 specification using a linear feedback shift register (LFSR).
Lastly, the gearbox adapts the 130-bit data width of the PCS output to the 32-bit width of the PMA by converting the data words along with skip characters into 32-bit segments.
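A gearbox of this kind can be modeled as a bit accumulator: bits from wide words are collected and drained in narrower segments. The sketch below is a generic width converter under that assumption; it does not model skip characters or the exact bit ordering of the hardware, and the name `gearbox` is illustrative.

```python
def gearbox(words, in_width, out_width):
    """Repack in_width-bit words into out_width-bit words, LSB first.

    Returns (output_words, leftover_bit_count).
    """
    acc = 0      # accumulator; the oldest bits sit in the lowest positions
    nbits = 0    # number of valid bits currently held in acc
    out = []
    in_mask = (1 << in_width) - 1
    out_mask = (1 << out_width) - 1
    for w in words:
        acc |= (w & in_mask) << nbits
        nbits += in_width
        while nbits >= out_width:
            out.append(acc & out_mask)
            acc >>= out_width
            nbits -= out_width
    return out, nbits


# 16 words of 130 bits carry 2080 bits, which is exactly 65 words of 32 bits.
blocks = [(i * 0x123456789ABCDEF + 1) & ((1 << 130) - 1) for i in range(16)]
segments, leftover = gearbox(blocks, 130, 32)
print(len(segments), leftover)  # 65 0
```

Because 130 is not a multiple of 32, the accumulator holds a varying number of leftover bits between input words and only empties completely every 16 blocks.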
For example, you may require a different CRC polynomial or generation scheme. Or you may want to use a different method of encoding your transmitted data, whether simpler or more complicated. You may want to use a different method or polynomial for scrambling. Or you may need to frame your packets differently than what is done natively in the PCS.
To implement these alternatives, you would need to bypass the dedicated PCS block and build your block using general FPGA resources. Of course, your alternative processing or data conversion would have to be done before entering the PCS. Building this functionality in general FPGA resources may also impact the performance of your system or link.
So we enter the transmitter PMA. The transmitter PMA consists of the serializer and the transmitter buffer.
The buffer may also contain additional dedicated blocks or features like the receiver detect block or the transmitter output tri-state feature. These blocks and features are for specific applications or protocols, such as the PCI Express protocol. Again, see your device handbook to find out whether any of these are available in your target device.
From the transmitter buffer, the data could be driven directly across board traces, a back plane or even a cable or it could go through the 3rd PHY sub-layer, the physical medium dependent layer, or PMD, which would convert the signal to a form based on the transmission medium, such as optical fibers.
This feature is supported in Arria V GT/GZ, Stratix V GX/GS/GT, and Stratix IV GX/GT.
Similar functionality is supported by Arria 10 transceivers in the PCS-Direct mode.
For example, in Stratix IV and Arria II devices, every transceiver block contains a central control block that houses two CMU, or clock management unit, PLLs. These PLLs generate clocks for the transceiver block with which they are associated and for other transceiver blocks. In Cyclone V, Arria V and Stratix V devices, every channel has a PLL that can be configured as a CMU PLL to provide transmitter clocks for the channels around it. Arria 10, Stratix V and Stratix IV devices also have what are called ATX PLLs. These PLLs sit near the transceiver blocks and generate clocks with lower jitter than the CMU PLLs. The Cyclone IV GX devices have what are called MPLLs, which can be used as general PLLs but can also generate clocks for the transmitter. In some FPGA families, there are general device PLLs and fractional PLLs that can also be used as clock sources for the transmitter channel.
More detail on the clocking behavior can be found in the handbook for your particular target device. For Arria 10 devices, refer to the Arria 10 Transceiver PHY User Guide.
In some transceivers like those in Arria 10 devices, the transceiver receiver buffer also supports advanced features like continuous time linear equalization (CTLE) and decision feedback equalization (DFE).
As you can imagine, the byte ordering block cannot be used with rate matching as the rate matcher is adding and deleting bytes while the byte ordering block would be trying to synchronize to them.
Once through the phase compensation FIFO, the data can be further processed and analyzed by any additional PCS logic or a media access control, or MAC, block implemented in the FPGA core.
The PRBS verifier supports the same five patterns as the PRBS generator in the transmitter path: PRBS 9, 15, 23 and 31, along with square wave patterns.
As mentioned earlier, PRBS 9 is used to test channels that use an 8B/10B encoding/decoding scheme. PRBS 15 is mostly used for jitter measurements. PRBS 23 is used for channels that don’t use an 8B/10B encoding scheme. PRBS 31 is the recommended pattern for 10GBASE-R, 10GBASE-KR and other applications that use FEC.
Like the transmitter, not all blocks are required for all 10G protocols, so they are enabled and disabled as needed for the particular target protocol implementation.
Natively, the PCS receives the LSB first, but the receiver gearbox also has the ability to do bit reversal so that the MSB can be received first.
If your board incorrectly swaps the p and n traces of your differential pair, you can also use the receiver gearbox to compensate by inverting each bit of the incoming signal.
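Both gearbox options are simple bit manipulations on the parallel word. A minimal sketch, with hypothetical function names and an explicit word width:

```python
def reverse_bits(word, width):
    """Mirror the bit order of a width-bit word (bit 0 becomes the MSB)."""
    result = 0
    for i in range(width):
        result = (result << 1) | ((word >> i) & 1)
    return result


def invert_polarity(word, width):
    """Flip every bit, as needed when the p and n traces are swapped."""
    return ~word & ((1 << width) - 1)


print(bin(reverse_bits(0b0001, 4)))     # 0b1000
print(bin(invert_polarity(0b0001, 4)))  # 0b1110
```

Applying either function twice returns the original word, so the same operation on the transmit side would undo it.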
In receiver phase compensation mode, the FIFO takes the incoming data and retimes it into the FPGA core clock domain to account for any slight phase offsets between the two domains. This is similar to the phase compensation FIFO in the standard PCS.
In clock compensation mode, the FIFO behaves like the rate matcher block in the standard PCS in that it inserts or removes ordered sets to compensate for frequency differences of up to ±100 PPM between the link endpoints. This mode is used in Ethernet 10GBASE-R.
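For a feel of how often the FIFO has to act, here is a back-of-the-envelope estimate. It assumes a constant frequency offset between the write and read clocks, and the function name is hypothetical. The 200 PPM case is an assumption: the worst case if each endpoint is off by 100 PPM in opposite directions.

```python
def words_between_slips(ppm_difference):
    """Words transferred before the two clocks drift apart by one full word."""
    if ppm_difference <= 0:
        raise ValueError("ppm_difference must be positive")
    return 1_000_000 / ppm_difference


print(words_between_slips(100))  # 10000.0
print(words_between_slips(200))  # 5000.0
```

On that estimate, the stream has to offer an ordered set that can be inserted or removed at least that often for the FIFO to keep up.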
In generic mode, the FIFO functions just like a simple FIFO, but provides flags to the FPGA logic so that control logic can monitor the FIFO and manage the data flow, namely FIFO full and empty flags. This mode is used in the Interlaken protocol.
The PRBS verifier is shared with the standard PCS and was discussed previously. Use the link on this page if you wish to go back and review that information.
Once block synchronization is achieved, the PRP verifier monitors the descrambled output of the descrambler to confirm the pseudo-random data pattern sent by the transmitter has been received correctly. The PRP verifier was designed specifically for 10GBASE-R and IEEE 1588 applications.
You cannot enable both the PRP and PRBS verifiers at once.
As described in the transmit section, forward error correction can improve the BER performance of the system by inserting redundancy and allowing a receiver to detect errors in the transmitted data pattern and possibly correct for them.
The FEC block is optional and can be bypassed.
For more details on the KR FEC block for Arria 10 devices, refer to the Arria 10 Transceiver PHY User Guide.
It is made up of the block synchronizer, the rate match FIFO, the decoder, the descrambler and the phase compensation FIFO.
The rate match FIFO, like the rate matcher in the standard PCS and the RX FIFO in the 10G PCS, compensates for frequency differences of up to ±300 PPM between the link endpoints by adding or removing SKP ordered sets in the data word as the FIFO underflows or overflows.
The decoder converts the 130-bit data word back to its original 128 bits by removing the 2-bit synchronization header added by the transmitter. It can also report any ordered set or sync header violations. Since ordered sets and sync headers are not scrambled, when the decoder detects them, it signals this to the descrambler so the descrambler does not try to descramble them as it does all other data words.
The descrambler returns the data words to their original, non-scrambled state using an LFSR that implements the PCIe Gen3 polynomial.
The phase compensation FIFO, like the phase compensation FIFO in the standard PCS and the RX FIFO in the 10G PCS, takes the incoming data and retimes it into the FPGA core clock domain to account for any slight phase offsets between the two domains. It is a shallow FIFO that cannot tolerate any frequency differences, so the read and write FIFO clocks must be derived from the same source.
Please refer to the Arria 10 Transceiver PHY User Guide for details.
This feature is supported in Arria V GT/GZ, Stratix V GX/GS/GT and Stratix IV GX/GT devices.
Similar functionality is supported by Arria 10 transceivers in the PCS-Direct mode.
The high-speed serial clock is derived directly from the incoming data by the CDR. This is why an encoding or scrambling method must be used so as to guarantee enough transitions in the incoming data so that a CDR can recover the clock from it.
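One simple way to quantify "enough transitions" is the longest run of identical consecutive bits in a stream, since the CDR sees no edges during such a run. The helper below is an illustrative measurement only; the name `longest_run` is hypothetical and no particular run-length limit is implied.

```python
def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    longest = 0
    current = 0
    previous = None
    for b in bits:
        current = current + 1 if b == previous else 1
        longest = max(longest, current)
        previous = b
    return longest


print(longest_run([0] * 64))    # 64: no transitions at all
print(longest_run([0, 1] * 32)) # 1: a transition on every bit
```

Comparing this figure before and after encoding or scrambling shows how much a given scheme shortens the gaps between transitions.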
The lower-speed parallel clock can come from a few sources depending on the mode in which the transceiver is configured. For example, the clock could also come from the receiver CDR or it could come from an associated transmitter PLL.
More detail on the clocking behavior can be found in the handbook for your particular target device.
Thank you.