Transceiver Basics

1. Transceiver Basics
2. Course Objectives
3. Agenda
4. Introduction
4.1. What is a Transceiver?
4.2. Definitions
4.3. Altera Transceiver Devices
4.4. Transceiver Locations
4.5. Transceiver Locations (2 of 2)
4.6. Dynamic Reconfiguration
4.7. Transceiver Block Diagram
5. Transmitter Path
5.1. Transmitter Path Definition
5.2. Transmitter Path Agenda
5.3. Transmitter Block Diagram
5.4. TX PCS Configurations
5.4.1. Standard PCS Blocks
5.4.1.1. Transmitter Phase Compensation FIFO
5.4.1.2. Byte Serializer
5.4.1.3. Byte Serializer
5.4.1.4. 8B/10B Encoder
5.4.1.5. Standard PCS Blocks – Arria 10 Transceivers
5.4.1.6. PRBS Generator (Arria 10 Transceivers)
5.4.2. 10G PCS Blocks
5.4.2.1. Transmitter FIFO
5.4.2.2. Frame Generator
5.4.2.3. CRC-32 Generator
5.4.2.4. 64B/66B Encoder
5.4.2.5. Scrambler
5.4.2.6. Disparity Generator
5.4.2.7. Transmitter Gearbox
5.4.3. Enhanced PCS Blocks - Arria 10 Transceivers
5.4.3.1. PRBS/PRP Generators (Arria 10 Transceivers)
5.4.3.2. FEC Blocks (Arria 10 Transceivers)
5.4.4. PCIe Gen3 PCS Blocks
5.4.4.1. PCIe Gen3 PCS Block Descriptions
5.4.4.2. PCIe Gen 3 PCS – Arria 10 Transceivers
5.4.5. Additional Transmitter PCS Blocks
5.5. Transmitter PMA Blocks
5.5.1. Serializer
5.5.2. Transmitter Buffer
5.5.3. Pre-emphasis Example
5.6. PMA-Only Transmit Channel
5.7. Transmitter Path Clocking
5.8. Transmitter Block Diagram Review
6. Receiver Path
6.1. Receiver Path Definition
6.2. Receiver Path Agenda
6.3. Receiver Block Diagram
6.4. Receiver PMA Blocks
6.4.1. Receiver Input Buffer
6.4.2. Receiver CDR
6.4.3. CDR Functional Block Diagram
6.4.4. Deserializer
6.5. RX PCS Configurations
6.5.1. Standard PCS Blocks
6.5.1.1. Word Aligner
6.5.1.2. Word Aligner Blocks
6.5.1.3. Synchronization State Machine
6.5.1.4. Deskew FIFO
6.5.1.5. Rate Matcher
6.5.1.6. Rate Matcher Deletion – Basic Single Width
6.5.1.7. 8B/10B Decoder
6.5.1.8. Byte Deserializer
6.5.1.9. Byte Ordering Block
6.5.1.10. Byte Ordering Example
6.5.1.11. RX Phase Compensation FIFO
6.5.1.12. Standard PCS Blocks – Arria 10 Transceivers
6.5.1.13. PRBS Verifier (Arria 10 Transceivers)
6.5.2. 10G PCS Blocks
6.5.2.1. Receiver Gearbox
6.5.2.2. Block Synchronizer
6.5.2.3. Disparity Checker
6.5.2.4. Descrambler
6.5.2.5. Frame Synchronizer
6.5.2.6. BER Monitor
6.5.2.7. 64B/66B Decoder
6.5.2.8. CRC-32 Checker
6.5.2.9. Receiver FIFO
6.5.3. Enhanced PCS Blocks – Arria 10 Transceivers
6.5.3.1. PRP Verifier (Arria 10 Transceivers)
6.5.3.2. RX KR FEC (Arria 10 Transceivers)
6.5.4. PCIe Gen3 PCS Blocks
6.5.4.1. PCIe Gen3 PCS Block Descriptions
6.5.4.2. PCIe Gen 3 PCS – Arria 10 Transceivers
6.6. Additional Receiver Blocks
6.7. PMA-Only Receive Channel
6.8. Receiver Path Clocking
6.9. Receiver Block Diagram
7. Summary
8. Summary
9. Give Us Your Feedback

Welcome to the Altera® online training presentation entitled Transceiver Basics. The purpose of this training is to provide you with an introduction to the high-speed serial transceivers used in Altera FPGAs today.

By the end of this course, you will be able to name the basic building blocks found in most high-speed serial transceivers. That’s not to say that you won’t come across other blocks that have some specialized purpose according to the protocols supported by your transceivers, but we will discuss the most common ones. You will also be able to describe why each of the blocks is needed in the transceiver. The transceivers implement two sub-layers: the physical medium attachment, or PMA, sub-layer and the physical coding sub-layer, or PCS. I will define these terms in more detail later. After this training, you will also be able to indicate which blocks make up these two sub-layers.

The agenda is as follows: we will start with an introduction in which we define a few terms and go further into the purpose of transceivers. Then we will describe the individual blocks found in the transmitter and receiver data paths.

So, let’s start with an introduction.

A transceiver is a combination transmitter and receiver used by applications to provide high-speed communication. By high-speed I mean data rates from hundreds of megabits per second up into the gigabits per second range. A transceiver allows this communication to use a variety of different physical media depending on the application. Some examples are shown on this slide. It could be simply across a board or a backplane, or across an optical fiber or cable of some kind. Thus transceivers are used in the PHY, or physical, layer of the Open Systems Interconnection (OSI) model, as they serve as the interface between the digital realm and the analog transmission domain. Depending upon your physical medium and the high-speed protocol that you want to support, you may require specific features from your transceiver. Some application specific standard products, or ASSPs, are designed to support specific applications or protocols, so they implement only certain features of the transceiver. Since FPGAs are designed to be generic, many of their transceiver blocks and features are configurable.

The transceiver is made up of two sub-layers of the PHY, namely the physical coding sub-layer (PCS) and the physical medium attachment (PMA).

Before discussing the transceiver any further, I want to define the roles of two of the PHY sub-layers, the PCS and PMA, as well as the media access controller, or MAC.

The purpose of the PMA is to convert digital data into an analog stream, or the reverse. The PMA can also connect directly to your physical communication medium. Examples of PMA blocks are the parallel-to-serial converter and the serial-to-parallel converter.

The PCS is the digital portion of the PHY layer. In the transmitter, its job is to prepare the parallel data for transmission across the physical medium. Examples of this would be encoding or scrambling the data in some way to guarantee a certain number of transitions in the serial stream. In the receiver, the job of the PCS is to return the transmitted data back to its original form. This would include decoding or descrambling the data stream as well as locating the byte or word boundaries.

The MAC is responsible for managing the transactions through the PHY. It receives data or information from the upper layers and assembles the packets to be given to the PCS for transmission across the link. It disassembles any packets received from over the link. Any errors or faults generated during transmission over the link are decoded and handled by the MAC in order to maintain communication between the endpoints. In a sense, the exact data content the MAC transmits or receives is not important, as long as the link continues to function correctly. The upper layers are more concerned with where the data is actually going and what data is being sent or received.

Here are some of the Altera device families offering embedded transceivers. For high-end, high-performance FPGAs, there’s the Stratix series of devices. These can support maximum data rates from 8.5 Gbps up to 28 Gbps.

Altera offers midrange FPGAs with embedded transceivers in the Arria series of FPGAs. The core architecture of these families borrows from the Stratix line but is optimized for lower-power, more cost-sensitive applications. The Arria families support maximum data rates from 6 Gbps up to 28.1 Gbps.

Altera also offers low-cost FPGA families with embedded transceivers: the Cyclone series of FPGAs. These devices are designed for high-volume, cost-sensitive applications with higher I/O bandwidth requirements. They support maximum data rates from 3 Gbps up to 5 Gbps.

Click on any device family on this page to be taken to a website to learn more about that device family and which high-speed protocols and specifications it supports.

The full-duplex transceivers in a transceiver-based device are embedded into the I/O on one or both sides of the chip. For example, the Cyclone V GX device shown here on the left has transceivers on one side of the device, while the Stratix V GX device shown on the right has transceivers on both sides. The transceivers are grouped into blocks of 3, 4 or 6 channels sharing certain resources. Most transceivers contain both a PCS and a PMA. In some devices, you will also find transceiver channels that contain only a PMA.

Please use the links from the previous slide for specifics on the transceiver architecture of your particular target device.

The figure shows an Arria 10 GX device with 96 transceiver channels and 4 Hard IP for PCI Express blocks.

One of the Hard IP for PCI Express blocks has support for Configuration via Protocol (CvP). CvP allows the FPGA fabric to be configured via a PCIe link without a host reboot or FPGA full chip re-initialization. Refer to the Arria 10 Transceiver PHY User Guide for specifics on transceiver architecture and packaging information.

A very useful feature also offered by Altera transceivers is dynamic reconfiguration.

With dynamic reconfiguration, you can reconfigure parts of the transceiver, or the entire transmitter or receiver path, on the fly without cycling power to the FPGA or reconfiguring the entire FPGA. For example, you can change analog buffer settings, transmit data rates or transmitter PLL settings, or switch from one protocol implementation to another.

This is useful for debugging and bringing up your system or to build adaptable or reconfigurable systems to support different environments.

This is a simplified block diagram of a full-duplex transceiver channel with the transmit path on the top and the receive path on the bottom. Looking at the diagram, the transmitter PCS conditions the data received from the FPGA logic for conversion and transmission by the transmitter PMA as serial data. The receiver PMA receives serial data and converts it so that it can be conditioned by the receiver PCS to be written into the FPGA logic.
We will now look at these paths in more detail in the next two sections.

In our next topic, we focus on the transmitter path.

The role of the transmitter path is to convert parallel data patterns into serial data that can be transmitted at high data rates. In order to do this, the transmitter must also embed the sampling clock into the serial stream, so that the clock can be extracted at the receiver and used to sample the incoming data. To embed the clock, the transmitter needs to ensure there is an adequate number of transitions in the serial data stream.

Since the transmitter path has really only two responsibilities, its construction will be simpler and use fewer blocks when compared to the receiver.

To discuss the transmitter path, we will first look at an overall block diagram. Then, we will break the transmitter path down into its two sub-layers, the PCS and the PMA. We will finish by discussing the clocking of the transmitter path.

As I showed before, the transceiver’s transmitter path contains the PCS and the PMA. Let's look at the individual blocks that make up these two sub-layers following the data from the FPGA core out to the serial link.

In Altera devices, there are 3 different PCS configurations available: the standard PCS, the 10G (or Enhanced) PCS, and the PCI Express Gen3 PCS. Some devices contain only 1 configuration, others have all 3. Having multiple configurations allows many more high-speed protocols and interfaces to be supported than a single-configuration transceiver could support, as more PCS/PMA functionality can be “hardened” in the FPGA.

Hardening means that a dedicated block is built in silicon to perform the function versus using general FPGA resources to perform the task. Dedicated blocks almost always have better performance when compared to using general FPGA resources to do the same thing.

Starting with the standard PCS, we have the phase compensation FIFO, the byte serializer and the 8B/10B encoder.

This configuration is found in all Altera transceiver devices.

The phase compensation FIFO serves as the interface between the transceiver and your FPGA design logic. It is a shallow FIFO that compensates for phase differences between the FPGA core clock and the transmitter clock. The two clocks must have a 0 PPM frequency difference; otherwise, the FIFO will eventually overflow or underflow. But why does the transmitter even need its own clock domain? Why can't it simply use the clock from the FPGA? The transmitter needs to run at very high speeds with specified jitter tolerances. These jitter tolerances are not directly achievable with the FPGA core clock, so the transmitter uses a local PLL that generates all of its necessary clocks.
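To see why even a small frequency offset is fatal, here is a minimal Python model of a FIFO written on one clock and read on another. It assumes a hypothetical 8-word FIFO that starts half full, with one word written per write-clock cycle and one word read per read-clock cycle; the depth, fill level and frequencies are illustrative values, not Altera specifications. With a 100 ppm offset, the 4 words of headroom are used up after roughly 4 / 100e-6 = 40,000 cycles.

```python
def simulate_fifo(depth, start_fill, write_hz, read_hz, max_cycles):
    """Count write-clock cycles until a FIFO overflows or underflows.

    One word is written on every write-clock cycle. Reads are tracked with
    integer arithmetic, so after N write cycles exactly
    floor(N * read_hz / write_hz) words have been read.
    Returns (cycle, "overflow" | "underflow"), or None if no fault occurs.
    """
    fill = start_fill
    read_credit = 0
    for cycle in range(1, max_cycles + 1):
        fill += 1                       # one write per write-clock cycle
        read_credit += read_hz          # accumulate elapsed read-clock time
        while read_credit >= write_hz:  # one read per full read-clock period
            fill -= 1
            read_credit -= write_hz
        if fill > depth:
            return cycle, "overflow"
        if fill < 0:
            return cycle, "underflow"
    return None


print(simulate_fifo(8, 4, 1_000_000, 1_000_000, 100_000))  # None (0 ppm)
print(simulate_fifo(8, 4, 1_000_100, 1_000_000, 100_000))  # (40005, 'overflow')
```

With matched clocks the fill level never drifts, which is why a shallow FIFO is enough as long as the two domains are frequency locked.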

Using the phase compensation FIFO, you make sure that your parallel data gets synchronized into the transmitter's own clock domain.
The FIFO accepts data in various widths from 8 up to 40 bits wide, depending on what is supported by the device family.

In some transceivers, this FIFO can operate in additional modes to accommodate additional protocols like IEEE 1588.

The next block in the transmitter path is the byte serializer. In each device, the interface between the FPGA logic and the transmitter has a maximum rate at which data can pass through it, which would limit the maximum data rate at which the transmitter could send data. The byte serializer allows you to reduce the clock rate of this interface while maintaining the same line rate by doubling the width of its input data path. Depending upon the device, transmitters can serialize data in one of two modes: single-width and double-width. Single-width mode uses an 8- or 10-bit wide data path and double-width mode uses a 16- or 20-bit wide data path. Not all transceiver devices support double-width mode or using the byte serializer in double-width mode. See your device handbook for more details.

So, if your parallel data needs to run faster than the maximum supported FPGA-transmitter interface rate in order to achieve your desired line rate, the byte serializer doubles the width of the parallel FPGA-transmitter interface so you can provide twice the data at one time. The byte serializer then converts the data back down to the single-width or double-width data used by the rest of the transmitter. So, if you use single-width mode, your FPGA interface will be 16 or 20 bits wide and the byte serializer converts it to 8 or 10 bits. If you use double-width mode, the FPGA interface will be 32 or 40 bits wide and the byte serializer converts it to 16 or 20 bits. On the output of the byte serializer, the least significant byte is transmitted first.
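The bit rearrangement described above can be sketched in a few lines of Python. This is a minimal model of the data ordering only, assuming each input word is exactly twice the output width; it does not model the clock-domain behavior of the hardware block, and the function name is hypothetical.

```python
def byte_serialize(words, half_width):
    """Split each double-width input word into two half-width words.

    `words` are integers of 2 * half_width bits (16 or 20 bits when
    half_width is 8 or 10). The least significant half is emitted first.
    """
    mask = (1 << half_width) - 1
    out = []
    for w in words:
        out.append(w & mask)                  # least significant half first
        out.append((w >> half_width) & mask)  # then the most significant half
    return out


print([hex(x) for x in byte_serialize([0xBBAA, 0xDDCC], 8)])
# ['0xaa', '0xbb', '0xcc', '0xdd']
```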

Here is an example to illustrate why the byte serializer would be used. In the top example, with the byte serializer bypassed, achieving a 3.125 Gbps line rate requires the FPGA-transmitter interface to run at 312.5 MHz, which exceeds the maximum interface frequency allowed in this example. If we double the width of the FPGA-transmitter interface and enable the byte serializer, the interface only needs to run at half the frequency, or 156.25 MHz.
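The arithmetic in this example is simply the line rate divided by the width of the FPGA-side interface. A small Python helper makes that explicit; the function name is hypothetical, and the maximum allowed interface frequency is device specific, so it is not modeled here.

```python
def interface_clock_hz(line_rate_bps, pcs_width_bits, byte_serializer=False):
    """Parallel FPGA-to-transmitter clock needed for a given line rate.

    pcs_width_bits is the width presented to the serializer (for example
    10 bits in single-width 8B/10B mode). Enabling the byte serializer
    doubles the FPGA-side width, which halves the required clock.
    """
    fpga_width = pcs_width_bits * (2 if byte_serializer else 1)
    return line_rate_bps / fpga_width


print(interface_clock_hz(3_125_000_000, 10) / 1e6)        # 312.5
print(interface_clock_hz(3_125_000_000, 10, True) / 1e6)  # 156.25
```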

The next block is the 8B/10B encoder. It is enabled for protocols like PCI Express and Ethernet 10GBASE-X (XAUI). The encoder takes an 8-bit data byte plus a 1-bit control identifier and converts them into a 10-bit code group. The code groups are chosen specifically to ensure there are enough transitions in the serial stream to maintain synchronization with the receiver, allowing the receiver to extract a clock from the transmitted data. The encoder works in both single-width and double-width modes and can be bypassed if you are using a protocol that does not require it and uses another type of encoding or scrambling. So why use the encoder? You may wonder, since 8B/10B encoding adds 2 extra bits to every 8-bit character, an overhead of 25%. To put it another way, 20% of the line bandwidth is taken up by the encoding method alone. The benefit is that, because of the way the codes have been chosen for each data byte, 8B/10B can also maintain a neutral running disparity. If you are not familiar with that term, running disparity refers to the running balance between the number of 1s and 0s transmitted. With 8B/10B encoding, the code groups are chosen so that this balance stays neutral on the line. This encoding scheme also contains special encoded characters, or control codes, that can be used for things like line synchronization, for example indicating the start and end of packets as well as idle states between packets.
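To illustrate running disparity, here is a minimal Python sketch using the two encodings of the K28.5 comma character (001111 1010 for RD- and 110000 0101 for RD+). A real 8B/10B encoder tracks disparity as a two-state flag and picks the matching code group from lookup tables; this sketch skips the tables and just counts ones minus zeros to show why alternating between the two forms keeps the line balanced.

```python
# The two 10-bit encodings of the K28.5 control character (abcdei fghj order).
K28_5 = {"RD-": "0011111010", "RD+": "1100000101"}


def code_group_disparity(bits: str) -> int:
    """Ones minus zeros in one 10-bit code group (0 or +/-2 for valid codes)."""
    ones = bits.count("1")
    return ones - (len(bits) - ones)


def running_disparity(code_groups):
    """Cumulative ones-minus-zeros after each transmitted code group."""
    total = 0
    history = []
    for cg in code_groups:
        total += code_group_disparity(cg)
        history.append(total)
    return history


# Alternating RD- and RD+ forms, as an 8B/10B encoder does, stays balanced.
print(running_disparity([K28_5["RD-"], K28_5["RD+"]] * 3))  # [2, 0, 2, 0, 2, 0]
# Always sending the same form would let the DC level drift.
print(running_disparity([K28_5["RD-"]] * 3))                # [2, 4, 6]
```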

The standard transmitter PCS in Arria 10 transceivers contains an additional block, the PRBS generator block.

The PRBS generator, found in the standard PCS of Arria 10 transceivers, is shared with the Enhanced PCS. The PRBS generator creates a pseudo-random bit sequence that can then be used to test the functionality of the transmitter path. There are 5 patterns that can be generated: PRBS 9, 15, 23 and 31, along with square wave patterns.

PRBS 9 is used to test channels with the 8B/10B encoding/decoding scheme. PRBS 15 is mostly used for jitter measurements. PRBS 23 is used for channels that don’t use the 8B/10B encoding scheme. PRBS 31 is the recommended pattern for 10GBASE-R, 10GBASE-KR and other applications that use forward error correction (FEC).
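PRBS patterns are typically produced by a linear feedback shift register (LFSR). The Python sketch below is a minimal Fibonacci LFSR, assuming the generator polynomials commonly used for these patterns (for example x^9 + x^5 + 1 for PRBS 9, as in ITU-T O.150); the exact polynomials, bit order and seeding of the Arria 10 generator are not given here, so treat them as assumptions. A PRBS of order n repeats only after 2^n - 1 bits, which is what makes it useful as a long, deterministic test pattern.

```python
# Feedback taps for generator polynomials x^n + x^k + 1 commonly used for PRBS.
PRBS_TAPS = {9: 5, 15: 14, 23: 18, 31: 28}


def lfsr_step(state, order, tap):
    """One shift: XOR stages `order` and `tap`, feed the result into stage 1."""
    bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
    return ((state << 1) | bit) & ((1 << order) - 1), bit


def prbs_bits(order, count, seed=1):
    """Return `count` bits of the PRBS of the given order (seed must be non-zero)."""
    state, tap, out = seed, PRBS_TAPS[order], []
    for _ in range(count):
        state, bit = lfsr_step(state, order, tap)
        out.append(bit)
    return out


def prbs_period(order, seed=1):
    """Number of shifts until the register state repeats (2**order - 1 if maximal)."""
    state, tap, steps = seed, PRBS_TAPS[order], 0
    while True:
        state, _ = lfsr_step(state, order, tap)
        steps += 1
        if state == seed:
            return steps


print(prbs_period(9))          # 511
print(sum(prbs_bits(9, 511)))  # 256 ones in one full period
```

A matching checker at the receiver can run the same LFSR and compare bit by bit, which is the basis for the PRBS verifier on the receive side.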

In order to support different protocols and higher data rates, the 10G PCS configuration is available in Arria V GZ and Stratix V devices. Each transceiver in these devices has the standard PCS with the 10G PCS in parallel to it. So, when configuring a transceiver channel, you can select whether to enable the 10G PCS, the standard PCS or even both.

The blocks in the 10G PCS are the transmit FIFO, the frame generator, the CRC-32 generator, the 64B/66B encoder, the scrambler, the disparity generator and the transmitter gearbox. Not all blocks are required for all 10G protocols, so they are enabled and disabled as needed for the particular target protocol implementation.

The first block is the transmitter FIFO. Similar to the phase compensation FIFO, the transmitter FIFO retimes data and the corresponding control bits into the transmitter clock domain to remove any possible phase offsets. The width of this interface depends on the target protocol.

In some transceivers, the TX FIFO can also operate in register mode. This mode is for CPRI and IEEE 1588 applications that require deterministic latency. In register mode, the TX FIFO incurs one cycle of latency of the parallel low speed PCS clock.

The frame generator is enabled when the transceiver is configured to support the Interlaken protocol. It takes the 64-bit data and control words from the transmitter FIFO, along with a 1-bit control signal, and encapsulates them to form the Interlaken Meta Frame by attaching the synchronization, scrambler state and skip control words to the front of the payload and the diagnostic control word to the end of the payload.

Performing this operation in hardware saves FPGA resources and allows for faster transmission.

The next block is the CRC-32 generator. This block calculates a CRC checksum on the input and sends the value with the transmitted words which allows transmission errors to be detected at the receiver. Like the Frame Generator, this block is enabled to support the Interlaken protocol. It calculates the CRC across the entire Meta Frame received from the Frame Generator (excluding a few bits like the framing bits) and then embeds the checksum into the appropriate field in the diagnostic word at the end of the Meta Frame.
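The generate-then-check idea behind the CRC block can be sketched in Python with a bitwise CRC-32. This sketch assumes the familiar IEEE 802.3 CRC-32 (the same one computed by Python's `zlib.crc32`) purely for illustration; the polynomial and exact bit coverage used for the Interlaken Meta Frame are defined by the Interlaken specification and may differ.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Reflected CRC-32 with the IEEE 802.3 polynomial (reversed form 0xEDB88320)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


payload = b"example meta frame payload"
sent_crc = crc32_bitwise(payload)        # transmitter computes and appends this
assert sent_crc == zlib.crc32(payload)   # cross-check against the standard library

corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]  # one bit flipped in transit
print(crc32_bitwise(corrupted) == sent_crc)           # False: receiver detects it
```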

The next block is the 64B/66B encoder. It is used in the 10GBASE-R configuration. As I mentioned before, 8B/10B encoding is very popular, but because every byte is converted into 10 bits, your link has a built-in 20% overhead. That steals from your total bandwidth.

For higher data rates, above about 5 Gbps, other encoding methods are used, such as the 64B/66B encoding scheme. Unlike 8B/10B, which works on individual bytes, this encoding method transforms every 64 bits (8 bytes) of data into 66 bits for transmission across the link, so much less overhead is incurred. The 2-bit sync header at the start of each 66-bit block is always either 01 or 10, which guarantees at least one transition per block. The encoder output is typically sent to a scrambler, which provides DC balance and further ensures that the data being sent has enough transitions for clock recovery.
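The overhead comparison can be made exact with a few lines of Python using `fractions.Fraction`. The helper names are hypothetical; the figures follow directly from the block sizes, and the last line shows why a 10 Gbps 10GBASE-R payload runs at a 10.3125 Gbps line rate. The 128b/130b scheme used by PCIe Gen3 is included for comparison.

```python
from fractions import Fraction


def coding_efficiency(payload_bits: int, coded_bits: int) -> Fraction:
    """Fraction of the line rate that carries payload."""
    return Fraction(payload_bits, coded_bits)


def coding_overhead(payload_bits: int, coded_bits: int) -> Fraction:
    """Extra bits sent per payload bit."""
    return Fraction(coded_bits - payload_bits, payload_bits)


def line_rate(payload_rate_bps: int, payload_bits: int, coded_bits: int) -> Fraction:
    """Serial line rate needed to carry a given payload rate."""
    return Fraction(payload_rate_bps) * Fraction(coded_bits, payload_bits)


for name, (p, c) in {"8B/10B": (8, 10), "64B/66B": (64, 66), "128b/130b": (128, 130)}.items():
    print(name, float(coding_efficiency(p, c)), float(coding_overhead(p, c)))
print(line_rate(10_000_000_000, 64, 66))  # 10312500000
```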

The role of the scrambler is to reduce the effects of electromagnetic interference between channels by breaking up long sequences of 0s and 1s as well as repetitive patterns. Unlike 8B/10B encoding, the encoding methods used for the 10G protocols do not by themselves prevent a string of up to 64 identical bits from being transmitted in a row. So, the scrambler works by applying a polynomial to the data words to scramble the bit patterns. The method of scrambling and the polynomial used are determined by the protocol. The scrambler in the 10G configuration supports both Interlaken and 10GBASE-R.

For Interlaken, the scrambler operates in frame-synchronous mode, where a synchronizing word pattern is used in each frame to maintain synchronization between the transmitter and receiver. This unique pattern is always sent as the first word of the Meta Frame and is sent unscrambled. The second word of the Meta Frame sent by the transmitter is the current state of the scrambler polynomial, which the receiver uses to descramble the rest of the Meta Frame. The scrambler state word is also sent unscrambled. Using this method, the receiver always knows where the start of the Meta Frame is and where to obtain the scrambler state.

For 10GBASE-R, the scrambler operates in self-synchronous mode. In this mode, unlike frame-synchronous mode, once the scrambler is synchronized, or seeded, it uses its polynomial to continuously scramble all words after that point (excluding the sync header bits) and is not re-synchronized at regular intervals.
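Here is a minimal bit-serial Python sketch of a self-synchronous scrambler and descrambler using the 10GBASE-R polynomial G(x) = 1 + x^39 + x^58 from IEEE 802.3 Clause 49. In 10GBASE-R only the 64 payload bits of each block are scrambled, not the sync header; this sketch simply scrambles a flat list of bits. It shows the key property of self-synchronous mode: the descrambler builds its state from the received bits, so even if it starts with the wrong state, it produces correct output after 58 bits.

```python
import random

TAP_A, TAP_B = 38, 57       # state bits holding the bits from 39 and 58 steps ago
MASK58 = (1 << 58) - 1


def scramble(bits, state=0):
    """Self-synchronous scrambler, G(x) = 1 + x^39 + x^58."""
    out = []
    for b in bits:
        s = b ^ ((state >> TAP_A) & 1) ^ ((state >> TAP_B) & 1)
        state = ((state << 1) | s) & MASK58  # feed back the scrambled bit
        out.append(s)
    return out, state


def descramble(bits, state=0):
    """Inverse: the state is built from the received (scrambled) bits."""
    out = []
    for s in bits:
        b = s ^ ((state >> TAP_A) & 1) ^ ((state >> TAP_B) & 1)
        state = ((state << 1) | s) & MASK58
        out.append(b)
    return out, state


rng = random.Random(1)
data = [rng.randint(0, 1) for _ in range(256)]
tx, _ = scramble(data, state=0x123456789ABCDEF & MASK58)
rx, _ = descramble(tx, state=0)  # receiver starts with the wrong state
print(rx[58:] == data[58:])      # True: it self-synchronizes after 58 bits
```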

Even with frame-synchronous scrambling, it is still possible for the “right” bit pattern to occur in a word that would cause the output of the scrambler to be all 0s or all 1s. This again would affect the DC balance of the transmission. In compliance with the Interlaken protocol, the Disparity Generator monitors the output of the scrambler. If the disparity of the current word has the same polarity as the current running disparity, the Disparity Generator inverts the transmitted word and sets bit 66 to indicate that it has done so.
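The inversion rule can be sketched in Python as follows. This is a simplified model, assuming that only the 64 payload bits are counted and that Interlaken's bit 66 is represented by a separate 0/1 flag; the full Interlaken rule defines exactly which bits are included in the disparity calculation. The function names are hypothetical.

```python
def word_disparity(word: int, width: int = 64) -> int:
    """Ones minus zeros in one payload word."""
    ones = bin(word & ((1 << width) - 1)).count("1")
    return ones - (width - ones)


def disparity_generate(words, width=64):
    """Invert a word when its disparity has the same sign as the running disparity.

    Returns a list of (transmitted_word, inversion_flag) pairs and the final
    running disparity. The flag stands in for Interlaken's bit 66.
    """
    mask = (1 << width) - 1
    running, out = 0, []
    for w in words:
        d = word_disparity(w, width)
        invert = d * running > 0        # same polarity as the running total
        if invert:
            w, d = w ^ mask, -d         # complement the payload bits
        running += d
        out.append((w, int(invert)))
    return out, running


all_ones = (1 << 64) - 1
frames, rd = disparity_generate([all_ones] * 4)
print([flag for _, flag in frames], rd)  # [0, 1, 0, 1] 0
```

At the receiver, the disparity checker simply complements any word whose flag is set to restore the original data.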

The PMA in a 10G configuration has a parallel input that can be 40 or 64 bits wide. Since the PCS produces 66- or 67-bit words, each word must be adapted, or re-formatted, to the width of the PMA. This is done by the Transmitter Gearbox.

Note that the clocking of the gearbox and PCS has to be matched to support the target line rate. In other words, if the data is being removed from the gearbox in 40-bit parallel words and is being placed into the gearbox in 66 or 67-bit words, the clocking of the two sides of the gearbox must be matched so as to:

Never overflow or starve the gearbox over time
Maintain the perceived target line rate to the link

These clock rates will be determined and provided by the transceiver configuration tools when building the system.
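The repacking done by a gearbox can be sketched as a bit buffer in Python. This minimal model assumes LSB-first ordering and ignores the clocking itself; the function name is hypothetical. Because lcm(66, 40) = 1320 bits, every 20 input words of 66 bits produce exactly 33 output words of 40 bits, so the PMA side must run 33/20 times faster than the PCS side. At a 10.3125 Gbps line rate, for example, that works out to 156.25 MHz for 66-bit words and 257.8125 MHz for 40-bit words.

```python
from math import lcm


def gearbox(words, in_width, out_width):
    """Repack in_width-bit words into out_width-bit words, LSB first.

    Returns the output words and the number of bits still waiting in the buffer.
    """
    buf, nbits, out = 0, 0, []
    in_mask, out_mask = (1 << in_width) - 1, (1 << out_width) - 1
    for w in words:
        buf |= (w & in_mask) << nbits  # queue the new bits above the old ones
        nbits += in_width
        while nbits >= out_width:
            out.append(buf & out_mask)  # emit the oldest out_width bits
            buf >>= out_width
            nbits -= out_width
    return out, nbits


print(lcm(66, 40) // 66, lcm(66, 40) // 40)  # 20 33
```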

The transmitter gearbox can also reverse the parallel word so that the default behavior of sending the LSB first changes to send the MSB first, for protocols that require this.

This variation of the 10G PCS, found in Arria 10 transceivers, is called the Enhanced PCS. To the 10G PCS, the Enhanced PCS adds the PRBS and PRP generators and the forward error correction (FEC) block.

The data generators available in Arria 10 transceivers are the PRBS and PRP generators. Both are used to generate data patterns for testing the functionality of the transmitter and its paired receiver.

Since the Enhanced PCS shares the PRBS generator with the standard PCS, its behavior was discussed earlier in the standard PCS section. You may use the link on this page if you wish to go back and review that information.

The PRP generator generates various pseudo-random test patterns depending on the initial value loaded into the scrambler, or seed, and the data pattern selected. It is designed specifically for use in 10GBASE-R and IEEE 1588 applications.

You cannot enable both generators at once.

The Arria 10 Enhanced PCS includes a hardened KR-FEC (forward error correction) block. The KR FEC block is made up of 4 sub-blocks: the transcode encoder, the encoder, the scrambler, and the gearbox.

The KR FEC block is part of the Enhanced PCS and is designed in accordance with the 10G-KRFEC and 40G-KRFEC sections of the IEEE 802.3 specification.

Most data transmission systems, such as Ethernet, have strict requirements on the bit error rate (BER). However, due to distortion or noise in the channel, the required BER may not be achievable. In these cases, adding forward error correction can improve the BER performance of the system by adding redundancy to the transmitted data. This redundancy allows errors to be detected at the receiver and possibly corrected without having to retransmit the data.
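To make the idea of correcting errors through redundancy concrete, here is the textbook Hamming(7,4) code in Python. This is a deliberately simple stand-in, not the KR-FEC code, which uses a much longer code defined in IEEE 802.3. Three parity bits are added to every 4 data bits, and the receiver uses them to locate and flip any single corrupted bit without a retransmission.

```python
def hamming74_encode(d):
    """4 data bits -> 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[5] ^= 1                  # one bit flipped by the channel
print(hamming74_decode(received))  # [1, 0, 1, 1]
```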

The FEC block is optional and can be bypassed.

For more details on KR FEC block for Arria 10 devices, refer to Arria 10 Transceiver PHY User Guide.

The last PCS configuration is the PCIe Gen3 configuration. It is also found in Arria V GZ devices and all Stratix V devices. It sits in parallel with the standard and 10G PCSs, so a designer can choose, for each transceiver, which PCS to employ. The blocks in the PCIe Gen3 PCS are the phase compensation FIFO, the encoder, the scrambler and the gearbox.

This table describes the blocks found in the PCIe Gen3 PCS configuration.

The first block is the phase compensation FIFO. Like the FIFOs in the other 2 PCS configurations, this one compensates for phase offsets between the FPGA clock domain and the transmitter path clock domain, ensuring reliable transfers between them.

The encoder block encodes a 128-bit data word according to the PCIe Gen3 specification, converting it into a 130-bit word by appending a 2-bit sync header to the MSBs. It also tells the scrambler which packets should be scrambled and which should not. All data packets are scrambled, while ordered sets, which are packets used for synchronization or link management, are not.

The scrambler block, like the scrambler in the 10G PCS, ensures there are enough transitions in the outgoing data for the RX PLL to remain locked, by scrambling data words according to the PCIe Gen3 specification using a linear feedback shift register.

Lastly, the gearbox adapts the 130-bit data width of the PCS output to the 32-bit width of the PMA by converting the data words, along with skip characters, into 32-bit segments.

The PCIe Gen 3 PCS for an Arria 10 transceiver only has the phase compensation FIFO and the gearbox on the transmitter side. The encoding functionality is implemented by the gearbox. This means a user has to implement the scrambler functionality in the FPGA fabric. Please refer to the Arria 10 Transceiver PHY User Guide for details.

That concludes the blocks found in the transmitter path through the PCS. It is worth mentioning that not all protocols use the hardware blocks as you find them in the Altera transceiver PCS configurations. In fact, you may choose to implement your own custom protocol that is tailored to your exact system and transmission needs.

For example, you may require a different CRC polynomial or generation scheme. Or you may want to use a different method of encoding your transmitted data, whether simpler or more complicated. You may want to use a different method or polynomial for scrambling. Or you may need to frame your packets differently than what is done natively in the PCS.

To implement these alternatives, you would need to bypass the dedicated PCS block and build your block using general FPGA resources. Of course, your alternative processing or data conversion would have to be done before entering the PCS. Building this functionality in general FPGA resources may also impact the performance of your system or link.

And that ends the PCS. So at this point, we have data that is ready to be serialized and presented to the physical medium.
So we enter the transmitter PMA. The transmitter PMA consists of the serializer and the transmitter buffer.

The first block is the serializer. The serializer converts the parallel data, whether scrambled or encoded, to serial data. To do this, the serializer requires two clocks: a low-speed clock for the parallel side and a high-speed clock for the serial side. These clocks are generated by dedicated clock resources located in or near the transceivers. The serializer transmits the least significant bit first.
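Functionally, the serializer and its receive-side counterpart, the deserializer, can be modeled in Python as shown below. This minimal sketch captures only the LSB-first bit ordering; in hardware the serial bit rate equals the parallel word width times the parallel clock, for example 10 bits at 312.5 MHz for 3.125 Gbps. The function names are hypothetical.

```python
def serialize(words, width):
    """Parallel-to-serial conversion: emit each word's bits LSB first."""
    return [(w >> i) & 1 for w in words for i in range(width)]


def deserialize(bits, width):
    """Serial-to-parallel: regroup bits into width-bit words (first bit = LSB)."""
    words = []
    for start in range(0, len(bits) - width + 1, width):
        chunk = bits[start:start + width]
        words.append(sum(b << i for i, b in enumerate(chunk)))
    return words


print(serialize([0b0000000101], 10))  # [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```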

The transmitter buffer is the last block before the serial data exits the device for the link. The transmitter buffer converts the data to a differential I/O standard as supported by the FPGA. It also sets a common-mode voltage and uses features such as programmable pre-emphasis, programmable differential output voltage and programmable internal termination resistance to improve signal integrity. The supported features and their available values depend on the target device family. See the device handbook for your target device for more detail on these buffer features and their available settings. For Arria 10 devices, refer to the Arria 10 Transceiver PHY User Guide for more details.

The buffer may also contain additional dedicated blocks or features like the receiver detect block or the transmitter output tri-state feature. These blocks and features are for specific applications or protocols, such as PCI Express protocol. Again see your device handbook to see if any of these are available in your target device.

From the transmitter buffer, the data could be driven directly across board traces, a backplane or even a cable, or it could go through the 3rd PHY sub-layer, the physical medium dependent layer, or PMD, which converts the signal to a form suited to the transmission medium, such as optical fiber.

This slide shows an example of transmitter buffer pre-emphasis. Pre-emphasis combats the low-pass filter characteristics of transmission lines by boosting the high-frequency components of the data before transmission. Thus, as you can see from the images here, increasing pre-emphasis serves to clean up the signal at the receiver buffer and opens the eye.
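One common way to model transmit pre-emphasis is a two-tap FIR filter, y[n] = (x[n] - alpha * x[n-1]) / (1 + alpha), applied to NRZ levels of +1 and -1. The Python sketch below uses that model as an illustration; it is an assumption, not the exact tap structure of any Altera transmitter buffer, and `alpha` is a hypothetical emphasis setting. Bits that follow a transition are driven at full swing, while repeated bits are attenuated, so the high-frequency content is boosted relative to the low-frequency content that the channel passes more easily.

```python
def two_tap_emphasis(levels, alpha):
    """Apply y[n] = (x[n] - alpha * x[n-1]) / (1 + alpha) to NRZ levels (+1/-1).

    Assumes the line was idle at levels[0] before the first bit.
    """
    prev = levels[0]
    out = []
    for x in levels:
        out.append((x - alpha * prev) / (1 + alpha))
        prev = x
    return out


print([round(v, 2) for v in two_tap_emphasis([1, 1, 1, -1, -1, 1], 0.25)])
# [0.6, 0.6, 0.6, -1.0, -0.6, 1.0]
```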

As the transceiver PCS is a Hard IP block in the FPGA, it has a maximum rate at which it can process data. The PMA layer, being simpler in structure, can typically process data at higher rates. So, to support protocols that require higher data rates than the PCS supports, some transceiver FPGAs support a PMA-Only mode. In this mode, the PCS is disabled and bypassed so that FPGA logic can drive data directly to the PMA. All PCS functionality must then be constructed in the FPGA, so additional FPGA resources are required.

This feature is supported in Arria V GT/GZ, Stratix V GX/GS/GT, and Stratix IV GX/GT.

Similar functionality is supported by Arria 10 transceivers in the PCS-Direct mode.

In order to drive the logic in the transmitter data path, multiple clocks are needed. For example, a low-speed parallel clock is needed to drive the blocks in the PCS and the parallel input side of the serializer block, while a high-speed serial clock is needed to drive the rest of the PMA. These clocks must be fully synchronous so that no data is lost during transmission. To generate these synchronous clock signals, PLLs inside the FPGA are used. Depending on the device family, you have several PLL options for the transmitter. Some of the PLLs are dedicated or reserved for transceiver use while others are not.

For example, in Stratix IV and Arria II devices, every transceiver block contains a central control block that houses two CMU, or clock management unit, PLLs. These PLLs generate clocks for the transceiver block to which they are associated and for other transceiver blocks. In Cyclone V, Arria V and Stratix V devices, every channel has a PLL that can be configured as a CMU PLL to provide transmitter clocks for the channels around it. Arria 10, Stratix V and Stratix IV devices also have what are called ATX PLLs. These PLLs sit near the transceiver blocks and generate clocks with lower-jitter than the CMU PLLs. The Cyclone IV GX devices have what are called MPLLs which can be used as general PLLs, but can also generate clocks for the transmitter. In some FPGA families, there are general device PLLs and fractional PLLs that can also be used as clock sources for the transmitter channel.

More detail on the clocking behavior can be found in the handbook for your particular target device. For Arria 10 devices, refer to the Arria 10 Transceiver PHY User Guide.

So that concludes our look at the transmitter data path.

We will now take a look at the receiver path.

The receiver path has more to do than the transmitter path, so it contains more blocks, and more complex blocks, than the transmitter path. The receiver must first extract the clock from the incoming data. It then uses that clock to sample the incoming data stream. Next, it converts the serial data into parallel data and locates the incoming byte boundaries, to which it aligns itself. Finally, it must undo any encoding or scrambling to present the data to the FPGA logic in its original form, and it must account for any phase differences between the transmitter and receiver clock domains.

After looking at a block diagram of the receiver, we will then see what blocks make up the receiver PMA and then the PCS. We will conclude this section with a quick look at receiver data path clocking.

Here is a simplified block diagram of the receiver. Like the transmitter, the receiver is made up of two regions, the PMA and the PCS. Let's look at these blocks, following data from the serial link into the FPGA core.

The receiver PMA consists of the receiver buffer, the clock and data recovery unit, or CDR, and the deserializer.

The receiver buffer converts the differential input signal into a CMOS value for use by the rest of the receiver. To aid in design flexibility, the receiver supports both AC and DC coupling, common mode regeneration and programmable on-chip termination. The receiver buffers also support equalization to aid in signal integrity. With equalization, the receiver compensates for signal degradation due to transmission line losses, for example losses due to the traces on your PC board. Some Altera devices go a step further with an additional feature called adaptive equalization. With this enabled, the receivers dynamically choose the best equalization settings based on the input signal. This feature provides true Plug & Play signal integrity in your design. And like the transmitter, you may also find protocol-specific capability in the buffer, such as the signal detect used for PCI Express.

In some transceivers like those in Arria 10 devices, the transceiver receiver buffer also supports advanced features like continuous time linear equalization (CTLE) and decision feedback equalization (DFE).

The clock and data recovery unit, or CDR, extracts the clock from the serial input data. These clocks can then be used by other blocks in the receiver to sample data. The CDR must first be trained to the correct frequency by an input clock source. This input clock source can be an input I/O pin or the output of PLLs, like the ones used for the transmit path. Once trained, the CDR then tracks the incoming data stream using the transitions in the data signal.

The CDR uses the extracted clock to generate two output clocks, a high-speed serial clock and a low-speed parallel clock. The high-speed clock drives the serial side of the deserializer, and the low-speed clock drives the parallel side of the deserializer and the receiver PCS.

The last block in the PMA is the deserializer. It converts the serial data stream, which is either encoded or scrambled, into parallel data. The LSB is received first. Remember that all transceiver devices support single-width mode, but only select devices support double-width mode.

At this point in the receiver path, we have parallel data coming from the PMA and ready to enter the PCS. Like in the transmit channel, the receive channel may also contain up to 3 PCS configurations: the standard, the 10G or Enhanced and the PCIe Gen3.

There are up to 7 functional blocks in the receiver standard PCS. These are the word aligner, the deskew FIFO, the rate matcher, the 8B/10B decoder, the byte deserializer, the byte re-ordering block and the phase compensation FIFO. This configuration is found in all transceiver devices.

The word aligner uses an alignment pattern to locate byte or word boundaries in the incoming data. Once the alignment pattern has been found, the word aligner shifts the data so that the pattern and all subsequent data are aligned to that boundary, as shown in the diagram. The word aligner also has an available synchronization state machine, which I will discuss in a few slides.
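As a rough illustration of what the aligner does, the following Python sketch searches a received bit stream for the 10-bit K28.5 comma character (in either running-disparity form) and then slices the stream into words starting at that boundary. The bit patterns are the standard 8B/10B K28.5 code groups written in transmission order; the function names and the list-of-bits representation are hypothetical and do not reflect the transceiver's actual interface.

```python
# K28.5 code groups in transmission order (abcdei fghj), RD- and RD+ forms
K28_5 = ([0, 0, 1, 1, 1, 1, 1, 0, 1, 0],
         [1, 1, 0, 0, 0, 0, 0, 1, 0, 1])

def find_word_boundary(bits, patterns=K28_5):
    """Pattern detection: return the bit offset of the first alignment pattern, or None."""
    width = len(patterns[0])
    for offset in range(len(bits) - width + 1):
        if bits[offset:offset + width] in patterns:
            return offset
    return None

def realign(bits, offset, width=10):
    """Realignment: slice the stream into whole words starting at the detected boundary."""
    return [bits[i:i + width] for i in range(offset, len(bits) - width + 1, width)]

# Three stray bits, then a K28.5 comma, then one data word
stream = [1, 0, 1] + K28_5[0] + [1, 0] * 5
offset = find_word_boundary(stream)
words = realign(stream, offset)
```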

The word aligner consists of four blocks. The first is the pattern detector. It checks the incoming data for a programmable alignment pattern within the current word boundary and signals a flag when the pattern is found; it does not perform any realignment. The aligner block locates the alignment pattern, realigns the word boundary and signals a flag when it has performed realignment. The manual bit slip block allows you to manually shift the word boundary one bit at a time to achieve alignment. And the run length checker looks for a user- or protocol-defined number of consecutive 1's or 0's in the incoming data and flags when this occurs. Because all protocols implement some algorithm to prevent long runs, whether it's encoding or scrambling, such a run indicates an error. Which of these controls, if any, are available is determined by your target device.
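The run length check is simple to model. This Python sketch, with a hypothetical function name, reports the start of every run of identical bits longer than a chosen limit. For 8B/10B-encoded data a limit of 5 is a natural choice, because valid 8B/10B streams never contain more than five identical consecutive bits; the limit for other protocols is an assumption you would take from the relevant specification.

```python
def run_length_violations(bits, max_run):
    """Return the start index of every run of identical bits longer than max_run."""
    violations = []
    run_start = 0
    for i in range(1, len(bits) + 1):
        # A run ends at the end of the stream or when the bit value changes
        if i == len(bits) or bits[i] != bits[run_start]:
            if i - run_start > max_run:
                violations.append(run_start)
            run_start = i
    return violations

# Six 1's in a row violate an 8B/10B-style limit of five
flags = run_length_violations([0, 1, 1, 1, 1, 1, 1, 0], max_run=5)
```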

All transceiver devices have synchronization state machines. The synchronization state machine lets you control system synchronization, that is, the synchronization between the transmitter and the receiver at the two ends of the link. The state machine uses code groups to determine synchronization. When a predefined number of good code groups has been received, the state machine indicates that synchronization has been established. When a predefined number of bad code groups has been received, synchronization has been lost. These numbers are programmable, so you can decide how tolerant to make your system. For defined protocols like PCI Express and Ethernet, the number of good and bad code groups that control synchronization is defined in the specification.
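The lock and unlock behavior can be sketched as a small state machine with two programmable thresholds. This Python example is a simplification under a stated assumption: it counts *consecutive* good or bad code groups, whereas real protocol state machines (Gigabit Ethernet's, for example) use more elaborate rules. The class and parameter names are hypothetical.

```python
class SyncStateMachine:
    """Simplified lock/unlock hysteresis driven by per-code-group validity.

    Assumption: thresholds count consecutive good or bad code groups.
    """

    def __init__(self, good_to_lock, bad_to_unlock):
        self.good_to_lock = good_to_lock
        self.bad_to_unlock = bad_to_unlock
        self.synced = False
        self.good = 0
        self.bad = 0

    def update(self, code_group_ok):
        """Feed one code group's validity; return the current sync status."""
        if code_group_ok:
            self.good += 1
            self.bad = 0
            if not self.synced and self.good >= self.good_to_lock:
                self.synced = True
        else:
            self.bad += 1
            self.good = 0
            if self.synced and self.bad >= self.bad_to_unlock:
                self.synced = False
        return self.synced

sm = SyncStateMachine(good_to_lock=3, bad_to_unlock=2)
status = [sm.update(ok) for ok in [True, True, True, False, True, False, False]]
```

A single bad code group after lock does not drop synchronization; only two in a row do, which is the tolerance the programmable counts provide.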

Some transceivers have an extra block in the receiver path called the deskew FIFO, or channel aligner. The deskew FIFO is used in bonded, or multi-lane, configurations, for example in the x4 mode used by the XAUI protocol. It ensures that all receiver channels are aligned to each other. To do this, each of the four channels simultaneously transmits special /A/ characters. The deskew FIFO ensures that the /A/ characters appear in the same column across all four channels; if they do not, it adjusts to align all four channels. Once aligned, if misaligned /A/ characters are received, the deskew FIFO considers the channels to have fallen out of alignment.
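The column alignment performed by the deskew FIFO can be sketched in Python as follows. The lanes are modeled as lists of symbols with `"A"` standing for the /A/ character; the function name is hypothetical. Real hardware buffers the early lanes in the FIFO; trimming the symbols ahead of each lane's /A/ has the same effect on column alignment, so the sketch does that for simplicity.

```python
def deskew(lanes, align_char="A"):
    """Line up the first align_char in every lane (simplified channel alignment)."""
    # Column in which each lane received its alignment character
    positions = [lane.index(align_char) for lane in lanes]
    # Drop the symbols ahead of /A/ so /A/ lands in column 0 of every lane
    aligned = [lane[pos:] for lane, pos in zip(lanes, positions)]
    # Keep only the columns that every lane has filled
    length = min(len(lane) for lane in aligned)
    return [lane[:length] for lane in aligned]

lanes = [list("xAab"), list("Acd"), list("yyAe")]
result = deskew(lanes)
```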

The rate matcher is next. The rate matcher compensates for small frequency differences, measured in PPM, between the transmitter and receiver clocks. It uses a FIFO to decouple the transmitter and receiver clock domains. If the FIFO approaches underflow or overflow, indicating that the two clocks are not running at exactly the same frequency, the rate matcher inserts or removes special characters to compensate. Protocols that use rate matchers define the characters used for rate matching, but Altera devices allow you to program the character you want to use.

Here is an example of rate matcher deletion. In this example, the 8B/10B encoding skip character K28.0 is used. If the rate matcher FIFO is approaching overflow, it deletes K28.0 characters to prevent the overflow.
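The deletion and insertion decisions can be sketched with a FIFO and two fill-level thresholds. In this Python model, the class name and the `high`/`low` watermarks are hypothetical, and symbols are plain strings with `"K28.0"` as the skip character. It deletes skips on the write side when the FIFO is nearly full and inserts a skip on the read side, by repeating the skip at the head instead of consuming it, when the FIFO is nearly empty.

```python
from collections import deque

SKIP = "K28.0"

class RateMatcher:
    """Simplified rate matcher: skip deletion near overflow, insertion near underflow."""

    def __init__(self, high, low, skip=SKIP):
        self.fifo = deque()
        self.high, self.low, self.skip = high, low, skip
        self.deleted = 0
        self.inserted = 0

    def write(self, sym):
        # Near overflow: drop an incoming skip character instead of storing it
        if sym == self.skip and len(self.fifo) >= self.high:
            self.deleted += 1
            return
        self.fifo.append(sym)

    def read(self):
        # Near underflow: output the skip at the head again instead of consuming it
        if self.fifo and self.fifo[0] == self.skip and len(self.fifo) <= self.low:
            self.inserted += 1
            return self.skip
        # Caller must not read an empty FIFO; real hardware would flag underflow
        return self.fifo.popleft()

# Write side running fast: the skip arriving at the high watermark is deleted
rm = RateMatcher(high=4, low=1)
for sym in ["D1", "D2", "D3", "D4", SKIP, "D5"]:
    rm.write(sym)
```

Data characters are never dropped or duplicated; only the skip character is, which is why protocols reserve such characters for rate matching.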

The 8B/10B decoder converts the 10-bit code groups back to the original 8-bit data plus a control bit that indicates whether the code group is a control (K) character. Like the encoder in the transmitter data path, it can be bypassed for protocols that do not use it. It can detect invalid code groups as well as disparity errors. Disparity errors occur when the incoming data pattern does not correctly maintain neutral disparity.
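The running-disparity rule behind disparity error detection fits in a few lines. This Python sketch follows the sub-block rule of IEEE 802.3 Clause 36 as we understand it: each 6-bit (abcdei) and 4-bit (fghj) sub-block updates the running disparity, and a sub-block that would push the disparity further in the direction it already leans is flagged. It checks disparity only and does not validate code groups against the 8B/10B tables; representing code groups as strings and the function names are assumptions made for readability.

```python
def subblock(bits, rd):
    """Update running disparity (rd is -1 or +1) for one 6- or 4-bit sub-block.

    Returns (new_rd, disparity_error).
    """
    imbalance = 2 * bits.count("1") - len(bits)   # legal sub-blocks: -2, 0 or +2
    if imbalance == 0:
        # Neutral sub-blocks keep rd, except these patterns, which force it
        if bits in ("000111", "0011"):
            return +1, False
        if bits in ("111000", "1100"):
            return -1, False
        return rd, False
    # Error if the imbalance is illegal or leans the same way as the current rd
    error = abs(imbalance) > 2 or (imbalance > 0) == (rd > 0)
    return (+1 if imbalance > 0 else -1), error

def check_disparity(codes, rd=-1):
    """codes: 10-character strings 'abcdeifghj' in transmission order."""
    errors = []
    for i, code in enumerate(codes):
        rd, err6 = subblock(code[:6], rd)
        rd, err4 = subblock(code[6:], rd)
        if err6 or err4:
            errors.append(i)
    return errors, rd

# K28.5 RD- followed by K28.5 RD+ is legal; two RD- forms in a row are not
ok = check_disparity(["0011111010", "1100000101"])
bad = check_disparity(["0011111010", "0011111010"])
```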

The decoder is followed by the byte deserializer. Like the byte serializer, the byte deserializer widens the FPGA parallel interface to reduce the FPGA interface clock rate.

The byte deserializer increases the parallel data width, but it cannot restore the original byte striping of the transmitter. For example, a byte transmitted in byte position 0 can end up in byte position 1, 2 or 3 at the receiver, depending on the parallel data width and the boundary to which the word aligner locked. The byte ordering block can re-order bytes after byte deserialization by detecting a programmable byte ordering pattern. I'll show an example in the next slide.

The byte ordering block uses a programmable alignment character to restore the byte positions in which the data was originally sent. In the diagram on the right, you see the pattern at the input to the byte ordering block. As you can see, the words have been aligned properly, but the alignment character A appears in byte position 2. Once the byte ordering block locates this alignment character, it inserts a programmable pad character ahead of it, delaying the alignment character until it falls in byte position 0, as it was originally sent. With byte ordering, you do have to be careful of false synchronization when the data is scrambled. Unlike with 8B/10B encoding, the alignment character can appear in a scrambled data stream purely as a result of the scrambling, and the byte ordering block would then falsely align to the wrong byte. 8B/10B encoding, by contrast, is designed so that this cannot happen.
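The padding step can be sketched in Python on a flat byte stream. The function names, the `"A"` alignment character and the `"P"` pad character are hypothetical placeholders for the programmable values. The number of pad bytes is just the distance from the alignment character's current byte position to the next word boundary.

```python
def byte_order(stream, width, align, pad):
    """Insert pad bytes so the first `align` byte lands in byte position 0."""
    pos = stream.index(align)
    n_pad = (-pos) % width          # bytes needed to reach the next word boundary
    return stream[:pos] + [pad] * n_pad + stream[pos:]

def to_words(stream, width):
    """Regroup a flat byte stream into full parallel words."""
    return [stream[i:i + width] for i in range(0, len(stream) - width + 1, width)]

# 4-byte interface: "A" arrives in byte position 2, as in the slide example
received = ["x", "y", "A", "b", "c", "d", "e", "f"]
ordered = to_words(byte_order(received, 4, "A", "P"), 4)
```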

As you can imagine, the byte ordering block cannot be used with rate matching as the rate matcher is adding and deleting bytes while the byte ordering block would be trying to synchronize to them.

And lastly, the receiver phase compensation FIFO is a shallow FIFO that synchronizes the incoming data to a particular clock domain and compensates for any phase differences. Like the transmitter, the receiver path must be driven by a local PLL. The clock that drives the read side of the FIFO must have a 0 PPM difference with respect to the write side clock, which means the two clocks must come from the same PLL or be derived from the same clock source. The transceiver allows you to use either the recovered clock or the transmitter PLL clock output for the read side. So, for example, if you have multiple receivers running, you can drive all of their read side clocks from a single receiver output clock or a single transmitter PLL.

Once through the phase compensation FIFO, the data can be further processed and analyzed by any additional PCS logic or a media access control, or MAC, block implemented in the FPGA core.

The standard receiver PCS in Arria 10 transceivers contains an additional block, the PRBS verifier block.

The PRBS verifier, found in the standard PCS of Arria 10 transceivers, is shared with the Enhanced PCS. It is used to verify the PRBS pattern generated by the transmitter to ensure proper link operation.

The PRBS verifier supports the same 5 patterns as the PRBS generator in the transmitter path: PRBS 9, 15, 23 and 31, along with square wave patterns.

As mentioned earlier, PRBS 9 is used to test channels with an 8B/10B encoding/decoding scheme. PRBS 15 is mostly used for jitter measurements. PRBS 23 is used for channels that don't use 8B/10B encoding. PRBS 31 is the recommended pattern for 10GBASE-R, 10GBASE-KR and other applications that use FEC.
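Both the generator and the verifier are short linear feedback shift registers. The sketch below uses the polynomials commonly specified for these patterns (x^9 + x^5 + 1, x^15 + x^14 + 1, x^23 + x^18 + 1 and x^31 + x^28 + 1, as in ITU-T O.150); the exact polynomial, seed and bit order used by the Arria 10 hardware are assumptions to confirm in the Arria 10 Transceiver PHY User Guide, and the function names are hypothetical. The verifier is self-seeding: it predicts each bit from earlier received bits, so a single flipped bit is counted three times.

```python
# (n, m) for the polynomial x^n + x^m + 1
PRBS_POLYS = {9: (9, 5), 15: (15, 14), 23: (23, 18), 31: (31, 28)}

def prbs_generate(order, count, seed=None):
    """Fibonacci LFSR: each new bit is the XOR of the bits n and m positions back."""
    n, m = PRBS_POLYS[order]
    mask = (1 << n) - 1
    state = mask if seed is None else seed & mask
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR at zero")
    out = []
    for _ in range(count):
        bit = ((state >> (n - 1)) ^ (state >> (m - 1))) & 1
        state = ((state << 1) | bit) & mask   # bit 0 always holds the newest output
        out.append(bit)
    return out

def prbs_check(bits, order):
    """Count bits that disagree with the prediction from earlier received bits."""
    n, m = PRBS_POLYS[order]
    errors = 0
    for k in range(n, len(bits)):
        if bits[k] != bits[k - n] ^ bits[k - m]:
            errors += 1
    return errors

bits = prbs_generate(9, 1022)   # two full periods of PRBS 9 (2**9 - 1 = 511 bits each)
corrupted = list(bits)
corrupted[50] ^= 1              # inject one bit error
```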

When the 10G PCS is selected in Arria V GZ and all Stratix V devices, there are 9 possible blocks that can be enabled: the receiver gearbox, the block synchronizer, the disparity checker, the descrambler, the frame synchronizer, the BER monitor, the 64B/66B decoder, the CRC-32 checker and the receiver FIFO. We'll take a look at these next.

Like the transmitter, not all blocks are required for all 10G protocols, so they are enabled and disabled as needed for the particular target protocol implementation.

The first block in the receiver PCS is the receiver gearbox. Like the transmitter gearbox, it adapts the bit widths of the PMA to the PCS. So, the 40- or 64-bit output of the PMA is adapted to the 66- or 67-bit input of the PCS.
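A gearbox is essentially a bit accumulator. The Python sketch below models only the width conversion, with no bit reversal and no framing, and treats each word as an integer whose first-received bit is the LSB, matching the LSB-first order described here. The function name is hypothetical.

```python
def gearbox(words, w_in, w_out):
    """Repack a stream of w_in-bit words (LSB first) into w_out-bit words."""
    acc, nbits, out = 0, 0, []
    for word in words:
        # Append the new bits above the ones already buffered
        acc |= (word & ((1 << w_in) - 1)) << nbits
        nbits += w_in
        # Emit output words while enough bits are buffered
        while nbits >= w_out:
            out.append(acc & ((1 << w_out) - 1))
            acc >>= w_out
            nbits -= w_out
    return out

# 20 x 66 = 33 x 40 = 1320 bits, so the two widths repack exactly
blocks = [(i * 0x123456789ABCDEF1 + 3) & ((1 << 66) - 1) for i in range(20)]
pma_words = gearbox(blocks, 66, 40)      # what a 40-bit PMA interface would carry
recovered = gearbox(pma_words, 40, 66)   # receiver gearbox: 40 bits in, 66 bits out
```

Because 66 is not a multiple of 40, the gearbox must buffer leftover bits between words, which is why it is a separate block rather than a simple wiring change.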

Natively, the PCS receives the LSB first, but the receiver gearbox also has the ability to do bit reversal so that the MSB can be received first.

If your board incorrectly swaps the p and n traces of your differential pair, you can also use the receiver gearbox to compensate by inverting each bit of the incoming signal.
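Both gearbox options described above are simple bit operations. In this Python sketch, with hypothetical function names, bit reversal swaps the bit order within a word so that the first-received bit ends up as the MSB instead of the LSB, and polarity inversion complements every bit to undo a p/n swap on the board.

```python
def reverse_bits(word, width):
    """Swap bit order so the first-received bit becomes the MSB instead of the LSB."""
    result = 0
    for i in range(width):
        if word >> i & 1:
            result |= 1 << (width - 1 - i)
    return result

def invert_polarity(word, width):
    """Undo a p/n trace swap by complementing every bit of the word."""
    return ~word & ((1 << width) - 1)

swapped = reverse_bits(0b0001, 4)
fixed = invert_polarity(0b1010, 4)
```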

After the gearbox, the data is sent to the block synchronizer. When data is transmitted over a link and read as serial input to the receiver, the receiver has no way of knowing where in the serial stream the data words begin. So, similar to the word aligner in the standard, or 8G, PCS, the block synchronizer locates the synchronization word and aligns itself and the rest of the PCS to that word.
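For 64B/66B data, the synchronization word is the 2-bit sync header at the start of every 66-bit block, which must be 01 or 10. The Python sketch below slips the candidate boundary one bit at a time until it sees a run of valid headers; IEEE 802.3 Clause 49 declares block lock after 64 consecutive valid headers, and that value is used as the default here. Loss-of-lock behavior is not modeled, and the function name is hypothetical.

```python
VALID_HEADERS = ((0, 1), (1, 0))

def block_sync(bits, lock_count=64, block_len=66):
    """Return the block-boundary offset (0..block_len-1) once locked, else None."""
    pos, good = 0, 0
    while pos + block_len <= len(bits):
        if (bits[pos], bits[pos + 1]) in VALID_HEADERS:
            good += 1
            if good >= lock_count:
                return pos % block_len
            pos += block_len          # check the header of the next block
        else:
            good = 0
            pos += 1                  # slip the candidate boundary by one bit
    return None
```

A wrong boundary is rejected quickly because random payload bits form a valid header only about half the time, while the true boundary produces a valid header in every block.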

After the word boundary has been determined, in the Interlaken configuration, the incoming word then passes through the Disparity Checker. Remember that the Disparity Generator may have inverted the word and set bit 66 in order to maintain neutral disparity. So, the disparity checker reads bit 66 and re-inverts the word to return it to its original scrambled form.
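The inversion and re-inversion can be sketched in Python on 67-bit words held as integers. As simplifying assumptions, this sketch inverts only the 64 payload bits (63:0), uses bit 66 as the inversion flag and leaves the sync header in bits 65:64 untouched, and the transmitter-side decision counts only the payload bits when tracking running disparity. The function names are hypothetical; the Interlaken specification defines the exact rule.

```python
PAYLOAD_MASK = (1 << 64) - 1
INVERT_BIT = 1 << 66

def disparity_generate(word67, running):
    """Transmitter side (simplified): invert bits 63:0 if that keeps |running| smaller."""
    disp = 2 * bin(word67 & PAYLOAD_MASK).count("1") - 64
    if disp * running > 0:                    # same sign: sending as-is would grow |running|
        word67 = (word67 ^ PAYLOAD_MASK) | INVERT_BIT
        disp = -disp
    return word67, running + disp

def disparity_check(word67):
    """Receiver side: if bit 66 is set, re-invert bits 63:0 and clear the flag."""
    if word67 & INVERT_BIT:
        payload = ~word67 & PAYLOAD_MASK
        return (word67 & ~(INVERT_BIT | PAYLOAD_MASK)) | payload
    return word67

original = (0b10 << 64) | 0xFFFFFFFFFFFF0000   # sync header 10, payload with 48 ones
sent, new_running = disparity_generate(original, running=10)
restored = disparity_check(sent)
```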

The next block is the Descrambler. After being initialized or seeded, the Descrambler then uses the polynomial specified by the protocol to descramble the data to its original unscrambled form. Like the Scrambler in the transmitter, the Descrambler supports self synchronous mode for 10GBASE-R and frame synchronous for Interlaken.
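For 10GBASE-R, the self-synchronous scrambler polynomial is 1 + x^39 + x^58 (IEEE 802.3 Clause 49). The Python sketch below models the scrambler and descrambler one bit at a time; the function names and the integer state representation are assumptions. Because the descrambler's shift register is filled with received (scrambled) bits, it recovers correct data after 58 bits even when it starts from the wrong state, which is what "self-synchronous" means.

```python
MASK58 = (1 << 58) - 1

def scramble(bits, state=0):
    """Self-synchronous scrambler, G(x) = 1 + x^39 + x^58; state bit 0 is the newest output."""
    out = []
    for b in bits:
        s = b ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | s) & MASK58   # feed back the scrambled bit
        out.append(s)
    return out

def descramble(bits, state=0):
    """Matching descrambler: its register holds received scrambled bits."""
    out = []
    for s in bits:
        b = s ^ ((state >> 38) & 1) ^ ((state >> 57) & 1)
        state = ((state << 1) | s) & MASK58   # shift in the received bit
        out.append(b)
    return out

data = [(i * 5 + i // 3) % 2 for i in range(300)]
tx = scramble(data, state=12345)
```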

After the Interlaken word is descrambled, the frame synchronizer synchronizes the receiver channel to the Meta Frame by looking for four correct sync words in four consecutive Meta Frames. Once this occurs, the frame synchronizer signals to the rest of the PCS (and the upper layers of your design) that synchronization has been achieved. If, at any time, three incorrect sync words are detected in three consecutive Meta Frames, the frame synchronizer determines that synchronization has been lost and starts the process over again.

After descrambling, when the PCS is configured for 10GBASE-R, the bit error rate (BER) monitor is enabled. This block counts the number of invalid synchronization headers received in each 125 µs interval. If the count reaches 16, it asserts a flag to indicate that a high number of bit errors has occurred.
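The BER monitor's counting can be sketched over fixed, non-overlapping 125 µs windows. In this Python example the input format, a list of (time in µs, header valid) pairs, and the function name are hypothetical. The threshold of 16 invalid headers per window follows IEEE 802.3 Clause 49, which calls the resulting flag `hi_ber`; the exact comparison (reaching versus exceeding the count) is an assumption worth checking against the standard.

```python
def hi_ber_windows(header_events, window_us=125.0, threshold=16):
    """header_events: iterable of (time_us, header_valid). Return {window: hi_ber}."""
    counts = {}
    for t, valid in header_events:
        w = int(t // window_us)          # which 125 µs window this header falls in
        counts.setdefault(w, 0)
        if not valid:
            counts[w] += 1
    return {w: c >= threshold for w, c in sorted(counts.items())}

# 20 bad headers in the first window, only 5 in the second
events = ([(t, False) for t in range(20)]
          + [(t, True) for t in range(20, 125)]
          + [(t, False) for t in range(130, 135)])
flags = hi_ber_windows(events)
```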

The 64B/66B Decoder converts the 66-bit encoded data back into its original 8-byte-wide form with one control flag per byte. It also monitors the bit error flag from the BER monitor and, if it is asserted, sends fault codes into the receiver FIFO and on to the FPGA core.

In the Interlaken configuration, after the Frame boundary has been determined by the Frame Synchronizer, the CRC Checker then calculates the CRC on the Meta Frame and compares it against the CRC checksum found in the Meta Frame itself to determine if transmission errors have occurred.
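The compare step of the CRC checker can be sketched as "recompute, then compare". This Python example implements a generic bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF), which we assume is the CRC-32 variant Interlaken uses; the bit ordering and the exact Meta Frame fields covered by the Interlaken CRC are not modeled, and the function names are hypothetical.

```python
def crc32c(data: bytes) -> int:
    """Bitwise, reflected CRC-32C (Castagnoli)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; XOR in the reflected polynomial when a 1 falls off the end
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def crc_ok(frame: bytes, received_crc: int) -> bool:
    """Checker: recompute the CRC over the received frame and compare."""
    return crc32c(frame) == received_crc

check_value = crc32c(b"123456789")   # standard CRC-32C check string
```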

Lastly, the receiver FIFO. The 10G PCS receiver FIFO has multiple modes that can be enabled based on the protocol implementation or the functional requirements.

In receiver phase compensation mode, the FIFO takes the incoming data and retimes it into the FPGA core clock domain to account for any slight phase offsets between the two domains. This is similar to the phase compensation FIFO in the standard PCS.

In clock compensation mode, the FIFO behaves like the rate matcher block in the standard PCS: it inserts or removes ordered sets to compensate for up to ±100 PPM of difference between the link endpoints. This mode is used in Ethernet 10GBASE-R.

In generic mode, the FIFO functions just like a simple FIFO, but provides flags to the FPGA logic so that control logic can monitor the FIFO and manage the data flow, namely FIFO full and empty flags. This mode is used in the Interlaken protocol.

The receive path in Arria 10 transceivers has a variation of the 10G PCS called the Enhanced PCS. Its functionality is similar to the 10G PCS except for additional PCS blocks like the PRBS/PRP verifier blocks and the forward error correction (FEC) block.

Corresponding to the PRBS and PRP data generators in the Arria 10 Enhanced PCS transmit path, the Enhanced PCS receive path contains the PRBS and the PRP verifier blocks used to test the functionality of the receiver and its paired transmitter.

The PRBS verifier is shared with the standard PCS and was discussed previously. Use the link on this page if you wish to go back and review that information.

Once block synchronization is achieved, the PRP verifier monitors the descrambled output of the descrambler to confirm the pseudo-random data pattern sent by the transmitter has been received correctly. The PRP verifier was designed specifically for 10GBASE-R and IEEE 1588 applications.

You cannot enable both the PRP and PRBS verifiers at once.

The Enhanced PCS in Arria 10 transceivers has a KR FEC (forward error correction) block used for 10GBASE-KR applications. The KR FEC block is made up of sub-blocks: the block synchronizer, the descrambler, the decoder, the gearbox and the transcode decoder.

As described in the transmit section, forward error correction can improve the BER performance of the system by inserting redundancy and allowing a receiver to detect errors in the transmitted data pattern and possibly correct for them.

The FEC block is optional and can be bypassed.

For more details on the KR FEC block for Arria 10 devices, refer to the Arria 10 Transceiver PHY User Guide.

The last PCS configuration, found in Arria V GZ and all Stratix V devices, is again the PCIe Gen3 PCS. It is in parallel to the other two available PCS configurations.

It is made up of the block synchronizer, the rate match FIFO, the decoder, the descrambler and the phase compensation FIFO.

The first block in the PCIe Gen3 PCS is the block synchronizer. The block synchronizer works like a combination of the gearbox and the block synchronizer in the 10G PCS. It adapts between the 32-bit-wide PMA output and the 130-bit Gen3 PCS. It then also locates the non-scrambled bit patterns, known as ordered sets, in the incoming data and aligns the 130-bit data word boundary to them. In PCIe links, ordered sets are used to transfer information between the PHY layers of the endpoints. After receiving a variable-length SKP ordered set, the block synchronizer automatically re-aligns to the 130-bit word boundary.

The rate match FIFO, like the rate matcher in the standard PCS and the RX FIFO in the 10G PCS, compensates for frequency differences of up to ±300 PPM between the link endpoints by adding or removing SKP ordered sets in the data word as the FIFO underflows or overflows.

The decoder converts the 130-bit data word back to its original 128 bits by removing the 2-bit synchronization header added by the transmitter. It can also report any ordered set or sync header violations. Since ordered sets and sync headers are not scrambled, when the decoder detects them, it signals this to the descrambler so the descrambler does not try to descramble them as it does all other data words.

The descrambler returns the data words to their original, non-scrambled state using a linear feedback shift register (LFSR) that implements the PCIe Gen3 polynomial.

The phase compensation FIFO, like the phase compensation FIFO in the standard PCS and the RX FIFO in the 10G PCS, takes the incoming data and retimes it into the FPGA core clock domain to account for any slight phase offsets between the two domains. It is a shallow FIFO that cannot tolerate any frequency differences, so the read and write FIFO clocks must be derived from the same source.

The PCIe Gen 3 PCS for Arria 10 transceivers has only the block synchronizer, rate match FIFO and phase compensation FIFO on the receiver side. A user must implement the descrambler functionality in the FPGA fabric.

Please refer to the Arria 10 Transceiver PHY User Guide for details.

Again, the hardware blocks built into the receiver path are there for best performance and to reduce the FPGA resources needed to implement these protocols. But you can bypass these blocks and use FPGA logic to implement alternative versions of many of the blocks we've looked at here.

Similar to the PMA-only transmit channel, the embedded transceivers in select devices also support PMA-only receive channels. This allows the PMA to support protocols that require higher data rates than what is supported by its associated PCS. Here, the PCS is disabled and bypassed, so that incoming receive data from the PMA is passed directly to the FPGA core. Again, this means all PCS functionality must be constructed in the FPGA, so additional FPGA resources are required.

This feature is supported in Arria V GT/GZ, Stratix V GX/GS/GT and Stratix IV GX/GT devices.

Similar functionality is supported by Arria 10 transceivers in the PCS-Direct mode.

Like the transmitter, the receiver requires a high-speed serial clock for the PMA and a lower-speed parallel clock for the deserializer and the PCS.
The high-speed serial clock is derived directly from the incoming data by the CDR. This is why an encoding or scrambling method must be used: it guarantees enough transitions in the incoming data for the CDR to recover the clock.

The lower-speed parallel clock can come from a few sources depending on the mode in which the transceiver is configured. For example, the clock could come from the receiver CDR, or it could come from an associated transmitter PLL.

More detail on the clocking behavior can be found in the handbook for your particular target device.

And that concludes our look at the receiver channel path and the blocks found in the receiver PMA and PCS.

And that concludes the presentation Transceiver Basics.

In summary, you have seen that transceivers use various blocks to enable high-speed serial communication between different systems or devices. These blocks include FIFOs, serializers and deserializers, encoders and decoders and other alignment blocks. There are also transmitter and receiver buffers which can include additional features to improve your signal integrity and link performance.

This ends the recorded presentation. In order to improve future material, the training department at Altera appreciates any feedback you can provide. After you registered for this course, a confirmation email was sent to you containing a link to a short online survey. Please fill out the survey to let us know your thoughts on both the material and the method of delivery.

Thank you.
