If you need to review any of the prerequisite material, see the existing TimeQuest Timing Analyzer trainings or documentation on timing analysis, such as the TimeQuest Timing Analyzer chapter of the Quartus II Handbook or the TimeQuest User Guide on the Altera Wiki.
Here are the example waveforms.
The top diagram shows a center-aligned clock. The transmitter launches data on the positive edge. A separate PLL output is likely used to shift the clock going to the receiving device to achieve the center alignment. The receiving device then uses the clock as received to latch the data, so the latching edge falls in the middle of the data valid window.
The bottom diagram shows an edge-aligned clock. Here the transmitter still launches data on the positive edge, but it sends the clock either as is or adjusted so that it is edge aligned with the data. In this case the receiver must shift the clock by 180 degrees to properly capture the data with the associated clock.
With same-edge transfers, data launched on the rising edge of the clock is also latched on the rising edge, and data sent on the falling edge is expected to be latched by the receiving device on the falling edge.
With opposite-edge transfers, data launched on the rising edge of the clock is latched by the falling edge, and data sent on the falling edge is latched by the rising edge, though the latter case is extremely rare.
Same-edge and opposite-edge transfers can occur with either edge-aligned or center-aligned interfaces, as shown in the diagram. Notice that the clock is adjusted for the different edge-to-edge transfers so that the actual setup and hold relationships are the same for same-edge and opposite-edge transfers.
First, we create a virtual clock to which the input delays will later be referenced. It determines the launch edge of the transfer.
Then we create a clock on the FPGA's input clock pin; this clock determines the latch edge of the interface.
Lastly, we specify the input delay relative to the virtual clock declared in the first step. The input delay calculation depends on which specifications are available for the upstream device; we will see how to derive and set the input delays.
Source-synchronous input data arrives at the FPGA referenced to this virtual clock, so when we declare input delays, the associated launch clock is the virtual clock.
It is not actually necessary to use a virtual clock to constrain the input delays. You can create input delay constraints relative to the input clock instead, but using a virtual clock makes constraining the interface easier and more accurate.
Using a virtual clock makes the derivation of clock uncertainty more accurate, since off-chip to on-chip clock domain transfers can now be calculated. Also, because the virtual clock is in its own clock domain, certain interfaces are easier to constrain, as we'll see in an upcoming slide, and the input paths are easier to analyze in TimeQuest, since reports can be generated based on the launching virtual clock.
With direct clocking, the clock is used as is to capture the data. Because the clock-to-data relationship cannot be fine-tuned, only center-aligned, low-speed inputs can be implemented with direct clocking.
With PLL clocking, you can implement either a center-aligned or an edge-aligned interface.
Higher-speed interfaces require PLL resources to control the clock-to-data relationship. The PLL provides compensation over process, voltage, and temperature. When using the PLL, configure it in source-synchronous compensation mode. In this mode, the PLL preserves the clock-to-data relationship seen at the FPGA device inputs, to the best of its ability, all the way to the input register by precisely adjusting the clock-to-data relationship.
First, because the clock is center aligned, direct clocking is an option only for a low-speed interface: the clock and data input delays will not be exactly equal, so the center alignment cannot be precisely preserved at the input register.
First we create a virtual clock. Then, because the transmitting device has shifted the clock, we specify a clock constraint for the clk_in port with a 180-degree phase shift. Since the period in our case is 8 ns, the -waveform option of the second create_clock command, which places the rising edge at 4 ns and the falling edge at 8 ns, represents the 180-degree phase shift relative to the virtual clock and the data.
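As a hedged sketch of the two clock constraints just described, the Python helper below only formats SDC text; it does not interact with TimeQuest. The clock name virt_clk is a hypothetical placeholder, while clk_in follows the port name used above.

```python
# Hypothetical helper that formats SDC text; it does not run TimeQuest.
def center_aligned_input_clocks(period_ns, virt_name="virt_clk", clk_port="clk_in"):
    """SDC lines for a direct-clocked, center-aligned SDR input."""
    half = period_ns / 2
    return [
        # Virtual launch clock: rising edge at 0 ns, falling edge at period/2.
        f"create_clock -name {virt_name} -period {period_ns:g}",
        # Clock at the FPGA pin, shifted 180 degrees: rising at period/2, falling at period.
        f"create_clock -name {clk_port} -period {period_ns:g} "
        f"-waveform {{{half:g} {period_ns:g}}} [get_ports {{{clk_port}}}]",
    ]


for line in center_aligned_input_clocks(8):
    print(line)
```

For an 8 ns period this prints a plain create_clock for the virtual clock and a create_clock with -waveform {4 8} on clk_in.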
If you've chosen to use a PLL in source-synchronous mode to better align the clock with the data, then in addition to the two create_clock commands for the virtual clock and the input clock, you'll also need a create_generated_clock command for the clock at the PLL output. This generated clock has no phase shift, multiply factor, or divide factor.
With the PLL, you could also have used the derive_pll_clocks command instead.
Here, we again create a virtual clock that describes the launch clock, create a clock constraint for the clock input pin that drives the PLL, and then create a generated clock constraint for the PLL output, which is our latch clock.
Notice that the input clock we created has the same phase as the virtual clock, and the 180-degree phase shift is now specified in the generated clock command for the PLL output tap. Of course, when parameterizing the PLL megafunction, you must make sure the 180-degree shift is set there as well. Again, derive_pll_clocks could have been used here too.
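A minimal sketch of the PLL variant, again as a Python helper that only formats SDC text. The PLL pin names passed in (rx_pll|inclk[0], rx_pll|clk[0]) and the clock names virt_clk and rx_clk are hypothetical; real pin names depend on how the PLL megafunction is instantiated.

```python
# Hypothetical helper; pin and clock names are placeholders, not real netlist names.
def pll_input_clocks(period_ns, pll_in_pin, pll_out_pin, phase_deg=180.0):
    """SDC lines for an input captured through a PLL output tap."""
    return [
        # Virtual clock describing the upstream launch clock.
        f"create_clock -name virt_clk -period {period_ns:g}",
        # Input clock pin: same phase as the virtual clock.
        f"create_clock -name clk_in -period {period_ns:g} [get_ports {{clk_in}}]",
        # Latch clock at the PLL output tap carries the 180-degree shift.
        f"create_generated_clock -name rx_clk -source [get_pins {{{pll_in_pin}}}] "
        f"-phase {phase_deg:g} [get_pins {{{pll_out_pin}}}]",
    ]


for line in pll_input_clocks(8, "rx_pll|inclk[0]", "rx_pll|clk[0]"):
    print(line)
```

The braces around each pin name keep Tcl from treating the [0] index as command substitution.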
These constraints are derived from the type of information available to the FPGA designer; we'll look at each of these individually.
The upstream external device may provide maximum and minimum tCO numbers, listed either relative to its output clock or relative to its input clock.
Or you may have a specification that gives the required setup and hold times at the FPGA.
And finally, common to source-synchronous transfers, the specification may give the maximum skew between clock and data at the FPGA input.
With any one of these three sets of information, we can derive the proper values for our SDC constraints.
Because the delay specified with the set_input_delay command is referenced to the clock, a tCO value given relative to the output clock is directly the delay. Here the calculation is straightforward.
The max delay is simply the maximum tCO plus the maximum data trace delay minus the minimum clock trace delay, which gives the largest, most pessimistic value.
The min delay is the minimum tCO plus the minimum data trace delay minus the maximum clock trace delay, again giving the smallest, most pessimistic value.
Remember that the max input delay is used for setup analysis, where the largest delay contribution is the most likely to fail, and the min input delay is used for hold analysis, where the smallest value is the most likely to fail.
In my example, I've stored the maximum and minimum values of tCO, data trace delay, and clock trace delay in Tcl variables.
Then I use the Tcl set command to create the new variables in_max_delay and in_min_delay. The expr command in the first two lines performs the arithmetic in Tcl.
With in_max_delay and in_min_delay calculated, I can use the set_input_delay command, referencing the virtual clock with the calculated values and targeting the data_in port, as shown in the third and fourth commands of the example SDC code.
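The arithmetic above can be sketched as follows. This is an illustrative Python helper, not the example SDC from the slide; the delay values are made up, and the clock name virt_clk and bus name data_in[*] are assumptions. The resulting strings are what would go into the SDC file.

```python
def input_delays_from_output_clock_tco(tco_max, tco_min,
                                       data_trace_max, data_trace_min,
                                       clk_trace_max, clk_trace_min):
    """Input delays (ns) when the upstream tCO is relative to its output clock."""
    # Largest value, most pessimistic for setup: slow data, fast clock.
    in_max_delay = tco_max + data_trace_max - clk_trace_min
    # Smallest value, most pessimistic for hold: fast data, slow clock.
    in_min_delay = tco_min + data_trace_min - clk_trace_max
    return in_max_delay, in_min_delay


def set_input_delay_lines(clock, port, max_ns, min_ns):
    """Format the two set_input_delay constraints (names are placeholders)."""
    return [
        f"set_input_delay -clock {clock} -max {max_ns:.3f} [get_ports {{{port}}}]",
        f"set_input_delay -clock {clock} -min {min_ns:.3f} [get_ports {{{port}}}]",
    ]


# Illustrative numbers in ns, not taken from any datasheet.
mx, mn = input_delays_from_output_clock_tco(3.0, 1.0, 0.6, 0.5, 0.55, 0.45)
for line in set_input_delay_lines("virt_clk", "data_in[*]", mx, mn):
    print(line)
```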
In this case, the device specification lists maximum and minimum data tCO and maximum and minimum clock tCO, so you have to subtract the clock tCO from the data tCO to obtain the data tCO relative to the output clock.
As in the previous example, we need the most pessimistic numbers, so for the max delay we take the maximum data tCO, subtract the minimum clock tCO, add the maximum data trace delay, and subtract the minimum clock trace delay.
For the min input delay we do the opposite: take the minimum data tCO, subtract the maximum clock tCO, add the minimum data trace delay, and subtract the maximum clock trace delay.
The syntax of the set_input_delay command is exactly the same as in the previous example.
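A sketch of the same calculation when both tCO values are given relative to the upstream device's input clock. The numbers are illustrative only; formatting the set_input_delay lines would be identical to the previous case.

```python
def input_delays_from_input_clock_tco(tco_data_max, tco_data_min,
                                      tco_clk_max, tco_clk_min,
                                      data_trace_max, data_trace_min,
                                      clk_trace_max, clk_trace_min):
    """Input delays (ns) when data and clock tCO are both relative to the
    upstream device's input clock."""
    # Data tCO relative to the forwarded clock, worst case for setup ...
    in_max_delay = (tco_data_max - tco_clk_min) + data_trace_max - clk_trace_min
    # ... and worst case for hold.
    in_min_delay = (tco_data_min - tco_clk_max) + data_trace_min - clk_trace_max
    return in_max_delay, in_min_delay


# Illustrative numbers in ns.
mx, mn = input_delays_from_input_clock_tco(4.0, 2.0, 1.5, 1.0, 0.6, 0.5, 0.55, 0.45)
print(f"{mx:.3f} {mn:.3f}")
```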
The max delay is the time from the launch edge to the latch edge minus the setup requirement.
The min delay is the hold requirement minus the difference between the hold launch and latch edges.
Once we derive the delays, we use the set_input_delay -max and -min commands to enter the values into the SDC file, as in the previous examples.
The input delay is defined as the time it takes, starting from the launch edge, for data to reach the FPGA I/O.
The setup requirement is defined as how early the signal must reach the FPGA I/O before the latch edge. Looking at the top diagram, the maximum allowable external delay is the distance between the two blue lines: the setup latch edge minus the setup launch edge, minus the setup requirement. In the center-aligned case this is period/2 minus the setup requirement.
The minimum input delay is defined as the fastest a signal can travel from the launch edge to the FPGA I/O, while the hold requirement is the minimum time the data must remain stable after the latch clock edge, or equivalently the minimum delay from the latch edge to the next data. The two terms, hold time and min delay, therefore have the same meaning but different reference points. Knowing the hold time, we need to express it relative to the launch edge. Since the edges are period/2 apart, the minimum input delay is the hold time minus period/2.
In the edge-aligned case, the hold equation becomes the hold time minus one period, since the launch edge is a full cycle ahead of the latch edge. This is the same equation as in the center-aligned case, shifted by an additional period/2.
Using the equations provided, my max delay is period/2 minus tSU, which is 1.7 ns, and my min delay is the hold time minus period/2, which is -2.8 ns.
With in_max_delay and in_min_delay calculated, you simply use the familiar set_input_delay command to tell TimeQuest the maximum and minimum delays relative to the launch edge of the virtual clock.
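Here is a sketch that implements the two definitions above directly from edge times rather than from the simplified period/2 formulas. The tSU = 2.3 ns and tH = 1.2 ns values are assumptions: they are simply the values that reproduce the 1.7 ns and -2.8 ns results quoted for the 8 ns period.

```python
def input_delays_from_tsu_th(tsu, th, setup_launch, setup_latch,
                             hold_launch, hold_latch):
    """Convert an FPGA-side tSU/tH requirement into input delays (ns)."""
    # Max delay: setup relationship (latch minus launch) minus tSU.
    in_max_delay = (setup_latch - setup_launch) - tsu
    # Min delay: tH minus the distance between hold launch and hold latch edges.
    in_min_delay = th - (hold_launch - hold_latch)
    return in_max_delay, in_min_delay


period = 8.0
# Center aligned: the setup latch is period/2 after the launch, and the hold
# check compares the next launch (at period) with the latch edge at period/2.
mx, mn = input_delays_from_tsu_th(tsu=2.3, th=1.2,
                                  setup_launch=0.0, setup_latch=period / 2,
                                  hold_launch=period, hold_latch=period / 2)
print(f"{mx:.1f} {mn:.1f}")  # 1.7 -2.8
```

For the edge-aligned hold case described above, pass hold edges a full period apart (for example hold_launch=period and hold_latch=0.0), which yields tH minus one period.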
The skew value says the data will arrive within a window of plus or minus the skew value around the clock edge.
In this case the maximum delay is simply the skew value, since that is the latest the data can arrive at the FPGA after the clock edge.
The minimum delay is simply the negative of the skew, since that is the earliest the data can arrive; because that point is before the clock edge, the value is negative.
With a skew specification, regardless of data alignment, we reference these values to the virtual clock and set the maximum and minimum input delays of the data inputs to the positive and negative skew values.
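A sketch of the skew method with an illustrative 0.7 ns skew; the clock name virt_clk and bus name data_in[*] are placeholders.

```python
def input_delays_from_skew(skew_ns):
    """Data arrives within +/- skew of the clock edge: max = skew, min = -skew."""
    return skew_ns, -skew_ns


def set_input_delay_lines(clock, port, max_ns, min_ns):
    """Format the two set_input_delay constraints (names are placeholders)."""
    return [
        f"set_input_delay -clock {clock} -max {max_ns:g} [get_ports {{{port}}}]",
        f"set_input_delay -clock {clock} -min {min_ns:g} [get_ports {{{port}}}]",
    ]


mx, mn = input_delays_from_skew(0.7)
for line in set_input_delay_lines("virt_clk", "data_in[*]", mx, mn):
    print(line)
```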
This table can be used as a cheat sheet when you constrain your interfaces; it lists all of the methods for calculating the delay values.
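The slide's table itself is not reproduced in this transcript. As a stand-in reconstructed only from the equations stated above (so not the original cheat sheet), the input delay formulas can be summarized as follows, where d is the input delay, t_D and t_C are data and clock trace delays, T is the period, s is the skew, and e denotes edge times:

```latex
\begin{aligned}
\text{tCO vs. output clock:}\quad d_{\max} &= t_{CO,\max} + t_{D,\max} - t_{C,\min}, &
  d_{\min} &= t_{CO,\min} + t_{D,\min} - t_{C,\max} \\
\text{tCO vs. input clock:}\quad d_{\max} &= (t_{CO,d,\max} - t_{CO,c,\min}) + t_{D,\max} - t_{C,\min}, &
  d_{\min} &= (t_{CO,d,\min} - t_{CO,c,\max}) + t_{D,\min} - t_{C,\max} \\
\text{FPGA } t_{SU}/t_H:\quad d_{\max} &= (e^{su}_{latch} - e^{su}_{launch}) - t_{SU}, &
  d_{\min} &= t_H - (e^{h}_{launch} - e^{h}_{latch}) \\
\text{center aligned:}\quad d_{\max} &= T/2 - t_{SU}, &
  d_{\min} &= t_H - T/2 \\
\text{edge aligned (hold):}\quad & & d_{\min} &= t_H - T \\
\text{skew } s:\quad d_{\max} &= s, &
  d_{\min} &= -s
\end{aligned}
```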
In order to properly constrain the interface, follow the methodology here.
First, create a generated clock at the FPGA's output clock pin.
Then specify the output max and min delays for the data out signals relative to the generated output clock.
Finally, certain exceptions are needed to make sure valid timing calculations are performed and to cut paths that don't need to be analyzed.
To tell the timing analyzer to analyze the output clock as part of the clock path, create a generated clock for it; the source of that generated clock would be the output of the PLL.
Then, when creating output delay constraints for the data outputs, use the generated output clock as the reference so that TimeQuest can determine where the latch edge is.
First, you may choose to use a common clock for the data and the output clock. This saves FPGA device resources, but it works only for edge-aligned, low-speed interfaces where precise clock-to-data alignment is not important. In this case, the receiving device is required to shift the clock.
An output clock can also be generated by a PLL. Because the PLL output phase is adjustable, this clock can be used in either edge-aligned or center-aligned interfaces, and its phase can also be tuned for precise output clock-to-data alignment.
The third option is to use a DDIO register. DDIO (double data rate I/O) registers are found in most mid- to high-end FPGAs. They are meant for double data rate interfaces but can also be used to align clock and data for single data rate source-synchronous interfaces. Because the same type of dedicated circuitry generates both the data and the clock, you can easily get precise alignment. You can also use DDIO registers in conjunction with a PLL.
Here the clock that drives the data register is also routed out to the clk_out pin. Because this scheme is limited to low-speed applications, this type of interface is probably not very common.
But this interface is simple and does not require PLL resources.
To constrain this interface, all you need to do is create the clock that drives the data out register, as in the first line of code here.
Then use a create_generated_clock command to create a clock at the FPGA's output clock pin, with the clock that drives the data register as its source.
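A sketch of the two constraints for the common-clock output, written as a Python helper that emits SDC text. The port names clk and clk_out and the clock name tx_clk are hypothetical.

```python
# Hypothetical helper; port and clock names are placeholders.
def common_clock_output(period_ns, reg_clk_port="clk", clk_out_port="clk_out"):
    """Common data/output clock: the register clock is also forwarded to clk_out."""
    return [
        # Clock that drives the data_out register.
        f"create_clock -name tx_clk -period {period_ns:g} [get_ports {{{reg_clk_port}}}]",
        # Forwarded copy at the output pin; output delays reference this clock.
        f"create_generated_clock -name clk_out -source [get_ports {{{reg_clk_port}}}] "
        f"[get_ports {{{clk_out_port}}}]",
    ]


for line in common_clock_output(8):
    print(line)
```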
In this case, the first step is to create the clock for the clock input pin; this is not shown on this slide.
Then we use a generated clock to create a clock for the first tap of the PLL which is used for the data register. This is shown in the first line of code. Note the source is the input of the PLL.
The second generated clock command here creates the clock for the clock output. If you're using an edge-aligned interface, the phase offset would be 0 degrees, while for a center-aligned interface it would be 180 degrees.
Note also that the first two generated clock commands here could have been replaced with the derive_pll_clocks command.
Lastly, we use a third generated clock command to bring the clock to the output IO, here the generated clock source is the second output tap of the PLL and the destination is the clock out port.
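The three generated clocks for the PLL-driven output can be sketched as below. All pin and clock names are hypothetical placeholders; the only logic is the choice of 0 degrees for edge-aligned and 180 degrees for center-aligned SDR outputs, as described above.

```python
# Hypothetical helper; pin, port, and clock names are placeholders.
def pll_output_clocks(pll_in_pin, data_tap_pin, clk_tap_pin,
                      clk_out_port="clk_out", center_aligned=False):
    """Generated clocks for a PLL-driven source-synchronous output."""
    # 0 degrees for edge-aligned, 180 degrees for center-aligned SDR outputs.
    phase = 180 if center_aligned else 0
    return [
        # First tap: clock for the data_out register.
        f"create_generated_clock -name tx_data_clk -source [get_pins {{{pll_in_pin}}}] "
        f"[get_pins {{{data_tap_pin}}}]",
        # Second tap: clock for the clock-out path, with the alignment phase.
        f"create_generated_clock -name tx_clk -source [get_pins {{{pll_in_pin}}}] "
        f"-phase {phase} [get_pins {{{clk_tap_pin}}}]",
        # Forwarded clock at the output pin; output delays reference this clock.
        f"create_generated_clock -name clk_out -source [get_pins {{{clk_tap_pin}}}] "
        f"[get_ports {{{clk_out_port}}}]",
    ]


for line in pll_output_clocks("tx_pll|inclk[0]", "tx_pll|clk[0]", "tx_pll|clk[1]"):
    print(line)
```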
In this example, I'm using the PLL to generate the clocks for both the data out register and the clock out DDIO registers, which gives me separate control. My first two generated clock commands create the two clocks at the PLL output taps. Notice that in the second command I can use a phase shift to generate a center-aligned clock.
The final create_generated_clock command declares the clock at the clock out port, using the PLL output tap as its source.
Note that in this case I could also have saved one of the PLL outputs, since you can easily invert the output clock by swapping the VCC and GND connections on the DDIO inputs instead of using the 180-degree phase shift.
For output interfaces, the data delay may be derived in one of two ways depending on the information available for the receiving device.
The first method can be used if the downstream external device provides its setup and hold requirements and the board parameters are known; this is also known as the system-centric view.
The second method is used if a maximum allowed data skew, which specifies the relationship between the clock and the data, is given for the FPGA interface.
In this case, the setup requirement of the external device contributes to the external delay, because setup is defined as the minimum time data must be stable before the clock reaches the external device; the setup time therefore adds directly to the external delay.
So for the maximum delay, we use the setup requirement plus the maximum data trace delay minus the minimum clock trace delay.
For hold, the hold time subtracts from the external delay, because hold time is defined as the amount of time data must remain stable after the clock edge. That delay must come from the FPGA, which means the external delay is negative.
So the minimum delay is -tH plus the minimum data trace delay minus the maximum clock trace delay.
Again, these are the same equations as for common-clock I/O interfaces.
Once we derive the values, we enter the delays into the set_output_delay -max and -min statements. Note that it's very important to reference the output delays to the generated clock at the output clock pin.
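A sketch of the system-centric output calculation. The tSU, tH, and trace values are illustrative, and clk_out is assumed to be the name of the generated clock at the output clock pin.

```python
def output_delays_from_tsu_th(tsu, th, data_trace_max, data_trace_min,
                              clk_trace_max, clk_trace_min):
    """System-centric output delays (ns) from the downstream device's tSU/tH."""
    # Setup adds to the external delay; slow data and fast clock are worst case.
    out_max_delay = tsu + data_trace_max - clk_trace_min
    # Hold subtracts from the external delay; fast data and slow clock are worst case.
    out_min_delay = -th + data_trace_min - clk_trace_max
    return out_max_delay, out_min_delay


def set_output_delay_lines(clock, port, max_ns, min_ns):
    """Output delays reference the generated clock at the clock output pin."""
    return [
        f"set_output_delay -clock {clock} -max {max_ns:.3f} [get_ports {{{port}}}]",
        f"set_output_delay -clock {clock} -min {min_ns:.3f} [get_ports {{{port}}}]",
    ]


mx, mn = output_delays_from_tsu_th(1.0, 0.5, 0.6, 0.5, 0.55, 0.45)
for line in set_output_delay_lines("clk_out", "data_out[*]", mx, mn):
    print(line)
```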
For the max delay, the setup relationship is simply the latch edge minus the launch edge; if the total data delay is less than that, timing is met. The skew requirement means the delay inside the FPGA must not exceed the skew value, so the maximum delay external to the FPGA is the setup latch edge minus the setup launch edge minus the skew. Depending on whether the data is center aligned or edge aligned, we use different values for latch minus launch.
For the min delay, the skew represents the minimum amount of time data must remain stable past the hold relationship. So the minimum output delay that does not violate skew at hold is simply the skew plus the hold relationship, where the hold relationship is the hold latch edge minus the hold launch edge (a negative value).
We'll see diagrams of these equations on the next few slides.
At the top, for setup, the skew is the amount of delay allowed inside the FPGA, so the external delay is latch minus launch minus skew, which is period/2 minus skew.
The bottom diagram is our hold diagram: the skew is the amount of time after the hold relationship that data must remain stable. So in the diagram, the minimum external delay is the distance from the skew edge to the latch edge, which is skew minus period/2.
So in the edge-aligned case, the maximum external delay from the FPGA I/O pin to the latch edge becomes just the negative skew.
And the minimum output delay becomes skew minus period.
For both of these, the latch clock edge is expected to be shifted, which is why the equations look the way they do.
Using the equations from the previous slides for a center-aligned output with the skew approach, given a period of 8 ns and a skew of 0.7 ns:
The max delay is 3.3 ns, the maximum amount of data delay allowed outside the FPGA while still meeting the skew requirement for setup.
The min delay is -3.3 ns, the minimum amount by which data may be shifted outside the FPGA before the next launch while still meeting the 0.7 ns skew (hold) requirement.
Once we have used Tcl variables to calculate the values, we use set_output_delay -max and -min to enter the constraints in the SDC file, referencing the clk_out clock we created at the output clock pin.
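The skew-based output equations for both alignments can be sketched in one helper. It reproduces the 3.3 ns and -3.3 ns values of the center-aligned example; the edge-aligned branch follows the -skew and skew-minus-period equations given earlier and assumes the same-edge setup relationship used for edge-aligned outputs.

```python
def output_delays_from_skew(period_ns, skew_ns, center_aligned=True):
    """Output delays (ns) from a maximum clock-to-data skew spec, SDR."""
    if center_aligned:
        setup_rel = period_ns / 2   # setup latch minus setup launch
        hold_rel = -period_ns / 2   # hold latch minus hold launch
    else:
        setup_rel = 0.0             # same-edge setup relationship
        hold_rel = -period_ns
    out_max_delay = setup_rel - skew_ns
    out_min_delay = skew_ns + hold_rel
    return out_max_delay, out_min_delay


mx, mn = output_delays_from_skew(8.0, 0.7)
print(f"{mx:.1f} {mn:.1f}")  # 3.3 -3.3
```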
TimeQuest assumes all output pins carry data and tries to analyze them; if min and max output delays are not specified, the output is flagged as an unconstrained path.
When we bring the clock signal out to a pin for source-synchronous purposes, it is also falsely analyzed as data.
Since it’s a clock with no destination register, we’ll need to cut off the data analysis.
Remember, the easiest way to do this is with the set_false_path command. Here I simply need set_false_path -to with the name of the clk_out port.
When we use set_false_path here, only the analysis of clk_out as data is cut; clk_out as part of the clock path for the data outputs is not affected.
However, with edge-aligned interfaces the default setup relationship of one full clock cycle is not the one we want. Instead, we need the setup relationship to fall in the same clock cycle, since the latch clock will eventually be shifted by approximately 180 degrees.
So we need to set a multicycle path in those circumstances. As a review, multicycle paths are used to move the setup or hold relationship to a specified edge. The default setup multicycle is 1, so with edge-aligned outputs a setup multicycle of 0 is required to move the analysis to the same edge. A hold multicycle is not needed because, again as a review, the hold relationship moves with the setup relationship.
In the constraints shown, notice that an offset variable was created. The tx offset variable is used to adjust the PLL so that the output clock is exactly edge aligned with the data at the interface, compensating for any clock and data path differences.
As long as the offset is less than or equal to zero, the multicycle exception is required. If the offset is greater than 0, the multicycle is not needed, because the default setup relationship, which is the smallest relationship greater than 0, is the setup relationship we want.
As before, the offset variable must match the actual PLL offset set in the megafunction.
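The two output exceptions discussed so far can be sketched as one helper: the false path on the forwarded clock is always emitted, and the setup multicycle of 0 only when the offset is less than or equal to zero. The clock and port names are placeholders, and scoping the multicycle with clock-to-clock -from/-to targets is an assumption; the slide may scope it differently.

```python
# Hypothetical helper; clock and port names are placeholders.
def output_exceptions(tx_offset_ns, data_clock="tx_data_clk",
                      clk_out_clock="clk_out", clk_out_port="clk_out"):
    """Timing exceptions for an SDR source-synchronous output."""
    lines = [
        # clk_out is a forwarded clock, not data: cut only its data analysis.
        f"set_false_path -to [get_ports {{{clk_out_port}}}]",
    ]
    if tx_offset_ns <= 0:
        # Default setup would use the next edge; a setup multicycle of 0
        # moves the check to the same edge.
        lines.append(
            f"set_multicycle_path -setup -end -from [get_clocks {{{data_clock}}}] "
            f"-to [get_clocks {{{clk_out_clock}}}] 0"
        )
    return lines


for line in output_exceptions(0.0):
    print(line)
```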
The DDIO register has two output registers: one launches on the rising clock edge and one on the falling clock edge. TimeQuest understands the polarity of these registers and will analyze both rising-edge and falling-edge launches.
However, because we have tied the data bus to both DDIO inputs, data launched by the falling edge is exactly the same as data launched by the rising edge. The falling edge therefore never causes a data change, and we do not need to analyze data launched by the falling edge of the clock.
To resolve this, we use the set_false_path command with the -fall_from option on the tx data clock, the clock that launches the data.
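A one-line sketch of that exception. Adding a -to target on the data outputs is an assumption that keeps the cut from touching other falling-edge paths of the same clock; the clock name tx_data_clk and port name data_out[*] are placeholders.

```python
# Hypothetical helper; clock and port names are placeholders.
def ddio_fall_false_path(launch_clock="tx_data_clk", data_port="data_out[*]"):
    """Cut falling-edge launches from a DDIO whose two inputs carry the same data."""
    return (f"set_false_path -fall_from [get_clocks {{{launch_clock}}}] "
            f"-to [get_ports {{{data_port}}}]")


print(ddio_fall_false_path())
```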
Diagnostic reports tell you whether things make sense: Are any constraints ignored? Which signals cross clock domains? This is particularly useful here because we separated the input clocks from the virtual clocks, and the output generated clocks from the clocks driving the output data, so we can easily see the I/O clock crossings.
For a quick look at which paths cross the source-synchronous interfaces, look at the setup and hold summary reports for the input and output paths. For a detailed look at either the inputs or the outputs, run the report_timing command.
Finally, for the best possible timing, you can analyze the setup and hold margins of all outputs at all corners and then adjust the phase of the output clock so that the setup and hold margins are equal, which gives the widest data valid window.
Notice that the input data delay entered, 0.7 ns, shows up in both the data arrival path report and the waveform.
It is also important to verify the setup relationship. You can see it in the path summary, but it's easiest to see in the waveform. Here the launch edge is precisely half a clock cycle ahead of the latch edge, meaning this is a correctly constrained center-aligned input.
For this path, after the earliest data required time and the latest data arrival time are analyzed, we have a setup slack of 1.394 ns, meaning this source-synchronous interface passes timing.
Here you see that the latch clock is the tx clock at the output pin. The delay I derived and specified in the SDC file, 0.8 ns in this case, shows up in the data required path. This is because outputs are always analyzed at the FPGA I/O, so any external delay is subtracted from the data required time. This is true for common-clock synchronous I/Os as well.
When analyzing edge-aligned output interfaces, make sure the launch and latch relationship matches what you expect. In our case, the waveform shows launch and latch in the same clock cycle, which matches our edge-aligned behavior.
This interface passes timing by 0.014 ns.
If you want to do this, analyze output setup and hold timing at all operating conditions, take the average setup margin and the average hold margin, and then, based on the difference between them, adjust the PLL so that the average setup and hold margins are equal. This centers the data valid window.
To make this easy you should write a timing analysis script to accomplish this in one step.
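As a starting point for such a script, the phase calculation itself might look like the sketch below. It is a simplified model, assuming that shifting the output clock later by d adds d to setup slack and removes d from hold slack; the slack values are hypothetical, and collecting them from TimeQuest at each operating condition is left out.

```python
def centering_phase_shift(setup_slacks_ns, hold_slacks_ns, period_ns):
    """Estimate the output-clock shift that equalizes average setup and hold slack.

    Simplified model: delaying the latch clock by d adds d to setup slack and
    removes d from hold slack, so the averages meet when d = (hold - setup) / 2.
    """
    avg_setup = sum(setup_slacks_ns) / len(setup_slacks_ns)
    avg_hold = sum(hold_slacks_ns) / len(hold_slacks_ns)
    shift_ns = (avg_hold - avg_setup) / 2      # > 0 means delay the output clock
    shift_deg = shift_ns / period_ns * 360.0   # same shift as a PLL phase in degrees
    return shift_ns, shift_deg


# Hypothetical worst-case slacks from two operating conditions.
shift_ns, shift_deg = centering_phase_shift([0.2, 0.3], [1.0, 0.9], 8.0)
print(f"{shift_ns:.3f} ns, {shift_deg:.2f} deg")
```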
If you would like to learn more about source synchronous interfaces and other timing analysis techniques from an in-person instructor please sign up for the advanced timing analysis instructor-led training.
If you would like to learn more about Tcl syntax, see our free Introduction to Tcl online training class.
In this course we focused on single data rate source-synchronous interfaces. If you'd like to learn more about double data rate source-synchronous interfaces, please see our follow-on training.
There is a good document on constraining source-synchronous interfaces on the Altera Wiki, and Altera also provides Application Note 433 (AN 433), which documents how to constrain and analyze source-synchronous interfaces.