Stratix® 10 SoC FPGA Boot User Guide

ID 683847
Date 8/23/2024
Public

7.3. Other Debug Considerations

Peripherals

The first step in the board bring-up process is peripheral testing. Add one interface at a time to your design. After a peripheral passes the tests you create for it, remove it from the test design. Avoid leaving peripherals that pass testing in your design as you move to other peripheral tests. Multiple peripherals can create instability due to noise or crosstalk. By testing peripherals in a system individually, you can isolate issues in your design to a particular interface.

Memory is a common source of failure in any system. The most problematic memory devices operate at high speeds, which can result in timing failures. High-performance memory also requires many board traces to transfer data, address, and control signals, and these traces can cause failures if they are not routed properly. You can use the Nios® II processor to verify your memory devices using verification software or a debugger such as the Stratix® 10 EMIF Toolkit. The Nios® II processor is not capable of stress testing your memory, but you can use it to detect memory address and data line issues.

Data Trace Failure

If your board fabrication facility does not perform bare board testing, you must perform these tests yourself. To detect data trace failures on your memory interface, use a "walking ones" pattern. The "walking ones" pattern shifts a logical 1 through all of the data traces between the FPGA and the memory device.

The pattern can be increasing or decreasing; the important factor is that only one data signal is 1 at any given time. The increasing version of this pattern is 1, 2, 4, 8, 16, and so on. Using this pattern, you can detect data trace issues such as short-circuited or open-circuited signals. A signal is short circuited when it is accidentally connected to another signal. A signal is open circuited when it is accidentally left unconnected. An open circuit can behave randomly unless you connect a pull-up or pull-down resistor to the trace. With a pull-up or pull-down resistor, the undriven signal settles at 1 or 0, respectively; however, the resistor is weak relative to a signal driven by the test, so the test value overrides the resistor.

To avoid mixing potential address and data trace issues in the same test, test only one address location at a time. To perform the test, write the test value out to memory, and then read it back. After verifying that the two values are equal, proceed to testing the next value in the pattern. If the verification stage detects a difference between the written and read values, a bit failure has occurred.

The table below provides an example of the process used to find a data trace failure. It makes the simplifying assumption that sequential data bits are routed consecutively on the PCB.

Table 18.  Data Trace Test ("Walking Ones") Example

Written Value   Read Value   Failure Detected
00000001        00000001     No failure detected.
00000010        00000000     Error, most likely the second data bit, D[1], is stuck low or shorted to ground.
00000100        00000100     No failure detected, confirmed D[1] is stuck low or shorted to another trace that is not listed in this table.
00001000        00001000     No failure detected.
00010000        00010000     No failure detected.
00100000        01100000     Error, most likely D[6] and D[5] are short circuited.
01000000        01100000     Error, confirmed that D[6] and D[5] are short circuited.
10000000        10000000     No failure detected.
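
The walking-ones procedure above can be sketched as a small host-side simulation. This Python is an illustrative model only, not target code and not part of any Intel tool. The names `walking_ones`, `data_trace_test`, and `FaultyDataBus` are hypothetical, and treating a short as a wired-OR of the two traces is an assumption; a real short may resolve differently.

```python
def walking_ones(width):
    """Yield 1, 2, 4, ... so that exactly one bit of a `width`-bit bus is set."""
    for bit in range(width):
        yield 1 << bit


def data_trace_test(write, read, address=0, width=8):
    """Write each walking-ones value to a single address and read it back.

    Returns a list of (written, read_back) pairs that did not match.
    """
    failures = []
    for value in walking_ones(width):
        write(address, value)
        observed = read(address)
        if observed != value:
            failures.append((value, observed))
    return failures


class FaultyDataBus:
    """Hypothetical memory model whose data traces may be damaged."""

    def __init__(self, stuck_low=(), shorted=()):
        self.stuck_low = set(stuck_low)  # data bit indexes forced to 0
        self.shorted = list(shorted)     # pairs of data bit indexes tied together
        self.cells = {}

    def _distort(self, value):
        for a, b in self.shorted:
            # Assumption: a short behaves like a wired-OR of the two traces.
            if (value >> a) & 1 or (value >> b) & 1:
                value |= (1 << a) | (1 << b)
        for bit in self.stuck_low:
            value &= ~(1 << bit)
        return value

    def write(self, address, value):
        self.cells[address] = self._distort(value)

    def read(self, address):
        return self.cells.get(address, 0)


if __name__ == "__main__":
    # Same faults as the example table: D[1] stuck low, D[5] shorted to D[6].
    bus = FaultyDataBus(stuck_low=[1], shorted=[(5, 6)])
    for written, observed in data_trace_test(bus.write, bus.read):
        print(f"wrote {written:08b}, read {observed:08b}")
```

On real hardware, `write` and `read` would be replaced by accesses to the memory under test at one fixed address, so that address trace faults do not affect the result.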

Address Trace Failure

The address trace test is similar to the "walking ones" test used for data with one exception. For this test, you must write to all the test locations before reading back the data. Using address locations that are powers of two, you can quickly verify all the address traces of your circuit board. The address trace test detects the aliasing effects that short or open circuits can have on your memory interface. For this reason, it is important to write to each location with a different data value so that you can detect the address aliasing. You can use increasing numbers such as 1, 2, 3, 4, and so on while you verify the address traces in your system. The table below shows how to use powers of two in the process of finding an address trace failure.

Table 19.  Address Trace Test (Powers of Two) Example

Address    Written Value   Read Value   Failure Detected
00000000   1               1            No failure detected.
00000001   2               2            No failure detected.
00000010   3               1            Error, the second address bit, A[1], is stuck low.
00000100   4               4            No failure detected.
00001000   5               5            No failure detected.
00010000   6               6            No failure detected.
00100000   7               6            Error, A[5] and A[4] are short circuited.
01000000   8               8            No failure detected.
10000000   9               9            No failure detected.
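
The two-phase address test (write every location, then read every location) can be sketched the same way. This Python is a host-side simulation under stated assumptions, not target code. `FaultyAddressBus` and `address_trace_test` are hypothetical names, a short is modeled as a wired-OR of the two address traces, and writes are assumed to run from the highest test address down to address 0. Which location shows a mismatch depends on the write order; with these assumptions the mismatches land on the same rows as in the table above.

```python
class FaultyAddressBus:
    """Hypothetical memory model whose address traces may be damaged."""

    def __init__(self, stuck_low=(), shorted=()):
        self.stuck_low = set(stuck_low)  # address bit indexes forced to 0
        self.shorted = list(shorted)     # pairs of address bit indexes tied together
        self.cells = {}

    def _decode(self, address):
        """Return the address the memory device actually sees."""
        for a, b in self.shorted:
            # Assumption: a short behaves like a wired-OR of the two traces.
            if (address >> a) & 1 or (address >> b) & 1:
                address |= (1 << a) | (1 << b)
        for bit in self.stuck_low:
            address &= ~(1 << bit)
        return address

    def write(self, address, value):
        self.cells[self._decode(address)] = value

    def read(self, address):
        return self.cells.get(self._decode(address), 0)


def address_trace_test(write, read, address_bits=8):
    """Test address 0 and every power-of-two address.

    Returns a list of (address, written, read_back) tuples that did not match.
    """
    addresses = [0] + [1 << bit for bit in range(address_bits)]
    # Give every location a different value (1, 2, 3, ...) to expose aliasing.
    expected = {addr: value for value, addr in enumerate(addresses, start=1)}

    # Phase 1: write all locations before reading any of them.
    # Assumption: writes go from the highest address down to address 0.
    for addr in reversed(addresses):
        write(addr, expected[addr])

    # Phase 2: read every location back and compare.
    failures = []
    for addr in addresses:
        observed = read(addr)
        if observed != expected[addr]:
            failures.append((addr, expected[addr], observed))
    return failures


if __name__ == "__main__":
    # Same faults as the example table: A[1] stuck low, A[4] shorted to A[5].
    bus = FaultyAddressBus(stuck_low=[1], shorted=[(4, 5)])
    for addr, written, observed in address_trace_test(bus.write, bus.read):
        print(f"address {addr:08b}: wrote {written}, read {observed}")
```

A mismatch means that two test addresses reached the same physical location, so the value read back identifies which other address aliased onto it.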

Device Isolation

Using device isolation techniques, you can disable features of devices on your PCB that cause your design to fail. Typically, designers use device isolation for early revisions of the PCB, and then remove these capabilities before shipping the product.

Most designs use crystal oscillators or other discrete components to create clock signals for the digital logic. If the clock signal is distorted by noise or jitter, failures may occur. To guard against distorted clocks, you can route alternative clock pins to your FPGA. If you include SMA connectors on your board, you can use an external clock generator to create a clean clock signal. Having an alternative clock source is very useful when debugging clock-related issues.

Sometimes the noise generated by a particular device on your board can cause problems with other devices or interfaces. Having the ability to reduce the noise levels of selected components can help you determine which device is causing issues in your design. The simplest way to isolate a noisy component is to remove the power source for the device in question. For devices that have a limited number of power pins, if you include 0-ohm resistors in the path between the power source and each pin, you can cut the power to the device by removing the resistors. This strategy is typically not possible with larger devices that contain multiple power pins connecting directly to a board power plane. Instead of removing the power source from a noisy device, you can often put the device into a reset state by driving the reset pin to its active state. Another option is to simply not exercise the device so that it remains idle.

A noisy power supply or ground plane can create signal integrity issues. With the typical voltage swing of digital devices frequently below a single volt, the power supply noise margin of devices on the PCB can be as little as 0.2 volts. Power supply noise can cause digital logic to fail. For this reason, it is important to be able to isolate the power supplies on your board. You can isolate a power supply by using fuses that you can remove, so that you can temporarily substitute a stable external power supply.

JTAG

FPGAs use the JTAG interface for programming, communication, and verification. Designers frequently connect several components, including FPGAs, discrete processors, and memory devices, communicating with them through a single JTAG chain. Sometimes the JTAG signal is distorted by electrical noise, causing a communication failure for the entire group of devices. To guarantee a stable connection, you must isolate the FPGA under test from the other devices in the same JTAG chain.