SEU Mitigation User Guide: Agilex™ 5 FPGAs and SoCs

ID 813649
Date 9/20/2024
Public

1.5. Triple Modular Redundancy

Triple modular redundancy (TMR) is an established SEU mitigation technique for improving hardware fault tolerance. Use TMR if your system cannot suffer downtime caused by an SEU.

A TMR design has three identical instances of hardware with voting logic at the output. If an SEU affects one of the hardware instances, the other two instances still agree, and the voting logic selects the majority output. This operation masks the malfunctioning hardware.
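As an illustration of the voting step, the following Python sketch models a bitwise 2-of-3 majority voter in software. The function name `majority_vote` and the sample values are assumptions made for illustration and are not part of any Agilex 5 tool or IP; in an actual design, the voter is implemented in logic.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise 2-of-3 majority of three module outputs."""
    # Each output bit is 1 when at least two of the three inputs have a 1 there.
    return (a & b) | (b & c) | (a & c)


# The three replicas are expected to produce the same 8-bit word.
expected = 0b10110010
# Model an SEU that flips bit 3 in the second replica only.
upset = expected ^ 0b00001000

# The voter masks the single corrupted replica.
voted = majority_vote(expected, upset, expected)
```

Because the vote is taken per bit, a single corrupted replica cannot change the voted result, whichever bit the upset flips.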

With TMR, your design does not suffer downtime in the case of a single SEU:

  • When the system detects a faulty module, the system scrubs the error by reprogramming the module.
  • The error detection and correction time is many orders of magnitude shorter than the mean time between failures (MTBF) caused by SEUs.
  • The system can repair a soft error before another SEU affects another instance in the TMR application.
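The detect-and-scrub sequence in the list above can be sketched as a small software model. This is a simplified illustration, not the device's scrubbing mechanism: `vote_and_locate`, `scrub`, and the `GOLDEN` reference value are hypothetical names, and reprogramming is modeled as restoring a stored value.

```python
from typing import List, Tuple

GOLDEN = 0b10110010  # hypothetical known-good value held by each replica


def vote_and_locate(outputs: List[int]) -> Tuple[int, List[int]]:
    """Vote on three replica outputs and list the replicas that disagree."""
    a, b, c = outputs
    voted = (a & b) | (b & c) | (a & c)
    # Any replica whose output differs from the voted result is flagged.
    faulty = [i for i, value in enumerate(outputs) if value != voted]
    return voted, faulty


def scrub(replicas: List[int], index: int) -> None:
    """Model reprogramming of one replica by restoring the golden value."""
    replicas[index] = GOLDEN


replicas = [GOLDEN, GOLDEN, GOLDEN]
replicas[2] ^= 0b00000001  # inject a single-bit upset into replica 2

voted, faulty = vote_and_locate(replicas)  # the voted output is still correct
for index in faulty:
    scrub(replicas, index)  # repair the flagged replica
```

In this model, the voted output stays correct while the upset is present, and the scrub step restores full redundancy so that a later upset in a different replica is also masked.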

The disadvantage of TMR is its hardware cost: a TMR design requires three times the hardware of a non-TMR design, plus the voting logic. To minimize the hardware cost, implement TMR for only the most critical parts of your design.
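To see how limiting TMR to critical logic affects cost, the following back-of-the-envelope sketch compares full and partial triplication. All figures (the logic-element counts, the critical portion, and the voter sizes) are made-up illustrative inputs, and `tmr_cost` is a hypothetical helper, not a vendor estimate.

```python
def tmr_cost(total_les: int, critical_les: int, voter_les: int) -> int:
    """Estimate logic elements when only `critical_les` are triplicated."""
    if not 0 <= critical_les <= total_les:
        raise ValueError("critical_les must be between 0 and total_les")
    non_critical = total_les - critical_les
    # Critical logic is instantiated three times; voter logic is added on top.
    return non_critical + 3 * critical_les + voter_les


baseline = 10_000  # hypothetical size of the non-TMR design

# Triplicate the whole design (assumed 200 logic elements of voters).
full = tmr_cost(baseline, baseline, voter_les=200)
# Triplicate only 20% of the design (assumed 50 logic elements of voters).
partial = tmr_cost(baseline, 2_000, voter_les=50)
```

With these assumed inputs, full TMR comes to 30,200 logic elements, while triplicating only the critical 20% comes to 14,050.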

You can automate the generation of TMR designs with tools that replicate designated functions and synthesize the required voting logic. For example, Synopsys* offers a tool that automates TMR synthesis.