Agilex™ 5 ES Device Errata and User Guidelines

ID 825514
Date 12/09/2024
Public

3.6.1. 2384374: Failure to forward highest priority interrupt

Description

The Distributor (GICD) has a buffer where it stores the next SET or CLEAR packet to be sent to any CPU.

If the GIC is waiting to send a message to a CPU, then re-evaluates that CPU and finds a new, higher-priority message, it fails to update the request and assumes that the new packet has been sent.

There are two possible failure modes:

  • Part 1: If the packet to be sent is a SET packet, a higher-priority SET may not be sent when it should be, and is held back until an unblocking event occurs.
  • Part 2: If the packet is a CLEAR (caused by an SPI recall) and the same interrupt is released from the CPU, a further SET packet may be delayed until an unblocking event occurs.

Note: Relevant releases can only be generated if the corresponding CPU group enable is being disabled in the CPU.

Unblocking events are any of the following:

  1. Toggling any GICR_CTLR.DPG<x> bit of the impacted CPU
  2. Toggling any cpu_group_enable bit of the impacted CPU
  3. Toggling any gicd_group_enable bit
  4. Activate/Release of the outstanding interrupt on the impacted CPU
  5. Another interrupt arrives that is accepted as one of the top 5 (3 if no LPI) interrupts for the impacted CPU

Conditions

Interrupt in the following description refers to LPI (if configured) or SPI.

Part 1: (SET)

  1. A CPU has no SPI or LPI sent to it. Note this can occur after the activation of an interrupt. It does not mean there are no interrupts pending for a CPU in the GIC.
  2. The GIC identifies a new interrupt (A) and generates a SET packet.
  3. A new higher-priority interrupt (B) arrives, so the GIC attempts to re-evaluate the CPU but fails to send the new interrupt because the previous SET packet has not yet been sent.
  4. Interrupt (B) is not sent until an unblocking event occurs.
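
The Part 1 sequence can be pictured with a small host-side toy model. This is only a sketch of the steps listed above, not a description of the GIC's internal design: the `cpu_slot` structure, its fields, and both functions are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name and field here is invented for illustration. */
typedef struct {
    bool     set_waiting;   /* a SET packet is buffered but not yet sent       */
    uint32_t buffered_id;   /* INTID carried by the buffered SET packet        */
    uint8_t  buffered_prio; /* its priority (lower value = higher priority)   */
    bool     stalled;       /* a higher-priority interrupt was passed over     */
    uint32_t stalled_id;    /* INTID of the passed-over interrupt              */
    uint8_t  stalled_prio;  /* priority of the passed-over interrupt           */
} cpu_slot;

/* A newly pending interrupt causes the CPU to be re-evaluated. */
static void model_new_interrupt(cpu_slot *s, uint32_t id, uint8_t prio)
{
    assert(s != NULL);
    if (!s->set_waiting) {
        /* Steps 1-2: nothing outstanding, so a SET packet is generated. */
        s->set_waiting   = true;
        s->buffered_id   = id;
        s->buffered_prio = prio;
    } else if (prio < s->buffered_prio) {
        /* Step 3: the buffered request is NOT updated, so the
         * higher-priority interrupt is left behind. */
        s->stalled      = true;
        s->stalled_id   = id;
        s->stalled_prio = prio;
    }
}

/* Step 4: only an unblocking event refreshes the buffered request. */
static void model_unblocking_event(cpu_slot *s)
{
    assert(s != NULL);
    if (s->stalled) {
        s->buffered_id   = s->stalled_id;
        s->buffered_prio = s->stalled_prio;
        s->stalled       = false;
    }
}
```

Feeding the model interrupt (A) and then a higher-priority interrupt (B) leaves (A) in the buffer with (B) marked as stalled; calling `model_unblocking_event` is what finally promotes (B).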

Part 2: (CLEAR)

  1. A CPU has an SPI sent to it and no other pending interrupts identified.
  2. The SPI is recalled as it becomes disabled or non-pending causing a CLEAR packet to be created.
  3. The corresponding CPU group enable is cleared, causing the SPI to be released.
  4. The GIC identifies a new interrupt (C) for the same CPU and attempts to send it but fails as the CLEAR packet has not yet been issued.
  5. Interrupt (C) is not sent until an unblocking event occurs.

Impact

Part 1 (SET)

If the Priority Mask Register (PMR) is not being used, then this appears as a temporary mis-prioritisation with no real impact.

However, if the PMR is being used to mask the first SET while waiting for the second, the system may hang, as the second SET will not necessarily be delivered in a finite time.

In cases where an OS does use the PMR, it is expected that the value will ultimately be relaxed to allow all interrupts to be serviced, which would allow servicing of any stalled interrupts to continue. This is the case in upstream Linux, which only uses PMR to create pseudo-NMIs for profiling purposes.

Part 2 (CLEAR)

If the interrupt being targeted by the CLEAR is released from the CPU before the CLEAR is sent to the CPU, then a subsequent SET packet may not be delivered in a finite time.

Workaround

Part 1 (SET)

If not using the PMR, then no workaround is expected to be required.

However, if PMR functionality must be used then either:

  • Periodically toggle GICR_CTLR.DPG<x> to ensure that interrupts are delivered. The required frequency will be a function of system interrupt latency tolerance.

OR

  • If not using LPIs, then use GICD_I(S|C)ENABLERn to model PMR functionality by disabling interrupts that would otherwise be masked. Note there is no need to poll GICD_CTLR.RWP in this case.
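
As a rough illustration of the second option, the sketch below models a priority mask by writing GICD_ISENABLERn and GICD_ICENABLERn. The register offsets are the architectural GICv3 ones; the helper name `soft_pmr_apply`, the `wanted` bitmap, and the way the Distributor base is passed in are assumptions made for this example, and a real system would also have to account for its own security and routing configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural GICv3 Distributor offsets, in bytes from the GICD base. */
#define GICD_ISENABLER   0x0100u  /* write 1 to enable an interrupt  */
#define GICD_ICENABLER   0x0180u  /* write 1 to disable an interrupt */
#define GICD_IPRIORITYR  0x0400u  /* one priority byte per INTID     */

/*
 * Hypothetical helper: emulate a PMR-style threshold for SPIs.
 *   gicd      - base of the Distributor register frame (platform specific)
 *   wanted    - bitmap of SPIs the software wants enabled, one word per
 *               32 INTIDs, indexed like GICD_ISENABLER<n>
 *   num_words - number of words to process (word 0 = SGIs/PPIs, skipped)
 *   mask      - only priorities numerically lower than mask stay enabled
 */
static void soft_pmr_apply(volatile uint32_t *gicd, const uint32_t *wanted,
                           size_t num_words, uint8_t mask)
{
    assert(gicd != NULL && wanted != NULL);
    volatile uint8_t *prio = (volatile uint8_t *)gicd + GICD_IPRIORITYR;

    for (size_t n = 1; n < num_words; n++) {
        uint32_t enable = 0, disable = 0;

        for (uint32_t bit = 0; bit < 32; bit++) {
            uint32_t flag = UINT32_C(1) << bit;
            if (!(wanted[n] & flag))
                continue;                 /* not managed by this helper   */
            if (prio[32 * n + bit] < mask)
                enable |= flag;           /* would pass a PMR of 'mask'   */
            else
                disable |= flag;          /* would be masked by the PMR   */
        }
        /* Zero bits are ignored by both registers, so only the selected
         * interrupts change state. Per the workaround text, GICD_CTLR.RWP
         * is not polled here. */
        if (disable)
            gicd[GICD_ICENABLER / 4 + n] = disable;
        if (enable)
            gicd[GICD_ISENABLER / 4 + n] = enable;
    }
}
```

Calling the helper again with a mask of 0xFF would re-enable every managed interrupt whose priority is below 0xFF, which corresponds to relaxing the PMR.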

Part 2 (CLEAR)

Software should issue a DSB and toggle GICR_CTLR.DPG<x> after clearing the corresponding CPU group enable.
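
A minimal sketch of this sequence follows. It assumes that "toggle" means flipping one DPG bit and then restoring it, and that the caller has already cleared the CPU group enable; the helper name and the way the GICR_CTLR address is obtained are hypothetical. The DPG bit positions are the architectural GICv3 ones. Whether software should also wait on GICR_CTLR.RWP between the two writes is not stated in the erratum text, so that is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GICR_CTLR "Disable Processor Group" bits (GICv3 architecture). */
#define GICR_CTLR_DPG0    (UINT32_C(1) << 24)
#define GICR_CTLR_DPG1NS  (UINT32_C(1) << 25)
#define GICR_CTLR_DPG1S   (UINT32_C(1) << 26)

/* DSB on AArch64; a plain compiler barrier elsewhere so the sketch
 * still builds on a development host. */
static inline void barrier_dsb(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");
#endif
}

/*
 * Hypothetical helper, intended to be called after software has cleared
 * the CPU group enable. Flips one DPG bit, then restores the original
 * value. Returns the intermediate value that was written.
 */
static uint32_t gicr_dsb_and_toggle_dpg(volatile uint32_t *gicr_ctlr,
                                        uint32_t dpg_bit)
{
    assert(gicr_ctlr != NULL);

    barrier_dsb();                        /* DSB called for by the workaround */
    uint32_t original = *gicr_ctlr;
    uint32_t flipped  = original ^ dpg_bit;
    *gicr_ctlr = flipped;                 /* first half of the toggle         */
    *gicr_ctlr = original;                /* restore the configured value     */
    return flipped;
}
```

The same helper could also be driven from a periodic timer for the Part 1 workaround, with the period chosen according to the system's interrupt latency tolerance.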

Category

Category B