How to Recover from an Internal Error (IERR) for Intel® Server Boards

000006043

07/17/2023

What am I seeing?

An IERR is a catastrophic error reported by the processor but generally caused by devices outside of the processor core (e.g., memory, PCIe).

  • Processor execution has stalled, typically because of an event outside the processor.
  • This issue is often accompanied by a CATERR event that can be cross-referenced for additional information.

How to fix it:

Follow these steps in order:

  1. Review the System Event Log (SEL) for error correction code (ECC) events. Defective memory can trigger an IERR.
  2. Review the SEL for any PCIe events. Malfunctioning PCIe devices can trigger an IERR.
  3. Ensure that Operating System (OS) drivers are up to date for the server as well as for any recently added hardware devices. Out-of-date OS drivers can trigger an IERR.
  4. Check the OS logs for any Machine Check Architecture (MCA) entries that point to a hardware fault that could have triggered the IERR.
  5. Confirm that the server is running the latest BIOS.
  6. In the Baseboard Management Controller (BMC) Web Console, go to Configuration > Memory Configuration > PPR Type and set the Post Package Repair (PPR) type to Hard.
  7. If the logs point to one or more specific memory modules as the likely cause, reseat those modules and monitor the server for 24 hours.
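
The log review in steps 1, 2, and 4 can be partly scripted. The following is a minimal sketch in Python, assuming the SEL or OS log has already been exported as plain text with one event per line. The keyword patterns, the triage function name, and the sample lines are illustrative assumptions; the actual event wording varies by board and firmware, so adjust the patterns to match what your logs contain.

```python
import re
from collections import defaultdict

# Hypothetical keyword patterns, one per category named in the steps above.
# Real event wording differs between boards and firmware versions.
PATTERNS = {
    "memory_ecc": re.compile(r"\b(ecc|memory|dimm)\b", re.IGNORECASE),
    "pcie": re.compile(r"\b(pcie|pci express|pci)\b", re.IGNORECASE),
    "caterr_ierr": re.compile(r"\b(caterr|ierr)\b", re.IGNORECASE),
    "machine_check": re.compile(r"\b(mca|mce|machine check)\b", re.IGNORECASE),
}


def triage(lines):
    """Group log lines by category; a line can land in more than one group."""
    buckets = defaultdict(list)
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                buckets[name].append(text)
    return dict(buckets)


# Made-up sample lines for illustration only; not taken from a real server.
SAMPLE = [
    "1 | 01/01/2024 | 10:01:12 | Memory #0x02 | Correctable ECC | Asserted",
    "2 | 01/01/2024 | 10:05:40 | Critical Interrupt #0x05 | PCI SERR | Asserted",
    "3 | 01/01/2024 | 10:05:41 | Processor #0x80 | IERR | Asserted",
    "4 | 01/01/2024 | 10:06:00 | System Event #0x83 | OEM System boot event | Asserted",
]

if __name__ == "__main__":
    for category, events in triage(SAMPLE).items():
        print(f"{category}: {len(events)} event(s)")
        for event in events:
            print(f"  {event}")
```

Events that appear shortly before the IERR entry in the memory or PCIe groups are the first candidates to investigate. A keyword match only narrows the search; it does not confirm a faulty component.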

 

Related topics
My server crashes and shows this error: Processor CPU Machine Chk
For firmware updates and troubleshooting tips
System Event Log Troubleshooting Guides for Intel® Server Boards