Article ID: 000038234
Content Type: Error Messages
Last Reviewed: 05/30/2023

What is a Memory Error Correction Code (ECC) Correctable Error Event?

Environment

Intel® Server Board Product Family and Intel® Server System Product Family

Summary

A guide to memory ECC correctable errors and when they trigger an event

Description

Steps to follow when an ECC correctable error event is logged in the System Event Log (SEL)

Resolution

An ECC correctable error event indicates that the number of correctable errors on a given Dual In-line Memory Module (DIMM) has exceeded a threshold within a given timeframe.

  • If there is no catastrophic issue (Purple Screen Of Death (PSOD) or unexpected restart) and the correctable ECC errors, including Adaptive Double Device Data Correction (ADDDC) errors, number fewer than 10 events within every 24 hours for each DIMM location, the errors are within the threshold limit. The recommendation is to monitor each DIMM location that triggered the event for any recurrence of the ECC error.
  • If there is a catastrophic issue (Purple Screen Of Death (PSOD) or unexpected restart) and the correctable ECC errors, including ADDDC errors, number more than 10 events within every 24 hours for a DIMM location, it is recommended to re-seat each affected DIMM by following the steps below:
    1. Power OFF the system and remove the AC power cable.
    2. Identify the DIMM location to re-seat. Refer to the Technical Product Specifications for your server platform to identify the DIMM location.
    3. Re-seat the identified DIMM(s).
    4. Reconnect the AC power cable and power ON the system.
    5. Observe the system for 24 hours for any recurrence of the ECC error.
    6. If the ECC error persists at the same DIMM location that was re-seated, generate the SEL and Debug logs from the BMC Web Console and send both to Intel Customer Support.
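
The threshold check described above can be sketched in a few lines of Python. This is only an illustration, not an Intel tool: it assumes the correctable ECC events have already been extracted from the SEL as (timestamp, DIMM location) pairs, the function names and the slot label CPU1_DIMM_A1 are hypothetical, and the 24-hour window is treated as a sliding window. Combinations that the guidance above does not cover (for example, more than 10 events without a catastrophic issue) are flagged as "review" rather than guessed at.

```python
from collections import defaultdict
from datetime import datetime, timedelta

THRESHOLD = 10                 # events per DIMM location, from the guidance above
WINDOW = timedelta(hours=24)   # observation window, from the guidance above


def max_events_in_window(timestamps, window=WINDOW):
    """Return the largest number of events inside any sliding window of length `window`."""
    ordered = sorted(timestamps)
    best = 0
    start = 0
    for end, ts in enumerate(ordered):
        # Drop events from the front that are a full window (or more) older than this one.
        while ts - ordered[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def classify_dimms(events, catastrophic, threshold=THRESHOLD):
    """Map each DIMM location to 'monitor', 're-seat', or 'review'.

    events: iterable of (datetime, dimm_location) pairs for correctable ECC
            (including ADDDC) events; this pair layout is an assumption.
    catastrophic: True if a PSOD or unexpected restart occurred.
    """
    by_dimm = defaultdict(list)
    for ts, dimm in events:
        by_dimm[dimm].append(ts)

    actions = {}
    for dimm, stamps in by_dimm.items():
        count = max_events_in_window(stamps)
        if not catastrophic and count < threshold:
            actions[dimm] = "monitor"
        elif catastrophic and count > threshold:
            actions[dimm] = "re-seat"
        else:
            # Combination not covered by the guidance; flag for manual review.
            actions[dimm] = "review"
    return actions


# Example with made-up data: 12 events, 30 minutes apart, on one hypothetical slot.
base = datetime(2023, 5, 30)
sample = [(base + timedelta(minutes=30 * i), "CPU1_DIMM_A1") for i in range(12)]
print(classify_dimms(sample, catastrophic=True))  # {'CPU1_DIMM_A1': 're-seat'}
```

The sliding window is a design choice: it catches bursts that straddle midnight, which a fixed calendar-day count would split in two.
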
Notes

Correctable Error Correction Code (ECC) errors are corrected automatically. Depending on the Reliability, Availability, and Serviceability (RAS) configuration of the memory, the Integrated Memory Controller (IMC) may take the affected DIMM offline.

Event definitions differ between Intel server platforms. Refer to the System Event Log Troubleshooting Guide for your server platform.

Intel recommends updating the system BIOS to the latest available version for your server platform.

If the system is one of the Intel® Data Center Systems certified for Nutanix* Enterprise Cloud Platform, visit the Nutanix* Life Cycle Manager page. For hardware and firmware compatibility information, visit the Nutanix* Hardware and Firmware compatibility page.

Related Products

This article applies to 154 products

Intel® Server System D50TNP1MHCRAC Compute Module
Intel® Server System D50TNP1MHCRLC Compute Module
Intel® Server System D50TNP1MHEVAC Compute Module
Intel® Server System D50TNP2MFALAC Acceleration Module
Intel® Server System D50TNP2MHSTAC Storage Module
Intel® Server System D50TNP2MHSVAC Management Module
Intel® Compute Module HNS2600BPB
Intel® Compute Module HNS2600BPB24
Intel® Compute Module HNS2600BPB24R
Intel® Compute Module HNS2600BPBLC
Intel® Compute Module HNS2600BPBLC24
Intel® Compute Module HNS2600BPBLC24R
Intel® Compute Module HNS2600BPBLCR
Intel® Compute Module HNS2600BPQ
Intel® Compute Module HNS2600BPQ24
Intel® Compute Module HNS2600BPQ24R
Intel® Compute Module HNS2600BPQR
Intel® Compute Module HNS2600BPS
Intel® Compute Module HNS2600BPS24
Intel® Compute Module HNS2600BPS24R
Intel® Compute Module HNS2600BPSR
Intel® Server Board S2600STK
Intel® Server Board S2600STS