"Correctable ECC Error threshold reached" errors were reported for multiple DIMMs on multiple servers based on the Intel® Server Board S2600WF.
ECC error persists even after multiple DIMM replacements.
If an ECC error persists even after multiple DIMM replacements, a complete test is required to isolate DIMM failure versus board DIMM slot failure.
Rearrange the memory to see whether the marked DIMM still reports ECC errors in other slots. If it does, the DIMM itself is damaged, even if only slightly.
If an ECC error is reported on the same DIMM slot with a different DIMM installed, check the socket for debris or dust that could cause a faulty connection. If there is no debris or dust, the fault may be in the board's DIMM slot, and the S2600WF board needs to be replaced.
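The isolation logic above (errors that follow a DIMM versus errors that stay with a slot) can be sketched as a small script for keeping track of test runs. This is only an illustrative sketch, not an Intel tool: the function name `classify_ecc_fault`, the slot labels, and the DIMM identifiers are hypothetical, and the observations are assumed to be recorded by hand from the system event log after each rearrangement.

```python
from collections import defaultdict


def classify_ecc_fault(observations):
    """Group ECC observations to separate DIMM faults from slot faults.

    observations: iterable of (slot, dimm_id, had_ecc_error) tuples,
    one per test run. All names here are hypothetical labels.
    """
    slots_by_dimm = defaultdict(set)  # DIMM -> slots where it reported errors
    dimms_by_slot = defaultdict(set)  # slot -> DIMMs that reported errors there

    for slot, dimm_id, had_ecc_error in observations:
        if had_ecc_error:
            slots_by_dimm[dimm_id].add(slot)
            dimms_by_slot[slot].add(dimm_id)

    # A DIMM that reports errors in more than one slot is itself suspect.
    suspect_dimms = sorted(d for d, s in slots_by_dimm.items() if len(s) > 1)
    # A slot that reports errors with more than one DIMM is suspect
    # (inspect it for debris or dust before replacing the board).
    suspect_slots = sorted(s for s, d in dimms_by_slot.items() if len(d) > 1)

    return {"suspect_dimms": suspect_dimms, "suspect_slots": suspect_slots}


if __name__ == "__main__":
    # Hypothetical runs: DIMM "D1" fails in slot "A1", then again in "B1".
    runs = [("A1", "D1", True), ("B1", "D1", True), ("A1", "D2", False)]
    print(classify_ecc_fault(runs))
```

A DIMM listed under `suspect_dimms` is a candidate for replacement. A slot listed under `suspect_slots` should first be inspected for debris or dust, as described above.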
If any DIMM in the system has a slight or potential failure, the steps below will detect it. The process can be slow, but it can identify a potential issue with a specific DIMM component.
Testing steps:
At this point, all of the original DIMM slots should be populated and tested, including the slot that originally reported the ECC error.