Steps to isolate DIMM failure vs. DIMM slot failure on Intel® Server Board Product Families
"Correctable ECC Error threshold reached" errors were reported for multiple DIMMs across multiple servers based on the Intel® Server Board S2600WF.
ECC error persists even after multiple DIMM replacements.
If an ECC error persists even after multiple DIMM replacements, a complete test is required to isolate DIMM failure versus board DIMM slot failure.
Rearrange the memory to see whether the flagged DIMM still reports ECC errors in other slots. If it does, the DIMM itself is damaged or slightly damaged.
If an ECC error is reported on the same DIMM slot with a different DIMM installed in that slot, check the socket for debris or dust, which may cause a faulty connection. If there is no debris or dust, the fault could be in the board's DIMM slot, and the S2600WF board needs to be replaced.
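The two checks above amount to a simple rule: an error that follows a DIMM across slots points at the DIMM, while an error that stays with a slot across DIMMs points at the slot. A minimal sketch of that rule follows. The function name `classify`, the DIMM labels, and the slot labels are hypothetical, and the observations are assumed to have been recorded by hand from the system event log.

```python
from collections import defaultdict

def classify(observations):
    """Separate suspect DIMMs from suspect slots.

    observations: iterable of (dimm_id, slot, had_ecc_error) tuples,
    one per test run. All identifiers are illustrative labels.
    """
    slots_by_dimm = defaultdict(set)  # DIMM -> slots where it reported errors
    dimms_by_slot = defaultdict(set)  # slot -> DIMMs that reported errors there
    for dimm, slot, had_error in observations:
        if had_error:
            slots_by_dimm[dimm].add(slot)
            dimms_by_slot[slot].add(dimm)
    # A DIMM that errors in more than one slot is suspect.
    suspect_dimms = sorted(d for d, s in slots_by_dimm.items() if len(s) > 1)
    # A slot that errors with more than one DIMM is suspect.
    suspect_slots = sorted(s for s, d in dimms_by_slot.items() if len(d) > 1)
    return suspect_dimms, suspect_slots

# Hypothetical example: the error follows DIMM-A to a second slot.
runs = [
    ("DIMM-A", "CPU1_A1", True),
    ("DIMM-A", "CPU1_B1", True),
    ("DIMM-B", "CPU1_A1", False),
]
print(classify(runs))
```

A slot flagged by this check should still be inspected for debris or dust before the board is treated as faulty.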
If any DIMM in the system has a slight or potential failure, the steps below will detect it. This process can be slow, but it can identify a potential issue with a specific DIMM component.
Testing steps:
- Remove all DIMMs.
- Follow the DIMM Population Guidelines section in the Technical Product Specifications for Intel® Server Products and install only one DIMM that has not reported an ECC error in the past.
- Start the system with one DIMM installed and run it for some time. Check for any ECC error.
- Follow the same guidelines and install a second DIMM that has not reported an ECC error in the past.
- Start the system with two DIMMs installed and run it for some time. Check for any ECC error.
- Follow the same guidelines and install a third DIMM that has not reported an ECC error in the past.
- Start the system with three DIMMs installed and run it for some time. Check for any ECC error.
- Follow the same guidelines and install a fourth DIMM that has not reported an ECC error in the past.
- Start the system with four DIMMs installed and check for any ECC error.
- Repeat the same steps, installing one more DIMM each time and starting the system. Check for any ECC error.
- Continue the test until all good DIMMs are populated.
- Follow the same steps to install the DIMM that was reporting an ECC error and start the system. Check for any ECC error.
At this point every original DIMM slot, including the slot that originally reported the ECC error, has been populated and tested.
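The incremental procedure can be written down as a test plan before starting, which makes it easier to record which step first shows an error. The sketch below is illustrative only. `incremental_plan` and `first_failing_step` are hypothetical helper names, and `has_ecc_error` is a placeholder callback for the manual check of the system event log after each run, not an Intel tool or API.

```python
def incremental_plan(good_dimms, suspect_dimms):
    """Return a list of (step_number, installed_dimms).

    Known-good DIMMs are added one per step, and DIMMs that previously
    reported ECC errors are added last, as in the testing steps.
    """
    order = list(good_dimms) + list(suspect_dimms)
    return [(i, order[:i]) for i in range(1, len(order) + 1)]

def first_failing_step(plan, has_ecc_error):
    """Return (step_number, newly_added_dimm) for the first step at which
    has_ecc_error(installed_dimms) is true, or None if no step fails."""
    for step, installed in plan:
        if has_ecc_error(installed):
            return step, installed[-1]
    return None

# Hypothetical example: three known-good DIMMs and one suspect DIMM.
plan = incremental_plan(["G1", "G2", "G3"], ["S1"])
for step, installed in plan:
    print(f"Step {step}: install {installed[-1]}, boot with {len(installed)} DIMM(s)")
```

If an error first appears at a given step, the DIMM added at that step and the slot it occupies are the first things to examine, since they are the only change from the previous error-free configuration.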