Article ID: 000056128 | Content Type: Troubleshooting | Last Reviewed: 09/24/2021

How to Troubleshoot Multiple Power Supply Unit (PSU) Failures Detected on Intel® Server System S9200WK Family

Environment

Intel® Server System S9200WK Family

OS Independent family

Summary

Troubleshooting steps for errors seen in the logs related to power, power supplies, or fans.

Description

Examples of error messages seen in the logs:

  • PSU2, AC lost, AC removed.
  • Non-Redundant, sufficient from insufficient. The system is not running in redundant power supply mode. This event is accompanied by the specific power supply error Alternating Current (AC) lost.
  • Non-Redundant, Insufficient. System is not running in redundant power supply mode.
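
One way to look for messages like these is to filter the event log text. The sketch below is a minimal example, assuming the log is available as plain text on standard input. The function name filter_psu_events is hypothetical, and the match patterns are taken only from the example messages above, so adjust them to the wording in your own logs.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that resemble the example
# power-supply messages listed above. Reads log text on stdin.
filter_psu_events() {
  # -i: case-insensitive match, -E: extended regex with alternation
  grep -iE 'AC lost|AC removed|Non-Redundant'
}

# Example usage with a saved log file (the file name is an assumption):
#   filter_psu_events < sel_export.txt
# Or, if ipmitool can reach the BMC:
#   ipmitool sel elist | filter_psu_events
```

A filter like this only narrows the log down to candidate lines; the events still need to be read in context.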
Resolution

Step 1:

  1. Update the BIOS firmware (FW) to the latest version available (Version 22010091 or newer). It includes fixes for power supply unit (PSU) firmware (FW) and Baseboard Management Controller (BMC) communication. For details, refer to the BMC and Field Replaceable Unit and Sensor Data Record (FRUSDR) release notes.
  2. If PSU issues persist after the BIOS FW has been updated, follow Step 2 below.

Step 2:

Workaround: Multiple PSU failures detected

  • If you see errors in the logs related to power, power supplies, or fans, note the color of the status light-emitting diodes (LEDs) and check the sensors to see whether the readings are normal or abnormal.
    • The power supplies (PS1, PS2, PS3) should be within normal ranges for Input Power, Curr Out %, Inlet Temp, Temperature, and redundancy (2+1).
  • If the sensor readings look abnormal, swap the suspect PSUs between slots to determine which ones are actually bad.
    • Does the problem follow the PSU after the swap? If it does, that PSU is the likely cause.
  • If the sensor readings look normal, but there are power-related errors in the logs, check the Status LEDs.
    • If the PSU LEDs stay amber while the system runs a heavy workload, there is a workaround. Running the command below should clear the amber LED:

Command (disables Power Supply Cold Redundancy): ipmitool raw 0x30 0x2d 0x01 0x00

  • If the command above does not resolve the issues reported in the logs, and the LED is still amber after you have cross-checked the PSUs by swapping them around, the suspect PSU will need to be replaced.
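
The sensor check described in this step can be partly scripted. The following is a minimal sketch, assuming the sensor listing (for example from ipmitool sdr type "Power Supply") prints pipe-separated rows with the sensor status in the third column. The function name flag_abnormal_psu_sensors is hypothetical, and sensor names and status values may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print sensor rows whose status column is not "ok".
# Assumes pipe-separated rows: name | id | status | entity | reading.
flag_abnormal_psu_sensors() {
  awk -F'|' '
    NF >= 3 {
      status = $3
      gsub(/^[ \t]+|[ \t]+$/, "", status)   # trim surrounding whitespace
      if (status != "ok") print             # also flags rows with no reading
    }'
}

# Example usage, if ipmitool can reach the BMC:
#   ipmitool sdr type "Power Supply" | flag_abnormal_psu_sensors
```

Rows printed by this helper are only candidates for a closer look; confirm them against the status LEDs and the PSU swap test before replacing hardware.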

False Alarm: Nodes report AC lost

  • Check for false alarms.
    • If the amber LED goes away but AC lost error messages still appear in the logs, check whether the errors were logged by the slave node.
