How to approach "Thrm Trip" events on the Intel® Server System S9200WK Product Family
The following events are reported:
Memory Mem P0D1 Th Trip | Critical Overtemperature
Memory Mem P1D1 Th Trip | Critical Overtemperature
Each S9200WK node has two dies per CPU to achieve the maximum core count:
CPU 0
P0D0 - processor 0, die 0
P0D1 - processor 0, die 1
CPU 1
P1D0 - processor 1, die 0
P1D1 - processor 1, die 1
Therefore, the message "Memory Mem P1D1 Th Trip" means that Processor 1, Die 1 overheated and a thermal trip (Th Trip) event occurred.
Note: It does not mean that the memory DIMM in the D1 slot is defective.
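The naming scheme above can be decoded mechanically when reviewing many event log lines. The following is a minimal Python sketch, not an Intel tool: the function names (parse_thermal_trip, describe) are hypothetical, and the regular expression assumes that event lines look exactly like the two samples shown in this article.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, derived only from the two sample events in this article
# ("Mem P0D1 Th Trip", "Mem P1D1 Th Trip"). Other firmware versions may differ.
_SENSOR_RE = re.compile(r"Mem P(\d+)D(\d+) Th Trip")


class ThermalTrip(NamedTuple):
    processor: int  # the number after "P"
    die: int        # the number after "D" (a die, not a DIMM slot)


def parse_thermal_trip(event_line: str) -> Optional[ThermalTrip]:
    """Return the processor and die named in a 'Th Trip' event line, or None."""
    match = _SENSOR_RE.search(event_line)
    if match is None:
        return None
    return ThermalTrip(processor=int(match.group(1)), die=int(match.group(2)))


def describe(event_line: str) -> str:
    """Turn an event line into a short human-readable description."""
    trip = parse_thermal_trip(event_line)
    if trip is None:
        return "Not a memory thermal trip event"
    return f"Thermal trip on processor {trip.processor}, die {trip.die}"


if __name__ == "__main__":
    print(describe("Memory Mem P1D1 Th Trip | Critical Overtemperature"))
```

The sketch only interprets the sensor name; it does not query the server, so the event lines must be collected separately.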
Memory modules (DIMMs) are sensitive to excessive heat, and overheated DIMMs can make the server unstable; therefore, a thermal trip occurs when the internal server temperature is too high.
Check the following and ensure the server node is receiving adequate cooling:
- Ensure there is good airflow.
- Make sure nothing is blocking airflow so that the heat can dissipate from the server.
- Ensure all fans are up and running.
For more information, refer to the following documents:
- Intel® Server System S9200WK Product Family Liquid Cooled Rack Reference Design Guide (See the Thermal Specifications and Requirements section.)
- Technical Product Specification for the Intel® Server System S9200WK Product Family (See the Thermal Management section.)