Introduction
Windows* black screen hangs and crashes are difficult to debug because the system doesn't display any status or debug information, and regular WinDbg* connection methods are frequently unusable. Intel® Debug Extensions for WinDbg*, included with Intel® System Debugger, can help by providing a debug connection to an otherwise unresponsive Windows* target.
This article shows how to use Intel® Debug Extensions for WinDbg* to analyze black screen hangs and crashes. It assumes that you are familiar with Intel® DCI debug, and that you have installed Intel® System Debugger and WinDbg* on your host system, enabled DCI on your target system, and connected your host system to your target using a supported DCI method, such as the Intel® SVT DCI DbC cable or the Intel® SVT Closed Chassis Adapter. If you are not familiar with Intel® System Debugger, please review the Intel® System Debugger User Guide.
Loading Debug Symbols
The first step in debugging with Intel® Debug Extensions for WinDbg* is loading the debug symbols. This takes some time because the debugger has to enumerate all modules and download the PDB files from the Microsoft* symbol server. To see more detail about the download status, run !sym noisy before the .reload /f command. The status word BUSY in the lower-left corner indicates that the command is still executing. Once the symbols are loaded, WinDbg* shows the kd> command prompt. At this point you can run the lm command to list the loaded modules.
Issue Analysis
Once symbols are loaded, the stack trace becomes more informative, and you can analyze the current state of each processor core, using ~<number> to switch cores.
Black screen hangs and hardware-related BSODs have several possible causes:
- Dead loops and deadlocks
- Kernel Debug transport configuration issues
- Memory corruption issues, invalid opcodes in key Windows processes
- Bug Check 0x124: WHEA, NMI interrupt, Machine Check
A common method for root-causing Windows* issues is the !analyze -v extension command. This extension performs a large amount of automated analysis and displays the results in the Debugger Command window.
If the !analyze command fails with the “The debuggee is ready to run” message, you can force the analysis to run as if a crash had occurred by using !analyze -v -f.
Dead Loops and Deadlocks in Windows*
Let’s start with hangs caused by pure software issues. Fortunately, Windows* comes with a built-in Driver Verifier tool that can profile spinlocks. Once deadlock profiling is enabled, the tool records verbose lock state information in the crash dump. When the debug connection is established, the !deadlock extension can be used in conjunction with Driver Verifier to detect inconsistent use of locks in your code that could cause deadlocks.
Driver Verifier doesn't support APC-level locks: mutexes (fast and guarded) and resources. These locks can be analyzed using the !analyze -hang and !locks commands. If needed, the !thread extension command can be used to obtain thread information.
For example, here is typical output of the !locks command:
Dead loops can be identified by examining the instruction pointer and the stack, by setting breakpoints, or by stepping through the code with the Step Over command.
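One way to make the instruction pointer check more systematic is to halt the target several times, note the instruction pointer of each core, and look for cores that never leave a small address range. The Python sketch below is only an illustration of that idea. The function name, the thresholds, and the sample addresses are all hypothetical, and the samples are assumed to have been recorded by hand from the debugger.

```python
def find_suspected_dead_loops(samples, max_span=0x40, min_samples=5):
    """Return core ids whose sampled instruction pointers stay in a narrow range.

    samples:     dict mapping a core id to a list of instruction pointer
                 values (integers) noted across several halts of the target.
    max_span:    hypothetical limit, in bytes, on the span of a tight loop.
    min_samples: minimum number of samples required before judging a core.
    """
    suspects = []
    for core, ips in sorted(samples.items()):
        if len(ips) < min_samples:
            continue  # not enough observations to say anything
        if max(ips) - min(ips) <= max_span:
            suspects.append(core)
    return suspects


# Made-up addresses, for illustration only.
samples = {
    0: [0xFFFFF80012340010, 0xFFFFF80012340018, 0xFFFFF80012340010,
        0xFFFFF8001234001C, 0xFFFFF80012340014],
    1: [0xFFFFF80055501000, 0xFFFFF80071200400, 0xFFFFF80012349000,
        0xFFFFF80033300100, 0xFFFFF80012000000],
}
print(find_suspected_dead_loops(samples))
```

A core flagged this way is only a candidate: an idle core also stays within a narrow address range, so the stack of each flagged core still needs to be inspected.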
Kernel Debug Transport Configuration Issues
A software trap combined with a misconfigured debug transport might cause Windows* to wait for the kernel debugger to connect instead of generating a BSOD and a crash dump, which gives the appearance of an unresponsive black screen hang.
Here are some examples of such configuration issues:
- Network Kernel Debugging is configured, but a supported NIC is not installed in the system
- Kernel-Mode Debugging over a 1394 (FireWire) Cable is configured, but a FireWire controller is not installed in the system
- Kernel-Mode USB Debugging is configured, but it conflicts with Intel® DCI
This might happen while you are debugging a difficult-to-reproduce issue, in which case it is important to collect the debug data.
When Windows* is waiting for the kernel debugger connection, there will be at least one thread with a TrapFrame. The stack would look like this:
In this case you can restore the register context from the trap information using the .trap [Address] command. For example:
The next analysis steps depend on the type of issue. Search MSDN for the exception type, generate a minidump after restoring the trap context, and run the !analyze -f -v command to analyze the crash further.
Memory Corruption Issues and Invalid Opcodes in Key Windows Processes
The most complex issues to debug are memory corruption issues. In this case the system might crash with seemingly random errors, and because of corrupted page table entries (PTEs) the crash dumps might not contain useful information. The recommendation is to run Driver Verifier on all non-Microsoft drivers. If it doesn't find any violations, run !chkimg. If the memory corruption happens in a non-writable area protected by the NX bit, it might be caused by a BIOS issue, a memory controller issue, or malware.
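The !chkimg extension detects corruption by comparing the image of a module in memory with a reference copy of the same file. The Python sketch below illustrates only that comparison idea on two small byte buffers. It is not how !chkimg is implemented, and the function name and sample bytes are hypothetical.

```python
def diff_ranges(reference: bytes, in_memory: bytes):
    """Return (start, end) offset pairs, end exclusive, where the buffers differ."""
    if len(reference) != len(in_memory):
        raise ValueError("buffers must have the same length")
    ranges = []
    start = None  # start offset of the mismatch run currently being tracked
    for offset, (ref_byte, mem_byte) in enumerate(zip(reference, in_memory)):
        if ref_byte != mem_byte:
            if start is None:
                start = offset
        elif start is not None:
            ranges.append((start, offset))
            start = None
    if start is not None:
        ranges.append((start, len(reference)))
    return ranges


# Made-up bytes, for illustration only.
reference = bytes([0x48, 0x89, 0x5C, 0x24, 0x08, 0x57, 0x48, 0x83])
in_memory = bytes([0x48, 0x89, 0x5C, 0x00, 0x00, 0x57, 0x48, 0x80])
for start, end in diff_ranges(reference, in_memory):
    print(f"mismatch at offsets {start:#x}-{end - 1:#x} ({end - start} bytes)")
```

Grouping mismatches into ranges can help separate a single flipped byte from a larger overwritten region, which may point to different root causes.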
Bug Check 0x124: WHEA, NMI Interrupt, Machine Check
The most interesting issues to analyze are hardware issues that lead to unrecoverable errors. In this case the system is not guaranteed to fail with the 0x124 error, and it might not be able to write the crash dump to disk. The system might freeze after the second NMI interrupt, but before the BSOD screen is shown. In such a case, first run !analyze -v to confirm that the issue is an uncorrectable hardware error. Next, run the !whea and !errrec extensions to obtain the crash details. Here is an example:
Together, the output of these two commands contains enough information to find the actual problem. In this example, a PCI Express (PCIe) Advanced Error Reporting structure is provided (errors marked in red) for device 8086:a296 (South Bridge). According to the PCIe documentation, the 0x124 BSOD appears to be triggered by the “Data Link Protocol Error” uncorrectable error (UCE). Further analysis can be done by the PCIe team.
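When reading such a record, it can help to decode the raw PCIe Advanced Error Reporting Uncorrectable Error Status value by hand. The Python sketch below maps the commonly used bit positions of that register to their names. The function name and the sample register values are hypothetical and are not taken from the output discussed above, and the table covers only the bits listed in the code.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status register.
# Only the bits listed here are decoded by this sketch.
UNCORRECTABLE_ERROR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}


def decode_uncorrectable_status(value: int):
    """Return the names of the error bits set in a 32-bit status value."""
    names = []
    for bit in range(32):
        if value & (1 << bit):
            # Bits outside the table are reported by number only.
            names.append(UNCORRECTABLE_ERROR_BITS.get(bit, f"bit {bit} (not decoded)"))
    return names


# Hypothetical register value with only bit 4 set.
print(decode_uncorrectable_status(0x00000010))
```

The same table can be applied to the Uncorrectable Error Mask and Severity registers, which use the same bit layout.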
Conclusion
While debugging Windows* black screen hangs and crashes is a difficult task, Intel® Debug Extensions for WinDbg*, included with Intel® System Debugger, simplify the debug process by providing an Intel® DCI connection method to an otherwise unresponsive Windows* target.