Saturday, 19 September 2015

BSOD REGISTRY ERROR 51

It wasn't the way that I expected to spend my day, but things don't always turn out the way that you expect. I had just received a call that one of our production Terminal Servers was rebooting, and to make matters worse it was the server that could handle the majority of our workload. I logged on to the server, and as I went through the System event log I saw the message:

"The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000051 (0x00000004, 0x00000001, 0xe7079d70, 0x00000238). A dump was saved in: C:\WINDOWS\MEMORY.DMP."



Great!  What the heck was causing the fatal Blue Screen of Death (BSOD) on the Terminal Server?
I asked the Help Desk Team to assign the users to other Terminal Servers while I tried to figure out what was causing the problem.
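
The event log message quoted above packs the bugcheck code, its four parameters, and the dump location into one sentence. As a rough sketch (the function name parse_bugcheck and the tiny lookup table are made up for illustration), the pieces can be pulled apart like this:

```python
import re

# Text of the System event log entry quoted above.
MESSAGE = (
    "The computer has rebooted from a bugcheck.  The bugcheck was: "
    "0x00000051 (0x00000004, 0x00000001, 0xe7079d70, 0x00000238). "
    r"A dump was saved in: C:\WINDOWS\MEMORY.DMP."
)

# Deliberately incomplete: only the one bugcheck name this post deals with.
KNOWN_CODES = {0x51: "REGISTRY_ERROR"}


def parse_bugcheck(message):
    """Return (code, [arg1..arg4], dump_path) parsed from the event text."""
    match = re.search(
        r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)\.\s*"
        r"A dump was saved in:\s*(.+?)\.?$",
        message,
    )
    if match is None:
        raise ValueError("not a bugcheck event message")
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args, match.group(3)


code, args, dump_path = parse_bugcheck(MESSAGE)
print(KNOWN_CODES.get(code, "unknown"), [hex(a) for a in args], dump_path)
```

The first value is the bugcheck code (0x51, REGISTRY_ERROR) and the four values in parentheses are the arguments that !analyze -v later lists as Arg1 to Arg4.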

I figured that the quickest way to resolve the problem would be to use the Debugging Tools for Windows to analyze the memory dump that was saved at C:\Windows\Memory.dmp.

I copied the Debugging Tools for Windows to the Terminal Server and used it to open the memory dump, which I had copied to the D:\MEMDUMP folder, as shown.

Opening the memory dump in Windbg.
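
For anyone who would rather script this step than use the WinDbg GUI, the console debugger kd.exe from the same Debugging Tools package accepts a dump file with -z, a symbol path with -y, and a startup command with -c. The sketch below only builds that command line. The helper name build_kd_command, the assumption that kd.exe is on the PATH, and the dump file name inside D:\MEMDUMP are illustrative guesses, not what was actually used on the server.

```python
import subprocess


def build_kd_command(dump_path, kd_exe="kd.exe", symbol_path=None):
    """Build an argument list that opens a dump, runs !analyze -v, and quits."""
    command = [kd_exe, "-z", dump_path]
    if symbol_path:
        # Optional: where the debugger should look for symbols.
        command += ["-y", symbol_path]
    # -c runs debugger commands at startup; q exits when they finish.
    command += ["-c", "!analyze -v; q"]
    return command


def run_analysis(dump_path, **kwargs):
    """Run the debugger and return its text output (needs kd.exe installed)."""
    completed = subprocess.run(
        build_kd_command(dump_path, **kwargs),
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.stdout


# Assumed file name; the post only says the dump was copied to D:\MEMDUMP.
print(build_kd_command(r"D:\MEMDUMP\MEMORY.DMP"))
```

Only build_kd_command is exercised here; run_analysis is left uncalled because it needs the Debugging Tools to be installed.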


The next thing that I did was to type !analyze -v to get detailed information on what caused the Terminal Server to crash. The output of !analyze -v follows:

4: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

REGISTRY_ERROR (51)
Something has gone badly wrong with the registry.  If a kernel debugger
is available, get a stack trace. It can also indicate that the registry got
an I/O error while trying to read one of its files, so it can be caused by
hardware problems or filesystem corruption.
It may occur due to a failure in a refresh operation, which is used only
in by the security system, and then only when resource limits are encountered.
Arguments:
Arg1: 00000004, (reserved)
Arg2: 00000001, (reserved)
Arg3: e16db960, depends on where Windows bugchecked, may be pointer to hive
Arg4: 00000238, depends on where Windows bugchecked, may be return code of
        HvCheckHive if the hive is corrupt.

Debugging Details:
------------------
DEFAULT_BUCKET_ID:  DRIVER_FAULT
BUGCHECK_STR:  0x51
PROCESS_NAME:  msiexec.exe
CURRENT_IRQL:  0
LAST_CONTROL_TRANSFER:  from 808c2475 to 80827f7d
STACK_TEXT: 
f167a9c4 808c2475 00000051 00000004 00000001 nt!KeBugCheckEx+0x1b
f167a9e8 808c4abd 00000008 00000238 00000000 nt!CmpAssignSecurityToKcb+0x61
f167aa18 808d6614 e2589008 0573f020 c2e40024 nt!CmpCreateKeyControlBlock+0x2b5
f167aa7c 808d7e43 e2589008 0573f020 c2e40024 nt!CmpDoOpen+0x284
f167ab90 80939aa1 e55912e0 e55912e0 910f5e28 nt!CmpParseKey+0x53d
f167ac10 80936066 000001b8 f167ac50 00000040 nt!ObpLookupObjectName+0x11f
f167ac64 808dbad9 00000000 926e5548 00000001 nt!ObOpenObjectByName+0xea
f167ad50 8088b658 00bee984 0003001f 00bee990 nt!NtOpenKey+0x1ad
f167ad50 7c82845c 00bee984 0003001f 00bee990 nt!KiSystemServicePostCall
00bee94c 7c8271b9 71c2b4a3 00bee984 0003001f ntdll!KiFastSystemCallRet
00bee950 71c2b4a3 00bee984 0003001f 00bee990 ntdll!NtOpenKey+0xc
00bee968 71c2b74d 00000000 00beeaf4 00beea3c TSAPPCMP!KeyNode::Open+0x17
00bee9c0 71c2b75e 00bee9dc 00000090 00000000 TSAPPCMP!KeyNode::EnumerateAndDeleteSubKeys+0x65
00beea20 71c2b75e 00beea3c 00000026 00000000 TSAPPCMP!KeyNode::EnumerateAndDeleteSubKeys+0x76
00beea80 71c2b75e 00beea9c 00000026 7c812ce2 TSAPPCMP!KeyNode::EnumerateAndDeleteSubKeys+0x76
00beeae0 71c2b9d3 00beeb10 0000002e 00000000 TSAPPCMP!KeyNode::EnumerateAndDeleteSubKeys+0x76
00beeb04 71c2a1c3 7c82b2eb 00000000 00000000 TSAPPCMP!KeyNode::DeleteSubKeys+0x35
00beeb50 71c2a499 00beed70 00000002 000d1c58 TSAPPCMP!DeleteReferenceHive+0x5e
00beef7c 42cbc5dd 000cb850 000e5cd8 00000001 TSAPPCMP!TermServPrepareAppInstallDueMSI+0xb9
00beef9c 42cdf01c 00000001 000e8378 000cb850 msi!CMsiEngine::OpenHydraRegistryWindow+0xea
00bef070 42e1734b 000cb850 000d34bc 000cb850 msi!CMsiEngine::BeginTransaction+0x74d
00bef0b0 42d69c4d 00000001 000cb964 000cb850 msi!InstallInitialize+0x367
00bef304 42d6c2cb 000d34bc 42c516b0 000cb850 msi!CMsiEngine::FindAndRunAction+0x86
00bef334 42d6bc56 000cb850 000d34bc 000e0e00 msi!CMsiEngine::DoAction+0x256
00bef4e0 42e13ea8 000cb850 42e13ff0 000cb850 msi!CMsiEngine::Sequence+0x42e
00bef510 42e13fc5 000cb850 42c9a488 42e13fcc msi!RunUIOrExecuteSequence+0x1e7
00bef528 42d69c4d 000cb850 000cb964 000cb850 msi!Install+0x1c
00bef77c 42d6c2cb 42d6c410 42c516b0 000cb850 msi!CMsiEngine::FindAndRunAction+0x86
00bef7ac 42c90b6f 010e0ea8 42d6c410 00000000 msi!CMsiEngine::DoAction+0x256
00beff68 42c6d9cc 00000001 000a39b0 00000000 msi!CreateAndRunEngine+0x4704
00beffb8 77e6484f 00a0f4d0 00000000 00000000 msi!MsiUIMessageContext::MainEngineThread+0x2b
00beffec 00000000 42c6d9a1 00a0f4d0 00000000 kernel32!BaseThreadStart+0x34

STACK_COMMAND:  kb
FOLLOWUP_IP:
nt!CmpAssignSecurityToKcb+61
808c2475 807d1000        cmp     byte ptr [ebp+10h],0

SYMBOL_STACK_INDEX:  1
SYMBOL_NAME:  nt!CmpAssignSecurityToKcb+61
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: nt
IMAGE_NAME:  ntkrpamp.exe
DEBUG_FLR_IMAGE_TIMESTAMP:  54bc53b3
FAILURE_BUCKET_ID:  0x51_nt!CmpAssignSecurityToKcb+61
BUCKET_ID:  0x51_nt!CmpAssignSecurityToKcb+61
Followup: MachineOwner
---------

In essence, the output showed that the crash occurred while msiexec.exe was accessing the registry: the stack runs from msi.dll into TSAPPCMP, which calls NtOpenKey, and the kernel bugchecks in CmpAssignSecurityToKcb. The most likely cause was a corrupt registry key.
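
To make that reading of the stack easier to follow, here is a small sketch that reduces an excerpt of the STACK_TEXT above to the chain of modules it passes through. The frames are copied from the output. The helper names frames and module_chain are invented for this example.

```python
# Excerpt of the STACK_TEXT section above, innermost frame first.
STACK_TEXT = """\
f167a9c4 808c2475 00000051 00000004 00000001 nt!KeBugCheckEx+0x1b
f167a9e8 808c4abd 00000008 00000238 00000000 nt!CmpAssignSecurityToKcb+0x61
f167ad50 8088b658 00bee984 0003001f 00bee990 nt!NtOpenKey+0x1ad
00bee950 71c2b4a3 00bee984 0003001f 00bee990 ntdll!NtOpenKey+0xc
00bee968 71c2b74d 00000000 00beeaf4 00beea3c TSAPPCMP!KeyNode::Open+0x17
00bee9c0 71c2b75e 00bee9dc 00000090 00000000 TSAPPCMP!KeyNode::EnumerateAndDeleteSubKeys+0x65
00beeb50 71c2a499 00beed70 00000002 000d1c58 TSAPPCMP!DeleteReferenceHive+0x5e
00beef7c 42cbc5dd 000cb850 000e5cd8 00000001 TSAPPCMP!TermServPrepareAppInstallDueMSI+0xb9
00beef9c 42cdf01c 00000001 000e8378 000cb850 msi!CMsiEngine::OpenHydraRegistryWindow+0xea
00bef0b0 42d69c4d 00000001 000cb964 000cb850 msi!InstallInitialize+0x367
00beffec 00000000 42c6d9a1 00a0f4d0 00000000 kernel32!BaseThreadStart+0x34
"""


def frames(stack_text):
    """Return (module, function) pairs, innermost frame first."""
    result = []
    for line in stack_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or "!" not in fields[-1]:
            continue  # not a stack frame line
        module, _, rest = fields[-1].partition("!")
        result.append((module, rest.split("+")[0]))  # drop the +0x.. offset
    return result


def module_chain(stack_text):
    """Collapse consecutive frames from the same module into one entry."""
    chain = []
    for module, _function in frames(stack_text):
        if not chain or chain[-1] != module:
            chain.append(module)
    return chain


# Print outermost caller first, ending at the kernel.
print(" -> ".join(reversed(module_chain(STACK_TEXT))))
```

Read from the outside in, the thread starts in kernel32, runs the installer engine in msi, enters the Terminal Server compatibility code in TSAPPCMP, and crosses ntdll into the kernel (nt), where the bugcheck is raised.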

The next step that I took was to run some hardware diagnostic utilities on the server to ensure that the registry problem was not being caused by faulty computer hardware. The diagnostics did not reveal a hardware problem.

Once I had brought the Terminal Server online, after completing the hardware diagnostics, I decided to go through the memory dump once more. I wanted to find out which area of the registry was being accessed when the crash took place.

After opening up the memory dump in Windbg.exe I typed the command "!process 0 7 msiexec.exe". I have provided the relevant part of the output below, and it shows that the process 90f75d88 with a process ID of 584 was the cause of the crash. The stack trace for this process is the same as the one that I got when I typed !analyze -v.

Output of !process 0 7 msiexec.exe


After that I used the !handle command to see which registry keys were being accessed by process 90f75d88. I typed "!handle 0 7 90f75d88 Key". The relevant part of the output shows that the process was accessing printer registry keys located at HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Terminal Server\Install.

024c: Object: e665ea68  GrantedAccess: 0003001f Entry: e7e0d498
Object: e665ea68  Type: (926e5548) Key
    ObjectHeader: e665ea50 (old version)
        HandleCount: 1  PointerCount: 1
        Directory Object: 00000000  Name: \REGISTRY\MACHINE\SOFTWARE\MICROSOFT\WINDOWS NT\CURRENTVERSION\TERMINAL SERVER\INSTALL\REFHIVE\HEWLETT-PACKARD\DEMFILEDATA

025c: Object: e6b3c0f8  GrantedAccess: 0003001f Entry: e7e0d4b8
Object: e6b3c0f8  Type: (926e5548) Key
    ObjectHeader: e6b3c0e0 (old version)
        HandleCount: 1  PointerCount: 1
        Directory Object: 00000000  Name: \REGISTRY\MACHINE\SOFTWARE\MICROSOFT\WINDOWS NT\CURRENTVERSION\TERMINAL SERVER\INSTALL\REFHIVE
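
Both handles show GrantedAccess: 0003001f. As a side note, that mask can be broken down into named rights using the standard Windows access-right values. The sketch below is a minimal decoder. The table covers only the registry-specific and standard rights needed here, and the function name decode_key_access is made up for this example.

```python
# Standard Windows access-right bit values for registry keys.
ACCESS_BITS = {
    0x00000001: "KEY_QUERY_VALUE",
    0x00000002: "KEY_SET_VALUE",
    0x00000004: "KEY_CREATE_SUB_KEY",
    0x00000008: "KEY_ENUMERATE_SUB_KEYS",
    0x00000010: "KEY_NOTIFY",
    0x00000020: "KEY_CREATE_LINK",
    0x00010000: "DELETE",
    0x00020000: "READ_CONTROL",
    0x00040000: "WRITE_DAC",
    0x00080000: "WRITE_OWNER",
}


def decode_key_access(mask):
    """Return the names of the rights set in a registry key access mask."""
    names = [name for bit, name in sorted(ACCESS_BITS.items()) if mask & bit]
    leftover = mask & ~sum(ACCESS_BITS)  # bits this table does not know about
    if leftover:
        names.append(hex(leftover))
    return names


print(decode_key_access(0x0003001F))
```

The mask includes KEY_ENUMERATE_SUB_KEYS and DELETE, which at least fits the TSAPPCMP!KeyNode::EnumerateAndDeleteSubKeys frames in the stack trace, although that reading is an inference rather than something the debugger states.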

To confirm that what I was seeing was correct, I used the standalone version of Volatility to see what handles were opened by process ID 584, as shown in the table below.

Offset(V)   Pid  Handle  Access   Type  Details
----------  ---  ------  -------  ----  -------
0xe665ea68  584  0x24c   0x3001f  Key   MACHINE\SOFTWARE\MICROSOFT\WINDOWS NT\CURRENTVERSION\TERMINAL SERVER\INSTALL\REFHIVE\HEWLETT-PACKARD\DEMFILEDATA
0xe6b3c0f8  584  0x25c   0x3001f  Key   MACHINE\SOFTWARE\MICROSOFT\WINDOWS NT\CURRENTVERSION\TERMINAL SERVER\INSTALL\REFHIVE

The output of both of the tools convinced me that I was on the right path. 
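
The comparison between the two tools was done by eye, but it can also be expressed as a short script. The sketch below parses the relevant lines of the !handle output, takes the Volatility rows as tuples, and checks that handle value, object address, access mask, and key name all agree. All function names are invented for this example, and the only normalization assumed is that WinDbg prefixes key names with \REGISTRY\ while Volatility does not.

```python
import re

# Relevant lines transcribed from the !handle output above.
WINDBG_OUTPUT = r"""
024c: Object: e665ea68  GrantedAccess: 0003001f Entry: e7e0d498
        Directory Object: 00000000  Name: \REGISTRY\MACHINE\SOFTWARE\MICROSOFT\WINDOWS NT\CURRENTVERSION\TERMINAL SERVER\INSTALL\REFHIVE\HEWLETT-PACKARD\DEMFILEDATA
025c: Object: e6b3c0f8  GrantedAccess: 0003001f Entry: e7e0d4b8
        Directory Object: 00000000  Name: \REGISTRY\MACHINE\SOFTWARE\MICROSOFT\WINDOWS NT\CURRENTVERSION\TERMINAL SERVER\INSTALL\REFHIVE
"""

# Rows from the Volatility table: offset, pid, handle, access, type, details.
VOLATILITY_ROWS = [
    (0xE665EA68, 584, 0x24C, 0x3001F, "Key",
     r"MACHINE\SOFTWARE\MICROSOFT\WINDOWS NT\CURRENTVERSION\TERMINAL SERVER\INSTALL\REFHIVE\HEWLETT-PACKARD\DEMFILEDATA"),
    (0xE6B3C0F8, 584, 0x25C, 0x3001F, "Key",
     r"MACHINE\SOFTWARE\MICROSOFT\WINDOWS NT\CURRENTVERSION\TERMINAL SERVER\INSTALL\REFHIVE"),
]


def parse_windbg_handles(text):
    """Map handle value -> object address, access mask, and key name."""
    handles, current = {}, None
    for line in text.splitlines():
        head = re.match(
            r"\s*([0-9a-f]+): Object: ([0-9a-f]+)\s+GrantedAccess: ([0-9a-f]+)",
            line,
        )
        if head:
            current = int(head.group(1), 16)
            handles[current] = {
                "object": int(head.group(2), 16),
                "access": int(head.group(3), 16),
                "name": None,
            }
            continue
        name = re.search(r"Name: (.+)$", line)
        if name and current is not None:
            handles[current]["name"] = name.group(1).strip()
    return handles


def normalize(name):
    """Upper-case a key name and drop WinDbg's leading \\REGISTRY\\ prefix."""
    name = name.upper()
    prefix = "\\REGISTRY\\"
    return name[len(prefix):] if name.startswith(prefix) else name


def find_mismatches(windbg, volatility_rows):
    """Return (handle, reason) pairs where the two tools disagree."""
    mismatches = []
    for offset, _pid, handle, access, _type, details in volatility_rows:
        entry = windbg.get(handle)
        if entry is None:
            mismatches.append((handle, "missing from WinDbg output"))
        elif (entry["object"], entry["access"]) != (offset, access):
            mismatches.append((handle, "object or access differs"))
        elif normalize(entry["name"] or "") != normalize(details):
            mismatches.append((handle, "key name differs"))
    return mismatches


windbg_handles = parse_windbg_handles(WINDBG_OUTPUT)
print(find_mismatches(windbg_handles, VOLATILITY_ROWS))
```

An empty list means that every Volatility row has a matching WinDbg handle with the same object address, access mask, and key path.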

A check on Google turned up a few articles showing that problems arose in a Terminal Server environment whenever msiexec.exe was enumerating these printer registry keys. The difference was that those articles described the Terminal Server environment slowing down, while in my case the Terminal Server was crashing.

I opened up the registry editor and I navigated to the registry keys that were the cause of the problem. I then attempted to delete the keys as outlined in the articles and the Terminal Server crashed. Bingo! I was on the right path. After the Terminal Server came back online I tried to take ownership of the keys so that I could delete them, and once again the server crashed.

It seemed that the only way to delete the registry keys was to access the registry offline. I used the Windows installation CD to boot the Terminal Server and used the system recovery options to start up regedit.exe. I loaded the software hive from the C: drive and deleted the corrupt printer registry keys. Once I had done that I unloaded the registry hive and rebooted the Terminal Server. When the Terminal Server came back online I logged back in, started up regedit.exe, and sure enough those registry keys were gone.

I asked the Help Desk to put the affected users back on the Terminal Server and so far there have been no further crashes.

So what was the cause of the problem? It seems that whenever users logged on to the Terminal Server environment, an msiexec.exe process would run and attempt to install printer drivers. As part of the installation process, msiexec.exe would attempt to enumerate the printer registry keys, and once it tried to access the corrupt area of the printer registry the Terminal Server would reboot.

Earlier in my career, if I had had this problem I would have had little choice but to rebuild the Terminal Server, because I would not have been able to find out what was causing the BSOD. Now I can use the Windows Debugging Tools to analyze a memory dump and find out what is going wrong with the server.