Skip to main content

Analyzing server 2008 R2 dwp crash dump file

 Reading a crash dump file is far from intuitive and I spent a great deal of the morning learning about debugging. So here is what I did to read the dump file.

First, you need to install the debugging tools from here. Choose the version that corresponds to your architecture. This install will take a long time depending on your network speed. Important is that you include the WinDbg.exe because that is the tool we will be using.

Next, you need to download the symbol files. Note that you can also use the symbol server from Microsoft but it is faster to have a copy of the symbol files on your hard drive. Download them here. Just download them all. And this will also take a long time because the Symbol files are huge.
Next! Open C:\Program Files\Debugging Tools for Windows (x86)\WinDb.exe.





Choose File -> Open -> Symbol File Path


Type: SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols like this:



Now press CTRL+D to open the DWP file! Very exciting.



Now, if you enter !analyze -v like this:


And you’ll get more information about the crash. In my case:

Code:
8: kd> !analyze -v*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************


USER_MODE_HEALTH_MONITOR (9e)
One or more critical user mode components failed to satisfy a health check.
Hardware mechanisms such as watchdog timers can detect that basic kernel
services are not executing. However, resource starvation issues, including
memory leaks, lock contention, and scheduling priority misconfiguration,
may block critical user mode components without blocking DPCs or
draining the nonpaged pool.
Kernel components can extend watchdog timer functionality to user mode
by periodically monitoring critical applications. This bugcheck indicates
that a user mode health check failed in a manner such that graceful
shutdown is unlikely to succeed. It restores critical services by
rebooting and/or allowing application failover to other servers.
Arguments:
Arg1: fffffa8038f3ab30, Process that failed to satisfy a health check within the
configured timeout
Arg2: 00000000000004b0, Health monitoring timeout (seconds)
Arg3: 0000000000000000
Arg4: 0000000000000000


Debugging Details:
------------------


PROCESS_OBJECT: fffffa8038f3ab30


CUSTOMER_CRASH_COUNT: 1


DEFAULT_BUCKET_ID: DRIVER_FAULT_SERVER_MINIDUMP


BUGCHECK_STR: 0x9E


PROCESS_NAME: System


CURRENT_IRQL: 2


LAST_CONTROL_TRANSFER: from fffff880030b76a5 to fffff80001a98d00


STACK_TEXT:
fffff880`0253d518 fffff880`030b76a5 : 00000000`0000009e fffffa80`38f3ab30 00000000`000004b0 00000000`00000000 : nt!KeBugCheckEx
fffff880`0253d520 fffff800`01aa4652 : fffff880`0253d600 00000000`00000000 00000000`40800088 00000000`00000001 : netft!NetftWatchdogTimerDpc+0xb9
fffff880`0253d570 fffff800`01aa44f6 : fffff880`030c4100 00000000`03023940 00000000`00000000 00000000`00000000 : nt!KiProcessTimerDpcTable+0x66
fffff880`0253d5e0 fffff800`01aa43de : 00000729`6e09a2ce fffff880`0253dc58 00000000`03023940 fffff880`02517d88 : nt!KiProcessExpiredTimerList+0xc6
fffff880`0253dc30 fffff800`01aa41c7 : 000001c5`99d9f3c1 000001c5`03023940 000001c5`99d9f3fd 00000000`00000040 : nt!KiTimerExpiration+0x1be
fffff880`0253dcd0 fffff800`01a90a2a : fffff880`02515180 fffff880`025202c0 00000000`00000000 fffff880`01368420 : nt!KiRetireDpcList+0x277
fffff880`0253dd80 00000000`00000000 : fffff880`0253e000 fffff880`02538000 fffff880`0253dd40 00000000`00000000 : nt!KiIdleLoop+0x5a


STACK_COMMAND: kb


FOLLOWUP_IP:
netft!NetftWatchdogTimerDpc+b9
fffff880`030b76a5 cc int 3


SYMBOL_STACK_INDEX: 1


SYMBOL_NAME: netft!NetftWatchdogTimerDpc+b9


FOLLOWUP_NAME: MachineOwner


MODULE_NAME: netft


IMAGE_NAME: netft.sys


DEBUG_FLR_IMAGE_TIMESTAMP: 4a5bc48a


FAILURE_BUCKET_ID: X64_0x9E_netft!NetftWatchdogTimerDpc+b9


BUCKET_ID: X64_0x9E_netft!NetftWatchdogTimerDpc+b9


Followup: MachineOwner

---------


Explanation: USER_MODE_HEALTH_MONITOR (9e) is the bug check code I need to investigate. For a complete list of bugcheck codes look here:
http://msdn.microsoft.com/en-us/library/ff542347%28v=VS.85%29.aspx

And now all that is left for me to say is: ‘happy debugging’.
Oh here are some helpful links:
http://blogs.technet.com/b/askcore/archive/2009/06/12/why-is-my-2008-failover-clustering-node-blue-screening-with-a-stop-0x0000009e.aspx

http://blogs.msdn.com/b/ntdebugging/archive/tags/hangs/

Comments

  1. Slots games on Google Play | FilmFileEurope
    Welcome to FilmFileEurope, bj 사이트 the leading source for the 플렉스 홀덤 latest news, 토토 사이트 제작 방법 샤오 미 tips, videos, 365 벳 photos 안전토토사이트 윈윈 and quotes. Visit our website.

    ReplyDelete

Post a Comment

Popular posts from this blog

Integration with vCloud Director failing after NSXT upgrade to 4.1.2.0 certificate expired

  Issue Clarification: after upgrade from 3.1.3 to 4.1.2.0 observed certificate to be expired related to various internal services.   Issue Verification: after Upgrade from 3.1.3 to 4.1.2.0 observed certificate to be expired related to various internal services.   Root Cause Identification: >>we confirmed the issue to be related to the below KB NSX alarms indicating certificates have expired or are expiring (94898)   Root Cause Justification:   There are two main factors that can contribute to this behaviour: NSX Managers have many certificates for internal services. In version NSX 3.2.1, Cluster Boot Manager (CBM) service certificates were incorrectly given a validity period of 825 days instead of 100 years. This was corrected to 100 years in NSX 3.2.3. However any environment originally installed on NSX 3.2.1 will have the internal CBM Corfu certs expire after 825 regardless of upgrade to the fixed version or not. On NSX-T 3.2.x interna...

Calculate how much data can be transferred in 24 hours based on link speed in data center

  In case you are planning for migration via DIA or IPVPN link and as example you have 200Mb stable speed so you could calculate using the below formula. (( 200Mb /8)x60x60x24) /1024/1024 = 2TB /per day In case you have different speed you could replace the 200Mb by any rate to calculate as example below. (( 5 00Mb /8)x60x60x24) /1024/1024 =  5.15TB  /per day So approximate each 100Mb would allow around 1TB per day.

Device expanded/shrank messages are reported in the VMkernel log for VMFS-5

    Symptoms A VMFS-5 datastore is no longer visible in vSphere 5 datastores view. A VMFS-5 datastore is no longer mounted in the vSphere 5 datastores view. In the  /var/log/vmkernel.log  file, you see an entry similar to: .. cpu1:44722)WARNING: LVM: 2884: [naa.6006048c7bc7febbf4db26ae0c3263cb:1] Device shrank (actual size 18424453 blocks, stored size 18424507 blocks) A VMFS-5 datastore is mounted in the vSphere 5 datastores view, but in the  /var/log/vmkernel.log  file you see an entry similar to: .. cpu0:44828)LVM: 2891: [naa.6006048c7bc7febbf4db26ae0c3263cb:1] Device expanded (actual size 18424506 blocks, stored size 18422953 blocks)   Purpose This article provides steps to correct the VMFS-5 partition table entry using  partedUtil . For more information see  Using the partedUtil command line utility on ESX and ESXi (1036609) .   Cause The device size discrepancy is caused by an incorrect ending sector for the VMFS-5 partition on the ...