How to diagnose a bad video card?

Discussion in 'Hardware' started by solo_voyager, Mar 31, 2006.

  1. solo_voyager

    solo_voyager Private E-2

    I have an ATI Radeon X600 Pro video card in my PC. I have had it for ~6 mos. I am experiencing random crashes that always begin with the monitor shutting off while the sound continues for a short while. After a short interval the PC will reboot. Most recently the monitor's shutting off was also accompanied by a crackling static sound from the speakers. The crashes are happening more and more often.

    I have gone completely through my system several times and am convinced the problem is not caused by malware of any kind.

    I finally became convinced this is probably a video card problem when I installed a fairly demanding flight sim and the incessant crashes made it difficult to run the sim. A question to a sim forum brought answers blaming the video card.

    So, my question:

    How do I determine that the card is the cause of the trouble and not something else?

    Thx for any direction on this.
     
  2. Steeev

    Steeev Corporal

    One way would be to try it in a mate's computer.

    It could be a number of things. Not enough cooling/airflow (causing overheating and shutdown) and leaking capacitors on the motherboard are two that spring to mind.
    You could try downloading and running a program such as Motherboard Monitor or Speedfan to show your CPU and system temperatures, and reading some previous threads on acceptable temperatures, and have a good look at the motherboard for blown and leaking capacitors.

    Here's an article about bad capacitors:
    http://www.pcstats.com/articleview.cfm?articleID=195
     
  3. Bold Eagle

    Bold Eagle MajorGeek

    Well, look in your Device Manager and see if the video card is giving any indication! Check your Event Viewer and suss out any recent yellow warnings!

    Have you updated your hardware drivers (e.g. Catalyst v6.3) or gone to the game's website and installed any patches?

    Provide a bit more info if you want quality advice.
     
  4. solo_voyager

    solo_voyager Private E-2

    OK, here is the situation:
    Last June [6-05] I did my first DIY "build-a-PC-project". Here's what is in it:

    P4 630 3.0GHz
    ASUS P5AD2-E Premium
    Kingston KVR533D2N4K2/1G [2x512]
    ATI RADEON X600 Pro
    Creative SB AUDIGY2 ZS
    Seagate 7200.7 NCQ 160GB [x2]
    ANTEC NeoPower 480

    Then, in November [11-05], because I was running short on RAM and storage, plus worried about overheating, I added:
    Kingston KVR533D2N4K2/1G [2x512]
    Seagate 7200.8 NCQ 250GB
    Inlet fan to case front.
    The new HDD is installed as an external HDD.
    [I tried to put the above in as a signature. I'm not sure it has taken. So, I’ve added it here, just in case.]

    As near as I can remember, I had no problems with crashing prior to these last additions. I could be wrong, though. After thinking about it again, I am also open to the idea that it may be a MB problem. There have been times when the BIOS has reset to the default settings during the reboot after a crash, although not recently. RAM may be the culprit, but I don’t think so.

    I have had ASUS AiBooster, ASUS PCProbe, SpeedFan, SiSoft Sandra, Everest and several other hardware utilities installed at one time or another through the “try to tune it up” phase. All has gone well except for trying to get the Intel SATA AHCI installed to enable NCQ for the HDDs. That has proven to be too much trouble. Maybe at a later date I’ll try again.
    In response to your posts above I’ve reinstalled and/or updated all of the above utilities again. Temps and fan speeds seem to be OK.
    Per PCProbe and SpeedFan:

    CPU fan [1] = 3550 – 3570 rpm
    Case inlet fan [2] = 1170 – 1180 rpm
    PS fan [3] = 960 - 980 rpm
    Case exhaust fan [4] = not monitored

    CPU temp = 52*C/122*F
    MB temp = 46*C/115*F

    HDD0 = 32*C/90*F
    HDD1 = 41*C/106*F
    HDD3 = not monitored [SMART shows 41*C to 47*C low to high]

    MBM has given some weird temps. I think they are wrong because I didn’t get a good match for my MB during MBM’s setup.
    I have not run anything other than to get the above info so far.

    I have updated video and sound drivers to the current available within the last 2 weeks.

    I’ve gone into the Event Viewer, but there are so many warning and error logs from the last month that I think I’ll need to go look right after the next crash to find what I’m looking for. There are a lot of Yellow Dhcp and Tcpip Warnings, with a few W32Time Warnings, along with many Red Service Control Mgr, DCOM and W32Time Errors and a few i8042prt and atapi Errors. The Dhcp warnings show a pattern of 2 to 5 errors following after them. But I cannot link anything directly to the crashes yet.

    Device Manager shows no problems other than with the Silicon Image SATARaid drivers. Because I have no need for a RAID array, I have not been worried about it. If needed, fixing it is no problem other than the time to do it.

    I think that about covers things for now. I haven’t had a chance to look at the capacitor article yet. I’m getting ready to take off for the weekend and will check it out by Monday.
    I hope this gives you something to work with in trying to help get this figured out.
    Thx
     
  5. solo_voyager

    solo_voyager Private E-2

    Read that article on the bad capacitors. It prompted me to turn the PC off and pull the cover. I don't think that is the problem. One, these are still quite new. Two, they're a different brand. Three, they don't show any sign of pressure building internally that I can see.
    But I did find that the rear exhaust fan [the unmonitored one] is bad. I'll replace it by Monday with one that can and will be monitored.
     
  6. Bold Eagle

    Bold Eagle MajorGeek

    Tell you what, my graphics were a bit "weird" recently, so I uninstalled my drivers (CC, DD, WDM, etc.) using a good uninstallation program (you could use the ATI uninstallation tool, ensuring the registry is clean), then reinstalled them along with the latest VGA driver, and things have sorted themselves out.
     
  7. solo_voyager

    solo_voyager Private E-2

    Thx for the heads up on the driver installations. I'll look into that aspect. I've got the new exhaust fan installed. Air is "whooshing" through the case now. If things were overheating because of the bad fan, I should see some benefit very soon.

    But, in the meantime I have to start a new thread for another problem. I stupidly shot myself in the foot Friday evening as I was putting the case back together.

    Thx all for your help. I'll get back to this one as things develop.
     
  8. solo_voyager

    solo_voyager Private E-2

    OK, I got it to crash again. I found an entry in the Event Viewer. It was not an Error or Warning. It was merely an Information entry. It said:

    The computer has rebooted from a bugcheck.
    The bugcheck was: 0x100000ea (0x87d938c8, 0x87e0eba8, 0xa6dfecbc, 0x00000001).
    A dump was saved in: C:\WINDOWS\Minidump\Mini040306-01.dmp.
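
    The stop code in that entry can be picked apart with a few lines of Python. This is only a rough sketch: the function name, the mask constant, and the one-entry lookup table are made up for illustration, and the table covers just the code seen in this thread. Clearing the high 0x10000000 bit is a simplification to get from the reported 0x100000EA to the base code 0xEA, which Microsoft documents as THREAD_STUCK_IN_DEVICE_DRIVER.

```python
import re

# High bit that distinguishes the "_M" variant of a stop code from its
# base code (assumption for this sketch: masking it off is all we need).
VARIANT_FLAG = 0x10000000

# Illustrative lookup table; only the code from this thread is listed.
KNOWN_CODES = {0xEA: "THREAD_STUCK_IN_DEVICE_DRIVER"}


def parse_bugcheck(message):
    """Return (base_code, params, name) from an Event Viewer bugcheck line."""
    match = re.search(
        r"bugcheck was:\s*(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", message
    )
    if match is None:
        raise ValueError("no bugcheck found in message")
    raw_code = int(match.group(1), 16)
    # The four comma-separated values in parentheses are the parameters.
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    base_code = raw_code & ~VARIANT_FLAG
    name = KNOWN_CODES.get(base_code, "unknown")
    return base_code, params, name


entry = (
    "The bugcheck was: 0x100000ea "
    "(0x87d938c8, 0x87e0eba8, 0xa6dfecbc, 0x00000001)."
)
code, params, name = parse_bugcheck(entry)
print(hex(code), name, [hex(p) for p in params])
```

    A stuck-thread stop code like this one points at a device driver that never returned, which fits with the video driver being named further down in the debugger output.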


    I found the dump file. Then, I figured I needed to get the Windows Debugger installed to be able to read it. So, I did. Then, I figured I needed the Symbols downloaded and installed too. So, I did that too.
    I ran WinDbg and got this:

    Microsoft (R) Windows Debugger Version 6.6.0003.5
    Copyright (c) Microsoft Corporation. All rights reserved.


    Loading Dump File [C:\WINDOWS\Minidump\Mini040306-01.dmp]
    Mini Kernel Dump File: Only registers and stack trace are available

    Symbol search path is: *** Invalid ***
    ****************************************************************************
    * Symbol loading may be unreliable without a symbol search path. *
    * Use .symfix to have the debugger choose a symbol path. *
    * After setting your symbol path, use .reload to refresh symbol locations. *
    ****************************************************************************
    Executable search path is:
    *********************************************************************
    * Symbols can not be loaded because symbol path is not initialized. *
    * *
    * The Symbol Path can be set by: *
    * using the _NT_SYMBOL_PATH environment variable. *
    * using the -y <symbol_path> argument when starting the debugger. *
    * using .sympath and .sympath+ *
    *********************************************************************
    Unable to load image ntoskrnl.exe, Win32 error 2
    *** WARNING: Unable to verify timestamp for ntoskrnl.exe
    *** ERROR: Module load completed but symbols could not be loaded for ntoskrnl.exe
    Windows XP Kernel Version 2600 (Service Pack 2) MP (2 procs) Free x86 compatible
    Product: WinNt, suite: TerminalServer SingleUserTS
    Kernel base = 0x804d7000 PsLoadedModuleList = 0x805624a0
    Debug session time: Mon Apr 3 19:11:35.843 2006 (GMT-8)
    System Uptime: 0 days 3:09:55.555
    *********************************************************************
    * Symbols can not be loaded because symbol path is not initialized. *
    * *
    * The Symbol Path can be set by: *
    * using the _NT_SYMBOL_PATH environment variable. *
    * using the -y <symbol_path> argument when starting the debugger. *
    * using .sympath and .sympath+ *
    *********************************************************************
    Unable to load image ntoskrnl.exe, Win32 error 2
    *** WARNING: Unable to verify timestamp for ntoskrnl.exe
    *** ERROR: Module load completed but symbols could not be loaded for ntoskrnl.exe
    Loading Kernel Symbols
    ................................................................................................................................................................
    Loading User Symbols
    Loading unloaded module list
    ...............
    Unable to load image ati2mtag.sys, Win32 error 2
    *** WARNING: Unable to verify timestamp for ati2mtag.sys
    *** ERROR: Module load completed but symbols could not be loaded for ati2mtag.sys
    ERROR: FindPlugIns 8007007b
    *******************************************************************************
    * *
    * Bugcheck Analysis *
    * *
    *******************************************************************************

    Use !analyze -v to get detailed debugging information.

    BugCheck 100000EA, {87d938c8, 87e0eba8, a6dfecbc, 1}

    ***** Kernel symbols are WRONG. Please fix symbols to do analysis.

    *** WARNING: Unable to verify timestamp for ati2cqag.dll
    *** ERROR: Module load completed but symbols could not be loaded for ati2cqag.dll
    Unable to load image watchdog.sys, Win32 error 2
    *** WARNING: Unable to verify timestamp for watchdog.sys
    *** ERROR: Module load completed but symbols could not be loaded for watchdog.sys
    *************************************************************************
    *** ***
    *** ***
    *** Your debugger is not using the correct symbols ***
    *** ***
    *** In order for this command to work properly, your symbol path ***
    *** must point to .pdb files that have full type information. ***
    *** ***
    *** Certain .pdb files (such as the public OS symbols) do not ***
    *** contain the required information. Contact the group that ***
    *** provided you with these symbols if you need this command to ***
    *** work. ***
    *** ***
    *** Type referenced: watchdog!_DEFERRED_WATCHDOG ***
    *** ***
    *************************************************************************
    *************************************************************************
    *** ***
    *** ***
    *** Your debugger is not using the correct symbols ***
    *** ***
    *** In order for this command to work properly, your symbol path ***
    *** must point to .pdb files that have full type information. ***
    *** ***
    *** Certain .pdb files (such as the public OS symbols) do not ***
    *** contain the required information. Contact the group that ***
    *** provided you with these symbols if you need this command to ***
    *** work. ***
    *** ***
    *** Type referenced: nt!_KPRCB ***
    *** ***
    *************************************************************************
    Probably caused by : ati2mtag.sys ( ati2mtag+8d9f )

    Followup: MachineOwner
    ---------



    I am lost as to how to get this setup in order to read the dump file. Can anyone point me in the right direction to go on to the next step in getting this figured out?
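
    The debugger output above lists three ways to set the symbol path, one of which is the _NT_SYMBOL_PATH environment variable. As a minimal sketch of what such a value looks like, the Python below builds a "srv*cache*server" string. The cache folder C:\Symbols and the helper name are assumptions for illustration; the server URL is Microsoft's public symbol server.

```python
import os

# Microsoft's public symbol server.
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Build a 'srv*<local cache>*<server>' symbol path string."""
    return "srv*{}*{}".format(cache_dir, server)


# C:\Symbols is an assumed local cache folder; any writable folder works.
symbol_path = build_symbol_path(r"C:\Symbols")

# Setting it here only affects this Python process and anything it
# launches; it does not change the system-wide environment.
os.environ["_NT_SYMBOL_PATH"] = symbol_path
print(symbol_path)
```

    Inside WinDbg itself, the route the output already suggests is probably simpler: run .symfix, then .reload, then !analyze -v.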
     
  9. Bold Eagle

    Bold Eagle MajorGeek

    I'm at a loss here as well, Solo, but with regard to uninstalling and reinstalling the video drivers, there is an app called Driver Cleaner Pro. It will take care of all the registry entries after the standard uninstallation and before reinstallation. It's free and I think it's here; otherwise, google it.

    Hopefully someone more experienced will be able to answer the debug info or possibly, if no responses, start another thread specifically on that one.
     
  10. solo_voyager

    solo_voyager Private E-2

    I've just had a spontaneous crash, not one I've induced. I think you're right. I need to begin another thread specifically aimed at understanding how the Windows Debugger works. I've made a little progress in understanding how to use it, but have reached another roadblock.
    Thx
     
  11. solo_voyager

    solo_voyager Private E-2

    PS
    I will give Driver Cleaner Pro a shot too.
    Thx
     
  12. Rikky

    Rikky Wile E. Coyote - One of a kind

  13. solo_voyager

    solo_voyager Private E-2

    I've attached the minidump files. 040306 was the sim crash while 040406 was the spontaneous crash.

    Also, I've found an "extra" temp sensor. It's a thermistor attached to a LED readout on the case. It was being used to monitor the internal case temp which was running ~85-90*F. I've placed it between the sound and vid cards touching the GPU.
    When the PC starts the temp quickly rises to ~140*F. When a less demanding sim is run the temp quickly rises to ~150*F. When the sim is shut down the temp slowly drops to ~145*F and stays there. I'm a bit leery of running the more demanding sim in case the problem is that I'm simply overloading the vid card. When I work up the nerve, I'll post the numbers.
    Thx
     

  14. solo_voyager

    solo_voyager Private E-2

    It just crashed again. Temp of the GPU went up to 175*F. Minidump attached.
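
    The thermistor readings in the last two posts are in Fahrenheit, while the earlier CPU and MB temps were given in Celsius first. A quick sketch to put them on the same scale follows. The labels are just descriptions of when each reading was taken, and the numbers are the approximate values quoted above; the probe is touching the card rather than reading an on-chip sensor, so these are rough.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32) * 5 / 9


# Approximate thermistor readings quoted in the thread, in Fahrenheit.
readings_f = {
    "after startup": 140,
    "less demanding sim running": 150,
    "after sim shut down": 145,
    "at the latest crash": 175,
}

for label, deg_f in readings_f.items():
    deg_c = fahrenheit_to_celsius(deg_f)
    print("{}: {}*F = {:.1f}*C".format(label, deg_f, deg_c))
```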
     

  15. Rikky

    Rikky Wile E. Coyote - One of a kind

    The sim crash looks like it was caused by ub1394.sys, which I'm guessing is your FireWire driver. Check around to see if the game has any known bluescreen issues and install the latest patches.

    The two spontaneous crashes were caused by ati2mtag.sys, your video card driver, probably due to overheating. Is your fan spinning up when the card is under load? Let us know what happens after you use drivercleaner ('follow the readme to the letter') and install the latest driver :)
     
  16. solo_voyager

    solo_voyager Private E-2

    Well, I spent the morning shuffling the PCI cards around so that I could see what the fan on the vid card was doing ... not much of anything as far as I could tell. It feels very stiff when turned by hand and doesn't turn at all under power.

    I find it kinda scary that I've had 2 fans fail in the last little bit, the case fan and the GPU fan.

    I've spent the balance of the morning chasing info on the card. It may be an ATI, but it is actually a Sapphire. Now I understand what they mean by OEM. The supplier doesn't have a replacement in stock, but will handle getting the warranty work done. I cannot be shut down for that kind of time frame. I'll pick up another card, then send this one in to be fixed, then keep it for a backup.

    Thx to all for helping me with this while I fumbled around trying to understand what was going on.
     
  17. Rikky

    Rikky Wile E. Coyote - One of a kind

    There's your problem then. You shouldn't have any trouble getting it repaired. Yeah, contact the store and also create a Sapphire support ticket :)

    http://www.sapphiretech.com/en/support/createticket.php?pcrc=2
     
  18. Bold Eagle

    Bold Eagle MajorGeek

    Rikky's the champ.
     
