Stumped on a crashing issue

Discussion in 'Hardware' started by jackforester, Sep 13, 2009.

  1. jackforester

    jackforester Private E-2

    Hi guys,

    So, here is what is going on. I just built a new machine for a friend of mine, everything went well, no DOA, yada yada yada. We got it together, got Windows Vista x64 installed, installed drivers (which for some reason took forever), and then we decided to benchmark it. We bench'd it with FutureMark and ended up getting scores of CPU: 49087, GPU: 28945.

    The problem started sometime after running FutureMark. The machine crashed on the guy during WoW, he skipped the opening cinematic and the thing just went down. It wouldn't turn back on, so he called me and I had him clear CMOS, and try it again. He still didn't get anything, and I won't be able to get my hands on the machine until Wednesday at the earliest, so we just arranged something and went on our ways. He called back and said he tried resetting the CMOS again, and that time it worked and the machine came up. He got back on WoW, played for a minute, no crash. So we thought all was good.

    Later he changed some settings in a game he was playing, and the machine tanked again.

    Anyone know what might be causing this? I am pretty sure it has to be hardware, just not sure WHAT hardware. I am going to check his PSU, and make sure the voltages are right, then I am just going to start pulling parts.

    His setup is this:
    Asus Rampage II Extreme
    Intel Core i7 975 Extreme Edition
    12GB Patriot Viper DDR3
    (3) Asus GTX285s in Tri-SLI
    ThermalTake 1200w PSU (can't remember which one, but it was a high-end one)

    I did have one idea, but I don't know if it is even possible. The motherboard came with a "TweakIt" utility for overclocking, and I am maybe thinking it is trying to automatically OC the machine, causing a crash.

    Sorry it's so early in the game and I can't tell you what I've tested yet, but I want to get a jump on it before I start working on it, and some good ideas of stuff to check would be sweet so I don't overlook something stupid.

    Thanks a ton in advance.
     
  2. Steppenwolf

    Steppenwolf Private E-2

    maybe you overclocked without enough ram
     
  3. jackforester

    jackforester Private E-2

    We haven't overclocked, but I'm pretty sure the amount of RAM isn't a problem, considering it has 12GB.

    Threw some ideas around this morning, we are thinking it's underpowered, gonna pull one of the GFX cards and see what happens.
     
  4. Bold Eagle

    Bold Eagle MajorGeek

    I assume you have checked device manager and ensured all hardware has no ! or ? occurring.

    Furthermore, I hope you looked at Hardware Monitor in BIOS before entering the OS and checked the CPU temps there.

    Have all updates for OS been completed?

    Jeepers, the first thing to do is check your temps and monitor hardware to ensure everything is running well, HS is keeping CPU cool, etc, etc. Ensure all 3 Video Cards' temps are fine as well. Frankly I would do some benching for at least 20mins with something like OCCT and generate the temps, volts and fan speed graphs so you can watch what is occurring over time as the system gets loaded. Next run something like FurRendering Benchmark to push the Video Cards with GPU-z to monitor the temps and make sure all 3 are generating nice ranges. Moreover, all of the RAM should be tested with memtest86+ then you can assume that nothing is doa.

    No offence, but I thought ThermalTake made very average PSUs? And then you have 3 high end Video Cards running off one, so it is reasonable to assume this is the problem.
     
  5. jackforester

    jackforester Private E-2

    Bold Eagle,

    I plan on monitoring temps. We checked everything in BIOS before booting and it was running at an appropriate temperature, and actually quite cool. I will check out OCCT, I have never used it, I've always used FutureMark/Aquamark or SuperPI for CPU.

    As far as Thermaltake goes, I've always liked them. I think a lot of their lower-end PSUs are just that, they are kind of generic usage PSUs, but I've never had a problem with their high-end ones. You definitely get what you pay for, and this was no budget PSU.

    http://www.newegg.com/Product/Product.aspx?Item=N82E16817153054&Tpk=thermaltake toughpower 1200w

    That is the link to the PSU. It has nice reviews, and it seems a lot of other people are running SLI/Quad-SLI with it, but I think overall we may be underpowered, but just barely, at least that is the idea some of the people doing it with me have.

    I'm still on RAM or Mobo, mostly due to the fact that both of the crashes have been during loading sections, and not graphically intense areas. Plus, to me anyway, if it was the PSU hosing I would think the system would turn right back on, since it just shut down due to not having enough power.
     
  6. Bold Eagle

    Bold Eagle MajorGeek

    Okay here is the torture tester with excellent graphing OCCT (OverClock Checking Tool) 3.0.0.

    Here is a Power Supply Tool from Newegg and I did a coarse summary of the hardware you had and it came out ~1145 watts, but I could only get it to use 6Gb RAM:

    http://educations.newegg.com/tool/psucalc/index.html

    SuperPi is great for shorter runs and I have used it heaps, but it doesn't plot and graph the data over time like OCCT will for Volts, Temps, etc.

    Here in post 6 are examples of the graphs you can generate and get a good idea of how the system is performing over time:

    http://www.hardwarecanucks.com/foru...oling-project-101-lowered-cpu-load-9-10c.html

    For Video I like to use this as it loads the cards very quickly and you can shrink the window so as to have GPU-z running in the same screen and watch the Video Cards over time:

    OpenGL Fur Rendering Benchmark

    GPU-Z 0.34

    Memtest86+ can be a slow pain in the arse but there is a Windows-based memory tester which is very good as well, will search for it tomorrow.
     
  7. Bold Eagle

    Bold Eagle MajorGeek

    Found the MemTest app:

    MemTest.

    You will have to run many instances of it simultaneously to capture all of the RAM (you don't need to buy it to do this), if you have Task Manager open at the same time you can see the RAM getting used up. Maybe just test 6Gb at a time.
     
  8. jackforester

    jackforester Private E-2

    Wow Bold Eagle, thanks a ton for the helpful programs.

    I have been digging around, and your mention of my TT choice being a bad one started making me second guess myself. Even though it is a high-end PSU, it looks like the sustained wattage/amperage may not be enough. We are going to pull one of the GFX cards and run it through its paces to see what happens, and if it stays golden, we are going to have to get a different PSU, or a surrogate one to run one of the cards. The only thing I have found that may be appropriate is a 1250w PC Power and Cooling, or a 1200w Antec that doesn't come out for about another ~1.5 months.

    This machine is a complete monster though, this is the first time I have built something so extreme. I have built a ton of high-end machines, but nothing that pushes the envelope in just about everything. I can't believe I made such a juvenile mistake and didn't check the +12V rails, which I think may be the problem.
     
  9. jackforester

    jackforester Private E-2

    Thanks for the memtest link too, posted while I was typing my last post. I was looking for it last night and never did find it.

    Should I assume Memtest86 still takes ages to run? It's going to be rough on 12GB :\.
     
  10. Bold Eagle

    Bold Eagle MajorGeek

    It would be at least an overnight session to test the 12Gb. PC&C are top of their game and often "king of the hill" but I will have a look tomorrow for TriSLI PSU reviews and check some trusted sources for some info. Hopefully you can send the other PSU back. A final thought and what some enthusiasts do is run dual PSU's with such a system, one dedicated for the Graphics alone.
     
  11. jackforester

    jackforester Private E-2

    I was considering dual PSUs, but I'd like to try and stick to a single if it is an option.

    Thanks for helping, it's truly a wonderful thing. I have been looking around for what PSUs people are running with tri-SLI, and I've seen a few running the Thermaltake, and several running a Silverstone, and a few with the PC&C. The Silverstone has some bad reviews on Newegg that make me a bit apprehensive. The Thermaltake is even recommended by nVidia for running tri-SLI 285s. I am going to swap some stuff around on the PSU since it is modular and see if I can separate out the rails for a dedicated rail to each card.
     
  12. jackforester

    jackforester Private E-2

    Anyone ever tried one of these:
    http://www.axiontech.com/prdt.php?item=9306

    or something like it? It seems like it would be a better fit than having a PSU lying next to the case. I'm turned off by Thermaltake now that I've had bad luck with this one, or at least that is my assumption, but supposedly the 650w will run 3 280s, so I can only assume it has enough power to take the load of one of our GTX285s. Plus it has 54A on the 12V Rails combined, which is almost enough amperage to run 3 285s.
     
  13. Bold Eagle

    Bold Eagle MajorGeek

    Wow, you're obviously at the bleeding edge there as it is very hard to find any "decent" 3 way SLI PSU roundups, shootouts and/or reviews, but I did find links to nVidia SLI Zone and listings of their suggested PSUs for differing builds:

    http://www.slizone.com/object/slizone_3waysli.html

    Certified SLI-Ready Power Supplies.

    Importantly if we select the 3x GTX285s in Tri-SLI it only "specifies" a specific model of the ThermalTake 1200w PSU, that being the ToughPower 1200w w0133.

    http://www.thermaltakeusa.com/Product.aspx?C=1245&ID=1512

    As can be seen from their site there are at least 3 different 1200w models but only that specific one is recommended for the 3x285's:

    http://www.thermaltake.com/product/Power/ToughPower/w0216/w0216.asp
    http://www.thermaltake.com/product/Power/ToughPower/w0133/w0133.asp
    http://www.thermaltake.com/product/power/toughpower/w0156/w0156.asp

    There are the 1500w models to be considered.
     
  14. Bold Eagle

    Bold Eagle MajorGeek

    I should've said this at the outset: make sure any and all OC options in BIOS and within OS are "switched off" until system stability is resolved (i.e. passing 1 hour run of OCCT). Furthermore you may want to go into BIOS and manually set all of the RAM timings and vDIMM, CPU vCore and Southbridge vMCH (I believe, no NB anymore on I7 mobo's?). Leaving these settings on AUTO can sometimes lead to the BIOS adding 5-10% which could be giving you vDroop spikes and thus crashes.

    Looking at the mobo verifies it will support up to 24Gb of RAM;

    http://www.asus.com/product.aspx?P_ID=W7i5W4Pw4fH22Mih&templete=2#
     
  15. jackforester

    jackforester Private E-2

    The w0133RU is the Thermaltake model we picked up, which was the one recommended by Nvidia.

    This has been a crazy build for me, firstly being someone coming up to me and saying "I want a $5000 computer", second being trying to diagnose such a monster.

    Wednesday is the day I actually get my hands back on the machine, and you have helped me a ton Bold Eagle, I keep clicking thanks, but I wish I could click it more.

    I intend to disable any kind of auto-tuning in the BIOS, as even my Gigabyte motherboard has it. I'm not sure if the i7 has it, but if it does, I'm also going to disable SpeedStep before running any kind of stress tests.
     
  16. Bold Eagle

    Bold Eagle MajorGeek

    Look, one thing keeps bugging me. To the best of my knowledge and current understanding, for the extreme high end systems you want a "single, solid 12v rail" that delivers high Amps (A) to the Video Cards. Although the "w0133RU" is on the recommended PSU list it has 4x12v Rails with the power distributed:

    Thermaltake Toughpower W0133RU 1200W ATX12V / EPS12V SLI Certified CrossFire Ready 80 PLUS Certified Modular Active PFC Power Supply - Retail

    We can note that the PC&C Turbo-Cool 1200 T12W does offer a single rail 12v@100A, moreover we can note that it is on the higher GTX295's from the Certified PSU listings but this is for the dual 295's:

    http://www.pcpower.com/power-supply/turbo-cool-1200.html
    http://www.newegg.com/Product/Product.aspx?Item=N82E16817703012

    I'm not sure if this would solve the problem or not, it is only certified up to tri 280's.

    Another point of interest is when using the Power Calculator from newegg you can note as you increase the number of 2Gb RAM modules it goes from 1140w-1198w. He has 12Gb, I can only get it to calculate 8Gb. Maybe you want to remove 6Gb of RAM for the time being and get it running and stable.
     
    Last edited: Sep 15, 2009
  17. jackforester

    jackforester Private E-2

    That was bothering me as well. I know when I was hooking everything up, I was being stupid and didn't balance the load of the cards, I know I have 1 1/2 cards on 1 rail, 1/2 a card on another, and another card on its own rail.

    The max draw of the 285s is ~17.5A, and I'm pretty sure the rail with the 1 1/2 cards is a 20A rail. So if it is a lack of power causing the drop, it should be that first group on 12V1, if that's in fact where it is, and it's not one of the 36A rails. I've actually found a better PSU calculator, and it is giving me ~570w draw, so with about 80% efficiency on the PSU, my total output from the PSU would be about 960w, well over the estimated usage. However, I think the wattage is high enough, and if it is the PSU it's going to be due to the fact that I do not have the cards load balanced across the rails.

    http://extreme.outervision.com/psucalculatorlite.jsp

    That is the calculator I was using to come up with the ~570.
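
    As a rough sketch of the rail arithmetic described above: the ~17.5A per card, the 20A rail and the 36A rails are the figures quoted in this post, while the rail names, which rail carries which share, and the idea that a card's draw can be counted in simple fractions per rail are assumptions made purely for illustration.

```python
# Rough per-rail load check using the figures quoted in this thread.
# Assumption: a card's 12V draw can be split into simple fractions per rail.
CARD_MAX_AMPS = 17.5  # approximate max draw per GTX 285, as quoted above


def rail_load(cards_on_rail, amps_per_card=CARD_MAX_AMPS):
    """Estimated current (A) pulled from one rail by the cards attached to it."""
    return cards_on_rail * amps_per_card


def over_limit(cards_on_rail, rail_limit_amps):
    """True if the estimated load exceeds the rail's rated current."""
    return rail_load(cards_on_rail) > rail_limit_amps


# Hypothetical layout: rail name -> (cards on that rail, assumed rail limit in A)
layout = {
    "rail_a": (1.5, 20.0),  # the 1 1/2 cards believed to sit on a 20A rail
    "rail_b": (0.5, 36.0),  # assumed to be one of the 36A rails
    "rail_c": (1.0, 36.0),  # assumed to be one of the 36A rails
}

for name, (cards, limit) in layout.items():
    status = "OVER" if over_limit(cards, limit) else "ok"
    print(f"{name}: {rail_load(cards):.2f}A of {limit:.0f}A -> {status}")
```

    Under those assumptions only the 1 1/2 card rail comes out over its rating, which matches the suspicion that unbalanced cabling rather than total wattage would be the weak point.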
     
  18. jackforester

    jackforester Private E-2

    EDIT: I did the math wrong, the 80% efficiency has to do with the power coming out of the wall, not the power coming into the machine. The machine is going to get 1200w. But either way, the calculated load is roughly at or maybe a little over the recommendation of 50% of PSU capacity, which is why newegg gives you such a high number.
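
    A minimal sketch of the corrected arithmetic, assuming the ~570w calculator estimate, the 1200w rating and the ~80% efficiency figure mentioned in this thread (function names are made up for illustration, and real efficiency varies with load):

```python
# Efficiency affects what is drawn from the wall, not what the PSU can deliver.
def wall_draw_watts(dc_load_watts, efficiency):
    """Power pulled from the wall socket for a given DC load."""
    return dc_load_watts / efficiency


def load_fraction(dc_load_watts, psu_rating_watts):
    """Share of the PSU's rated DC output that the load uses."""
    return dc_load_watts / psu_rating_watts


estimated_load = 570.0   # calculator estimate quoted in the thread
psu_rating = 1200.0      # rated output of the PSU
efficiency = 0.80        # rough figure quoted in the thread

print(f"Wall draw: {wall_draw_watts(estimated_load, efficiency):.1f} W")
print(f"Load: {load_fraction(estimated_load, psu_rating):.1%} of rated output")
```

    The point of the split is that the 1200w rating applies to the output side, so the estimated load is compared against 1200w directly while the efficiency figure only changes the wall draw.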
     
  19. jackforester

    jackforester Private E-2

    I've been crunching the numbers and thinking it over, and if I balance the loads properly, the current power supply is more than enough to handle the load. I'm going to balance them, and if it still has problems, I can only assume we have a dead rail, or a faulty motherboard. If it was RAM, Video Card or something else, we would have the same symptoms, but we would be able to turn it back on, and at least get a POST.
     
  20. Bold Eagle

    Bold Eagle MajorGeek

    It looks like you have done your homework and the "Newegg" calculator seems "over simplified" at best. Here are some power consumption charts for Quad SLI runs (i.e. 2xgpu per video card) and they are coming in near the figures you are posting:

    http://www.pcper.com/article.php?aid=655&type=expert&pid=9

    I'm going to have a good look at a buddy of mine's site and do some more exploration. He does a Quad XFire run (2xDual GPU 4870's) with the Corsair HX1000:

    http://i4memory.com/f80/share-your-dfi-ut-x58-t3eh8-overclocking-results-here-13273/#post103858

    Some of the best of Australia's OC crowd hang out here and are always discussing the bleeding edge.

    Damn, I hate to say it but you're at that stage where you may have to strip it down to primary components, minimal RAM, Video Card etc so you can get the system up and running to explore the issue. OCCT will generate Voltages over time so you can assess potential issues with the 12v Rails, vDroop etc and make some inferences into "power delivery": is it clean and solid, or is it oscillating significantly with activity (especially loading)? Ideally you may expect slight oscillation, but greater than 5% vDroop or changes on the 12v Rails could indicate poor power delivery.
     
    Last edited: Sep 15, 2009
  21. Bold Eagle

    Bold Eagle MajorGeek

    Here is further support for your number crunching with tri-sli of 285's:

    http://www.tweaktown.com/articles/1727/nvidia_geforce_gtx_285_in_tri_sli_tested/index11.html

    Some good info on the mobo:

    Asus Rampage II Extreme manual - motherboard layout, memory configuration, bios info.

    Still looking for the best approach to systematically analyse this. I've been out of the loop for about 6 months, got laid off 12 months ago and had 2xHDDs fail and a CPU die, etc so I pulled my head in for a while. But I still have a lot of resources and acquired knowledge so hopefully I'm providing some support.
     
  22. necro61

    necro61 Sergeant

    Hey there, sounds like a tough one to nail down...

    I'd check the thermal grease on the cpu. Sounds like you know what you're on about, so I assume the power supply is up to spec, hopefully by a fair amount.

    I'd try a burn in 24hour test of some description see if it stays stable.

    You may want to also check if the bios has a setting for internal temp reading (possibly case air temperature) automatic shut down or similar, per chance when the three graphics cards are starting to work this is boosting the air temp beyond default and auto shutdown?....

    Not sure about Vista and whether it has protected areas and files like XP has, or what effect this may have... I hear stories of WoW accounts getting hijacked, hope this isn't a factor and the machine isn't getting attacked... probably not... this isn't my field...

    anyways good luck with this one.:wave
     
  23. jackforester

    jackforester Private E-2

    I'm definitely no slouch when it comes to PCs, this particular one is just giving me a hard time. It's just such a pain in the arse with exams right now, and having to wait until Wed. to even get my hands back on the machine. The first thing I'm doing, if it boots, is going to be to recreate the crash. I need to see what happens before/during/after. Once it comes back up, I'm yanking it down to 2GB RAM, 1 VGA Card, and everything but SSD unhooked. If it comes up, it's going to get OCCT run on it for an hour, shut down, add part, OCCT, shut down, add part, rinse repeat.

    With the PSU being Tri-SLI certified, it's making it hard for me to believe that the lack of power is the problem anymore. If anything, it's a bad rail or something along those lines, or a bad motherboard.
     
  24. Bold Eagle

    Bold Eagle MajorGeek

    I had a look at the user's manual and on pp. 15-16 it discusses the PCI-e connection. It emphasises to try and ensure that each card is connected to a specific Rail but from the table we can see that is impossible as 12V3 has 2x6pin & 1x8pin whereas 12v4 has 1x6pin and 2x8pin. So no matter which way you do it you will have to have at least 1 card sharing the 12v3 and 12v4 rails. Both of these are delivering up to 72A to the cards.

    Instead of trying to "build the system up" I would reverse build for efficiency.

    1. Check all BIOS settings and ensure no OC is enabled, just go for the "plain Jane" setup for now.
    2. Remove the Video Card that would be sharing the rails and ensure the others are on a single rail with a 1x6 and 1x8 connector per card, try switching it on, if it works go to 4.
    3. If nothing, go back into BIOS and manually set the CPU vCORE, RAM vDIMM and timings and vICH from the manufacturers' spec sheets, try and boot.
    4. Remove RAM from DIMM slots A2, B2 & C2, replace 3rd Video Card and try and boot.

    Start with those steps and hopefully you get some life.

    Once you get some life you can run OCCT for about 12mins and you get enough data plotted to view a graph which should give you enough to make "inferences" into system voltages and CPU temps etc. They should be in the OCCT folder.
     
  25. Bold Eagle

    Bold Eagle MajorGeek

    From the graphs I believe your primary concerns will be vCore Droop (you want this to be less than 5% otherwise this is a good indicator of BSOD from CPU) and the 12v ripple %. Have a quick look at CPU core temps. Hope this is a good starting point.
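
    For anyone who would rather check the logged numbers than eyeball the graphs, here is a minimal sketch of the 5% rule of thumb mentioned above. The voltage readings are made-up sample values, not output from OCCT, and the function names are hypothetical.

```python
# Flag a voltage trace whose worst-case drop from nominal exceeds a threshold.
def droop_percent(nominal_volts, lowest_measured_volts):
    """Drop from nominal to the lowest reading, as a percentage of nominal."""
    return (nominal_volts - lowest_measured_volts) / nominal_volts * 100.0


def is_suspect(nominal_volts, readings, threshold_percent=5.0):
    """True if the lowest reading droops more than threshold_percent."""
    return droop_percent(nominal_volts, min(readings)) > threshold_percent


# Made-up 12V rail samples under load (illustrative only)
steady_rail = [12.05, 11.98, 11.95, 11.92, 11.96]
sagging_rail = [12.02, 11.80, 11.45, 11.10, 11.30]

for label, samples in (("steady", steady_rail), ("sagging", sagging_rail)):
    droop = droop_percent(12.0, min(samples))
    flag = "suspect" if is_suspect(12.0, samples) else "ok"
    print(f"{label}: worst droop {droop:.2f}% -> {flag}")
```

    The same check could be pointed at vCore readings by passing the set vCore as the nominal value.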
     
  26. techsent

    techsent Corporal

    Hi jackforester,

    It may be the memory.

    http://usa.asus.com/product.aspx?P_ID=W7i5W4Pw4fH22Mih&templete=2

    Their memory support list displays a few Patriot entries but you'd have to cross check the part # with the installed ones.

    Also, the memory has to be installed into specific slots. *See their foot print notes in the .pdf

    and, depending on the board's bios version, it may need to be flashed. details are on the cpu support list web tab.

    Techsent
     
  27. jackforester

    jackforester Private E-2

    Diagnosed today, dead/dying power supply.

    Tested the system with another power supply, and the system immediately turned on and ran like a champ. Had to reduce to a single 285 for my dinky 750w to power it, but the old power supply would not even power the single. As soon as the other one was hooked up, turned it on, ran OCCT, temps got a little high, but I think that was due to the fact of an open case with no fans, besides CPU fan, running.

    Hooked old PSU back up for final test, wouldn't come on again. RMA'd to newegg.
     
  28. Bold Eagle

    Bold Eagle MajorGeek

    Good to see you isolated the problem and hope it gets up and running with the new PSU. OCCT will load and thus warm things up quickly and is a powerful tool used by many enthusiasts for stability testing. I noticed last night the latest version does the GPU as well and will try that out later myself.

    He should do the Memtesting as well to ensure all modules are good, there is nothing worse than having DATA transcription errors at the "wrong" time and corrupting a HDD.
     
  29. Bold Eagle

    Bold Eagle MajorGeek

    Keen to know if you have the beast up and running again, jack.
     
