Computer Hard Restarts Only When Playing Certain Random Games, No BSOD, At My Wit's End

Discussion in 'Hardware' started by NeedHelpAtWitsEnd, Apr 30, 2023.

  1. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    Specs:
    Mobo: MSI PRO Z790-P
    Graphics card: RTX 2080 Super
    CPU: Intel Core i7-13700KF
    RAM: DDR5 32GB (2x16GB)
    Cooler: NZXT Kraken X73
    Tower: Corsair 4000D
    Main SSD: SAMSUNG 980 PRO SSD 2TB M.2 (I have two other ones and an HDD, but I don't think they're relevant)
    Power: Corsair HX850

    I'm posting this in hardware but I have no idea what's causing it and it might be software.

    I've been having the weirdest problem with my computer. It's a new rig I put together a couple months back and it has an infuriating problem that only affects certain games. It goes like this:
    • the game runs perfectly fine for over an hour
    • the computer suddenly powers off and restarts with no BSOD
    • the game runs fine for around 20 minutes before another power loss and restart
    • now the game will run for only about a minute or so before it happens, with 100% consistency
    The rapid power loss happens no matter how long I wait between attempts: days or weeks later, the computer will still lose power almost immediately when I try to play the game (with one exception).

    Things I have tried:
    • making sure my drivers are up to date
    • verifying integrity of game files
    • uninstalling and reinstalling
    • moving the game to another SSD, which delayed the next power loss by 20 minutes before it started happening again
    • checking Windows Event Viewer, unfortunately all it ever says is "The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."
    • installing Open Hardware Monitor, watching the computer temperatures as I load a game, and confirming that they stay low and stable all the way up until the power loss
    • removing one RAM stick, triggering a power loss, then swapping it out with the other RAM stick, and confirming that it still crashes
    • using https://www.newegg.com/tools/power-supply-calculator/ to confirm that my PSU is beefy enough for the hardware (it should be, it says 600-699 watts required and mine is 850)
    • running a full Windows virus scan which found nothing
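    The calculator check comes down to simple subtraction. A minimal sketch, assuming the 600-699 W estimate and 850 W rating quoted above (the function name psu_headroom is hypothetical). It only compares total wattage, so it cannot catch a problem with an individual cable or connector.

```python
# Hypothetical helper: headroom between a PSU's rating and an estimated draw.
# The 600-699 W estimate and the 850 W rating are the figures from this post.

def psu_headroom(rated_watts: float, estimated_watts: float) -> tuple[float, float]:
    """Return (spare watts, estimated load as a percentage of the rating)."""
    spare = rated_watts - estimated_watts
    load_pct = 100.0 * estimated_watts / rated_watts
    return spare, load_pct


if __name__ == "__main__":
    for estimate in (600, 699):  # low and high end of the calculator's range
        spare, load = psu_headroom(850, estimate)
        print(f"{estimate} W estimate: {spare:.0f} W spare, {load:.1f}% of rating")
```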
    One of the most perplexing things about this is which games are affected. Here is a list of games that this has happened on:
    • Disco Elysium
    • 112 Operator
    • Harmony's Odyssey
    • We Were Here Together
    • Superfly
    • After the Fall

    As you can see, with the exception of After the Fall, they're not the most resource-heavy games. 112 Operator is literally 2D. Meanwhile, I play VR games like Beat Saber and resource-heavy games like Squad every day, and they never have this issue.

    One thing that did happen is that when Disco Elysium was doing this, I assumed it was a problem with the game, so I uninstalled it. A couple months later, when other games were doing the same thing, I reinstalled it, and then went over 30 hours without an issue before it happened again.

    I'm totally at a loss. I can't think of anything that could cause behavior like this. The absolute consistency with which it repros with certain random games seems to rule out every reasonable explanation I can think of. Does anyone have any ideas about what this might be, or any troubleshooting steps I can take?
     
  2. fleppen

    fleppen Gumshoe

  3. Digerati

    Digerati Major Geek Extraordinaire

    Did the problems start from day 1?

    "Certain" and "random" are contradictory words. It is either certain games, or random games. On those "other" games, can we assume the computer will run for hours and hours on end without issues?

    Are you 200% sure you did not insert an extra standoff under the board? Note that cases are designed to support 1000s of different motherboards of different sizes. That Corsair 4000 case supports ITX, µATX, ATX and EATX size boards, for example. So, it is common for cases to have more motherboard mounting points than some motherboards have mounting holes.

    A common mistake by inexperienced builders and distracted pros alike is to insert one or more extra standoffs in the case under the motherboard. Any extra standoff creates the potential for an electrical "short" in one or more circuits. The results range from "nothing" (everything works perfectly), to odd "intermittent" problems, to "nothing" (as in nothing works at all). To add to the confusion, these issues may come and go, depending on heat, expansion/contraction of materials, and continuity/resistance through the contact point.

    Note that the latest version of the ATX form factor standard aims to eliminate these issues by dictating where standoffs will go, not just where they may go. But not all existing boards or cases comply with those latest standards. So, you might want to verify that you only inserted a standoff where there is a corresponding motherboard mounting hole.

    I recommend taking everything out of the case and assembling the computer on a large, unfinished bread/cutting board to see if it remains stable there. Then inspect the case and verify again that only the necessary standoffs have been installed, and in the correct places.

    These are classic heat-related symptoms, so what are your temps? Note that many things can cause the same symptoms; heat is just a common one. If the system crashes on the bread board, you might try blasting a desk fan on it. If it holds, try it back in the case.

    In these types of crashes, you often get no warning, error, Event Log entry, or BSOD because the crash is so sudden that the OS does not have time to detect and report the error.

    I would try a different PSU. Since everything inside the case depends on good, clean, stable power, you need to ensure you are providing it. Yes, 850W is plenty big. And the HX series has a decent reputation. But even the best models from the best makers can have a unit that fails prematurely.
     
  4. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    Thanks for the tool link. I opened it up and it ran automatically without issue: https://imgur.com/a/voVxAea

    Thanks for the reply! Yes, pretty close to it as I recall, but only for Disco Elysium. Other games started acting up a few weeks later.

    Yes, that's correct. The games affected seem to be random, but once a game is doing this, it is certain to continue. The majority of games are unaffected, and I play them for hours on end without this happening.

    Thanks for the tip, I just took a peek and confirmed that the standoffs are in the correct locations only, and there are no extra ones. I couldn't see any extra pieces there that could be causing shorts either.

    Going to be honest, that sounds terrifyingly intimidating to me, but I'll give that a try if everything else fails...

    I used OCCT. Here are my idle temps, a little under 40°C: https://imgur.com/a/q7TZdbt

    Then I used the load test, they seemed to stabilize a little under 85C: https://imgur.com/a/tbq1hWc

    I was planning on eventually swapping out either the motherboard or the PSU to troubleshoot, but I was leaning more toward the motherboard because I didn't think power issues would cause only particular games to do this. Do you think a bad PSU is more likely to cause these symptoms than a bad motherboard?
     
  5. fleppen

    fleppen Gumshoe

    Okay, so we can rule out a faulty CPU.

    I see that you've installed a lot of extra monitoring programs. What do the voltages do when you're idling in the BIOS/UEFI? The 3.3V, 5V, and 12V rails should be stable and near their nominal values. Deviations of up to 0.1V for very short periods are allowed, and some flutter (around 0.05V) is acceptable.
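    A minimal sketch of that check, assuming you copy the rail readings out of your monitoring tool by hand. The rail names, the helper name out_of_tolerance, and the sample readings are hypothetical; the 0.1 V default follows the rule of thumb in this post rather than any specification, and a single snapshot cannot tell a brief excursion from a sustained one.

```python
# Hypothetical helper: flag PSU rails whose reading drifts from nominal by more
# than a fixed tolerance. The 0.1 V default is this post's rule of thumb only.

NOMINAL_RAILS = {"3.3V": 3.3, "5V": 5.0, "12V": 12.0}


def out_of_tolerance(readings: dict[str, float], tolerance: float = 0.1) -> list[str]:
    """Return the names of rails whose reading differs from nominal by more than tolerance."""
    flagged = []
    for rail, nominal in NOMINAL_RAILS.items():
        value = readings.get(rail)
        if value is None:
            continue  # rail not reported by the monitoring tool
        if abs(value - nominal) > tolerance:
            flagged.append(rail)
    return flagged


if __name__ == "__main__":
    # Illustrative readings only, not measurements from this thread.
    sample = {"3.3V": 3.31, "5V": 5.02, "12V": 11.85}
    print(out_of_tolerance(sample))
```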

    Can you then test the PSU with OCCT? https://www.ocbase.com/download It has a dedicated PSU stress test built in.
     
  6. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

  7. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    Update: I ran memtest86 overnight and it passed. Uploading the log file anyway.
     


  8. foogoo

    foogoo Major "foogoo" Geek

    When this has happened to me, one time it was a faulty water cooler and another time a faulty power supply.
     
  9. Digerati

    Digerati Major Geek Extraordinaire

    I would not be happy with 85°C but, unless these tired old eyes deceive me, I see a 94°C in your screen shot. I definitely would not be happy with that.
     
  10. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    I think I will try replacing the PSU. I was using it in my previous rig, but the power requirements weren't as high, so maybe this is just exposing a problem with it.

    I still don't think it's related to the problem since the temperatures are completely steady immediately before the power loss, but would you expect better temperatures for my hardware?
     
  11. the mekanic

    the mekanic Major Mekanical Geek

    Anything over 80°C could do some damage. Over 90°C is asking for it, as it were.

    Stable temperatures under full load should be 70-80°C tops in my book. The 60s are highly preferable. Remember, the BIOS will shut down the machine at a predetermined threshold.
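    The bands above can be written as a small lookup. This is a hypothetical sketch that encodes this post's rule of thumb, not Intel's thermal specifications; the function name and band labels are made up for illustration, and the sample readings are the ones mentioned in this thread.

```python
# Hypothetical classifier for full-load CPU temperatures, using one poster's
# rule of thumb: 60s preferable, 70-80 C tops, over 80 C risky, over 90 C worse.

def classify_load_temp(celsius: float) -> str:
    if celsius < 70:
        return "preferable"
    if celsius <= 80:
        return "acceptable (upper limit)"
    if celsius <= 90:
        return "concerning"
    return "too hot"


if __name__ == "__main__":
    # Readings mentioned in this thread: ~60 C in Squad, ~85 C OCCT load, 94 C peak.
    for reading in (60, 85, 94):
        print(reading, classify_load_temp(reading))
```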
     
    Last edited: May 1, 2023
  12. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    I see, alright, that's definitely concerning. I just left the thermal paste that came pre-applied on the cooler; maybe it isn't the best quality, though. I'll try applying new paste.

    When I was troubleshooting to see if it could be temperature-related, I had Open Hardware Monitor's output on one side of the screen and the game on the other. I watched very carefully to see whether the temperature rose at all between the game loading and the power loss, and it looked rock steady the entire time. Is it possible for the temperature to spike so quickly that it triggers a shutdown without being visible on the graph?

    I went ahead and ordered another PSU to see if that gets rid of the problem.
     
  13. the mekanic

    the mekanic Major Mekanical Geek

    I wouldn't go hog wild and spend time doing the paste, unless you really want to. Does your BIOS have error logs?
     
  14. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    Ok yeah, I can hold off on that since the temperatures are usually fine. I'm not sure where to find BIOS error logs; this is what my BIOS looks like: https://imgur.com/a/XJCVN0g

    I tried poking around but I didn't see anything.
     
  15. the mekanic

    the mekanic Major Mekanical Geek

    Found this image: https://i.imgur.com/z8K9FXb.png Looks like it might be available in EZ Mode. Or, that could just be a changelog.

    Thinking about it, have you double-checked your RAM for bad bits?
     
  16. Digerati

    Digerati Major Geek Extraordinaire

    I agree with the mekanic. While the OEM TIM (thermal interface material) used by the CPU and cooler makers may not be "the best", it is still more than adequate AS LONG AS the cured bond between the mating surfaces is not broken. And that is unlikely, or you would probably have excessive heat issues even at idle.

    Remove the side panel of your computer case, aim a desk fan on high speed into it, and see how your temps and performance look.
     
  17. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    Damn, I was excited to take a look at the log, but unfortunately my BIOS screen doesn't have it; it must be a slightly different version: https://imgur.com/a/a5rmJpO

    I mentioned in my first post that I had reproduced the issue using first one RAM stick alone and then the other, but I also went ahead and ran memtest86 overnight and everything passed, so I think it's definitely not the RAM.

    Thanks for the advice. I don't actually have a fan right now, but out of curiosity I kept the heat monitoring open while playing Squad, which is a relatively demanding game, and saw that the CPU temperatures stayed right at 60°C! So I think I'm good for temperature during normal use at least.

    The new PSU will be here on Sunday, so I can swap it out then and see if it fixes the issue...
     
  18. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    The new PSU got here, and as I was setting it up I noticed something. When I was initially putting everything together, I only had one power cable available for CPU power, even though there are two ports, CPU_PWR1 and CPU_PWR2. I did a quick search online and it sounded like one would be fine, but I just did another search and apparently each one provides 235 watts, while the 13700K can draw up to 253 watts! So I probably just needed to get another cable and plug in CPU_PWR2 as well. Does this sound like a likely explanation?
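    A rough sketch of the connector arithmetic above, using only the figures quoted in this post (about 235 W per CPU power connector, 253 W maximum CPU draw). It treats the per-connector figure as a hard limit, which is a simplification; the constant and function names are hypothetical.

```python
# Simplified connector-budget check using the figures quoted in the post.
import math

WATTS_PER_CONNECTOR = 235  # per CPU_PWR connector, as quoted above
CPU_MAX_WATTS = 253        # maximum CPU draw, as quoted above


def connectors_needed(cpu_watts: float, per_connector: float = WATTS_PER_CONNECTOR) -> int:
    """Smallest number of connectors whose combined rating covers cpu_watts."""
    return math.ceil(cpu_watts / per_connector)


if __name__ == "__main__":
    for plugged in (1, 2):
        capacity = plugged * WATTS_PER_CONNECTOR
        verdict = "enough" if capacity >= CPU_MAX_WATTS else "short"
        print(f"{plugged} connector(s): {capacity} W vs {CPU_MAX_WATTS} W -> {verdict}")
    print("needed:", connectors_needed(CPU_MAX_WATTS))
```

    With one connector the quoted budget falls 18 W short of the CPU's maximum; with both plugged in it covers the draw with room to spare.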

    Edit: I'm not sure why the CPU stress test didn't repro the issue, but maybe it depends on usage patterns? In any case, I'll see if I get any more crashes.
     
  19. the mekanic

    the mekanic Major Mekanical Geek

    Plausible.
     
  20. NeedHelpAtWitsEnd

    NeedHelpAtWitsEnd Private E-2

    Ok, I've been playing a bunch of stuff that previously would pretty reliably cause power losses and haven't had a single problem. I think plugging in the CPU_PWR2 has resolved it. Thanks for all the help everyone!
     
