Hey everyone,

I’m running into a frustrating issue and could use some guidance on how to pinpoint the faulty component.

My system completely locks up every few hours. It’s not just a DE crash; the entire machine becomes unresponsive. The mouse and keyboard are completely dead (no cursor movement, Caps Lock key doesn’t toggle). I’ve tried waiting 10-15 minutes to see if it recovers, but it never does.

REISUB does not work. Holding Alt + SysRq and pressing the keys in order does nothing. The only way out is a hard reset using the case button.

The last time this happened, I ended up buying components for a new computer and replacing parts one by one until I found the faulty one. I’d rather try a more targeted approach this time. Though if it takes too much effort, I do have another computer I can fall back on.

Any advice on how to diagnose this efficiently? Logs to check, stress tests to run, or hardware to suspect first?

Thanks in advance!

  • devtoolkit_api@discuss.tchncs.de
    13 hours ago

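    When REISUB does not work, that usually points to a hardware-level issue or a hard kernel lockup rather than a userspace problem, provided SysRq is enabled at all (cat /proc/sys/kernel/sysrq prints 0 when it is disabled). Here is my debugging checklist for hard freezes: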

    Step 1: Rule out RAM

    • Boot memtest86+ from a USB stick (many live USB images include it as a boot-menu entry) and let it run overnight. Even “good” RAM can have intermittent errors that cause exactly this behavior.

    Step 2: Check thermals

    • Install lm-sensors and run sensors before/during heavy loads
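    • Also check GPU temps if you have a dedicated GPU: nvidia-smi for NVIDIA, or for AMD: cat /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input (the value is in millidegrees Celsius, so 65000 means 65 °C; the card index may differ from card0 on your system)
    • A CPU that overheats past what throttling can handle can freeze the machine instantly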
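Since the machine dies without warning, it can help to log temperatures continuously so the last reading before the freeze is already on disk. Below is a rough sketch of that idea, not a polished tool. The function names (millideg_to_c, dump_temps, log_temps) and the default log path are made up for illustration; the only thing taken from the kernel is that temp*_input files under /sys/class/hwmon report millidegrees Celsius.

```shell
# Convert a millidegree-Celsius reading (as found in temp*_input) to degrees
# with one decimal place. Assumes a non-negative reading.
millideg_to_c() {
    local m="$1"
    printf '%d.%d\n' "$((m / 1000))" "$(( (m % 1000) / 100 ))"
}

# Print one timestamped line per temperature sensor found under a hwmon root.
# The root defaults to /sys/class/hwmon; the parameter is only there so the
# function can be pointed at another directory.
dump_temps() {
    local root="${1:-/sys/class/hwmon}" f name
    for f in "$root"/hwmon*/temp*_input; do
        [ -r "$f" ] || continue   # skips the unexpanded glob when nothing matches
        name=$(cat "$(dirname "$f")/name" 2>/dev/null || echo unknown)
        printf '%s %s %s %s\n' "$(date +%FT%T)" "$name" "$(basename "$f")" \
            "$(millideg_to_c "$(cat "$f")")"
    done
}

# Append readings every N seconds and flush to disk after each round, so the
# last lines survive a hard reset. Runs until interrupted.
log_temps() {
    local interval="${1:-5}" logfile="${2:-$HOME/temps.log}"
    while true; do
        dump_temps >> "$logfile"
        sync
        sleep "$interval"
    done
}
```

After pasting the functions into a shell, start it with something like log_temps 5 ~/temps.log in a spare terminal. After the next freeze and reset, tail ~/temps.log shows how hot things were in the final seconds.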

    Step 3: GPU driver

    • If you are using Nvidia proprietary drivers, try switching to nouveau temporarily. Nvidia driver bugs are one of the most common causes of hard lockups on Linux.
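    • After reboot, dmesg | grep -i nvidia or dmesg | grep -i gpu shows driver messages for the current boot only. For the boot that froze, use journalctl -k -b -1 | grep -iE 'nvidia|gpu'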
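When switching drivers, it is worth confirming which kernel driver is actually bound to the GPU afterwards. Here is a small sketch that pulls that out of lspci -k output. The function name gpu_drivers is hypothetical, and it assumes the usual lspci -k layout where each device line is followed by an indented "Kernel driver in use:" line.

```shell
# Read `lspci -k` output on stdin and print the kernel driver bound to each GPU.
# Device lines start with a hex bus address; detail lines are indented.
gpu_drivers() {
    awk '
        /^[0-9a-f]/ { gpu = ($0 ~ /VGA compatible controller|3D controller|Display controller/) }
        gpu && /Kernel driver in use:/ { print $NF }
    '
}
```

Usage: lspci -k | gpu_drivers. If it still prints nvidia after you switched to nouveau, the change did not take effect.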

    Step 4: Kernel logs from previous boot

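    • journalctl -b -1 -p err: shows errors from the previous boot, i.e. the one that froze. This needs a persistent journal; if /var/log/journal does not exist, logs from earlier boots are not kept
    • journalctl -b -1 | tail -100: the last 100 lines before the crash, which sometimes reveal the culprit, though a hard freeze can hit before anything gets written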
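To skim the previous boot faster, the kernel messages can be filtered for strings that tend to show up around lockups. This is only a sketch: the function names are invented here, and the keyword list is my own guess at useful patterns (machine-check errors, watchdog lockup reports, NVIDIA Xid lines, amdgpu timeouts), not an exhaustive or official list.

```shell
# Filter log lines on stdin for messages that commonly accompany hard freezes.
# The pattern list is illustrative, extend it for your hardware.
suspicious_lines() {
    grep -Ei 'mce:|machine check|hardware error|soft lockup|hard lockup|NVRM: Xid|gpu hang|amdgpu.*(timeout|reset)|thermal|I/O error'
}

# Kernel messages from the previous boot, filtered.
# Assumes the journal is persistent, otherwise there is no previous boot to read.
prev_boot_suspects() {
    journalctl -k -b -1 --no-pager | suspicious_lines
}
```

Run prev_boot_suspects after the reset. An empty result does not prove the hardware is fine; it only means nothing matching made it to disk.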

    Step 5: SSH test

    • Set up SSH access from another device. The next time it freezes, try to SSH in. If SSH works but the display is dead, the problem is in the GPU/display stack. If SSH also fails, you are looking at a kernel panic or a hardware fault.

    The SSH test is the single most informative thing you can do: it tells you immediately whether the kernel is still alive.
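The SSH test can be scripted from the second machine so it is ready the moment the freeze happens. A minimal sketch, with a few assumptions: the function names are hypothetical, ping uses Linux iputils flags (-W in seconds), and SSH key authentication is already set up so BatchMode does not hang on a password prompt. The middle case (ping answers, SSH does not) is my own addition to the checklist; ICMP replies come from the kernel itself, so it suggests the kernel is partly alive while userspace is stuck.

```shell
# Turn the results of a ping and an SSH attempt ("yes"/"no") into a verdict.
classify_freeze() {
    local ping_ok="$1" ssh_ok="$2"
    if [ "$ssh_ok" = yes ]; then
        echo "kernel alive: suspect GPU/display stack"
    elif [ "$ping_ok" = yes ]; then
        echo "ping answers but SSH does not: userspace is stalled"
    else
        echo "no response: kernel panic or hardware fault"
    fi
}

# Probe a host once and print the verdict.
probe() {
    local host="$1" p=no s=no
    ping -c 3 -W 2 "$host" >/dev/null 2>&1 && p=yes
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1 && s=yes
    classify_freeze "$p" "$s"
}
```

Usage: probe mydesktop.lan, where the hostname is a placeholder for the frozen machine. Run it once while the machine is healthy to make sure it reports "kernel alive" before relying on it.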