Is there a daemon that will kill any processes using above a specified % of CPU? I’m having issues where a system is sometimes grinding to a halt due to high CPU usage. I’m not sure what process is doing it (can’t htop as system is frozen); ideally I’d like a daemon that automatically kills processes using more than a given % of CPU, and then logs what process it was for me to look back on later. Alternatively something that just logs processes that use a given % of CPU so that I may look back on it after restarting the system.

The system is being used as a server so it’s unattended a lot of the time; it’s not a situation where I did something on the computer and then CPU usage went up.
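For the "just log it" option, something like the following might be enough as a stopgap. It is only a rough sketch: the threshold, interval, and log path are made-up placeholders, and it shells out to `ps -eo pid,pcpu,comm` rather than using any monitoring library. One caveat: on Linux, `ps` reports %CPU averaged over the lifetime of the process, so a sudden spike in a long-running process may be under-reported.

```python
import os
import subprocess
import time
from datetime import datetime

CPU_THRESHOLD = 80.0        # percent; placeholder value, tune to taste
INTERVAL_SECONDS = 10       # placeholder polling interval
LOG_PATH = "cpu_hogs.log"   # hypothetical log location


def parse_ps(output):
    """Parse `ps -eo pid,pcpu,comm` output into (pid, cpu_percent, name) tuples."""
    rows = []
    for line in output.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1]), parts[2].strip()))
        except ValueError:
            continue  # skip anything that does not look like a process row
    return rows


def over_threshold(rows, threshold):
    """Return rows at or above the threshold, highest CPU first."""
    hogs = [r for r in rows if r[1] >= threshold]
    return sorted(hogs, key=lambda r: r[1], reverse=True)


def log_once(log_path=LOG_PATH, threshold=CPU_THRESHOLD):
    """Take one snapshot and append any heavy processes to the log file."""
    env = dict(os.environ, LC_ALL="C")  # force '.' as the decimal separator
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True, env=env,
    ).stdout
    hogs = over_threshold(parse_ps(out), threshold)
    if hogs:
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(log_path, "a") as f:  # reopened each time so entries reach the file promptly
            for pid, cpu, name in hogs:
                f.write(f"{stamp} pid={pid} cpu={cpu:.1f}% name={name}\n")
    return hogs


def main():
    """Poll forever; meant to be run from a service manager or a screen session."""
    while True:
        log_once()
        time.sleep(INTERVAL_SECONDS)
```

Calling `main()` starts the loop. It deliberately only logs and never kills anything, since killing by CPU percentage alone could easily take out a legitimate job.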

Edit: Thanks to the comments pointing out that the issue might be a memory leak rather than CPU usage. I’ve set up earlyoom, which seems to have diagnosed the problem as a clamd memory leak. I’ve been running clamd on the server for ages without problems, so it might be the result of an update; I’ve disabled it for now and will keep monitoring the situation to see if earlyoom catches anything else. If the problem keeps occurring, I’ll try some of the other tools people have suggested.

  • FishFace@piefed.social
    10 hours ago

    An almost-complete lockup on Linux is basically always due to running out of memory and having to hit swap. A system can run at 100% CPU and still be usable, but when it hits 100% memory, it will not be usable. For a desktop system, that means keystrokes, if they are registered at all, won’t be registered until minutes have passed. For a server, it will mean all requests time out.

    Unfortunately, Linux’s approach to memory management firstly allows this to happen and secondly fails to solve it once it does happen. What is supposed to happen is that the “OOM killer” wakes up and kills off a process to free up memory. That may theoretically happen if you left the machine on for a year, but what actually happens is that the amount of memory needed to run programs exceeds the amount of physical RAM, but swap is still available, so the OOM killer doesn’t give a shit. At this point many, many operations in programs are taking several orders of magnitude longer than they should do because instead of fetching a value from memory they need to:

    1. context switch to the kernel
    2. find some memory to write to disk, and write it
    3. find the requested memory on disk, and read it into memory
    4. context switch back to the process

    So while your PC is running 100-1000x slower than it normally would, the OOM killer is doing nothing. If you manage to consume all your swap space, then, and only then, will the OOM killer wake up and kill something. It may kill the right thing, or it may not.
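    If you want a rough idea of what the kernel would pick, each process exposes a badness score in `/proc/<pid>/oom_score`, where a higher number means a more likely victim. A minimal sketch that ranks processes by it (the function name and the `proc_root` parameter are my own, added only so it can be pointed at a test directory):

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return (oom_score, pid, name) tuples, most likely OOM victim first."""
    scores = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited mid-scan or is not readable
        scores.append((score, int(entry), name))
    return sorted(scores, reverse=True)
```

    Printing the first few entries of `read_oom_scores()` while the box is still healthy gives a hint about what would be killed first.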

    The modern approach is to use a user-space OOM daemon which monitors memory and swap usage and aggressively kills processes before that happens. Unfortunately, this tends to result in killing your (high-memory) web browser, or the whole desktop session.
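    The core check such a daemon makes can be sketched in a few lines. This is not any particular tool's implementation, just an illustration of the idea: read `/proc/meminfo`, work out how much memory and swap are still available as a percentage, and act when both are low. The 10% thresholds and the function names are assumptions for the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def percent_free(info):
    """Return (available memory %, free swap %); no swap counts as 0% free."""
    mem_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap_pct = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_pct, swap_pct


def should_act(info, mem_min=10.0, swap_min=10.0):
    """True when both available memory and free swap are at or below the thresholds."""
    mem_pct, swap_pct = percent_free(info)
    return mem_pct <= mem_min and swap_pct <= swap_min


def check_system(path="/proc/meminfo"):
    """Run the check once against the live system."""
    with open(path) as f:
        return should_act(parse_meminfo(f.read()))
```

    A real daemon would call something like `check_system()` in a loop and then pick a process to signal; that selection step is where the "kills your browser" problem comes from.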

    Sucks. Get more RAM for your server maybe.

    • non_burglar@lemmy.world
      9 hours ago

      “but what actually happens is that the amount of memory needed to run programs exceeds the amount of physical RAM, but swap is still available, so the OOM killer doesn’t give a shit.”

      Stop giving technical advice, you don’t know what you’re talking about.