• AnyOldName3@lemmy.world
    19 hours ago

    That study doesn’t seem to support the point you’re trying to use it to support. First, it’s talking about machines with error-correcting RAM, which most consumer devices don’t have. The whole point of error-correcting RAM is that it tolerates a single bit flip in a memory cell and can detect a second one and, e.g., trigger a shutdown rather than letting the computer just do what the now-incorrect value tells it to (which might be crashing, might be emitting an incorrect result, or might be something benign). Consumer devices didn’t have this protection until DDR5, which can fix a single bit flip but won’t detect a second, so it can still trigger misbehaviour. Also, the data in the tables gives figures around 10% for the chance of an individual device experiencing an unrecoverable error per year, which isn’t really that often, especially given that most software is buggy enough that you’d be lucky to use it for a year with only a 10% chance of it doing something wrong.
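
    To make the "fix one flip, detect a second" behaviour concrete, here is a minimal sketch of a single-error-correcting, double-error-detecting (SECDED) code. It is a toy: it protects 4 data bits with an extended Hamming(8,4) code, whereas real ECC memory works on much wider words, and the names `encode` and `decode` are made up for illustration rather than taken from the study or any real memory controller.

```python
from itertools import combinations


def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0: overall parity, indices 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) & 1            # makes the whole word even parity
    return sum(bit << i for i, bit in enumerate(code))


def decode(word: int):
    """Return (status, nibble); nibble is None when the error is uncorrectable."""
    code = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                # XOR of set positions is 0 if valid
    odd_parity = sum(code) & 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        code[syndrome] ^= 1                # one flip: syndrome is its position
        status = "corrected"
    else:
        return "uncorrectable", None       # two flips: detectable, not fixable
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
print(decode(word))                        # untouched word
print(decode(word ^ (1 << 5)))             # one bit flipped
print(decode(word ^ (1 << 5) ^ (1 << 2)))  # two bits flipped
```

    With two flips the decoder can only report the problem, which is the point at which a machine with ECC can halt instead of carrying on with bad data.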

    • SinAdjetivos@lemmy.world
      11 hours ago

      it’s talking about machines with error correcting RAM, which most consumer devices don’t have.

      It’s a paper from 2009 talking about “commodity servers” with ECC protection. Even back then it was fairly common and relatively cheap to implement, though it was more often integrated into the CPU and/or memory controller. Since DDR5 arrived in 2020, it has also been mandatory in the memory itself.

      gives figures around 10% for the chance of an individual device experiencing an unrecoverable error per year, which isn’t really that often

      Yes, that’s my point. Your claim of “computers have nearly no redundancy” is complete bullshit.

      • AnyOldName3@lemmy.world
        8 hours ago

        It wasn’t originally my claim - I replied to your comment as I was scrolling past because it had a pair of sentences that seemed dodgy, so I clicked the link it cited as a source, and replied when the link didn’t support the claim.

        Specifically, I’m referring to

        A single bit flipped by a gamma ray will not cause any sort of issue in any modern computer. I cannot overstate how often this and other memory errors happen.

        This just isn’t correct:

        • loads of modern computers don’t use DDR5 or ECC variants of older generations at all, so don’t have any error-correcting memory. If the wrong bit flips, they just crash.
        • loads of modern computers also contain memory that isn’t DDR5, e.g. graphics memory (which didn’t have error correction until GDDR7, and where a bit flip can still cause serious problems, e.g. in a command buffer, making the GPU write back to the wrong address in main memory and overwrite something important), and various caches (SRAM is vulnerable to bit flips from various kinds of radiation, too). If the wrong bit flips, they just crash.
        • Compared to other computer problems that can put the wrong data into memory, like experiencing a bug because a programmer made a mistake, or even just a part wearing out from age, memory errors are really rare, so anything implying normal people need to care is thoroughly overstating their prevalence.
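
        As a rough illustration of why it matters which bit flips, this sketch flips single bits in a made-up 64-bit address and prints how far the corrupted address lands from the original. The address value and the `flip_bit` helper are hypothetical; nothing here models a real GPU or command buffer.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit inverted, as a single-bit upset would."""
    return value ^ (1 << bit)


address = 0x0000_7FFF_0000_1000  # hypothetical write-back address

for bit in (0, 12, 40):
    corrupted = flip_bit(address, bit)
    distance = abs(corrupted - address)  # always exactly 2**bit
    print(f"bit {bit:2d}: {address:#018x} -> {corrupted:#018x} "
          f"(off by {distance} bytes)")
```

        A flip in a low bit nudges the address by a byte, while a flip in a high bit sends the write somewhere completely unrelated.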

        • SinAdjetivos@lemmy.world
          2 hours ago

          It wasn’t originally my claim

          Sorry, I wasn’t paying attention and missed that. I apologize.

          loads of modern computers don’t use DDR5 or ECC variants of older generations at all, so don’t have any error-correcting memory. If the wrong bit flips, they just crash.

          Integrated memory ECC isn’t the only check; it’s an extra redundancy. The point of that paper was to show how often single bit errors occur within one part of a computer system.

          memory errors are really rare

          Right, because of redundancies. It takes two simultaneous bit flips in different regions of the memory to cause a memory error, and there’s still a ~10% chance of that annually according to the paper I cited.
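
          For a sense of scale, the ~10% annual figure can be converted to other time spans. This is a back-of-the-envelope sketch that assumes errors arrive independently at a constant rate, which is a simplifying assumption of mine and not something the paper claims; the function names are made up for illustration.

```python
def annual_to_daily(p_year: float, days_per_year: int = 365) -> float:
    """Per-day probability that compounds to p_year over one year."""
    return 1.0 - (1.0 - p_year) ** (1.0 / days_per_year)


def over_years(p_year: float, years: int) -> float:
    """Probability of at least one event across the given number of years."""
    return 1.0 - (1.0 - p_year) ** years


p_year = 0.10  # the ~10% per-device annual figure discussed above
print(f"per day:      {annual_to_daily(p_year):.6f}")
print(f"over 5 years: {over_years(p_year, 5):.5f}")
```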