• SleeplessCityLights@programming.dev
    English
    arrow-up
    5
    ·
    18 hours ago

    There is an unsolvable compute problem. The average PC on Earth sees multiple bit flips a year from cosmic rays. The space-hardened chips we use are 50nm and the chips used for inference are 4 to 6nm. 50nm is far more resistant to cosmic rays than 6nm because of the transistor size. Are we supposed to think making H100s with a 65nm process is possible? The speed of light creates a die size limitation as well.

        • SleeplessCityLights@programming.dev
          English
          arrow-up
          1
          ·
          3 hours ago

          They are not using 6nm process chips on the ISS. The computers themselves were made before that process existed. An off-the-shelf space-hardened computer system uses a 65nm process. "Cosmic rays" is a very general term; it covers basically everything that flies around in space, including sources like the sun, which is hammering everything in the solar system with rays. Outside of the atmosphere there are so many more cosmic rays that non-space-hardened computers cannot even make calculations. On top of that, the jump in bit-flip rate when you make transistors 10x smaller is also fucked up high. One CPU cycle will have enough errors to make the computer useless. It’s a multi-faceted problem, and when the largest limiting factor is weight and size, it can’t be solved with scaling.

        • NotMyOldRedditName@lemmy.world
          English
          arrow-up
          2
          ·
          edit-2
          12 hours ago

          How would that even work with inference where the expected output will be different between 3 runs with the same input?

    • TauZero@mander.xyz
      English
      arrow-up
      2
      ·
      18 hours ago

      The way I see it, they are doing inference, not transferring bank account balances. I’d be curious to see some actual experimental data, but I’d expect LLMs to skip past bit flips the same way you shrug and move on from spelling errors. At worst you can do your critical calculation in triplicate on your 6nm nodes (with a redo upon dissensus) and reduce your bit error rate from 4/year (or 4000/year, or whatever you have in orbit) to (4/year)^3.
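
      A rough sketch of the triplicate-with-redo idea, just to make it concrete. It assumes the computation is deterministic (for an LLM that would mean something like greedy decoding with a fixed seed, otherwise the three runs disagree for reasons unrelated to bit flips). The names `run_in_triplicate`, `compute` and `max_retries` are made up for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_in_triplicate(compute: Callable[[], T], max_retries: int = 3) -> T:
    """Run `compute` three times; accept the result only if all three agree.

    On any disagreement (dissensus) the whole triple is redone, up to
    `max_retries` extra attempts. Assumes `compute` is deterministic when
    no fault occurs and that results can be compared with `==`.
    """
    for _ in range(max_retries + 1):
        first, second, third = compute(), compute(), compute()
        if first == second == third:
            return first
    raise RuntimeError("no consensus after retries")
```

      Note this costs at least three times the compute per accepted result, and more whenever a redo is triggered.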

      • i_am_not_a_robot@discuss.tchncs.de
        English
        arrow-up
        2
        ·
        2 hours ago

        A bit flip might change a 0 to a 1 or a 1 to an infinity. Even if you could just do everything three times, that triples the hardware and energy costs compared to terrestrial computing.
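
        To illustrate the "1 to an infinity" case: in IEEE 754 single precision, 1.0 is stored as 0x3F800000, and flipping the top exponent bit (bit 30) turns it into 0x7F800000, which is +infinity. A minimal sketch using only the standard `struct` module; the helper name `flip_bit_f32` is made up for this example.

```python
import struct


def flip_bit_f32(value: float, bit: int) -> float:
    """Return `value` as a float32 with one bit (0 = lowest) flipped."""
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


print(flip_bit_f32(1.0, 30))  # inf: top exponent bit flipped
print(flip_bit_f32(1.0, 0))   # lowest mantissa bit flipped: barely above 1.0
```

        Which bit gets hit decides whether the error is negligible or catastrophic.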

      • SleeplessCityLights@programming.dev
        English
        arrow-up
        1
        ·
        2 hours ago

        I don’t think you understand the difference between the amount of cosmic rays, which are basically any flying particle, on Earth compared to space. Small nodes would be dealing with multiple hits per CPU cycle. "Multiple" could be 1 million a second; I am trying to figure out a way to measure it. It would be something like distance from the atmosphere (rate of total particles) x probability of an object the size of a transistor getting hit (rate of collisions). I could probably find the bit-flip rate for an off-the-shelf space-resistant chip and infer the rate for the size I need, but there are other factors. A bit will not flip on every collision, but shrinking transistors exponentially increases the chance that it does.
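
        The back-of-envelope estimate described above could be written down roughly like this. It is only a sketch of the shape of the calculation: every parameter is a placeholder I have no measured value for, and the model (flux x sensitive area x chance a hit flips the bit x number of transistors) is an assumed simplification, not an established formula for any real chip.

```python
def estimated_flips_per_second(
    particle_flux: float,       # particles per cm^2 per second (placeholder)
    sensitive_area_cm2: float,  # sensitive area of one transistor (placeholder)
    flip_probability: float,    # chance that a hit actually flips the bit
    transistor_count: int = 1,  # number of transistors exposed
) -> float:
    """Expected bit flips per second under a simple independent-hit model."""
    if not 0.0 <= flip_probability <= 1.0:
        raise ValueError("flip_probability must be between 0 and 1")
    # Hits on a single transistor per second.
    hits_per_transistor = particle_flux * sensitive_area_cm2
    # Only a fraction of hits flip a bit; scale by the transistor count.
    return hits_per_transistor * flip_probability * transistor_count
```

        The hard part is the inputs, not the arithmetic: the flux depends on the orbit, and the flip probability depends on the process node.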