• Jesus_666@lemmy.world
    17 hours ago

    The newer CPU generations come with cores optimized for this stuff (referred to as an NPU). It actually seems to work fairly well for the kind of model you’d run locally.

    Barring that, a typical laptop dGPU will also work, although not very efficiently, since they often top out at 8 GB of VRAM and thus can’t run most models without partially offloading them to the CPU.

    Of course a laptop with a dGPU and NPU cores will make the offloading less painful. So yeah, workable for most reasonably-sized models.

    • NotMyOldRedditName@lemmy.world
      17 hours ago

      Models can split loads across a discrete GPU and CPU/RAM.

      It’s not as fast as when you can load the whole model into the GPU, but it gives you more options. It’s been a common technique for a long time.
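
      The split described above can be sketched as a back-of-the-envelope calculation: put as many transformer layers as fit in VRAM on the GPU and run the rest on the CPU. This is just an illustration, not any particular runtime's actual logic, and all the numbers (model size, layer count, reserve) are hypothetical examples.

```python
def layers_on_gpu(model_gb: float, n_layers: int,
                  vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Return how many of n_layers fit in VRAM, keeping reserve_gb
    free for the KV cache and framework overhead (rough estimate)."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))

# A hypothetical 13 GB model with 40 layers on an 8 GB laptop dGPU:
gpu_layers = layers_on_gpu(model_gb=13.0, n_layers=40, vram_gb=8.0)
cpu_layers = 40 - gpu_layers
print(gpu_layers, cpu_layers)  # 21 layers on GPU, 19 on CPU
```

      Runtimes like llama.cpp expose this as a single knob (a "number of GPU layers" setting); everything else spills over to system RAM automatically.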