• Riskable@programming.dev
    link
    fedilink
    English
    arrow-up
    1
    ·
    2 hours ago

    You seem to be unaware that it only takes about four NVIDIA HGX H100 nodes (32 GPUs) to train something like qwen3.5:122b. That model is about as good as ChatGPT was six months to a year ago (for the usual use cases). That would take a long ass time though (over a year) so you’d want probably 50-100 HGX H100s (or lots of the newer, cheaper ARM-based hardware devices).

    The weights for qwen3.5:122b are open. That means that if you’ve got the hardware (loads of universities and non-profits have waaaay TF more than 4 HGX H100 nodes) you can continue modern AI development. Everything you need is right there on Huggingface! Deepseek’s stuff is also open I think but I forget. Aside: In my head, I hold the qwen models as “the gold standard” based on many articles I’ve read about them but AI moves so fast, there might be better stuff out on any given day! I haven’t read AI news in like a week so I could be all wrong and qwen3.5 is now sooo obsolete, hehe (that’s how it feels to follow AI news, anyway 🤣).

    Even more interesting: qwen3.5:122b isn’t just an LLM. It does visual reasoning (e.g. give it a picture of a plant and ask it to identify it, count the number of screws in an image, estimate distances, etc) as well as the usual LLM stuff. You can read all about it here:

    https://ollama.com/library/qwen3.5:122b

    …and if you install ollama and spend $20 on ollama.com’s cloud service you can actually try it out without having to own enough GPUs to cover the 245+GB requirement. I highly recommend that service! You can try out all the latest & greatest models on your local PC (or phone!) for any purpose you want for a $20. Whenever a new model is out they usually have it up on their servers within a day or two and it’s fast, too.

    FYI: I’ve used ollama cloud to evaluate models for coding (web dev with Python back end) and qwen3.5:122b is fantastic. It’s not as good as Claude Opus 4.6 but it’s close (and cheap) enough that you can just make up for the mistakes with extra instances that check the output with a critical eye (the latest trick in AI-based coding to get good output).

    For reference, the University of Texas at Austin has data centers with 4,000 NVIDIA Blackwell (B200/GB200) GPUs, Harvard has 1,144 GPUs, and the University of Cambridge & Bristol (in the UK) has some monstrous mix of Intel and AMD GPUs. All three are perfectly capable of training new models from scratch or using continuing development on existing open-weight models like Deepseek and Qwen.

    Generative AI isn’t going anywhere. Furthermore, advancements in that space happen so fast that it’s likely that in a few years we won’t need so many GPUs/VRAM to train models. Especially if ternary models (and similar, like Google’s TurboQuant tech) take off.

    I know this is a long comment but I want to point something else out: If OpenAI and Anthropic go bust, that would flood the market with cheap GPUs. It would be a total price collapse and you can bet your ass that clever universities and service providers (like Amazon compute, but 3rd party) would snap those up and bring down prices across the board.

    • XLE@piefed.social
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      37 minutes ago

      I’m sorry if I was unclear the first two times I asked, but when I said:

      care to link me to all these great models from academics and open-source institutions?

      I was interested in the models you’re currently using, not the ones you’re speculating about. Hopefully it goes without saying that “open” weights are precompiled closed-source blobs, and "Open"AI is anything but, etc.

      I’m aware new models are trained at the speed of light and hardware is going obsolete faster than it can be put on racks, which is already a problem, so I would love to believe your theory about inexpensive AI GPUs but those very same companies are already going into debt without selling their current stock.

      Edit: we have a reason to not assume GPUs will suddenly become cheap.

      • Riskable@programming.dev
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 hour ago

        I literally said I’m using qwen3.5:122b for coding. I also use GLM-5 but it’s slightly slower so I generally stick with qwen.

        It’s right there, in ollama’s library: https://ollama.com/library/qwen3.5:122b

        The weights and everything else for it are on Huggingface: https://huggingface.co/Qwen/Qwen3.5-122B-A10B

        This is not speculation. That’s what I’m actually using nearly every day. It’s not as good as Claude Code with Opus 4.6 but it’s about 90% of the way there (if you use it right). When GLM-5 came out that’s when I cancelled my Claude subscription and just stuck with Ollama Cloud.

        I can use gpt-oss:20b on my GPU (4060 Ti 16GB)—and it works well—but for $20/month, the ability to use qwen3.5 and GLM-5 are better options.

        I still use my GPU for (serious) image generation though. Using ChatGPT (DALL-E) or Gemini (Nano Banana) are OK for one-offs but they’re slow AF compared to FLUX 2 and qwen’s image models running locally. I can give it a prompt and generate 32 images in no time, pick the best one, then iterate from there (using some sophisticated ComfyUI setups). The end result is a superior image than what you’d get from Big AI.