Taalas HC1: 17,000 tokens/sec on Llama 3.1 8B vs Nvidia H200’s 233 tokens/sec. 73x faster at one-tenth the power. Each chip runs ONE model, hardwired into the transistors.

  • Hazelnoot [she/her]@beehaw.org · 15 points · 9 hours ago

    Each chip runs ONE model, hardwired into the transistors.

    That’s… that’s an ASIC. That’s literally just an ASIC… with all the tradeoffs and compromises that come with it.

    • TehPers@beehaw.org · 6 points · 7 hours ago

      Shh you’ll pop the bubble if you start talking sensibly. It’s not an ASIC—it’s a specialized piece of hardware optimized to execute a model with unparalleled performance. Now buy my entire stock of them and all the supply for the next two years please.

      (Figuring out the compose combination for an emdash took longer than I’d like to admit lol)

  • notabot@piefed.social · 67 points · 20 hours ago

    Dedicated, single-purpose chip designs are always going to be faster and more efficient than general-purpose ones. The question is what the environmental and financial costs will be of updating to a new model. With a general-purpose design it’s just a case of loading some new code. With a model that’s baked into the silicon you have to design and manufacture new chips, then install them.

    I can see this being useful in certain niche usecases where requirements are not going to change, but it sounds rather limiting in the general case.

    • MagicShel@lemmy.zip · 20 points · 19 hours ago

      A lot of the models we have are about as good as they are going to get. I mean, ChatGPT 5 isn’t appreciably better than ChatGPT 4. Hook one of those models, or even one not as strong, to a purpose-built RAG pipeline and a controller to run a mesh of interconnected prompts and agents, and you’ll blow away general-purpose chatbots in niche areas in terms of cost, efficiency, and performance.

      The question then becomes: to what purpose can you put this super-fast, dedicated machine that performs certain small-scope, simple tasks really well, but also fucks up often enough that you can’t depend on it? To what tasks could you set a bot that does stuff with minimal competence, let’s say 90% of the time, and the other 10% doesn’t create even bigger problems?

      That domain exists, but it’s thin and narrow.
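      The mesh-of-prompts idea above can be sketched minimally. Everything here is hypothetical: `call_model` is a stub standing in for a fixed-function model (e.g. one baked into a chip), and the prompt names and pipeline steps are made up for illustration.

```python
# Minimal sketch of a mesh of prompts/agents with a verification step.
# call_model is a hypothetical stub for a fixed, narrow-purpose model;
# the prompt vocabulary ("extract", "validate") is invented here.

def call_model(prompt: str) -> str:
    # Stub: a hardwired model answers narrow questions deterministically.
    canned = {
        "extract": "invoice_total=420.00",
        "validate": "OK",
    }
    return canned.get(prompt.split(":")[0], "UNKNOWN")

def pipeline(document: str) -> str:
    # Step 1: narrow extraction task the fixed model handles well.
    extracted = call_model(f"extract:{document}")
    # Step 2: a second pass checks the first (cheap when tokens are fast).
    verdict = call_model(f"validate:{extracted}")
    return extracted if verdict == "OK" else "NEEDS_HUMAN_REVIEW"

print(pipeline("Invoice #123 ... total $420.00"))  # invoice_total=420.00
```

      A real controller would fan out to several such fixed models and route anything that fails verification to a fallback, which is where the "fucks up 10% of the time" risk gets contained.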

      • FaceDeer@fedia.io · 9 points · 17 hours ago

        To what tasks could you set a bot that does stuff with minimal competence, let’s say 90% of the time, and the other 10% doesn’t create even bigger problems?

        Sounds like a typical human to me.

        A chip like this would be perfect for an autonomous robot. Drone, humanoid, whatever - something that still needs to be able to handle itself when it’s cut off from outside control. Always nice to have an internet connection to draw on a bigger, more capable “brain” somewhere else, but if that connection is lost you want it to be able to carry on with whatever it’s doing and not just flop over limply.

        • MagicShel@lemmy.zip · 6 points · edited · 17 hours ago

          Sure. It excels in cases where a 60-90% success rate is better than nothing. If you have a smart mine that doesn’t detonate on civilians, 50% success is better than 0: it reduces civilian casualties by half, which is still awful, but if you’re going to plant mines it’s better than being entirely indiscriminate. Use cases definitely exist. A false positive means it doesn’t detonate on one soldier but might on the next, which is still an effective deterrent. A false negative means it blows up a kid, which a dumb mine would also do anyway.

          It’s just generally not in the situations most people are thinking about. You have to imagine cases where there is some upside and no downside. It doesn’t work in a context of, say, auto-braking a car when a pedestrian is detected, because a false positive is going to cause accidents and probably kill people, even if in other circumstances it does save lives.

          • BlameThePeacock@lemmy.ca · 5 points · 16 hours ago

            A lot of AI hallucinations can be resolved by simply running the results through additional prompts automatically, then checking the various results against each other or against reference material.

            Many agentic systems already do that with a limited number of follow-up/check steps, but they’re often restricted by acceptable response times or just sheer costs.

            I managed to get Copilot in Excel to run a 43-prompt chain in just under 10 minutes the other day. The result was exactly what I needed.

            If you have 73 times the output, you can potentially afford to do that kind of processing in an acceptable time frame and cost level.
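            The run-it-several-times-and-cross-check approach can be sketched like this. `ask` is a hypothetical stub model (right 80% of the time by construction), not a real API; the extra calls per question are exactly what 73x throughput makes affordable.

```python
# Sketch of hallucination reduction by re-asking the same question and
# voting across the results, as described above. The stub model and its
# 80% accuracy are assumptions for illustration.
from collections import Counter
import random

def ask(question: str, rng: random.Random) -> str:
    # Stub model: correct answer 80% of the time, a hallucination otherwise.
    # (The question text is ignored; a real call would go to the model.)
    return "Paris" if rng.random() < 0.8 else "Lyon"

def self_consistent(question: str, n: int = 15, seed: int = 0) -> str:
    # Ask n times and only accept an answer a clear majority agrees on.
    rng = random.Random(seed)
    votes = Counter(ask(question, rng) for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer if count > n // 2 else "NO_CONSENSUS"

print(self_consistent("Capital of France?"))
```

            Cross-checking against reference material (the RAG side) slots in the same way: each vote gets compared against retrieved text instead of against the other votes.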

          • FaceDeer@fedia.io · 2 points · 17 hours ago

            Why doesn’t it work in those contexts? It’s better than nothing in those contexts too. I’d rather have a car with onboard intelligence to take over than an uncontrolled one.

            I think you’re letting the perfect be the enemy of the good, here. There are plenty of situations where you don’t need a robot to behave perfectly. People don’t behave perfectly.

            • MagicShel@lemmy.zip · 4 points · edited · 17 hours ago

              No, it doesn’t work in this context, because a false positive is worse than nothing, while a false negative is merely the same as nothing: zero sum. Obviously it depends where you set the threshold between false positives and false negatives. I imagined a very simple scenario the first time.

              If it false-positives even only 0.001% of the time, you’re going to cause a shitload of accidents. Assume roughly one pedestrian check per second: 2 minutes of driving works out to about a 0.12% chance of a phantom stop, 20 minutes is 1.2%, 200 minutes is 12%, and 800 minutes is roughly 48%. So you’re going to have every car slam on its brakes for no reason every 12-15 hours of drive time. That would be an absolute mess.
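              A worked version of that arithmetic, assuming one pedestrian check per second and a 0.001% false-positive rate per check (both assumptions for illustration): the chance of at least one phantom stop over n checks is 1 - (1 - p)**n.

```python
# Compounding false-positive odds for a detector that runs one check
# per second with per-check false-positive rate p. Both numbers are
# assumptions taken from the comment above, for illustration only.
p = 0.001 / 100  # 0.001% false-positive rate per check

for minutes in (2, 20, 200, 800):
    n = minutes * 60                     # checks, at one per second
    prob = 1 - (1 - p) ** n              # P(at least one phantom stop)
    print(f"{minutes:3d} min -> {prob:6.2%} chance of a phantom stop")
```

              The linear estimate n·p matches the short intervals; at 800 minutes the exact compounding gives about 38% rather than 48%, which doesn’t change the conclusion.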

              • FaceDeer@fedia.io · 1 point · 16 hours ago

                I have no idea what you’re thinking the scenario is here. The alternative is an uncontrolled car, I think I’d rather it had at least some brains behind the decisions it’s making.

                • MagicShel@lemmy.zip · 4 points · 16 hours ago

                  How does it decide the car is uncontrolled? That’s a failure scenario, too.

                  I’m not even sure what you’re arguing. I said from the get go that there are niche cases where AI is nothing but positive. You seem to be arguing that there are a bunch more cases. Fine. Maybe the niche is slightly less thin and narrow than I think. Cool.

    • morto@piefed.social · 6 points · 19 hours ago

      FPGAs can sort of be a middle ground, but I don’t know if they’re capable of running LLMs.

      • bryndos@fedia.io · 2 points · 13 hours ago

        Is there such a thing as a modular FPGA, so that you could “plug in” another one and add more gates, sort of daisy-chaining them? I don’t know if such interfaces exist; sounds like it might need lots of bandwidth.

        • iceberg314@midwest.social · 1 point · 20 minutes ago

          I bet you could! The interface can literally be whatever you want with FPGAs. You’d just have to keep things organized and program them one at a time, I think.

        • morto@piefed.social · 1 point · 6 hours ago

          I know very little about FPGAs, so I can’t answer your question, but let’s hope someone else can.

  • tal@lemmy.today · 24 points · 18 hours ago

    The HC1 chip doesn’t load model weights from memory. It etches them directly into the transistors. Every weight becomes a physical circuit.

    That’s one way to avoid memory bandwidth constraints!
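    The “weights as circuits” idea has a software analogue: generate code with every weight embedded as a literal constant, so nothing is fetched from a weight buffer at inference time. This is only an illustration; the toy 2x2 layer below is made up, and real hardware does this with fixed wiring, not code generation.

```python
# Toy illustration of baking weights into the "circuit": instead of
# loading a weight matrix from memory each step, the multiply-adds are
# generated with literal constants, so there is no weight traffic at
# inference time. The weight values are invented for this example.
W = [[2.0, -1.0], [0.5, 3.0]]  # weights known at "fabrication" time

def bake(weights):
    # Generate a function whose body hardcodes every weight, a software
    # analogue of turning each weight into fixed wiring on the die.
    rows = []
    for row in weights:
        terms = " + ".join(f"({w!r} * x[{j}])" for j, w in enumerate(row))
        rows.append(terms)
    src = "def layer(x):\n    return [" + ", ".join(rows) + "]\n"
    ns = {}
    exec(src, ns)  # "fabricate" the constant-folded layer
    return ns["layer"]

layer = bake(W)
print(layer([1.0, 2.0]))  # [2*1 + (-1)*2, 0.5*1 + 3*2] = [0.0, 6.5]
```

    The trade-off discussed elsewhere in the thread follows directly: changing W means regenerating (in hardware, re-fabricating) the whole thing.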

  • altphoto@lemmy.today · 9 points · 20 hours ago

    Hopefully the low-cost-per-kill drones get more affordable. Maybe load Linux onto one of those things and just break off the murderous knives.

  • dieICEdie@lemmy.org · 7 points · 21 hours ago

    This would be great if you could have a machine that would let you swap chips… and then they only charged < 50 USD for each chip.

    • boonhet@sopuli.xyz · 2 points · 7 hours ago

      Can’t be that cheap, unfortunately, if they maxed out the die area. Though it is an older node, so maybe not as expensive as flagship GPU chips and shit.

          • MagicShel@lemmy.zip · 2 points · 19 hours ago

            The thing that differentiates ChatGPT and Claude is likely more the RAG pipeline that backs them and feeds them context. The models really aren’t getting better; we’re just getting better at using them to break tasks down into units so small AI can figure them out. I’d bet a GPT 5 model or a Claude Opus 4.6 model would last 5, maybe 10 years before you really start to notice its capabilities falling behind. I’ll bet you could use GPT 4o for 5-10 years and it would be fine.

          • dieICEdie@lemmy.org · 1 point · 19 hours ago

            But if they could make it so the chip is the only thing that goes obsolete, that could be recycled pretty easily, or resold.