…and I still don’t get it. I paid for a month of Pro to try it out, and it is consistently and confidently producing subtly broken junk. I had tried this before, but gave up because it didn’t work well. I thought that maybe this time it would be far enough along to be useful.

The task was relatively simple and involved some 3D math. The solutions it generated were almost right every time, but critically broken in subtle ways, and any attempt to fix the problems would either introduce new bugs or reintroduce old ones.

I spent nearly the whole day yesterday going back and forth with it, and felt like I was in a mental fog. It wasn’t until I had a full night’s sleep and reviewed the chat log this morning that I realized how much I had been going in circles. I tried prompting a bit more today, but stopped when it kept doing the same crap.

The worst part is that, throughout all of this, Claude responded confidently. When I said there was a bug, it would “fix” the bug and provide a confident explanation of what was wrong… except it was clearly bullshit, because the fix didn’t work.

I still want to keep an open mind. Is anyone having success with these tools? Is there a special way to prompt it? Would I get better results during certain hours of the day?

For reference, I used Opus 4.6 Extended.

  • cecilkorik@lemmy.ca · 19 hours ago

    No, I think you do get it. That’s exactly right. Everything you described is absolutely valid.

    Maybe the only piece you’re missing is that “almost right, but critically broken in subtle ways” turns out to actually be more than good enough for many people and many purposes. You’re describing the “success” state.

    /s but also not /s because this is the unfortunate reality we live in now. We’re all going to eat slop and sooner or later we’re going to be forced to like it.

    • pinball_wizard@lemmy.zip · 9 hours ago

      “almost right, but critically broken in subtle ways” turns out to actually be more than good enough for many people and many purposes. You’re describing the “success” state.

      Exactly. The consequences are at worst a problem for “future me”, and at best “somebody else’s problem”.

      AI didn’t create this reality, but it’s certainly moved it into the spotlight and to “center stage.”

    • GiorgioPerlasca@lemmy.ml · 18 hours ago

      Or maybe we will be forced to switch off LLMs and start using our own minds to fix the bugs they introduced.

      • cecilkorik@lemmy.ca · 17 hours ago

        As a professional software developer, I truly hope that is the case. I plan to charge at least 10x my current rate when I look for my next job after the AI bubble pops, since I expect a massive shortage of people skilled enough to actually deal with the nightmare spaghetti of AI-generated code bases.

        Fun times ahead.

        • tohuwabohu@programming.dev · 14 hours ago

          I agree, and getting to that point will make for interesting (read: bad) times. The junior market has been basically nonexistent ever since coding agents appeared, stripping the industry of its future seniors. We will be chained to our desks.