Mid-last year, I had GPT (maybe 5.0 or 5.1) try to find the source of a bug. Naturally, this code didn’t have tests and git bisect wouldn’t work, and it was a UI interaction bug that I’m not even really qualified to write a test for, so I asked Codex to bisect between dates X and Y to find the commit that introduced the bug. Codex immediately told me the offending commit was after this date range (which couldn’t possibly be correct). When I told Codex this was wrong, it named, once or twice, some other commit that was also obviously not the offending one. When I told it those were wrong too, it pointed to a plausible-looking commit. When I asked it to prove or disprove its theory, it told me that it had written a test and confirmed that the alleged commit was the breaking commit.

I then asked it to show me by making a video with the full developer end-to-end stack in the normal browser test environment. It claimed that it didn’t have permissions to do that (which was a lie), but that it could make a video of the repro running before and after the commit in Playwright with the appropriate test code. The video was convincing and showed the feature working properly before the commit and failing after it. Something about this didn’t feel right, so I tried reproducing the issue by hand before and after the commit and found out that the whole thing was a fabrication. The video made it look like Codex had reproduced the bug, but it was recorded in an artificial browser environment designed to create a fake repro, not in the real environment.

Like I said, because this was non-ironically such a great experience, I immediately thought to myself, “how can I get more of this?”

  • atzanteol@sh.itjust.works
    20 hours ago

    The mistake is in assuming the AI is perfect and will be correct all the time.

    If you’re relying on it to be correct and not verifying its output, you’re doing it wrong.

    It’s like doing a search and finding posts in forums. Sometimes what you find is wrong or not appropriate for your situation.

    AI doesn’t replace your need to do critical thinking.

    • HaraldvonBlauzahn@feddit.org (OP)
      15 hours ago

      > The mistake is in assuming the AI is perfect and will be correct all the time.
      >
      > If you’re relying on it to be correct and not verifying its output, you’re doing it wrong.

      I think that unless you are a total beginner, proper verification will frequently take about as long as, or longer than, writing the code yourself.

      It’s like how reading even good, correct legacy code is harder than writing new code.