The ARC Prize organization designs benchmarks around tasks that humans complete easily but that remain difficult for AI systems such as LLMs, “reasoning” models, and agentic frameworks.

ARC-AGI-3 is the first fully interactive benchmark in the ARC-AGI series. It comprises hundreds of original turn-based environments, each handcrafted by a team of human game designers. There are no instructions, no rules, and no stated goals. To succeed, an AI agent must explore each environment on its own, figure out how it works, discover what winning looks like, and carry what it learns forward across increasingly difficult levels.

Previous ARC-AGI benchmarks predicted and tracked major AI breakthroughs, from reasoning models to coding agents. ARC-AGI-3 points to what’s next: the gap between AI that can follow instructions and AI that can genuinely explore, learn, and adapt in unfamiliar situations.

You can try the tasks yourself here: https://arcprize.org/arc-agi/3

Here is the current leaderboard for ARC-AGI-3, using state-of-the-art models:

  • OpenAI GPT-5.4 High - 0.3% success rate at $5.2K
  • Google Gemini 3.1 Pro - 0.2% success rate at $2.2K
  • Anthropic Opus 4.6 Max - 0.2% success rate at $8.9K
  • xAI Grok 4.20 Reasoning - 0.0% success rate at $3.8K

[Figure: ARC-AGI-3 leaderboard]
(Logarithmic cost on the horizontal axis. Note that the vertical scale goes from 0% to 3% in this graph. If human scores were included, they would be at 100%, at a cost of approximately $250.)

https://arcprize.org/leaderboard

Technical report: https://arcprize.org/media/ARC_AGI_3_Technical_Report.pdf

To be included in ARC-AGI-3, an environment must pass the minimum “easy for humans” threshold. Each environment was attempted by 10 people. Only environments that could be fully solved by at least two human participants (independently) were considered for inclusion in the public, semi-private, and fully private sets. Many environments were solved by six or more people. As a reminder, an environment is considered solved only if the test taker was able to complete all levels upon seeing the environment for the very first time. As such, all ARC-AGI-3 environments are verified to be 100% solvable by humans with no prior task-specific training.
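For anyone who wants the inclusion rule spelled out, here is a minimal sketch of it in Python. This is not ARC Prize's actual tooling: the `Attempt` record, the function names, and the example numbers are all hypothetical, and only the threshold (at least two full, first-time solves) comes from the description above.

```python
from dataclasses import dataclass

# Threshold stated in the post: at least two independent full solves.
MIN_FULL_SOLVES = 2


@dataclass
class Attempt:
    """One human tester's attempt at one environment (hypothetical record)."""
    levels_completed: int
    first_exposure: bool = True  # tester had never seen the environment before


def is_solved(attempt: Attempt, total_levels: int) -> bool:
    # "Solved" means every level was completed on the very first exposure.
    return attempt.first_exposure and attempt.levels_completed == total_levels


def passes_human_threshold(attempts: list[Attempt], total_levels: int) -> bool:
    # Count full solves and compare against the minimum.
    full_solves = sum(is_solved(a, total_levels) for a in attempts)
    return full_solves >= MIN_FULL_SOLVES


# Made-up example: 10 testers on a 5-level environment, 2 of them finish it.
attempts = [Attempt(5), Attempt(5)] + [Attempt(3)] * 8
print(passes_human_threshold(attempts, total_levels=5))  # True
```

Note that partial progress counts for nothing here: a tester who clears four of five levels contributes the same as one who clears none.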

  • dblsaiko@discuss.tchncs.de
    2 days ago

    You don’t (normally) have to train people to be afraid of bears or heights or loneliness or boredom. You also don’t (normally) have to train people to have empathy or compassion.

    So what are you implying about people who don’t experience these?

    • ExFed@programming.dev
      2 days ago

      What am I implying? That their machinery is abnormal and they likely need assistance to live normal, healthy lives. That’s literally why the fields of psychiatry and psychology exist: healthy people don’t need doctors and therapists. Do you disagree?

      • sp3ctr4l@lemmy.dbzer0.com
        2 days ago

        Introverts exist, and are… very often fine with solitude, prefer it generally over socializing.

        But they are generally fine at participating in society and living normal lives.

        Healthy people… do need doctors … and therapists.

        A person can outwardly appear to be healthy… and actually not be.

        Preventative medicine and regular checkups matter: your body changes as you grow, and habits you develop in your youth may need significant reworking.

        Therapy can give otherwise healthy people a method of exploring their inner selves more fully or more consistently… it can teach them frameworks for understanding and dealing with other kinds of people, and for being better able to deal with kinds of trauma they have not yet experienced.

        Also… same as with physical health… people with some nascent mental problems or patterns forming… probably won’t be obvious to a non-specialist, until it gets more severe.

        • ExFed@programming.dev
          1 day ago

          > Introverts exist, and are… very often fine with solitude, prefer it generally over socializing.

          Definitely! I am one :) but I still desire the presence of friends from time to time (and usually in small groups).

          > A person can outwardly appear to be healthy… and actually not be.

          Yup! There’s always a nonzero chance you’re not as healthy as you think you are (let’s call it the quantum theory of health: everyone is in a superposition of being both healthy and unhealthy at the same time), especially as we change due to age, making us unfamiliar with our own bodies… I’d tell you about my own challenges here, but that’d be TMI.

          And, yes, that’s why we go to regular checkups with someone who has a better perspective to judge “healthiness” (side note: doctors aren’t perfect, so visiting them too frequently can be worse than never at all; there’s a “healthy” cadence to checkups).

          > Therapy can give otherwise healthy people a method of exploring their inner selves more fully or more consistently…

          This boils down to the definition of “healthy”. It even becomes a philosophical question that’s really hard to answer… Is it healthy to live a sedentary lifestyle? Is it healthy to exercise too much? Is it healthy to not know TIPP, in case you (or a loved one) gets a panic attack? Is it healthy to ignore yourself? Ignore others? Is it healthy to mention quantum superposition in a conversation about health? ;)

          But, yes, I agree. Life’s as messy and diverse and as hard to sum up as everybody who’s ever lived, and yet we carry on … I hope that’s healthy.

          Edit: typo, and missing a hint that I’m making a joke about me over-generalizing physics concepts

            • ExFed@programming.dev
              22 hours ago

              Fair enough; the Internet is a silly place full of distracted, armchair philosophers. However, my entire point was that an LLM doesn’t rely on machinery in the same way that a human brain does. That doesn’t make AI “worse” or “better” overall, but it does make it an awful replacement for humans.