fubarx@lemmy.world to Technology@lemmy.world · English · 9 hours ago
AI Is Scheming, and Stopping It Won’t Be Easy, OpenAI Study Finds (time.com) · 23 comments · 50 upvotes
MentalEdge@sopuli.xyz · English · 8 points · edited 3 hours ago
Seems like it’s a technical term, a bit like “hallucination”.
It refers to cases where an LLM in some way tries to deceive or manipulate the user interacting with it.
There’s hallucination, when a model “genuinely” claims something untrue is true.
This is about how a model might lie, even though the “chain of thought” shows it “knows” better.
It’s just yet another reason the output of LLMs is suspect and unreliable.