• Doomsider@lemmy.world · 1 hour ago

    Translation: It is good at finding bugs that the NSA doesn’t want people to know about.

  • I Cast Fist@programming.dev · 7 hours ago

    Man, I’ll start telling that to my boss whenever I miss a deadline. “Sorry boss, the code I made is too powerful, we can’t release it”

  • GuyIncognito@lemmy.ca · 7 hours ago

    crazy that the AI companies big selling point is always “our new model is TOO POWERFUL, it’s gone rampant and learned at a geometric rate, it enslaved six interns in the punishment sphere and subjected them to a trillion subjective years of torment. please invest, buy our stock”

  • NotMyOldRedditName@lemmy.world · 7 hours ago

    Hey Claude, find a weakness in the DoD system and get us their emails proving they were going to use you to kill innocent civilians autonomously, and track every US citizen.

    • Gladaed@feddit.org · 10 hours ago

      How would it do that?

      It’s a set of inputs that generates an output, once per execution. Integrating it into an infrastructure that allows it to start external programs and do scheduling really isn’t on the LLM.

      You cannot start a timer without having a timer, too. And LLMs aren’t beings who exist continually like you and me, so time exists on a different, foreign dimension to an LLM.
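The point that scheduling lives outside the model can be sketched as a plain polling loop. This is purely illustrative: `call_llm` is a made-up stand-in, not any real API — the "timer" is ordinary orchestration code, and each tick is one fresh, stateless request.

```python
import time

def call_llm(prompt: str) -> str:
    """Stand-in for a real model API call: stateless, one output per input."""
    return f"response to: {prompt}"

def run_every(interval_s: float, prompt: str, iterations: int) -> list[str]:
    """The 'timer' is this ordinary loop, not the model. Each tick makes
    one independent request; the LLM itself never keeps time."""
    outputs = []
    for _ in range(iterations):
        outputs.append(call_llm(prompt))
        time.sleep(interval_s)
    return outputs

results = run_every(0.01, "status check", iterations=3)
```

Nothing here requires the model to "exist continually": remove the loop and nothing ever runs again.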

      • YesButActuallyMaybe@lemmy.ca · 9 hours ago

        You attach an epoch timestamp to the initial message and then you see how much time has passed since then. Does this sound like rocket surgery?
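A rough sketch of that suggestion, with hypothetical names (`build_prompt` is not any real API): the orchestration code records the epoch once, then attaches it — and the elapsed time — to each request, so the model can reason about wall-clock time without keeping a clock itself.

```python
import time

def build_prompt(initial_epoch: float, user_message: str) -> str:
    """Attach the conversation's start timestamp and the elapsed time
    to each request; the model itself keeps no clock."""
    elapsed = time.time() - initial_epoch
    return (
        f"Conversation started at epoch {initial_epoch:.0f}; "
        f"about {elapsed:.0f} seconds have elapsed.\n\n{user_message}"
    )

# Record the epoch once, then reuse it for every prompt in the thread.
start = time.time() - 90  # pretend the chat began 90 seconds ago
prompt = build_prompt(start, "Is the five-minute timer up yet?")
```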

        • stringere@sh.itjust.works · 46 seconds ago

          How does the LLM check the timestamps without a prompt? By continually prompting? In which case, you are the timer.

            • Gladaed@feddit.org · 7 hours ago

              That’s not how that works.

              LLMs execute on request. They tend not to be scheduled to evaluate once in a while since that would be crazy wasteful.

              • stringere@sh.itjust.works · 52 minutes ago

                Edit to add: I know I’m not replying to the bad mansplainer.

                LLM != TSR

                Do people even use TSR as a phrase anymore? I don’t really see it in use much, probably because it’s more the norm than exception in modern computing.

                TSR = old techy speak: Terminate and Stay Resident. Back when RAM was more limited (hey, and maybe again soon with these prices!), programs were often run once and done: they ran and were flushed from RAM. Anything that needed to continue running in the background was a TSR.

  • GnuLinuxDude@lemmy.ml · 11 hours ago

    Remember when Scam Altman posted a picture of the Death Star to explain how scary GPT5 is? lmao these people are all such cretins and I hate them to the last.

  • LiveLM@lemmy.zip · 12 hours ago

    AI companies do this same tired schtick every time they release a model. If only they realized how amateurish it makes them look.

  • Noja@sopuli.xyz · 11 hours ago

    How much do you think Business Insider was paid for this “article”?

    • NotMyOldRedditName@lemmy.world · 7 hours ago

      I dunno, but I could use some paid advertisement on news sites like this to promote my business, if it ain’t too expensive. Think the money in the banana stand is enough?

  • abbiistabbii@piefed.blahaj.zone · 14 hours ago

    The secret Pepsi is so good that when you drink it, it becomes like The Spice from Dune! We can’t release it! We need to make it less addictive!

  • ViatorOmnium@piefed.social · 13 hours ago

    Let me guess, the containment was written by the previous iteration and was the digital version of a wet paperback.

    We all saw the state of Claude Code’s codebase.

    • 🌞 Alexander Daychilde 🌞@lemmy.world · 11 hours ago

      “Broke containment” to me means two things:

      1. Doing things against the safeguards
      2. Doing things externally - like sending that email

      The former is a big nothing. They obviously just need to build stronger safeguards. That’s what they’ll do, and eventually they’ll release it, or other models, or whatever.

      The latter is also a big nothing, because people who know nothing about tech will say “OH SHIT IT ESCAPED” — but it requires running on large hardware, it can’t “get into the internet” like those people might think, and if it’s doing things you don’t want on the internet, you just remove its access to the internet.

      So in both cases, the “containment” issue is really not a big deal.

      I agree with those who basically say this is an attempted ad trying to sell it as super-capable-oh-shit-amazing.

      [x] Doubt

      • ExperiencedWinter@lemmy.world · 7 hours ago

        The company whose current safeguards are “please write secure code” will have to improve those safeguards? I’m shocked, absolutely shocked.

      • ViatorOmnium@piefed.social · 10 hours ago

        (2) can mean getting access to production credentials of something important and causing an incident for the ages.

        AWS already had a few because they gave agents too much access.

        • HereIAm@lemmy.world · 9 hours ago

          Yeah, but in that scenario they gave the agents access. Just because you ask it nicely not to destroy your workspace doesn’t guarantee an LLM won’t produce that output.

          • NotMyOldRedditName@lemmy.world · 7 hours ago

            With Claude Code being able to run the stuff it creates, it could be as simple as: it’s in a sandbox, you ask it to work on security things, it finds an exploit in the sandbox, it tests the code, the sandbox breaks, and now it has permissions outside it.

    • emb@lemmy.world · 11 hours ago

      They didn’t entirely miss the mark there. They publicly released the version after that and the world became worse. That certainly fits for some definition of ‘dangerous’, even tho it’s probably not how they were thinking.