With its highly anticipated IPO around the corner, the company is balancing its long-held reputation as a leader in safety with the demands of its future shareholders.
In a blog post published Thursday, the company cited its own internal data as evidence that modern AI systems are nearing the point of “recursive self-improvement”—i.e., being able to refine their capabilities without a human in the loop. “AI that can build itself would be a major development in the history of technology—one that could bring enormous good for the world in science, healthcare, and beyond,” the post, which was written by company cofounder Jack Clark and Anthropic Institute lead Marina Favaro, reads. “But full recursive self-improvement also might increase the risks of humans losing control over AI systems.”
Are they? When our own researchers barely have a grasp on how LLMs work and can’t really pinpoint why LLMs make certain choices, can you really expect one to improve itself when it has no real understanding? Or will it just make itself even more convoluted and nonsensical until it all falls apart like a house of cards?
I don’t know shit about LLMs, so maybe I’m lacking understanding here.
If researchers barely have a grasp on how LLMs work, how did they create Claude?
They know how to steer it and train it… but we still know very little about how or why it’s making certain choices. This is also not a new problem: we had the same issue 40 years ago, so don’t expect a quick solution on the horizon.
They understand the theories and underlying principles, but the sheer amount of data makes it impossible to actually verify what the model is doing.
An ELI5 comparison would be a hill of stones: you know that when you throw more stones onto it, a “landslide” will occur and rearrange the hill. For a very small hill of 10 stones you may even be able to map input to output (“if I throw a stone there, the stones will be like this after the landslide”). But you cannot predict the same for a hill of 1,000,000 stones, even though the “rules” are the same. You know a landslide will happen, but you have no way to predict the exact outcome or verify that everything went as expected.
The theory / math is not the problem. The scale is.
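If you want to play with the stone-hill idea, here is a minimal Python sketch of a toy "sandpile" model (the Bak-Tang-Wiesenfeld model from physics). It is only an illustration of "simple rules, unpredictable outcome at scale" and says nothing about how an LLM actually works. The grid size, the threshold of 4, and the function names `drop_grain` and `avalanche_sizes` are my own assumptions for the example.

```python
import random


def drop_grain(grid, row, col):
    """Add one grain at (row, col), topple until stable, return the number of topplings."""
    rows, cols = len(grid), len(grid[0])
    grid[row][col] += 1
    unstable = [(row, col)]
    topplings = 0
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < 4:
            continue  # already stable (may have been queued more than once)
        # The rule is trivial: a cell holding 4+ grains gives one to each neighbour.
        grid[r][c] -= 4
        topplings += 1
        if grid[r][c] >= 4:
            unstable.append((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:  # grains pushed off the edge are lost
                grid[nr][nc] += 1
                if grid[nr][nc] >= 4:
                    unstable.append((nr, nc))
    return topplings


def avalanche_sizes(size, drops, seed=0):
    """Drop grains at random cells of a size x size grid; return each avalanche's size."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    grid = [[0] * size for _ in range(size)]
    return [
        drop_grain(grid, rng.randrange(size), rng.randrange(size))
        for _ in range(drops)
    ]


if __name__ == "__main__":
    sizes = avalanche_sizes(size=20, drops=5000)
    print("largest avalanche:", max(sizes))
    print("drops that caused no avalanche:", sizes.count(0))
```

Every single toppling is easy to follow, but the only practical way to find out how big the next avalanche on a large grid will be is to run the whole thing, which is the point of the comparison.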
They created the model and trained it, but they don’t know why it gives the answer it gives when you ask it a question, which is why they still haven’t solved the hallucination issue.