With its highly anticipated IPO around the corner, the company is balancing its long-held reputation as a leader in safety with the demands of its future shareholders.
They know how to steer it and train it…we still know very little about how or why it’s making certain choices. This is also not a new problem, we had the same issue 40 years ago so don’t expect a quick solution on the horizon
They understand the theories and underlying principles, but the sheer amount of data makes it impossible to actually verify it.
An ELI5 comparison would be a hill of stones: you know when you throw more stones onto it, a “landslide” will occur and rearrange the hill. For a very small hill of 10 stones you may even be able to know input and output (“if I throw a stone there, the stones will be like this after the landslide”). But you cannot predict the same for a hill of 1000000 stones, even tho the “rules” are the same. You know what will happen, but you have no way to predict the outcome, or verify that everything went as expected.
The theory / math is not the problem. The scale is.
They created the model and trained it, but they don’t know why it gives what it gives when you ask it a question. Which is why they still haven’t solved the hallucination issue.
If researchers barely have grasp on how LLMs work how did they create Claude?
They know how to steer it and train it…we still know very little about how or why it’s making certain choices. This is also not a new problem, we had the same issue 40 years ago so don’t expect a quick solution on the horizon
They understand the theories and underlying principles, but the sheer amount of data makes it impossible to actually verify it.
An ELI5 comparison would be a hill of stones: you know when you throw more stones onto it, a “landslide” will occur and rearrange the hill. For a very small hill of 10 stones you may even be able to know input and output (“if I throw a stone there, the stones will be like this after the landslide”). But you cannot predict the same for a hill of 1000000 stones, even tho the “rules” are the same. You know what will happen, but you have no way to predict the outcome, or verify that everything went as expected.
The theory / math is not the problem. The scale is.
They created the model and trained it, but they don’t know why it gives what it gives when you ask it a question. Which is why they still haven’t solved the hallucination issue.