Ask ChatGPT to estimate the carbs in your lunch. Now ask it again. And again. Five hundred times. You’d expect the same answer each time. It’s the same photo, the same model, the same question. But you won’t get the same answer. Not even close — and the differences are large enough to cause a
Can you explain how it’s more than probability? It’s using a neural network to guess the most likely next token, isn’t it?
The difference is that it uses a non-trivial neural network. If it were simply a frequency count, taken from a corpus, of how often each word is followed by each other word, it wouldn’t be any stronger than keyboard word prediction. Making accurate suggestions requires the emergence of some primitive reasoning about the semantics of the tokens. LLM neural networks (transformers) can be analyzed to find subnetworks dedicated to modeling reality. It is still probability, but saying it’s just probability is not a faithful description.
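For comparison, the plain frequency-count baseline described above can be sketched in a few lines of Python. This is only an illustration of the "count which word follows which" idea, not how any real keyboard or LLM is implemented. The toy corpus and the function names `train_bigram_counts` and `predict_next` are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat and the cat slept"
counts = train_bigram_counts(corpus)

print(predict_next(counts, "the"))    # most frequent follower of "the"
print(predict_next(counts, "lunch"))  # unseen word, so there is no suggestion
```

A model like this only ever looks at the single previous word and has nothing to say about words it has not seen, which is roughly why it cannot do better than keyboard suggestions.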
You could also say that it chooses the next word it will say to you. It has a few words to choose from, which it has selected in relation to the previously spoken words, your question and previous interactions (the context). The probability you’re talking about (a number) could also be seen as its preference among those words. I’m not sure the probability vocabulary/analogy is necessarily the best one. The best might be to not use any analogy at all, but then you have to dig deeper into the subject to form an informed opinion. This series of videos explains it better than I do: https://www.youtube.com/watch?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
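The "preference among a few words" picture can also be sketched in code. The example below assumes the model has already produced a score for each candidate word. The scores here are invented, and `softmax` and `choose_next` are hypothetical helper names, not any real LLM API. It turns the scores into probabilities and then samples, which is one common way to pick the next token and one reason the same question can get different answers on different runs.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw preference scores into probabilities that sum to 1."""
    scaled = [s / temperature for s in scores.values()]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return {word: e / total for word, e in zip(scores, exps)}


def choose_next(scores, rng, temperature=1.0):
    """Sample one candidate word according to its probability."""
    probs = softmax(scores, temperature)
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Invented scores for four candidate next words (assumption for illustration).
scores = {"40": 2.0, "55": 1.6, "70": 1.1, "25": 0.4}

rng = random.Random(0)  # fixed seed so the run is repeatable
picks = [choose_next(scores, rng) for _ in range(500)]
print({word: picks.count(word) for word in scores})
```

Asking 500 times spreads the picks across all the candidates instead of always returning the top one. Lowering `temperature` toward zero concentrates the choice on the highest-scored word.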