Idk if this information can help us, but I’m a cognitive researcher. Although linguistics is not my area, I do know a little bit about linguistics and human typing errors. Humans use language in constantly evolving, creative ways, which is difficult for LLMs to adapt to.
There are also specific types of errors that humans make that are kind of unique to us.
These types of errors can be indicators of a real human, because humans make them somewhat randomly (kind of; I explain further down). They are More likely to make them depending on how tired they are and on “priming”, and neither of those can exist in a language model.
Ok, so what “errors” am I talking about? (By errors I mean language that deviates from grammar rules.)
LLMs are largely trained on books and essays, not on natural dialogue.
Writing the way we talk is harder for LLMs to interpret. They aren’t terrible at it when what we say is a simple command, but once it deviates from that, the LLM just pulls out keywords, does the best it can, and makes errors.
“So, what you think is that it’s really the others ? Like, I don’t know what you mean”.
That’s an example of how a person may talk. Others can understand that sentence from the context of the conversation, but it makes zero sense in isolation. It’s Very difficult for LLMs to understand that type of sentence.
LLMs struggle with adaptive language.
This includes slang, creative invented uses of words, and common spoken grammar errors that become trendy (or spread for some other reason).
For instance, I might say in real life “that pizza was fire”. You know what I mean. An LLM might think I meant the pizza was cooked in a fire oven, or burnt, or maybe that it was spicy.
If I use an emoji for the pizza or fire, the LLM struggles even more to come up with an appropriate response/interpretation.
Just to clarify: LLMs don’t actually interpret anything, so I don’t mean that in the literal sense. I’m still talking about pattern matching.
Anyway.
Slang and sayings change very fast. Humans can keep up with them, but LLMs struggle because slang changes meaning quickly and goes out of style as fast as it comes in.
Another human error is when we use a word that “looks” similar to the correct word but is not semantically related.
It’s not a word with a similar meaning; it actually makes no sense there. But it “looks” like the correct word.
For example, someone might be describing a “platonic” relationship and use the word “planting”.
These words both start with “pl”, are the same length (eight letters each), and the g in “planting” has a “c” shape within it.
If you see text with these types of errors, it’s likely a human.
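For anyone who likes to see it spelled out, here’s a toy sketch of the “looks similar” check from the platonic/planting example. It only encodes two of the surface cues I named (same opening letters, similar length). The function names and thresholds are made up for illustration, it ignores letter shapes like the “c” inside the “g”, and it is definitely not a working human detector.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count how many leading letters two words have in common."""
    n = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        n += 1
    return n


def looks_alike(intended: str, typed: str, min_prefix: int = 2, max_len_diff: int = 1) -> bool:
    """Hypothetical heuristic: same opening letters and roughly the same length.

    The thresholds are arbitrary illustration values, not tuned on any data.
    """
    same_start = shared_prefix_len(intended, typed) >= min_prefix
    similar_length = abs(len(intended) - len(typed)) <= max_len_diff
    return same_start and similar_length


print(shared_prefix_len("platonic", "planting"))  # 3 ("pla")
print(len("platonic"), len("planting"))            # 8 8
print(looks_alike("platonic", "planting"))         # True
print(looks_alike("platonic", "friendly"))         # False: no shared start
```

Note that it only works when you already know which word was intended, so it can explain an error after the fact. It can’t predict one.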
Another common type of human error is the editing error.
For instance, you might have noticed that sometimes the 2nd or even 3rd word in my sentences is capitalized. This is because I edited the text to add a better start to the sentence after I had already written it.
And I can’t be bothered to remove the incorrect capital letter.
This is something a human would do, and its location would make sense to other humans, because we intuitively understand how language can be reduced. An LLM does not. We rarely reduce language in books, essays, or even in typing. However, we do do it a lot in natural verbal conversations.
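Same idea for the editing errors: here’s a rough sketch that flags a capitalized 2nd or 3rd word in a sentence, which is where my leftover capitals tend to land. The function name and rules are my own assumptions for illustration. It uses a very naive sentence splitter, and it will also flag proper nouns (“I love Paris”), because it can’t tell them apart from a leftover capital.

```python
import re


def stray_capitals(text: str, positions: tuple = (1, 2)) -> list:
    """Return capitalized words sitting at the 2nd/3rd word of a sentence.

    Hypothetical heuristic: skips the pronoun "I" (and "I'm" etc.) and
    acronyms like "LLMs", but cannot tell a leftover capital from a proper noun.
    """
    hits = []
    # Naive split on sentence-ending punctuation.
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.split()
        for i in positions:
            if i >= len(words):
                continue
            word = words[i].strip("“”\"'(),;:")
            base = re.split(r"[’']", word)[0]  # "I’m" -> "I"
            capitals = sum(ch.isupper() for ch in word)
            # Exactly one capital, in first position, and not the pronoun "I".
            if word[:1].isupper() and capitals == 1 and base != "I":
                hits.append(word)
    return hits


sample = "They are More likely to make them. It’s Very difficult for LLMs to understand."
print(stray_capitals(sample))  # ['More', 'Very']
```

It only looks at position. It has no idea whether the capital came from an edit, and that is the part a human reader picks up on intuitively.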
Unusual punctuation can also help with identifying a human. I actually like using semicolons, because when I talk I often add an afterthought to the end of a sentence; just like this.
But semicolons are rarely used. Some people flat out dislike them. They are very, very rarely used in books or essays.
The appropriate use of semicolons might be useful for identification. LLMs may not have enough material to use them properly, but it’s easy for a human to apply a semicolon appropriately.
Anyway. I worried that putting this out could be used against us, but I also don’t think LLMs can sidestep these issues. If they try to add errors, it’s going to result in incoherent garble. This is Because human errors are not statistically systematic. They do follow systematic cognitive patterns that can be predicted if you understand how priming works, but not at a level an LLM could reproduce.
Rather, they can be predicted backwards, not forwards.
I can recognize an error and make some likely guesses about what caused it. But I cannot predict an error that hasn’t occurred yet from its possible causes, because those causes are virtually unmeasurable and can’t be identified.
Take the example of platonic vs planting. If I saw such an error, I would know it was caused by the words “looking” similar. But if I were trying to create such an error, it would be much more difficult. Even that example was tricky to come up with, and I have a creative human brain (an LLM is not creative by its very nature).
Hope that’s not too confusing. Wow, this is getting long.
If anyone who reads this has any questions, or thoughts on the topic, please comment.