What I see in the modern models is that you can often ask them to write a program or script to do a task and they can do that successfully much better than doing the task itself directly - once they have debugged the program it is usually 100% reliable for the specified tasks. Ask them to do those simple tasks directly and you get all kinds of creatively wrong answers.
What I see in the modern models is that you can often ask them to write a program or script to do a task and they can do that successfully much better than doing the task itself directly - once they have debugged the program it is usually 100% reliable for the specified tasks. Ask them to do those simple tasks directly and you get all kinds of creatively wrong answers.