Car Wash Test on 53 leading AI models: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

fubarx@lemmy.world · 1 day ago

pimpampoom@lemmy.zip · 3 hours ago

They didn’t take “thinking mode” into account. Most models pass when thinking is activated.

Kyuuketsuki@sh.itjust.works · edited 2 hours ago

Sure they did. They even had a note on the results table that Grok passed except when reasoning mode was off.

ETA: They even posted all the reasoning texts for the models they tested.