

The challenge being that this seems to have proven elusive. It's not too surprising, though: using machine learning for robotics turns out to be really hard.
Driving is much easier: the training data pairs video, audio, and other sensor input with exactly how the human manipulated the steering wheel and two pedals.
But direct human interaction with the environment is both far more complicated than three controls and not instrumented at all. They're trying to build training data from remote operators, but it turns out we aren't very good at controlling these machines remotely, nowhere near as well as we act directly. We make terrible teachers, and the result is a fraction of the actionable data that other, more successful models had to work with.
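
To make the asymmetry concrete, here's a minimal sketch in Python of what one labeled example looks like in each domain (hypothetical field names, not any real dataset's schema): the driving record carries a complete three-scalar action label the car logs for free, while the manipulation record needs a command per actuator, signals that simply don't exist when a human acts on the world directly.

    from dataclasses import dataclass, field

    # One labeled frame from a driving demonstration: rich sensor input,
    # and the full action label is just three well-instrumented scalars.
    @dataclass
    class DrivingFrame:
        camera: bytes          # forward camera image
        audio: bytes           # cabin/road audio
        speed_mps: float       # other onboard sensors
        steering_angle: float  # action label, [-1.0, 1.0]
        throttle: float        # action label, [0.0, 1.0]
        brake: float           # action label, [0.0, 1.0]

    # One labeled frame from a manipulation demonstration: the action is
    # a command for every actuated joint, and direct human interaction
    # produces none of these values to record.
    @dataclass
    class ManipulationFrame:
        camera: bytes
        joint_commands: list[float] = field(default_factory=list)  # one per motor/valve, often 20+
        gripper_force: float = 0.0  # contact forces are rarely instrumented at all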
If an AI sees a video of someone doing something, it can generate a similar video, but it can't model how that maps onto what, from its perspective, are unrelated motor and hydraulic operations.
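
Schematically (purely illustrative signatures, not any real model's API), the mismatch is that the video model's input/output space never mentions actuators, so there's nothing for the robot's control problem to attach to:

    import numpy as np

    # What a video model learns: pixels in, pixels out.
    def video_model(past_frames: np.ndarray) -> np.ndarray:
        """Predict plausible future frames; no notion of actuators."""
        ...

    # What the robot actually needs: pixels in, actuator commands out.
    # Mapping observed human motion onto its own motors and hydraulics
    # (a retargeting / inverse-dynamics problem) is never in the video.
    def policy(past_frames: np.ndarray) -> np.ndarray:
        """Return one command per motor or hydraulic valve."""
        ...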

In fact, if the models are ingesting this, they will get dumber, because training on LLM output is known to degrade model quality.