
When will robotics have its "ChatGPT moment"?

Hosted By
Wayne R.

Details

Hellooooo futurists!

We're going to have another meeting, and it will be a "Saturday afternoon picnic in the park" like the one we had back in June, which was so much fun. The topic will be "When will robotics have its ChatGPT moment?" This is the event I was originally planning to hold in July but had to postpone due to schedule conflicts. I hope you all will be able to make it on September 23rd, as that's the day I was able to slot it in. Sorry for the somewhat late announcement (about a week and a half away).

What do I mean by "ChatGPT moment"? Well, for researchers in the AI field, ChatGPT was only an incremental improvement on the language models that came before it, and its sudden popularity, exploding rapidly to ~100 million users, came as a surprise. Language models had crossed a threshold where they suddenly became useful in a way they had not been until that point.

If you look at the field of robotics, robots are used in the manufacturing of just about everything, so in that sense they are already in widespread use, but we still don't see robots in our daily lives. My hypothesis is that we have not yet reached the critical threshold in capability that will enable this. When we do, I think we will see an explosion in popularity and use. Maybe not as rapid as ChatGPT's, because robots are made of atoms, not an online service, and have to be manufactured and distributed. But it will probably still be very fast.

What I plan to bring to the discussion is a proposal by Yann LeCun to build robotics systems that learn vision through "self-supervised" learning, analogous to how the GPT models learned language. The GPT models were trained by first breaking language down into "tokens", which represent words or word parts, and then tasking the model with predicting the next token. It was able to do this on massive amounts of text from books, news articles, and the internet. Crucially, no manual human labeling was required.
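To make that concrete, here's a tiny toy sketch (my own illustration, not anyone's actual GPT code) of next-token-prediction training: the "labels" are simply the next tokens in the text itself, so no human annotation is needed. The vocabulary size, the trivially small model, and the random token ids are all placeholder assumptions.

```python
# Toy sketch of "self-supervised" next-token prediction (not real GPT code).
import torch
import torch.nn as nn

vocab_size = 100   # assumption: tiny toy vocabulary
embed_dim = 32

# A stand-in "language model": embedding + linear layer over the vocabulary.
# Real GPT models put a deep transformer here, but the training objective is the same.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A "document" is just a sequence of token ids; random ids stand in for real text.
tokens = torch.randint(0, vocab_size, (1, 65))

inputs, targets = tokens[:, :-1], tokens[:, 1:]   # each position predicts the next token
logits = model(inputs)                            # shape: (1, 64, vocab_size)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
print(f"next-token prediction loss: {loss.item():.3f}")
```

The point of the sketch is only that the training signal comes from the text itself: shift the sequence by one position and you have your targets for free.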

Yann LeCun argues that predicting whole frames of video in the same manner is out of reach, because tokens are small and simple while video frames are huge. As a workaround, he is developing a system that extracts features from video and challenges the model to predict those features. This enables "self-supervised" learning, analogous to that of large language models (LLMs), to take place. This -- or something like it -- will enable robots to massively "scale up" their learning.
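Here's a similarly toy sketch (again my own illustration, not LeCun's actual architecture) of the feature-prediction idea: encode two consecutive frames into compact feature vectors and train a predictor to map one frame's features to the next frame's features, with the loss computed in feature space rather than on raw pixels. The frame size, feature dimension, and random "video" below are placeholders.

```python
# Toy sketch of predicting *features* of the next video frame instead of its pixels.
import torch
import torch.nn as nn

frame_pixels = 3 * 64 * 64   # assumption: tiny 64x64 RGB frames, flattened
feature_dim = 128

# Encoder maps a frame to a compact feature vector; predictor maps the features
# of the current frame to the (predicted) features of the next frame.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(frame_pixels, feature_dim))
predictor = nn.Linear(feature_dim, feature_dim)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

# Stand-in "video": two consecutive random frames.
frame_t = torch.rand(1, 3, 64, 64)
frame_t1 = torch.rand(1, 3, 64, 64)

predicted = predictor(encoder(frame_t))
with torch.no_grad():                     # target features are held fixed for this step
    target = encoder(frame_t1)

# The loss lives in feature space -- far smaller than predicting every pixel.
loss = nn.functional.mse_loss(predicted, target)
loss.backward()
optimizer.step()
print(f"feature-prediction loss: {loss.item():.3f}")
```

Real systems of this kind add extra machinery to keep the encoder from collapsing to a trivial constant output, but the core idea -- predict features, not pixels -- is what the sketch shows.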

This will be the focus of my talk; as always, you all are welcome to bring your own unique knowledge and insights (and to tell me I'm completely wrong!). We have a small-group discussion format that lends itself well to open-ended discussions.

The meeting spot is the bench by the big tree that you find by starting at the south end of the parking lot and walking further due south through the park. However, since we live in the age of GPS, I have an even better way to tell you all the meeting spot! Just punch 40.011575,-105.254092 into Google Maps (or mapping system of your choice) and flip to "aerial" or "satellite" view. It'll drop a pin on the map and show you exactly where to go.

See You There!

Boulder Future Salon
Scott Carpenter Park
1505 30th St · Boulder, CO