PROVIDENCE, R.I. [Brown University] — The black and yellow robot, meant to resemble a large dog, stood waiting for directions. When they came, the instructions weren’t in code but instead in plain English: “Visit the wooden desk exactly two times; in addition, don’t go to the wooden desk before the bookshelf.”
Four metallic legs whirred into action. The robot went from where it stood in the room to a nearby bookshelf, and then, after a brief pause, shuffled to the designated wooden desk before leaving and returning for a second visit to satisfy the command.
Until recently, such an exercise would have been nearly impossible for navigation robots like this one to carry out. Most current software for navigation robots can’t reliably translate English, or any everyday language, into the mathematical language that robots understand and can act on. This gets even harder when the software has to make logical leaps based on complex or expressive directions (such as going to the bookshelf before the wooden desk), since that traditionally requires training on thousands of hours of data so that the software knows what the robot is supposed to do when it comes across that particular type of command.
Advances in so-called large language models, a form of artificial intelligence, are changing this, however. Giving robots newfound powers of understanding and reasoning is not only helping make experiments like this achievable but also has computer scientists excited about transferring this type of success to environments outside of labs, such as people’s homes and major cities and towns around the world. For the past year, researchers at Brown University’s Humans to Robots Laboratory have been working on a system with this kind of potential, and they share it in a new paper that will be presented at the Conference on Robot Learning in Atlanta on November 8.
The research marks an important contribution toward more seamless communications between humans and robots, the scientists say, because the sometimes convoluted ways humans naturally communicate with each other usually pose problems when expressed to robots, often resulting in incorrect actions or a long planning lag.
“In the paper, we were particularly thinking about mobile robots moving around an environment,” said Stefanie Tellex, a computer science professor at Brown and senior author of the new study. “We wanted a way to connect complex, specific and abstract English instructions that people might say to a robot — like go down Thayer Street in Providence and meet me at the coffee shop, but avoid the CVS and first stop at the bank — to a robot’s behavior.”
The paper describes how the team’s novel system makes this possible: it uses A.I. language models, similar to those that power chatbots like ChatGPT, to compartmentalize and break down the instructions, which eliminates the need for training data.
It also explains how the software provides navigation robots with a powerful grounding tool that can not only take natural language commands and generate behaviors but also compute the logical leaps a robot may need to make, based on both the context of the plain-worded instructions and what they say the robot can or can’t do and in what order.
“In the future, this has applications for mobile robots moving through our cities, whether a drone, a self-driving car or a ground vehicle delivering packages,” Tellex said. “Anytime you need to talk to a robot and tell it to do stuff, you would be able to do that and give it very rich, detailed, precise instructions.”
Tellex says the new system, with its ability to understand expressive and rich language, represents one of the most powerful language understanding systems for route directions ever released, since it can essentially start working in robots without the need for training data. Traditionally, if developers wanted a robot to plot out and complete routes in Boston, for example, they would have to collect different examples of people giving instructions in the city, such as “travel through Boston Common but avoid the Frog Pond,” so the system knows what this means and can compute it for the robot. They would have to do that training all over again if they wanted the robot to then navigate New York City.
The level of sophistication in the system the researchers created means it can operate in any new environment without a long training process. Instead, it needs only a detailed map of the environment.
“We basically go from language to actions that are conducted by the robot,” said Ankit Shah, a postdoctoral researcher in Tellex’s lab at Brown.