Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning
Abstract
Navigation in unfamiliar environments presents a major challenge for robots: while mapping and planning techniques can be used to build up a representation of the world, quickly discovering a path to a desired goal in unfamiliar settings with such methods often requires lengthy mapping and exploration. Humans can rapidly navigate new environments, particularly indoor environments that are laid out logically, by leveraging semantics -- e.g., a kitchen often adjoins a living room, an exit sign indicates the way out, and so forth. Language models can provide robots with such knowledge, but directly using language models to instruct a robot how to reach some destination can also be impractical: while language models might produce a narrative about how to reach some goal, because they are not grounded in real-world observations, this narrative might be arbitrarily wrong. Therefore, in this paper we study how the "semantic guesswork" produced by language models can be utilized as a guiding heuristic for planning algorithms. Our method, Language Frontier Guide (LFG), uses the language model to bias exploration of novel real-world environments by incorporating the semantic knowledge stored in language models as a search heuristic for planning with either topological or metric maps. We evaluate LFG in challenging real-world environments and simulated benchmarks, outperforming uninformed exploration and other ways of using language models.
Community
Proposes Language Frontier Guide (LFG), which uses an LLM to bias exploration in unknown environments by grounding the model's semantic knowledge in the robot's observations. The language model scores sub-goal candidates, and the scores serve as a planning heuristic (like PONI: Potential Functions for object-goal navigation); in effect, frontier-based exploration (FBE) made better with LLMs. The episodic memory map (the state of the environment) can be a semantically labelled metric occupancy map or a topological map (images with labels); navigation is handled by a low-level control policy. Next-token log likelihoods (logprobs) could be used for scoring, but LLM APIs (GPT-3/3.5/4, Claude, PaLM, etc.) usually do not expose them, and logprob scoring is incompatible with CoT prompting. Instead, a VLM gives textual sub-goal descriptors, and the sub-goals are scored by sampling the LLM multiple times with custom prompt templates (one positive, one negative) that ask for CoT explanations. A structured query containing the observations grounds the LLM's goal prediction; the negative prompt helps in cases where the LLM is not confident. The heuristic function is the positive score minus the negative score (weighted) minus distance. Sub-goal sampling (scoring) is Algorithm 1; the FBE modification (LFG) is Algorithm 2. Evaluated on the Habitat ObjectNav challenge (HM3D dataset), where it outperforms DD-PPO, FBE, SemExp, OVRL-v2, greedy LLM, and L3MVN. The appendix has hyperparameters and implementation details (GPT-3.5 was the LLM); NoMaD is the low-level policy for obstacle avoidance; Appendix B has the prompts. From UC Berkeley and Google DeepMind (Sergey Levine).
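The scoring idea summarized above (poll the LLM several times with a positive and a negative prompt, then combine the vote fractions with distance) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Frontier` class, the `poll` and `score` helpers, the stub samplers that stand in for real LLM calls, the separate weights `w_pos` and `w_neg`, and the normalized distance units are all assumptions made for the example. The exact formulation is in the paper's Algorithms 1 and 2.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Dict, Sequence


@dataclass
class Frontier:
    label: str       # textual sub-goal descriptor (e.g. produced by a VLM)
    distance: float  # travel cost to the frontier, in assumed normalized units


def poll(sample_fn: Callable[[Sequence[str]], str],
         labels: Sequence[str], n_samples: int) -> Dict[str, float]:
    """Query the sampler n_samples times; return the fraction of votes per label."""
    counts = Counter(sample_fn(labels) for _ in range(n_samples))
    return {label: counts[label] / n_samples for label in labels}


def score(f: Frontier, p_pos: Dict[str, float], p_neg: Dict[str, float],
          w_pos: float = 1.0, w_neg: float = 1.0) -> float:
    """Weighted positive votes minus weighted negative votes minus distance."""
    return w_pos * p_pos[f.label] - w_neg * p_neg[f.label] - f.distance


def make_stub_sampler(answers: Sequence[str]) -> Callable[[Sequence[str]], str]:
    """Hypothetical stand-in for an LLM call: replays a fixed list of answers."""
    it = cycle(answers)
    return lambda labels: next(it)


# Hypothetical scene: the goal object is most plausibly found in a kitchen.
frontiers = [Frontier("hallway", 0.2), Frontier("kitchen", 0.5), Frontier("bedroom", 0.1)]
labels = [f.label for f in frontiers]

# Positive prompt: "which sub-goal is most likely to lead to the goal?"
pos_sampler = make_stub_sampler(["kitchen", "kitchen", "kitchen", "hallway"])
# Negative prompt: "which sub-goal is least likely to lead to the goal?"
neg_sampler = make_stub_sampler(["bedroom", "bedroom", "hallway", "bedroom"])

p_pos = poll(pos_sampler, labels, n_samples=4)
p_neg = poll(neg_sampler, labels, n_samples=4)

# Pick the frontier with the highest combined score.
best = max(frontiers, key=lambda f: score(f, p_pos, p_neg))
print(best.label)
```

In this toy scene, plain nearest-frontier exploration would head for the bedroom (smallest distance), while the language-biased score prefers the kitchen despite the longer trip. Using vote fractions from repeated sampling sidesteps the need for logprobs, which is the motivation given in the note above.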