Authors & Affiliations
Giulia Lafratta, Bernd Porr, Christopher Chandler, Alice Miller
Abstract
In autonomous navigation, an agent's knowledge of its environment is represented as sensory inputs (states) linked together by actions. A popular approach for acquiring this knowledge is reinforcement learning (RL), where learning is stereotyped, i.e. an agent learns to react to a state with the action associated with maximum reward. Model training requires many trials and occurs offline, where the agent aims to learn generalisable reflexes rather than to reason about its surroundings actively. Thus, RL agents are inefficient to train and can exhibit inflexible behaviour. Indeed, behavioural flexibility requires more powerful encoding than trained reflexes. In nature, behaviour boils down to attraction or repulsion, both of which are closed-loop behaviours (CLBs). CLBs are object-centric reflexes which react to an obstacle or target (a "disturbance"), and terminate when it is avoided or reached, or when the behaviour fails. Thus, the lifetime of a CLB depends on active feedback from the environment and relies on real-time disturbance processing. By combining CLBs with "core" (i.e. innate) knowledge of physics and causality in the form of a physics engine, a robot can simulate the execution of CLB sequences representing possible plans, and choose the optimal one for execution. This constitutes a simulated exploration of the environment. We propose that simulation information may be stored as states in a searchable structure representing a model of the environment which is immediately available to the agent for future recall and reasoning, without prior training or physical exploration of the environment. Thus, thanks to the physics engine, the state-space may be dynamically expanded and shrunk on-the-fly by simply running the physics simulation at each sensor sampling event. This approach, demonstrated on a real robot in a target-seeking scenario, shows promise for few-shot learning of, and real-time reasoning over, a state-based model.
The model is egocentric and requires no global localisation or data labelling, making for a resource-efficient, as well as biologically realistic, paradigm.
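The planning principle described above, simulating candidate sequences of closed-loop behaviours and executing the one that best resolves the disturbance, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the 1-D "physics engine", the `make_clb` helper, and the exhaustive plan enumeration are hypothetical stand-ins, not the authors' implementation.

```python
import itertools

def make_clb(step):
    """Hypothetical closed-loop behaviour (CLB): a primitive that
    advances the simulated state by `step` each time it runs."""
    def clb(state):
        return state + step
    return clb

def simulate_plan(plan, state, target):
    """Run a CLB sequence in a toy 'physics engine' (trivial 1-D
    dynamics) and return the final state and its cost, i.e. the
    remaining distance to the target disturbance."""
    for clb in plan:
        state = clb(state)
    return state, abs(target - state)

def best_plan(clbs, depth, start, target):
    """Enumerate all CLB sequences up to `depth`, simulate each,
    and select the plan whose simulated outcome is closest to the
    target -- the simulated exploration step."""
    candidates = itertools.product(clbs, repeat=depth)
    return min(candidates, key=lambda p: simulate_plan(p, start, target)[1])

clbs = [make_clb(-1), make_clb(1), make_clb(2)]
plan = best_plan(clbs, depth=3, start=0, target=4)
final, cost = simulate_plan(plan, 0, 4)
print(final, cost)  # a 3-step plan (e.g. 1+1+2) reaches the target: 4 0
```

In the full approach, each simulated outcome would be stored as a state in a searchable model, and re-running the simulation at every sensor sampling event is what allows that state-space to grow and shrink on-the-fly.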