Resources
Authors & Affiliations
Reidar Riveland, Alex Pouget
Abstract
One of humans’ most astonishing cognitive feats is the dual ability to interpret linguistic instructions to perform novel tasks in very few practice trials and, conversely, to produce a linguistic description of a task once it has been learned. In contrast, it typically takes thousands of trials for animals to learn even the simplest behavioral tasks. To explore the neural mechanisms that underpin these remarkable abilities, we trained recurrent neural networks to perform a set of common psychophysical tasks simultaneously, with task-type information provided by the output of a pre-trained transformer architecture processing natural language instructions. To test the extent to which these models can use language to generalize performance to unseen tasks, we trained models on 14 tasks and tested on 2 held-out tasks. The key question is whether the networks exhibit 0-shot learning, i.e., the ability to perform the held-out tasks solely by being told what to do. We found that this architecture achieves 0-shot performance of 80% correct (and 90% 3-shot) using S-BERT, compared to 27% performance for a model that encodes tasks with orthogonal rule vectors. Examining the first two principal components of sensorimotor activity across tasks revealed a highly structured representation aligned along task-defined axes, even for previously unseen tasks. Finally, if we allow a network to learn through trial and error, without linguistic instructions, we can invert the network’s language comprehension and train a decoder to produce a linguistic description of how it solved the task. Strikingly, when the produced instructions are provided to a second network trained to perform all tasks with instructions, the second network achieves near-perfect performance (97% on average). To our knowledge, this is the first neural model demonstrating how the compositional nature of language leads to strong 0-shot generalization in sensorimotor networks.
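To make the architecture described above concrete, the sketch below shows one plausible way to condition a sensorimotor RNN on instruction embeddings from a frozen pre-trained sentence encoder. This is not the authors' code: the model sizes, task input/output dimensions, the GRU choice, and the specific sentence-transformers checkpoint ("all-MiniLM-L6-v2") are illustrative assumptions, not values taken from the abstract.

```python
# Hedged sketch: an instruction-conditioned sensorimotor RNN.
# A frozen sentence encoder (S-BERT-style, via sentence-transformers) embeds a
# task instruction; the embedding is projected and fed to the RNN at every
# time step alongside the trial's sensory input. All dimensions are made up.

import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer


class InstructedSensorimotorRNN(nn.Module):
    def __init__(self, sensory_dim=65, motor_dim=33, hidden_dim=256, embed_dim=384):
        super().__init__()
        # Project the fixed-size instruction embedding into the RNN's input space.
        self.instruction_proj = nn.Linear(embed_dim, hidden_dim)
        self.rnn = nn.GRU(sensory_dim + hidden_dim, hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, motor_dim)

    def forward(self, sensory, instruction_embedding):
        # sensory: (batch, time, sensory_dim); instruction_embedding: (batch, embed_dim)
        instr = self.instruction_proj(instruction_embedding)        # (batch, hidden_dim)
        instr = instr.unsqueeze(1).expand(-1, sensory.size(1), -1)  # repeat over time
        hidden, _ = self.rnn(torch.cat([sensory, instr], dim=-1))
        return self.readout(hidden)                                 # (batch, time, motor_dim)


# Frozen language encoder; "all-MiniLM-L6-v2" (384-d output) is an assumed stand-in
# for the S-BERT model named in the abstract.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
model = InstructedSensorimotorRNN()

instructions = ["respond in the direction of the first stimulus"]   # hypothetical instruction
embedding = torch.tensor(encoder.encode(instructions))              # (1, 384)
sensory_input = torch.randn(1, 100, 65)                             # one dummy trial
motor_output = model(sensory_input, embedding)                      # (1, 100, 33)
```

Under this setup, the 0-shot test described in the abstract would amount to passing the instruction embedding for a held-out task to a network trained on the other tasks, with no gradient updates on the held-out task's trials.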