ePoster

Many, but not all, deep neural network audio models predict auditory cortex responses and exhibit hierarchical layer-region correspondence

Greta Tuckute, Jenelle Feather, Dana Boebinger, Josh McDermott
COSYNE 2022 (2022)
Lisbon, Portugal

Abstract

Deep neural networks are commonly used as models of the ventral visual stream, but are less explored in audition. Prior work provided examples of audio-trained neural networks that produce good predictions of fMRI responses in auditory cortex and exhibit correspondence between model stages and brain regions, but left unclear the extent to which these results would generalize to other audio neural network models. We evaluated brain-model correspondence for a wide range of publicly available high-performing audio neural network models along with a set of models that we trained on four different tasks. We used two different fMRI datasets of responses to natural sounds to assess replicability. We found that most tested models out-predicted previous “shallow” spectrotemporal filter models of auditory cortex, and exhibited a systematic layer-region correspondence, with middle layers best predicting primary auditory cortex and deep layers best predicting non-primary cortex. However, some state-of-the-art models produced notably worse brain predictions, including recent speech-to-text and audio captioning systems developed for engineering purposes. The results support the hypothesis that hierarchical models optimized for auditory tasks often learn representational transformations that coarsely resemble those in auditory cortex, but indicate that models derived for engineering purposes can deviate substantially from biological systems.
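The abstract describes the evaluation only at a high level. As a rough illustration, the standard encoding-model recipe in this literature is cross-validated regularized regression from a model stage's activations to fMRI voxel responses, scored on held-out sounds; layer-region correspondence then amounts to asking which stage best predicts each region. The sketch below assumes ridge regression and median test-set correlation as the summary score; the function and variable names (layer_voxel_score, acts, fmri) are hypothetical placeholders, and the poster's exact regression, cross-validation, and noise-correction choices may differ.

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import KFold

    def layer_voxel_score(layer_activations, voxel_responses, n_splits=5):
        # layer_activations: (n_sounds, n_features) activations from one model stage.
        # voxel_responses:   (n_sounds, n_voxels) fMRI responses to the same sounds.
        # Returns the median Pearson r across voxels, computed on held-out sounds.
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
        test_rs = []
        for train, test in kf.split(layer_activations):
            # Regularized (ridge) regression from features to all voxels at once,
            # with the regularization strength chosen by internal cross-validation.
            model = RidgeCV(alphas=np.logspace(-3, 5, 9))
            model.fit(layer_activations[train], voxel_responses[train])
            pred = model.predict(layer_activations[test])
            for v in range(voxel_responses.shape[1]):
                r = np.corrcoef(pred[:, v], voxel_responses[test, v])[0, 1]
                test_rs.append(r)
        return float(np.median(test_rs))

    # Layer-region correspondence: for each region, find the model stage whose
    # activations best predict that region's voxels. `acts` maps layer name ->
    # activation matrix and `fmri` maps region name -> voxel matrix (both
    # hypothetical placeholders for precomputed data).
    # best_layer = {region: max(acts, key=lambda l: layer_voxel_score(acts[l], fmri[region]))
    #               for region in ("primary_auditory_cortex", "non_primary_auditory_cortex")}

Under this recipe, the abstract's finding corresponds to best_layer selecting middle stages for primary auditory cortex and deep stages for non-primary cortex.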

Unique ID: cosyne-22/many-deep-neural-network-audio-models-c949c033