Input Modality
input modality
Comparing supervised learning dynamics: Deep neural networks match human data efficiency but show a generalisation lag
Recent research has seen many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Often, comparison studies focus on the end-result of the learning process by measuring and comparing the similarities in the representations of object categories once they have been formed. However, the process of how these representations emerge—that is, the behavioral changes and intermediate stages observed during the acquisition—is less often directly and empirically compared. In this talk, I'm going to report a detailed investigation of the learning dynamics in human observers and various classic and state-of-the-art DNNs. We develop a constrained supervised learning environment to align learning-relevant conditions such as starting point, input modality, available input data and the feedback provided. Across the whole learning process we evaluate and compare how well learned representations can be generalized to previously unseen test data. Comparisons across the entire learning process indicate that DNNs demonstrate a level of data efficiency comparable to human learners, challenging some prevailing assumptions in the field. However, our results also reveal representational differences: while DNNs' learning is characterized by a pronounced generalisation lag, humans appear to immediately acquire generalizable representations without a preliminary phase of learning training set-specific information that is only later transferred to novel data.
Do you hear what I see: Auditory motion processing in blind individuals
Perception of object motion is fundamentally multisensory, yet little is known about similarities and differences in the computations that give rise to our experience across senses. Insight can be provided by examining auditory motion processing in early blind individuals. In those who become blind early in life, the ‘visual’ motion area hMT+ responds to auditory motion. Meanwhile, the planum temporale, associated with auditory motion in sighted individuals, shows reduced selectivity for auditory motion, suggesting competition between cortical areas for functional role. According to the metamodal hypothesis of cross-modal plasticity developed by Pascual-Leone, the recruitment of hMT+ is driven by it being a metamodal structure containing “operators that execute a given function or computation regardless of sensory input modality”. Thus, the metamodal hypothesis predicts that the computations underlying auditory motion processing in early blind individuals should be analogous to visual motion processing in sighted individuals - relying on non-separable spatiotemporal filters. Inconsistent with the metamodal hypothesis, evidence suggests that the computational algorithms underlying auditory motion processing in early blind individuals fail to undergo a qualitative shift as a result of cross-modal plasticity. Auditory motion filters, in both blind and sighted subjects, are separable in space and time, suggesting that the recruitment of hMT+ to extract motion information from auditory input includes a significant modification of its normal computational operations.