Human Observers
Comparing supervised learning dynamics: Deep neural networks match human data efficiency but show a generalisation lag
Recent research has produced many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Comparison studies often focus on the end result of the learning process, measuring and comparing similarities in the representations of object categories once they have been formed. However, the process by which these representations emerge, that is, the behavioral changes and intermediate stages observed during acquisition, is less often compared directly and empirically. In this talk I report a detailed investigation of the learning dynamics of human observers and of several classic and state-of-the-art DNNs. We developed a constrained supervised learning environment that aligns learning-relevant conditions such as the starting point, input modality, available input data, and the feedback provided. Throughout the learning process, we evaluate and compare how well the learned representations generalise to previously unseen test data. These comparisons indicate that DNNs match the data efficiency of human learners, challenging some prevailing assumptions in the field. However, our results also reveal representational differences: DNN learning is characterized by a pronounced generalisation lag, whereas humans appear to acquire generalisable representations immediately, without a preliminary phase of learning training-set-specific information that is only later transferred to novel data.
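The core measurement behind a comparison like this can be sketched in a few lines: track accuracy on the training set and on held-out data at every step of learning, and read the generalisation lag off the gap between the two curves. The sketch below is purely illustrative, using a toy logistic-regression "learner" on synthetic data (none of this is the authors' actual setup, models, or stimuli).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification task: two Gaussian clusters in 2-D.
def make_data(n):
    X0 = rng.normal(loc=-1.0, scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=+1.0, scale=1.0, size=(n, 2))
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

X_train, y_train = make_data(100)
X_test, y_test = make_data(100)   # unseen data from the same distribution

# A stand-in learner: logistic regression trained by gradient descent.
w, b, lr = np.zeros(2), 0.0, 0.1
history = []                      # (train accuracy, test accuracy) per epoch
for epoch in range(50):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
    w -= lr * X_train.T @ (p - y_train) / len(y_train)
    b -= lr * np.mean(p - y_train)
    acc_train = np.mean((X_train @ w + b > 0) == y_train)
    acc_test = np.mean((X_test @ w + b > 0) == y_test)
    history.append((acc_train, acc_test))

# "Generalisation lag" at each point of the learning curve: how far
# held-out accuracy trails training accuracy.
lag = [tr - te for tr, te in history]
```

A learner that immediately forms generalisable representations (as the human data suggest) would show a near-zero lag throughout, while a learner that first fits training-set-specific information would show a large early gap that closes only later.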
Learning to see stuff
Materials with complex appearances, like textiles and foodstuffs, pose challenges for conventional theories of vision. How does the brain learn to see properties of the world, like the glossiness of a surface, that cannot be measured by any other sense? Recent advances in unsupervised deep learning may help shed light on material perception. I will show how an unsupervised deep neural network, trained on an artificial environment of surfaces with different shapes, materials, and lighting, spontaneously comes to encode those factors in its internal representations. Most strikingly, the model makes patterns of errors in its perception of material that follow, on an image-by-image basis, the patterns of errors made by human observers. Unsupervised deep learning may thus provide a coherent framework for how many perceptual dimensions form, in material perception and beyond.
Exploring Memories of Scenes
State-of-the-art machine vision models can predict human recognition memory for complex scenes with astonishing accuracy. In this talk I present work investigating how memorable scenes are actually remembered and experienced by human observers. We found that memorable scenes were recognized largely on the basis of recollection of specific episodic details, but also on the basis of familiarity with the scene as a whole. I thus highlight current limitations of machine vision models in emulating human recognition memory, along with promising opportunities for future research. We were also interested in what observers specifically remember about complex scenes, and therefore considered the functional role of eye movements as a window into the content of memories, particularly when observers recollected specific information about a scene. We found that when observers formed a memory representation that they later recollected (compared with scenes that only felt familiar), the overall extent of exploration was broader, with a specific subset of fixations clustered around later to-be-recollected scene content, irrespective of the memorability of the scene. I discuss the critical role that viewing behavior plays in visual memory formation and retrieval, and point to potential implications for machine vision models that predict the content of human memories.
Neural and computational principles of the processing of dynamic faces and bodies
Body motion is a fundamental signal of social communication, comprising facial as well as full-body movements. Combining advanced methods from computer animation with motion capture in humans and monkeys, we synthesized highly realistic monkey avatar models. Our face avatar is perceived by monkeys as almost equivalent to a real animal and, unlike all previously used avatar models in studies with monkeys, does not induce an ‘uncanny valley’ effect. Applying machine-learning methods for the control of motion style, we investigated how species-specific shape and dynamic cues influence the perception of human and monkey facial expressions. Human observers showed very fast learning of monkey expressions, and a perceptual encoding of expression dynamics that was largely independent of facial shape. This result is in line with the fact that facial shape evolved faster than neuromuscular control in primate phylogenesis. At the same time, it challenges popular neural network models of the recognition of dynamic faces, which assume a joint encoding of facial shape and dynamics. We propose an alternative, physiologically inspired neural model that realizes such an orthogonal encoding of facial shape and expression from video sequences. As a second example, we investigated the perception of social interactions from abstract stimuli, similar to those of Heider & Simmel (1944), as well as from more realistic stimuli. We developed and validated a new generative model for the synthesis of such social interactions, based on a modification of a human navigation model. We demonstrate that the recognition of such stimuli, including the perception of agency, can be accounted for by a relatively elementary, physiologically inspired hierarchical neural recognition model that does not require the sophisticated inference mechanisms postulated by some cognitive theories of social recognition.
In summary, these findings suggest that essential phenomena in social cognition might be accounted for by a small set of simple neural principles that can easily be implemented by cortical circuits. The developed technologies for stimulus control form the basis of electrophysiological studies that can verify specific neural circuits, such as the ones proposed by our theoretical models.
Bayesian integration of audiovisual speech by DNN models is similar to that of human observers
COSYNE 2025