human observers
Learning to see Stuff
Materials with complex appearances, like textiles and foodstuffs, pose challenges for conventional theories of vision. How does the brain learn to see properties of the world, like the glossiness of a surface, that cannot be measured by any other sense? Recent advances in unsupervised deep learning may help shed light on material perception. I will show how an unsupervised deep neural network trained on an artificial environment of surfaces that have different shapes, materials and lighting spontaneously comes to encode those factors in its internal representations. Most strikingly, the model makes patterns of errors in its perception of material that follow, on an image-by-image basis, the patterns of errors made by human observers. Unsupervised deep learning may provide a coherent framework for how many perceptual dimensions form, in material perception and beyond.
Neural and computational principles of the processing of dynamic faces and bodies
Body motion is a fundamental signal of social communication. This includes facial as well as full-body movements. Combining advanced methods from computer animation with motion capture in humans and monkeys, we synthesized highly realistic monkey avatar models. Our face avatar is perceived by monkeys as almost equivalent to a real animal and, unlike all avatar models previously used in studies with monkeys, does not induce an ‘uncanny valley effect’. Applying machine-learning methods for the control of motion style, we were able to investigate how species-specific shape and dynamic cues influence the perception of human and monkey facial expressions. Human observers showed very fast learning of monkey expressions, and a perceptual encoding of expression dynamics that was largely independent of facial shape. This result is in line with the fact that facial shape evolved faster than neuromuscular control in primate phylogenesis. At the same time, it challenges popular neural network models of the recognition of dynamic faces that assume a joint encoding of facial shape and dynamics. We propose an alternative physiologically inspired neural model that realizes such an orthogonal encoding of facial shape and expression from video sequences. As a second example, we investigated the perception of social interactions from abstract stimuli, similar to those of Heider & Simmel (1944), and also from more realistic stimuli. We developed and validated a new generative model for the synthesis of such social interactions, which is based on a modification of a human navigation model. We demonstrate that the recognition of such stimuli, including the perception of agency, can be accounted for by a relatively elementary physiologically inspired hierarchical neural recognition model that does not require the assumption of sophisticated inference mechanisms, as postulated by some cognitive theories of social recognition.
In summary, these results suggest that essential phenomena in social cognition might be accounted for by a small set of simple neural principles that can be easily implemented by cortical circuits. The developed technologies for stimulus control form the basis of electrophysiological studies that can verify specific neural circuits, such as the ones proposed by our theoretical models.
Bayesian integration of audiovisual speech by DNN models is similar to human observers
COSYNE 2025