ePoster

Probing Motion-Form Interactions in the Macaque Inferior Temporal Cortex and Artificial Neural Networks for Complex Scene Understanding

Jean de Dieu Uwisengeyimana, Kohitij Kar
COSYNE 2025 (2025)
Montreal, Canada


Abstract

Traditionally, the processing of object motion and form has been attributed to the dorsal and ventral visual pathways, respectively. However, recent studies challenge this strict dichotomy. For example, Hong et al. (2016) demonstrated that the ventral stream represents spatial information, while Ramezanpour et al. (2024) showed that object motion could be decoded from the inferior temporal (IT) cortex. These findings prompt further exploration of how the ventral pathway supports the integration of motion and form, particularly in naturalistic environments. In our study, we investigated this integration using neurophysiological recordings from rhesus macaques and several artificial neural networks (ANNs). We hypothesized that camouflaged scenes would create situations where object motion aids the detection of form-based attributes that are otherwise less perceptible in stationary scenes. To test this, we presented 132 videos from the Moving Camouflaged Animals (MoCA) dataset to two rhesus macaques, recording neural activity from 95 reliable IT sites (neuron reliability threshold of 0.4). Videos were shown for 500 ms and included moving camouflaged objects and stationary frames. Our results show that motion enhanced the representation of form-related attributes in the IT cortex. For instance, decoding object size from neural responses yielded a marginally higher correlation for moving stimuli (Pearson R = 0.63) than for stationary frames (R = 0.57). Similar trends were observed for object X- and Y-positions. Additionally, object speed was significantly predicted by IT population responses (R = 0.26, p = 0.002). We evaluated several image-based and video-based ANNs, finding that models such as S3D, R2plus1D_18, and Swin3D outperformed the others. Video-based models aligned strongly with primate behavior, highlighting their ability to capture both temporal and spatial dynamics. Our findings are first steps toward probing dynamic scene perception in biological and artificial systems, offering insights for developing vision systems capable of handling complex, dynamic environments.
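
As an illustration of the decoding analysis summarized above, the sketch below shows a cross-validated linear readout of a form attribute (e.g., object size) from IT population responses, scored with a Pearson correlation. The variable names, the choice of ridge regression, and the cross-validation settings are illustrative assumptions, not the authors' exact pipeline.

    # Sketch: cross-validated linear decode of a per-video attribute from IT responses.
    # Regression choice and settings are assumptions for illustration only.
    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import KFold

    def decode_attribute(neural_responses, attribute, n_splits=10, seed=0):
        # neural_responses: (n_videos, n_sites) trial-averaged IT responses
        # attribute: (n_videos,) per-video label, e.g., object size or speed
        preds = np.zeros(len(attribute), dtype=float)
        cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for train_idx, test_idx in cv.split(neural_responses):
            model = RidgeCV(alphas=np.logspace(-3, 3, 13))
            model.fit(neural_responses[train_idx], attribute[train_idx])
            preds[test_idx] = model.predict(neural_responses[test_idx])
        return pearsonr(preds, attribute)  # (R, p-value)

    # Usage (hypothetical arrays): compare decoding from moving vs. stationary frames
    # r_moving, _ = decode_attribute(responses_moving, object_size)
    # r_static, _ = decode_attribute(responses_static, object_size)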

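For the ANN comparison, the video architectures named in the abstract (S3D, R2plus1D_18, Swin3D) are available as pretrained models in torchvision. The snippet below shows one way to run a short clip through them to obtain model outputs for comparison with neural or behavioral data; the tiny Swin3D variant, the clip shape, and the use of final-layer outputs are assumptions, not the authors' evaluation protocol.

    # Sketch: loading pretrained video models and running a placeholder clip through them.
    import torch
    from torchvision.models.video import (
        s3d, S3D_Weights,
        r2plus1d_18, R2Plus1D_18_Weights,
        swin3d_t, Swin3D_T_Weights,
    )

    models = {
        "S3D": s3d(weights=S3D_Weights.DEFAULT),
        "R2plus1D_18": r2plus1d_18(weights=R2Plus1D_18_Weights.DEFAULT),
        "Swin3D": swin3d_t(weights=Swin3D_T_Weights.DEFAULT),  # tiny variant assumed
    }

    # Placeholder clip: (batch, channels, frames, height, width)
    clip = torch.randn(1, 3, 16, 224, 224)

    with torch.no_grad():
        for name, model in models.items():
            model.eval()
            out = model(clip)  # final-layer outputs; use forward hooks for intermediate features
            print(name, tuple(out.shape))
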
Unique ID: cosyne-25/probing-motion-form-interactions-16c61dd0