Primate Vision
Modeling Primate Vision (and Language)
Euclidean coordinates are the wrong prior for primate vision
The mapping from the visual field to V1 can be approximated by a log-polar transform. In this domain, a change in scale becomes a left-right shift and a rotation becomes an up-down shift, so feeding the transformed image into a standard shift-invariant convolutional network yields scale and rotation invariance. Translation invariance, however, is lost; in our model it is recovered through multiple fixations on an object. Because cones are densely packed in the fovea and resolution drops off sharply in the periphery, the central 10 degrees of visual angle occupy about half of V1, with the remaining 170 degrees (or so) taking up the other half. This layout provides the basis for the central and peripheral pathways. Simulations with this model closely match human performance in scene classification, and competition between the pathways leads to the peripheral pathway being used for this task. Remarkably, despite its rotation invariance, the model can still explain the face inversion effect. We suggest that the standard use of image coordinates is the wrong prior for models of primate vision.
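To make concrete why scale and rotation become shifts under this mapping, here is a minimal NumPy sketch of log-polar resampling. It uses nearest-neighbor sampling, and the function name, grid size, and centering are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def log_polar_transform(image, out_shape=(128, 128)):
    """Resample an image onto a log-polar grid centered on the image center.

    Rows index angle (theta) and columns index log-radius, so rotating the
    input shifts the output up/down (cyclically) and rescaling the input
    shifts it left/right. Nearest-neighbor sampling; a hypothetical sketch.
    """
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    n_theta, n_rho = out_shape
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    # Radii spaced uniformly in log space, from ~1 pixel out to the border.
    rhos = np.exp(np.linspace(0.0, np.log(max_r), n_rho))
    rows = np.clip(np.round(cy + rhos[None, :] * np.sin(thetas[:, None])), 0, h - 1).astype(int)
    cols = np.clip(np.round(cx + rhos[None, :] * np.cos(thetas[:, None])), 0, w - 1).astype(int)
    return image[rows, cols]
```

Under this sampling, rescaling the input about the center by a factor s moves content by log(s) along the log-radius axis (a horizontal shift, up to grid spacing), while rotating it shifts the rows cyclically; a shift-invariant convolutional network applied to the output therefore inherits approximate scale and rotation invariance, at the cost of the translation invariance discussed above.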
Building System Models of Brain-Like Visual Intelligence with Brain-Score
Research in the brain and cognitive sciences attempts to uncover the neural mechanisms underlying intelligent behavior in domains such as vision. Due to the complexity of brain processing, studies have necessarily started with a narrow scope of experimental investigation and computational modeling. I argue that it is time for our field to take the next step: build system models that capture a range of visual intelligence behaviors along with the underlying neural mechanisms. To make progress on system models, we propose integrative benchmarking – integrating experimental results from many laboratories into suites of benchmarks that guide and constrain those models at multiple stages and scales. We showcase this approach by developing Brain-Score benchmark suites for neural (spike rates) and behavioral experiments in the primate visual ventral stream. By systematically evaluating a wide variety of candidate models, we not only identify models that begin to match a range of brain data (~50% explained variance), but also discover that models’ brain scores are predicted by their object categorization performance (up to 70% ImageNet accuracy). Using the integrative benchmarks, we develop improved state-of-the-art system models that more closely match shallow recurrent neuroanatomy and early visual processing; these models better predict primate temporal processing, are more robust, and require fewer supervised synaptic updates. Taken together, these integrative benchmarks and system models are first steps toward modeling the complexities of brain processing in an entire domain of intelligence.
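As a rough illustration of what a neural benchmark measures, the sketch below fits a cross-validated linear readout from model activations to recorded spike rates, reports the median explained variance across recording sites, and averages over a suite of benchmarks. This is a simplified stand-in, not the Brain-Score API: the real suite adds noise ceilings, careful train/test splits, and behavioral benchmarks, and the function names and data shapes here are assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict

def neural_benchmark_score(model_features, spike_rates):
    """Explained variance of a linear readout from model features to spike rates.

    model_features: (n_stimuli, n_units) model activations for a set of images.
    spike_rates:    (n_stimuli, n_sites) recorded responses to the same images.
    Returns the median cross-validated R^2 across recording sites.
    """
    predictions = cross_val_predict(RidgeCV(), model_features, spike_rates, cv=5)
    per_site_r2 = r2_score(spike_rates, predictions, multioutput="raw_values")
    return float(np.median(per_site_r2))

def integrative_brain_score(benchmarks):
    """Mean score over a suite, e.g. {"V4": (features, rates), "IT": (features, rates)}."""
    return float(np.mean([neural_benchmark_score(f, r) for f, r in benchmarks.values()]))
```

Collapsing a model to a single averaged score over many benchmarks is what makes the reported comparison possible, e.g. plotting each model's brain score against its ImageNet accuracy across a large pool of candidate models.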