scene classification
Latest
Euclidean coordinates are the wrong prior for primate vision
The mapping from the visual field to V1 can be approximated by a log-polar transform. In this domain, scale is a left-right shift, and rotation is an up-down shift. When fed into a standard shift-invariant convolutional network, this provides scale and rotation invariance. However, translation invariance is lost. In our model, this is compensated for by multiple fixations on an object. Due to the high concentration of cones in the fovea with the dropoff of resolution in the periphery, fully 10 degrees of visual angle take up about half of V1, with the remaining 170 degrees (or so) taking up the other half. This layout provides the basis for the central and peripheral pathways. Simulations with this model closely match human performance in scene classification, and competition between the pathways leads to the peripheral pathway being used for this task. Remarkably, in spite of the property of rotation invariance, this model can explain the inverted face effect. We suggest that the standard method of using image coordinates is the wrong prior for models of primate vision.
scene classification coverage
1 items