Authors & Affiliations
Garrison Cottrell, Shubham Kulkarni, Martha Gahl
Abstract
The mapping from the visual field to V1 can be approximated by a log-polar transform. In this domain, a change in scale becomes a left-right shift and a rotation becomes an up-down shift. When this representation is fed into a standard shift-invariant convolutional neural network (CNN), it provides scale and rotation invariance. Remarkably, despite this rotation invariance, the model offers a novel explanation for the inverted face effect. The effect arises because the scale dimension maps directly onto the flat cortex, but rotation, being a circle group, does not: the angle axis is periodic, while the cortical sheet has edges. When a face is inverted (a 180° rotation), its features shift up or down by half the angle axis, and those pushed past the edge wrap around to the opposite side. This disrupts the configuration of the features, which is key to face recognition. Inverted faces are therefore processed featurally rather than configurally, consistent with the common psychological description of how humans process them. This peculiar anatomical property thus explains the reduced performance on inverted faces. By contrast, the standard (Euclidean) representation typically used in CNNs produces a catastrophic drop in performance for inverted faces, inconsistent with human behavior.
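The wrap-around argument can be illustrated with a small sketch of the coordinate geometry only (no CNN). This is not the authors' model. It assumes fixation at the centre of the face, uses made-up feature coordinates, and stores the angle axis as a plain, non-wrapping range of 0 to 359 degrees, as a flat cortical sheet would. The names `FACE`, `to_log_polar`, `transform`, and `vertical_offset` are hypothetical. The sketch checks three things. Scaling adds a constant to log radius. A small rotation shifts every feature's angle by the same amount. Inversion changes the flat (non-wrapping) offset between features because some of them wrap past the edge of the angle axis.

```python
import math

# Hypothetical feature positions (x, y) relative to fixation, arbitrary units.
FACE = {
    "left_eye": (-1.0, 1.0),
    "right_eye": (1.0, 1.0),
    "nose": (0.0, -0.5),
    "mouth": (0.0, -1.5),
}


def to_log_polar(x, y):
    """Map a retinal point to (u, v): u = log radius (horizontal cortical
    axis), v = polar angle in whole degrees, stored in [0, 360) as a flat,
    non-wrapping vertical axis would hold it."""
    u = math.log(math.hypot(x, y))
    v = round(math.degrees(math.atan2(y, x))) % 360
    return u, v


def transform(points, scale=1.0, rotation_deg=0.0):
    """Scale and rotate (counterclockwise) every point about fixation."""
    cos_r = math.cos(math.radians(rotation_deg))
    sin_r = math.sin(math.radians(rotation_deg))
    return {
        name: (scale * (cos_r * x - sin_r * y), scale * (sin_r * x + cos_r * y))
        for name, (x, y) in points.items()
    }


def vertical_offset(points, a, b):
    """Offset along the angle axis from feature a to feature b, as seen by
    a flat map that does not wrap around."""
    return to_log_polar(*points[b])[1] - to_log_polar(*points[a])[1]


if __name__ == "__main__":
    inverted = transform(FACE, rotation_deg=180)
    for name in FACE:
        print(name, "upright:", to_log_polar(*FACE[name]),
              "inverted:", to_log_polar(*inverted[name]))
    print("flat offset, upright: ", vertical_offset(FACE, "right_eye", "nose"))
    print("flat offset, inverted:", vertical_offset(inverted, "right_eye", "nose"))
```

With these coordinates, the flat offset from the right eye to the nose is 225 degrees upright and -135 degrees inverted. The two agree only modulo 360, because the nose wraps past the edge of the angle axis. A 30° rotation, where nothing wraps, leaves the offset at 225. A CNN with circular padding along the angle axis would see the inverted face as a pure shift. With ordinary zero padding, the spatial configuration between features is broken instead.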