Authors & Affiliations
Tiberiu Tesileanu, Alexander Genkin, Dmitri Chklovskii
Abstract
Motion detection is a fundamental task for the visual system, with cells as early as the retina showing selectivity to specific directions of motion. These cells typically have localized connectivity and receptive fields. Why not pool information from locations that are far apart to improve motion sensing?
Here we provide a normative model that relates the lack of distant connections in motion-sensitive cells to the statistics of natural videos, which exhibit localized patterns undergoing localized motion. These local motions largely conserve contrast in a visual scene, allowing us to treat the transformations between consecutive frames as rotations in the high-dimensional pixel space. Motion can occur at different speeds, so we focus on the infinitesimal generators for these transformations. We show that, when trained on patches from whitened natural videos, a sparse-coding approach learns receptive fields involving small sets of nearby pixels. For biological plausibility, we implement the sparse-coding step in our model using non-negative similarity matching, a method rooted in multidimensional scaling that starts from an optimization problem and produces circuits with local learning rules that perform their functions in an online setting. This makes our approach both normative and biologically plausible.
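As a toy illustration of the rotation picture above (not the model described in this abstract), consider circular translation of a one-dimensional patch. The sketch below builds the translation operator and its infinitesimal generator analytically in the Fourier domain, whereas the approach described here learns generators from natural videos. The names `translation`, `shift_generator`, and the patch size `n` are hypothetical, and `n` is assumed odd so that both matrices are real.

```python
import numpy as np

def translation(n, s):
    """Matrix that circularly translates an n-pixel patch by s pixels (s may be fractional)."""
    k = np.fft.fftfreq(n, d=1.0 / n)        # integer frequencies in FFT order
    F = np.fft.fft(np.eye(n), axis=0)       # DFT matrix
    phase = np.exp(-2j * np.pi * k * s / n) # a shift multiplies mode k by a pure phase
    return np.fft.ifft(np.diag(phase) @ F, axis=0).real

def shift_generator(n):
    """Infinitesimal generator G, i.e. the derivative of translation(n, s) at s = 0."""
    k = np.fft.fftfreq(n, d=1.0 / n)
    F = np.fft.fft(np.eye(n), axis=0)
    return np.fft.ifft(np.diag(-2j * np.pi * k / n) @ F, axis=0).real

n = 9                                        # assumed odd patch size
rng = np.random.default_rng(0)
x = rng.standard_normal(n)                   # stand-in for a whitened patch

G = shift_generator(n)
T = translation(n, 0.3)                      # a sub-pixel translation

# Contrast (the vector norm) is conserved, so T acts as a rotation in pixel space.
norm_before, norm_after = np.linalg.norm(x), np.linalg.norm(T @ x)

# For a small step, the frame-to-frame change is approximately G applied to the patch.
eps = 1e-4
finite_difference = (translation(n, eps) - np.eye(n)) / eps
```

In this toy case `G` is skew-symmetric and a single `G` describes every speed: translating by `s` pixels corresponds to the matrix exponential of `s * G`, which is why working with generators sidesteps the dependence on speed.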
Our model shows that unsupervised training on natural videos prunes long-range connections between visual receptors, resulting in localized connectivity. This connectivity depends on the statistics of the visual scenes seen during learning, which makes our theory experimentally testable. Specifically, we predict that the organization of motion-detecting circuits in different species should depend on their visual environments.