Authors & Affiliations
Zitong Wang, Xiaoqi Zhang, Corentin Massot, Harold Rockwell, George Papandreou, Alan Yuille, Tai-Sing Lee
Abstract
Our ability to recognize objects and natural scenes, whether depicted in photographs, cartoons, or just line drawings (i.e., rendered in different visual cues), is remarkable. Here, we investigated the representation of a set of surface boundary shapes rendered in different visual cues by macaque V1 and V2 neurons. Specifically, we investigated whether the geometric structure of the population code could support the formation of a cue-invariant representation of abstract surface boundary concepts. A visual concept refers to an abstract feature, such as a boundary or contour, that remains consistent across different types of visual cue rendering. We measured the invariance of the geometric structure across visual cues using a cue-transfer decoding paradigm, i.e., decoding boundary concepts rendered in one cue using a decoder trained on another cue. We found significant cue-transfer decoding when the population codes were aligned across cues via a Procrustes transformation. The cue-invariance of the surface boundary representation was highest in V1, likely because of its higher resolution in feature dimensions and spatial locations. We observed a similar phenomenon in a model of the ventral visual stream (AlexNet). Cue-invariant boundary representation was degraded in V2 compared to V1, likely due to other invariances, such as translation and rotation invariance, developing along the ventral stream hierarchy. Eliminating individual neurons’ tuning correlations across cues did not adversely affect cue-transfer decoding. The geometric structure was also preserved across different subpopulations of V1 or V2 neurons and between V1 and V2. The stability of the geometric structure increased with the number of neurons participating in the population code.
We concluded that despite a trade-off between various forms of invariance along the ventral stream hierarchy, each visual area could optimally build a cue-invariant representation of abstract visual concepts by using the geometric structure of its population code.
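The cue-transfer decoding idea described in the abstract can be illustrated with a toy sketch. This is not the authors' analysis pipeline. It assumes a hypothetical two-dimensional population space, four boundary concepts, a rotation-only Procrustes fit (the general transformation is usually solved in higher dimensions and may also allow reflection and scaling), and a nearest-centroid decoder. All names and numbers below are made up for illustration.

```python
import math

def mean_point(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def shift(points, offset):
    """Subtract a fixed offset from every point."""
    return [(p[0] - offset[0], p[1] - offset[1]) for p in points]

def rotate(points, theta):
    """Rotate 2-D points counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(p[0] * c - p[1] * s, p[0] * s + p[1] * c) for p in points]

def procrustes_rotation(source, target):
    """Least-squares rotation angle mapping centered source onto centered target."""
    dot = sum(s[0] * t[0] + s[1] * t[1] for s, t in zip(source, target))
    cross = sum(s[0] * t[1] - s[1] * t[0] for s, t in zip(source, target))
    return math.atan2(cross, dot)

def decode(trial, centroids):
    """Nearest-centroid decoder: index of the closest concept centroid."""
    dists = [(trial[0] - c[0]) ** 2 + (trial[1] - c[1]) ** 2 for c in centroids]
    return dists.index(min(dists))

def accuracy(trials, labels, centroids):
    """Fraction of trials whose decoded concept matches the true label."""
    hits = sum(decode(t, centroids) == y for t, y in zip(trials, labels))
    return hits / len(trials)

# Hypothetical concept centroids in cue A (the cue the decoder is trained on).
cue_a = [(2.0, 0.0), (0.0, 1.0), (-1.0, -1.0), (1.0, -2.0)]

# Assumption: cue B has the same geometry, but rotated and offset.
rotated = rotate(cue_a, math.radians(100))
cue_b = [(p[0] + 3.0, p[1] - 1.0) for p in rotated]

# One slightly perturbed test trial per concept in cue B.
labels = [0, 1, 2, 3]
trials_b = [(p[0] + 0.1, p[1] - 0.1) for p in cue_b]

# Cue transfer without alignment: apply the cue-A decoder directly.
raw_accuracy = accuracy(trials_b, labels, cue_a)

# Cue transfer with alignment: center both cues, fit the rotation on
# the centroids, then map cue-B trials into cue-A coordinates.
a_centered = shift(cue_a, mean_point(cue_a))
b_mean = mean_point(cue_b)
theta = procrustes_rotation(shift(cue_b, b_mean), a_centered)
aligned_trials = rotate(shift(trials_b, b_mean), theta)
aligned_accuracy = accuracy(aligned_trials, labels, a_centered)

print("raw:", raw_accuracy, "aligned:", aligned_accuracy)
```

In this sketch the decoder never sees cue-B responses during training; only the alignment step uses cue-B centroids. The point it is meant to convey is that transfer succeeds when the relative arrangement of concepts is shared across cues, even if the raw coordinates differ.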