Authors & Affiliations
Matt Whiteway, Anqi Wu, Mia Bramel, Kelly Buchanan, Catherine Chen, Neeli Mishra, Evan Schaffer, Andres Villegas, The International Brain Laboratory, Liam Paninski
Abstract
A popular approach to quantifying animal behavior from video data is behavioral segmentation, in which video frames are labeled as containing one or more discrete behavior classes, such as grooming or rearing. These behaviors are often labeled by hand, which is time-consuming and error-prone. An alternative approach is to train a sequence model that learns to map behavioral features extracted from video frames to discrete behaviors, although supervised models still require manually labeled examples to learn from. To reduce the need for expensive manual labels in this supervised setting, we introduce a semi-supervised approach that takes advantage of the rich spatiotemporal structure in unlabeled frames to learn a stronger sequence model. This approach constructs a sequence model loss function with three terms: (1) a standard supervised loss that classifies a sparse set of hand labels; (2) a weakly supervised loss that classifies a set of easy-to-compute heuristic labels; and (3) a self-supervised loss that predicts the evolution of the behavioral features. We show how this approach can effectively leverage a large number of unlabeled frames to outperform fully supervised segmentation with fewer labeled frames across a variety of species, behaviors, and experimental paradigms: a head-fixed, spontaneously behaving fly; a head-fixed mouse performing a perceptual decision-making task from the International Brain Laboratory; a freely moving mouse in an open-field arena; and two mice in a resident-intruder assay. Our approach thus provides a frame-by-frame estimate of an animal's behavior with less manual labeling effort, which can be crucial for understanding the effects of experimental and environmental manipulations.
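To make the three-term objective concrete, the sketch below shows one way the losses might be combined during training. This is an illustrative PyTorch sketch, not the authors' implementation: the function name, tensor shapes, and the weighting hyperparameters (lambda_weak, lambda_self) are all hypothetical, and the choice of cross-entropy for the classification terms and mean squared error for the feature-prediction term is an assumption.

```python
import torch
import torch.nn as nn

def semi_supervised_loss(
    logits_hand,       # (N_hand, C) model predictions on hand-labeled frames
    hand_labels,       # (N_hand,)   sparse manual labels
    logits_heur,       # (N_heur, C) model predictions on heuristically labeled frames
    heuristic_labels,  # (N_heur,)   easy-to-compute heuristic labels
    pred_features,     # (N, T, D)   predicted evolution of behavioral features
    true_features,     # (N, T, D)   observed behavioral features
    lambda_weak=0.5,   # hypothetical weight on the weakly supervised term
    lambda_self=0.5,   # hypothetical weight on the self-supervised term
):
    """Sketch of a three-term semi-supervised loss: supervised + weak + self."""
    ce = nn.CrossEntropyLoss()
    mse = nn.MSELoss()

    # (1) standard supervised loss on the sparse set of hand labels
    loss_sup = ce(logits_hand, hand_labels)

    # (2) weakly supervised loss on the heuristic labels
    loss_weak = ce(logits_heur, heuristic_labels)

    # (3) self-supervised loss: predict the evolution of behavioral features
    loss_self = mse(pred_features, true_features)

    return loss_sup + lambda_weak * loss_weak + lambda_self * loss_self
```

Because only the first term requires manual annotation, the other two terms can be computed on every unlabeled frame, which is how this style of objective lets a large pool of unlabeled video strengthen the sequence model.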