ePoster

Disentangling latent representations of behavior from 3D pose

Joshua Wu and 8 co-authors

Presenting Author

Conference
COSYNE 2025 (2025)
Montreal, Canada

Authors & Affiliations

Joshua Wu, Hari Koneru, James Ravenel, Anshuman Sabath, James Roach, Shaun Lim, Michael Tadross, Alex Williams, Timothy Dunn

Abstract

The ability to recognize and interpret shifts in naturalistic behavioral expression across neural (dys)function is critical to systems neuroscience. Recent developments in computer vision have enabled continuous measurements of 3D keypoints on freely moving animals. Common analysis objectives are to cluster these time series into recurring, stereotyped behaviors (e.g., running or rearing) and to compare their frequencies across experimental conditions. Currently, most behavioral analysis approaches are purely unsupervised; they can be sensitive to nuisance variability, such as an animal's individual body shape, or can over-segment behavior because of continuous factors, such as speed. As a result, these methods often produce action spaces in which the desired biological signals are uninterpretably entangled with confounding or uninformative features. These difficulties are exacerbated in deep learning methods: although such models can capture complex nonlinear dynamics in behavior, their low interpretability has undermined their widespread use in neuroscience studies. Ideally, researchers could select specific behavioral variables to be isolated (i.e., "disentangled") along interpretable, linear dimensions. Here we present a weakly supervised disentanglement framework for constructing behavioral representations from 3D pose sequences using "scrubbed" conditional variational autoencoders (SC-VAE). Like previous disentanglement methods for behavioral video, SC-VAE defines separate subspaces for supervised and unsupervised latents. However, SC-VAE additionally applies specified linear or nonlinear functions as adversarial components to remove information about unwanted variables from a representation. By doing so, SC-VAE ensures that different subspaces encode independent information, thereby constraining nuisance factors to separate subspaces in a user-specified way.
We demonstrate the benefits of SC-VAE representations in motion synthesis, clustering, and disease identification in a mouse model of Parkinson's disease. This work contributes to a pressing need for more controllable and researcher-centric deep learning models useful for behavioral neuroscience research.
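
The idea of "scrubbing" a nuisance variable from a latent subspace can be illustrated with a small linear toy example. The sketch below is not the SC-VAE implementation, which the abstract describes as using adversarial components during training; it swaps in a simpler post hoc projection that removes the single latent direction covarying with a nuisance variable. The synthetic data, the `speed` variable, and the function names `scrub_linear` and `linear_r2` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_r2(z, v):
    """R^2 of the best least-squares linear readout of v from latents z."""
    zc = z - z.mean(axis=0)
    vc = v - v.mean()
    w, *_ = np.linalg.lstsq(zc, vc, rcond=None)
    resid = vc - zc @ w
    return 1.0 - (resid @ resid) / (vc @ vc)


def scrub_linear(z, v):
    """Project out the latent direction that covaries with v.

    After centering, c = z^T v is the cross-covariance direction.
    Removing it leaves latents with zero covariance with v, so no
    linear readout can recover v from them on this dataset.
    """
    zc = z - z.mean(axis=0)
    vc = v - v.mean()
    c = zc.T @ vc
    c = c / np.linalg.norm(c)
    return zc - np.outer(zc @ c, c)


# Synthetic stand-in for latent codes (assumption: 8 latent dimensions,
# two of which leak a nuisance variable such as locomotion speed).
rng = np.random.default_rng(0)
n, d = 2000, 8
speed = rng.normal(size=n)
z = rng.normal(size=(n, d))
z[:, 0] += 2.0 * speed
z[:, 3] -= 1.0 * speed

z_scrubbed = scrub_linear(z, speed)
r2_before = linear_r2(z, speed)
r2_after = linear_r2(z_scrubbed, speed)
print(f"linear R^2 before scrubbing: {r2_before:.3f}")
print(f"linear R^2 after scrubbing:  {r2_after:.3e}")
```

This projection only removes linearly decodable information, and only on the data it was fit to. A nonlinear readout could still recover the nuisance variable, which is one reason a framework might apply nonlinear adversarial functions during training, as the abstract mentions.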

Unique ID: cosyne-25/disentangling-latent-representations-58c72385