ePoster

TweetyBERT, a self-supervised vision transformer to automate birdsong annotation

George Vengrovski, Miranda Rose Hulsey-Vincent, Melissa Bemrose, Tim Gardner
COSYNE 2025 (2025)
Montreal, Canada


Abstract

Manual annotation of birdsong remains a significant bottleneck in systems neuroscience, especially for species such as canaries with complex vocal repertoires. Existing unsupervised dimensionality reduction approaches, such as Uniform Manifold Approximation and Projection (UMAP), can be used to analyze stereotyped songs but require extensive preprocessing. Alternatively, current deep-learning models for song annotation still require extensive training data. We present TweetyBERT, a novel self-supervised deep neural network that learns to parse complex birdsongs without annotated training data. Training resembles the masked-prediction objective that underlies large language models: the model learns to predict masked segments of birdsong from the surrounding song context. In solving this masked prediction task, the model spontaneously develops a latent representation of song whose key features correspond to syllable classes. By clustering the internal states of TweetyBERT, we obtain distinct states that correspond to human-labeled canary phrases with a high degree of accuracy. TweetyBERT eliminates the need for manual annotation and, unlike previous approaches, requires no parameter tuning or input from the experimenter. We validated TweetyBERT's practical utility by analyzing the effects of bilateral basal ganglia lesions on adult canary song; TweetyBERT processed thousands of songs per bird in a fully automated pipeline. This analysis uncovered a rare but extreme “stuttering” behavior in birds with basal ganglia lesions, a finding that would have been difficult to uncover through any manual annotation process. In this poster, we present these results and additional metrics characterizing the quality of the TweetyBERT annotations.
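
To make the masked-prediction idea described in the abstract concrete, the sketch below shows the general recipe: hide a fraction of spectrogram time frames, train a transformer to reconstruct them from context, then cluster the resulting frame-level hidden states into candidate syllable or phrase classes. This is not the authors' TweetyBERT implementation; the toy architecture, the 30% mask ratio, and the use of k-means for clustering are illustrative assumptions only.

```python
# Illustrative masked-prediction sketch on a spectrogram (assumptions, not TweetyBERT itself).
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class TinyMaskedSongModel(nn.Module):
    """Toy transformer that reconstructs masked time frames of a spectrogram."""
    def __init__(self, n_freq_bins=128, d_model=192, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(n_freq_bins, d_model)            # per-frame embedding
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decode = nn.Linear(d_model, n_freq_bins)            # reconstruct frames

    def forward(self, spec, mask):
        # spec: (batch, time, freq); mask: (batch, time) bool, True = hidden frame
        x = self.embed(spec)
        x = x.masked_fill(mask.unsqueeze(-1), 0.0)               # zero out masked frames
        hidden = self.encoder(x)                                  # contextual frame states
        return self.decode(hidden), hidden

# One illustrative training step on a fake spectrogram batch.
model = TinyMaskedSongModel()
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
spec = torch.rand(8, 500, 128)                                    # fake (batch, time, freq) data
mask = torch.rand(8, 500) < 0.3                                    # mask ~30% of frames (assumed ratio)

optim.zero_grad()
pred, hidden = model(spec, mask)
loss = ((pred - spec) ** 2)[mask].mean()                           # loss only on masked frames
loss.backward()
optim.step()

# After training, cluster frame-level hidden states into candidate classes;
# k-means here is a stand-in, not the clustering method used in the poster.
states = hidden.detach().reshape(-1, hidden.shape[-1]).numpy()
labels = KMeans(n_clusters=20, n_init=10).fit_predict(states)
```

In practice the clusters would then be compared against human phrase labels to quantify annotation accuracy; the number of clusters and the clustering algorithm above are placeholders for whatever the full pipeline uses.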

Unique ID: cosyne-25/tweetybert-self-supervised-vision-68bd4a62