ePoster

Hacking vocal learning with deep learning: flexible real-time perturbation of zebra finch song

Elizabeth O'Gorman, Drew Schreiner, Richard Mooney, John Pearson
COSYNE 2025 (2025)
Montreal, Canada

Abstract

Juvenile male zebra finches learn to produce a single, highly stereotyped song and maintain this song over the course of their adult lives using auditory feedback. Much of what we know of the underlying learning process comes from studies of adult male zebra finches adapting to white-noise feedback triggered by either high- or low-pitch variants of harmonic stack syllables: a single static perturbation of a single, simple syllable of a crystallized song. However, zebra finch song is spectrally and temporally rich, with numerous degrees of freedom, and the dynamic learning process must tackle this complexity. Thus, to fully characterize learning, new methods are needed for adaptively intervening in the process. To this end, we developed a pipeline to quantify and selectively manipulate syllable-level song features within the high-dimensional space of song variants in real time. Using a software platform for adaptive experimentation, we acquired raw audio from adult male zebra finches as they practiced in sound-isolated boxes, computed spectrograms from fixed-width segments, and encoded and classified them using a pretrained variational autoencoder (VAE) augmented with a supervised classification layer. The resulting latent representations can then be used to flexibly trigger feedback by computing arbitrary functions of the embedded spectrograms. Using asynchronous and parallelized processing, analysis can be performed in 7.396 ms ± 0.917 ms per 120 ms of song, allowing us to update as frequently as every 10 ms. Even accounting for network latencies, the lag between data acquisition and feedback to the bird is 15.164 ms ± 2.409 ms, well within behaviorally and physiologically relevant timescales. As a result, this pipeline can be used to study adaptations of song in response to algorithmically guided perturbations, allowing us to test fundamental reinforcement learning hypotheses in a tractable high-dimensional system.
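
The windowing and triggering logic described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 120 ms window, the 10 ms update interval, and the 7.396 ms mean analysis time come from the abstract, while the sample rate, the function names, the `toy_encoder` stand-in for the pretrained VAE, and the linear trigger rule are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, Sequence

SAMPLE_RATE_HZ = 44_100    # assumption: the abstract does not state a sample rate
WINDOW_MS = 120            # from the abstract: analysis window length
HOP_MS = 10                # from the abstract: fastest update interval
MEAN_ANALYSIS_MS = 7.396   # from the abstract: mean analysis time per window

WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_MS // 1000
HOP_SAMPLES = SAMPLE_RATE_HZ * HOP_MS // 1000

# Real-time condition: each window must be analysed before the next one is due.
KEEPS_UP = MEAN_ANALYSIS_MS < HOP_MS


def sliding_windows(
    samples: Iterable[float],
    window: int = WINDOW_SAMPLES,
    hop: int = HOP_SAMPLES,
) -> Iterator[List[float]]:
    """Yield the most recent `window` samples each time `hop` new samples arrive."""
    buffer = deque(maxlen=window)
    since_last = 0
    for sample in samples:
        buffer.append(sample)
        since_last += 1
        if len(buffer) == window and since_last >= hop:
            since_last = 0
            yield list(buffer)


def toy_encoder(window: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the VAE encoder: a 2-D 'latent' of
    mean absolute amplitude and zero-crossing rate."""
    n = len(window)
    mean_abs = sum(abs(x) for x in window) / n
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
    return [mean_abs, crossings / (n - 1)]


def make_linear_trigger(
    direction: Sequence[float], threshold: float
) -> Callable[[Sequence[float]], bool]:
    """One example of an 'arbitrary function' of the latent vector:
    trigger when its projection onto `direction` exceeds `threshold`."""
    def trigger(latent: Sequence[float]) -> bool:
        return sum(d * z for d, z in zip(direction, latent)) > threshold
    return trigger


def run(
    samples: Iterable[float],
    encode: Callable[[Sequence[float]], Sequence[float]],
    trigger: Callable[[Sequence[float]], bool],
    window: int = WINDOW_SAMPLES,
    hop: int = HOP_SAMPLES,
) -> List[bool]:
    """Return one feedback decision per emitted window."""
    return [trigger(encode(w)) for w in sliding_windows(samples, window, hop)]
```

Because the trigger is just a callable on the latent vector, swapping the linear rule for a classifier-gated or adaptively updated rule would not change the windowing loop. With a 10 ms hop, consecutive 120 ms windows overlap by 110 ms.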

Unique ID: cosyne-25/hacking-vocal-learning-with-deep-35b1c45b