ePoster

A robust machine learning pipeline for the analysis of complex nightingale songs

Mahalakshmi Ramadas, Jan Clemens, Daniela Vallentin
Bernstein Conference 2024 (2024)
Goethe University, Frankfurt, Germany

Abstract

The rise of interactive AI for contextual information processing has enhanced tools for characterizing audio and speech, and behavioral neuroscientists and ecologists studying animal communication benefit greatly from these advances. However, a universal solution for categorizing vocalizations remains difficult to achieve due to audio variability as well as inter-species diversity. Our work aims to provide a more generalizable approach by developing a pipeline to segment field recordings from nightingales, which have a complex vocal repertoire of over 200 song types with 1000 different syllables. To identify the boundaries of complex syllables in variable recordings, we developed a semi-automated pipeline to segment and classify song syllables from wild nightingale recordings. First, we tested traditional amplitude-based syllable boundary detection. Although effective for most songbird data, these tools were inadequate for nightingale songs with their characteristic intra-syllable reverberations. We therefore compared machine-learning tools for segmentation, including WhisperSeg, the Deep Audio Segmenter (DAS), and the Conformer architecture. WhisperSeg, an extension of the Whisper transformer for human automatic speech recognition that was additionally trained on extensive animal vocalizations, proved best at segmenting complex nightingale songs. For syllable classification, we first used UMAP, a dimensionality reduction tool previously applied successfully to the song syllables of other bird species. However, UMAP struggled with the vast repertoire and high variability of nightingale syllables and failed to generalize to new data: embedding new syllables into a standard UMAP model often resulted in misclassification of rare syllable types, i.e., outliers. To identify outliers and iteratively improve classification performance, we compared techniques such as parametric UMAP and variational autoencoders (VAEs), and we also explored pre-trained image classifiers. This iterative method, tailored to the nightingale data, was used to classify their large syllable repertoire. By optimizing these machine-learning tools, we developed a robust pipeline for accurate syllable classification in nightingales. This approach enhances our ability to analyze nightingale songs and opens possibilities for applying such tools to field recordings from other species with complex vocalizations, which is essential for understanding the neuro-behavioral mechanisms underlying animal communication.
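
As context for the segmentation step, the amplitude-based baseline described above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation; the threshold, smoothing window, and gap-merging parameters are placeholder assumptions, and the gap-merging heuristic is precisely what the intra-syllable reverberations of nightingale song tend to break.

```python
# Minimal sketch of amplitude-threshold syllable segmentation (the baseline
# approach described in the abstract). Thresholds and window sizes are
# illustrative placeholders, not values from the study.
import numpy as np
import soundfile as sf
from scipy.ndimage import uniform_filter1d

def segment_by_amplitude(wav_path, threshold=0.02, smooth_ms=10, min_gap_ms=20):
    """Return (onset, offset) times in seconds where the smoothed amplitude
    envelope exceeds a fixed threshold."""
    audio, sr = sf.read(wav_path)
    if audio.ndim > 1:                      # mix down stereo field recordings
        audio = audio.mean(axis=1)
    envelope = uniform_filter1d(np.abs(audio), int(sr * smooth_ms / 1000))
    above = envelope > threshold

    # Rising and falling edges of the boolean mask give candidate boundaries.
    edges = np.diff(above.astype(int))
    onsets = np.where(edges == 1)[0] + 1
    offsets = np.where(edges == -1)[0] + 1
    if above[0]:
        onsets = np.r_[0, onsets]
    if above[-1]:
        offsets = np.r_[offsets, len(above)]

    # Merge segments separated by gaps shorter than min_gap_ms. Intra-syllable
    # reverberation in nightingale songs is exactly what breaks this heuristic,
    # causing over- or under-segmentation.
    merged = []
    min_gap = int(sr * min_gap_ms / 1000)
    for on, off in zip(onsets, offsets):
        if merged and on - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], off)
        else:
            merged.append((on, off))
    return [(on / sr, off / sr) for on, off in merged]
```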

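The syllable-classification step can likewise be sketched under simplifying assumptions: log-mel spectrogram features, a standard (non-parametric) UMAP embedding, and HDBSCAN clustering whose assignment strength serves as a crude flag for rare syllable types. The study itself went further, comparing parametric UMAP, VAEs, and pre-trained image classifiers, none of which are shown here.

```python
# Illustrative sketch of UMAP-based syllable classification with a simple
# outlier flag. Feature extraction, cluster parameters, and the outlier
# threshold are assumptions for this sketch, not the study's settings.
import numpy as np
import librosa
import umap
import hdbscan

def syllable_features(audio, sr, n_mels=64, n_frames=128):
    """Log-mel spectrogram, padded/cropped to a fixed length and flattened."""
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)
    logmel = librosa.util.fix_length(logmel, size=n_frames, axis=1)
    return logmel.flatten()

# train_syllables / new_syllables: lists of (audio, sr) tuples cut at the
# boundaries produced by the segmentation step (e.g. WhisperSeg).
def fit_and_classify(train_syllables, new_syllables, outlier_strength=0.1):
    X_train = np.stack([syllable_features(a, sr) for a, sr in train_syllables])
    X_new = np.stack([syllable_features(a, sr) for a, sr in new_syllables])

    reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.0)
    emb_train = reducer.fit_transform(X_train)
    emb_new = reducer.transform(X_new)          # embed unseen syllables

    clusterer = hdbscan.HDBSCAN(min_cluster_size=20, prediction_data=True)
    clusterer.fit(emb_train)
    labels, strengths = hdbscan.approximate_predict(clusterer, emb_new)

    # Weakly assigned points are flagged as candidate rare syllables and
    # would be routed to manual review before the next training iteration.
    outliers = strengths < outlier_strength
    return labels, outliers
```
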
Unique ID: bernstein-24/robust-machine-learning-pipeline-1d7ab6a0