ePoster

Analyzing animal behavior with domain-adapted vision-language models

Valentin Gabeff, Sepideh Mamooler, Andy Bonnetto, Devis Tuia, Alexander Mathis
FENS Forum 2024 (2024)
Messe Wien Exhibition & Congress Center, Vienna, Austria

Abstract

Cameras allow recording animal behavior with high temporal and spatial resolution. However, cameras also produce large amounts of data, which are time-consuming to annotate. Hence, recent research has turned to machine-learning-assisted tools to automate the labeling of animal behavior. Existing deep learning methods commonly use a fixed set of behavioral classes and require many labeled instances per class for training. Large pretrained vision-language models, such as Contrastive Language-Image Pretraining (CLIP), offer great promise to improve and facilitate this annotation process: natural language can describe behavioral events in greater detail, and pretrained models may readily identify rare events of interest. However, these large pretrained models fail to generalize to uncommon visual domains (e.g., grayscale recordings under infrared light) and to the vocabulary of ethology and behavioral neuroscience. In this work, we show that adapting CLIP improves its ability to retrieve domain-specific events from complex queries, and we explore how our fine-tuning strategy affects retrieval performance on unseen vocabulary. Additionally, we show that including domain-specific vocabulary without any paired images during training preserves retrieval performance on unseen vocabulary. We first illustrate the applicability of our method on the Snapshot Serengeti dataset, which contains camera-trap images of a diverse range of animal behaviors in complex natural environments. We then apply our method to data recorded in a laboratory setting to highlight its broad potential for behavioral studies. Overall, our work outlines the potential of domain-adapted vision-language models to facilitate the annotation of behavioral data with complex, multi-attribute queries.
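The abstract describes retrieving behavioral events from free-text queries with a domain-adapted CLIP model. Since the poster includes no code, the snippet below is only a minimal sketch of how such text-to-image retrieval is commonly implemented with the Hugging Face transformers CLIP API; the checkpoint name, frame paths, and query text are illustrative placeholders, and the authors' actual fine-tuning strategy (including training on domain vocabulary without paired images) is not reproduced here.

```python
# Minimal sketch: rank camera-trap frames against a free-text behavioral query
# with CLIP. Checkpoint, file paths, and the query are hypothetical placeholders,
# not the authors' released model or data.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-base-patch32"  # swap in a domain-adapted checkpoint
model = CLIPModel.from_pretrained(model_name).eval()
processor = CLIPProcessor.from_pretrained(model_name)

# Frames extracted from a camera-trap sequence (paths are illustrative).
frame_paths = ["frame_001.jpg", "frame_002.jpg", "frame_003.jpg"]
frames = [Image.open(p).convert("RGB") for p in frame_paths]
query = "a zebra grazing at night, recorded under infrared light"

with torch.no_grad():
    image_inputs = processor(images=frames, return_tensors="pt")
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    image_emb = model.get_image_features(**image_inputs)
    text_emb = model.get_text_features(**text_inputs)

# Cosine similarity between the query and each frame; higher = better match.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
scores = (text_emb @ image_emb.T).squeeze(0)
for rank, idx in enumerate(scores.argsort(descending=True).tolist()):
    print(f"{rank + 1}. {frame_paths[idx]} (similarity {scores[idx].item():.3f})")
```

In this sketch, domain adaptation amounts to replacing the generic checkpoint with one fine-tuned on ethology and neuroscience vocabulary and imagery, which is the contribution the abstract describes.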

Unique ID: fens-24/analyzing-animal-behavior-with-domain-adapted-1424add6