Authors & Affiliations
Anoushka Jain, Matthias Hennig, Simon Musall, Robyn Greene, Federico Suprio, Jake Swann, Chris Halcrow, Alexander Kleinjohann, Severin Graff, Juergen Gall, Bjorn Kampa, Sonja Grun, Alessio Buccino
Abstract
Electrophysiological recordings capture signals from hundreds of neurons simultaneously,
but isolating single-cell activity often requires manual curation due to limitations in spike-
sorting algorithms. As dataset sizes grow, the time and expertise required for accurate and
consistent human curation pose a major challenge for experimental labs.
To address this issue, we developed UnitRefine, a classification toolbox that leverages
diverse machine-learning algorithms to minimize manual curation efforts. Using acute
recordings with Neuropixels probes, we collected a large neural dataset with highly
reproducible experimental conditions and had multiple expert human curators label each
recording for reliable cluster identification. This carefully labeled dataset served as the
foundation for our automated curation system that learns from human annotations and
replicates curator decisions. UnitRefine incorporates existing and newly developed quality
metrics, including hyper-synchronous spiking events and drifts in firing rate, to automatically
separate noise from neural clusters with high accuracy.
To address inherent labeling imbalances between well-isolated single-cell clusters and
mixed-population activity, we implemented a cascading classification system. UnitRefine
uses a comprehensive hyperparameter optimization search across various classification
algorithms, including deep-learning and ensemble methods, to identify optimal model
parameters. Across recordings, an optimized Random Forest decoder outperformed the other
approaches, reaching up to 87% accuracy on unseen recordings.
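
To make the cascading idea concrete, the following is a minimal sketch and not the UnitRefine implementation. It assumes a two-stage cascade (noise vs. neural, then single-unit vs. multi-unit among the neural clusters), uses scikit-learn, and runs on a synthetic metrics table. The label coding, the feature values, the small parameter grid, and the helper names `fit_stage` and `cascade_predict` are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a (units x quality metrics) table; values are made up.
# Assumed label coding: 0 = noise, 1 = multi-unit (MUA), 2 = single-unit (SUA).
n_units = 600
labels = rng.choice([0, 1, 2], size=n_units, p=[0.5, 0.35, 0.15])
metrics = rng.normal(size=(n_units, 5)) + labels[:, None] * 1.5

X_train, X_test, y_train, y_test = train_test_split(
    metrics, labels, test_size=0.25, stratify=labels, random_state=0
)


def fit_stage(X, y):
    """Tune a random forest for one binary stage with a small grid search."""
    grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
    search = GridSearchCV(
        RandomForestClassifier(class_weight="balanced", random_state=0),
        grid,
        cv=3,
        scoring="balanced_accuracy",
    )
    return search.fit(X, y).best_estimator_


# Stage 1: noise (0) vs. neural (1), trained on all units.
stage1 = fit_stage(X_train, (y_train > 0).astype(int))

# Stage 2: MUA (0) vs. SUA (1), trained only on units labeled as neural.
neural = y_train > 0
stage2 = fit_stage(X_train[neural], (y_train[neural] == 2).astype(int))


def cascade_predict(X):
    """Apply stage 1, then refine the units it calls neural with stage 2."""
    pred = np.zeros(len(X), dtype=int)  # default label: noise
    is_neural = stage1.predict(X) == 1
    if is_neural.any():
        pred[is_neural] = 1 + stage2.predict(X[is_neural])  # 1 = MUA, 2 = SUA
    return pred


y_pred = cascade_predict(X_test)
accuracy = float((y_pred == y_test).mean())
```

Splitting the problem this way lets the second stage be trained and weighted only on neural units, so the large noise class does not dominate the rarer single-unit class. The `class_weight="balanced"` setting is one common way to handle the remaining imbalance within each stage.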
The broad applicability of UnitRefine is demonstrated by its successful performance across
diverse labs and recording conditions, including high-density probes in rats, clinical
recordings in epilepsy patients, and open datasets from the Allen Institute. Notably, labeling
just 20% of the novel data significantly improved curation performance. UnitRefine is
designed for broad community adoption: it is easy to use and fully integrated into
SpikeInterface, allowing users to either apply our pre-trained models or train new
decoders based on their own curation. New models can also be trained on custom metrics
and easily shared via the Hugging Face Hub.
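
The effect of labeling a small fraction of a novel dataset can be illustrated with a hedged sketch. It does not use the UnitRefine or SpikeInterface API. Instead, it assumes a scikit-learn random forest, synthetic metric tables, and a simple pooling strategy in which the 20% of newly labeled units are added to the original training data. The function `make_recordings`, the `shift` parameter, and all numeric values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)


def make_recordings(n_units, shift):
    """Hypothetical metrics table; `shift` mimics a lab- or probe-specific offset."""
    y = rng.integers(0, 2, size=n_units)  # 0 = noise, 1 = neural
    X = rng.normal(size=(n_units, 4)) + y[:, None] * 2.0 + shift
    return X, y


X_base, y_base = make_recordings(800, shift=0.0)  # originally curated data
X_new, y_new = make_recordings(400, shift=1.5)    # novel dataset

# Hand-label only 20% of the new units; hold out the rest for evaluation.
X_lab, X_eval, y_lab, y_eval = train_test_split(
    X_new, y_new, train_size=0.2, stratify=y_new, random_state=0
)

# Model trained on the original data only.
base_model = RandomForestClassifier(n_estimators=100, random_state=0)
base_model.fit(X_base, y_base)

# Model retrained on the original data pooled with the newly labeled 20%.
adapted_model = RandomForestClassifier(n_estimators=100, random_state=0)
adapted_model.fit(np.vstack([X_base, X_lab]), np.concatenate([y_base, y_lab]))

acc_base = base_model.score(X_eval, y_eval)
acc_adapted = adapted_model.score(X_eval, y_eval)
```

Comparing `acc_base` with `acc_adapted` on the held-out 80% gives a direct measure of how much the partial labels help for a given dataset. Pooling is only one option; training a fresh model on the new labels alone is an equally plausible reading of the text.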