Dataset
Arcadia Science
Job Description: Data scientist specializing in analysis of large, high-dimensional datasets and associated methods. Apply techniques from statistics, machine learning, and computational biology to a variety of datatypes and datasets from across Arcadia's research organisms. Datasets might include genomics, multi-omics, imaging, time course, mass spectrometry, and neural recordings. Coordinate with experimentalists from experimental design all the way to publication. Work with our publishing team to build interactive, sharable resources for the scientific community. Our ideal candidate would have a history of contributions in analysis of complex datasets, a curiosity to work on a variety of problems and data types, and a passion for open science. They would be able to share their expertise both within and outside of Arcadia, and they would be able to translate difficult concepts into runnable, sharable analysis. The Arcadia Story: We are a research and development company leveraging the biology of emerging research organisms. We were founded by Seemay Chou and Prachee Avasthi, scientists convinced there is a better way to explore the full potential of science: how discoveries can be both meaningful and profitable. We are building a team of in-house scientists to carry out active research programs and convene a broader scientific community with a visiting scholars and internship program. Visit our website at www.arcadiascience.com to learn more about our work and to read Seemay’s founding story.
OpenNeuro FitLins GLM: An Accessible, Semi-Automated Pipeline for OpenNeuro Task fMRI Analysis
In this talk, I will discuss the OpenNeuro FitLins GLM package and provide an illustration of the analytic workflow. OpenNeuro FitLins GLM is a semi-automated pipeline that reduces barriers to analyzing task-based fMRI data from OpenNeuro's 600+ task datasets. Created for psychology, psychiatry and cognitive neuroscience researchers without extensive computational expertise, this tool automates what is largely a manual process and compilation of in-house scripts for data retrieval, validation, quality control, statistical modeling and reporting that, in some cases, may require weeks of effort. The workflow abides by open-science practices, enhances reproducibility, and incorporates community feedback for model improvement. The pipeline integrates BIDS-compliant datasets and fMRIPrep preprocessed derivatives, and dynamically creates BIDS Statistical Model specifications (with FitLins) to perform common mass univariate (GLM) analyses. To enhance and standardize reporting, it generates comprehensive reports that include design matrices, statistical maps and COBIDAS-aligned reporting that is fully reproducible from the model specifications and derivatives. OpenNeuro FitLins GLM has been tested on over 30 datasets spanning 50+ unique fMRI tasks (e.g., working memory, social processing, emotion regulation, decision-making, motor paradigms), reducing analysis times from weeks to hours when using high-performance computers, thereby enabling researchers to conduct robust single-study, meta- and mega-analyses of task fMRI data with significantly improved accessibility, standardized reporting and reproducibility.
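For readers unfamiliar with the term, a mass univariate GLM fits the same design to every voxel independently. The sketch below is a minimal illustration under strong simplifying assumptions (one task regressor plus an intercept, no HRF convolution, no confound regressors); it is not FitLins code and not the pipeline's implementation, and the function names `fit_voxel` and `mass_univariate` are hypothetical.

```python
def fit_voxel(regressor, signal):
    """Ordinary least squares for one voxel: signal ~ intercept + beta * regressor.

    Returns (beta, intercept). Assumes the regressor is not constant.
    """
    n = len(regressor)
    mean_x = sum(regressor) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, signal))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    return beta, intercept


def mass_univariate(regressor, voxels):
    """Fit the same one-regressor model to every voxel time series independently."""
    return [fit_voxel(regressor, signal)[0] for signal in voxels]


# Toy data: a block design (0 = rest, 1 = task) and two voxel time series.
task = [0, 1, 0, 1]
voxels = [
    [10.0, 12.0, 10.0, 12.0],  # responds to the task
    [5.0, 5.0, 5.0, 5.0],      # does not respond
]
betas = mass_univariate(task, voxels)
```

In a real analysis the design matrix has many columns and the per-voxel fits are followed by contrasts and statistical maps; the point here is only that each voxel gets its own independent regression.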
Understanding reward-guided learning using large-scale datasets
Understanding the neural mechanisms of reward-guided learning is a long-standing goal of computational neuroscience. Recent methodological innovations enable us to collect ever larger neural and behavioral datasets. This presents opportunities to achieve greater understanding of learning in the brain at scale, as well as methodological challenges. In the first part of the talk, I will discuss our recent insights into the mechanisms by which zebra finch songbirds learn to sing. Dopamine has long been thought to guide reward-based trial-and-error learning by encoding reward prediction errors. However, it is unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Longitudinal recordings of dopamine and bird songs reveal that dopamine activity is indeed consistent with encoding a reward prediction error during naturalistic learning. In the second part of the talk, I will talk about recent work we are doing at DeepMind to develop tools for automatically discovering interpretable models of behavior directly from animal choice data. Our method, dubbed CogFunSearch, uses LLMs within an evolutionary search process in order to "discover" novel models in the form of Python programs that excel at accurately predicting animal behavior during reward-guided learning. The discovered programs reveal novel patterns of learning and choice behavior that update our understanding of how the brain solves reinforcement learning problems.
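To make the notion of a reward prediction error concrete, here is a minimal sketch of a textbook delta-rule (Rescorla-Wagner style) update written as a small Python program. It is an illustration only: it is not the speaker's model and not a program discovered by CogFunSearch, and the function names and the learning rate are assumptions chosen for the example.

```python
def update_value(value, reward, learning_rate=0.1):
    """One delta-rule update.

    The prediction error is the received reward minus the expected value;
    the expectation moves a fraction (learning_rate) of the way toward the reward.
    Returns (new_value, prediction_error).
    """
    prediction_error = reward - value
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error


def run_trials(rewards, learning_rate=0.1, initial_value=0.0):
    """Apply the update over a sequence of rewards; return final value and all errors."""
    value = initial_value
    errors = []
    for reward in rewards:
        value, error = update_value(value, reward, learning_rate)
        errors.append(error)
    return value, errors


# With a constant reward, the prediction error shrinks as the expectation catches up.
final_value, errors = run_trials([1.0] * 50, learning_rate=0.2)
```

The shrinking error under a fully predicted reward is the signature usually looked for when asking whether a recorded signal is consistent with a prediction error.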
FLUXSynID: High-Resolution Synthetic Face Generation for Document and Live Capture Images
Synthetic face datasets are increasingly used to overcome the limitations of real-world biometric data, including privacy concerns, demographic imbalance, and high collection costs. However, many existing methods lack fine-grained control over identity attributes and fail to produce paired, identity-consistent images under structured capture conditions. In this talk, I will present FLUXSynID, a framework for generating high-resolution synthetic face datasets with user-defined identity attribute distributions and paired document-style and trusted live capture images. The dataset generated using FLUXSynID shows improved alignment with real-world identity distributions and greater diversity compared to prior work. I will also discuss how FLUXSynID’s dataset and generation tools can support research in face recognition and morphing attack detection (MAD), enhancing model robustness in both academic and practical applications.
Expanding mechanisms and therapeutic targets for neurodegenerative disease
A hallmark pathological feature of the neurodegenerative diseases amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) is the depletion of RNA-binding protein TDP-43 from the nucleus of neurons in the brain and spinal cord. A major function of TDP-43 is as a repressor of cryptic exon inclusion during RNA splicing. By re-analyzing RNA-sequencing datasets from human FTD/ALS brains, we discovered dozens of novel cryptic splicing events in important neuronal genes. Single nucleotide polymorphisms in UNC13A are among the strongest hits associated with FTD and ALS in human genome-wide association studies, but how those variants increase risk for disease is unknown. We discovered that TDP-43 represses a cryptic exon-splicing event in UNC13A. Loss of TDP-43 from the nucleus in human brain, neuronal cell lines and motor neurons derived from induced pluripotent stem cells resulted in the inclusion of a cryptic exon in UNC13A mRNA and reduced UNC13A protein expression. The top variants associated with FTD or ALS risk in humans are located in the intron harboring the cryptic exon, and we show that they increase UNC13A cryptic exon splicing in the face of TDP-43 dysfunction. Together, our data provide a direct functional link between one of the strongest genetic risk factors for FTD and ALS (UNC13A genetic variants), and loss of TDP-43 function. Recent analyses have revealed even further changes in TDP-43 target genes, including widespread changes in alternative polyadenylation, impacting expression of disease-relevant genes (e.g., ELP1, NEFL, and TMEM106B) and providing evidence that alternative polyadenylation is a new facet of TDP-43 pathology.
Harnessing Big Data in Neuroscience: From Mapping Brain Connectivity to Predicting Traumatic Brain Injury
Neuroscience is experiencing unprecedented growth in dataset size both within individual brains and across populations. Large-scale, multimodal datasets are transforming our understanding of brain structure and function, creating opportunities to address previously unexplored questions. However, managing this increasing data volume requires new training and technology approaches. Modern data technologies are reshaping neuroscience by enabling researchers to tackle complex questions within a Ph.D. or postdoctoral timeframe. I will discuss cloud-based platforms, such as brainlife.io, that provide scalable, reproducible, and accessible computational infrastructure. Modern data technology can democratize neuroscience, accelerate discovery and foster scientific transparency and collaboration. Concrete examples will illustrate how these technologies can be applied to mapping brain connectivity, studying human learning and development, and developing predictive models for traumatic brain injury (TBI). By integrating cloud computing and scalable data-sharing frameworks, neuroscience can become more impactful, inclusive, and data-driven.
Brain Emulation Challenge Workshop
The Brain Emulation Challenge workshop will tackle cutting-edge topics such as ground-truthing for validation, leveraging artificial datasets generated from virtual brain tissue, and the transformative potential of virtual brain platforms, as applied to the forthcoming Brain Emulation Challenge.
Learning and Memory
This webinar on learning and memory features three experts—Nicolas Brunel, Ashok Litwin-Kumar, and Julijana Gjorgieva—who present theoretical and computational approaches to understanding how neural circuits acquire and store information across different scales. Brunel discusses calcium-based plasticity and how standard “Hebbian-like” plasticity rules inferred from in vitro or in vivo datasets constrain synaptic dynamics, aligning with classical observations (e.g., STDP) and explaining how synaptic connectivity shapes memory. Litwin-Kumar explores insights from the fruit fly connectome, emphasizing how the mushroom body—a key site for associative learning—implements a high-dimensional, random representation of sensory features. Convergent dopaminergic inputs gate plasticity, reflecting a high-dimensional “critic” that refines behavior. Feedback loops within the mushroom body further reveal sophisticated interactions between learning signals and action selection. Gjorgieva examines how activity-dependent plasticity rules shape circuitry from the subcellular (e.g., synaptic clustering on dendrites) to the cortical network level. She demonstrates how spontaneous activity during development, Hebbian competition, and inhibitory-excitatory balance collectively establish connectivity motifs responsible for key computations such as response normalization.
A Comprehensive Overview of Large Language Models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the existing literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts along with covering the advanced topics at the frontier of research in LLMs. This review article is intended to not only provide a systematic survey but also a quick comprehensive reference for the researchers and practitioners to draw insights from extensive informative summaries of the existing works to advance the LLM research.
Trends in NeuroAI - Meta's MEG-to-image reconstruction
Trends in NeuroAI is a reading group hosted by the MedARC Neuroimaging & AI lab (https://medarc.ai/fmri). Title: Brain-optimized inference improves reconstructions of fMRI brain activity Abstract: The release of large datasets and developments in AI have led to dramatic improvements in decoding methods that reconstruct seen images from human brain activity. We evaluate the prospect of further improving recent decoding methods by optimizing for consistency between reconstructions and brain activity during inference. We sample seed reconstructions from a base decoding method, then iteratively refine these reconstructions using a brain-optimized encoding model that maps images to brain activity. At each iteration, we sample a small library of images from an image distribution (a diffusion model) conditioned on a seed reconstruction from the previous iteration. We select those that best approximate the measured brain activity when passed through our encoding model, and use these images for structural guidance during the generation of the small library in the next iteration. We reduce the stochasticity of the image distribution at each iteration, and stop when a criterion on the "width" of the image distribution is met. We show that when this process is applied to recent decoding methods, it outperforms the base decoding method as measured by human raters, a variety of image feature metrics, and alignment to brain activity. These results demonstrate that reconstruction quality can be significantly improved by explicitly aligning decoding distributions to brain activity distributions, even when the seed reconstruction is output from a state-of-the-art decoding algorithm. Interestingly, the rate of refinement varies systematically across visual cortex, with earlier visual areas generally converging more slowly and preferring narrower image distributions, relative to higher-level brain areas. 
Brain-optimized inference thus offers a succinct and novel method for improving reconstructions and exploring the diversity of representations across visual brain areas. Speaker: Reese Kneeland is a Ph.D. student at the University of Minnesota working in the Naselaris lab. Paper link: https://arxiv.org/abs/2312.07705
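The iterative procedure described in the abstract (sample a small library around the current reconstruction, keep the candidate whose predicted activity best matches the measurement, narrow the distribution, stop at a width criterion) can be sketched on a toy problem. Everything below is an assumption made for illustration: "images" are single numbers, the image distribution is a Gaussian rather than a diffusion model, the encoding model is an arbitrary function passed in by the caller, and the current guide is kept in the candidate pool so the error never increases. It is not the authors' implementation.

```python
import random


def refine(seed, measured, encode, n_samples=20, width=1.0,
           shrink=0.7, min_width=0.01, rng=None):
    """Toy brain-optimized refinement loop.

    seed      : initial reconstruction (a float standing in for an image)
    measured  : measured brain activity (a float)
    encode    : encoding model mapping a candidate to predicted activity
    width     : spread of the sampling distribution, reduced every iteration
    min_width : stopping criterion on the width of the distribution
    """
    rng = rng or random.Random(0)
    guide = seed
    while width > min_width:
        # Sample a small library conditioned on the current guide.
        library = [rng.gauss(guide, width) for _ in range(n_samples)]
        # Keep the candidate whose predicted activity is closest to the measurement.
        guide = min(library + [guide], key=lambda c: abs(encode(c) - measured))
        # Reduce the stochasticity of the distribution for the next round.
        width *= shrink
    return guide


def toy_encoder(x):
    """Hypothetical encoding model: predicted activity is twice the stimulus value."""
    return 2.0 * x


result = refine(seed=0.0, measured=6.0, encode=toy_encoder)
```

Shrinking `width` plays the role of reducing stochasticity across iterations, and `min_width` stands in for the stopping criterion on the width of the image distribution.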
Trends in NeuroAI - SwiFT: Swin 4D fMRI Transformer
Trends in NeuroAI is a reading group hosted by the MedARC Neuroimaging & AI lab (https://medarc.ai/fmri). Title: SwiFT: Swin 4D fMRI Transformer Abstract: Modeling spatiotemporal brain dynamics from high-dimensional data, such as functional Magnetic Resonance Imaging (fMRI), is a formidable task in neuroscience. Existing approaches for fMRI analysis utilize hand-crafted features, but the process of feature extraction risks losing essential information in fMRI scans. To address this challenge, we present SwiFT (Swin 4D fMRI Transformer), a Swin Transformer architecture that can learn brain dynamics directly from fMRI volumes in a memory- and computation-efficient manner. SwiFT achieves this by implementing a 4D window multi-head self-attention mechanism and absolute positional embeddings. We evaluate SwiFT using multiple large-scale resting-state fMRI datasets, including the Human Connectome Project (HCP), Adolescent Brain Cognitive Development (ABCD), and UK Biobank (UKB) datasets, to predict sex, age, and cognitive intelligence. Our experimental outcomes reveal that SwiFT consistently outperforms recent state-of-the-art models. Furthermore, by leveraging its end-to-end learning capability, we show that contrastive loss-based self-supervised pre-training of SwiFT can enhance performance on downstream tasks. Additionally, we employ an explainable AI method to identify the brain regions associated with sex classification. To our knowledge, SwiFT is the first Swin Transformer architecture to process 4D spatiotemporal brain functional data in an end-to-end fashion. Our work holds substantial potential in facilitating scalable learning of functional brain imaging in neuroscience research by reducing the hurdles associated with applying Transformer models to high-dimensional fMRI. Speaker: Junbeom Kwon is a research associate working in Prof. Jiook Cha’s lab at Seoul National University. Paper link: https://arxiv.org/abs/2307.05916
Mathematical and computational modelling of ocular hemodynamics: from theory to applications
Changes in ocular hemodynamics may be indicative of pathological conditions in the eye (e.g. glaucoma, age-related macular degeneration), but also elsewhere in the body (e.g. systemic hypertension, diabetes, neurodegenerative disorders). Thanks to its transparent fluids and structures that allow the light to go through, the eye offers a unique window on the circulation from large to small vessels, and from arteries to veins. Deciphering the causes that lead to changes in ocular hemodynamics in a specific individual could help prevent vision loss as well as aid in the diagnosis and management of diseases beyond the eye. In this talk, we will discuss how mathematical and computational modelling can help in this regard. We will focus on two main factors, namely blood pressure (BP), which drives the blood flow through the vessels, and intraocular pressure (IOP), which compresses the vessels and may impede the flow. Mechanism-driven models translate fundamental principles of physics and physiology into computable equations that allow for identification of cause-to-effect relationships among interplaying factors (e.g. BP, IOP, blood flow). While invaluable for causality, mechanism-driven models are often based on simplifying assumptions to make them tractable for analysis and simulation; however, this often brings into question their relevance beyond theoretical explorations. Data-driven models offer a natural remedy to address these shortcomings. Data-driven methods may be supervised (based on labelled training data) or unsupervised (clustering and other data analytics) and they include models based on statistics, machine learning, deep learning and neural networks. Data-driven models naturally thrive on large datasets, making them scalable to a plethora of applications.
While invaluable for scalability, data-driven models are often perceived as black boxes, as their outcomes are difficult to explain in terms of fundamental principles of physics and physiology and this limits the delivery of actionable insights. The combination of mechanism-driven and data-driven models allows us to harness the advantages of both, as mechanism-driven models excel at interpretability but suffer from a lack of scalability, while data-driven models are excellent at scale but suffer in terms of generalizability and insights for hypothesis generation. This combined, integrative approach represents the pillar of the interdisciplinary approach to data science that will be discussed in this talk, with application to ocular hemodynamics and specific examples in glaucoma research.
Enhancing Qualitative Coding with Large Language Models: Potential and Challenges
Qualitative coding is the process of categorizing and labeling raw data to identify themes, patterns, and concepts within qualitative research. This process requires significant time, reflection, and discussion, and is often characterized by inherent subjectivity and uncertainty. Here, we explore the possibility of leveraging large language models (LLMs) to enhance the process and assist researchers with qualitative coding. LLMs, trained on extensive human-generated text, possess an architecture that renders them capable of understanding the broader context of a conversation or text. This allows them to extract patterns and meaning effectively, making them particularly useful for the accurate extraction and coding of relevant themes. In our current approach, we employed the ChatGPT 3.5 Turbo API, integrating it into the qualitative coding process for data from the SWISS100 study, specifically focusing on data derived from centenarians' experiences during the Covid-19 pandemic, as well as a systematic centenarian literature review. We provide several instances illustrating how our approach can assist researchers with extracting and coding relevant themes. With data from human coders on hand, we highlight points of convergence and divergence between AI and human thematic coding in the context of these data. Moving forward, our goal is to enhance the prototype and integrate it within an LLM designed for local storage and operation (LLaMa). Our initial findings highlight the potential of AI-enhanced qualitative coding, yet they also pinpoint areas requiring attention. Based on these observations, we formulate tentative recommendations for the optimal integration of LLMs in qualitative coding research. Further evaluations using varied datasets and comparisons among different LLMs will shed more light on the question of whether and how to integrate these models into this domain.
Spatial and Single Cell Genomics for Next Generation Neuroscience
The advent of next generation sequencing ushered in a ten-year period of exuberant technology development, enabling the quantification of gene expression and epigenetic features within individual cells, and within intact tissue sections. In this seminar, I will outline our technological contributions, beginning with the development of Drop-seq, a method for high-throughput single cell analysis, followed by the development of Slide-seq, a technique for measuring genome-wide expression at 10 micron spatial resolution. Using a combination of these techniques, we recently constructed a comprehensive cell type atlas of the adult mouse brain, positioning cell types within individual brain structures. I will discuss the major findings from this dataset, including emerging principles of neurotransmission, and the localization of disease gene signatures to specific cell types. Finally, I will introduce a new spatial technology, Slide-tags, that unifies single cell and spatial genomics into a single, highly scalable assay.
NII Methods (journal club): NeuroQuery, comprehensive meta-analysis of human brain mapping
We will discuss a recent paper by Taylor et al. (2023): https://www.sciencedirect.com/science/article/pii/S1053811923002896. They discuss the merits of highlighting results instead of hiding them; that is, clearly marking which voxels and clusters pass a given significance threshold, but still highlighting sub-threshold results, with opacity proportional to the strength of the effect. They use this to illustrate how there in fact may be more agreement between researchers than previously thought, using the NARPS dataset as an example. By adopting a continuous, "highlighted" approach, it becomes clear that the majority of effects are in the same location and that the effect size is in the same direction, compared to an approach that only permits rejecting or not rejecting the null hypothesis. We will also talk about the implications of this approach for creating figures, detecting artifacts, and aiding reproducibility.
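The "highlight, don't hide" idea summarized above can be sketched as a mapping from a voxel's statistic to a display opacity. The linear scaling, the choice to make suprathreshold voxels fully opaque and outlined, and the function names are all assumptions made for this illustration; the summary above only states that opacity is proportional to effect strength and that suprathreshold results are clearly marked.

```python
def highlight_alpha(stat, threshold):
    """Opacity in [0, 1] for one voxel.

    Sub-threshold voxels stay visible with opacity proportional to |stat|;
    voxels at or above the threshold are fully opaque.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return min(abs(stat) / threshold, 1.0)


def style_voxels(stats, threshold):
    """Attach an opacity and an 'outlined' flag (passes threshold) to each statistic."""
    return [
        {
            "stat": s,
            "alpha": highlight_alpha(s, threshold),
            "outlined": abs(s) >= threshold,
        }
        for s in stats
    ]


styled = style_voxels([0.0, 1.5, -4.0, 3.0], threshold=3.0)
```

A conventional thresholded map would instead set the opacity to zero for every voxel below the threshold, which is exactly the information the highlighting approach keeps visible.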
Estimating repetitive spatiotemporal patterns from resting-state brain activity data
Repetitive spatiotemporal patterns in resting-state brain activities have been widely observed in various species and regions, such as rat and cat visual cortices. Since they resemble the preceding brain activities during tasks, they are assumed to reflect past experiences embedded in neuronal circuits. Moreover, spatiotemporal patterns involving whole-brain activities may also reflect a process that integrates information distributed over the entire brain, such as motor and visual information. Therefore, revealing such patterns may elucidate how the information is integrated to generate consciousness. In this talk, I will introduce our proposed method to estimate repetitive spatiotemporal patterns from resting-state brain activity data and show the spatiotemporal patterns estimated from human resting-state magnetoencephalography (MEG) and electroencephalography (EEG) data. Our analyses suggest that the patterns involved whole-brain propagating activities that reflected a process to integrate the information distributed over frequencies and networks. I will also introduce our current attempt to reveal signal flows and their roles in the spatiotemporal patterns using a big dataset. - Takeda et al., Estimating repetitive spatiotemporal patterns from resting-state brain activity data. NeuroImage (2016); 133:251-65. - Takeda et al., Whole-brain propagating patterns in human resting-state brain activities. NeuroImage (2021); 245:118711.
Programmed axon death: from animal models into human disease
Programmed axon death is a widespread and completely preventable mechanism in injury and disease. Mouse and Drosophila studies define a molecular pathway involving activation of SARM1 NADase and its prevention by NAD synthesising enzyme NMNAT2. Loss of axonal NMNAT2 causes its substrate, NMN, to accumulate and activate SARM1, driving loss of NAD and changes in ATP, ROS and calcium. Animal models caused by genetic mutation, toxins, viruses or metabolic defects can be alleviated by blocking programmed axon death, for example models of CMT1B, chemotherapy-induced peripheral neuropathy (CIPN), rabies and diabetic peripheral neuropathy (DPN). The perinatal lethality of NMNAT2 null mice is completely rescued, restoring a normal, healthy lifespan. Animal models lack the genetic and environmental diversity present in human populations and this is problematic for modelling gene-environment combinations, for example in CIPN and DPN, and identifying rare, pathogenic mutations. Instead, by testing human gene variants in WGS datasets for loss- and gain-of-function, we identified enrichment of rare SARM1 gain-of-function variants in sporadic ALS, despite previous negative findings in SOD1 transgenic mice. We have shown in mice that heterozygous SARM1 loss-of-function is protective from a range of axonal stresses and that naturally-occurring SARM1 loss-of-function alleles are present in human populations. This enables new approaches to identify disorders where blocking SARM1 may be therapeutically useful, and the existence of two dominant negative human variants in healthy adults is some of the best evidence available that drugs blocking SARM1 are likely to be safe. Further loss- and gain-of-function variants in SARM1 and NMNAT2 are being identified and used to extend and strengthen the evidence of association with neurological disorders. We aim to identify diseases, and specific patients, in whom SARM1-blocking drugs are most likely to be effective.
Sampling the environment with body-brain rhythms
Since Darwin, comparative research has shown that most animals share basic timing capacities, such as the ability to process temporal regularities and produce rhythmic behaviors. What seems to be more exclusive, however, are the capacities to generate temporal predictions and to display anticipatory behavior at salient time points. These abilities are associated with subcortical structures like basal ganglia (BG) and cerebellum (CE), which are more developed in humans as compared to nonhuman animals. In the first research line, we investigated the basic capacities to extract temporal regularities from the acoustic environment and produce temporal predictions. We did so by adopting a comparative and translational approach, thus making use of a unique EEG dataset including 2 macaque monkeys, 20 healthy young, 11 healthy old participants and 22 stroke patients, 11 with focal lesions in the BG and 11 in the CE. In the second research line, we holistically explore the functional relevance of body-brain physiological interactions in human behavior. Thus, a series of planned studies investigate the functional mechanisms by which body signals (e.g., respiratory and cardiac rhythms) interact with and modulate neurocognitive functions from rest and sleep states to action and perception. This project supports the effort towards individual profiling: are individuals’ timing capacities (e.g., rhythm perception and production), and general behavior (e.g., individual walking and speaking rates) influenced / shaped by body-brain interactions?
Lifelong Learning AI via neuro inspired solutions
AI embedded in real systems, such as in satellites, robots and other autonomous devices, must make fast, safe decisions even when the environment changes, or under limitations on the available power; to do so, such systems must be adaptive in real time. To date, edge computing has no real adaptivity: the AI must be trained in advance, typically on a large dataset with much computational power needed. Once fielded, the AI is frozen: it is unable to use its experience to operate if the environment proves to be outside its training, or to improve its expertise. Worse, since datasets cannot cover all possible real-world situations, systems with such frozen intelligent control are likely to fail. Lifelong Learning is the cutting edge of artificial intelligence, encompassing computational methods that allow systems to learn at runtime and incorporate learning for application in new, unanticipated situations. Until recently, this sort of computation has been found exclusively in nature; thus, Lifelong Learning looks to nature, and in particular neuroscience, for its underlying principles and mechanisms and then translates them to this new technology. Our presentation will introduce a number of state-of-the-art approaches to achieve AI adaptive learning, including from DARPA’s L2M program and subsequent developments. Many environments are affected by temporal changes, such as the time of day, week, season, etc. A way to create adaptive systems which are both small and robust is by making them aware of time and able to comprehend temporal patterns in the environment. We will describe our current research in temporal AI, while also considering power constraints.
Multi-level theory of neural representations in the era of large-scale neural recordings: Task-efficiency, representation geometry, and single neuron properties
A central goal in neuroscience is to understand how orchestrated computations in the brain arise from the properties of single neurons and networks of such neurons. Answering this question requires theoretical advances that shine light into the ‘black box’ of representations in neural circuits. In this talk, we will demonstrate theoretical approaches that help describe how cognitive and behavioral task implementations emerge from the structure in neural populations and from biologically plausible neural networks. First, we will introduce an analytic theory that connects geometric structures that arise from neural responses (i.e., neural manifolds) to the neural population’s efficiency in implementing a task. In particular, this theory describes a perceptron’s capacity for linearly classifying object categories based on the underlying neural manifolds’ structural properties. Next, we will describe how such methods can, in fact, open the ‘black box’ of distributed neuronal circuits in a range of experimental neural datasets. In particular, our method overcomes the limitations of traditional dimensionality reduction techniques, as it operates directly on the high-dimensional representations, rather than relying on low-dimensionality assumptions for visualization. Furthermore, this method allows for simultaneous multi-level analysis, by measuring geometric properties in neural population data and estimating the amount of task information embedded in the same population. These geometric frameworks are general and can be used across different brain areas and task modalities, as demonstrated in our work and that of others, ranging from the visual cortex to parietal cortex to hippocampus, and from calcium imaging to electrophysiology to fMRI datasets.
Finally, we will discuss our recent efforts to fully extend this multi-level description of neural populations, by (1) investigating how single neuron properties shape the representation geometry in early sensory areas, and by (2) understanding how task-efficient neural manifolds emerge in biologically-constrained neural networks. By extending our mathematical toolkit for analyzing representations underlying complex neuronal networks, we hope to contribute to the long-term challenge of understanding the neuronal basis of tasks and behaviors.
Linking GWAS to pharmacological treatments for psychiatric disorders
Genome-wide association studies (GWAS) have identified multiple disease-associated genetic variations across different psychiatric disorders, raising the question of how these genetic variants relate to the corresponding pharmacological treatments. In this talk, I will outline our work investigating whether functional information from a range of open bioinformatics datasets, such as protein-protein interaction (PPI) networks, brain eQTL, and gene expression patterns across the brain, can uncover the relationship between GWAS-identified genetic variation and the genes targeted by current drugs for psychiatric disorders. Focusing on four psychiatric disorders (ADHD, bipolar disorder, schizophrenia, and major depressive disorder), we assess relationships between the gene targets of drug treatments and GWAS hits. We show that while incorporating information derived from functional bioinformatics data, such as the PPI network and spatial gene expression, can reveal links for bipolar disorder, the overall correspondence between treatment targets and GWAS-implicated genes in psychiatric disorders rarely exceeds null expectations. This relatively low degree of correspondence across modalities suggests that the genetic mechanisms driving the risk for psychiatric disorders may be distinct from the pathophysiological mechanisms used for targeting symptom manifestations through pharmacological treatments, and that novel approaches for understanding and treating psychiatric disorders may be required.
Do we measure what we think we are measuring?
Tests used in the empirical sciences are often (implicitly) assumed to be representative of a target mechanism in the sense that similar tests should lead to similar results. In this talk, using resting-state electroencephalogram (EEG) as an example, I will argue that this assumption does not necessarily hold true. Typically, EEG studies are conducted by selecting one analysis method thought to be representative of the research question asked. Using multiple methods, we extracted a variety of features from a single resting-state EEG dataset and conducted correlational and case-control analyses. We found that many EEG features revealed a significant effect in the case-control analyses. Similarly, EEG features correlated significantly with cognitive tasks. However, when we compared these features pairwise, we did not find strong correlations. A number of explanations for these results will be discussed.
Pynapple: a light-weight python package for neural data analysis - webinar + tutorial
In systems neuroscience, datasets are multimodal and include data streams of various origins: multichannel electrophysiology, 1- or 2-p calcium imaging, behavior, etc. Often, the exact nature of the data streams is unique to each lab, if not each project. Analyzing these datasets in an efficient and open way is crucial for collaboration and reproducibility. In this combined webinar and tutorial, Adrien Peyrache and Guillaume Viejo will present Pynapple, a Python-based data analysis pipeline for systems neuroscience. Designed for flexibility and versatility, Pynapple allows users to perform cross-modal neural data analysis via a common programming approach which facilitates easy sharing of both analysis code and data.
Malignant synaptic plasticity in pediatric high-grade gliomas
Pediatric high-grade gliomas (pHGG) are a devastating group of diseases that urgently require novel therapeutic options. We have previously demonstrated that pHGGs directly synapse onto neurons and that the subsequent tumor cell depolarization, mediated by calcium-permeable AMPA channels, promotes their proliferation. The regulatory mechanisms governing these postsynaptic connections are unknown. Here, we investigated the role of BDNF-TrkB signaling in modulating the plasticity of the malignant synapse. BDNF ligand activation of its canonical receptor, TrkB (which is encoded by the gene NTRK2), has been shown to be one important modulator of synaptic regulation in the normal setting. Electrophysiological recordings of glioma cell membrane properties, in response to acute neurotransmitter stimulation, demonstrate an inward current resembling AMPA receptor (AMPAR) mediated excitatory neurotransmission. Extracellular BDNF increases the amplitude of this glutamate-induced tumor cell depolarization, and this effect is abrogated in NTRK2 knockout glioma cells. Upon examining tumor cell excitability using in situ calcium imaging, we found that BDNF increases the intensity of glutamate-evoked calcium transients in GCaMP6s-expressing glioma cells. Western blot analysis indicates that the tumor's AMPAR properties are altered downstream of BDNF-induced TrkB activation in glioma. Cell membrane protein capture (via biotinylation) and live imaging of pH-sensitive GFP-tagged AMPAR subunits demonstrate an increase of calcium-permeable channels at the tumor's postsynaptic membrane in response to BDNF. We find that BDNF-TrkB signaling promotes neuron-to-glioma synaptogenesis as measured by high-resolution confocal and electron microscopy in culture and tumor xenografts. Our analysis of published pHGG transcriptomic datasets, together with brain slice conditioned medium experiments in culture, indicates the tumor microenvironment as the chief source of BDNF ligand.
Disruption of the BDNF-TrkB pathway in patient-derived orthotopic glioma xenograft models, both genetically and pharmacologically, results in an increased overall survival and reduced tumor proliferation rate. These findings suggest that gliomas leverage normal mechanisms of plasticity to modulate the excitatory channels involved in synaptic neurotransmission and they reveal the potential to target the regulatory components of glioma circuit dynamics as a therapeutic strategy for these lethal cancers.
Mesmerize: A blueprint for shareable and reproducible analysis of calcium imaging data
Mesmerize is a platform for the annotation and analysis of neuronal calcium imaging data. Mesmerize encompasses the entire process of calcium imaging analysis from raw data to interactive visualizations. Mesmerize allows you to create FAIR-functionally linked datasets that are easy to share. The analysis tools are applicable to a broad range of biological experiments and come with graphical user interfaces that can be used without a programming background.
Network science and network medicine: New strategies for understanding and treating the biological basis of mental ill-health
The last twenty years have witnessed extraordinarily rapid progress in basic neuroscience, including breakthrough technologies such as optogenetics, and the collection of unprecedented amounts of neuroimaging, genetic and other data relevant to neuroscience and mental health. However, the translation of this progress into improved understanding of brain function and dysfunction has been comparatively slow. As a result, the development of therapeutics for mental health has stagnated too. One central challenge has been to extract meaning from these large, complex, multivariate datasets, which requires a shift towards systems-level mathematical and computational approaches. A second challenge has been reconciling different scales of investigation, from genes and molecules to cells, circuits, tissue, whole-brain, and ultimately behaviour. In this talk I will describe several strands of work using mathematical, statistical, and bioinformatic methods to bridge these gaps. Topics will include: using artificial neural networks to link the organization of large-scale brain connectivity to cognitive function; using multivariate statistical methods to link disease-related changes in brain networks to the underlying biological processes; and using network-based approaches to move from genetic insights towards drug discovery. Finally, I will discuss how simple organisms such as C. elegans can serve to inspire, test, and validate new methods and insights in network neuroscience.
Brain chart for the human lifespan
Over the past few decades, neuroimaging has become a ubiquitous tool in basic research and clinical studies of the human brain. However, no reference standards currently exist to quantify individual differences in neuroimaging metrics over time, in contrast to growth charts for anthropometric traits such as height and weight. Here, we built an interactive resource to benchmark brain morphology, www.brainchart.io, derived from any current or future sample of magnetic resonance imaging (MRI) data. With the goal of basing these reference charts on the largest and most inclusive dataset available, we aggregated 123,984 MRI scans from 101,457 participants aged from 115 days post-conception through 100 postnatal years, across more than 100 primary research studies. Cerebrum tissue volumes and other global or regional MRI metrics were quantified by centile scores, relative to non-linear trajectories of brain structural changes, and rates of change, over the lifespan. Brain charts identified previously unreported neurodevelopmental milestones; showed high stability of individual centile scores over longitudinal assessments; and demonstrated robustness to technical and methodological differences between primary studies. Centile scores showed increased heritability compared to non-centiled MRI phenotypes, and provided a standardised measure of atypical brain structure that revealed patterns of neuroanatomical variation across neurological and psychiatric disorders. In sum, brain charts are an essential first step towards robust quantification of individual deviations from normative trajectories in multiple, commonly-used neuroimaging phenotypes. Our collaborative study proves the principle that brain charts are achievable on a global scale over the entire lifespan, and applicable to analysis of diverse developmental and clinical effects on human brain structure.
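As a rough illustration of what a centile score expresses, the sketch below ranks one measurement against a reference sample. It is deliberately simplified and is not the method used for the brain charts, which are described above as relying on non-linear trajectories across the lifespan; here the reference group is assumed to be already matched for age. The function name centile_score and all numbers are hypothetical.

```python
from bisect import bisect_right

def centile_score(reference_values, value):
    """Percentage of the reference sample that is less than or equal to value."""
    ordered = sorted(reference_values)
    return 100.0 * bisect_right(ordered, value) / len(ordered)

# Hypothetical volumes (arbitrary units) for a same-age reference group.
reference_volumes = [410.0, 425.0, 433.0, 440.0, 452.0, 461.0, 470.0, 488.0]

# Four of the eight reference values lie at or below 445.0.
print(centile_score(reference_volumes, 445.0))
```

A score near 50 marks a typical value for the reference group, while scores near 0 or 100 flag measurements at the extremes.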
Towards a More Authentic Vision of the (multi)Coding Potential of RNA
Tens of thousands of open reading frames (ORFs) are hidden within transcripts. They have eluded annotations because they are either small or within unsuspected locations. These are named alternative ORFs (altORFs) or small ORFs and have recently been highlighted by innovative proteogenomic approaches, such as our OpenProt resource, revealing their existence and implications in biological functions. Due to the absence of altORFs from annotations, pathogenic mutations within these are being ignored. I will discuss our latest progress on the re-analysis of large-scale proteomics datasets to improve our knowledge of proteomic diversity, and the functional characterization of a second protein coded by the FUS gene. Finally, I will explain the need to map the coding potential of the transcriptome using artificial intelligence rather than with conventional annotations that do not capture the full translational activity of ribosomes.
CaImAn: large-scale batch and online analysis of calcium imaging data
Advances in fluorescence microscopy enable monitoring larger brain areas in-vivo with finer time resolution. The resulting data rates require reproducible analysis pipelines that are reliable, fully automated, and scalable to datasets generated over the course of months. We present CaImAn, an open-source library for calcium imaging data analysis. CaImAn provides automatic and scalable methods to address problems common to pre-processing, including motion correction, neural activity identification, and registration across different sessions of data collection. It does this while requiring minimal user intervention, with good scalability on computers ranging from laptops to high-performance computing clusters. CaImAn is suitable for two-photon and one-photon imaging, and also enables real-time analysis on streaming data. To benchmark the performance of CaImAn we collected and combined a corpus of manual annotations from multiple labelers on nine mouse two-photon datasets. We demonstrate that CaImAn achieves near-human performance in detecting locations of active neurons.
NMC4 Short Talk: What can 140,000 Reaches Tell Us About Demographic Contributions to Visuomotor Adaptation?
Motor learning is typically assessed in the lab, affording a high degree of control over the task environment. However, this level of control often comes at the cost of smaller sample sizes and a homogeneous pool of participants (e.g. college students). To address this, we have designed a web-based motor learning experiment, making it possible to reach a larger, more diverse set of participants. As a proof-of-concept, we collected data from 1,581 participants completing a visuomotor rotation task, where participants controlled a visual cursor on the screen with their mouse or trackpad. Motor learning was indexed by how fast participants were able to compensate for a 45° rotation imposed between the cursor and their actual movement. Using a cross-validated LASSO regression, we found that motor learning varied significantly with the participant’s age and sex, and also strongly correlated with the location of the target, visual acuity, and satisfaction with the experiment. In contrast, participants' mouse and browser type were features eliminated by the model, indicating that motor performance was not influenced by variations in computer hardware and software. Together, this proof-of-concept study demonstrates how large datasets can generate important insights into the factors underlying motor learning.
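To illustrate how a LASSO regression can eliminate features, the sketch below fits a LASSO by cyclic coordinate descent on synthetic data. It is a minimal illustration under stated assumptions, not the analysis from the talk: the penalty is fixed rather than chosen by cross-validation, the data are simulated, and the two predictors (labelled "age" and "browser" in the comments) are hypothetical stand-ins.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, n_iters=200):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0   # covariance of feature j with the residual that ignores feature j
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return w

random.seed(0)
n = 200
# Hypothetical predictors: column 0 ("age") drives the outcome, column 1 ("browser") does not.
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, penalty=0.2)
print(weights)  # the second weight is driven exactly to zero
```

The soft-threshold step is what sets uninformative coefficients to exactly zero, which is the sense in which a feature is "eliminated by the model". In practice the penalty would be selected by cross-validation, as described in the abstract.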
NMC4 Short Talk: Novel population of synchronously active pyramidal cells in hippocampal area CA1
Hippocampal pyramidal cells have been widely studied during locomotion, when theta oscillations are present, and during sharp wave ripples at rest, when replay takes place. However, we find a subset of pyramidal cells that are preferentially active during rest, in the absence of theta oscillations and sharp wave ripples. We recorded these cells using two-photon imaging in dorsal CA1 of the hippocampus of mice, during a virtual reality object location recognition task. During locomotion, the cells show a similar level of activity as control cells, but their activity increases during rest, when this population of cells shows highly synchronous, oscillatory activity at a low frequency (0.1-0.4 Hz). In addition, during both locomotion and rest these cells show place coding, suggesting they may play a role in maintaining a representation of the current location, even when the animal is not moving. We performed simultaneous electrophysiological and calcium recordings, which showed a higher correlation of activity between the LFO and the hippocampal cells in the 0.1-0.4 Hz low frequency band during rest than during locomotion. However, the relationship between the LFO and calcium signals varied between electrodes, suggesting a localized effect. We used the Allen Brain Observatory Neuropixels Visual Coding dataset to further explore this. These data revealed localised low frequency oscillations in CA1 and DG during rest. Overall, we show a novel population of hippocampal cells, and a novel oscillatory band of activity in hippocampus during rest.
NMC4 Short Talk: Rank similarity filters for computationally-efficient machine learning on high dimensional data
Real world datasets commonly contain nonlinearly separable classes, requiring nonlinear classifiers. However, these classifiers are less computationally efficient than their linear counterparts. This inefficiency wastes energy, resources and time. We were inspired by the efficiency of the brain to create a novel type of computationally efficient Artificial Neural Network (ANN) called Rank Similarity Filters. They can be used to both transform and classify nonlinearly separable datasets with many datapoints and dimensions. The weights of the filters are set using the rank orders of features in a datapoint, or optionally the 'confusion' adjusted ranks between features (determined from their distributions in the dataset). The activation strength of a filter determines its similarity to other points in the dataset, a measure based on cosine similarity. The activation of many Rank Similarity Filters transforms samples into a new nonlinear space suitable for linear classification (Rank Similarity Transform (RST)). We additionally used this method to create the nonlinear Rank Similarity Classifier (RSC), which is a fast and accurate multiclass classifier, and the nonlinear Rank Similarity Probabilistic Classifier (RSPC), which is an extension to the multilabel case. We evaluated the classifiers on multiple datasets and RSC is competitive with existing classifiers but with superior computational efficiency. Code for RST, RSC and RSPC is open source and was written in Python using the popular scikit-learn framework to make it easily accessible (https://github.com/KatharineShapcott/rank-similarity). In future extensions the algorithm can be applied to hardware suitable for the parallelization of an ANN (GPU) and a Spiking Neural Network (neuromorphic computing) with corresponding performance gains. This makes Rank Similarity Filters a promising biologically inspired solution to the problem of efficient analysis of nonlinearly separable data.
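The idea of comparing datapoints through the rank order of their features can be sketched as follows. This is only an illustration of rank ordering combined with cosine similarity, the two ingredients named in the abstract. It is not the authors' algorithm and does not reproduce the API of the linked repository; the function names and example vectors are hypothetical.

```python
import math

def rank_vector(features):
    """Return the rank (0 = smallest) of each feature within one datapoint."""
    order = sorted(range(len(features)), key=lambda i: features[i])
    ranks = [0] * len(features)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_activation(filter_point, sample):
    """Activation of a rank-based filter: similarity of the two rank orders."""
    return cosine_similarity(rank_vector(filter_point), rank_vector(sample))

stored = [0.1, 5.0, 2.0, 9.0]          # datapoint that defines the filter
same_order = [1.0, 50.0, 20.0, 90.0]   # different scale, same feature ordering
reversed_order = [9.0, 2.0, 5.0, 0.1]  # feature ordering reversed

print(filter_activation(stored, same_order))
print(filter_activation(stored, reversed_order))
```

Because only the ordering of features matters, the rescaled sample activates the filter as strongly as the stored point itself, while the reversed sample activates it much less.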
NMC4 Short Talk: Hypothesis-neutral response-optimized models of higher-order visual cortex reveal strong semantic selectivity
Modeling neural responses to naturalistic stimuli has been instrumental in advancing our understanding of the visual system. Dominant computational modeling efforts in this direction have been deeply rooted in preconceived hypotheses. In contrast, hypothesis-neutral computational methodologies with minimal apriorism, which bring neuroscience data directly to bear on the model development process, are likely to be much more flexible and effective in modeling and understanding tuning properties throughout the visual system. In this study, we develop a hypothesis-neutral approach and characterize response selectivity in the human visual cortex exhaustively and systematically via response-optimized deep neural network models. First, we leverage the unprecedented scale and quality of the recently released Natural Scenes Dataset to constrain parametrized neural models of higher-order visual systems and achieve novel predictive precision, in some cases significantly outperforming the predictive success of state-of-the-art task-optimized models. Next, we ask what kinds of functional properties emerge spontaneously in these response-optimized models. We examine trained networks through structural (feature visualizations) as well as functional analysis (feature verbalizations) by running ‘virtual’ fMRI experiments on large-scale probe datasets. Strikingly, despite no category-level supervision, since the models are solely optimized for brain response prediction from scratch, the units in the networks after optimization act as detectors for semantic concepts like ‘faces’ or ‘words’, thereby providing some of the strongest evidence for categorical selectivity in these visual areas. The observed selectivity in model neurons raises another question: are the category-selective units simply functioning as detectors for their preferred category or are they a by-product of a non-category-specific visual processing mechanism?
To investigate this, we create selective deprivations in the visual diet of these response-optimized networks and study semantic selectivity in the resulting ‘deprived’ networks, thereby also shedding light on the role of specific visual experiences in shaping neuronal tuning. Together with this new class of data-driven models and novel model interpretability techniques, our study illustrates that DNN models of visual cortex need not be conceived as obscure models with limited explanatory power, but rather as powerful, unifying tools for probing the nature of representations and computations in the brain.
NMC4 Short Talk: Image embeddings informed by natural language improve predictions and understanding of human higher-level visual cortex
To better understand human scene understanding, we extracted features from images using CLIP, a neural network model of visual concepts trained with supervision from natural language. We then constructed voxelwise encoding models to explain whole-brain responses arising from viewing natural images from the Natural Scenes Dataset (NSD), a large-scale fMRI dataset collected at 7T. Our results reveal that CLIP, as compared to convolution-based image classification models such as ResNet or AlexNet, as well as language models such as BERT, gives rise to representations that enable better prediction performance (up to a 0.86 correlation with test data and an R-squared of 0.75) in higher-level visual cortex in humans. Moreover, CLIP representations explain distinctly unique variance in these higher-level visual areas as compared to models trained with only images or text. Control experiments show that the improvement in prediction observed with CLIP is not due to architectural differences (transformer vs. convolution) or to the encoding of image captions per se (vs. single object labels). Together our results indicate that CLIP and, more generally, multimodal models trained jointly on images and text, may serve as better candidate models of representation in human higher-level visual cortex. The bridge between language and vision provided by jointly trained models such as CLIP also opens up new and more semantically rich ways of interpreting the visual brain.
NMC4 Short Talk: Directly interfacing brain and deep networks exposes non-hierarchical visual processing
A recent approach to understanding the mammalian visual system is to show correspondence between the sequential stages of processing in the ventral stream with layers in a deep convolutional neural network (DCNN), providing evidence that visual information is processed hierarchically, with successive stages containing ever higher-level information. However, correspondence is usually defined as shared variance between brain region and model layer. We propose that task-relevant variance is a stricter test: If a DCNN layer corresponds to a brain region, then substituting the model’s activity with brain activity should successfully drive the model’s object recognition decision. Using this approach on three datasets (human fMRI and macaque neuron firing rates) we found that in contrast to the hierarchical view, all ventral stream regions corresponded best to later model layers. That is, all regions contain high-level information about object category. We hypothesised that this is due to recurrent connections propagating high-level visual information from later regions back to early regions, in contrast to the exclusively feed-forward connectivity of DCNNs. Using task-relevant correspondence with a late DCNN layer akin to a tracer, we used Granger causal modelling to show late-DCNN correspondence in IT drives correspondence in V4. Our analysis suggests, effectively, that no ventral stream region can be appropriately characterised as ‘early’ beyond 70ms after stimulus presentation, challenging hierarchical models. More broadly, we ask what it means for a model component and brain region to correspond: beyond quantifying shared variance, we must consider the functional role in the computation. We also demonstrate that using a DCNN to decode high-level conceptual information from ventral stream produces a general mapping from brain to model activation space, which generalises to novel classes held-out from training data. 
This suggests future possibilities for brain-machine interface with high-level conceptual information, beyond current designs that interface with the sensorimotor periphery.
NMC4 Keynote: Latent variable modeling of neural population dynamics - where do we go from here?
Large-scale recordings of neural activity are providing new opportunities to study network-level dynamics with unprecedented detail. However, the sheer volume of data and its dynamical complexity are major barriers to uncovering and interpreting these dynamics. I will present machine learning frameworks that enable inference of dynamics from neuronal population spiking activity on single trials and millisecond timescales, from diverse brain areas, and without regard to behavior. I will then demonstrate extensions that allow recovery of dynamics from two-photon calcium imaging data with surprising precision. Finally, I will discuss our efforts to facilitate comparisons within our field by curating datasets and standardizing model evaluation, including a currently active modeling challenge, the 2021 Neural Latents Benchmark [neurallatents.github.io].
When and (maybe) why do high-dimensional neural networks produce low-dimensional dynamics?
There is an avalanche of new data on activity in neural networks and the biological brain, revealing the collective dynamics of vast numbers of neurons. In principle, these collective dynamics can be of almost arbitrarily high dimension, with many independent degrees of freedom — and this may reflect powerful capacities for general computing or information. In practice, neural datasets reveal a range of outcomes, including collective dynamics of much lower dimension — and this may reflect other desiderata for neural codes. For what networks does each case occur? We begin by exploring bottom-up mechanistic ideas that link tractable statistical properties of network connectivity with the dimension of the activity that they produce. We then cover “top-down” ideas that describe how features of connectivity and dynamics that impact dimension arise as networks learn to perform fundamental computational tasks.
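One common way to put a number on the dimension of collective activity is the participation ratio of the covariance eigenvalues, (sum of eigenvalues)^2 / (sum of squared eigenvalues). The abstract does not say which dimension measure the talk uses, so the sketch below is an assumption chosen for illustration, applied to simulated activity. It computes the ratio from matrix traces, which avoids an explicit eigendecomposition.

```python
import random

def covariance_matrix(samples):
    """Covariance across neurons; samples is a list of time points, each a list of rates."""
    n_time, n_neurons = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n_time for j in range(n_neurons)]
    cov = [[0.0] * n_neurons for _ in range(n_neurons)]
    for a in range(n_neurons):
        for b in range(n_neurons):
            cov[a][b] = sum(
                (row[a] - means[a]) * (row[b] - means[b]) for row in samples
            ) / (n_time - 1)
    return cov

def participation_ratio(cov):
    """(trace C)^2 / trace(C^2); for symmetric C, trace(C^2) is the sum of squared entries."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    trace_of_square = sum(value * value for row in cov for value in row)
    return trace ** 2 / trace_of_square

random.seed(1)
# Three simulated neurons that all copy one shared latent signal: one degree of freedom.
latents = [random.gauss(0, 1) for _ in range(2000)]
shared = [[z, z, z] for z in latents]
# Three simulated neurons with independent activity: three degrees of freedom.
independent = [[random.gauss(0, 1) for _ in range(3)] for _ in range(2000)]

pr_shared = participation_ratio(covariance_matrix(shared))
pr_independent = participation_ratio(covariance_matrix(independent))
print(pr_shared, pr_independent)
```

The shared-latent population yields a ratio of about 1 and the independent population a ratio close to 3, mirroring the contrast between low-dimensional and high-dimensional collective dynamics.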
Efficient GPU training of SNNs using approximate RTRL
Last year’s SNUFA workshop report concluded “Moving toward neuron numbers comparable with biology and applying these networks to real-world data-sets will require the development of novel algorithms, software libraries, and dedicated hardware accelerators that perform well with the specifics of spiking neural networks” [1]. Taking inspiration from machine learning libraries (where techniques such as parallel batch training minimise latency and maximise GPU occupancy) as well as our previous research on efficiently simulating SNNs on GPUs for computational neuroscience [2,3], we are extending our GeNN SNN simulator to pursue this vision. To explore GeNN’s potential, we use the eProp learning rule [4], which approximates RTRL, to train SNN classifiers on the Spiking Heidelberg Digits and the Spiking Sequential MNIST datasets. We find that the performance of these classifiers is comparable to those trained using BPTT [5] and verify that the theoretical advantages of neuron models with adaptation dynamics [5] translate to improved classification performance. We then measured execution times and found that training an SNN classifier using GeNN and eProp becomes faster than SpyTorch and BPTT after less than 685 timesteps, and that much larger models can be trained on the same GPU when using GeNN. Furthermore, we demonstrate that our implementation of parallel batch training improves training performance by over 4× and enables near-perfect scaling across multiple GPUs. Finally, we show that performing inference with a recurrent SNN in GeNN uses less energy and has lower latency than a comparable LSTM simulated with TensorFlow [6].
Event-based Backpropagation for Exact Gradients in Spiking Neural Networks
Gradient-based optimization powered by the backpropagation algorithm proved to be the pivotal method in the training of non-spiking artificial neural networks. At the same time, spiking neural networks hold the promise of efficient processing of real-world sensory data by communicating using discrete events in continuous time. We derive the backpropagation algorithm for a recurrent network of spiking (leaky integrate-and-fire) neurons with hard thresholds and show that the backward dynamics amount to an event-based backpropagation of errors through time. Our derivation uses the jump conditions for partial derivatives at state discontinuities found by applying the implicit function theorem, allowing us to avoid approximations or substitutions. We find that the gradient exists and is finite almost everywhere in weight space, up to the null set where a membrane potential is precisely tangent to the threshold. Our presented algorithm, EventProp, computes the exact gradient with respect to a general loss function based on spike times and membrane potentials. Crucially, the algorithm allows for an event-based communication scheme in the backward phase, retaining the potential advantages of temporal sparsity afforded by spiking neural networks. We demonstrate the optimization of spiking networks using gradients computed via EventProp on the Yin-Yang and MNIST datasets with either a spike time-based or voltage-based loss function and report competitive performance. Our work supports the rigorous study of gradient-based optimization in spiking neural networks as well as the development of event-based neuromorphic architectures for the efficient training of spiking neural networks. While we consider the leaky integrate-and-fire model in this work, our methodology generalises to any neuron model defined as a hybrid dynamical system.
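For readers unfamiliar with the neuron model, the sketch below simulates a single leaky integrate-and-fire neuron with a hard threshold. It is a simplified, time-stepped forward simulation (forward Euler) under assumed parameter values. It is not EventProp and computes no gradients; the function name, the constant input current and all parameters are hypothetical choices for illustration.

```python
def simulate_lif(input_current, n_steps, dt=1e-3, tau=20e-3, threshold=1.0, reset=0.0):
    """Forward-Euler simulation of dV/dt = (-V + I) / tau with a hard threshold.

    Returns the list of spike times in seconds.
    """
    v = 0.0
    spike_times = []
    for step in range(n_steps):
        v += dt * (-v + input_current) / tau
        if v >= threshold:
            # State discontinuity: emit a spike and reset the membrane potential.
            spike_times.append((step + 1) * dt)
            v = reset
    return spike_times

# A constant input below threshold never produces a spike.
quiet = simulate_lif(input_current=0.5, n_steps=1000)
# A constant input above threshold produces a regular spike train.
active = simulate_lif(input_current=1.5, n_steps=1000)
print(len(quiet), len(active))
```

The reset at threshold is the kind of state discontinuity that the derivation in the abstract handles exactly through jump conditions, rather than smoothing it away.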
StereoSpike: Depth Learning with a Spiking Neural Network
Depth estimation is an important computer vision task, useful in particular for navigation in autonomous vehicles, or for object manipulation in robotics. Here we solved it using an end-to-end neuromorphic approach, combining two event-based cameras and a Spiking Neural Network (SNN) with a slightly modified U-Net-like encoder-decoder architecture, which we named StereoSpike. More specifically, we used the Multi Vehicle Stereo Event Camera Dataset (MVSEC). It provides a depth ground truth, which was used to train StereoSpike in a supervised manner, using surrogate gradient descent. We propose a novel readout paradigm to obtain a dense analog prediction (the depth of each pixel) from the spikes of the decoder. We demonstrate that this architecture generalizes very well, even better than its non-spiking counterparts, leading to state-of-the-art test accuracy. To the best of our knowledge, it is the first time that such a large-scale regression problem has been solved by a fully spiking network. Finally, we show that low firing rates (<10%) can be obtained via regularization, with a minimal cost in accuracy. This means that StereoSpike could be implemented efficiently on neuromorphic chips, opening the door for low-power, real-time embedded systems.
Rastermap: Extracting structure from high dimensional neural data
Large-scale neural recordings contain high-dimensional structure that cannot be easily captured by existing data visualization methods. We therefore developed an embedding algorithm called Rastermap, which captures highly nonlinear relationships between neurons, and provides useful visualizations by assigning each neuron to a location in the embedding space. Compared to standard algorithms such as t-SNE and UMAP, Rastermap finds finer and higher-dimensional patterns of neural variability, as measured by quantitative benchmarks. We applied Rastermap to a variety of datasets, including spontaneous neural activity, neural activity during a virtual reality task, widefield neural imaging data during a 2AFC task, artificial neural activity from an agent playing Atari games, and neural responses to visual textures. Within these datasets we found unique subpopulations of neurons encoding abstract properties of the environment.
Fundamentals of PyTorch: Building a Model Step-by-Step
In this workshop you'll learn the fundamentals of PyTorch using an incremental, from-first-principles approach. We'll start with tensors, autograd, and the dynamic computation graph, and then move on to developing and training a simple model using PyTorch's model classes, datasets, data loaders, optimizers, and more. You should be comfortable using Python, Jupyter notebooks, Google Colab, NumPy and, preferably, object-oriented programming.
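For readers who want a feel for what "autograd and the dynamic computation graph" mean before attending, the sketch below implements the idea in plain Python, without PyTorch. It is a teaching toy under stated assumptions: the `Value` class and `fit_line` function are hypothetical names and not PyTorch APIs, and it handles only scalar addition, subtraction, and multiplication.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents    # Values this one was computed from
        self._grad_fns = grad_fns  # maps upstream gradient to each parent's share

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def __sub__(self, other):
        return self + (other * -1.0)

    def backward(self):
        """Reverse-mode differentiation over the graph built during the forward pass."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)  # parents are appended before their consumers

        visit(self)
        self.grad = 1.0
        for node in reversed(order):  # consumers first, so node.grad is complete
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)


def fit_line(xs, ys, lr=0.05, epochs=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = Value(0.0), Value(0.0)
    for _ in range(epochs):
        loss = Value(0.0)
        for x, y in zip(xs, ys):
            err = w * x + b - y       # the graph is rebuilt on every pass
            loss = loss + err * err
        loss = loss * (1.0 / len(xs))
        w.grad = b.grad = 0.0         # clear gradients from the previous step
        loss.backward()
        w.data -= lr * w.grad
        b.data -= lr * b.grad
    return w.data, b.data


# Hypothetical usage: recover the slope and intercept of y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The loop has the same shape as a typical training loop: forward pass, clear gradients, backward pass, parameter update. In PyTorch those steps are handled by tensors, a loss function, and an optimizer rather than by hand.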
Autopilot v0.4.0 - Distributing development of a distributed experimental framework
Autopilot is a Python framework for performing complex behavioral neuroscience experiments by coordinating a swarm of Raspberry Pis. It was designed to not only give researchers a tool that allows them to perform the hardware-intensive experiments necessary for the next generation of naturalistic neuroscientific observation, but also to make it easier for scientists to be good stewards of the human knowledge project. Specifically, we designed Autopilot as a framework that lets its users contribute their technical expertise to a cumulative library of hardware interfaces and experimental designs, and produce data that is clean at the time of acquisition to lower barriers to open scientific practices. As Autopilot matures, we have been progressively making these aspirations a reality. Currently we are preparing the release of Autopilot v0.4.0, which will include a new plugin system and a wiki that uses semantic web technology to build a technical and contextual knowledge repository. By combining human-readable text and semantic annotations in a wiki that makes contribution as easy as possible, we intend to make a communal knowledge system that gives a mechanism for sharing the contextual technical knowledge that is always excluded from methods sections, but is nonetheless necessary to perform cutting-edge experiments. By integrating it with Autopilot, we hope to make a first-of-its-kind system that allows researchers to fluidly blend technical knowledge and open source hardware designs with the software necessary to use them. Reciprocally, we also hope that this system will support a kind of deep provenance that makes abstract "custom apparatus" statements in methods sections obsolete, allowing the scientific community to losslessly and effortlessly trace a dataset back to the code and hardware designs needed to replicate it.
I will describe the basic architecture of Autopilot, recent work on its community contribution ecosystem, and the vision for the future of its development.
Learning the structure and investigating the geometry of complex networks
Networks are widely used as mathematical models of complex systems across many scientific disciplines, and in particular within neuroscience. In this talk, we introduce two aspects of our collaborative research: (1) machine learning and networks, and (2) graph dimensionality. Machine learning and networks. Decades of work have produced a vast corpus of research characterising the topological, combinatorial, statistical and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. We have developed hcga, a framework for highly comparative analysis of graph data sets that computes several thousand graph features from any given network. Taking inspiration from hctsa, hcga offers a suite of statistical learning and data analysis tools for automated identification and selection of important and interpretable features underpinning the characterisation of graph data sets. We show that hcga outperforms other methodologies (including deep learning) on supervised classification tasks on benchmark data sets whilst retaining the interpretability of network features, which we exemplify on a dataset of neuronal morphology images. Graph dimensionality. Dimension is a fundamental property of objects and the space in which they are embedded. Yet ideal notions of dimension, as in Euclidean spaces, do not always translate to physical spaces, which can be constrained by boundaries and distorted by inhomogeneities, or to intrinsically discrete systems such as networks. Deviating from approaches based on fractals, we present a new framework to define intrinsic notions of dimension on networks: the relative, local and global dimensions. We showcase our method on various physical systems.
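The idea that each graph property can serve as a feature can be illustrated with a very small sketch. The code below does not use hcga. It computes a handful of textbook features for an undirected graph given as an adjacency mapping, and the function name `graph_features` and the choice of features are assumptions made for illustration only.

```python
from itertools import combinations


def graph_features(adjacency):
    """Return a small feature dictionary for an undirected simple graph.

    `adjacency` maps each node to the set of its neighbours.
    """
    n = len(adjacency)
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    n_edges = sum(degrees) // 2  # every edge is counted once from each end
    density = 2 * n_edges / (n * (n - 1)) if n > 1 else 0.0

    # Local clustering: fraction of a node's neighbour pairs that are linked.
    clustering = []
    for neighbours in adjacency.values():
        pairs = list(combinations(neighbours, 2))
        if not pairs:
            clustering.append(0.0)
            continue
        closed = sum(1 for u, v in pairs if v in adjacency[u])
        clustering.append(closed / len(pairs))

    return {
        "n_nodes": n,
        "n_edges": n_edges,
        "density": density,
        "mean_degree": sum(degrees) / n if n else 0.0,
        "mean_clustering": sum(clustering) / n if n else 0.0,
    }


# Hypothetical usage: a triangle (a, b, c) with one pendant node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
features = graph_features(toy_graph)
```

Stacking such dictionaries for many graphs gives a feature table that an ordinary classifier can consume, and each column keeps a direct structural interpretation.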
Exploring perceptual similarity and its relation to image-based spaces: an effect of familiarity
One challenge in exploring the internal representation of faces is the lack of controlled stimulus transformations. Researchers are often limited to verbalizable transformations in the creation of a dataset. An alternative approach to verbalization for interpretability is finding image-based measures that allow us to quantify image transformations. In this study, we explore whether PCA could be used to create controlled transformations to a face by testing the effect of these transformations on human perceptual similarity and on computational differences in Gabor, Pixel and DNN spaces. We found that perceptual similarity and the three image-based spaces are linearly related, almost perfectly in the case of the DNN, with a correlation of 0.94. This provides a controlled way to alter the appearance of a face. In Experiment 2, the effect of familiarity on the perception of multidimensional transformations was explored. Our findings show that there is a positive relationship between the number of components transformed and both the perceptual similarity and the same three image-based spaces used in Experiment 1. Furthermore, we found that familiar faces are rated more similar overall than unfamiliar faces. That is, a change to a familiar face is perceived as making less difference than the exact same change to an unfamiliar face. The ability to quantify, and thus control, these transformations is a powerful tool in exploring the factors that mediate a change in perceived identity.
Characterising the brain representations behind variations in real-world visual behaviour
Not all individuals are equally competent at recognizing the faces they interact with. Revealing how the brains of different individuals support variations in this ability is a crucial step toward developing an understanding of real-world human visual behaviour. In this talk, I will present findings from a large high-density EEG dataset (>100k trials of participants processing various stimulus categories) and computational approaches which aimed to characterise the brain representations behind the real-world proficiency of “super-recognizers”, individuals at the top of the face recognition ability spectrum. Using decoding analysis of time-resolved EEG patterns, we predicted with high precision the trial-by-trial activity of super-recognizer participants, and showed that evidence for face recognition ability variations is disseminated along early, intermediate and late brain processing steps. Computational modeling of the underlying brain activity uncovered two representational signatures supporting higher face recognition ability: (i) mid-level visual and (ii) semantic computations. Both components were dissociable in brain processing time (the first around the N170, the second around the P600) and in level of computation (the first emerging from mid-level layers of visual Convolutional Neural Networks, the second from a semantic model characterising sentence descriptions of images). I will conclude by presenting ongoing analyses from a well-known case of acquired prosopagnosia (PS) using similar computational modeling of high-density EEG activity.
Zero-shot visual reasoning with probabilistic analogical mapping
There has been a recent surge of interest in the question of whether and how deep learning algorithms might be capable of abstract reasoning, much of which has centered around datasets based on Raven’s Progressive Matrices (RPM), a visual analogy problem set commonly employed to assess fluid intelligence. This has led to the development of algorithms that are capable of solving RPM-like problems directly from pixel-level inputs. However, these algorithms require extensive direct training on analogy problems, and typically generalize poorly to novel problem types. This is in stark contrast to human reasoners, who are capable of solving RPM and other analogy problems zero-shot — that is, with no direct training on those problems. Indeed, it’s this capacity for zero-shot reasoning about novel problem types, i.e. fluid intelligence, that RPM was originally designed to measure. I will present some results from our recent efforts to model this capacity for zero-shot reasoning, based on an extension of a recently proposed approach to analogical mapping we refer to as Probabilistic Analogical Mapping (PAM). Our RPM model uses deep learning to extract attributed graph representations from pixel-level inputs, and then performs alignment of objects between source and target analogs using gradient descent to optimize a graph-matching objective. This extended version of PAM features a number of new capabilities that underscore the flexibility of the overall approach, including 1) the capacity to discover solutions that emphasize either object similarity or relation similarity, based on the demands of a given problem, 2) the ability to extract a schema representing the overall abstract pattern that characterizes a problem, and 3) the ability to directly infer the answer to a problem, rather than relying on a set of possible answer choices. This work suggests that PAM is a promising framework for modeling human zero-shot reasoning.
Digitization as a driving force for collaboration in neuroscience
Many of the collaborations we encounter in our scientific careers are centered on a common idea that can be associated with certain resources, such as a dataset, an algorithm, or a model. All partners in a collaboration need to develop a common understanding of these resources, and need to be able to access them in a simple and unambiguous manner in order to avoid incorrect conclusions, especially in highly cross-disciplinary contexts. While digital computers have assisted scientific workflows in experiment and simulation for many decades, the high degree of heterogeneity in the field has led to a scattered landscape of highly customized, lab-internal solutions for organizing and managing resources on a project-by-project basis. Only with the availability of modern technologies such as the semantic web, platforms for collaborative coding, or data standards that span different disciplines do we have tools at our disposal to make resources increasingly accessible, understandable, and usable. However, without overarching standardization efforts and adaptation of such technologies to the workflows and needs of individual researchers, their adoption by the neuroscience community will be impeded. From the perspective of computational neuroscience, which is inherently dependent on leveraging data and methods across the field of neuroscience for inspiration and validation, I will outline my view on past and present developments towards a more rigorous use of digital resources and how they improved collaboration, and introduce emerging initiatives to support this process in the future (e.g., EBRAINS http://ebrains.eu, NFDI-Neuro http://www.nfdi-neuro.de).
Understanding neural dynamics in high dimensions across multiple timescales: from perception to motor control and learning
Remarkable advances in experimental neuroscience now enable us to simultaneously observe the activity of many neurons, thereby providing an opportunity to understand how the moment-by-moment collective dynamics of the brain instantiates learning and cognition. However, efficiently extracting such a conceptual understanding from large, high-dimensional neural datasets requires concomitant advances in theoretically driven experimental design, data analysis, and neural circuit modeling. We will discuss how the modern frameworks of high-dimensional statistics and deep learning can aid us in this process. In particular we will discuss: (1) how unsupervised tensor component analysis and time warping can extract unbiased and interpretable descriptions of how rapid single-trial circuit dynamics change slowly over many trials to mediate learning; (2) how to trade off very different experimental resources, like numbers of recorded neurons and trials, to accurately discover the structure of collective dynamics and information in the brain, even without spike sorting; (3) deep learning models that accurately capture the retina’s response to natural scenes as well as its internal structure and function; (4) algorithmic approaches for simplifying deep network models of perception; (5) optimality approaches to explain cell-type diversity in the first steps of vision in the retina.
SpikeInterface
Much development has been directed toward improving the performance and automation of spike sorting. This continuous development, while essential, has contributed to an over-saturation of new, incompatible tools that hinders rigorous benchmarking and complicates reproducible analysis. To address these limitations, we developed SpikeInterface, a Python framework designed to unify preexisting spike sorting technologies into a single codebase and to facilitate straightforward comparison and adoption of different approaches. With a few lines of code, researchers can reproducibly run, compare, and benchmark most modern spike sorting algorithms; pre-process, post-process, and visualize extracellular datasets; validate, curate, and export sorting outputs; and more. In this presentation, I will provide an overview of SpikeInterface and, with applications to real and simulated datasets, demonstrate how it can be utilized to reduce the burden of manual curation and to more comprehensively benchmark automated spike sorters.
Computational psychophysics at the intersection of theory, data and models
Behavioural measurements are often overlooked by computational neuroscientists, who prefer to focus on electrophysiological recordings or neuroimaging data. This attitude is largely due to a perceived lack of depth and richness in behavioural datasets. I will show how contemporary psychophysics can deliver extremely rich and highly constraining datasets that naturally interface with computational modelling. More specifically, I will demonstrate how psychophysics can be used to guide, constrain, and refine computational models, and how models can be exploited to design, motivate, and interpret psychophysical experiments. Examples will span a wide range of topics (from feature detection to natural scene understanding) and methodologies (from cascade models to deep learning architectures).
An open-source experimental framework for automation of cell biology experiments
Modern biological methods often require a large number of experiments to be conducted. For example, dissecting molecular pathways involved in a variety of biological processes in neurons and non-excitable cells requires high-throughput compound library or RNAi screens. Another example is modern data analysis methods such as deep learning, which require large datasets and have been successfully applied to a number of biological and medical questions. In this talk we will describe an open-source platform allowing such experiments to be automated. The platform consists of an XY stage, a perfusion system and an epifluorescent microscope with autofocusing. It is extremely easy to build and can be used for different experimental paradigms, ranging from immunolabeling and routine characterisation of large numbers of cell lines to high-throughput imaging of fluorescent reporters.
A discussion on the necessity for Open Source Hardware in neuroscience research
Research tools are paramount for scientific development: they enable researchers to observe and manipulate natural phenomena, learn their principles, make predictions, develop new technologies and treatments, and improve living standards. Due to their cost and the geographical distribution of manufacturing companies, access to them is not widely available, which hinders the pace of research and the ability of many communities to contribute to science and education and reap their benefits. One possible solution for this issue is to create research tools under the open source ethos, where all documentation about them (including their designs and their building and operating instructions) is made freely available. Dubbed Open Science Hardware (OSH), this production method follows the established and successful principles of open source software and brings many advantages over traditional creation methods, such as economic savings (see Pearce 2020 for potential economic savings in developing open source research tools), distributed manufacturing, repairability, and higher customizability. This development method has been greatly facilitated by recent technological developments in fast prototyping tools, Internet infrastructure, documentation platforms and lower costs of electronic off-the-shelf components. Taken together, these benefits have the potential to make research more inclusive, equitable, distributed and, most importantly, more reliable and reproducible, as: (1) researchers can know their tools' inner workings in minute detail; (2) they can calibrate their tools before every experiment and have them running in optimal condition every time; (3) given their lower price point, (a) students can be trained and taught in hands-on classes and (b) several copies of the same instrument can be built, leading to a parallelization of data collection and the creation of more robust datasets; and (4) labs across the world can share the exact same type of instruments and create collaborative projects with standardized data collection and sharing.
Inferring brain-wide interactions using data-constrained recurrent neural network models
Behavior arises from the coordinated activity of numerous distinct brain regions. Modern experimental tools allow access to neural populations brain-wide, yet understanding such large-scale datasets necessitates scalable computational models to extract meaningful features of inter-region communication. In this talk, I will introduce Current-Based Decomposition (CURBD), an approach for inferring multi-region interactions using data-constrained recurrent neural network models. I will first show that CURBD accurately isolates inter-region currents in simulated networks with known dynamics. I will then apply CURBD to understand the brain-wide flow of information leading to behavioral state transitions in larval zebrafish. These examples will establish CURBD as a flexible, scalable framework to infer brain-wide interactions that are inaccessible from experimental measurements alone.
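The core bookkeeping behind a current-based decomposition can be sketched in a few lines. The example assumes that a recurrent network has already been fitted to the recordings, so that a connectivity matrix and the activity of the model units are available. It then splits the recurrent input to each region according to the region the input came from. The function name, argument layout, and toy numbers are illustrative assumptions, not the authors' implementation, and the model-fitting step is omitted entirely.

```python
def decompose_currents(J, rates, regions):
    """Split the recurrent input of each target region by source region.

    J[i][j]  : weight from unit j onto unit i in the fitted network
    rates[j] : activity of unit j at a single time point
    regions  : dict mapping a region name to the list of its unit indices
    Returns {(target, source): [current into each target unit from that source]}.
    """
    currents = {}
    for target, target_units in regions.items():
        for source, source_units in regions.items():
            # Only the block of J that connects `source` units to `target` units is used.
            currents[(target, source)] = [
                sum(J[i][j] * rates[j] for j in source_units)
                for i in target_units
            ]
    return currents


# Hypothetical toy network: region A holds units 0 and 1, region B holds unit 2.
J = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, -1.0],
    [0.5, 0.5, 0.0],
]
rates = [1.0, 2.0, 3.0]
regions = {"A": [0, 1], "B": [2]}
currents = decompose_currents(J, rates, regions)
```

Summing the entries for one target over all sources recovers that unit's total recurrent input, so the decomposition only re-attributes existing input and adds nothing new. Repeating the calculation at every time point yields one time series of currents per ordered pair of regions.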
Cortical and subcortical grey matter micro-structure is associated with polygenic risk for schizophrenia
Background: Recent discovery of hundreds of common gene variants associated with schizophrenia has enabled polygenic risk scores (PRS) to be measured in the population. It is hypothesized that normal variation in genetic risk of schizophrenia should be associated with MRI changes in brain morphometry and tissue composition. Methods: We used the largest extant genome-wide association dataset (N = 69,369 cases and N = 236,642 healthy controls) to measure PRS for schizophrenia in a large sample of adults from the UK Biobank (Nmax = 29,878) who had multiple micro- and macro-structural MRI metrics measured at each of 180 cortical areas and seven subcortical structures. Linear mixed effect models were used to investigate associations between schizophrenia PRS and brain structure at global and regional scales, controlled for multiple comparisons. Results: Micro-structural phenotypes were more robustly associated with schizophrenia PRS than macro-structural phenotypes. Polygenic risk was significantly associated with reduced neurite density index (NDI) at global brain scale, at 149 cortical regions, and five subcortical structures. Other micro-structural parameters, e.g., fractional anisotropy, that were correlated with NDI were also significantly associated with schizophrenia PRS. Genetic effects on multiple MRI phenotypes were co-located in temporal, cingulate and prefrontal cortical areas, insula, and hippocampus. (Preprint: https://www.medrxiv.org/content/10.1101/2021.02.06.21251073v1)
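In its simplest form, a polygenic risk score is a weighted sum of an individual's risk-allele counts, with the weights taken from genome-wide association effect sizes. The sketch below shows only that arithmetic. The variant identifiers and effect sizes are made up, the function name is hypothetical, and steps that a real analysis would need (such as variant selection, linkage-disequilibrium handling, and ancestry adjustment) are left out. It is not the pipeline used in the study.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per variant).

    dosages      : dict mapping variant id -> allele count for one individual
    effect_sizes : dict mapping variant id -> per-allele effect (e.g. log odds ratio)
    Variants missing from either mapping are skipped.
    """
    shared = dosages.keys() & effect_sizes.keys()
    return sum(effect_sizes[variant] * dosages[variant] for variant in shared)


# Hypothetical inputs: identifiers and weights are invented for illustration.
effect_sizes = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
individual = {"variant_1": 2, "variant_2": 1, "variant_4": 2}
score = polygenic_risk_score(individual, effect_sizes)
```

Scores computed this way for every participant can then enter a regression against each MRI metric, which is the role the linear mixed effect models play in the abstract.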
NeuroTask: A Benchmark Dataset for Multi-Task Neural Analysis
Bernstein Conference 2024
Unified C. elegans Neural Activity and Connectivity Datasets for Building Foundation Models of a Small Nervous System
Bernstein Conference 2024
A high-throughput pipeline for evaluating recurrent neural networks on multiple datasets
COSYNE 2022
Fast inter-subject alignment method for large datasets shows fine-grained cortical reorganisations
COSYNE 2022
An accessible hippocampal dataset for benchmarking models of cognitive mapping
COSYNE 2023
A Large Dataset of Macaque V1 Responses to Natural Images Revealed Complexity in V1 Neural Codes
COSYNE 2023
Responses to inconsistent stimuli in pyramidal neurons: An open science dataset
COSYNE 2023
A labeled clinical-MRI dataset of Nigerian brains
FENS Forum 2024
Neuronal travelling waves explain rotational dynamics in experimental datasets and modelling
FENS Forum 2024
Re-analysing the Allen Gene Expression ISH dataset with deep learning
FENS Forum 2024
Supervised spike inference from calcium imaging data: New datasets, new analyses
FENS Forum 2024
ModuleXplore: A user-friendly Shiny application to compare gene co-expression modules within and across transcriptomic datasets
Neuromatch 5
Optimization techniques for machine learning based classification involving large-scale neuroscience datasets
Neuromatch 5