Dataset

Topic spotlight

Topic · World Wide

dataset

Discover seminars, jobs, and research tagged with dataset across World Wide.

75 curated items · 60 Seminars · 14 ePosters · 1 Position

Updated 2 days ago

Position

Arcadia Science

Berkeley, CA, US

Dec 5, 2025

Job Description: Data scientist specializing in the analysis of large, high-dimensional datasets and associated methods. Apply techniques from statistics, machine learning, and computational biology to a variety of data types and datasets from across Arcadia's research organisms. Datasets might include genomics, multi-omics, imaging, time course, mass spectrometry, and neural recordings. Coordinate with experimentalists from experimental design all the way to publication. Work with our publishing team to build interactive, sharable resources for the scientific community. Our ideal candidate would have a history of contributions in the analysis of complex datasets, a curiosity to work on a variety of problems and data types, and a passion for open science. They would be able to share their expertise both within and outside of Arcadia, and they would be able to translate difficult concepts into runnable, sharable analyses.

The Arcadia Story: We are a research and development company leveraging the biology of emerging research organisms. We were founded by Seemay Chou and Prachee Avasthi, scientists convinced there is a better way to explore the full potential of science: how discoveries can be both meaningful and profitable. We are building a team of in-house scientists to carry out active research programs and convene a broader scientific community with a visiting scholars and internship program. Visit our website at www.arcadiascience.com to learn more about our work and to read Seemay’s founding story.

Seminar · Neuroscience

OpenNeuro FitLins GLM: An Accessible, Semi-Automated Pipeline for OpenNeuro Task fMRI Analysis

Michael Demidenko

Stanford University

Jul 31, 2025

In this talk, I will discuss the OpenNeuro FitLins GLM package and provide an illustration of the analytic workflow. OpenNeuro FitLins GLM is a semi-automated pipeline that reduces barriers to analyzing task-based fMRI data from OpenNeuro's 600+ task datasets. Created for psychology, psychiatry and cognitive neuroscience researchers without extensive computational expertise, this tool automates what is largely a manual process, built from a compilation of in-house scripts for data retrieval, validation, quality control, statistical modeling and reporting, that in some cases may require weeks of effort. The workflow abides by open-science practices, enhances reproducibility and incorporates community feedback for model improvement. The pipeline integrates BIDS-compliant datasets and fMRIPrep preprocessed derivatives, and dynamically creates BIDS Statistical Model specifications (with FitLins) to perform common mass univariate general linear model (GLM) analyses. To enhance and standardize reporting, it generates comprehensive reports that include design matrices, statistical maps and COBIDAS-aligned reporting that is fully reproducible from the model specifications and derivatives. OpenNeuro FitLins GLM has been tested on over 30 datasets spanning 50+ unique fMRI tasks (e.g., working memory, social processing, emotion regulation, decision-making, motor paradigms), reducing analysis times from weeks to hours when using high-performance computers, thereby enabling researchers to conduct robust single-study, meta- and mega-analyses of task fMRI data with significantly improved accessibility, standardized reporting and reproducibility.
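For readers unfamiliar with the term, a mass univariate GLM fits the same design matrix independently to every voxel's time series. The sketch below illustrates that idea only, using NumPy on simulated data. It is not the FitLins or OpenNeuro FitLins GLM API. The function names, the boxcar task regressor and the simulated voxels are all assumptions made for illustration, and steps a real fMRI analysis would need (HRF convolution, confound regressors, noise modeling) are left out.

```python
import numpy as np


def fit_mass_univariate_glm(design, data):
    """Fit one design matrix to every voxel by ordinary least squares.

    design: (n_timepoints, n_regressors) array
    data:   (n_timepoints, n_voxels) array
    Returns betas with shape (n_regressors, n_voxels).
    """
    betas, *_ = np.linalg.lstsq(design, data, rcond=None)
    return betas


def contrast_map(betas, contrast):
    """Combine regressor estimates into one value per voxel."""
    return np.asarray(contrast) @ betas


# Simulated data (hypothetical): 120 time points, 5 voxels.
rng = np.random.default_rng(0)
n_timepoints, n_voxels = 120, 5

# Boxcar task regressor (10 on, 10 off) plus an intercept column.
task = np.tile([1.0] * 10 + [0.0] * 10, 6)
design = np.column_stack([task, np.ones(n_timepoints)])

# Ground-truth effects used to generate the simulated signals.
true_betas = np.array([[2.0, 0.0, 1.0, -1.0, 0.5],
                       [10.0, 10.0, 10.0, 10.0, 10.0]])
noise = 0.1 * rng.standard_normal((n_timepoints, n_voxels))
data = design @ true_betas + noise

betas = fit_mass_univariate_glm(design, data)
task_effect = contrast_map(betas, [1.0, 0.0])  # task vs. baseline, per voxel
```

Here `task_effect` holds one estimate per simulated voxel. In a real pipeline the analogous per-voxel values would form a statistical map.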

Seminar · Neuroscience

Understanding reward-guided learning using large-scale datasets

Kim Stachenfeld

DeepMind, Columbia U

Jul 8, 2025

Understanding the neural mechanisms of reward-guided learning is a long-standing goal of computational neuroscience. Recent methodological innovations enable us to collect ever larger neural and behavioral datasets. This presents opportunities to achieve greater understanding of learning in the brain at scale, as well as methodological challenges. In the first part of the talk, I will discuss our recent insights into the mechanisms by which zebra finch songbirds learn to sing. Dopamine has long been thought to guide reward-based trial-and-error learning by encoding reward prediction errors. However, it is unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Longitudinal recordings of dopamine and bird songs reveal that dopamine activity is indeed consistent with encoding a reward prediction error during naturalistic learning. In the second part of the talk, I will talk about recent work we are doing at DeepMind to develop tools for automatically discovering interpretable models of behavior directly from animal choice data. Our method, dubbed CogFunSearch, uses LLMs within an evolutionary search process in order to "discover" novel models in the form of Python programs that excel at accurately predicting animal behavior during reward-guided learning. The discovered programs reveal novel patterns of learning and choice behavior that update our understanding of how the brain solves reinforcement learning problems.
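As background on the term, a reward prediction error is the difference between the reward received and the reward expected. The sketch below shows the textbook delta-rule (Rescorla-Wagner style) update built on that quantity. It is not the analysis from the talk and not a program discovered by CogFunSearch. The function name, the learning rate and the constant reward schedule are assumptions chosen for illustration.

```python
def update_value(value, reward, learning_rate=0.1):
    """Apply one delta-rule update.

    Returns the updated value estimate and the prediction error
    (reward received minus reward expected) that drove the update.
    """
    prediction_error = reward - value
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error


# Hypothetical schedule: the same reward of 1.0 on each of 50 trials.
value = 0.0
errors = []
for _ in range(50):
    value, delta = update_value(value, reward=1.0)
    errors.append(delta)
```

With a constant reward, the prediction error starts large and shrinks as the value estimate approaches the reward, which is the signature pattern a reward prediction error signal is expected to show over learning.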

Seminar · Psychology

FLUXSynID: High-Resolution Synthetic Face Generation for Document and Live Capture Images

Raul Ismayilov

University of Twente

Jul 1, 2025

Synthetic face datasets are increasingly used to overcome the limitations of real-world biometric data, including privacy concerns, demographic imbalance, and high collection costs. However, many existing methods lack fine-grained control over identity attributes and fail to produce paired, identity-consistent images under structured capture conditions. In this talk, I will present FLUXSynID, a framework for generating high-resolution synthetic face datasets with user-defined identity attribute distributions and paired document-style and trusted live capture images. The dataset generated using FLUXSynID shows improved alignment with real-world identity distributions and greater diversity compared to prior work. I will also discuss how FLUXSynID’s dataset and generation tools can support research in face recognition and morphing attack detection (MAD), enhancing model robustness in both academic and practical applications.

Seminar · Neuroscience

Expanding mechanisms and therapeutic targets for neurodegenerative disease

Aaron D. Gitler

Department of Genetics, Stanford University

Jun 4, 2025

A hallmark pathological feature of the neurodegenerative diseases amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) is the depletion of RNA-binding protein TDP-43 from the nucleus of neurons in the brain and spinal cord. A major function of TDP-43 is as a repressor of cryptic exon inclusion during RNA splicing. By re-analyzing RNA-sequencing datasets from human FTD/ALS brains, we discovered dozens of novel cryptic splicing events in important neuronal genes. Single nucleotide polymorphisms in UNC13A are among the strongest hits associated with FTD and ALS in human genome-wide association studies, but how those variants increase risk for disease is unknown. We discovered that TDP-43 represses a cryptic exon-splicing event in UNC13A. Loss of TDP-43 from the nucleus in human brain, neuronal cell lines and motor neurons derived from induced pluripotent stem cells resulted in the inclusion of a cryptic exon in UNC13A mRNA and reduced UNC13A protein expression. The top variants associated with FTD or ALS risk in humans are located in the intron harboring the cryptic exon, and we show that they increase UNC13A cryptic exon splicing in the face of TDP-43 dysfunction. Together, our data provide a direct functional link between one of the strongest genetic risk factors for FTD and ALS (UNC13A genetic variants), and loss of TDP-43 function. Recent analyses have revealed even further changes in TDP-43 target genes, including widespread changes in alternative polyadenylation, impacting expression of disease-relevant genes (e.g., ELP1, NEFL, and TMEM106B) and providing evidence that alternative polyadenylation is a new facet of TDP-43 pathology.

Seminar · Neuroscience

Understanding reward-guided learning using large-scale datasets

Harnessing Big Data in Neuroscience: From Mapping Brain Connectivity to Predicting Traumatic Brain Injury

Franco Pestilli

University of Texas, Austin, USA

May 12, 2025

Neuroscience is experiencing unprecedented growth in dataset size both within individual brains and across populations. Large-scale, multimodal datasets are transforming our understanding of brain structure and function, creating opportunities to address previously unexplored questions. However, managing this increasing data volume requires new approaches to training and technology. Modern data technologies are reshaping neuroscience by enabling researchers to tackle complex questions within a Ph.D. or postdoctoral timeframe. I will discuss cloud-based platforms, such as brainlife.io, that provide scalable, reproducible, and accessible computational infrastructure. Modern data technology can democratize neuroscience, accelerate discovery and foster scientific transparency and collaboration. Concrete examples will illustrate how these technologies can be applied to mapping brain connectivity, studying human learning and development, and developing predictive models for traumatic brain injury (TBI). By integrating cloud computing and scalable data-sharing frameworks, neuroscience can become more impactful, inclusive, and data-driven.

Brain Emulation Challenge Workshop

Brain Emulation Challenge Workshop

Brain Emulation Challenge Workshop

Brain Emulation Challenge Workshop

Brain Emulation Challenge Workshop

Learning and Memory

A Comprehensive Overview of Large Language Models

Trends in NeuroAI - Meta's MEG-to-image reconstruction

Trends in NeuroAI - SwiFT: Swin 4D fMRI Transformer

Mathematical and computational modelling of ocular hemodynamics: from theory to applications

Enhancing Qualitative Coding with Large Language Models: Potential and Challenges

Spatial and Single Cell Genomics for Next Generation Neuroscience

NII Methods (journal club): NeuroQuery, comprehensive meta-analysis of human brain mapping

Estimating repetitive spatiotemporal patterns from resting-state brain activity data

Programmed axon death: from animal models into human disease

Sampling the environment with body-brain rhythms

Lifelong Learning AI via neuro inspired solutions

Multi-level theory of neural representations in the era of large-scale neural recordings: Task-efficiency, representation geometry, and single neuron properties

Linking GWAS to pharmacological treatments for psychiatric disorders

Do we measure what we think we are measuring?

Pynapple: a light-weight python package for neural data analysis - webinar + tutorial

Pynapple: a light-weight python package for neural data analysis - webinar + tutorial

Malignant synaptic plasticity in pediatric high-grade gliomas

Mesmerize: A blueprint for shareable and reproducible analysis of calcium imaging data

Network science and network medicine: New strategies for understanding and treating the biological basis of mental ill-health

Brain chart for the human lifespan

Towards a More Authentic Vision of the (multi)Coding Potential of RNA

CaImAn: large-scale batch and online analysis of calcium imaging data

NMC4 Short Talk: What can 140,000 Reaches Tell Us About Demographic Contributions to Visuomotor Adaptation?

NMC4 Short Talk: Novel population of synchronously active pyramidal cells in hippocampal area CA1

NMC4 Short Talk: Rank similarity filters for computationally-efficient machine learning on high dimensional data

NMC4 Short Talk: Hypothesis-neutral response-optimized models of higher-order visual cortex reveal strong semantic selectivity

NMC4 Short Talk: Image embeddings informed by natural language improve predictions and understanding of human higher-level visual cortex

NMC4 Short Talk: Directly interfacing brain and deep networks exposes non-hierarchical visual processing

NMC4 Keynote: Latent variable modeling of neural population dynamics - where do we go from here?

When and (maybe) why do high-dimensional neural networks produce low-dimensional dynamics?

Efficient GPU training of SNNs using approximate RTRL

Event-based Backpropagation for Exact Gradients in Spiking Neural Networks

StereoSpike: Depth Learning with a Spiking Neural Network

Rastermap: Extracting structure from high dimensional neural data

Fundamentals of PyTorch: Building a Model Step-by-Step

Autopilot v0.4.0 - Distributing development of a distributed experimental framework

Learning the structure and investigating the geometry of complex networks

Exploring perceptual similarity and its relation to image-based spaces: an effect of familiarity

Characterising the brain representations behind variations in real-world visual behaviour

Zero-shot visual reasoning with probabilistic analogical mapping

Digitization as a driving force for collaboration in neuroscience

Understanding neural dynamics in high dimensions across multiple timescales: from perception to motor control and learning

SpikeInterface

Computational psychophysics at the intersection of theory, data and models

An open-source experimental framework for automation of cell biology experiments

A discussion on the necessity for Open Source Hardware in neuroscience research

Inferring brain-wide interactions using data-constrained recurrent neural network models

Cortical and subcortical grey matter micro-structure is associated with polygenic risk for schizophrenia

NeuroTask: A Benchmark Dataset for Multi-Task Neural Analysis

Unified C. elegans Neural Activity and Connectivity Datasets for Building Foundation Models of a Small Nervous System

A high-throughput pipeline for evaluating recurrent neural networks on multiple datasets

Fast inter-subject alignment method for large datasets shows fine-grained cortical reorganisations

A high-throughput pipeline for evaluating recurrent neural networks on multiple datasets

An accessible hippocampal dataset for benchmarking models of cognitive mapping

A Large Dataset of Macaque V1 Responses to Natural Images Revealed Complexity in V1 Neural Codes

Responses to inconsistent stimuli in pyramidal neurons: An open science dataset

A labeled clinical-MRI dataset of Nigerian brains

Neuronal travelling waves explain rotational dynamics in experimental datasets and modelling

Re-analysing the Allen Gene Expression ISH dataset with deep learning

Supervised spike inference from calcium imaging data: New datasets, new analyses

ModuleXplore: A user-friendly Shiny application to compare gene co-expression modules within and across transcriptomic datasets

Optimization techniques for machine learning based classification involving large-scale neuroscience datasets