ePoster

STUDYING THE LINK BETWEEN HUMAN PREFERENCE ALIGNMENT AND NEURAL ALIGNMENT IN LLMS

Mireia Masias and 3 co-authors

Telefonica Innovación Digital

FENS Forum 2026 (2026)
Barcelona, Spain
Board PS02-07PM-571

Presentation

Date TBA

Abstract

Large language models (LLMs) are increasingly post-trained to better follow human preferences, yet it remains unclear whether improved preference alignment is accompanied by stronger alignment to human neural representations. We systematically compare pretrained models, instruction-following variants, and reward models across LLMs. Using a shared set of language stimuli with matched neural recordings, we extract layer-wise representations and estimate neural alignment via representational similarity and encoding-based correlations, while accounting for measurement noise. We then relate neural alignment to performance on human-alignment benchmarks. Finally, we fine-tune a subset of models with a neural-alignment objective and test whether this brain tuning transfers to human-alignment and task performance measures. Preliminary results indicate that post-training shifts peak neural alignment toward middle-to-late layers and increases correspondence with higher-level semantic signals, while reward models show the strongest late-layer alignment. Across models, higher benchmark scores are positively associated with neural alignment in these layers. Brain-tuned models show small but consistent gains on preference benchmarks without degrading task performance, suggesting that neural alignment may provide a complementary training signal for building more human-aligned systems.
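
As a rough illustration of the layer-wise representational similarity step mentioned in the abstract, the sketch below scores each layer by the rank correlation between its representational dissimilarity matrix (RDM) and a neural RDM. It is not the authors' pipeline: the function names, the correlation-distance RDM, the Spearman-style comparison, and the synthetic data are all assumptions chosen for illustration, and the noise correction and encoding-model analyses are omitted.

```python
import numpy as np


def rdm(features):
    """Dissimilarity matrix: 1 - Pearson r between stimulus rows.

    `features` has shape (n_stimuli, n_dimensions).
    """
    return 1.0 - np.corrcoef(features)


def upper_triangle(matrix):
    """Return the off-diagonal upper triangle of a square matrix as a vector."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def ordinal_ranks(values):
    """Ordinal ranks (no tie averaging; adequate for continuous values)."""
    ranks = np.empty(len(values), dtype=float)
    ranks[np.argsort(values)] = np.arange(len(values))
    return ranks


def rsa_score(model_features, neural_features):
    """Rank correlation between the model RDM and the neural RDM."""
    model_ranks = ordinal_ranks(upper_triangle(rdm(model_features)))
    neural_ranks = ordinal_ranks(upper_triangle(rdm(neural_features)))
    return float(np.corrcoef(model_ranks, neural_ranks)[0, 1])


def layerwise_rsa(layer_features, neural_features):
    """One RSA score per layer; each layer is (n_stimuli, n_units)."""
    return [rsa_score(layer, neural_features) for layer in layer_features]


# Synthetic stand-in data (hypothetical, not from the poster):
# the "neural" recordings and layer 1 share a latent structure, layer 0 does not.
rng = np.random.default_rng(0)
n_stimuli, n_latent, n_channels, n_units = 40, 8, 30, 64
latent = rng.normal(size=(n_stimuli, n_latent))
neural = (latent @ rng.normal(size=(n_latent, n_channels))
          + 0.1 * rng.normal(size=(n_stimuli, n_channels)))
layers = [
    rng.normal(size=(n_stimuli, n_units)),
    latent @ rng.normal(size=(n_latent, n_units))
    + 0.1 * rng.normal(size=(n_stimuli, n_units)),
]

scores = layerwise_rsa(layers, neural)
best_layer = int(np.argmax(scores))
print("RSA score per layer:", scores)
print("Layer with peak alignment:", best_layer)
```

With real data, `layers` would hold the hidden states extracted for each stimulus and `neural` the matched recordings; comparing where `best_layer` falls for pretrained versus post-trained models is one way to look for the kind of layer shift the abstract describes.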
