Authors & Affiliations
Felix Grün, Ioannis Iossifidis
Abstract
Dopaminergic reward prediction errors (RPEs) are a key motivation and inspiration for model-free, temporal-difference reinforcement learning methods. Originally, the correlation of RPEs with model-free temporal-difference errors was taken as strong evidence for model-free reinforcement learning in the brain. The standard view held that model-free learning is the norm and that the more computationally expensive model-based decision-making is engaged only when it yields outcomes good enough to justify the additional effort. Today, the landscape of opinions, models, and experimental evidence, both electrophysiological and behavioral, paints a more complex picture, including, but not limited to, mechanisms of arbitration between the two systems. Model-based or hybrid models better capture experimental behavioral data, and model-based signatures have been found in RPEs previously thought to be model-free or hybrid [1]. Evidence for clearly model-free learning is scarce [2]. Conversely, multiple approaches show how model-based behavior and RPEs can be produced by fundamentally model-free reinforcement learning methods [3, 4, 5]. We point out findings that appear to contradict each other and others that complement each other, speculate about which ideas are mutually compatible, and offer our views on ways forward, towards understanding whether and how model-based and model-free learning from rewards coexist and interact in the brain.
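Because the argument hinges on the correspondence between dopaminergic RPEs and model-free temporal-difference errors, a minimal sketch may help make that quantity concrete. Below is an illustrative tabular TD(0) value-estimation loop, assuming a Gymnasium-style discrete environment; the function name and interface are our own choices for exposition, not code from the paper. The TD error δ = r + γV(s') − V(s) computed inside the loop is the model-free signal that RPEs were originally argued to track.

```python
import numpy as np

def td0_value_estimation(env, policy, n_episodes=500, alpha=0.1, gamma=0.99):
    """Estimate state values under a fixed policy with tabular TD(0).

    Assumes a Gymnasium-style environment with a discrete observation
    space; `policy` maps a state index to an action. Illustrative only.
    """
    V = np.zeros(env.observation_space.n)
    for _ in range(n_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            a = policy(s)
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Bootstrap from V[s_next] only if the episode continues.
            target = r + (0.0 if terminated else gamma * V[s_next])
            delta = target - V[s]   # temporal-difference error (RPE analogue)
            V[s] += alpha * delta   # move the estimate toward the target
            s = s_next
    return V
```

The sketch is purely model-free: it never consults a transition model, only sampled experience, which is exactly the property debated in the abstract when model-based signatures appear in empirically measured RPEs.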