Value Functions
A role for dopamine in value-free learning
Recent success in training artificial agents and robots derives from a combination of direct learning of behavioral policies and indirect learning via value functions. Policy learning and value learning employ distinct algorithms that depend upon performance errors and reward prediction errors, respectively. In mammals, behavioral learning and the role of mesolimbic dopamine signaling have been extensively evaluated with respect to reward prediction errors, but there has been little consideration of how direct policy learning might inform our understanding. I’ll discuss our recent work on classical conditioning in naïve mice (https://www.biorxiv.org/content/10.1101/2021.05.31.446464v1), which provides multiple lines of evidence that phasic dopamine signaling regulates policy learning from performance errors in addition to its well-known roles in value learning. This work points towards new opportunities for unraveling the mechanisms of basal ganglia control over behavior under both adaptive and maladaptive learning conditions.
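To make the distinction concrete, here is a minimal sketch, not taken from the talk or the preprint, contrasting the two update rules in a two-armed bandit: a value-learning update driven by a reward prediction error (TD-style) and a direct policy update in which reward weights the policy gradient (REINFORCE-style), standing in for learning from performance feedback. The bandit environment, learning rates, and variable names are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the authors' model):
# value learning from reward prediction errors vs. direct policy learning.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 2
reward_probs = np.array([0.2, 0.8])   # assumed two-armed bandit

Q = np.zeros(n_actions)               # value learner: action values
prefs = np.zeros(n_actions)           # policy learner: action preferences
alpha_v, alpha_p = 0.1, 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(2000):
    # --- value-based agent: learn from the reward prediction error ---
    a = rng.choice(n_actions, p=softmax(Q))
    r = float(rng.random() < reward_probs[a])
    rpe = r - Q[a]                    # reward prediction error
    Q[a] += alpha_v * rpe

    # --- policy-based agent: reward-weighted policy-gradient update ---
    pi = softmax(prefs)
    a = rng.choice(n_actions, p=pi)
    r = float(rng.random() < reward_probs[a])
    grad = -pi
    grad[a] += 1.0                    # d log pi(a) / d prefs
    prefs += alpha_p * r * grad       # update the policy directly

print("learned action values:", np.round(Q, 2))
print("learned policy:       ", np.round(softmax(prefs), 2))
```

Both agents come to favor the better arm, but only the first one ever represents a value estimate; the second adjusts its behavioral policy directly.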
Deep learning for model-based RL
Model-based approaches to control and decision making have long held the promise of being more powerful and data efficient than their model-free counterparts. However, success with model-based methods has largely been limited to cases where a perfect model can be queried. The game of Go was mastered by AlphaGo using a combination of neural networks and the MCTS planning algorithm, but planning required a perfect representation of the game rules. I will describe new algorithms that instead leverage deep neural networks to learn models of the environment, which are then used to plan and to update policy and value functions. These new algorithms offer hints about how brains might approach planning and acting in complex environments.
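As a toy illustration of the general idea, the sketch below, assuming a small chain environment rather than the algorithms described in the talk, fits a one-step model from experience and then plans by evaluating short simulated rollouts with that learned model. The environment, horizon, and helper functions are hypothetical.

```python
# Minimal sketch (illustrative assumptions): learn a one-step model from
# experience, then plan by rolling out candidate action sequences in it.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, horizon = 4, 2, 3

# Unknown true environment: a small chain MDP with reward at the right end.
def true_step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

# Learned model: empirical next-state counts and mean rewards.
counts = np.zeros((n_states, n_actions, n_states))
reward_sum = np.zeros((n_states, n_actions))

# 1) Collect random experience and fit the model.
for _ in range(500):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s_next, r = true_step(s, a)
    counts[s, a, s_next] += 1
    reward_sum[s, a] += r

def model_step(s, a):
    n = counts[s, a].sum()
    s_next = int(np.argmax(counts[s, a])) if n > 0 else s
    r = reward_sum[s, a] / n if n > 0 else 0.0
    return s_next, r

# 2) Plan with the learned model: evaluate every short action sequence.
def plan(s):
    best_a, best_return = 0, -np.inf
    for seq in np.ndindex(*([n_actions] * horizon)):
        s_sim, ret = s, 0.0
        for a in seq:
            s_sim, r = model_step(s_sim, a)
            ret += r
        if ret > best_return:
            best_a, best_return = seq[0], ret
    return best_a

s = 0
for t in range(6):
    a = plan(s)
    s, r = true_step(s, a)
    print(f"step {t}: action {a}, new state {s}, reward {r}")
```

The exhaustive rollout stands in for a planner such as MCTS; the key point is that planning queries the learned model rather than the true environment rules.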