ePoster

Approximate gradient descent and the brain: the role of bias and variance

Arna Ghosh, Konrad Körding, Blake Richards
COSYNE 2022 (2022)
Lisbon, Portugal
Presented: Mar 17, 2022

Abstract

Gradient descent is a central tool in modern machine learning, largely because it scales well to train large networks on computationally challenging tasks. As such, computational neuroscientists are exploring potential means by which the brain could leverage gradient descent. However, it is unlikely that the brain implements perfect gradient descent; instead, it likely relies on approximations of the true gradient signal. The gradient estimates used by the brain would therefore carry both bias and variance. Here, we use model-agnostic mathematical analysis and simulations to understand how bias and variance in gradient approximations affect the learning performance of a system. Our work, supported by experiments with artificial neural networks trained on synthetic datasets in a student-teacher setup, demonstrates the distinct impacts of bias and variance on learning performance. First, we find that the effect of bias increases with the gradient norm and decreases as the network approaches a loss minimum. Second, the effect of variance increases when networks are smaller but decreases when network activity is sparser. These results indicate that having good priors over the system's parameters, possibly inherited through evolution, could ameliorate the impact of bias. Additionally, increasing the network size, in both depth and width, allows the system to reduce the impact of noise in gradient estimates. Taken together, our findings suggest a normative explanation both for good priors over synaptic connections in the brain, which mitigate biased gradient estimates, and for increased brain size, which mitigates noise in gradient estimates. Furthermore, our results can inform the search for algorithms that approximate gradient descent, depending on system characteristics and task complexity. Overall, we believe this work contributes both to developing biologically plausible learning algorithms for artificial systems and to the quest to understand learning in the nervous system.
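The abstract itself contains no code; as a minimal sketch of the kind of student-teacher experiment it describes, the following illustration trains a linear "student" on targets produced by a fixed "teacher", using gradient estimates corrupted by controllable bias and variance terms. All names and parameter values here are hypothetical, not taken from the poster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical student-teacher setup: a fixed teacher network generates
# synthetic targets, and a same-shaped student learns them from
# approximate gradients.
d_in, d_out, n = 50, 10, 1000
teacher = rng.normal(size=(d_in, d_out))
student = np.zeros((d_in, d_out))

X = rng.normal(size=(n, d_in))
Y = X @ teacher  # noiseless synthetic targets

def true_gradient(W):
    # Gradient of the mean squared error 0.5 * ||X W - Y||^2 / n w.r.t. W.
    return X.T @ (X @ W - Y) / n

def approximate_gradient(W, bias_scale=0.1, noise_scale=0.1):
    # Corrupt the true gradient with a fixed-direction bias term and
    # zero-mean Gaussian noise, mimicking a biased, noisy estimator.
    g = true_gradient(W)
    bias = bias_scale * np.ones_like(g) / np.sqrt(g.size)
    noise = noise_scale * rng.normal(size=g.shape)
    return g + bias + noise

lr = 0.05
for step in range(500):
    student -= lr * approximate_gradient(student)
    if step % 100 == 0:
        loss = 0.5 * np.mean((X @ student - Y) ** 2)
        print(f"step {step:4d}  loss {loss:.4f}")
```

Sweeping bias_scale and noise_scale against network size and gradient norm would be one way to probe the trends the abstract reports, e.g. whether the bias term dominates early in training when the gradient norm is large.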

Unique ID: cosyne-22/approximate-gradient-descent-brain-role-a50fdb7f