Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Editors
- Reviewing Editor: Georg Keller, Friedrich Miescher Institute, Basel, Switzerland
- Senior Editor: Panayiota Poirazi, FORTH Institute of Molecular Biology and Biotechnology, Heraklion, Greece
Reviewer #1 (Public Review):
Summary:
Wilmes and colleagues present a computational model of a cortical circuit for predictive processing that tackles the question of how to learn predictions when the predicted sensory stimulus comes with different levels of uncertainty. When a predicted sensory outcome is highly variable, deviations from the average expected stimulus should evoke prediction errors that have less impact on updating the prediction of the mean stimulus. In the presented model, layer 2/3 pyramidal neurons represent either positive or negative prediction errors, SST neurons mediate the subtractive comparison between prediction and sensory input, and PV neurons represent the expected variance of sensory outcomes. PVs can therefore control the learning rate: under conditions of high uncertainty, their divisive inhibition means prediction error neurons are activated less and exert less influence on updating predictions.
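The core computation described above can be condensed into a few lines. This is a minimal sketch of the reviewer's summary, not the authors' implementation; all function and variable names are illustrative.

```python
# Illustrative sketch (not the authors' code): rectified prediction-error
# neurons with subtractive SST inhibition and divisive PV inhibition.

def upe_positive(s, mu, var):
    """Positive prediction error: sensory input exceeds the predicted mean."""
    return max(0.0, s - mu) / var  # SSTs subtract mu; PVs divide by the expected variance

def upe_negative(s, mu, var):
    """Negative prediction error: sensory input falls short of the predicted mean."""
    return max(0.0, mu - s) / var

# The same mismatch (s - mu = 1) drives a smaller error signal when the
# expected variance is high, effectively lowering the learning rate:
print(upe_positive(6.0, 5.0, 1.0))  # → 1.0
print(upe_positive(6.0, 5.0, 4.0))  # → 0.25
```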
Strengths:
The presented model is a very nice solution to altering the learning rate in a modality and context-specific way according to expected uncertainty and, importantly, the model makes clear, experimentally testable predictions for interneuron and pyramidal neuron activity. This is therefore an important piece of modelling work for those working on cortical and/or predictive processing and learning. The model is largely well-grounded in what we know of the cortical circuit.
Weaknesses:
Currently, the model has not been challenged with experimental data, presumably because data from an adequate paradigm is not yet available. I therefore only have minor comments regarding the biological plausibility of the model:
Beyond the fact that some papers show SSTs mediate subtractive inhibition and PVs mediate divisive inhibition, the selection of interneuron types for the different roles could be argued further, given existing knowledge of their properties. For instance, is a high PV baseline firing rate, or broad sensory tuning that is often interpreted as a 'pooling' of pyramidal inputs, compatible with or predicted by the model?
On a related note, SSTs are thought to primarily target the apical dendrite, while PVs mediate perisomatic inhibition, so the different roles of the interneurons in the model make sense, particularly for negative PE neurons, where a top-down excitatory predicted mean is first subtractively compared with the sensory input, s, prior to division by the variance. However, sensory input is typically thought of as arising 'bottom-up', via layer 4, so the model may match the circuit anatomy less in the case of positive PE neurons, where the diagram shows 's' arising in a top-down manner. Do the authors have a justification for this choice?
In cortical circuits, assuming a 2:8 ratio of inhibitory to excitatory neurons, there are at least 10 pyramidal neurons to each SST and PV neuron. Pyramidal neurons are also typically much more selective about the type of sensory stimuli they respond to compared to these interneuron classes (e.g., Kerlin et al., 2012, Neuron). A nice feature of the proposed model is that the same interneurons can provide predictions of the mean and variance of the stimulus in a predictor-dependent manner. However, in a scenario where you have two types of sensory stimulus to predict (e.g., two different whiskers stimulated), with pyramidal neurons selective for prediction errors in one or the other, what does the model predict? Would you need specific SST and PV circuits for each type of predicted stimulus?
Reviewer #2 (Public Review):
Summary:
This computational modeling study addresses the observation that variable observations are interpreted differently depending on how much uncertainty an agent expects from its environment. That is, the same mismatch between a stimulus and an expected stimulus would be less significant, and specifically would represent a smaller prediction error, in an environment with a high degree of variability than in one where observations have historically been similar to each other. The authors show that if two different classes of inhibitory interneurons, the PV and SST cells, (1) encode different aspects of a stimulus distribution and (2) act in different (divisive vs. subtractive) ways, and if (3) synaptic weights evolve in a way that causes the impact of certain inputs to balance the firing rates of the targets of those inputs, then pyramidal neurons in layer 2/3 of canonical cortical circuits can indeed encode uncertainty-modulated prediction errors. To achieve this result, SST neurons learn to represent the mean of a stimulus distribution and PV neurons its variance.
The impact of uncertainty on prediction errors is an understudied topic, and this study provides an intriguing and elegant new framework for how this impact could be achieved and what effects it could produce. The ideas here differ from past proposals about how neuronal firing represents uncertainty. The developed theory is accompanied by several predictions for future experimental testing, including the existence of different forms of coding by different subclasses of PV interneurons, which target different sets of SST interneurons (as well as pyramidal cells). The authors are able to point to some experimental observations that are at least consistent with their computational results. The simulations shown demonstrate that if we accept its assumptions, then the authors' theory works very well: SSTs learn to represent the mean of a stimulus distribution, PVs learn to estimate its variance, firing rates of other model neurons scale as they should, and the level of uncertainty automatically tunes the learning rate, so that variable observations are less impactful in a high uncertainty setting.
Strengths:
The ideas in this work are novel and elegant, and they are instantiated in a progression of simulations that demonstrate the behavior of the circuit. The framework used by the authors is biologically plausible and matches some known biological data. The results attained, as well as the assumptions that go into the theory, provide several predictions for future experimental testing.
Weaknesses:
Overall, I found this manuscript to be frustrating to read and to try to understand in detail, especially the Results section from the UPE/Figure 4 part to the end and parts of the Methods section. I don't think the main ideas are so complicated, and it should be possible to provide a much clearer presentation.
For me, one source of confusion is the comparison across Figure 1EF, Figure 2A, Figure 3A, Figure 4AB, and Figure 5A. All of these are meant to be schematics of the same circuit (although with an extra neuron in Figure 5), yet other than Figures 1EF and 4AB, no two are the same! There should be a clear, consistent schematic used, with identical labeling of input sources, neuron types, etc. across all of these panels.
The flow of the Results section overall is clear until the 'Calculation of the UPE in Layer 2/3 error neurons' section and Figure 4, where I find that things become significantly more confusing. The mention of NMDA and calcium spikes comes out of the blue, and it's not clear to me how this fits into the authors' theory. Moreover: Why would this property of pyramidal cells cause the PV firing rate to increase as stated? The authors refer to one set of weights (from SSTs to UPE neurons) needing to match two targets (the weights from s to UPE neurons and the weights from the mean representation to UPE neurons); how can one set of weights match two targets? Why do the authors mention 'out-of-distribution detection' here when that property is not explored later in the paper? (See also below for other comments on Figure 4.)
Coming back to one of the points in the previous paragraph: How realistic is this exact matching of weights, as well as the weight matching that the theory requires in terms of the weights from the SSTs to the PVs and the weights from the stimuli to the PVs? This point should receive significant elaboration in the discussion, with biological evidence provided. I would not advocate for the authors' uncertainty prediction theory, despite its elegant aspects, without some evidence that this weight matching occurs in the brain. Also, the authors point out on page 3 that unlike their theory, "...SSTs can also have divisive effects, and PVs can have subtractive effects, dependent on circuit and postsynaptic properties". This should be revisited in the Discussion, and the authors should explain why these effects are not problematic for their theory. In a similar vein, this work assumes the existence of two different populations of SST neurons with distinct UPE (pyramidal) targets. The Discussion doesn't say much about any evidence for this assumption, which should be more thoroughly discussed and justified.
Finally, I think this is a paper that would have been clearer if the equations had been interspersed within the results. Within the given format, I think the authors should include many more references to the Methods section, with specific equation numbers, where they are relevant throughout the Results section. The lack of clarity is certainly made worse by the current state of the Methods section, where there is far too much repetition and poor ordering of material throughout.
Reviewer #3 (Public Review):
Summary:
The authors propose a normative principle for how the brain's internal estimate of an observed sensory variable should be updated with each individual observation: the update size should be inversely proportional to the variance of the variable. They then propose a microcircuit model of how such an update can be implemented, in particular incorporating two types of interneurons and their subtractive and divisive inhibition onto pyramidal neurons. One type represents the estimated mean while the other represents the estimated variance. The authors use simulations to show that the model works as expected.
Strengths:
The paper addresses two important issues: how uncertainty is represented and used in the brain, and the role of inhibitory neurons in neural computation. The proposed circuit and learning rules are simple enough to be plausible. They also work well for the designated purposes. The paper is also well-written and easy to follow.
Weaknesses:
I have concerns with two aspects of this work.
(1) The optimality analysis leading to Eq (1) appears simplistic. The learning setting the authors describe (estimating the mean of a stationary Gaussian variable from a stream of observations) is a very basic problem in the online-learning/streaming-algorithms literature. In this setting, the real "optimal" estimate is simply the arithmetic average of all samples seen so far, which can be implemented in an online manner as \hat{\mu}_{t} = \hat{\mu}_{t-1} + (s_t - \hat{\mu}_{t-1})/t. This is optimal in the sense that the estimator is always the maximum likelihood estimator given the samples seen up to time t, whereas gradient descent only converges towards the MLE estimator after a large number of updates. Another critique is that while Eq (1) estimates the mean (\hat{\mu}), it assumes that the variance is already known. However, in the actual model, the variance also needs to be estimated, and a more sophisticated analysis would thus need to take into account the uncertainty of the variance estimate, and so on. Finally, the idea that the update should be inversely proportional to the variance is connected to the well-established idea in neuroscience that more evidence should be integrated when uncertainty is high. For example, in models of two-alternative forced choice, it is known to be optimal to have a longer reaction time when the evidence is noisier.
(2) While the incorporation of different inhibitory cell types into the model is appreciated, the computation performed by the circuit does not appear novel. Essentially, the model implements a running average of the mean and a running average of the variance, and gates updates to the mean with the inverse variance estimate. I am not sure how much new insight the proposed model adds to our understanding of cortical microcircuits.
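The reviewer's reading of the circuit (coupled running averages, with the mean update gated by the inverse variance) can be sketched as follows; the learning rates and initial values are illustrative assumptions, not taken from the paper.

```python
# Sketch of the "running averages" reading: exponential estimates of the mean
# and variance, with the mean update divided by the current variance estimate.
import random

def update(mu, var, s, eta_mu=0.1, eta_var=0.05):
    mu = mu + eta_mu * (s - mu) / var            # uncertainty-gated mean update
    var = var + eta_var * ((s - mu) ** 2 - var)  # running variance estimate
    return mu, var

random.seed(1)
mu, var = 0.0, 1.0
for _ in range(2000):
    mu, var = update(mu, var, random.gauss(3.0, 2.0))
# mu should settle near 3.0 and var near 4.0 for this stream
```

High estimated variance shrinks the effective learning rate on the mean, which is exactly the uncertainty modulation the model attributes to PV-mediated divisive inhibition.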