Examining exploration strategy in relation to uncertainty in an incremental learning task.

a, Structure of the task. Participants explored four tables, each containing two decks with different proportions of blue/orange cards. The goal was to learn the difference in proportions of the decks on each table. b, The two phases of the task: exploration and test. On a single exploration trial (left), participants chose between two tables, and then sampled a card from one of the decks on that table, observing its color. After a random number of exploration trials, participants were tested on their knowledge (right). A color was designated as rewarding, and participants then chose the deck with the higher proportion of the rewarding color on each table. They were rewarded for correct test-phase choices, and received no reward during exploration. c, Histogram of round lengths. Participants played 22 rounds. The length of exploration in each round followed a shifted geometric distribution, such that the test was equally likely to occur following any trial after the first 10. d, We considered a hierarchy of strategies for choosing which table to explore. The normatively prescribed strategy is to choose the table affording maximal expected information gain. This is the table for which the next card is expected to maximally decrease uncertainty (measured as entropy H) about the value of the goal-relevant latent parameter θ, given observations thus far x. A simpler strategy is to choose the table with the maximum uncertainty, as it does not necessitate computing an expectation over the next observation. An even simpler heuristic is to equate previous exposure and choose the table with the fewest previous observations nx. Even though these three strategies vary considerably in complexity, they are all uncertainty-approaching on average. Lastly, people may be random explorers.
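The three strategies in panel d can be written compactly as decision rules. The following is a sketch in notation that is partly assumed here: the table index t and the symbol x' for the next, not yet observed card do not appear in the caption, while H, θ, x and nx do.

```latex
\begin{align*}
\text{Expected information gain:}\quad
  & t^{*} = \arg\max_{t}\; \Big( H(\theta_t \mid x) - \mathbb{E}_{x'}\big[ H(\theta_t \mid x, x') \big] \Big) \\
\text{Uncertainty:}\quad
  & t^{*} = \arg\max_{t}\; H(\theta_t \mid x) \\
\text{Exposure:}\quad
  & t^{*} = \arg\min_{t}\; n_x(t)
\end{align*}
```

Only the first rule requires an expectation over the next observation, which is what makes it the most demanding of the three.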

Hypothetical strategies make differing predictions for exploratory choice behavior.

We computed the three quantities hypothesized to drive exploratory choices using a Bayesian observer model. To illustrate this process, we plot the derivation of Bayesian belief on a single trial (a) and across multiple trials (b, c). For visualization, we use a simplified version with two tables only. a depicts the Bayesian observer’s belief about a single table on a single trial. Given a sequence of previously observed cards (left), the Bayesian observer forms posterior beliefs about the proportion of orange cards in each deck (center). These beliefs are expressed as Beta distributions. From these, it is possible to derive a belief about the difference in the proportion of orange cards between the two decks π1 – π2 (right). The probability that π1 < π2 is given by the proportional size of the area marked in gray (0.74 in this example). b depicts the same process over a series of 20 trials. The observed card sequence for each table is presented at the top of each panel. The matching belief state about π1 – π2 is plotted below it as an evolving posterior density in white (high) and black (low). The green arrows mark the true value of π1 – π2 for that round. As the round progresses, belief converges towards the true value, and becomes more certain. c, The three choice strategies prescribe different table choices on most trials. The difference between table 1 and table 2 in each of the three quantities (EIG, uncertainty and exposure) is plotted for each trial. This difference is the hypothesized decision variable for choosing between tables 1 and 2. A positive value indicates a preference for exploring table 1, and a negative value a preference for table 2. The three variables are normalized to facilitate visual comparison.
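The quantities described in this caption can be illustrated with a minimal Python sketch. Several choices below are assumptions rather than details given in the caption: a uniform Beta(1, 1) prior on each deck, θ taken to be the binary question of which deck holds more orange cards, a grid approximation of the Beta posteriors, equal probability of sampling either deck when computing EIG, and the sign flip applied to exposure. All function names are hypothetical.

```python
import math

GRID = 400  # grid resolution for numerical integration (assumption)


def beta_weights(orange, blue):
    """Discretised Beta(orange + 1, blue + 1) posterior over a deck's proportion."""
    pts = [(i + 0.5) / GRID for i in range(GRID)]
    w = [p ** orange * (1 - p) ** blue for p in pts]
    total = sum(w)
    return [x / total for x in w]


def prob_deck1_lower(table):
    """P(pi1 < pi2) for a table given as ((orange1, blue1), (orange2, blue2))."""
    w1, w2 = beta_weights(*table[0]), beta_weights(*table[1])
    cum, prob = 0.0, 0.0
    for m1, m2 in zip(w1, w2):
        prob += m2 * (cum + 0.5 * m1)  # ties within a grid cell are split evenly
        cum += m1
    return prob


def uncertainty(table):
    """Entropy (in nats) of the belief about which deck has more orange cards."""
    p = prob_deck1_lower(table)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def expected_information_gain(table):
    """Current entropy minus the expected entropy after one more card."""
    expected_next = 0.0
    for deck in (0, 1):  # assumption: either deck is sampled with probability 0.5
        orange, blue = table[deck]
        p_orange = (orange + 1) / (orange + blue + 2)  # posterior predictive
        for prob, new_counts in ((p_orange, (orange + 1, blue)),
                                 (1 - p_orange, (orange, blue + 1))):
            new_table = list(table)
            new_table[deck] = new_counts
            expected_next += 0.5 * prob * uncertainty(tuple(new_table))
    return uncertainty(table) - expected_next


def exposure(table):
    """Number of cards observed so far on the table."""
    return sum(orange + blue for orange, blue in table)


# Hypothetical card counts, (orange, blue) per deck, for two tables.
table1 = ((3, 1), (1, 2))
table2 = ((0, 1), (0, 0))
delta_eig = expected_information_gain(table1) - expected_information_gain(table2)
delta_uncertainty = uncertainty(table1) - uncertainty(table2)
# Assumed sign convention: less exposure means a stronger pull toward table 1.
delta_exposure = exposure(table2) - exposure(table1)
print(delta_eig, delta_uncertainty, delta_exposure)
```

Each difference plays the role of the decision variable in panel c before normalization: positive values favor table 1, negative values favor table 2.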

The Bayesian observer model is validated by participants’ accuracy and confidence on the test phase.

a, Participants were accurate when an exploration phase ended with low uncertainty, and performed at chance level when the phase ended with high uncertainty. b, Participants’ confidence on correct choices fell with rising uncertainty. Confidence on error trials did not depend as much on Bayesian observer uncertainty. When a test question was unsolvable because no evidence was observed on each deck during exploration, participants had very low confidence. Data presented as mean values ±1 SE, n=194 participants.

Figure 3—figure supplement 1. Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: The Bayesian observer model is validated by participants’ accuracy and confidence on the test phase.

a, Participants were accurate when an exploration phase ended with low uncertainty, and performed at chance level when the phase ended with high uncertainty. b, Participants’ confidence on correct choices fell with rising uncertainty. Confidence on errors did not depend as much on Bayesian observer uncertainty. Data presented as mean values ±1 SE, n=62 participants. Nats are the units of entropy, a mathematically convenient measure of uncertainty.

Uncertainty is the best predictor of choice.

a, On each plot the difference in the hypothesized quantity between the two tables presented on each trial is plotted against actual choices of the table presented on the right. For each plot, the relevant hypothesis predicts a positive smooth curve. Δ-uncertainty, plotted on the left, matches this prediction better than Δ-EIG (center). The relationship between Δ-exposure (right) and choice is negative, rather than the hypothesized positive correlation. b, Quantitative model comparison confirms this observation. Out of the three hypothesized strategies, uncertainty has the highest approximate expected log predictive density (using PSIS LOO; see Methods). Data presented as mean values ±1 SE, n=194 participants.

Figure 4—figure supplement 1. Fitting simulated data successfully recovers the underlying strategy.

Figure 4—figure supplement 2. Uncertainty is a sufficient predictor of choice.

Figure 4—figure supplement 3. Matching results in the preliminary sample.

Our analysis approach successfully recovers the strategy used by simulated agents.

We compared the actual data (top) to datasets generated by artificial agents. Each simulated dataset comprised a group of agents operating according to one of the hypothesized strategies. We fixed the effect size for each strategy in the simulations to the effect size we observed for uncertainty in the actual data. Each agent matched a single participant in the true dataset, choosing to observe cards from the same decks presented to the participant. Each agent chose the table on the right or the table on the left on each trial, with the probability of choosing the table on the right given by f(a + b × Δx), where f is the logistic function, Δx is the relevant decision variable, b is the degree to which the agent’s choices depend on Δx, and a is a general bias towards rightward or leftward choices. Coefficients a and b were extracted per participant from the uncertainty model described in Figure 4. For the sake of this analysis, we assumed the agents choose a random deck on the table of their choice. Here, the simulated data for each group of agents was plotted against each of the three decision variables and fit with the same models we used on the actual dataset (center). We tested whether our procedure for qualitative and quantitative model comparison used in Figure 4 is capable of recovering the true strategy generating the data. For easy comparison, the actual data is re-plotted on the first row. For each of the three simulated strategies, we observe successful recovery: the decision variable matching the true strategy shows the strongest positive correlation with choice (center), and the correct strategy is indicated as best fitting the data (right). Furthermore, a negative correlation between behavior and Δ-exposure, as observed in the true data, is only evident in the uncertainty-based group of agents (second row). Data plotted as means ±1 SE, n=194 participants/agents.
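The agents' choice rule, with the probability of a rightward choice given by f(a + b × Δx), can be sketched as follows. The coefficient values and the Δx sequence below are made up for illustration, and the function names are hypothetical; in the analysis described above, a and b came from the fitted uncertainty model.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic function f."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_agent(delta_x, a, b, rng):
    """Simulate one agent's choices: 1 = right-hand table, 0 = left-hand table."""
    return [1 if rng.random() < logistic(a + b * d) else 0 for d in delta_x]


rng = random.Random(0)
# Hypothetical decision-variable values (right minus left) for 1000 trials.
delta_x = [rng.uniform(-1, 1) for _ in range(1000)]
choices = simulate_agent(delta_x, a=0.1, b=2.0, rng=rng)
print(sum(choices) / len(choices))  # overall proportion of rightward choices
```

Running the same function with Δx set to Δ-EIG, Δ-uncertainty or Δ-exposure would yield the three groups of agents, one per hypothesized strategy.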

Simulations confirm that uncertainty is a sufficient predictor of choice.

We further confirmed our conclusion that uncertainty is the best predictor of participants’ choices, by plotting the posterior predictive distribution for each of the models predicting choice from a hypothesized strategy. We simulated 500 datasets for each of the three models, and plotted the distribution of the simulated data (green lines with 50% and 95% posterior interval bands, n=500 iterations) against the observed dataset (means ±1 SE plotted in black, n=194 participants). The simulation procedure was similar to that used in Figure 4—figure supplement 1, with the exception that coefficients were extracted from the posterior distribution of each model fitted to the actual data. We expect that the posterior predictive distribution for each model would capture the relationship between the relevant decision variable and choice well. The extent to which the posterior predictive distribution can recreate the association with the other two decision variables is a test of model fit. a, The posterior predictive distribution for the EIG model does not match the observed data well: it does not reproduce the strong slope for Δ-uncertainty, nor the negative correlation with Δ-exposure. b, The posterior predictive distribution for uncertainty captures the data very well, matching the particular shape of the correspondence between choices and Δ-EIG, and the negative correlation between choices and Δ-exposure. c, The posterior predictive distribution of exposure does not match the observed data well: it fails to recreate the positive correlations between choice and EIG and uncertainty.
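A posterior predictive band of this kind can be sketched as follows, for a single bin of trials. The posterior draws here are invented placeholders, not fitted values, and the helper names are hypothetical; the sketch shows only how 500 simulated datasets turn into 50% and 95% intervals.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, for q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def predictive_band(posterior_draws, delta_x, rng, n_datasets=500):
    """Simulate datasets from posterior draws of (a, b) and summarise
    the proportion of rightward choices with 50% and 95% intervals."""
    rates = []
    for _ in range(n_datasets):
        a, b = rng.choice(posterior_draws)  # one draw per simulated dataset
        n_right = sum(rng.random() < logistic(a + b * d) for d in delta_x)
        rates.append(n_right / len(delta_x))
    rates.sort()
    return {"50%": (percentile(rates, 0.25), percentile(rates, 0.75)),
            "95%": (percentile(rates, 0.025), percentile(rates, 0.975))}


rng = random.Random(0)
# Placeholder posterior draws of (a, b); real draws would come from a fitted model.
draws = [(rng.gauss(0.0, 0.1), rng.gauss(1.5, 0.2)) for _ in range(200)]
# Hypothetical decision-variable values for the trials falling in one bin.
bin_delta_x = [0.4 + rng.uniform(-0.05, 0.05) for _ in range(100)]
band = predictive_band(draws, bin_delta_x, rng)
print(band)
```

Repeating this for every bin of each decision variable would give bands that can be overlaid on the observed means.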

Reproducing the analysis in Figure 4 using the preliminary sample: Uncertainty is the best predictor of choice.

a, On each plot the difference in the hypothesized quantity between the two tables presented on each trial is plotted against actual choices of the table presented on the right. For each plot, the relevant hypothesis predicts a positive smooth curve. Δ-uncertainty, plotted on the left, matches this prediction better than Δ-EIG (center). The relationship between Δ-exposure (right) and choice is negative, rather than the hypothesized positive correlation. b, Quantitative model comparison confirms this observation. Out of the three hypothesized strategies, uncertainty has the highest approximate expected log predictive density (PSIS LOO; see Methods). Data presented as mean values ±1 SE, n=62 participants.

Participants approach vs. avoid Δ-uncertainty as a function of overall uncertainty.

a, While Δ-uncertainty is the decision variable identified above, overall uncertainty, defined as the sum of uncertainty for both tables, is a measure of decision difficulty. b, The influence of Δ-uncertainty on choice differed markedly below and above a threshold of overall uncertainty. Below an estimated threshold of overall uncertainty, Δ-uncertainty had a significant positive effect on choice. Above this threshold of overall uncertainty, the influence of Δ-uncertainty became strongly negative. Points denote mean posterior estimate from regression models fitted to binned data, error bars mark 50% PI. The solid line depicts the prediction from a piecewise regression model capturing the non-linear relationship and estimating the threshold, with the darker ribbon marking 50% PI and the light ribbon marking 95% PI. Data from three regions of overall uncertainty marked in color are plotted in c. For low overall uncertainty (blue) participants tend to choose the table they are more uncertain about, as normatively prescribed. But that relationship is broken for medium levels of overall uncertainty (purple). For high overall uncertainty (red), participants strongly prefer to choose the table they are less uncertain about, thereby slowing down the rate of information intake. Data plotted as mean ±SE, n=194 participants.
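One way to picture the piecewise relationship in panel b is a hinge function: the effect of Δ-uncertainty on choice stays at a baseline value up to a threshold of overall uncertainty and changes linearly above it. The sketch below assumes that functional form and uses made-up parameter values and hypothetical names; it is not the fitted model.

```python
import math


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def delta_uncertainty_effect(overall, baseline, threshold, decline):
    """Slope of choice on delta-uncertainty as a function of overall uncertainty.

    Constant at `baseline` below `threshold`; above it the slope changes by
    `decline` per unit of overall uncertainty (assumed hinge form).
    """
    return baseline + decline * max(0.0, overall - threshold)


def p_choose_right(delta_u, overall, intercept, baseline, threshold, decline):
    """Probability of choosing the right-hand table under the hinge model."""
    slope = delta_uncertainty_effect(overall, baseline, threshold, decline)
    return logistic(intercept + slope * delta_u)


# Made-up parameters: positive baseline effect, threshold at 1.0 nats,
# and a steep decline above the threshold.
params = dict(intercept=0.0, baseline=1.0, threshold=1.0, decline=-3.0)
print(p_choose_right(0.3, overall=0.5, **params))  # approach: above 0.5
print(p_choose_right(0.3, overall=2.0, **params))  # avoidance: below 0.5
```

With these values the same positive Δ-uncertainty attracts choice when overall uncertainty is low and repels it when overall uncertainty is high, mirroring the blue and red regions in panel c.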

Figure 5—figure supplement 1. Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: Participants approach vs. avoid Δ-uncertainty as a function of overall uncertainty.

a, The influence of Δ-uncertainty on choice differed markedly below and above a threshold of overall uncertainty. Below a certain estimated threshold of overall uncertainty, Δ-uncertainty had a significant positive effect on choice. Above this threshold of overall uncertainty, the influence of Δ-uncertainty decreased significantly. Points denote mean posterior estimate from regression models fitted to binned data, error bars mark 50% PI. The solid line depicts the prediction from a piecewise regression model capturing the non-linear relationship and estimating the threshold, with the darker ribbon marking 50% PI and the light ribbon marking 95% PI. Data from three regions of overall uncertainty marked in color are plotted in b. For low overall uncertainty (blue) participants tend to choose the table they are more uncertain about, as normatively prescribed. But that relationship is broken for medium levels of overall uncertainty (purple). For high overall uncertainty (red), participants strongly prefer to choose the table they are less uncertain about, thereby slowing down the rate of information intake. Data plotted as mean ±SE, n=62 participants.

Approaching uncertainty benefits learning while avoiding uncertainty does not hurt it.

a, We observe substantial individual differences in strategy. Replotting Figure 5e for each individual reveals differences in the baseline tendency to approach uncertainty, and differences in the interaction with overall uncertainty, which captures uncertainty avoidance when overall uncertainty is high. b, Associations between test performance and the parameters describing approaching and avoiding uncertainty. The baseline tendency to approach uncertainty (left) is strongly associated with performance at test, such that participants who are unable to approach uncertainty also perform poorly at test. There is a weak and positive correlation between test performance and the tendency to avoid uncertainty when overall uncertainty is high (right), indicating that uncertainty avoidance does not hinder learning. Uncertainty avoidance is quantified based on the individual lines plotted in panel a as the triangular area charted by the piecewise regression line.

Figure 6—figure supplement 1. Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: Approaching uncertainty benefits learning while avoiding uncertainty does not hurt it.

a, We observe substantial individual differences in strategy. Replotting Figure 5e for each individual reveals differences in the baseline tendency to approach uncertainty, and differences in the interaction with overall uncertainty, which captures uncertainty avoidance when overall uncertainty is high. b, Associations between test performance and the parameters describing approaching and avoiding uncertainty. The baseline tendency to approach uncertainty (left) is strongly associated with performance at test, such that participants who are unable to approach uncertainty also perform poorly at test. There is a weak and positive correlation between test performance and the tendency to avoid uncertainty when overall uncertainty is high (right), indicating that uncertainty avoidance does not hinder learning. Uncertainty avoidance is quantified based on the individual lines plotted in panel a as the triangular area charted by the piecewise regression line.

Individuals who spend time deliberating during exploration make strategic choices and learn well.

Participants varied not only in the pattern of their choices, but also in their RTs. a, Data from three example participants. The relationship of choice and RTs with Δ-uncertainty weakens from left to right. Data plotted as mean ±SE. b, These individual differences were captured by a sequential sampling model, explaining choices and RTs as the interaction between participants’ efficacy of deliberating about Δ-uncertainty and their tendency to deliberate longer vs. make quick responses. Plotting model predictions, we observe a u-shaped dependence of RTs on Δ-uncertainty for participants whose performance at test was in the top accuracy tertile. This characteristic u-shape is indicative of decisions made by prolonged deliberation. This relationship is weaker for participants in the bottom two test accuracy tertiles. Such participants also exhibit shorter RTs overall. Lines mark mean predictions from a sequential sampling model fit by tertiles for visualization, ribbons denote 50% PIs. c, Correlating the sequential sampling model parameters with test performance confirms these observations. Participants with a stronger dependence of RT on Δ-uncertainty perform better at test, as do participants who deliberate longer for the sake of accuracy. Example participants from a are marked in red. Lines are mean predictions from a logistic regression model.
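The link between deliberation and RTs can be illustrated with a minimal drift diffusion simulation. This sketch assumes a mapping that the caption does not spell out: drift rate equals an efficacy parameter times Δ-uncertainty, and boundary separation stands for the tendency to deliberate longer. Parameter names and values are hypothetical.

```python
import math
import random


def simulate_ddm_trial(drift, boundary, non_decision, rng, dt=0.01, max_time=10.0):
    """Simulate one trial by Euler steps; return (choice, rt).

    Evidence starts midway between two boundaries `boundary` apart.
    choice is 1 if the upper boundary is reached, 0 otherwise.
    """
    evidence, t = 0.0, 0.0
    step_sd = math.sqrt(dt)  # unit diffusion noise
    while abs(evidence) < boundary / 2 and t < max_time:
        evidence += drift * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
    return (1 if evidence > 0 else 0), t + non_decision


def mean_rt(delta_u, efficacy, boundary, rng, n_trials=200):
    """Mean simulated RT for a given delta-uncertainty."""
    rts = [simulate_ddm_trial(efficacy * delta_u, boundary, 0.3, rng)[1]
           for _ in range(n_trials)]
    return sum(rts) / len(rts)


rng = random.Random(1)
for delta_u in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(delta_u, round(mean_rt(delta_u, efficacy=3.0, boundary=2.0, rng=rng), 2))
```

In this sketch RTs are longest when Δ-uncertainty is near zero and shorten as it grows in either direction. Lowering `efficacy` flattens that dependence, and lowering `boundary` shortens RTs overall.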

Figure 7—figure supplement 1. Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: Individuals who spend time deliberating during exploration make strategic choices and learn well.

Participants varied not only in the pattern of their choices, but also in their RTs. a, Data from three example participants. The relationship of choice and RTs with Δ-uncertainty weakens from left to right. Data plotted as mean ±SE. b, These individual differences were captured by a sequential sampling model, explaining choices and RTs as the interaction between participants’ efficacy of deliberating about Δ-uncertainty and their tendency to deliberate longer vs. make quick responses. Plotting model predictions, we observe a u-shaped dependence of RTs on Δ-uncertainty for participants whose performance at test was in the top accuracy tertile. This characteristic u-shape is indicative of decisions made by prolonged deliberation. This relationship is weaker for participants in the bottom two test accuracy tertiles. Such participants also exhibit shorter RTs overall. Lines mark mean predictions from a sequential sampling model fit by tertiles for visualization, ribbons denote 50% PIs. c, Correlating the sequential sampling model parameters with test performance confirms these observations. Participants with a stronger dependence of RT on Δ-uncertainty perform better at test, as do participants who deliberate longer for the sake of accuracy. Example participants from a are marked in red. Lines are mean predictions from a logistic regression model.

Participants tend to repeat previous choices instead of deliberating over uncertainty.

a, On a given trial one table has been chosen more recently than the other (frames denote previous choices). In the example the green table had been chosen more recently, hence it is designated the repeat option and the other table the switch option. b, Participants tend to choose the table displayed on the right more often when it is the repeat option than when it is the switch option. Data plotted as mean ±SE, n=194 participants. c, When choosing a repeat option, participants’ RTs are shorter and less dependent on Δ-uncertainty. Lines mark mean predictions from a sequential sampling model, ribbons denote 50% PIs. d, Participants who tended to repeat their previous choice also tended to perform better at test (left), were more likely to have a stronger baseline tendency to approach uncertainty (middle), and a stronger tendency to avoid uncertainty when overall uncertainty is high (right). Regression lines are plotted for visualization.
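The repeat/switch designation in panel a can be sketched as a lookup over the history of chosen tables. The function name and the table labels are hypothetical, and the handling of tables that were never chosen is an assumption.

```python
def repeat_option(choice_history, left, right):
    """Return whichever of the two presented tables was chosen more recently.

    `choice_history` lists previously chosen tables, oldest first.
    Returns None if neither table has been chosen yet (assumed convention).
    """
    for past_choice in reversed(choice_history):
        if past_choice in (left, right):
            return past_choice
    return None


history = ["green", "red", "blue", "green", "yellow"]
print(repeat_option(history, left="red", right="green"))  # prints "green"
```

The other presented table is then the switch option.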

Figure 8—figure supplement 1. Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: Participants tend to repeat previous choices instead of deliberating over uncertainty.

a, On a given trial one table has been chosen more recently than the other (frames denote previous choices). In the example the green table had been chosen more recently, hence it is designated the repeat option and the other table the switch option. b, Participants tend to choose the table displayed on the right more often when it is the repeat option than when it is the switch option. Data plotted as mean ±SE, n=194 participants. c, When choosing a repeat option, participants’ RTs are shorter and less dependent on Δ-uncertainty. Lines mark mean predictions from a sequential sampling model, ribbons denote 50% PIs. d, Participants who tended to repeat their previous choice also tended to perform better at test (left), were more likely to have a stronger baseline tendency to approach uncertainty (middle), and a stronger tendency to avoid uncertainty when overall uncertainty is high (right). Regression lines are plotted for visualization.

Forgetting is associated with random choice rather than a systematic bias.

a, Memory lag, defined as trials since last choice, serves as a proxy for forgetting and contributes to the difficulty of making an exploratory choice. RTs rise with memory lag. The RT advantage for repeat choices disappears with higher memory lag. b, With higher memory lag choices become less dependent on Δ-uncertainty, as indicated by flatter curves. The tendency to repeat the last choice is also diminished with memory lag. Both effects amount to choice becoming more random due to forgetting. Data plotted as mean ±SE, n=194 participants.
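Memory lag, as trials since last choice, can be sketched per table as below. The caption does not say how lags for the two presented tables are combined into a single trial-level value, nor whether the immediately preceding trial counts as a lag of 1; both are left as assumptions here, and the function name is hypothetical.

```python
def memory_lag(choice_history, table):
    """Number of trials since `table` was last chosen.

    `choice_history` lists previously chosen tables, oldest first.
    The most recent trial counts as lag 1 (assumed convention);
    returns None if the table has never been chosen.
    """
    for lag, past_choice in enumerate(reversed(choice_history), start=1):
        if past_choice == table:
            return lag
    return None


history = ["green", "red", "blue", "green", "yellow"]
print(memory_lag(history, "green"), memory_lag(history, "red"))  # prints "2 4"
```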

Figure 9—figure supplement 1. Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample.

Forgetting is associated with random choice rather than a systematic bias. a, Memory lag, defined as trials since last choice, serves as a proxy for forgetting and contributes to the difficulty of making an exploratory choice. RTs rise with memory lag. The RT advantage for repeat choices disappears with higher memory lag. b, With higher memory lag choices become less dependent on Δ-uncertainty, as indicated by flatter curves. The tendency to repeat the last choice is also diminished with memory lag. Both effects amount to choice becoming more random due to forgetting. Data plotted as mean ±SE, n=194 participants.

Regularizing priors used in regression models.

Test Accuracy as a Function of the Final Uncertainty in the Exploration-Phase.

Test Confidence as a Function of the Final Uncertainty in the Exploration-Phase.

Exploration-Phase Choices as a Function of Δ-Uncertainty

Exploration-Phase Choices as a Function of Δ-EIG

Exploration-Phase Choices as a Function of Δ-Exposure

Exploration-Phase Choices as a Function of Δ-Uncertainty and Overall Uncertainty

Test Performance as a Function of the Tendency to Approach Uncertainty in Exploration

Test Performance as a Function of Tendency to Approach Uncertainty in Exploration when Overall Uncertainty is High

Drift Diffusion Model of Exploration-Phase Choice and RTs

Test Performance as a Function of Drift Diffusion Model Parameters for Exploration Phase

Exploration-phase Choices as a Function of Δ-Uncertainty, Overall Uncertainty, and Side of Repeat Option

Drift Diffusion Model of Exploration-Phase Choice and RTs, Differentiating between Repeat and Switch Choices

Test Performance as a Function of the Tendency to Repeat Exploration-Phase Choices

Exploration-Phase RTs as a Function of Memory Lag and Side of Repeat Option

Exploration-Phase Choices as a Function of Δ-Uncertainty, Memory Lag, and Side of Repeat Option

Exploration-Phase Choices as a Function of Δ-Uncertainty, Overall Uncertainty, and Trial Number

Model Comparison for Sequential Sampling Models of the Tendency to Repeat Previous Choices