Effects of dopamine D2/3 and opioid receptor antagonism on the trade-off between model-based and model-free behaviour in healthy volunteers

  1. Nace Mikus (corresponding author)
  2. Sebastian Korb
  3. Claudia Massaccesi
  4. Christian Gausterer
  5. Irene Graf
  6. Matthäus Willeit
  7. Christoph Eisenegger
  8. Claus Lamm
  9. Giorgia Silani
  10. Christoph Mathys
  1. Department of Cognition, Emotion, and Methods in Psychology, Faculty of Psychology, University of Vienna, Austria
  2. Interacting Minds Centre, Aarhus University, Denmark
  3. Department of Psychology, University of Essex, United Kingdom
  4. Department of Clinical and Health Psychology, Faculty of Psychology, University of Vienna, Austria
  5. FDZ‐Forensisches DNA Zentrallabor GmbH, Medical University of Vienna, Austria
  6. Department of Psychiatry and Psychotherapy, Medical University of Vienna, Austria
  7. Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich and ETH Zurich, Switzerland
  8. Scuola Internazionale Superiore di Studi Avanzati (SISSA), Italy
5 figures, 1 table and 2 additional files

Figures

Figure 1 with 1 supplement
Study Procedure.

After an initial online screening, participants were invited to the lab for a first visit (Session 1), where they were subjected to a medical check-up, before playing the two-step task for the first time. If they fulfilled the study criteria, they were invited for another visit (Session 2), where they received either 400 mg of amisulpride, 50 mg of naltrexone, or placebo (mannitol). After 180 min of waiting time, participants started with the test battery. Approximately 270 min after drug intake, participants performed the two-step task the second time, followed by a Reading Span (working memory) task, and a blood draw to determine amisulpride serum levels.

Figure 1—figure supplement 1
Side effects.

On most measures, most participants rated the severity of the side effect as low as possible. The only measure on which more than 50% of data points were above 1 was tiredness. Here, the effect of amisulpride was at trend level (b=0.20, p=0.06) and the effect of naltrexone was not significant (b=–0.15, p=0.17).

Figure 2
Task Design and Hypotheses.

(a) In the first stage of each trial, participants were presented with one of two pairs of spaceships. Each spaceship flew to one of two planets in the second stage, where participants encountered an alien that gave them points. The transition from each of the four spaceships to the planets was deterministic and stayed the same throughout the experiment. The points each alien gave changed independently according to a discretized Gaussian random walk with bouncing boundaries at –4 and 5. (b) Behaviour in the task, showing trials where the previous first-stage state (spaceship pair) was the same (trial 2) and trials where it was different (trial 3). A high number of points should encourage choosing the spaceship that flies to the same planet, which we term ‘staying with the previous choice’. Note that in the featured example, the participant, after receiving –1 point in trial 1, opted against staying with the previous choice in the next trial, and after receiving 2 points in trial 2 opted for staying with the previous choice by choosing the spaceship that flew to the same planet.
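The reward process in panel (a) can be sketched as follows. This is a minimal illustration, not the task code: the step standard deviation (`step_sd`), the uniform starting value, and the helper names `reflect` and `simulate_points` are assumptions. Only the bounds of –4 and 5 and the discretization come from the caption.

```python
import numpy as np

def reflect(x, low, high):
    """Bounce x back into [low, high], repeating if it overshoots twice."""
    while x < low or x > high:
        if x < low:
            x = 2 * low - x
        if x > high:
            x = 2 * high - x
    return x

def simulate_points(n_trials, step_sd=2.0, low=-4, high=5, seed=0):
    """Simulate one alien's points as a discretized Gaussian random walk.

    step_sd and the starting value are assumed for illustration only.
    """
    rng = np.random.default_rng(seed)
    latent = rng.uniform(low, high)  # assumed starting point
    points = np.empty(n_trials, dtype=int)
    for t in range(n_trials):
        # Gaussian step, then bounce off the boundaries
        latent = reflect(latent + rng.normal(0.0, step_sd), low, high)
        # Discretize to whole points
        points[t] = int(round(latent))
    return points

# Two aliens drift independently, so each gets its own seed.
alien_a = simulate_points(200, seed=1)
alien_b = simulate_points(200, seed=2)
```

Because the two walks are independent, the better planet changes over time, which is what makes the most recent outcome informative for the next choice.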

Figure 3
Behavioural analysis.

(a) We used a hierarchical Bayesian logistic regression model to analyse how the probability of staying with the previous choice depended on previous points, previous first-stage state, session, and drug administration, and their interactions. We allowed the intercept and the slopes for previous first-stage state and session to vary by participant (full table of coefficients in Supplementary file 1a). Points and error bars depict mean proportions and standard errors of trials where the participants stayed with their previous choice, for each previous point, averaged within session and drug group. Lines and ribbons depict the mean estimates and 80% credible intervals. When encountering a trial with the same first-stage state as in its preceding trial, participants were 1.495 times (95% CI [1.286, 1.739], P(βlogodds < 0)<10e-3) more likely to stay with their previous choice for each additional previous point (βlogodds = 0.402, 95% CI [0.251, 0.553], P(βlogodds < 0)<10e-3). In contrast, in trials where the first-stage state was different from the preceding trial, the increase in the log-odds of repeating the previous choice of spaceship with each additional point earned in the previous trial was smaller by 0.204 (βlogodds = –0.204, 95% CI [–0.263, –0.146], P(βlogodds > 0)<10e-3), so that participants were only 1.219 times (95% CI [1.053, 1.410], P(βlogodds < 0)<10e-3) more likely to stay with their choice for each additional previous point. This indicates that participants often failed to consider the mapping from spaceships to planets when making choices in trials where the first-stage state differed from the previous trial. The effects of the drugs can be seen in the different slopes of previous points in the two sessions, depicted for both trial types. (b) Differences in staying behaviour between sessions, binned into five different reward levels for clarity. Means with standard errors overlaid with means and 80% CIs of estimated posterior distributions.
(c) Means with 80% and 95% CIs of effect sizes (in logodds space) of selected regression coefficients of the hierarchical logistic regression model predicting staying behaviour from previous points (PrevPoints), previous first-stage states (PrevState), session, drug administrations, and all the interactions between them. Ami, amisulpride, n = 38; Nal, naltrexone, n = 39; Pla, placebo, n = 35.
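The odds ratios quoted in panel (a) follow from exponentiating the log-odds coefficients. The short check below uses only the point estimates given in the caption; the variable names are illustrative. Credible intervals are not recomputed here, since they depend on the full posterior rather than on the rounded coefficients.

```python
import math

beta_same = 0.402          # log-odds slope per previous point, same first-stage state
beta_interaction = -0.204  # change in that slope when the first-stage state differs

# Odds ratio per additional previous point when the first-stage state repeats
or_same = math.exp(beta_same)

# In different-state trials the two coefficients add on the log-odds scale
or_diff = math.exp(beta_same + beta_interaction)

print(round(or_same, 3), round(or_diff, 3))  # 1.495 1.219
```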

Figure 4 with 2 supplements
Effects of amisulpride and naltrexone on the model parameters.

The best-performing model (M1) has three free parameters: ω (the degree of model-based vs model-free value contributions to choice), γ (the degree of devaluation of unencountered spaceships), and η (the inverse temperature in the softmax mapping from values to probabilities). The parameters for both sessions and the effects of drug treatments were estimated in one hierarchical model. (a) Going from session 1 to session 2, amisulpride administration led to higher estimates of ω, and therefore increased model-based relative to model-free control. The difference between sessions of the model-based weights is shown in parameter estimation space (hence the prime). (b) Difference in the parameter γ. (c) Difference in the inverse temperature parameter η. Lower values mean higher exploration. (d) Posterior distributions of the effect of drugs, compared to placebo, on group-level mean session differences. Effect sizes with 80% and 95% CI. Ami, amisulpride, n = 38; Nal, naltrexone, n = 39; Pla, placebo, n = 35.
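To make the roles of ω and η concrete, the sketch below shows a generic hybrid choice rule of the kind used in dual-system models: values from the two systems are mixed with weight ω and passed through a softmax with inverse temperature η. It is an assumption-laden illustration rather than the authors' model M1: the function name `choice_probabilities` is hypothetical, the value inputs are supplied by hand, and the devaluation parameter γ is left out.

```python
import numpy as np

def choice_probabilities(q_mb, q_mf, omega, eta):
    """Mix model-based and model-free values, then apply a softmax.

    omega = 1 means purely model-based, omega = 0 purely model-free.
    eta is the inverse temperature: lower values give more exploration.
    """
    q_mb = np.asarray(q_mb, dtype=float)
    q_mf = np.asarray(q_mf, dtype=float)
    q_net = omega * q_mb + (1.0 - omega) * q_mf
    z = eta * q_net
    z = z - z.max()  # subtract the maximum for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical values for the two spaceships on offer: the model-based
# system prefers the first spaceship, the model-free system the second.
p_mixed = choice_probabilities([2.0, 0.0], [0.0, 2.0], omega=0.5, eta=1.0)
p_model_based = choice_probabilities([2.0, 0.0], [0.0, 2.0], omega=1.0, eta=1.0)
p_exploratory = choice_probabilities([2.0, 0.0], [0.0, 2.0], omega=1.0, eta=0.1)
```

With equal weighting the two systems cancel out and the choice is at chance. Raising ω shifts choice toward the model-based preference, and lowering η flattens the probabilities toward chance regardless of ω.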

Figure 4—figure supplement 1
Computational modelling.

(a) We compared our model to two dual-system reinforcement learning (RL) models used previously with this task (Kool et al., 2017; Kool et al., 2016). In model M1 both model-free and model-based agents remember only the last outcome of a choice of spaceships, but only the model-based agent is aware of the relation between first-stage states. Models M2 and M3 are reinforcement learning-based models adapted from Kool et al., 2016 with either separate (M2) or equal (M3) first- and second-stage learning rates. (b) Models were compared using the leave-one-out information criterion (looic), where lower values indicate better out-of-sample trial-by-trial predictive performance. Refer to the Methods for weights from Bayesian Model Averaging of all models, indicating the posterior probability of each model given the data. Model M1 performs best in both metrics. (c) For the winning model (M1) we calculated the posterior predictive accuracy averaged within participants and found that the mean (SD) posterior predictive accuracy was 65% (11%). (d) Parameter recovery. (e) Simulated data plotted to visually and statistically investigate whether the model captures the crucial aspects of behaviour and group differences (compare with Figure 3).

Figure 4—figure supplement 2
Results of the model including stickiness parameters.

Posterior distributions of effect sizes of group-level effects on model parameters in a model including stickiness parameters, with means and 95% and 80% intervals.

Figure 5
Effects of amisulpride across the high and low blood serum levels.

(a) Across both effective dose groups of participants, amisulpride increased the model-free/model-based weight parameter. (b) Amisulpride increased exploration (decreased η) in the group that had high blood serum levels, but not in the group with low serum levels. Effect sizes with 80% and 95% CI. Ami, amisulpride; n = 32 (low serum, n = 14, high serum, n = 18); Nal, naltrexone, n = 39; Pla, placebo, n = 35.

Tables

Table 1
Prior distributions for the behavioural analysis.
Standard deviations: σ ~ Half-Cauchy(0, 2)
Regression coefficients: β ~ N(0, 3)
Intercept: β0 ~ Student(3, 0, 10)
Prior for the correlation matrix: R ~ LKJcorr(2)
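To get a feel for how wide these priors are, one can draw from them directly. The sketch below is illustrative only and rests on assumptions: the second argument of N(0, 3) and the third argument of Student(3, 0, 10) are read as scale (standard deviation) parameters, as is usual in Stan-style notation; the LKJ prior on the correlation matrix is omitted because NumPy has no sampler for it; and `sample_priors` is a hypothetical helper name.

```python
import numpy as np

def sample_priors(n=10_000, seed=1):
    """Draw prior samples for sigma, beta and beta0 (LKJ prior omitted)."""
    rng = np.random.default_rng(seed)
    return {
        # Half-Cauchy(0, 2): absolute value of a Cauchy with scale 2
        "sigma": np.abs(2.0 * rng.standard_cauchy(n)),
        # N(0, 3): normal with mean 0 and (assumed) standard deviation 3
        "beta": rng.normal(0.0, 3.0, n),
        # Student(3, 0, 10): t with 3 degrees of freedom, location 0, scale 10
        "beta0": 10.0 * rng.standard_t(3, n),
    }

draws = sample_priors()

# Central 95% prior interval for a regression coefficient on the log-odds scale
beta_interval = np.quantile(draws["beta"], [0.025, 0.975])
```

Under this reading, the prior on regression coefficients spans roughly ±6 on the log-odds scale, which is wide compared with the estimated effects reported for the behavioural analysis.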

Additional files

Supplementary file 1

Supplementary Data and Model Summaries.

(a) Estimates and CIs of fixed effects of the Bayesian logistic regression predicting staying behaviour. Q2.5 and Q97.5 are the 2.5% and 97.5% quantiles of the posterior parameter distribution. For details on how the posterior distributions were calculated, refer to the code online (Ami = Amisulpride, Nal = Naltrexone). (b) Results of a Bayesian logistic linear model predicting percentage correct from drug groups. (c) Mean and standard deviation of mood at time of pill intake (T1) and 3 hr later (T2). (d) Drug effects on differences in positive PANAS scales (centred) between sessions. Drug variables coded as follows: nal is a dummy variable for naltrexone (1 for naltrexone, 0 otherwise), ami is a dummy variable for amisulpride, and serum_ami is a dummy variable for serum (1 only in the high serum group, and 0 otherwise). (e) Drug effects on differences in negative PANAS scales (centred) between sessions. Drug variables coded as before. (f) Drug effects on session differences in ω, including mood at baseline, difference in mood from baseline and working memory performance as covariates, as well as sex, age and weight as moderators of effects. Drug variables coded as before, all other dependent variables scaled and centred. (g) Drug effects on session differences in η, including mood at baseline, difference in mood from baseline and working memory performance as covariates, as well as sex, age and weight as moderators of effects. Drug variables coded as before, all other dependent variables scaled and centred. (h) Drug effects on session differences in ω, from genetic variables. Drug variables coded as before, all other dependent variables scaled and centred. (i) Drug effects on session differences in ω, from genetic variables. Drug variables coded as before, all other dependent variables scaled and centred. (j) Description of participants in terms of body mass index (BMI), age and sex with mean (m) and standard deviation (sd).
(k) Distribution of genotypes. (l) Number of participants per drug group used in analysis.

https://cdn.elifesciences.org/articles/79661/elife-79661-supp1-v1.docx
MDAR checklist
https://cdn.elifesciences.org/articles/79661/elife-79661-mdarchecklist1-v1.docx


Nace Mikus, Sebastian Korb, Claudia Massaccesi, Christian Gausterer, Irene Graf, Matthäus Willeit, Christoph Eisenegger, Claus Lamm, Giorgia Silani, Christoph Mathys (2022)
Effects of dopamine D2/3 and opioid receptor antagonism on the trade-off between model-based and model-free behaviour in healthy volunteers
eLife 11:e79661.
https://doi.org/10.7554/eLife.79661