On the normative advantages of dopamine and striatal opponency for learning and choice

  1. Alana Jaskir (corresponding author)
  2. Michael J Frank (corresponding author)
  1. Brown University, United States

Abstract

The basal ganglia (BG) contribute to reinforcement learning (RL) and decision making, but unlike artificial RL agents, they rely on complex circuitry and dynamic dopamine (DA) modulation of opponent striatal pathways to do so. We develop the OpAL* model to assess the normative advantages of this circuitry. In OpAL*, learning induces opponent pathways to differentially emphasize the history of positive or negative outcomes for each action. Dynamic DA modulation then amplifies the pathway most tuned for the task environment. This efficient coding mechanism avoids a vexing explore-exploit tradeoff that plagues traditional RL models in sparse reward environments. OpAL* exhibits robust advantages over alternative models, particularly in environments with sparse reward and large action spaces. These advantages depend on opponent and nonlinear Hebbian plasticity mechanisms previously thought to be pathological. Finally, OpAL* captures risky choice patterns arising from DA and environmental manipulations across species, suggesting that they result from a normative biological mechanism.
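
The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation code and omits much of OpAL*. The class name `OpponentActorSketch`, the parameter names, the initial values, and the rule that scales the two pathway gains by `1 + rho` and `1 - rho` are all assumptions made for illustration. The value `rho` stands in for the dopamine level and is simply passed in by the caller; how OpAL* actually sets it from the task environment is not modeled here.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class OpponentActorSketch:
    """Illustrative opponent actor-critic (hypothetical, not the published model)."""

    def __init__(self, n_actions, alpha_critic=0.1, alpha_actor=0.1, beta=2.0):
        self.v = [0.5] * n_actions  # critic: expected reward per action
        self.g = [1.0] * n_actions  # pathway emphasizing positive outcomes
        self.n = [1.0] * n_actions  # opponent pathway emphasizing negative outcomes
        self.alpha_critic = alpha_critic
        self.alpha_actor = alpha_actor
        self.beta = beta

    def policy(self, rho=0.0):
        """Choice probabilities; rho in [-1, 1] is an assumed dopamine proxy."""
        beta_g = self.beta * (1.0 + rho)  # higher rho amplifies the G pathway
        beta_n = self.beta * (1.0 - rho)  # lower rho amplifies the N pathway
        act = [beta_g * g - beta_n * n for g, n in zip(self.g, self.n)]
        return softmax(act)

    def choose(self, rho=0.0):
        """Sample an action index from the current policy."""
        probs = self.policy(rho)
        return random.choices(range(len(probs)), weights=probs)[0]

    def update(self, action, reward):
        """Learn from one outcome; reward is assumed to lie in [0, 1]."""
        delta = reward - self.v[action]  # reward prediction error
        self.v[action] += self.alpha_critic * delta
        # Nonlinear Hebbian step: each change is scaled by the current weight,
        # and the two pathways receive opposite signs of the prediction error.
        self.g[action] += self.alpha_actor * self.g[action] * delta
        self.n[action] += self.alpha_actor * self.n[action] * (-delta)
```

In this sketch, repeated rewards for an action raise its `g` weight and lower its `n` weight, while repeated omissions do the opposite, so the two pathways come to carry different parts of the outcome history. Changing `rho` at choice time then shifts which of the two dominates the decision.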

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript. Simulation code is available in the authors' GitHub repository: https://github.com/amjaskir/opal-star

Article and author information

Author details

  1. Alana Jaskir

    Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, United States
    For correspondence
    alana_jaskir@brown.edu
    Competing interests
    No competing interests declared.
  2. Michael J Frank

    Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, United States
    For correspondence
    Michael_Frank@brown.edu
    Competing interests
    Michael J Frank, Senior editor, eLife.
    ORCID: 0000-0001-8451-0523

Funding

National Institute of Mental Health (P50MH119467)

  • Michael J Frank

National Institute of Mental Health (R01 MH084840)

  • Michael J Frank

National Institutes of Health (S10OD025181)

  • Michael J Frank

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2023, Jaskir & Frank

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Alana Jaskir, Michael J Frank (2023) On the normative advantages of dopamine and striatal opponency for learning and choice. eLife 12:e85107. https://doi.org/10.7554/eLife.85107
