Audiovisual task switching rapidly modulates sound encoding in mouse auditory cortex
Abstract
In everyday behavior, sensory systems are in constant competition for attentional resources, but the cellular and circuit-level mechanisms of modality-selective attention remain largely uninvestigated. We conducted translaminar recordings in mouse auditory cortex (AC) during an audiovisual (AV) attention shifting task. Attending to sound elements in an AV stream reduced both pre-stimulus and stimulus-evoked spiking activity, primarily in deep-layer neurons and neurons without spectrotemporal tuning. Despite reduced spiking, stimulus decoder accuracy was preserved, suggesting improved sound encoding efficiency. Similarly, task-irrelevant mapping stimuli during inter-trial intervals evoked fewer spikes without impairing stimulus encoding, indicating that attentional modulation generalized beyond training stimuli. Importantly, spiking reductions predicted trial-to-trial behavioral accuracy during auditory attention, but not visual attention. Together, these findings suggest auditory attention facilitates sound discrimination by filtering sound-irrelevant background activity in AC, and that the deepest cortical layers serve as a hub for integrating extramodal contextual information.
Editor's evaluation
This is an important paper that is methodologically compelling, harnessing a complex behavioral task for modality-specific control of attention to provide new evidence that directed auditory attention produces a global decrease in auditory cortex firing rates without a loss of stimulus-related information. These findings build on previous results showing that task engagement or locomotion down regulates activity in auditory cortex. The manuscript is comprehensive and well-illustrated. It provides highly detailed analysis of the cortical activity modulations during attentional switching that will be valuable to others within and beyond the field of hearing research.
https://doi.org/10.7554/eLife.75839.sa0
Introduction
Information from one or another sensory pathway may become differentially relevant due to environmental changes. The brain must therefore continuously assign limited attentional resources to processing simultaneous information streams from each sensory modality. For example, hearing a siren while listening to music in the car might prompt an attentional shift away from the auditory stream, toward a visual search for emergency vehicles. On the other hand, a similar shift away from the music is unlikely while listening at home. In these cases, contextual cues support allocating attention to either the auditory domain or the visual domain, and the perceptual experience of the music is qualitatively different. How might sensory cortex differentially encode stimuli from an attended versus filtered modality?
Attentional selection operates cooperatively at many levels of sensory processing. Most effort has been devoted to understanding the neural mechanisms of feature-selective attention within a single modality (Desimone and Duncan, 1995; Fritz et al., 2007). A major focus of this work has been characterizing transformations of stimulus representations in sensory cortical areas, due to their pivotal position between ascending sensory pathways and behavioral networks implementing top-down control (Lamme et al., 1998; Sutter and Shamma, 2011). These studies, largely from the visual domain, have shown that attention to a stimulus feature or space will often increase stimulus-evoked spiking responses and reduce thresholds for eliciting a response; likewise, responses to unattended stimuli are often decreased (Reynolds and Chelazzi, 2004). On the other hand, fewer studies have examined how modality-selective attention affects encoding in sensory cortex. This mode of attention highlights behaviorally relevant sensory streams while filtering less relevant ones. Human fMRI studies have reported differential activation patterns in auditory and visual cortex (AC, VC) reflecting the attended modality (Johnson and Zatorre, 2005; Petkov et al., 2004; Shomstein and Yantis, 2004; Woodruff et al., 1996). Extending these findings, studies in primate AC and VC have reported entrainment of local field potential (LFP) oscillations by modality-selective attention, which serves to modulate excitability and sharpen feature tuning within sensory cortex corresponding to the attended modality (Hocherman et al., 1976; Lakatos et al., 2009; Lakatos et al., 2008; O’Connell et al., 2014). Several findings suggest that these influences may differ among cortical layers and between inhibitory and excitatory neurons (Lakatos et al., 2016; O’Connell et al., 2014).
Nevertheless, there are many open questions about the influence of modality-specific attention on stimulus encoding in sensory cortex. Importantly, potential interplay between ongoing activity and evoked responses during attentional selection, as well as their consequences for information and encoding efficiency, has not been examined. The degree to which influences of modality-specific attention may generalize beyond training stimuli has yet to be elucidated. Finally, how these influences may be differentially expressed in cell subpopulations defined by cortical depth or inhibitory/excitatory cell type similarly remains unknown.
In the present study, we addressed these open questions by examining single neuron activity and sensory responses in mouse AC during an audiovisual (AV) attention shifting task. AC integrates ascending auditory information with diverse input from frontal, cingulate, striatal, and non-auditory sensory areas to rapidly alter sensory processing in response to changing behavioral demands (Budinger et al., 2008; Budinger and Scheich, 2009; Park et al., 2015; Rodgers and DeWeese, 2014; Winkowski et al., 2013). To isolate the influence of modality-selective attentional modulation, we compared responses to identical compound auditory-visual stimuli under different cued contexts requiring attention to the auditory or visual elements, thus holding constant other task-related variables such as arousal, attention, reward expectation, and motor activity (Saderi et al., 2021). Because spike rate and information changes are dissociable (Bigelow et al., 2019; Phillips and Hasenstaub, 2016), we quantified both evoked spike rates and the mutual information (MI) between responses and stimuli. We also examined the generality of modality-specific attention by examining responses to task-irrelevant sounds presented between trials. Finally, we used translaminar probes and spike waveform morphology classification to capture possible attention-related differences in neurons among cortical layers and between putative inhibitory and excitatory cell classes.
Results
AV rule-switching in mice
We trained mice to perform an AV rule-switching task, in which they made decisions using auditory stimuli while ignoring simultaneously presented visual stimuli or vice versa. Trial presentation was self-paced in a virtual foraging environment wherein a visual track was advanced by forward locomotion on a spherical treadmill (Figure 1A). A task-irrelevant random double sweep (RDS) sound was presented during inter-trial intervals (ITIs) for mapping auditory receptive fields in each attentional state (Figure 1B). Decision stimuli were presented after variable track length, consisting of 1 s auditory tone clouds (TCs; centered at 8 or 17 kHz) and/or visual drifting gratings (horizontal or vertical orientation; Figure 1C). One of the decision stimuli for each modality was a rewarded target (AR, VR) and the other an unrewarded distractor (AU, VU). Lick responses following targets (hits) and distractors (false alarms [FAs]) produced water rewards and dark timeouts, respectively. Withholding licks for targets (misses) or distractors (correct rejects [CRs]) advanced the next trial. Each session began with a block of unimodal decision stimuli, which cued the attended modality of a subsequent AV block (Figure 1D). A second unimodal block from the other modality was then presented, cueing the rule for a final AV block. Decision stimuli had identical physical properties but different behavioral significance between rules (e.g., licks following ARVU were rewarded in A-rule but punished in V-rule). Targets and distractor stimuli remained constant throughout training for each mouse and were approximately counterbalanced across animals. Block sequences (A-rule then V-rule, or vice versa) were also counterbalanced across sessions (Figure 1D.c).
We used two approaches to ensure that animals were engaged during both task rules. First, we restricted analysis to sessions in which discrimination was well above chance (d’>1.5) for both rules, and for which FA rates were below 0.5 for the stimuli with reward valences that conflicted across rules (AUVR in the A-rule, ARVU in the V-rule; Figure 1F). Second, for a subset of sessions (n=14 sessions, 5 mice) we measured pupil size, a well-established correlate of arousal and behavioral performance (Bradley et al., 2008; McGinley et al., 2015; Reimer et al., 2014). We used a computer vision algorithm to automate measurement of pupil size (pupil diameter/eye diameter) for each frame acquired by a CCD video camera (Figure 2A.a). To isolate pupil fluctuations reflecting general arousal, pupil size was measured during an ITI window designed to avoid pupil responses to decision stimulus onset, dark timeouts, and decreased locomotion events following reward administration (Figure 2A.b–c, Figure 2B). Previous studies have reported that pupil size increases with task difficulty and engagement in humans, non-human primates, and rodents (Hess and Polt, 1964; Kawaguchi et al., 2018; Schriver et al., 2018). We reasoned that comparison of pupil size across the rules would allow us to establish whether task demands differed between the rules. No difference in pupil size was observed between rules during bimodal blocks (Figure 2C; A-rule bimodal: 0.29±0.05 norm. diameter ± SD, V-rule bimodal: 0.30±0.05; Z=–1.0, p=0.30, paired Wilcoxon signed-rank [WSR], Benjamini-Hochberg false discovery rate [FDR]-adjusted p-values). As expected, pupil diameters were significantly larger during the bimodal portion of the task, when visual stimuli were present and the task had increased in difficulty, compared to the auditory-only unimodal portion of the task (Figure 2—figure supplement 1; A-rule unimodal: 0.28±0.04, A-rule bimodal: 0.29±0.05; Z=–2.6, p=0.009).
A trend toward smaller pupil size in the unimodal visual rule compared to bimodal rule was also noted, but the difference did not reach significance after multiple comparisons correction (V-rule unimodal: 0.29±0.04, V-rule bimodal: 0.30±0.05; Z=–2.0, p=0.062). Because pupil size also closely tracks locomotion (Figure 2A.b–c; McGinley et al., 2015), we examined locomotion speed during the same ITI window (Figure 2D). Differences in locomotion speed were also not observed between rules (all p≥0.623; all |Z|≤1.29, paired WSR). Arousal and motor activity were thus comparable between rules, suggesting that differences in neuronal activity may be attributable to modality-selective attention.
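The session inclusion criteria described above (d’>1.5 for both rules, and FA rates below 0.5 for the stimuli whose reward valence conflicts across rules) can be illustrated with a minimal sketch. The function names, the dictionary layout, and the log-linear correction for extreme rates are assumptions made for illustration; the text does not state which correction, if any, was applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejects):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count) keeps
    rates away from 0 and 1 so the inverse normal CDF stays finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejects + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def session_included(d_primes, conflict_fa_rates, d_min=1.5, fa_max=0.5):
    """Apply both criteria to every rule in a session.

    d_primes and conflict_fa_rates are hypothetical dicts keyed by rule
    name ("A" or "V"); conflict_fa_rates holds the FA rate for the
    stimulus whose reward valence conflicts across rules.
    """
    return all(
        d_primes[rule] > d_min and conflict_fa_rates[rule] < fa_max
        for rule in d_primes
    )
```

Under these assumptions, a session would be retained only if both rules pass both thresholds, which mirrors the requirement that animals be engaged in each rule rather than in the session on average.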
Single unit recording in AC
After mice learned the AV rule-switching task, a craniotomy was made over right AC, to allow for acute recordings during behavior using multichannel probes spanning the full cortical depth (Figure 3A). In total, we recorded AC activity in 10 mice during 23 behavioral sessions meeting inclusion criteria. The putative cortical depth of each sorted single unit (SU) was assigned by calculating the fractional position of the channel with the largest waveform amplitude within the span of channels in AC, as estimated from spontaneous and tone-evoked recordings following the task (Figure 3B). A separate set of experiments to visualize probe tracks with the fluorescent dye Di-I provided support for this depth estimation technique (Figure 3C; DiCarlo et al., 1996; Morrill and Hasenstaub, 2018). We then divided the fractional depth values into superficial, middle, and deep groups, approximating the supragranular, granular, and infragranular laminae. We further divided SUs into narrow-spiking (NS, putative inhibitory; n=130, 18%) and broad-spiking (BS, predominantly excitatory; n=612, 82%) populations based on trough-peak time (Figure 3D; Bigelow et al., 2019; Cardin et al., 2007; Nandy et al., 2017; Phillips et al., 2017).
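As a rough illustration of the unit labeling described above, the sketch below assigns a fractional depth from the channel with the largest waveform amplitude and classifies a unit by trough-to-peak time. All function names are hypothetical, and the depth boundaries and the NS/BS cutoff are left as required arguments because this section does not give their values.

```python
import numpy as np


def fractional_depth(mean_waveforms, top_channel, bottom_channel):
    """Fractional cortical depth (0 = top, 1 = bottom) of one unit.

    mean_waveforms: (n_channels, n_samples) average spike waveform.
    top_channel, bottom_channel: hypothetical indices of the channels
    bounding cortex, estimated separately.
    """
    amplitude = mean_waveforms.max(axis=1) - mean_waveforms.min(axis=1)
    peak_channel = int(np.argmax(amplitude))
    return (peak_channel - top_channel) / (bottom_channel - top_channel)


def depth_group(fraction, boundaries):
    """Label a depth; boundaries = (superficial/middle, middle/deep)."""
    if fraction < boundaries[0]:
        return "superficial"
    if fraction < boundaries[1]:
        return "middle"
    return "deep"


def trough_to_peak_ms(waveform, sample_rate_hz):
    """Time from the waveform trough to the following peak, in ms."""
    trough = int(np.argmin(waveform))
    peak = trough + int(np.argmax(waveform[trough:]))
    return (peak - trough) / sample_rate_hz * 1000.0


def spike_class(waveform, sample_rate_hz, ns_max_ms):
    """'NS' if trough-to-peak time is below the cutoff, else 'BS'."""
    duration = trough_to_peak_ms(waveform, sample_rate_hz)
    return "NS" if duration < ns_max_ms else "BS"
```

The cutoff and boundaries would need to be taken from the Methods or from the trough-peak distribution in Figure 3D; any values passed to these functions here are placeholders.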
Modality-selective attention modulates stimulus-evoked firing rates
To measure the effects of modality-selective attention on stimulus processing in AC, we began by comparing SU responses to bimodal decision stimuli across task rules. These responses reflected physically identical stimuli and similar levels of arousal and locomotion, as shown in Figure 2. Activity patterns evoked by decision stimuli and modulatory effects of task rule were diverse (Figure 4A). To capture a predominantly sensory-driven response component, we measured mean firing rates (FRs) during the first 300 ms post-stimulus onset (Figure 4B), which preceded most lick responses (lick latency median: 611 ms; 5–95th percentiles: 289–1078 ms; 5.4% of licks <300 ms, n=2852 total lick trials across dataset). Trials with licks earlier than 300 ms were excluded from analysis. We first compared A-rule and V-rule responses to the TC rewarded in the A-rule (AR*: ARVR and ARVU responses combined). Averaging across units, responses in the deep layers were suppressed in the A-rule relative to the V-rule for both NS and BS units (Figure 4C; deep BS: p=2.8e-4, Z=4.1, median fold change [FC; A-rule/V-rule]: 0.89, n=333 SUs; deep NS: p=0.011, Z=2.9, median FC: 0.87, n=66; paired WSR, FDR-adjusted p-values; see Figure 4—source data 1A for full stats). No significant group-level change was found in middle or superficial units. Consistent with group-level trends, individual units with significant FR decreases in the A-rule (p<0.01, unpaired t-test) substantially outnumbered units with significant FR increases for all unit populations other than superficial and mid-depth BS units (Figure 4C, right).
A similar pattern of attention-related modulation was observed for unrewarded stimuli in the A-rule (AU*: AUVR and AUVU responses combined). At the group level, superficial and middle unit responses did not differ significantly between conditions, whereas deep BS units were suppressed in the A-rule (Figure 4D; deep BS: p=2.0e-06, Z=5.11, median FC: 0.81, n=321; paired WSR, FDR-adjusted p-values; see Figure 4—source data 1B for full stats). Relative fractions of units with significantly modulated FRs to AU* stimuli were similar to those described above for AR* stimuli (Figure 4D, right), with the exception of the superficial group, in which slightly more units had increased FRs. We further found that most units showed the same direction of modulation for AR* and AU* stimuli (Figure 4E), with a similar modulation sign observed for 78% of BS units (50% suppressed for both AR* and AU*, 28% enhanced for both) and 99% of NS units (68% suppressed for both, 31% enhanced for both). These findings suggest that modality-selective attention similarly influences FRs evoked by task-relevant target and distractor sounds with different acoustic properties and learned behavioral values.
To determine whether these attentional influences might generalize to task-irrelevant sounds, we examined responses to RDS sounds presented during the ITI. Using the same analysis window (300 ms post-stimulus onset, Figure 4F), we found that attention-related modulation of FR responses evoked by task-irrelevant sounds was highly similar to that observed for both types of decision stimuli: middle- and deep-layer BS and NS populations exhibited group-level FR suppression during the A-rule (Figure 4G), whereas superficial layer units showed no difference (middle BS: p=6.0e-3, Z=3.1, median FC: 0.85, n=112; middle NS: p=0.014, Z=2.7, median FC: 0.65, n=28; deep BS: p=1.2e-6, Z=5.2, median FC: 0.84, n=309; deep NS: p=0.021, Z=2.5, median FC: 0.80, n=64; paired WSR; see Figure 4—source data 2 for full stats). Significantly modulated unit counts were again highly biased toward suppression in the A-rule, with pronounced differences in the middle and deep unit groups (Figure 4G, right). Together, these results show that auditory-selective attention tends to reduce FR responses to sounds, regardless of their behavioral relevance, valence, or spectral content, and that these influences are strongest for deep-layer units.
Modality-selective attention also modulates pre-stimulus FRs
Previous studies have found that modulation of ongoing activity in sensory cortex can influence subsequent sensory-evoked responses (Arieli et al., 1996; Haider and McCormick, 2009). Thus, the response suppression during auditory attention reported above may either reflect specific decreases in stimulus responsivity or general decreases in ongoing activity. To address these possibilities, we quantified FRs in a pre-stimulus window spanning 300 ms prior to decision stimulus onset in which no sounds were presented (Figure 5A.a). Although this window may include anticipatory modulation of activity (Cox et al., 2019; Egner et al., 2010; Samuelsen et al., 2012), it nevertheless provides a measure of baseline activity for comparison with evoked responses. We observed significant group-level decreases in pre-stimulus FRs during the A-rule for units in the middle NS and deep BS groups, but no modulation of superficial units (Figure 5B; middle NS: p=0.039, Z=2.48, median FC: 0.71, n=28; deep BS: p=4.2e-05, Z=4.49, median FC: 0.87, n=336; paired WSR, FDR-adjusted p-values; see Figure 5—source data 1 for full stats). To test whether the reduction in pre-stimulus FR was sufficient to account for stimulus-evoked changes reported above, we recalculated FRs evoked by decision stimuli as FC from pre-stimulus FRs (Figure 5A.b). Following this adjustment and after FDR correction, the middle- and deep-layer unit population responses no longer differed between rules (Figure 5C; Figure 5—source data 2). Together, these results suggest that group-level decreases in evoked FRs during A-rule are largely due to generalized suppression of ongoing AC activity.
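The baseline adjustment described above, in which evoked FRs are re-expressed as FC from pre-stimulus FRs, can be sketched as follows. The function names and the input layout (spike times and stimulus onsets in seconds) are assumptions; the 300 ms windows follow the text.

```python
import numpy as np


def firing_rate(spike_times, trial_onsets, window):
    """Mean FR (Hz) in window = (start, stop) s relative to each onset."""
    spike_times = np.asarray(spike_times, dtype=float)
    start, stop = window
    counts = [
        np.sum((spike_times >= onset + start) & (spike_times < onset + stop))
        for onset in trial_onsets
    ]
    return float(np.mean(counts)) / (stop - start)


def evoked_fold_change(spike_times, trial_onsets,
                       pre=(-0.3, 0.0), post=(0.0, 0.3)):
    """Evoked FR divided by pre-stimulus FR for one unit and condition."""
    pre_fr = firing_rate(spike_times, trial_onsets, pre)
    post_fr = firing_rate(spike_times, trial_onsets, post)
    # A silent baseline makes the ratio undefined; return NaN instead.
    return post_fr / pre_fr if pre_fr > 0 else float("nan")
```

If a rule lowers pre-stimulus and evoked rates by the same factor, this ratio is unchanged between rules, which is the pattern the adjusted comparison in Figure 5C is reported to show.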
Attention-related suppression is driven by units without STRF tuning
We next sought to determine whether attention-related changes in stimulus response were related to the tuning preferences of units, a phenomenon termed ‘feature attention’ previously observed in both monkey VC (Maunsell and Treue, 2006; Treue and Martínez Trujillo, 1999) and AC (Da Costa et al., 2013). The RDS mapping stimulus, which we have previously used to efficiently identify auditory response properties and AV interactions (Bigelow et al., 2022), was used to generate spectrotemporal receptive fields (STRFs) through reverse correlation (Figure 6A; Aertsen and Johannesma, 1981; de Boer, 1968; Gourévitch et al., 2015). Tuning for each STRF was measured through a trial-to-trial reliability metric, which we used to divide units into those with activity changes that were reliably evoked by defined spectral or temporal features (tuned STRFs, n=172; Figure 6C.a) and those without feature-evoked changes (untuned, n=409; Figure 6C.b). Spiking activity levels were higher in tuned units compared to untuned (Figure 6D.a). To control for possible activity level-dependent effects, we compared our population of tuned units to a randomly selected subset of untuned units which was matched for both sample size and FR to the tuned population (Figure 6D.b). We then examined attentional modulation of stimulus responses between the tuned and subsampled untuned groups. Responses to the rewarded TC (AR*), unrewarded TC (AU*), and the RDS mapping stimuli were significantly modulated by task rule in the untuned group, but not the tuned group (tuned: all p≥0.18, all |Z|≤1.72; untuned: all p≤0.023, all |Z|≥2.27; one-way WSR vs. modulation of 1 [equal across rules], FDR-adjusted p-values; Figure 6—source data 1C, D). Nevertheless, comparisons across these tuned and untuned groups showed that the distributions did not significantly differ after multiple comparisons correction (all p≥0.12, all |Z|≤2.06; WSR).
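For readers unfamiliar with reverse correlation, a minimal sketch of a spike-triggered-average STRF is given here. It assumes the stimulus has already been binned into a spectrogram and the spikes into matching time bins; the function name and lag convention are illustrative, and the trial-to-trial reliability metric used to classify units as tuned or untuned is not reproduced.

```python
import numpy as np


def strf_reverse_correlation(spectrogram, spike_counts, n_lags):
    """Spike-triggered average of the stimulus.

    spectrogram: (n_freqs, n_time) stimulus energy per time bin.
    spike_counts: (n_time,) spikes per time bin.
    Returns (n_freqs, n_lags); column k is the mean stimulus k bins
    before a spike, relative to the overall stimulus mean.
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    spike_counts = np.asarray(spike_counts, dtype=float)
    n_freqs, n_time = spectrogram.shape
    centered = spectrogram - spectrogram.mean(axis=1, keepdims=True)
    strf = np.zeros((n_freqs, n_lags))
    for lag in range(n_lags):
        # Pair the stimulus at time t - lag with the spikes at time t.
        strf[:, lag] = centered[:, : n_time - lag] @ spike_counts[lag:]
    return strf / max(spike_counts.sum(), 1.0)
```

With a spectrally and temporally uncorrelated stimulus, the largest entry of the returned array marks the frequency and latency that most reliably precede spikes, which is the sense in which a best frequency can be read from the excitatory field.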
An important caveat is that the RDS stimuli may not capture all units with some degree of tuning preference. As such, a conservative interpretation would be that group-level suppression during auditory attention is driven by units that do not exhibit strong tuning preferences. Additionally, both tuned and untuned populations contained units with significant evoked responses to the two TCs, although fractions of responsive units were higher in the tuned group (Figure 6D.c). This shows that an absence of STRF tuning does not imply that units were not responsive to the task stimuli.
For the tuned group, does frequency preference determine degree of attentional modulation? We measured the best frequency (BF) of the excitatory field in each tuned STRF (Figure 6F). Consistent with previous work showing that task demands shape frequency representation in AC (Atiani et al., 2009; Fritz et al., 2005; Fritz et al., 2003; Yin et al., 2014), we found a strong BF preference for a 1-octave band around the center frequency of the rewarded TC (Figure 6G). Furthermore, distributions of BFs measured during the A-rule and V-rule were strikingly similar. This suggests that in our task, AC had shifted its frequency representation in a manner that was not rule-dependent. To test whether modulation by rule was dependent on tuning, we next divided units by their BF, as measured from the A-rule STRF, into groups near center frequency of AR (±0.5 octaves), near AU or with a BF outside of either band. No difference between the tuning groups was observed for responses to AR* or AU* TCs (Kruskal-Wallis non-parametric ANOVA, all p>0.12, all H<5.5, FDR-adjusted p-values; Figure 6—source data 1C, D), suggesting that frequency tuning does not determine suppression or enhancement by attention in this task.
Attention to sound increases encoding efficiency in deep-layer BS units
Previous work has established that FR changes do not necessarily imply changes in the amount of information spikes carry about sensory stimuli. For instance, optogenetic activation of inhibitory interneurons can reduce FRs in AC without changing information, suggesting increased encoding efficiency (Phillips and Hasenstaub, 2016). By contrast, locomotion reduces both FRs and information in AC (Bigelow et al., 2019). To determine whether reduced FRs evoked by decision stimuli were accompanied by changes in information or encoding efficiency, we used a peristimulus time histogram (PSTH)-based neural pattern decoder to compare sound discrimination across attentional states (Foffani and Moxon, 2004; Hoglen et al., 2018; Malone et al., 2007). For each unit, the decoder generates a single-trial test PSTH and then compares these to two or more template PSTHs from different stimulus response conditions, generated sans test trial (Figure 7A). The test trial is assigned to the template that is closest in n-dimensional Euclidean space, reflecting n PSTH bins. This is repeated for all trials, generating new templates for each classifier run. After all trials have been classified, a confusion matrix is generated. From this, we calculated accuracy of classification, MI (bits), and encoding efficiency, a spike-rate-normalized MI (bits/spike). As in previous analyses, a 0–300 ms post-stimulus onset window was used in this method to restrict decoding to a predominantly sensory-driven component of the response. The binwidth for generating PSTHs was 30 ms (Hoglen et al., 2018). Only trials with correct responses (hits and CRs) and units with a minimum stimulus response FR of 1 Hz to both stimuli used in the decoder comparison were included.
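The leave-one-out classification procedure described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' code: it assumes single-trial PSTHs have already been binned (for example, spike counts in 30 ms bins over 0–300 ms), and the function names and tie-breaking rule are assumptions.

```python
import numpy as np


def psth_decode(psths_by_stimulus):
    """Leave-one-out nearest-template classification.

    psths_by_stimulus: list with one (n_trials, n_bins) array per
    stimulus condition; each condition needs at least two trials.
    Returns a confusion matrix indexed [true stimulus, assigned stimulus].
    """
    n_stim = len(psths_by_stimulus)
    confusion = np.zeros((n_stim, n_stim), dtype=int)
    for true_idx, trials in enumerate(psths_by_stimulus):
        for test_idx in range(trials.shape[0]):
            test_psth = trials[test_idx]
            distances = []
            for cand_idx, cand_trials in enumerate(psths_by_stimulus):
                if cand_idx == true_idx:
                    # Build the own-condition template without the test trial.
                    others = np.delete(cand_trials, test_idx, axis=0)
                    template = others.mean(axis=0)
                else:
                    template = cand_trials.mean(axis=0)
                distances.append(np.linalg.norm(test_psth - template))
            # Assumption: ties go to the lowest-index template (argmin).
            confusion[true_idx, int(np.argmin(distances))] += 1
    return confusion


def decoder_accuracy(confusion):
    """Fraction of trials assigned to the correct template."""
    return np.trace(confusion) / confusion.sum()
```

Because the test trial is excluded from its own template, accuracy above chance reflects response patterns that are consistent across trials instead of a trial matching itself.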
Figure 7—source data 1: https://cdn.elifesciences.org/articles/75839/elife-75839-fig7-data1-v2.xlsx
Figure 7—source data 2: https://cdn.elifesciences.org/articles/75839/elife-75839-fig7-data2-v2.xlsx
Figure 7—source data 3: https://cdn.elifesciences.org/articles/75839/elife-75839-fig7-data3-v2.xlsx
Figure 7—source data 4: https://cdn.elifesciences.org/articles/75839/elife-75839-fig7-data4-v2.xlsx
Figure 7—source data 5: https://cdn.elifesciences.org/articles/75839/elife-75839-fig7-data5-v2.xlsx
We found that task rule could be decoded at greater than chance levels from responses to all four AV stimuli, and at all depth and NS/BS groups, showing that attentional state modulates decision stimulus PSTH responses throughout AC (Figure 7—figure supplement 1; Figure 7—source data 1). These comparisons suggest response modulation by task rule, but do not address how information processing changes across the rules. To test this, we used the decoder to compare accuracy in discriminating between responses to AR* (rewarded in A-rule) and responses to AU* (unrewarded in A-rule) bimodal stimuli across A-rule and V-rule conditions. This mimics the TC discrimination required by the mice during the A-rule. In both rules, classification accuracy for the auditory decision stimuli (AR*, AU*) was higher than chance for all depth and BS/NS groups (see scatter plots in Figure 7B; all p≤1.4e-05, all |Z|≥4.2, one-way WSR vs. chance [50%]; see Figure 7—source data 2A for stats). Sound classification accuracy (AR*, AU*) did not significantly differ across the A-rule and V-rule (Figure 7B, AR* vs. AU* comparison across rules: all p≥0.17, all |Z|≤1.39, see Figure 7—source data 3A for full stats; paired WSR on decoder accuracy in A-rule vs. V-rule by depth and NS/BS groups). Despite a reduction in activity levels during auditory attention, there was no loss in decoder accuracy, suggesting a possible change in encoding efficiency.
Through analysis of all decoder runs, we found that classifier accuracy and raw information were indeed correlated with FR (accuracy: r(3001)=0.49, p=2.3e-180; MI: r(3001)=0.41, p=1.5e-123; Pearson’s correlation, all AR* vs. AU* decoder runs). Thus, normalizing information by mean joint per-trial spike rate for the two responses in each decode (bits/spike) provides insight into the efficiency with which spikes are used to represent stimuli. We found that this encoding efficiency measure increased by ~20% during the A-rule for deep-layer BS units (Figure 7C, AR* vs. AU* comparison across rules: deep BS: p=2.9e-04, Z=–4.06, paired WSR, FDR-adjusted p-value; median FC: 1.19 [fold change: A-rule/V-rule]; V-rule: 0.15±0.13, A-rule: 0.19±0.19, mean bits/spike ± SDs, n=233; all other groups p≥0.40, all |Z|≤1.47; see Figure 7—source data 4A for full stats). No other unit subpopulations showed significant changes. Note that for clarity, the above results are presented as the mean of decoder comparisons ARVR vs. AUVR and ARVU vs. AUVU, thus collapsing visual stimulus identity. Analysis of these comparisons separately yields highly similar results (Figure 7—figure supplement 2; Figure 7—source data 2–4), suggesting that visual stimulus identity does not contribute substantially to decoder accuracy or encoding efficiency at the level of group analysis.
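To make the normalization concrete, the sketch below computes MI from a decoder confusion matrix and divides it by a spike count to obtain bits/spike. It is a plug-in estimate without bias correction, and treating the denominator as the mean spike count per trial in the decoding window, averaged over the two responses, is an interpretation of the text, not a confirmed detail. Function names are hypothetical.

```python
import numpy as np


def confusion_mutual_information(confusion):
    """Plug-in MI (bits) between true and assigned stimulus labels."""
    joint = np.asarray(confusion, dtype=float)
    joint = joint / joint.sum()
    p_true = joint.sum(axis=1, keepdims=True)
    p_assigned = joint.sum(axis=0, keepdims=True)
    independent = p_true @ p_assigned
    nonzero = joint > 0  # zero cells contribute nothing to the sum
    ratio = joint[nonzero] / independent[nonzero]
    return float(np.sum(joint[nonzero] * np.log2(ratio)))


def encoding_efficiency(confusion, mean_spikes_per_trial):
    """MI normalized by spiking (bits/spike).

    mean_spikes_per_trial: assumed here to be the mean spike count per
    trial in the decoding window, averaged over the decoded responses.
    """
    return confusion_mutual_information(confusion) / mean_spikes_per_trial
```

For two equally likely stimuli, perfect classification yields 1 bit and chance classification yields 0 bits, so a unit that keeps the same confusion matrix while firing fewer spikes has a higher bits/spike value.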
Receptive fields mapped during the ITI also show increased stimulus encoding efficiency
The analyses above revealed that auditory attention increased the per-spike encoding efficiency of task decision sounds. Does this effect of cross-modal attention switching generalize to encoding of sounds that were explicitly designed to be task-irrelevant? Answering this helps determine whether the attentional modulation observed here is specific to features of the auditory stream or broadly alters encoding of incoming auditory information. To address this, we tested whether information between STRFs derived from task-irrelevant ITI sounds and spike trains was modulated by attentional demands of the task. We restricted our analyses to only those units with STRFs passing the reliability criterion shown in Figure 6B. To calculate STRF-spike train MI for each SU, we first calculated probability distributions of STRF-stimulus projection values for all stimulus time points (P(x)) and for those time points preceding a spike (P(x|spike); Figure 7D). Intuitively, these projection values reflect the similarity between a windowed stimulus segment at a given timepoint and the STRF. The divergence of the two projection distributions is captured in a spike-rate-normalized MI measure (bits/spike; encoding efficiency), which describes the reliability with which spikes are determined by stimulus features of the STRF (Figure 7E). No differences in encoding efficiency between conditions were observed in the superficial or middle BS/NS groups, or the deep NS group. Instead, consistent with our earlier findings for decision stimuli, encoding efficiency showed a significant A-rule increase in the deep BS subpopulation (Figure 7F; deep BS: p=0.014, Z=–3.05, median FC: 1.25, n=50; paired WSR, FDR-adjusted p-value; FC: A-rule/V-rule; mean bits/spike ± SDs; all other groups p≥0.24, all |Z|≤1.66; see Figure 7—source data 5 for full stats). This finding shows that during auditory attention, stimulus encoding is better described by a linear STRF filter and thus better tracks physical sound features.
Furthermore, it suggests that increased encoding efficiency resulting from decreased spiking is a general effect of auditory attention in deep-layer BS units, regardless of the context-based behavioral relevance or learned valence of the sounds.
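The STRF-spike train MI described above can be sketched as the divergence between P(x|spike) and P(x), estimated from histograms of projection values. The formula used here, the sum over bins of P(x|spike) log2[P(x|spike)/P(x)], is the standard single-spike information and is assumed to correspond to the measure in the text; the function name, the number of histogram bins, and the lag convention are illustrative choices.

```python
import numpy as np


def strf_information(strf, spectrogram, spike_counts, n_bins=15):
    """Information (bits/spike) between STRF projections and spikes.

    strf: (n_freqs, n_lags); column k weights the stimulus k bins back.
    spectrogram: (n_freqs, n_time); spike_counts: (n_time,).
    """
    strf = np.asarray(strf, dtype=float)
    spectrogram = np.asarray(spectrogram, dtype=float)
    n_lags = strf.shape[1]
    n_time = spectrogram.shape[1]
    projections = np.empty(n_time - n_lags + 1)
    for i, t in enumerate(range(n_lags - 1, n_time)):
        # Stimulus window ending at t, flipped so column k = time t - k.
        window = spectrogram[:, t - n_lags + 1 : t + 1][:, ::-1]
        projections[i] = np.sum(window * strf)
    spikes = np.asarray(spike_counts, dtype=float)[n_lags - 1 :]
    if spikes.sum() == 0:
        return float("nan")
    edges = np.histogram_bin_edges(projections, bins=n_bins)
    p_x, _ = np.histogram(projections, bins=edges)
    p_x_spike, _ = np.histogram(projections, bins=edges, weights=spikes)
    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()
    valid = (p_x > 0) & (p_x_spike > 0)
    ratio = p_x_spike[valid] / p_x[valid]
    return float(np.sum(p_x_spike[valid] * np.log2(ratio)))
```

When spiking is unrelated to the projection, the two distributions coincide and the measure is zero; the more spikes are confined to a narrow range of projection values, the larger it becomes, which is why it can rise even as overall firing falls.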
Information encoding efficiency changes are driven by suppressed units
The increase in A-rule encoding efficiency and decrease in average FRs in deep AC led us to further explore the relationship between activity level and information changes. Specifically, we tested whether group-level information efficiency changes are driven by SUs with suppressed responses, and how the minority of units with increased A-rule FRs perform in the decoder. We therefore examined classifier accuracy and encoding efficiency for target and distractor (AR* vs. AU*) decoding separately for deep-layer BS units with increased and decreased FRs in the A-rule (Figure 7—figure supplement 3). We found that units with increased FRs (39%; n=96) exhibited a significant increase in A-rule decoding accuracy (Figure 7—figure supplement 3C; p=0.0030, Z=–2.97, med. FC: 1.04, V-rule % correct: 66.4±15.5, A-rule: 69.5±15.5, n=96; paired WSR; FC: A-rule/V-rule; mean ± SDs), but no significant change in encoding efficiency (p=0.84, Z=0.2, V-rule bits/spike: 0.18±0.15, A-rule: 0.18±0.16). By contrast, units with suppressed FRs (60%; n=146) showed no significant change in decoding accuracy (Figure 7—figure supplement 3D; p=0.44, Z=0.77, V-rule: 67.32±15.25, A-rule: 66.53±14.61), but a 44% increase in encoding efficiency (p=1.8e-07, Z=–5.22, median FC: 1.44, V-rule: 0.13±0.12, A-rule: 0.18±0.19; paired WSR). These results suggest that the minority of units that increase FRs in the A-rule perform marginally better at decoding the auditory stimulus, and that the units that decrease FRs drive the shift in encoding efficiency.
Attention-related FR changes predict correct task performance
To ensure that mice were adequately engaged and attentive in the task, the analyses described above excluded any trials in which the incorrect behavioral response was made. However, an examination of these error trials, which may correlate with lapses in attention, could provide insight into the moment-to-moment behavioral relevance of the attentional effects described above. We have shown that attention to sound is marked by a net suppression of pre-stimulus and evoked FRs. We hypothesized that, if this attentional modulation is behaviorally meaningful, FRs preceding A-rule error trials may be more similar to sound-unattended V-rule trials than to A-rule correct trials. We addressed this possibility by comparing pre-stimulus FRs in error vs. correct trials (300 ms prior to stimulus onset; Figure 8B). Because misses were uncommon (Figure 8A), we restricted our analysis to the comparison of FA and CR trials to allow for adequate sampling of each trial outcome. We included only behavior sessions with at least 10 FA and CR trials (A-rule and V-rule trials considered separately). This decreased unit sample sizes (n=234, 58 across all depth groups for BS, NS; min. group size = 9, 2 for BS, NS). Given the small sample of NS units and the likelihood of insufficient power, NS units were not included in this analysis. When considering BS units with increased FRs in the A-rule, we found no significant group-level difference between A-rule FA and CR trials at any cortical depth (Figure 8C; Figure 8—source data 1A for full stats; all p≥0.29, all |Z|≤1.43, paired WSR, FDR-adjusted p-values). However, deep cortical BS units with A-rule suppression showed significantly higher pre-stimulus FRs prior to A-rule FA trials than CR trials (deep BS [n=98]: mean FR difference between pre-stimulus FA and CR trials = 0.35 Hz, p=0.0098, Z=–3.15; paired WSR; other depth groups: p≥0.28, |Z|≤1.48; all p-values FDR-adjusted). 
This is unlikely to reflect a motor effect of higher FR before a lick, as it was specific to the A-rule: pre-stimulus FRs in A-rule-suppressed or A-rule-enhanced units did not differ between FA and CR trials in the V-rule (Figure 8C.c; Figure 8—source data 1B; paired WSR: all p≥0.68, all |Z|≤1.59, FDR-adjusted p-values). Together, these findings suggest that FR reductions typical of modality-selective attention directly relate to behavioral outcomes.
Discussion
In the present study, we recorded SU activity across AC layers in mice performing an AV rule-switching task. We compared responses evoked by identical stimuli under conditions of auditory or visual modality-selective attention. Attention to sound shifted AC stimulus representation by decreasing activity of untuned units and increasing encoding efficiency in the deep cortical laminae. Pre-stimulus activity was also reduced by auditory attention, which accounted for changes in stimulus-evoked responses. The effects of attention extended beyond the decision stimuli required to complete the task; responses to task-irrelevant receptive field mapping stimuli exhibited similar reductions in evoked activity and increases in encoding efficiency, suggesting that attention to sound induces a stimulus-general shift in processing. This attentional shift was behaviorally meaningful, with error trials in the A-rule predicted by higher FRs in the set of units that is suppressed under auditory attention. Taken together, these results show that attending to sound results in a general suppression of ongoing activity in AC, while retaining activity critical for sensory representation.
Attentional highlighting of behaviorally relevant signals may employ multiple mechanisms, including response enhancement or noise suppression. Feature selective attention studies have shown that FRs for neurons tuned to attended features are often increased, thereby increasing the reliability of the sensory cortical readout (Desimone and Duncan, 1995; Moran and Desimone, 1985; Reynolds and Chelazzi, 2004). Another mechanism which may act in tandem with response enhancement is the reduction of noise to improve encoding reliability. Noise reduction may act through decreased rates in pre-stimulus baseline activity (Buran et al., 2014), reduced variance in single neuron rates (Mitchell et al., 2007), or decreased correlations of noise across the population (Cohen and Maunsell, 2009; Downer et al., 2015). In the present study, we found no evidence for increased signal-to-noise ratio in the FR signal, as shown by the roughly equal stimulus response magnitudes across rules when normalizing for pre-stimulus rate. However, the timing of activity in AC is known to carry substantial information (Hoglen et al., 2018; Krishna and Semple, 2000; Malone et al., 2007), which would not be captured by coarse rate estimations. By accounting for fine scale temporal patterns with a PSTH-based pattern classifier and analysis of stimulus-STRF selectivity, we show that decreased ongoing activity and a concomitant increase in encoding efficiency at the group level provide an additional mechanism for attentional noise reduction, perhaps refining the stimulus-encoding portion of the neural signal for readout in downstream brain areas.
Previous studies of behavioral state-dependent changes in auditory processing have typically compared task-engaged and passive sound processing. While this paradigm does not specifically isolate the effects of attention due to confounds of arousal, attention, reward expectation, and motor activity (Saderi et al., 2021), it has provided valuable insight into the dependence of sensory processing on task-engaged behavioral states. Consistent with our findings, this work has shown that AC stimulus-evoked spiking responses are predominantly suppressed during self-initiated task engagement when compared to passive listening (Bagur et al., 2018; Carcea et al., 2017; Kuchibhotla et al., 2017; Otazu et al., 2009). Activity levels preceding a stimulus may also decrease (Buran et al., 2014; Carcea et al., 2017), although some studies in AC do not show this effect (Otazu et al., 2009). Reductions of pre-stimulus activity during task engagement have also been observed in rat gustatory cortex (Yoshida and Katz, 2011) and monkey visual cortex (Bisley and Goldberg, 2003; Cox et al., 2019; Herrington and Assad, 2010; Sato and Schall, 2001).
Neuronal stimulus preferences relative to a target have been shown to determine the degree of attentional modulation, such that stimulus-evoked responses for attended features are generally enhanced but can also be suppressed for features outside of the receptive field (Reynolds and Chelazzi, 2004). Here, we find that frequency preferences of units with STRF tuning do not appear to determine suppression or enhancement within the task, but critically we also find that the bulk of units with STRF tuning exhibit a preference for frequencies near the rewarded TC (Figure 6G). This is consistent with a body of work from Shamma, Fritz, and colleagues showing that engagement in an auditory discrimination task rapidly shifts AC receptive fields to enhance frequency representation of behaviorally relevant stimuli (Atiani et al., 2009; Fritz et al., 2005; Fritz et al., 2003; Yin et al., 2014). In our task, mice were trained for multiple months prior to physiological recordings, and the TC frequencies of rewarded and unrewarded stimuli were held consistent for each animal. As such, spectral representation in the AC of our highly trained mice is biased toward task-relevant stimuli. Speculatively, it is possible that tuning-dependent attentional modulation may occur in earlier stages of task acquisition, but that the substantial reconfiguration of sound processing tailored to the task alters its expression after training. The distribution of preferred frequencies also does not shift between auditory and visual rules, suggesting that attending to visual stimuli does not place plasticity-inducing demands on AC frequency representation. Instead, we find that units without STRF tuning drive the reduction in neural activity during auditory attention. An important caveat is that our STRF-based approach is only one way to determine AC tuning, and other stimulus and analysis methods may reveal additional tuning preferences. 
Nevertheless, we believe that this method provides a useful classification for degree of tuning. This result is also consistent with our information theoretic analyses in that both suggest that attention to sound may selectively remove spikes that are minimally sound-driven.
As in previous studies, attention-related modulation was not uniformly expressed across cortical depths and neuron types. Changes in both FR and encoding efficiency were most prominent in deep-layer neurons. These findings extend several previous studies reporting larger effects of attention in infragranular LFP and multi-unit activity (MUA) (O’Connell et al., 2014; Zempeltzi et al., 2020). These physiological outcomes are consistent with anatomical work suggesting that top-down modulatory signals arrive primarily in the supragranular and infragranular layers (Felleman and Van Essen, 1991). As the main cortical output layer, information shifts in the infragranular population would differentially influence subcortical sites and other cortical regions (Salin and Bullier, 1995). One important caveat is that superficial AC is known to have lower spontaneous and evoked FRs than deeper cortex (e.g., Figure 4C; Christianson et al., 2011; Sakata and Harris, 2009), which may have made it more difficult for us to observe statistically significant attention-related effects. Furthermore, although we tried to minimize neural tissue damage through technical considerations such as using a slow probe insertion speed (Fiáth et al., 2019), the superficial layers likely sustain the greatest level of damage when the probe is inserted to span the full cortical depth. Despite these factors, we were able to isolate a reasonably large sample size of responsive neurons in superficial cortex from successful behavior sessions (n=119 units, of which 57% were stimulus-responsive). Nevertheless, we cannot rule out whether the absence of observed attentional modulation at superficial depths may have been due to experimental limitations such as the comparatively small sample size. Future work employing imaging techniques to target superficial neurons may help resolve this.
Previous studies have reported larger effects of task engagement or attention in inhibitory interneurons (Kuchibhotla et al., 2017; Mitchell et al., 2007). As such, attention-related reduction of activity could be sustained by inhibitory network drive. Our approach of dividing activity into BS and NS did not suggest a general increase in NS activity during auditory attention. However, we observed heterogeneous types of modulation; in many units, NS activity decreased during auditory attention, but in a smaller group, there was a significant increase. An important caveat is that the BS/NS distinction is an imperfect approximation of excitatory/inhibitory activity, with many inhibitory cell types presenting a BS waveform phenotype (e.g., somatostatin-positive interneurons; Li et al., 2015). An alternative mechanism is that excitatory drive is decreased during auditory attention. These two proposed mechanisms – increased inhibitory tone and decreased excitatory drive – are not mutually exclusive.
Our findings suggest that attentional selection is achieved by removal of a noise background on which sound stimulus-encoding activity sits. This is in line with an influential theory of cortical attention that posits that spontaneous activity fluctuations partly reflect internal processes such as mental imagery or memory recall, in contrast with activity that arises from external sensory stimulation (Harris and Thiele, 2011). In this model, attention suppresses internally generated spontaneous activity to favor the processing of behaviorally relevant external stimulation. The work presented here offers multiple pieces of evidence in favor of this theory. Auditory attention suppresses activity in untuned units, affecting both pre-stimulus and stimulus-evoked activity. This activity reduction does not alter stimulus-spike train decoding accuracy, but instead increases stimulus encoding efficiency and preserves stimulus representation.
In summary, we demonstrate a novel connection between attention-induced shifts in activity levels and stimulus encoding in early sensory cortex, which are directly related to behavioral outcomes. Previous research suggests that such effects reflect top-down control by executive networks comprising frontal, parietal, thalamic, and striatal areas (Cools et al., 2004; Crone et al., 2006; Licata et al., 2017; Rikhye et al., 2018; Rougier et al., 2005; Toth and Assad, 2002; Wimmer et al., 2015). These networks may act as a context-dependent switch, routing attentional modulatory feedback to the sensory systems. In the present study, we provide evidence that such modulation specifically suppresses stimulus-irrelevant spiking, thus enhancing encoding efficiency in deep AC neurons.
Materials and methods
Animals
All experiments were approved by the Institutional Animal Care and Use Committee at the University of California, San Francisco. Twenty-seven C57BL/6 background male mice were surgically implanted with a headpost and began behavioral training, of which 10 completed the training and successfully performed the task during physiology recording sessions. All mice began the experiment between ages P56 and P84. Mice used in this report expressed optogenetic effectors in various subsets of interneurons, which we intended to use for optogenetic identification of cells (Lima et al., 2009; analysis not included here). These mice were generated by crossing an interneuron subpopulation-specific Cre driver line (PV-Cre JAX Stock Nr. 012358; Sst-Cre: JAX Stock Nr. 013044) with either the Ai32 strain (JAX Stock Nr. 012569), expressing Cre-dependent eYFP-tagged channelrhodopsin-2, or the Ai40 strain (JAX Stock Nr. 021188), expressing Cre-dependent eGFP-tagged archaerhodopsin-3. Of the 10 behavior mice included in this report, 6 were Ai32/Sst-Cre, 3 were Ai32/PV-Cre, and 1 was Ai40/Sst-Cre. In most experiments (n=21 recordings), brief, low-level optogenetic pulses during the ITI of the task were used to identify opsin-expressing neurons (<0.3 mW light; 5 light pulses of 10 ms duration, every ~1.5 min); these analyses are outside of the scope of this report. The optogenetic stimulation protocol was consistent through A- and V-rules of the task. Unit stimulus response FRs and behavioral response error rates were not statistically different between trials immediately after optogenetic pulses and stimulus-matched trials preceding the pulses.
All mice were housed in groups of 2–5 for the duration of the behavioral training until the craniotomy. Post-craniotomy and during physiology recordings, mice were housed singly (up to 6 days) to protect the surgical site. Mice were kept in a 12 hr/12 hr reversed dark/light cycle. All training occurred during the dark period, when mice show increased activity and behavioral task performance (Roedel et al., 2006).
AV rule-switching behavior task
Adult mice (>P56) were trained on an AV go/no-go rule-switching behavior task. In this task, mice were positioned on a floating spherical treadmill in front of a monitor and a speaker, and an optical computer mouse recorded treadmill movement. Mice licked to receive a reward depending on auditory, visual, or AV stimulus presentation (‘decision’ stimuli, either ‘target’ or ‘distractor’), but the modality predictive of the reward changed partway through the behavioral session. Each session would start with a unimodal go/no-go block, in which a series of auditory (AR, AU; 17 or 8 kHz TC) or visual (VR, VU; upward or rightward moving gratings) stimuli was presented. After stimulus presentation, mice signaled choice by either licking a spout in front of the mouth or withholding licking. Licking at the target unimodal stimulus would trigger a water reward, while licking at the distractor would trigger a short dark timeout. After a fixed number of unimodal trials, the stimuli would become AV, but the rule for which stimulus modality predicted reward would carry over from the unimodal block. All four stimulus combinations (ARVR, ARVU, AUVR, AUVU) would be presented in the AV block, such that two AV combinations would be target stimuli and two would be distractor. Then, after completing a fixed number of trials in the AV block, the task using the rule of the opposite modality would begin; a unimodal block with the other modality would start, followed by a second AV block using the rule from the preceding unimodal block. For any mouse, the stimuli predictive of the reward in each rule were kept constant across days and training sessions (e.g., a 17 kHz TC would always predict a reward in the A-rule, and a rightward grating would always predict a reward in the V-rule).
The task was self-paced using a virtual foraging approach, in which mouse locomotion (measured through treadmill rotation) would cause a track of randomly placed dots on the screen to move downward. After a randomly varied virtual distance, a decision stimulus would be presented, at which point the mouse would lick or withhold licking to signal choice. For receptive field mapping during physiology experiments, an RDS stimulus was presented in-between decision stimuli, during the inter-trial track portion. Stimuli are detailed below.
Behavior training and apparatus
Prior to any training, mice were surgically implanted with a stainless steel headplate, used both for head fixation during the task and for physiology recordings after the task was learned (surgical methods described below). Three days post-implant, mice began a water restriction protocol based on previously published guidelines (Guo et al., 2014). Throughout the course of training, mice received a minimum water amount of 25 mL/kg/day, based on weight at time of surgical implant. After recovery from surgery, mice were given ~7 days to adjust to water restriction. Then, mice were head fixed and habituated to the floating treadmill for 15–30 min daily sessions with no stimulus presentation for 2–3 days. After mice appeared comfortable on the treadmill, a phased behavioral task training regimen began. Mice were trained once daily for ~6 days per week. On day 1, mice were introduced to an auditory-only (A-only) stimulus training version of the task in which AR (‘target’/‘rewarded’) or AU (‘distractor’/‘unrewarded’) stimuli were presented, and a reward would be automatically administered shortly after the onset of AR. Next, the mice were put on an operant version of the A-only task, which required licking any time after the onset of AR to receive a reward and withholding of licking during AU to avoid a dark timeout punishment. Mice achieved proficiency, defined as 2 or more consecutive days of sensitivity index d’>1.5 (see Data analysis for calculation), on the A-only task 11.0±4.7 days after the start of training (median ± SD, n=10 successful mice). Then, a similar training structure was repeated for the visual task: V-only stimulus training with automatic rewards for VR, but not VU, followed by an operant version of the visual task requiring licks for rewards (median time to proficiency: 26.0±7.2 days after start).
After learning the tasks for each modality separately, mice were introduced to an auditory-AV (A-AV) version, in which the rule from the auditory stimulus carried over to the AV block. This was intermixed with training days on a visual-AV (V-AV) version of the task. Number of training days on A-AV or V-AV were decided based on prior performance, with extra training given as needed. Mice were considered proficient at this stage after performing with d’>1.5 on each rule (A-AV; V-AV) on 2 consecutive days (median time to proficiency: 40.0±15.8 days after start). Finally, the full rule-switching task was introduced (Figure 1D), generally alternating between days of V-rule-first and the A-rule-first task sequences but allocating more training days to task orders as needed. Because physiology recordings were acute and strictly limited to 6 days after craniotomy, we set a greater threshold for expert-level performance on the full task before advancing to physiology: 3 consecutive days of d’>2.5 (median time to expertise: 90.5±31.8 days). Care was taken to train each mouse at a roughly consistent time of day (no more than ~1–2 hr day-to-day variation). During expert-level task performance, mice typically completed 260–300 trials in a daily session (30 A-only; 100–120 A-AV; 30 V-only; 100–120 V-AV).
The behavior training setup was controlled by two computers: a behavior monitoring and reward control PC (OptiPlex 7040 MT, Dell) and a dedicated stimulus presentation machine running Mac OS X (Mac Mini, Apple). Stimulus presentation was controlled with MATLAB using custom software (https://github.com/HasenstaubLab/AVtrainer-stim/tree/main/demo; copy archived at swh:1:rev:737720f41fd5302b90fd5e60a10822270381818c;path=/demo/; Morrill et al., 2022), and inter-machine communication used the ZeroMQ protocol. Auditory and visual stimuli were generated and presented using the Psychophysics Toolbox Version 3 (Kleiner et al., 2007). Water rewards were administered using a programmable syringe pump (NE-500, New Era Pump Systems, Farmingdale, NY), positioned outside of the sound-attenuating recording chamber. Early in training, water reward volume was set at 0.01 mL per correct response, but over training the reward volume was gradually decreased to 0.006 mL to achieve greater trial counts. Licking events were recorded using a custom photobeam-based lickometer circuit based on plans provided by Evan Remington (Xiaoqin Wang Lab, Johns Hopkins University). Licks were registered when an IR photobeam positioned in front of the lick tube was broken, queried at a sample rate of 100 Hz by an Arduino Uno microcontroller (Arduino, LLC).
In vivo awake recordings during behavior
Animals in this experiment underwent two surgeries. First, before training, a surgery was conducted to implant a custom steel headplate over the temporal skull using dental cement. The animal was anesthetized using isoflurane and the headplate was implanted over AC, ~2.5 mm posterior to bregma and under the squamosal ridge, to allow for physiology recordings after achieving task expertise. Second, when mice completed the training regimen outlined above, a craniotomy surgery was performed. The animal was again anesthetized using isoflurane and an elliptical opening (0.75 mm wide × 1.5 mm long) was made in the skull over AC using a dental drill. This opening was promptly covered with silicone elastomer (Kwik-Cast, World Precision Instruments), and the animal was allowed to recover overnight. The following day, the animal was affixed by its headplate over the treadmill inside of a sound-attenuating recording chamber, the silicone plug over the craniotomy was removed, and the craniotomy was flushed with saline. A silver chloride ground wire was placed into the craniotomy well at a safe distance from the exposed brain. A 64-channel linear probe (20 µm site spacing; Cambridge Neurotech, Cambridge, UK) was slowly inserted in the brain using a motorized microdrive (FHC, Bowdoin, ME) at a rate of ~1 μm/s (Fiáth et al., 2019). After reaching the desired depth, the brain was allowed to settle for 10 min, after which the water spout, lickometer, visual stimulus delivery monitor, and speaker were positioned in front of the mouse, and the behavior session commenced. Behavior sessions were sometimes stopped early and restarted due to poor performance. In approximately half of behavior-physiology sessions (13 of 23 successful recordings), the task was stopped due to low performance after the rule transition and restarted at the beginning (unimodal block) of the second rule.
To control for possible effects of task order, attempts were made to counterbalance recordings from A-rule first (15) and V-rule first (8) behavior sessions.
After completion of the behavior task, the water spout and lickometer were removed, and a series of auditory and/or visual passive experiments were conducted in order to characterize the response properties of the recording site. All stimuli were presented with the auditory and visual stimulation apparatus described above. Following completion of these experiments, the probe was slowly removed, and the brain was covered with a thin layer of freshly mixed 2.5% agarose in saline, followed by a layer of silicone elastomer. The animal was returned to its home cage, and the following day the physiological recording process was repeated. Recordings were made for up to 6 days after the craniotomy. The neural signal acquisition system consisted of an Intan RHD2000 recording board and an RHD2164 amplifier (Intan Technologies), sampling at 30 kHz.
Auditory and visual stimuli
In-task auditory decision stimuli were 1 s TCs, consisting of 50 ms tone pips overlapping by 25 ms, with frequencies in a 1-octave band around either 17 or 8 kHz. TCs were frozen for the duration of the task, so that each mouse always heard the same pip sequences, allowing for direct comparisons of sound-evoked neural responses across rules without concern that stimulus peculiarities may be driving observed differences. TCs were presented at 60 dB SPL. Visual decision stimuli consisted of a circular moving grating stimulus (33° diameter subtended visual space), which appeared at the center of the screen for 1 s (coincident with TC stimulus during bimodal presentation). Gratings moved either upward or rightward with a 4 Hz temporal frequency, 0.09 cycles/degree spatial frequency at 50% contrast. In-between decision stimulus presentations, an RDS stimulus was presented for receptive field mapping (Bigelow et al., 2022; Gourévitch et al., 2015). The RDS comprised two uncorrelated random sweeps that varied continuously and smoothly between 4 and 64 kHz, with a maximum sweep modulation frequency of 20 Hz. RDS stimuli were presented at 50 dB SPL.
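The timing arithmetic of the tone cloud can be illustrated with a short sketch. It is not the authors' stimulus code: it assumes pip onsets every 25 ms (50 ms pips overlapping by 25 ms), a band spanning half an octave on either side of the center frequency, uniform draws on a log-frequency axis, and a fixed random seed to mimic a 'frozen' sequence. The function name `tone_cloud_pips` and its parameters are hypothetical.

```python
import random

def tone_cloud_pips(center_khz, duration_ms=1000, pip_ms=50, overlap_ms=25, seed=0):
    """Return (onset_ms, frequency_khz) pairs for one frozen tone cloud (illustrative)."""
    rng = random.Random(seed)          # fixed seed: the same pip sequence on every call
    step_ms = pip_ms - overlap_ms      # successive pips overlap by 25 ms
    # Assumption: the last pip ends exactly at the end of the 1 s stimulus.
    onsets = range(0, duration_ms - pip_ms + 1, step_ms)
    pips = []
    for onset in onsets:
        # Assumption: uniform in log-frequency within +/- 0.5 octave of the center.
        octaves = rng.uniform(-0.5, 0.5)
        pips.append((onset, center_khz * 2 ** octaves))
    return pips

cloud = tone_cloud_pips(17.0)
print(len(cloud), cloud[0][0], cloud[-1][0])
```

Under these assumptions a 1 s cloud contains 39 pips, with onsets from 0 to 950 ms.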
After the behavior task, passive auditory search stimuli (pure tones, click trains) were presented to characterize response properties of the electrode channel. Click trains consisted of broadband 5 ms white noise pulses, presented at 20 Hz for 500 ms duration. Pure tone stimuli consisted of 100 ms tones of varied frequencies (4–64 kHz, 0.2 octave spacing) and sound attenuation levels (30–60 dB in 5 dB linear steps), with an interstimulus interval of 500 ms.
Auditory stimuli were presented from a free-field electrostatic speaker (ES1, Tucker-Davis Technologies) driven by an external soundcard (Quad-Capture or Octa-Capture, Roland) sampling at 192 kHz. Sound levels were calibrated using a Brüel & Kjær model 2209 meter and a model 4939 microphone. Visual stimuli were presented on a 19-inch LCD monitor with a 60 Hz refresh rate (Asus VW199), positioned 25 cm in front of the mouse and centered horizontally and vertically on the eyes of the mouse. Monitor luminance was calibrated to 25 cd/m² for a gray screen, measured at approximate eye level for the mouse.
Data analysis
Behavioral performance
Task performance was evaluated by calculation of the d’ sensitivity index:
$$d' = Z(H) - Z(F)$$
where H is hit rate and F is false alarm rate, and Z is the inverse normal transform. Because this transform is undefined for values of 0 or 1 and hit rates of 1 commonly occurred in this study, we employed the log-linear transformation, a standard method for correction of extreme proportions, for all calculations of d’ (Hautus, 1995). In this correction, a value of 0.5 is added to all elements of the 2×2 contingency table that defines performance such that:
$$H = \frac{\mathrm{hits} + 0.5}{\mathrm{hits} + \mathrm{misses} + 1}, \qquad F = \frac{\mathrm{FA} + 0.5}{\mathrm{FA} + \mathrm{CR} + 1}$$
where FA is the false alarm count and CR is the correct reject count. To ensure that mice properly transitioned between task rules, d’ values were calculated separately for responses in the A-rule and the V-rule. Behavioral sessions during physiological recording with d’<1.5 in either rule were excluded from analyses, as were any sessions with an FAR >0.5 to stimuli with conflicting reward valences across rules: AUVR in A-rule or ARVU in V-rule (n=23 successful sessions, n=10 mice; 1 session excluded due to recording artifact, see below).
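As a worked illustration of the corrected sensitivity index, the following minimal sketch computes d’ from trial counts with the log-linear correction. The function name `dprime_loglinear` and the example counts are hypothetical and do not come from the recorded sessions.

```python
from statistics import NormalDist

def dprime_loglinear(hits, misses, false_alarms, correct_rejects):
    """d' with the log-linear correction: 0.5 is added to every cell of the 2x2 table."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejects + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical session with a perfect raw hit rate; the correction keeps d' finite.
example_d = dprime_loglinear(hits=50, misses=0, false_alarms=5, correct_rejects=45)
print(round(example_d, 2))
```

Without the correction, the raw hit rate of 1 in this example would make Z(H) undefined.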
Spike sorting and unit stability evaluation
Spikes were assigned to unit clusters using KiloSort2 (KS2; Pachitariu et al., 2016). Clusters were first evaluated for isolation quality through the automated KS2 unit classification algorithm and then with a custom MATLAB interface. In this second step, clusters with non-neuronal waveforms or 2 ms refractory period violations >0.5% were removed from analysis (Laboy-Juárez et al., 2019; Sukiban et al., 2019). To evaluate stability, activity for each unit was plotted for the recording duration as a raster and binned spike counts (2 min bins) and manually examined for periods with a substantial dropoff in FR (periods flagged for instability: 88 ± 10% [mean ± SD] decrease in FR from median activity level). Flagged unstable periods were marked and removed from analysis (101/742 SUs with flagged durations >10% of recording time). One session meeting behavior performance criteria was excluded due to a high degree of electrical noise contamination.
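The refractory period criterion can be sketched as follows. The text does not state how the violation percentage was computed, so this example assumes it is the percentage of inter-spike intervals shorter than 2 ms; the function name and the toy spike times are hypothetical.

```python
REFRACTORY_S = 0.002        # 2 ms refractory period (from the text)
MAX_VIOLATION_PCT = 0.5     # exclusion threshold (from the text)

def refractory_violation_percent(spike_times_s, refractory_s=REFRACTORY_S):
    """Percentage of inter-spike intervals shorter than the refractory period.

    Assumption: violations are expressed relative to the number of intervals.
    """
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0
    n_violations = sum(1 for isi in isis if isi < refractory_s)
    return 100.0 * n_violations / len(isis)

# Toy cluster: one of four intervals is shorter than 2 ms.
toy_pct = refractory_violation_percent([0.0, 0.001, 0.1, 0.2, 0.3])
passes = toy_pct <= MAX_VIOLATION_PCT
print(toy_pct, passes)
```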
Classification of units by depth and waveform shape
Probes with electrode spans of 1260 µm were used, allowing for channels below and above AC. During recording, the probe was lowered to a point where several channels showed a prominent drop in field potential amplitude and spiking activity, indicating penetration into the white matter (Land et al., 2013). After behavior sessions, a set of auditory and visual stimulation protocols was used to map response properties of each electrode site, and MUA responses were analyzed. Here, we define MUA as threshold crossings of 4.5 SD above a moving window threshold applied to each channel. Analysis of MUA was restricted to site characterization and is not included in the main results. We analyzed each tone or click PSTH for reliable responses, which we defined as trial-to-trial similarity of p<0.01 (Escabí et al., 2014). We designated the deepest channel with a reliable MUA sound response of any magnitude as the deep cortex-white matter border. Limited somatic spiking in the top layer of cortex prevented the use of MUA as a reliable marker for the superficial cortex-pia border (Senzai et al., 2019), so we instead relied on an LFP-based measure. To define the top border of cortex, the maximum spontaneous LFP (1–300 Hz) amplitude of a 10 s snippet from each channel was plotted, and the channel at which LFP amplitude dropped off to the approximate probe-wise noise floor (i.e., minimum LFP amplitude) was considered the top channel in cortex (Figure 3B.c). These measures were confirmed histologically through Di-I probe marking experiments with a separate group of untrained mice; histology methods described below and elsewhere (Morrill and Hasenstaub, 2018). Marking the top and bottom cortical borders generated a span of channels putatively within AC.
This span was used to divide channels into superficial, middle, and deep groups, based on measurements of the fraction of cortex attributed to supragranular (layers 1–3), granular (layer 4), and infragranular (layers 5–6) in the mouse AC (Allen Institute Mouse Brain Atlas; https://mouse.brain-map.org/). SUs were assigned the fractional depth of the channel on which the largest magnitude waveform was recorded.
Clusters were also classified into BS (putatively excitatory) and NS (putatively fast-spiking inhibitory) units on the basis of the bimodal distribution of waveform peak-trough durations (Figure 3D; NS/BS transition boundary = 0.6 ms). From sessions with successful behavior, we recorded 742 SUs from all cortical depths, comprising 17.5% (130) NS units and 82.5% (612) BS units.
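The waveform-based split can be sketched as below. Only the 0.6 ms boundary and the 30 kHz sampling rate come from the text. The way the peak-trough duration is measured here (time from the waveform minimum to the following maximum) is one common convention and is an assumption, as are all function names and the toy waveforms.

```python
NS_BS_BOUNDARY_MS = 0.6  # NS/BS transition boundary stated in the text

def peak_trough_ms(waveform, sample_rate_hz=30000):
    """Time from the waveform trough to the following peak, in ms (assumed convention)."""
    trough = min(range(len(waveform)), key=waveform.__getitem__)
    peak = max(range(trough, len(waveform)), key=waveform.__getitem__)
    return (peak - trough) / sample_rate_hz * 1000.0

def classify_unit(duration_ms):
    # Assumption: a duration exactly at the boundary is labeled BS.
    return "NS" if duration_ms < NS_BS_BOUNDARY_MS else "BS"

def toy_waveform(trough_idx, peak_idx, n_samples=60):
    """Flat trace with a single trough sample and a single peak sample (illustrative)."""
    trace = [0.0] * n_samples
    trace[trough_idx] = -1.0
    trace[peak_idx] = 0.3
    return trace

narrow = classify_unit(peak_trough_ms(toy_waveform(10, 22)))  # 12 samples at 30 kHz
broad = classify_unit(peak_trough_ms(toy_waveform(10, 34)))   # 24 samples at 30 kHz
print(narrow, broad)
```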
FR analysis and trial filters
To compare FR responses to stimuli across task rules and to the receptive field mapping stimulus, we measured FR in the first 300 ms post-stimulus onset. Only units with nonzero FRs in both rules were included. To ensure that measurements were capturing periods of task engagement, all trials with incorrect responses (misses and FAs) were excluded from all decision-stimulus analyses, with the exception of those shown in Figure 8. We also excluded trials with recorded licks earlier than 300 ms post-stimulus onset, or in the 500 ms pre-stimulus onset. Given these filters, analyses were restricted to units present in the recording during at least 10 trials (correct behavioral choice and without ‘early licks’) for each stimulus type.
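The trial filters and the FR window can be summarized in a short sketch. The `Trial` container, the outcome labels, and the handling of window boundaries are hypothetical choices made for illustration; only the 300 ms response window, the 500 ms pre-stimulus lick exclusion, and the 10-trial minimum come from the text.

```python
from dataclasses import dataclass

MIN_TRIALS = 10  # minimum usable trials per stimulus type (from the text)

@dataclass
class Trial:
    outcome: str        # "hit", "miss", "fa", or "cr" (hypothetical labels)
    lick_times: list    # seconds relative to stimulus onset
    spike_times: list   # seconds relative to stimulus onset

def is_usable(trial):
    """Keep correct trials with no licks from 500 ms before to 300 ms after onset."""
    if trial.outcome not in ("hit", "cr"):
        return False
    return not any(-0.5 <= t < 0.3 for t in trial.lick_times)

def evoked_rate_hz(trial, window_s=0.3):
    """Firing rate in the first 300 ms after stimulus onset."""
    n_spikes = sum(1 for t in trial.spike_times if 0.0 <= t < window_s)
    return n_spikes / window_s

trials = [
    Trial("hit", [0.45], [0.02, 0.10, 0.25, 0.60]),  # kept: lick after the window
    Trial("hit", [0.10], [0.05]),                    # dropped: early lick
    Trial("fa", [0.50], [0.01, 0.02]),               # dropped: incorrect response
    Trial("cr", [], [0.15]),                         # kept
]
usable = [t for t in trials if is_usable(t)]
rates = [evoked_rate_hz(t) for t in usable]
enough_trials = len(usable) >= MIN_TRIALS
print(len(usable), rates, enough_trials)
```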
PSTH-based Euclidean distance decoding
A PSTH-based decoder was used to compute the MI between spike trains and stimulus identity (Figure 7A; Foffani and Moxon, 2004; Hoglen et al., 2018; Malone et al., 2007). In this method, responses to two or more stimuli are compared by generating a template PSTH for each stimulus from all trials except one held-out test trial. This test trial response is also binned into a single-trial PSTH, and then classified as belonging to the nearest template in n-dimensional Euclidean space, where n is the number of PSTH bins. More formally, the nearest template is that which minimizes the Euclidean norm between test and template vectors (PSTHs). This process is then repeated for all trials comprising the template PSTHs. Decoding accuracy is the percentage of trial responses that are correctly assigned to the stimuli that elicited them. MI is calculated from a confusion matrix of classifications as follows:
$$\mathrm{MI} = \sum_{i}\sum_{j} P(X_i, Y_j)\,\log_2\!\left(\frac{P(X_i, Y_j)}{P(X_i)\,P(Y_j)}\right)$$
where X is the decoder prediction, Y is the actual, P(XiYj) represents the value of the (i, j) element of the confusion matrix, and P(Xi) and P(Yj) are sums on the marginals. This yields a value of MI in bits. To measure encoding efficiency (bits/spike), we normalized MI by the joint mean spikes per trial of the responses submitted to the decoder (Bigelow et al., 2019; Buracas et al., 1998; Zador, 1998).
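The leave-one-out template decoder and the MI calculation can be illustrated with a short NumPy sketch. It assumes each stimulus is given as a trials × bins array of PSTHs and that the "joint mean spikes per trial" is the mean spike count over all trials submitted to the decoder. The function names, the array layout and the tie-breaking rule (first nearest template wins) are assumptions of this sketch, not details taken from the original analysis code.

```python
import numpy as np


def psth(spike_times, t_start=0.0, t_stop=0.3, bin_width=0.03):
    """Bin one trial's spike times (s, relative to onset) into a PSTH vector."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = np.linspace(t_start, t_stop, n_bins + 1)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts.astype(float)


def decode_loo(trials_by_stim):
    """Leave-one-out nearest-template decoding.

    trials_by_stim: list (one entry per stimulus) of 2D arrays, trials x bins.
    Returns a confusion matrix of counts, rows = predicted, columns = actual.
    """
    n_stim = len(trials_by_stim)
    confusion = np.zeros((n_stim, n_stim))
    for actual, trials in enumerate(trials_by_stim):
        for k in range(len(trials)):
            test = trials[k]
            dists = []
            for s, other in enumerate(trials_by_stim):
                # Exclude the test trial from its own stimulus template.
                pool = np.delete(other, k, axis=0) if s == actual else other
                dists.append(np.linalg.norm(test - pool.mean(axis=0)))
            confusion[int(np.argmin(dists)), actual] += 1
    return confusion


def mutual_information(confusion):
    """MI in bits from a confusion matrix of counts."""
    p_xy = confusion / confusion.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over predictions
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over actual stimuli
    nz = p_xy > 0                           # 0 * log(0) is taken as 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))


def bits_per_spike(confusion, trials_by_stim):
    """MI divided by the mean spike count per trial across all responses."""
    all_trials = np.vstack(trials_by_stim)
    return mutual_information(confusion) / all_trials.sum(axis=1).mean()
```

With two equiprobable stimuli, perfect decoding gives 1 bit and chance-level decoding gives 0 bits, which bounds the values this sketch can return for a pairwise comparison.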
For consistency with FR analyses, a time window of 0–300 ms, where stimulus onset is 0, was chosen for decoding analysis. A PSTH binwidth of 30 ms was chosen based on optimal binwidth calculations for mouse AC using the same decoding method (Hoglen et al., 2018). To filter out units with low responsiveness to any of the stimuli in a given decoding analysis, we required a minimum FR of 1 Hz during the 0–300 ms window in both stimulus conditions. As such, unit sets may differ between each decoding analysis due to units that were responsive to one set of stimuli but unresponsive to others.
STRF analysis
To test whether task rule modulates auditory receptive fields, we presented an RDS stimulus (described in Auditory and visual stimuli) between trials for durations of ~1–15 s, depending on rate of task progression. Different randomly generated RDS segments were presented in each ITI, and STRFs were generated separately for each rule. Because total RDS duration varied between the A-rule and the V-rule in a single session, we equated presentation time across rules by truncating the segments of the rule with greater RDS time (presentation time in each rule: 6.8±2.6 min [mean ± SD]; n=23 sessions). This ensured that different stimulus presentation times did not bias STRF estimation. The first 200 ms of RDS response was dropped from all STRF analyses to minimize bias from onset transients. SU activity during these short RDS segments was used to generate STRFs for each segment using standard reverse correlation techniques (Aertsen and Johannesma, 1981; de Boer, 1968; Gourévitch et al., 2015). In brief, the spike-triggered average was calculated by summing all stimulus segments that preceded spikes using a window of 200 ms before and 50 ms after each spike. The choice of 200 ms prior to each spike reflects the upper limit of temporal integration times of auditory cortical neurons (Atencio and Schreiner, 2013), and the 50 ms post-spike time was included to estimate acausal values, that is, those that would be expected by chance given the stimulus and spike train statistics (Gourévitch et al., 2015). STRFs were transformed into units of FR (Hz) using standard methods discussed elsewhere (Rutkowski et al., 2002). Units with poorly defined STRFs were filtered out using a trial-to-trial correlation metric (Escabí et al., 2014): STRF segments were randomly divided into two halves, re-averaged separately, and a correlation value was calculated for the two STRFs.
This process was then repeated 1000 times, and the mean of correlations defined the reliability value for each STRF. We compared the mean observed STRF reliability to a null distribution of reliabilities, generated by repeating the procedure on null STRFs made from circularly shuffled spike trains (preserving spike count and interspike interval but breaking the timing relationship between spikes and stimulus). A p-value was calculated as the fraction of the null STRF reliabilities greater than the mean observed STRF reliability, and STRFs with p<0.05 in either rule were included in subsequent analyses. Any STRFs from units with greater than 10% of recording duration marked as unstable were removed from analysis.
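The reverse correlation and reliability steps can be sketched as follows. The sketch assumes the stimulus is a frequency × time array sampled in the same time bins as the spike counts, and that a circular shift of the spike train is an acceptable stand-in for the circular shuffle described above. Function names and the binning are hypothetical, and the conversion to FR units is omitted.

```python
import numpy as np


def spike_triggered_average(stim, spikes, n_pre, n_post):
    """Average the stimulus window around each spike.

    stim: 2D array, frequency x time bins; spikes: spike counts per time bin.
    The window spans n_pre bins before to n_post bins after each spike bin.
    """
    n_freq, n_time = stim.shape
    sta = np.zeros((n_freq, n_pre + n_post))
    total = 0.0
    for t in range(n_pre, n_time - n_post):
        if spikes[t] > 0:
            sta += spikes[t] * stim[:, t - n_pre:t + n_post]
            total += spikes[t]
    return sta / total if total > 0 else sta


def split_half_reliability(segment_strfs, n_repeats=1000, rng=None):
    """Mean correlation between STRFs averaged over random halves of segments."""
    rng = np.random.default_rng() if rng is None else rng
    segment_strfs = np.asarray(segment_strfs)
    n = len(segment_strfs)
    corrs = []
    for _ in range(n_repeats):
        order = rng.permutation(n)
        half_a = segment_strfs[order[: n // 2]].mean(axis=0).ravel()
        half_b = segment_strfs[order[n // 2:]].mean(axis=0).ravel()
        corrs.append(np.corrcoef(half_a, half_b)[0, 1])
    return float(np.mean(corrs))


def circular_shift(spikes, rng):
    """Null spike train: same counts and intervals, random offset to stimulus."""
    return np.roll(spikes, rng.integers(1, len(spikes)))


def reliability_p_value(observed, null_reliabilities):
    """Fraction of null reliabilities greater than the observed reliability."""
    return float(np.mean(np.asarray(null_reliabilities) > observed))
```

A null distribution would be built by recomputing per-segment STRFs from `circular_shift` spike trains and passing each resulting set through `split_half_reliability`.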
MI between a spike train and an STRF was measured as the divergence of two distributions: one reflecting the similarity between the STRF and the windowed stimulus segments (RDS) preceding a spike, and the other reflecting the similarity between the STRF and all possible windowed stimulus segments, regardless of whether a spike occurred (Figure 7D; Atencio et al., 2008; Atencio and Schreiner, 2012; Escabi and Schreiner, 2002). Stimulus-STRF similarity was defined as the inner product of the STRF and the stimulus segment of equivalent dimensions, with higher values reflecting closer matches between the STRF and stimulus. The distribution P(z|spike) was generated from $z = s \cdot \mathrm{STRF}$, where s represents all RDS stimulus segments that preceded a spike. Then the distribution P(z) was made from similarity calculations of all possible windowed RDS segments and the STRF. The mean μ and the standard deviation (SD) σ of P(z) were calculated, and the distributions were transformed into units of SD: $x = (z - \mu)/\sigma$, yielding distributions of P(x|spike) and P(x) expressed in units of SD.
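Using the distributions described above, a spike count-normalized measure of MI (bits/spike) between the calculated STRF and the spike train can be calculated as:
$$\mathrm{MI} = \int P(x \mid \mathrm{spike})\,\log_2\frac{P(x \mid \mathrm{spike})}{P(x)}\,dx$$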
We used this value to compare how well STRFs from A-rule and V-rule ITIs predict a spike train, and thus whether activity in each attentional condition is well described by this canonical filter model.
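A discretized version of this STRF information measure can be sketched as below. The sketch makes several simplifying assumptions that are not specified in the text: the STRF covers only pre-spike lags, the stimulus and spike train share the same time bins, the distributions are estimated with 15 equal-width histogram bins, and the spike train contains at least one spike. The function name and parameters are hypothetical.

```python
import numpy as np


def strf_information(strf, stim, spikes, n_bins=15):
    """Spike-count-normalized MI (bits/spike) between an STRF and a spike train.

    strf: frequency x lag array; stim: frequency x time; spikes: counts per bin.
    Each stimulus window ends just before the bin where the spike count is read.
    """
    n_freq, n_lag = strf.shape
    n_time = stim.shape[1]
    # Projection z of every windowed stimulus segment onto the STRF.
    z = np.array([np.sum(strf * stim[:, t - n_lag:t])
                  for t in range(n_lag, n_time)])
    counts = np.asarray(spikes, dtype=float)[n_lag:n_time]
    x = (z - z.mean()) / z.std()            # express projections in SD units
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    p_x, _ = np.histogram(x, bins=edges)
    # Weighting by spike count gives the distribution of x preceding spikes.
    p_x_spike, _ = np.histogram(x, bins=edges, weights=counts)
    p_x = p_x / p_x.sum()
    p_x_spike = p_x_spike / p_x_spike.sum()
    nz = p_x_spike > 0                      # empty bins contribute nothing
    return float(np.sum(p_x_spike[nz] * np.log2(p_x_spike[nz] / p_x[nz])))
```

Histogram estimates of this kind are biased upward for small spike counts, so comparisons across rules are most meaningful when spike counts and stimulus durations are matched.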
Statistics
All statistical calculations were performed in MATLAB R2019a and its Statistics and Machine Learning Toolbox, V11.5. For group comparisons of SU responses across task rules, paired WSR tests were used, unless otherwise noted. Because tests were performed separately on each depth and spike waveform subpopulation, the Benjamini-Hochberg FDR procedure was used to correct for multiple comparisons, typically across n=6 comparisons (three depth groups, two spike waveform groups; Benjamini and Hochberg, 1995). This method controls the false discovery rate (here, q=0.05), providing increased power over typical family-wise error rate controls. To determine if individual SUs were significantly modulated by rule, an unpaired Student’s t-test on FR was used with a threshold of p<0.01. Descriptive statistics reported in text are mean ± standard deviation (SD), unless otherwise noted. Fractional change values between task rules are reported as the median of the A-rule/V-rule ratio. All other statistical tests are described in Results. Sample sizes (n) are indicated for each comparison in Results or source data files.
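For reference, the Benjamini-Hochberg step-up procedure can be written in a few lines. This is a generic pure-Python sketch of the published procedure, not the MATLAB routine used in the study, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null is rejected at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected if any larger p-value meets its threshold. With n=6 comparisons, for example, the largest p-value is compared against 0.05 and the smallest against 0.05/6.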
Histological verification of depth measurement
To test the accuracy of our depth estimation method based on physiological responses (Figure 3), we presented the pure tone search stimuli described above to a separate set of untrained control mice while performing extracellular recordings (n=11 recordings from four mice; Ai32/Sst-Cre). Before insertion, the probe was painted with the fluorescent lipophilic dye Di-I (DiCarlo et al., 1996; Morrill and Hasenstaub, 2018). The depth measurement procedure based on physiological signals was carried out as described above, and then probe tracks from each recording were visualized as described previously (Morrill and Hasenstaub, 2018). Briefly, after recordings, the animal was euthanized, and the brain was removed and placed into a solution of 4% PFA in PBS (0.1 M, pH 7.4) for 12 hr, followed by 30% sucrose in PBS solution for several days. The brain was then frozen and sliced using a sliding microtome (SM2000R, Leica Biosystems), and slices were imaged with a fluorescence microscope (BZ-X810, Keyence). Di-I probe markings showing cortical depth were consistent with physiological activity-based depth measurements described above (Figure 3B–C).
Data availability
Physiology and behavior data supporting all figures in this manuscript have been submitted to Dryad with https://doi.org/10.7272/Q6BV7DVM.
- Dryad Digital Repository. Audiovisual task switching rapidly modulates sound encoding in mouse auditory cortex. https://doi.org/10.7272/Q6BV7DVM
References
- Stimulus Choices for Spike-Triggered Receptive Field Analysis. Handbook of Modern Techniques in Auditory Cortex. Nova Biomedical.
- Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society 57:289–300. https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
- Visual modulation of firing and spectrotemporal receptive fields in mouse auditory cortex. Current Research in Neurobiology 3:100040. https://doi.org/10.1016/j.crneur.2022.100040
- Behaviorally gated reduction of spontaneous discharge can improve detection thresholds in auditory cortex. The Journal of Neuroscience 34:4076–4081. https://doi.org/10.1523/JNEUROSCI.4825-13.2014
- Stimulus feature selectivity in excitatory and inhibitory neurons in primary visual cortex. The Journal of Neuroscience 27:10333–10344. https://doi.org/10.1523/JNEUROSCI.1692-07.2007
- Depth-dependent temporal response properties in core auditory cortex. The Journal of Neuroscience 31:12837–12848. https://doi.org/10.1523/JNEUROSCI.2863-11.2011
- Attention improves performance primarily by reducing interneuronal correlations. Nature Neuroscience 12:1594–1600. https://doi.org/10.1038/nn.2439
- Differential responses in human striatum and prefrontal cortex to changes in object and rule relevance. The Journal of Neuroscience 24:1129–1135. https://doi.org/10.1523/JNEUROSCI.4312-03.2004
- Neural evidence for dissociable components of task-switching. Cerebral Cortex 16:475–486. https://doi.org/10.1093/cercor/bhi127
- Tuning in to sound: frequency-selective attentional filter in human primary auditory cortex. The Journal of Neuroscience 33:1858–1863. https://doi.org/10.1523/JNEUROSCI.4405-12.2013
- Reverse correlation I. - A heuristic introduction to the technique of triggered correlation with application to the analysis of compound systems. Proc Kon Ned Acad Wetensch 71:472–486.
- Neural mechanisms of selective visual attention. Annual Review of Neuroscience 18:193–222. https://doi.org/10.1146/annurev.ne.18.030195.001205
- Marking microelectrode penetrations with fluorescent dyes. Journal of Neuroscience Methods 64:75–81. https://doi.org/10.1016/0165-0270(95)00113-1
- Task engagement selectively modulates neural correlations in primary auditory cortex. The Journal of Neuroscience 35:7565–7574. https://doi.org/10.1523/JNEUROSCI.4094-14.2015
- Expectation and surprise determine neural population responses in the ventral visual stream. The Journal of Neuroscience 30:16601–16608. https://doi.org/10.1523/JNEUROSCI.2770-10.2010
- Nonlinear spectrotemporal sound analysis by neurons in the auditory midbrain. The Journal of Neuroscience 22:4114–4131.
- A high-density, high-channel count, multiplexed μECoG array for auditory-cortex recordings. Journal of Neurophysiology 112:1566–1583. https://doi.org/10.1152/jn.00179.2013
- PSTH-based classification of sensory stimuli using ensembles of single neurons. Journal of Neuroscience Methods 135:107–120. https://doi.org/10.1016/j.jneumeth.2003.12.011
- Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nature Neuroscience 6:1216–1223. https://doi.org/10.1038/nn1141
- Differential dynamic plasticity of A1 receptive fields during multiple spectral tasks. The Journal of Neuroscience 25:7623–7635. https://doi.org/10.1523/JNEUROSCI.1318-05.2005
- Auditory attention--focusing the searchlight on sound. Current Opinion in Neurobiology 17:437–455. https://doi.org/10.1016/j.conb.2007.07.011
- Cortical state and attention. Nature Reviews. Neuroscience 12:509–523. https://doi.org/10.1038/nrn3084
- Corrections for extreme proportions and their biasing effects on estimated values of d′. Behavior Research Methods, Instruments, & Computers 27:46–51. https://doi.org/10.3758/BF03203619
- Amplitude modulation coding in awake mice and squirrel monkeys. Journal of Neurophysiology 119:1753–1766. https://doi.org/10.1152/jn.00101.2017
- Differentiating between models of perceptual decision making using pupil size inferred confidence. The Journal of Neuroscience 38:8874–8888. https://doi.org/10.1523/JNEUROSCI.0735-18.2018
- What’s new in psychtoolbox-3? The Perception Lecture 1:S101. https://doi.org/10.1177/03010066070360S101
- Auditory temporal processing: responses to sinusoidally amplitude-modulated tones in the inferior colliculus. Journal of Neurophysiology 84:255–273. https://doi.org/10.1152/jn.2000.84.1.255
- Parallel processing by cortical inhibition enables context-dependent behavior. Nature Neuroscience 20:62–71. https://doi.org/10.1038/nn.4436
- Global dynamics of selective attention and its lapses in primary auditory cortex. Nature Neuroscience 19:1707–1717. https://doi.org/10.1038/nn.4386
- Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology 8:529–535. https://doi.org/10.1016/s0959-4388(98)80042-1
- Posterior parietal cortex guides visual decisions in rats. The Journal of Neuroscience 37:4954–4966. https://doi.org/10.1523/JNEUROSCI.0105-17.2017
- Dynamic amplitude coding in the auditory cortex of awake rhesus macaques. Journal of Neurophysiology 98:1451–1474. https://doi.org/10.1152/jn.01203.2006
- Feature-based attention in visual cortex. Trends in Neurosciences 29:317–322. https://doi.org/10.1016/j.tins.2006.04.001
- Visual information present in infragranular layers of mouse auditory cortex. The Journal of Neuroscience 38:2854–2862. https://doi.org/10.1523/JNEUROSCI.3102-17.2018
- Layer specific sharpening of frequency tuning by selective attention in primary auditory cortex. The Journal of Neuroscience 34:16496–16508. https://doi.org/10.1523/JNEUROSCI.2055-14.2014
- Engaging in an auditory task suppresses responses in auditory cortex. Nature Neuroscience 12:646–654. https://doi.org/10.1038/nn.2306
- Attentional modulation of human auditory cortex. Nature Neuroscience 7:658–663. https://doi.org/10.1038/nn1256
- Diverse effects of stimulus history in waking mouse auditory cortex. Journal of Neurophysiology 118:1376–1393. https://doi.org/10.1152/jn.00094.2017
- Attentional modulation of visual processing. Annual Review of Neuroscience 27:611–647. https://doi.org/10.1146/annurev.neuro.26.041002.131039
- Corticocortical connections in the visual system: structure and function. Physiological Reviews 75:107–154. https://doi.org/10.1152/physrev.1995.75.1.107
- Pre-excitatory pause in frontal eye field responses. Experimental Brain Research 139:53–58. https://doi.org/10.1007/s002210100750
- Pupil-linked arousal modulates behavior in rats performing a whisker deflection direction discrimination task. Journal of Neurophysiology 120:1655–1670. https://doi.org/10.1152/jn.00290.2018
- Control of attention shifts between vision and audition in human cortex. The Journal of Neuroscience 24:10702–10706. https://doi.org/10.1523/JNEUROSCI.2939-04.2004
- Frontal cortex activation causes rapid plasticity of auditory cortical processing. The Journal of Neuroscience 33:18134–18148. https://doi.org/10.1523/JNEUROSCI.0180-13.2013
- Rapid spectrotemporal plasticity in primary auditory cortex during behavior. The Journal of Neuroscience 34:4396–4408. https://doi.org/10.1523/JNEUROSCI.2799-13.2014
- Control of prestimulus activity related to improved sensory coding within a discrimination task. The Journal of Neuroscience 31:4101–4112. https://doi.org/10.1523/JNEUROSCI.4380-10.2011
- Impact of synaptic unreliability on the information transmitted by spiking neurons. Journal of Neurophysiology 79:1219–1229. https://doi.org/10.1152/jn.1998.79.3.1219
Article and author information
Funding
National Institutes of Health (R01NS116598)
- Andrea R Hasenstaub
National Institutes of Health (R01DC014101)
- Andrea R Hasenstaub
National Science Foundation (GRFP)
- Ryan J Morrill
Hearing Research Incorporated
- Andrea R Hasenstaub
Klingenstein Foundation
- Andrea R Hasenstaub
Coleman Memorial Fund
- Andrea R Hasenstaub
National Institutes of Health (F32DC016846)
- James Bigelow
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by National Institutes of Health grants R01DC014101 and R01NS116598 to ARH, the National Science Foundation GRFP to RJM, the Klingenstein Foundation to ARH, Hearing Research Inc to ARH, and the Coleman Memorial Fund to ARH.
Ethics
All experiments were approved by the Institutional Animal Care and Use Committee at the University of California, San Francisco.
Copyright
© 2022, Morrill et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.