Semantic plasticity across timescales in the human brain

  1. Psychology, University of Pennsylvania
  2. Radiology, University of Minnesota

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Roberto Bottini
    University of Trento, Trento, Italy
  • Senior Editor
    Yanchao Bi
    Beijing Normal University, Beijing, China

Reviewer #1 (Public Review):

Summary:

This study investigates how the human brain flexibly adjusts its representations of the world as the environment continually changes. The authors identified regions where the representation continuously drifted across multiple months. They also found that the representation in the parahippocampal cortex could be rapidly influenced by recent environmental inputs.

Strengths:

(1) This study touches upon a crucial but less-explored issue: the relationship between semantic knowledge updating and representation drift in the brain.

(2) This study addresses this issue with a unique dataset in which participants viewed objects embedded in thousands of natural scenes across many fMRI sessions over eight months.

(3) The method for investigating whether the recent inputs could change the neural representation is compelling (i.e., subtracting the backward correlation value from the forward correlation value).

Weaknesses:

(1) Statistical Inference.

(a) Statistical inference rests on only eight subjects. With such low statistical power, significant results are more likely to be false positives.

(b) Multiple comparisons across brain regions were not corrected.

(2) Object Encoding

It is unclear whether the identified brain regions represent the objects (as declared in the manuscript) or the visual features shared by pictures of similar items. Such visual features could be those of the background (e.g., spatial layout or the color tone of the scene), not the objects.

(3) Semantic Content in the MTL

Items with higher levels of semantic association tend to co-occur in the same picture. The results could be driven by the number of pictures shared between each pair of items, not semantic similarity (as declared in the manuscript).

(4) Long-term Drift of Item Representations in the MTL

(a) The results show a long-term representational drift in the brain but provide no evidence suggesting that this long-term neural representational drift reflects the drift in semantic representation. Although the authors used the "semantic" mask defined in the previous step, it does not mean the representation drift in the semantic mask is semantic, and there is doubt whether the "semantic" mask defined in the previous step is really semantic (see the third point).

(b) The beta value of the drift cannot be directly compared across regions. Different regions have different sizes and signal-to-noise ratios in the BOLD signal. Their within-item similarity cannot be compared directly in the first place.

(5) Recent Structure Rapidly Influences Item Representations in PHC

(a) It is unclear why the authors implement additional modularity analysis instead of directly using the pairwise co-occurrence frequencies among the 80 items, which is more straightforward.

(b) It does not make sense to compare the recent structure to the long-term structure across all 30 sessions because the structure of later sessions cannot influence how the current structure is updated.

(c) It is unclear how the authors calculate the structure-induced change in the PHC in Figure 7.
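Regarding weakness (1b), the review does not name a correction procedure. As a minimal sketch of one standard option, the Benjamini-Hochberg step-up procedure can be applied to one p-value per brain region. The function name and the choice of this procedure are assumptions for illustration, not something taken from the manuscript or the review.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean array marking which tests are rejected at FDR level alpha.

    p_values: one p-value per brain region (hypothetical input).
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort p ascending
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds            # step-up comparison per rank
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])      # largest rank that passes
        reject[order[:k + 1]] = True           # reject everything up to that rank
    return reject
```

Calling `benjamini_hochberg(p_values)` on the per-region p-values would indicate which regions survive at a 5% false discovery rate under these assumptions.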

Reviewer #2 (Public Review):

Summary:

The authors set out to uncover which brain regions might support the continuous updating of semantic associations, thereby showing a system of semantic plasticity. Using fMRI data from participants viewing thousands of natural scene images over 30 recording sessions, they hoped to establish how objects co-occurring with each other within images influence the semantic representations in the human brain that relate to those object concepts.

Strengths:

There is a lot to like about the paper. A major strength is the convincing demonstration of many of the results. This includes showing item representations in the ventral visual pathway and medial temporal lobes (MTL), as we would expect. They also show semantic effects - defined using the word co-occurrence vectors from word2vec, along the posterior and anterior ventral visual pathway and MTL - replicating various past studies. The authors use a creative approach to show that the item representations measured within each session are modulated by the co-occurrence structure in previous trials, becoming more closely related. They also show that item representations seem to subtly change over the course of the 30 sessions, in that they become less related to each other with increasing distance between sessions. However, the semantic effects within each session itself are claimed to remain unchanged.

Weaknesses:

This leads to what I see as a weakness in the study. The conclusions relate to semantic plasticity and the changes in semantic (associative) representations. The drift analyses do appear to show representational changes across the sessions, but this is based on the item representations. The inference is that this is due to an updating of knowledge about the associations each item has had with other items. Yet, in the same regions, the authors suggest that semantic associative effects, as tested using word2vec for each session, remain stable. Doesn't this seem to contradict the claims about semantic plasticity?

Some of this is difficult to unpick as the semantic stability analysis using word2vec in each session is only very briefly mentioned, and the data is not shown (I would include it). So, at present, I feel they show evidence of representational changes but do not show evidence of what the nature of the change is. If the neural representations consistently reflect the long-term semantic associations (which is what word2vec captures), then how does this combine with the drift effects of item representations?

Does it mean that the changes in item representations do not reflect semantic associative knowledge, and instead reflect some other unspecified type of information (perhaps because the participants are doing an image memory test)?

Another potential weakness is the robustness of the drift analysis itself. For the drift analysis, item representations in each session are compared to all other sessions and then averaged according to the number of intervening sessions. This means the data for item representation with a session difference of 1 will be based on 29 data points, a session difference of 2 on 28 data points ... and a session difference of 29 based on 1 data point. So there is a huge imbalance in the amount of data that goes into the analysis for the different numbers of intervening sessions. This leads me to wonder if it could impact the validity of the results. An alternative might be to use 1 datapoint for each session (or a suitable value, I imagine 5 would still give enough data to analyse drifts) and calculate drift, and then repeat this with different partitions of the data to see how stable it is, and if drift is reliably occurring. Alternatively, the analyses they use might have been used and validated previously.
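The imbalance described above, and the suggested remedy of using a fixed number of data points per lag and repeating over partitions, can be sketched as follows. This is a minimal illustration under stated assumptions: `sim` is a hypothetical sessions-by-sessions matrix of within-item similarity, the drift is summarized by a linear slope, and all function names are invented for this example rather than taken from the manuscript.

```python
import numpy as np

def pairs_per_lag(n_sessions):
    """Number of session pairs available at each lag (1 .. n_sessions - 1)."""
    return {lag: n_sessions - lag for lag in range(1, n_sessions)}

def balanced_drift_slope(sim, max_lag, n_pairs, rng):
    """Slope of similarity vs. lag using exactly n_pairs session pairs per lag.

    sim: (n_sessions, n_sessions) similarity matrix (hypothetical input).
    Requires n_sessions - max_lag >= n_pairs so every lag can be sampled.
    """
    n = sim.shape[0]
    lags, values = [], []
    for lag in range(1, max_lag + 1):
        # Draw n_pairs distinct starting sessions for this lag.
        starts = rng.choice(n - lag, size=n_pairs, replace=False)
        for s in starts:
            lags.append(lag)
            values.append(sim[s, s + lag])
    slope, _intercept = np.polyfit(lags, values, deg=1)
    return slope

def slope_distribution(sim, max_lag, n_pairs, n_repeats=200, seed=0):
    """Repeat the balanced estimate over random partitions to gauge stability."""
    rng = np.random.default_rng(seed)
    return np.array([balanced_drift_slope(sim, max_lag, n_pairs, rng)
                     for _ in range(n_repeats)])
```

With 30 sessions, `pairs_per_lag(30)` gives 29 pairs at lag 1 and a single pair at lag 29; choosing, say, 5 pairs per lag limits the analysis to lags up to 25. If the resulting slopes are consistently negative across repeats, the drift estimate is not merely a product of the imbalance.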

To be clear, I do think this is a very nice study and will have a positive impact on researchers interested in object processing, semantic knowledge, statistical learning, and schemas. But I think there are some gaps between what the data show evidence for and the ultimate inferences made.

Reviewer #3 (Public Review):

Summary:

This study characterizes the relative stability of semantic representations in the human brain using functional magnetic resonance imaging (fMRI) data. The authors suggest that representations in the early stages of processing within the visual system are stable over hours, weeks, and months, while representations in later stages of processing - within the medial temporal lobe - change more rapidly, sometimes within the span of a single fMRI session.

To make this claim, the authors conduct a series of analyses using a well-established fMRI dataset. This begins with a decoding analysis to identify regions that contain reliable object-specific information. This approach identifies early stages within canonical visual cortices (e.g., primary visual cortex, V1), as well as downstream regions within the medial temporal lobe (MTL); this includes perirhinal cortex (PRC), parahippocampal cortex (PHC), and several subfields within the hippocampus (e.g., CA1). Next, they identify regions that are correlated with "semantic features" associated with these objects, determined using word2vec embeddings of each of these object names. Several regions within the MTL (CA1, PRC, PHC) were significantly correlated with these word2vec embeddings. The authors then turn their analyses to representational change across two different timescales. Between scan sessions, regions at early stages of visual processing (e.g., V1) contain relatively stable representations, while regions within the MTL decreased their auto-correlation across sessions, suggesting that there is increased representational change/drift in the MTL. Finally, the authors demonstrate that there is representational change within PHC within a single scan session - changes that reflect the statistics of visual experiences.

Strengths:

The analyses conducted in this study are solid and creative and they yield compelling theoretical results. Beyond the paper's central claims, this study also highlights the utility of publicly available datasets (i.e., NSD) in exploring and evaluating novel theoretical ideas.

Especially compelling is the combined analysis used to estimate reliable item-level representations, first, and then the long-term drift of item representations (i.e., between sessions). The design choices for modeling the fMRI data (e.g., the cross-validated approach to predicting voxel-level responses) reflect state-of-the-art analysis methods, while the control regions used in these analyses (e.g., V1) provide compelling contrasts to the experimental effects. This makes it clear that the observed representational drift/instability is not present throughout the visual system. These results indicate that this effect is worthy of future experiments, while also providing auxiliary information related to effect size, etc.

Weaknesses:

The concerns outlined here do not challenge the central claim within this study, relating to the relative instability of representations within the MTL as compared to V1. Instead, these concerns focus on whether these representations should be described as "semantic," the importance we should give to the distinction between PHC and other MTL structures, and the lack of systematic analysis in relation to the "gradient" from posterior to anterior regions. In each case, I have provided suggestions as to how these concerns might be addressed. Finally, I've made a note about whether these data should be interpreted in terms of neural "plasticity" given the lack of behavioral change in relation to these fMRI data.

(1) No reason to believe that representations within the MTL are necessarily 'semantic.'

The authors suggest "evoked object representations in CA1, PHC, and PRC are semantic in nature." However, the correlation between fMRI responses and word2vec embeddings - the only evidence for "semantic" representations - is ambiguous. These structures might contain high-dimensional features that are associated with these objects for other reasons; concretely, there might be visual information that is not semantic but relates to the reliable visual properties of these objects (e.g., texture, shape, location in the image). Yet there are no analyses to disambiguate between these alternative accounts. As such, labeling these as "semantic" representations is suggestive but premature. Nonetheless, developing such a control analysis should be relatively straightforward. I outline one possible approach below.

While "semantic" information is a relatively nebulous term in the cognitive neurosciences, contemporary deep-learning methods might offer unambiguous ways to characterize such representations. If we assume that "semantics" relate to the meaning of an object/entity and not the "low-level" sensory attributes related to encoding this information, this leads to a straightforward implementation of object semantics: the reliable variance that can be isolated within the residuals of a sensory encoder. For example, do word2vec embeddings explain variance within the medial temporal lobe above and beyond the variance explained by a vision-only image encoder? Of course, care must be taken to use a visual encoder which is not itself a crystallization of object semantics (e.g., encoders optimized using a classification objective), but this is all very feasible given contemporary computer vision methods. Adding such a control analysis would offer a significant improvement over the current approach, clarifying the nature of the stimulus-driven representations within the medial temporal lobe by disentangling "semantic" properties from reliable visual features.
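The control analysis proposed above amounts to a variance-partitioning question: how much response variance do semantic features explain beyond visual features? A minimal sketch follows, assuming ordinary least squares and in-sample R^2 for brevity. The feature matrices, function names, and the use of plain OLS are assumptions for illustration; a real analysis would use cross-validated (and likely regularized) fits, since in-sample R^2 can only increase when predictors are added.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def unique_semantic_variance(visual, semantic, y):
    """Variance explained by semantic features beyond visual features.

    visual:   (n_samples, n_visual) features from a vision-only encoder (hypothetical)
    semantic: (n_samples, n_semantic) word-embedding features (hypothetical)
    y:        (n_samples,) response of one voxel or region
    """
    full = r_squared(np.column_stack([visual, semantic]), y)
    visual_only = r_squared(visual, y)
    return full - visual_only
```

A region whose responses are fully accounted for by the visual encoder would show a unique semantic contribution near zero, whereas a clearly positive value would support the "semantic" label in the reviewer's sense.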

Additionally, it is not clear whether results from the current "object encoding" analysis and "semantic detection" analysis differ because of underlying differences in representational content in these regions or because of design choices in these analyses themselves. That is, while the object encoding analysis learns a linear projection from a one-hot 80-dimensional vector to hemodynamic responses in each brain region, the semantic detection analysis correlates these predicted hemodynamic responses with word2vec embeddings associated with each of these 80 objects. These different analysis methods result in different outcomes: not all regions identified by the object encoding analysis are also identified in the semantic detection analysis (e.g., hippocampal subfields). It is not clear to what degree these different outcomes are a function of "semantic" information, or are simply a consequence of differences in analytic approaches. It would be useful to see the results of repeating the logic of the object encoding analysis with word2vec embeddings in place of the one-hot vectors for each object.

(2) Unclear if the differences between PHC and other MTL structures are driven by SNR.

Parahippocampal cortex (PHC) is a region reliably identified by the analyses in this study: PHC is identified in the analysis of item encoding, semantic content, and representational drift across long (between-session) and short (within-session) timescales. Control regions here provide a convincing contrast to PHC in each of these analyses, and so the role of PHC appears clear in these analyses. However, it is unclear how to interpret the difference between PHC and other structures within the MTL - namely, the observation that PHC alone is influenced by representational drift across shorter timescales. It's possible that these effects are common throughout the MTL, but are only evident in PHC because of increased SNR. This concern seems plausible when observing PHC's "encoding success" and "semantic content," both visually and statistically, relative to other MTL structures: the magnitude of PHC's effect appears greater, which could simply be an artifact of PHC's relatively high SNR. In fMRI data, for example, PRC typically has relatively low SNR because of signal dropout from field inhomogeneities near the ear canal, a problem that is exacerbated in 7T (vs. 3T) scanners such as the one used for the data in this study.

Addressing this concern could be relatively straightforward. For example, including information about the SNR in each respective brain region would be very helpful. If the SNR across brain regions within the MTL is relatively uniform, then this already addresses the concern above. Regardless, it would be useful to report the experimental effects in relation to, for example, the split-half reliability of the signal in each brain region. That is, instead of simply reporting that the results are significant across brain regions, the authors might estimate how reliable the variance is across brain regions, and use this reliable variance as a ceiling to normalize the amount of variance explained in each analysis. By providing an account of the differences in the reliability/SNR of different regions, we would have a much better estimate of the relative importance of differences in the results reported for different regions within the MTL.
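One way to make the suggested normalization concrete is sketched below. It assumes that reliability is estimated as a Spearman-Brown corrected split-half correlation and that an observed correlation is divided by the square root of that reliability, which is the usual attenuation bound when the model itself is treated as noise-free. The function names and this particular ceiling are assumptions for illustration; the review does not prescribe a specific formula.

```python
import numpy as np

def split_half_reliability(half_a, half_b):
    """Spearman-Brown corrected correlation between two half-data estimates.

    half_a, half_b: response patterns estimated from independent halves
    of the data for one region (hypothetical inputs).
    """
    r = np.corrcoef(half_a, half_b)[0, 1]
    return 2 * r / (1 + r)

def ceiling_normalized(effect_r, reliability):
    """Express an observed correlation relative to the ceiling sqrt(reliability).

    Only meaningful when reliability is positive.
    """
    return effect_r / np.sqrt(reliability)
```

Under these assumptions, an observed effect of r = 0.3 in a region with reliability 0.36 reaches half of its ceiling, so two regions with different raw effects can be compared on a common scale.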

(3) Need for more systematic analysis/visualization of "posterior" vs. "anterior" regions.

The authors report that "Whole-brain analyses revealed a gradient of plasticity in the temporal lobe, with drift more evident in anterior than posterior areas." However, the only contrast provided in the main text is between MTL structures and V1; there is no "gradient" in any of these analyses. There are other regions visualized in Supplemental Figure 3, but there is not a systematic evaluation of the gradient along a "posterior/anterior" axis. It would be helpful for the results in Figures 3A, 4A, 5A, and 6A to include other posterior visual regions (e.g., V4, LOC, PPA, FFA) beyond V1.

(4) Without behavioral data, not a direct relationship with "stability-plasticity tradeoff"

The results from this study are framed in relation to a "stability-plasticity tradeoff." As argued in this manuscript, this tradeoff is central to animal behavior - our ability to rapidly deploy prior knowledge to respond to the world around us. Given that there are no behavioral measures used in the current study, however, no claims can be made about how these fMRI data might relate to learning, or behavior more generally. As such, framing these results in terms of a stability-plasticity tradeoff is tenuous. "Representational drift," on the other hand, is a term that is relatively agnostic in its relationship with behavior, and aptly describes the results presented here. The authors refer to this term as well. Considering the lack of behavioral evidence, alongside the core findings from these neuroimaging data, "representational stability" or "representational drift" seems to be a more direct description of the available data than "neural plasticity" or a "stability-plasticity tradeoff."

Author response:

We are very appreciative of the reviewers’ assessment that we used “solid and creative” methods to provide a “convincing demonstration” of “compelling theoretical results” on a “crucial but less-explored issue” in cognitive neuroscience. We are also grateful for their thoughtful suggestions for analyses and for pointing out areas where our analysis descriptions need more clarity. While we will respond to all comments in a future response and revision, here we provide information and clarification on a few central points.

Localization of semantic content:

Regarding our semantic analysis, one reviewer rightly pointed out that items with a high degree of semantic association, as captured by word2vec, tend to occur in the same images, and they expressed concern that this could drive our similarity results. We wish to clarify here (and will revise the manuscript accordingly) that we excluded all pairs of co-occurring items in our word2vec semantic analysis in order to avoid this issue. Thus, our results cannot be driven by the number of images within which items co-occurred. We also agree with the reviewer who stated that “semantic information” is a nebulous term in the cognitive neurosciences, and it appears to have led to some confusion as to the nature of our claims. We take a broad view of this term, with the perspective that visual features (e.g., color, shape) can contribute to semantic content rather than necessarily competing with it. In our work, we use word2vec to identify neural representations that reflect the kind of semantic content present in word embedding models, but the conclusions we draw do not depend on these representations being devoid of visual content. That is, we do not use word2vec to examine semantic versus visual representations, but rather to narrow down the set of representations to be considered in subsequent analyses. While there is a range of legitimate views on what should be considered a “semantic” representation, our broad view, which is inclusive of visual content, and our strategy for localizing semantic content are both standard in the visual neuroscience literature.

Prior work in this literature has compared the ability of word2vec and low-level visual models to predict neural responses to natural images and found that the brain regions in which activity is accurately predicted by the models are considerably distinct: whereas a low-level visual model best predicts activity in V1, V2, and V4, word2vec performs better in more anterior regions, including in visual areas such as lateral occipital cortex (Güçlü & van Gerven, 2015, arXiv). This suggests that our effects are unlikely to be explained by overlap in the kinds of low-level visual features mentioned by the reviewers. However, the semantic content we localize and the representation of high-level visual features may indeed overlap, and this is compatible with our claims. We will do more in our revision to be explicit about our intended meaning in our use of the word “semantic” and how our approach relates to and builds on prior work in this literature.
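The exclusion of co-occurring pairs described above can be illustrated with a small sketch. It assumes symmetric items-by-items matrices for neural similarity, word2vec similarity, and image co-occurrence counts, and it uses a Pearson correlation over the retained pairs. The function name, input names, and the choice of Pearson correlation are assumptions for illustration; the response does not specify these details.

```python
import numpy as np

def masked_rsa(neural_sim, model_sim, cooccur_counts):
    """Correlate neural and model similarity over item pairs that never co-occur.

    neural_sim, model_sim: (n_items, n_items) symmetric similarity matrices
    cooccur_counts: (n_items, n_items) number of images shared by each pair
    (all three inputs are hypothetical).
    """
    n = neural_sim.shape[0]
    iu = np.triu_indices(n, k=1)          # each unordered pair once, no diagonal
    keep = cooccur_counts[iu] == 0        # drop every pair that shares an image
    return np.corrcoef(neural_sim[iu][keep], model_sim[iu][keep])[0, 1]
```

Because pairs with any shared image are removed before the correlation is computed, the number of shared images cannot contribute to the resulting value.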

Long-term representational drift:

We want to clarify our claims regarding the representational drift analysis. One reviewer stated that, while we show evidence of representational drift, we “provide no evidence suggesting that this long-term neural representational drift reflects a drift in semantic representation.” Another reviewer said: “The inference is that this [drift] is due to an updating of knowledge about the associations each item has had with other items,” and that our finding that semantic structure remains stable within these regions seems “to contradict the claims about semantic plasticity.” The claim we intended to make, which will be unpacked more clearly in our revision, is that the neural representations underlying semantic content drift over time, even if the semantic content itself is unchanging. In other words, we do not claim that our across-session drift analyses show changes in knowledge about object associations. Indeed, one of the reasons that representational drift has recently captured the attention of neuroscientists is that the neural representations underlying certain behaviors or cognitive content appear to drift over time even when the behaviors or cognitive content remain fixed. The relational structure of the neural representations can remain stable, even if the particular neurons recruited to represent each stimulus change over time (see, e.g., the T-maze in Rule, O’Leary, & Harvey, 2019, Curr Opin Neurobiol). Here we are translating these ideas, which were developed using animal models and/or primarily focused on low-level vision, to the semantic system in humans. The neural representations we identify in our paper capture semantic information because they share a similarity structure with word2vec, and the level of similarity to word2vec remains stable over time.

Thus, our findings provide a simple demonstration of long-term representational drift in the human semantic system akin to that reported in animals: drift in the neural semantic representations of items even as the relations between these item representations appear stable.
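The idea that item patterns can drift while the relations between items stay fixed can be shown with a toy simulation. This is only a conceptual sketch, not the analysis in the manuscript: the patterns are random numbers, the drift is modeled as a random rotation of the response space, and similarity is measured with cosine similarity. All names and sizes are assumptions for illustration.

```python
import numpy as np

def cosine_rsm(patterns):
    """Item-by-item cosine similarity matrix for (n_items, n_units) patterns."""
    unit = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
    return unit @ unit.T

def within_item_similarity(patterns_a, patterns_b):
    """Cosine similarity of each item's pattern with itself across two sessions."""
    ua = patterns_a / np.linalg.norm(patterns_a, axis=1, keepdims=True)
    ub = patterns_b / np.linalg.norm(patterns_b, axis=1, keepdims=True)
    return np.sum(ua * ub, axis=1)

rng = np.random.default_rng(0)
n_items, n_units = 80, 200                      # toy sizes, chosen arbitrarily
session_1 = rng.normal(size=(n_items, n_units))

# A random orthogonal matrix (via QR) rotates every item pattern the same way.
rotation, _ = np.linalg.qr(rng.normal(size=(n_units, n_units)))
session_2 = session_1 @ rotation

# The relations between items are identical across sessions...
structure_preserved = np.allclose(cosine_rsm(session_1), cosine_rsm(session_2))
# ...while each item's own pattern has changed substantially.
mean_self_similarity = within_item_similarity(session_1, session_2).mean()
```

In this toy case the similarity structure is exactly preserved while each item's pattern is nearly uncorrelated with its earlier self, which is the dissociation between drift and stable relational structure that the response describes.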

Signal-to-noise variability across the MTL:

A reviewer raised the possibility that differences between our ROIs could be driven by variability in signal-to-noise ratio (SNR) across regions, particularly within the medial temporal lobe (MTL). We looked at noise ceiling SNR brain maps for each participant, which reflect the reliability of neural responses across repetitions of the same image. Preliminary analyses indicate that SNR differences do not account for our object encoding, semantic content, representational drift, or short-term plasticity measures across the MTL.
