Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Editors
- Reviewing Editor: Roberto Bottini, University of Trento, Trento, Italy
- Senior Editor: Yanchao Bi, Beijing Normal University, Beijing, China
Reviewer #1 (Public Review):
Summary:
This study investigates how the human brain flexibly adjusts its representations of the world as the environment continually changes. The authors identified regions where the representation continuously drifted across multiple months. They also found that the representation in the parahippocampal cortex could be rapidly influenced by recent environmental inputs.
Strengths:
(1) This study touches upon a crucial but less-explored issue: the relationship between semantic knowledge updating and representation drift in the brain.
(2) This study addresses this issue with a unique dataset in which participants viewed objects embedded in thousands of natural scenes across many fMRI sessions over eight months.
(3) The method for investigating whether the recent inputs could change the neural representation is compelling (i.e., subtracting the backward correlation value from the forward correlation value).
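A toy illustration of this forward-minus-backward logic as I read it (the manuscript's actual computation may differ, and all variable names here are hypothetical):

```python
import numpy as np

def fb_asymmetry(item_i_early, item_i_late, item_j_early, item_j_late):
    """Forward-minus-backward correlation for one item pair.

    Each argument is a hypothetical 1-D voxel pattern for an item,
    averaged over the early or late trials of a session.
    """
    # forward: item i's earlier pattern vs. item j's later pattern
    forward = np.corrcoef(item_i_early, item_j_late)[0, 1]
    # backward: the time-reversed comparison
    backward = np.corrcoef(item_i_late, item_j_early)[0, 1]
    # a positive difference indicates a directional, experience-driven change
    # rather than symmetric noise
    return forward - backward
```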
Weaknesses:
(1) Statistical Inference.
(a) Statistical inference is based on only eight subjects; such low statistical power increases the risk of false positive findings.
(b) Multiple comparisons across brain regions were not corrected.
(2) Object Encoding
It is unclear whether the identified brain regions represent the objects (as declared in the manuscript) or the visual features shared by pictures of similar items. Such visual features could be those of the background (e.g., spatial layout or the color tone of the scene), not the objects.
(3) Semantic Content in the MTL
Items with higher levels of semantic association tend to co-occur in the same picture. The results could be driven by the number of pictures shared between each pair of items, not semantic similarity (as declared in the manuscript).
(4) Long-term Drift of Item Representations in the MTL
(a) The results show long-term representational drift in the brain but provide no evidence that this neural drift reflects drift in the semantic representation. Although the authors used the "semantic" mask defined in the previous step, this does not establish that drift within the mask is itself semantic, and it is doubtful whether that mask is genuinely semantic in the first place (see the third point).
(b) The beta values of the drift cannot be directly compared across regions. Different regions have different sizes and signal-to-noise ratios in the BOLD signal, so their within-item similarities cannot be compared directly in the first place.
(5) Recent Structure Rapidly Influences Item Representations in PHC
(a) It is unclear why the authors implement an additional modularity analysis instead of directly using the pairwise co-occurrence frequencies among the 80 items, which would be more straightforward (see the sketch after this list).
(b) It does not make sense to compare the recent structure to the long-term structure across all 30 sessions, because the structure of later sessions cannot influence updating in the current session.
(c) It is unclear how the authors calculate the structure-induced change in the PHC in Figure 7.
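Regarding point 5(a), a minimal sketch of the more direct alternative, assuming a hypothetical binary matrix `presence` of shape (n_pictures, 80) coding which of the 80 items appears in each picture a participant saw:

```python
import numpy as np

def cooccurrence_matrix(presence):
    """Raw pairwise co-occurrence counts among the 80 items."""
    counts = presence.T @ presence      # (80, 80): pictures shared by each item pair
    np.fill_diagonal(counts, 0)         # drop self co-occurrence
    return counts
```

Such a matrix could be related to item-pair similarity changes directly, without the intermediate modularity (community-detection) step.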
Reviewer #2 (Public Review):
Summary:
The authors set out to uncover which brain regions might support the continuous updating of semantic associations, thereby revealing a system of semantic plasticity. Using fMRI data from participants viewing thousands of natural scene images over 30 recording sessions, they hoped to establish how the co-occurrence of objects within images influences the semantic representations in the human brain that relate to those object concepts.
Strengths:
There is a lot to like about the paper. A major strength of the methods and results is that many of the findings are convincingly demonstrated. This includes showing item representations in the ventral visual pathway and medial temporal lobe (MTL), as we would expect. They also show semantic effects - defined using word co-occurrence vectors from word2vec - along the posterior and anterior ventral visual pathway and MTL, replicating various past studies. The authors use a creative approach to show that item representations measured within each session are modulated by the co-occurrence structure in previous trials, becoming more closely related, and that item representations subtly change over the course of the 30 sessions, becoming less related to each other with increasing session distance. However, the semantic effects within each session are claimed to remain unchanged.
Weaknesses:
This leads to what I see as a weakness in the study. The conclusions relate to semantic plasticity and the changes in semantic (associative) representations. The drift analyses do appear to show representational changes across the sessions, but this is based on the item representations. The inference is that this is due to an updating of knowledge about the associations each item has had with other items. Yet, in the same regions, the authors suggest that semantic associative effects, as tested using word2vec for each session, remain stable. Doesn't this seem to contradict the claims about semantic plasticity?
Some of this is difficult to unpick as the semantic stability analysis using word2vec in each session is only very briefly mentioned, and the data is not shown (I would include it). So, at present, I feel they show evidence of representational changes but do not show evidence of what the nature of the change is. If the neural representations consistently reflect the long-term semantic associations (which is what word2vec captures), then how does this combine with the drift effects of item representations?
Does this mean that the changes in item representations do not reflect semantic associative knowledge, and instead reflect some other, unspecified type of information (perhaps related to the image memory test the participants are performing)?
Another potential weakness is the robustness of the drift analysis itself. For the drift analysis, item representations in each session are compared to all other sessions and then averaged according to the number of intervening sessions. This means the data for item representations with a session difference of 1 will be based on 29 data points, a session difference of 2 on 28 data points ... and a session difference of 29 on a single data point. There is therefore a huge imbalance in the amount of data that goes into the analysis for the different numbers of intervening sessions, and I wonder whether this could affect the validity of the results. An alternative might be to use one data point for each session (or a suitable value; I imagine five would still give enough data to analyse drift), calculate drift, and then repeat this with different partitions of the data to see how stable it is and whether drift reliably occurs (a toy sketch of such a check follows below). Alternatively, the analyses they use might have been used and validated previously.
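A minimal sketch of the kind of balanced resampling I have in mind, assuming a hypothetical 30 x 30 matrix `sim` of within-item pattern correlations between sessions (the authors' actual pipeline may differ):

```python
import numpy as np

def drift_curve_balanced(sim, n_per_lag=5, n_repeats=1000, seed=0):
    """Average similarity per session lag, using the same number of session
    pairs at every lag (up to n_per_lag), repeated over random partitions
    to gauge how stable the drift estimate is."""
    rng = np.random.default_rng(seed)
    n = sim.shape[0]
    curves = np.empty((n_repeats, n - 1))
    for r in range(n_repeats):
        for lag in range(1, n):
            pairs = [(i, i + lag) for i in range(n - lag)]
            k = min(n_per_lag, len(pairs))
            chosen = rng.choice(len(pairs), size=k, replace=False)
            curves[r, lag - 1] = np.mean([sim[pairs[c]] for c in chosen])
    return curves.mean(axis=0), curves.std(axis=0)
```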
To be clear, I do think this is a very nice study that will have a positive impact on researchers interested in object processing, semantic knowledge, statistical learning, and schemas. But I think there are some gaps between what the data show evidence for and the ultimate inferences made.
Reviewer #3 (Public Review):
Summary:
This study characterizes the relative stability of semantic representations in the human brain using functional magnetic resonance imaging (fMRI) data. The authors suggest that representations in the early stages of processing within the visual system are stable over hours, weeks, and months, while representations in later stages of processing - within the medial temporal lobe - change more rapidly, sometimes within the span of a single fMRI session.
To make this claim, the authors conduct a series of analyses using a well-established fMRI dataset. This begins with a decoding analysis to identify regions that contain reliable object-specific information. This approach identifies early stages within canonical visual cortices (e.g., primary visual cortex, V1), as well as downstream regions within the medial temporal lobe (MTL); this includes perirhinal cortex (PRC), parahippocampal cortex (PHC), and several hippocampal subfields (e.g., CA1). Next, they identify regions that are correlated with "semantic features" associated with these objects, determined using word2vec embeddings of each of these object names. Several regions within the MTL (CA1, PRC, PHC) were significantly correlated with these word2vec embeddings. The authors then turn their analyses to representational change across two different timescales. Between scan sessions, regions at early stages of visual processing (e.g., V1) contain relatively stable representations, while regions within the MTL decrease their auto-correlation across sessions, suggesting increased representational change/drift in the MTL. Finally, the authors demonstrate that there is representational change within PHC within a single scan session - changes that reflect the statistics of visual experiences.
Strengths:
The analyses conducted in this study are solid and creative, and they yield compelling theoretical results. Beyond the paper's central claims, this study also highlights the utility of publicly available datasets (i.e., NSD) in exploring and evaluating novel theoretical ideas.
Especially compelling is the combined analysis used to estimate reliable item-level representations, first, and then the long-term drift of item representations (i.e., between sessions). The design choices for modeling the fMRI data (e.g., the cross-validated approach to predicting voxel-level responses) reflect state-of-the-art analysis methods, while the control regions used in these analyses (e.g., V1) provide compelling contrasts to the experimental effects. This makes it clear that the observed representational drift/instability is not present throughout the visual system. These results indicate that this effect is worthy of future experiments, while also providing auxiliary information related to effect size, etc.
Weaknesses:
The concerns outlined here do not challenge the central claim within this study, relating to the relative instability of representations within the MTL as compared to V1. Instead, these concerns focus on whether these representations should be described as "semantic," the importance we should give to the distinction between PHC and other MTL structures, and the lack of a systematic analysis of the "gradient" from posterior to anterior regions. In each case, I have provided suggestions as to how these concerns might be addressed. Finally, I've made a note about whether these data should be interpreted in terms of neural "plasticity" given the lack of behavioral change in relation to these fMRI data.
(1) No reason to believe that representations within the MTL are necessarily 'semantic.'
The authors suggest "evoked object representations in CA1, PHC, and PRC are semantic in nature." However, the correlation between fMRI responses and word2vec embeddings - the only evidence for "semantic" representations - is ambiguous. These structures might contain high-dimensional features that are associated with these objects for other reasons; concretely, there might be visual information that is not semantic but relates to the reliable visual properties of these objects (e.g., texture, shape, location in the image). Yet there are no analyses to disambiguate between these alternative accounts. As such, labeling these as "semantic" representations is suggestive but premature. Nonetheless, developing such a control analysis should be relatively straightforward. I outline one possible approach below.
While "semantic" information is a relatively nebulous term in the cognitive neurosciences, contemporary deep-learning methods might offer unambiguous ways to characterize such representations. If we assume that "semantics" relate to the meaning of an object/entity and not the "low-level" sensory attributes related to encoding this information, this leads to a straightforward implementation of object semantics: the reliable variance that can be isolated within the residuals of a sensory encoder. For example, do word2vec embeddings explain variance within the medial temporal lobe above and beyond the variance explained by a vision-only image encoder? Of course, care must be taken to use a visual encoder that is not itself a crystallization of object semantics (e.g., encoders optimized using a classification objective), but this is all very feasible given contemporary computer vision methods. Adding such a control analysis would offer a significant improvement over the current approach, clarifying the nature of the stimulus-driven representations within the medial temporal lobe by disentangling "semantic" properties from reliable visual features.
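One possible form of that control, sketched under the assumption of hypothetical arrays `voxels` (trials x voxels), `vision_feats` (trials x features from a vision-only encoder), and `w2v_feats` (trials x 300 word2vec embeddings of the object labels); this is only an illustration of the variance-partitioning logic, not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

def semantic_variance_beyond_vision(voxels, vision_feats, w2v_feats):
    """Per-voxel correlation between word2vec predictions and the residuals
    left over after a vision-only encoding model."""
    alphas = np.logspace(-2, 4, 7)
    # Step 1: cross-validated predictions from the vision-only encoder
    vision_pred = cross_val_predict(RidgeCV(alphas=alphas), vision_feats, voxels, cv=5)
    residuals = voxels - vision_pred
    # Step 2: can word2vec embeddings predict what the vision model missed?
    w2v_pred = cross_val_predict(RidgeCV(alphas=alphas), w2v_feats, residuals, cv=5)
    # per-voxel correlation between predicted and observed residuals
    rc = residuals - residuals.mean(axis=0)
    pc = w2v_pred - w2v_pred.mean(axis=0)
    return (rc * pc).sum(axis=0) / (np.linalg.norm(rc, axis=0) * np.linalg.norm(pc, axis=0))
```

Reliable positive values in MTL voxels would indicate semantic variance beyond what low-level vision explains.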
Additionally, it is not clear whether results from the current "object encoding" analysis and "semantic detection" analysis differ because of underlying differences in representational content in these regions or because of design choices in the analyses themselves. That is, while the object encoding analysis learns a linear projection from a one-hot 80-dimensional vector to hemodynamic responses in each brain region, the semantic detection analysis correlates these predicted hemodynamic responses with word2vec embeddings associated with each of the 80 objects. These different analysis methods yield different outcomes: not all regions identified by the object encoding analysis are also identified in the semantic detection analysis (e.g., hippocampal subfields). It is not clear to what degree these different outcomes reflect "semantic" information or are simply a consequence of the different analytic approaches. It would be useful to repeat the logic of the object encoding analysis but, instead of one-hot vectors for each object, use the word2vec embeddings.
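A sketch of that comparison, under hypothetical names (`labels` holds the item index 0-79 per trial, `voxels` the evoked patterns, `w2v` an 80 x 300 embedding matrix); the point is simply that the two analyses would then differ only in their feature space:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def encoding_score(features, voxels, n_splits=5):
    """Cross-validated mean per-voxel prediction accuracy of a linear encoder."""
    scores = []
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(features):
        pred = Ridge(alpha=1.0).fit(features[train], voxels[train]).predict(features[test])
        r = [np.corrcoef(pred[:, v], voxels[test][:, v])[0, 1] for v in range(voxels.shape[1])]
        scores.append(np.mean(r))
    return float(np.mean(scores))

def compare_feature_spaces(labels, voxels, w2v):
    onehot = np.eye(80)[labels]      # object-encoding style design (trials x 80)
    semantic = w2v[labels]           # same design, but word2vec features (trials x 300)
    return encoding_score(onehot, voxels), encoding_score(semantic, voxels)
```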
(2) Unclear if the differences between PHC and other MTL structures are driven by SNR.
Parahippocampal cortex (PHC) is a region reliably identified by the analyses in this study: PHC is identified in the analyses of item encoding, semantic content, and representational drift across long (between-session) and short (within-session) timescales. The control regions provide a convincing contrast to PHC in each of these analyses, so the role of PHC appears clear. However, it is unclear how to interpret the difference between PHC and other structures within the MTL - namely, the observation that PHC alone shows representational drift across shorter timescales. It is possible that these effects are common throughout the MTL but are only evident in PHC because of its higher SNR. This concern seems plausible when inspecting PHC's "encoding success" and "semantic content," both visually and statistically, relative to other MTL structures: the magnitude of PHC's effect appears greater, which could simply be an artifact of PHC's relatively high SNR. In fMRI data, for example, PRC typically has relatively low SNR because of field inhomogeneities and signal dropout related to its proximity to the ear canal - an effect that is exacerbated at 7T (vs 3T), the field strength used for the data in this study.
Addressing this concern could be relatively straightforward. For example, including information about the SNR in each respective brain region would be very helpful. If the SNR across brain regions within the MTL is relatively uniform, then this already addresses the concern above. Regardless, it would be useful to report the experimental effects in relation to, for example, the split-half reliability of the signal in each brain region. That is, instead of simply reporting that results are significant across brain regions, the authors might estimate how reliable the variance is in each brain region and use this reliable variance as a ceiling to normalize the amount of variance explained in each analysis. By providing an account of the differences in reliability/SNR across regions, we would have a much better estimate of the relative importance of the differences reported for different regions within the MTL.
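A minimal sketch of such a noise-ceiling normalization, assuming hypothetical arrays `rep1` and `rep2` holding a region's item x voxel responses from odd and even stimulus repetitions:

```python
import numpy as np

def split_half_reliability(rep1, rep2):
    """Mean per-voxel odd/even correlation, Spearman-Brown corrected."""
    r = np.mean([np.corrcoef(rep1[:, v], rep2[:, v])[0, 1] for v in range(rep1.shape[1])])
    return 2 * r / (1 + r)

def noise_normalized(raw_effect, rep1, rep2):
    """Express a region's raw effect relative to its reliability ceiling."""
    return raw_effect / split_half_reliability(rep1, rep2)
```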
(3) Need for more systematic analysis/visualization of "posterior" vs. "anterior" regions.
The authors report that "Whole-brain analyses revealed a gradient of plasticity in the temporal lobe, with drift more evident in anterior than posterior areas." However, the only contrast provided in the main text is between MTL structures and V1 - there is no "gradient" in any of these analyses. Other regions are visualized in Supplemental Figure 3, but there is no systematic evaluation of a gradient along the posterior-anterior axis. It would be helpful for the results in Figures 3A, 4A, 5A, and 6A to include other posterior visual regions (e.g., V4, LOC, PPA, FFA) beyond V1.
(4) Without behavioral data, there is no direct relationship with a "stability-plasticity tradeoff."
The results from this study are framed in relation to a "stability-plasticity tradeoff." As argued in this manuscript, this tradeoff is central to animal behavior - our ability to rapidly deploy prior knowledge to respond to the world around us. Given that no behavioral measures were used in the current study, however, no claims can be made about how these fMRI data relate to learning, or to behavior more generally. As such, framing these results in terms of a stability-plasticity tradeoff is tenuous. "Representational drift," on the other hand, is a term that is relatively agnostic about its relationship with behavior and aptly describes the results presented here; the authors refer to this term as well. Considering the lack of behavioral evidence, alongside the core findings from these neuroimaging data, "representational stability" or "representational drift" seems to be a more direct description of the available data than "neural plasticity" or a "stability-plasticity tradeoff."