A scene with an invisible wall - navigational experience shapes visual scene representation

  1. Department of Cognitive Science, Johns Hopkins University, Baltimore, MD, USA
  2. Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, MD, USA
  3. Department of Psychology, New York University, New York City, NY, USA
  4. Department of Psychology, Yonsei University, Seoul, Republic of Korea

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Arun SP
    Indian Institute of Science Bangalore, Bangalore, India
  • Senior Editor
    Yanchao Bi
    Beijing Normal University, Beijing, China

Reviewer #1 (Public Review):

In this study, Li et al. aim to determine the effect of navigational experience on visual representations of scenes. Participants first learn to navigate within simple virtual environments where navigation is either unrestricted or restricted by an invisible wall. Environments are matched in terms of their spatial layout and instead differ primarily in terms of their background visual features. In a later same/different task, participants are slower to distinguish between pairs of scenes taken from the same navigation condition (i.e. both restricted or both unrestricted) than different navigation conditions. Neural response patterns in the PPA also discriminate between scenes from different navigation conditions. These results suggest that navigational experience influences perceptual representations of scenes. This is an interesting study, and the results and conclusions are clearly explained and easy to follow. There are a few points that I think would benefit from further consideration or elaboration from the authors, which I detail below.

First, I am a little sceptical of the extent to which the tasks are able to measure navigational or perceptual experience with the scenes. The training procedure seems like it wouldn't require obtaining substantial navigational experience as the environments are all relatively simple and only require participants to follow basic paths, rather than encouraging more active exploration of a more complex environment. Furthermore, in the same/different task, all images show the same view of the environment (meaning they show the exact same image in the "same environment" condition). The task is therefore really a simple image-matching task and doesn't require participants to meaningfully extract the perceptual or spatial features of the scenes. An alternative would have been to present different views of the scenes, which would have prevented the use of image-matching and encouraged further engagement with the scenes themselves. Ultimately, the authors do still find a response time difference between the navigation conditions, but the effect does appear quite small. I wonder if the design choices could be obscuring larger effects, which might have been better evident if the navigational and perceptual tasks had encouraged greater encoding of the spatial and perceptual features of the environment. I think it would be helpful for the authors to explain their reasons for not employing such designs, or to at least give some consideration to alternative designs.

Figure 1B illustrates that the non-navigable condition includes a more complicated environment than the navigable condition, and requires following a longer path with more turns in it. I guess this is a necessary consequence of the experiment design, as the non-navigable condition requires participants to turn around and find an alternative route. Still, this does introduce spatial and perceptual differences between the two navigation conditions, which could be a confounding factor. What do the response times for the "matched" condition in the same/different task look like if they are broken down by the navigable and non-navigable environments? If there is a substantial difference between them, it could be that this is driving the difference between the matched and mismatched conditions, rather than the matching/mismatching experience itself.

In both experiments, the authors determined their sample sizes via a priori power analyses. This is good, but a bit more detail on these analyses would be helpful. How were the effect sizes estimated? The authors say it was based on other studies with similar methodologies - does this mean the effect sizes were obtained from a literature search? If so, it would be good to give some details of the studies included in this search, and how the effect size was obtained from these (e.g., it is generally recommended to take a lower bound over studies). Or is the effect size based on standard guidelines (e.g., Cohen's d ≈ 0.5 is a medium effect size)? If so, why are the effect sizes different for the two studies?

Reviewer #2 (Public Review):

Summary:

Li and colleagues applied virtual reality (VR) based training to create different navigational experiences for a set of visually similar scenes. They found that participants were better at visually discriminating scenes with different navigational experiences compared to scenes with similar navigational experiences. Moreover, this experience-based effect was also reflected in the fMRI data, with the PPA showing higher discriminability for scenes with different navigational experiences. Together, their results suggest that previous navigational experiences shape visual scene representation.

Strengths:

(1) The work has theoretical value as it contributes novel evidence to the ongoing debate over visual versus non-visual contributions to scene representation. While the idea that visual scene representations can encode navigational affordances is not new (e.g., Bonner & Epstein, 2017, PNAS), this study is one of the first to demonstrate that navigational experience can causally shape visual scene representation. Thus, it serves as a strong test of the hypothesis that our visual scene representations involve encoding top-down navigational information.

(2) The training paradigm with VR is novel and has the potential to be used by the broader community to explore the impact of experience on other categorical visual representations.

(3) The converging evidence from the behavioral and fMRI experiments strengthens the work's conclusions.

Weaknesses:

(1) While this work attempts to demonstrate the effect of navigational experience on visual scene representation, it's not immediately clear to what extent such an effect necessarily reflects altered visual representations. Given that scenes in the navigable condition were explored more extensively and acquired different contextual associations than scenes in the non-navigable condition (where participants simply turned around), could the shorter response time for a scene pair with mismatched navigability be explained by the facilitation of different contextual associations or scene familiarities, rather than by changes in perceptual representations? Especially when the visual similarity of the scenes was high and distinguishing visual cues might not have been immediately available to participants, the differing contextual associations and/or familiarity could serve as indirect cues that facilitate participants' judgments, even if perceptual representations remained intact.

(2) Similarly, the above-chance fMRI classification results in the PPA could also be explained by the different contextual associations and/or scene familiarities between navigable and non-navigable scenes, rather than different perceptual processes related to scene identification.

(3) For the fMRI results, the specificity of the experience effect on the PPA is not strictly established, making the statement "such top-down effect was unique to the PPA" groundless. A significant interaction between navigational conditions and ROIs would be required to make such a claim.

(4) For the behavioral results, the p-value of the interaction between groups and the navigational conditions was 0.05. I think this is not a convincing p-value to rule out visual confounding for the training group. Moreover, from Figure 2B, there appears to be an outlier participant in the control group who deviates dramatically from the rest of the participants. If this outlier is excluded, will the interaction become even less significant?

(5) Experiment 1 only consists of 25 participants in each group. This is quite a small sample size for behavioral studies when there's no replication. It would be more convincing if an independent pre-registered replication study with a larger sample size could be conducted.
