Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Panayiota Poirazi, FORTH Institute of Molecular Biology and Biotechnology, Heraklion, Greece
- Senior Editor: Panayiota Poirazi, FORTH Institute of Molecular Biology and Biotechnology, Heraklion, Greece
Reviewer #1 (Public Review):
Summary:
This study investigated spatial representations in deep feedforward neural network models (DNNs) of the kind often used in visual tasks. The authors created a three-dimensional virtual environment and let a simulated agent randomly forage in a smaller two-dimensional square area. The agent "sees" images of the room within its field of view from different locations and heading directions, and these images are processed by the DNNs. Analyzing model neurons in the DNNs, the authors found response properties similar to those of place cells, border cells and head direction cells across various layers of the networks. A linear readout of network activity can recover key spatial variables. In addition, after removing neurons with strong place/border/head direction selectivity, one can still decode these spatial variables from the remaining neurons in the DNNs. Based on these results, the authors argue that the notion of functional cell types in spatial cognition is misleading.
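(For concreteness, a minimal, purely illustrative sketch of the decode-then-ablate logic described above is given below. It is not the authors' code: the array names `acts` and `pos`, the ridge readout, and the crude variance-based selectivity score are assumptions standing in for whatever criteria the authors actually used.)

```python
# Illustrative sketch only: linear readout of position from hidden-layer activations,
# repeated after removing the most spatially selective units.
# Assumed inputs: acts (n_samples x n_units) activations, pos (n_samples x 2) locations.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

def decode_position(acts, pos, seed=0):
    """Fit a linear (ridge) readout of position; return R^2 on held-out samples."""
    X_tr, X_te, y_tr, y_te = train_test_split(acts, pos, test_size=0.25, random_state=seed)
    return Ridge(alpha=1.0).fit(X_tr, y_tr).score(X_te, y_te)

def spatial_selectivity(acts, pos, n_bins=20):
    """Crude per-unit selectivity: variance of the unit's mean activation across spatial bins."""
    x_edges = np.linspace(pos[:, 0].min(), pos[:, 0].max(), n_bins)
    y_edges = np.linspace(pos[:, 1].min(), pos[:, 1].max(), n_bins)
    bin_id = np.digitize(pos[:, 0], x_edges) * (n_bins + 2) + np.digitize(pos[:, 1], y_edges)
    sel = np.zeros(acts.shape[1])
    for u in range(acts.shape[1]):
        sel[u] = np.var([acts[bin_id == b, u].mean() for b in np.unique(bin_id)])
    return sel

# Usage, once real activations and positions are in hand:
# r2_full = decode_position(acts, pos)
# keep = spatial_selectivity(acts, pos).argsort()[: int(0.9 * acts.shape[1])]  # drop top 10%
# r2_ablated = decode_position(acts[:, keep], pos)  # changes little, as the paper reports
```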
Strengths:
This paper contains interesting and original ideas, and I enjoyed reading it. Most previous studies using deep network models to investigate spatial cognition (e.g., Banino et al., Nature, 2018; Cueva & Wei, ICLR, 2018; Whittington et al., Cell, 2020) relied mainly on velocity/head-rotation inputs rather than vision (but see Franzius, Sprekeler & Wiskott, PLoS Computational Biology, 2007). Here, the authors find that, under certain settings, visual inputs alone may contain enough information about the agent's location, head direction and distance to the boundary, and that such information can be extracted by DNNs. If confirmed, this is potentially an interesting and important observation.
Weaknesses:
While the findings reported here are interesting, it is unclear whether they are a consequence of the specific model setting and how well they would generalize. Furthermore, I feel the results are over-interpreted: there are major gaps between the results actually shown and the claim about the "superfluousness of cell types in spatial cognition". Evidence directly supporting the overall conclusion seems weak at the moment.
Major concerns:
(1) The authors reported that, in their model setting, most neurons throughout the different layers of the CNNs show strong spatial selectivity. This is interesting and perhaps also surprising. It would be useful to test this prediction directly against existing experimental results. It is possible that the particular 2-d virtual environment used is special; the results would be strengthened if similar findings held in other test environments.
In particular, examining the pictures shown in Fig. 1A, the local walls of the 'box' appear to contain strong oriented features that are distinct across different views. Perhaps the responses of oriented visual filters can leverage these features to uniquely determine the spatial variables. This is concerning because such a specific setting is unlikely to generalize.
(2) Previous experimental results suggest that the various functional cell types discovered in rodent navigation circuits persist in dark environments. If we take the modeling framework presented in this paper literally, the prediction would be that place cells/head direction cells should disappear in darkness. This implies that key aspects of functional cell types in spatial cognition are missing from the current modeling framework. This limitation needs to be addressed or explicitly discussed.
(3) Place cells, border cells and head direction cells are mostly studied in the rodent brain, and it is not clear whether standard DNNs are good models of the rodent visual system. It is likely that the rodent visual system is not as powerful at processing visual inputs as the DNNs used in this study.
(4) The overall claim that the functional cell types defined in spatial cognition are superfluous seems too strong given the results reported here. The paper studies only a particular class of models, and arguably there is a major gap between the properties of these models and those of real brains. Even if, in the DNN models simulated in this particular virtual environment, (i) most model neurons have strong spatial selectivity and (ii) removing the model neurons with the strongest spatial selectivity still retains substantial spatial information, why is this relevant to the brain? The neural circuits may operate in a very different regime. Perhaps a more reasonable interpretation would be: these results raise the possibility that the strongly selective neurons observed in the brain may not be essential for encoding certain features, because something like this is observed in certain models. It is difficult to draw definitive conclusions about the brain from the results reported.
Reviewer #2 (Public Review):
Summary:
The authors aim to challenge the relevance of cell populations with characteristic selectivity for specific aspects of navigation (e.g. place cells, head direction cells and border cells) in the processing of spatial information. Their claim is that such cells naturally emerge in any system dealing with the estimation of position in an environment, without the need for any special involvement of these cells in the computations. In particular, the work shows how, when provided with spatial error signals, networks designed for invariant object recognition spontaneously organize the activity in their hidden layers into a mixture of spatially selective cells, some of which pass classification criteria for place, head direction or border cells. Crucially, these cells are not necessary for position decoding, nor are they the most informative when it comes to the network's performance in reconstructing spatial position from visual scenes. These results lead the authors to claim that focusing on the classification of specific cell types is hindering rather than helping advances in the understanding of spatial cognition. In fact, they claim that attention should instead be directed at understanding high-dimensional population coding, regardless of its direct interpretability or its appeal to human observers.
Strengths:
Methodologically the paper is consistent and convincingly supports the authors' claims regarding the role of cell types in coding for spatial aspects of cognition. It is also interesting how the authors leverage established machine learning systems to provide a sort of counter-argument to the use of such techniques to establish a parallel between artificial and biological neural representations. In the recent past, similar applications of artificial neural networks to spatial navigation have been directed at proving the importance of specific neural substrates (take for example Banino et al. 2018 for grid cells), while in this case the same procedure is used to unveil them as epiphenomena, so general and unspecific as to be of very limited use in understanding the actual functioning of the neural system. I am quite confident that this stance regarding the role of place cells and co. could gather broad sympathy and support in the greater part of the neuroscience community, or at least among the majority of theoretical neuroscientists with some interest in the hippocampus and higher cognition.
Weaknesses:
My criticism of the paper can be articulated in three main points:
- What about grid cells? Grid cells notably do not show up in the analyses of the paper, yet they can surely be considered the 'mother' of all tailored spatial cells of the hippocampal formation. Do they fall outside the authors' assessment of the importance of this kind of cell? Some discussion of the place grid cells occupy in the authors' view would greatly help.
- The network used in the paper is still guided by a spatial error signal, and is trained to minimize spatial decoding error. In a sense, although object classification networks are not designed for spatial navigation, one could say that the authors are in some way hacking this architecture and turning it into a spatial navigation one through learning. I wonder if their case could be strengthened by devising a version of their experiment based on some form of self-supervised or unsupervised learning.
- The last point is more about my perception of the community studying hippocampal function than about the merits of the paper itself. My question is whether the paper is fighting an already-won battle: that is, whether the focus on the minute classification of response profiles of cells in the hippocampus is in fact already considered an 'old' approach, very useful for some initial qualitative assessments but of limited power when asked to provide deeper insight into the functioning of hippocampal computations (or computations of any other brain circuit).
Reviewer #3 (Public Review):
Summary:
In this paper, the authors demonstrate the inevitability of the emergence of some degree of spatial information in sufficiently complex systems, even those that are only trained on object recognition (i.e. not "spatial" systems). As such, they present an important null hypothesis that should be taken into consideration in experimental design and in the analysis of spatial tuning and its relevance for behavior.
Strengths:
The paper's strengths include the use of a large multi-layer network trained in a detailed visual environment. This illustrates an important message for the field: that spatial tuning can be a result of sensory processing. While this is a historically recognized and often-studied fact in experimental neuroscience, it is made more concrete with the use of a complex sensory network. Indeed, the manuscript is a cautionary tale for experimentalists and computational researchers alike against blindly applying and interpreting metrics without adequate controls.
Weaknesses:
However, the work has a number of significant weaknesses. Most notably: the degree and quality of spatial tuning is not analyzed to the standards of evidence historically used in studies of spatial tuning in the brain, and the authors do not critically engage with past work that studies the sensory influences on these cells; there are significant issues in the authors' interpretation of their results and their impact on neuroscientific research; the ability to linearly decode position from a large number of units is not a strong test of spatial information, nor is it a measure of spatial cognition; and the authors make strong but unjustified claims as to the implications of their results, framing them in opposition to, rather than as contributing to, work being done in the field.
The first weakness is that the degree and quality of spatial tuning that emerges in the network is not analyzed to the standards of evidence that have been used in studies of spatial tuning in the brain. Specifically, the authors identify place cells, head direction cells, and border cells in their network, along with their conjunctive combinations. However, these forms of tuning are the ones most easily confounded by visual responses, and it is unclear whether the results would extend to forms of spatial tuning that are not. Further, in each case, previous experimental work that elucidates the influence of sensory information on these cells is not acknowledged or engaged with.
For example, consider the head direction cells in Figure 3C. In addition to increased activity in some directions, these cells also show a high degree of spatial nonuniformity, suggesting they are responding to specific visual features of the environment. In contrast, the majority of HD cells in the brain are only very weakly spatially selective, if at all, once an animal's spatial occupancy is accounted for (Taube et al 1990, JNeurosci). While the preferred orientations of these cells are anchored to prominent visual cues, when they rotate with changing visual cues the entire head direction system rotates together (cells' relative orientation relationships are maintained, including for those that encode directions facing AWAY from the moved cue), and thus these responses cannot simply be independent sensory-tuned cells responding to the sensory change (Taube et al 1990 JNeurosci, Zugaro et al 2003 JNeurosci, Ajabi et al 2023).
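(As an illustrative aside, the occupancy correction referred to above is conventionally implemented by dividing binned activity by the time spent in each spatial or directional bin. The sketch below is a generic version with hypothetical variable names, not the authors' or Taube et al.'s pipeline.)

```python
# Generic occupancy-normalized tuning estimates (illustrative; variable names are hypothetical).
import numpy as np

def occupancy_normalized_ratemap(event_pos, traj_pos, dt, n_bins=20, extent=(0.0, 1.0)):
    """2D rate map: events per spatial bin divided by seconds spent in that bin."""
    edges = np.linspace(extent[0], extent[1], n_bins + 1)
    events, _, _ = np.histogram2d(event_pos[:, 0], event_pos[:, 1], bins=[edges, edges])
    occupancy, _, _ = np.histogram2d(traj_pos[:, 0], traj_pos[:, 1], bins=[edges, edges])
    occupancy = occupancy * dt                      # convert sample counts to seconds
    rate = np.full_like(events, np.nan)             # leave unvisited bins undefined
    visited = occupancy > 0
    rate[visited] = events[visited] / occupancy[visited]
    return rate

def hd_tuning_curve(event_hd, traj_hd, dt, n_bins=36):
    """Head-direction tuning curve, normalized by time spent facing each direction."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    events, _ = np.histogram(event_hd, bins=edges)
    occupancy, _ = np.histogram(traj_hd, bins=edges)
    return events / np.maximum(occupancy * dt, 1e-12)
```

A unit whose apparent spatial structure disappears after this normalization was likely reflecting uneven sampling rather than genuine spatial tuning.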
As another example, the joint selectivity of detected border cells with head direction in Figure 3D suggests that they are "view of a wall from a specific angle" cells. In contrast, experimental work on border cells in the brain has demonstrated that these are robust to changes in the sensory input from the wall (e.g. van Wijngaarden et al 2020), or that many of them are not directionally selective (Solstad et al 2008).
The most convincing evidence of "spurious" spatial tuning would be the emergence of HD-independent place cells in the network. However, these cells are a small minority (in contrast to hippocampal data: Thompson and Best 1984 JNeurosci, Rich et al 2014 Science), the examples provided in Figure 3 are significantly more weakly tuned than those observed in the brain, and the metrics used by the authors to quantify place cell tuning are not clearly defined in the methods but do not seem to be as stringent as those commonly used on real data (e.g. spatial information, Skaggs et al 1992 NeurIPS).
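(For reference, the spatial information score cited here is commonly computed from the occupancy-normalized rate map as

$$ I = \sum_i p_i \, \frac{\lambda_i}{\bar{\lambda}} \log_2 \frac{\lambda_i}{\bar{\lambda}}, \qquad \bar{\lambda} = \sum_i p_i \lambda_i, $$

where $p_i$ is the occupancy probability of spatial bin $i$ and $\lambda_i$ is the mean firing rate, or here unit activation, in that bin, giving bits per spike or per unit of activity. Thresholding the network's units on such a score, against shuffled controls, would make the comparison with hippocampal criteria more direct.)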
Indeed, the vast majority of tuned cells in the network are conjunctively selective for HD (Figure 3A). While such conjunctive tuning has been reported, many units in the hippocampal/entorhinal system are *not* strongly HD selective (Muller et al 1994 JNeurosci, Sargolini et al 2006 Science, Carpenter et al 2023 bioRxiv). Further, many studies have been carried out to test and understand the nature of sensory influence (e.g. Acharya et al 2016 Cell), and these cells tend to have a complex relationship with a variety of sensory cues that cannot readily be explained by straightforward sensory processing (rev: Poucet et al 2000 Rev Neurosci, Plitt and Giocomo 2021 Nat Neuro). For example, while some place cells are reported to be directionally selective, this directional selectivity depends on behavioral context (Markus et al 1995, JNeurosci) and emerges over time with familiarity with the environment (Navratilova et al 2012 Front. Neural Circuits). Thus, the question is not whether spatially tuned cells are influenced by sensory information, but whether feed-forward sensory processing alone is sufficient to account for their observed tuning properties and responses to sensory manipulations.
These issues point to a more significant underlying problem of scientific methodology concerning the interpretation of the results and their impact on neuroscientific research. Specifically, in order to make strong claims about experimental data, it is not enough to show that a control (i.e. a null hypothesis) exists; one needs to demonstrate that experimental observations are quantitatively no better than that control.
Where the authors state that "In summary, complex networks that are not spatial systems, coupled with environmental input, appear sufficient to decode spatial information.", what they have really shown is that it is possible to decode *some degree* of spatial information. This is a null hypothesis (that observations of spatial tuning do not reflect a "spatial system"), and the comparison must be made to experimental data to test whether the so-called "spatial" networks in the brain have more cells with more reliable spatial information than a complex-visual control.
Further, the authors state that "Consistent with our view, we found no clear relationship between cell type distribution and spatial information in each layer. This raises the possibility that "spatial cells" do not play a pivotal role in spatial tasks as is broadly assumed." Indeed, this would raise such a possibility if 1) the observations in their network were indeed quantitatively similar to the brain, and 2) the presence of these cells in the brain were the only evidence for their role in spatial tasks. However, 1) the authors have not shown this result in neural data; they have only noticed it in a network and mentioned the POSSIBILITY of a similar thing in the brain, and 2) the "assumption" of the role of spatially tuned cells in spatial tasks comes not just from the observation of a few spatially tuned cells, but from many other experiments including causal manipulations (e.g. Robinson et al 2020 Cell, de Lavilléon et al 2015 Nat Neuro), which the authors conveniently ignore. Thus, I do not find their argument, as strongly stated as it is, to be well-supported.
An additional weakness is that linear decoding of position is not a strong test, nor is it a measure of spatial cognition. The ability to decode position from a large number of weakly tuned cells is not surprising. However, based on this decodability, the authors claim that "'spatial' cells do not play a privileged role in spatial cognition". To justify this claim, the authors would need to use the network to perform, e.g., spatial navigation tasks, and then investigate the network's ability to perform these tasks when tuned cells are lesioned.
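(A self-contained toy illustration of this point, with everything simulated; it makes no claim about the authors' actual network, and the unit count, tuning width, and noise level are arbitrary choices:

```python
# Toy demonstration: position is linearly decodable from many weakly, randomly tuned units,
# so decodability alone is weak evidence for a privileged role of strongly tuned cells.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_units = 5000, 500

pos = rng.uniform(0.0, 1.0, size=(n_samples, 2))        # random positions in a unit box

# Each unit: a broad, randomly centred spatial bump swamped by noise (weak individual tuning).
centres = rng.uniform(0.0, 1.0, size=(n_units, 2))
width = 0.4
dist2 = ((pos[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
acts = np.exp(-dist2 / (2 * width**2)) + rng.normal(0.0, 1.0, size=(n_samples, n_units))

# Cross-validated ridge readout of the x coordinate from the full, weakly tuned population.
r2 = cross_val_score(Ridge(alpha=10.0), acts, pos[:, 0], cv=5, scoring="r2").mean()
print(f"cross-validated R^2 for x position: {r2:.2f}")   # typically high despite weak units
```

The per-unit tuning here is deliberately poor, yet a linear readout of position from the pooled population succeeds, which is why a task-level lesion test, rather than decoding alone, is the relevant evidence.)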
Finally, I find a major weakness of the paper to be the framing of the results in opposition to, rather than as contributing to, the study of spatially tuned cells. For example, the authors state that "If a perception system devoid of a spatial component demonstrates classically spatially-tuned unit representations, such as place, head-direction, and border cells, can "spatial cells" truly be regarded as 'spatial'?" Setting aside the issue of whether the perception system in question does indeed demonstrate spatially-tuned unit representations comparable to those in the brain, I ask "Why not?" This seems to be a semantic game of reading more into a name than is necessarily there. The names (place cells, grid cells, border cells, etc.) describe an observation: that cells are observed to fire in certain areas of an animal's environment. They need not be a mechanistic claim (that space "causes" these cells to fire) or even, necessarily, a normative one (that these cells are "for" spatial computation). This is evidenced by the fact that even within, e.g., the place cell community, there is debate about these cells' mechanisms and function (e.g. memory, navigation, etc.), or whether they can even be said to serve only a single function. However, they are still referred to as place cells, not as a statement of their function but as a history-dependent label that refers to their observed correlates with experimental variables. Thus, the observation that spatially tuned cells are "inevitable derivatives of any complex system" is itself an interesting finding which *contributes to*, rather than contradicts, the study of these cells. It seems that the authors have a specific definition in mind when they say that a cell is "truly" "spatial" or that a biological or artificial neural network is a "spatial system", but this definition is not stated, and it is not clear that the terminology used in the field presupposes their definition.
In sum, the authors have demonstrated the existence of a control/null hypothesis for observations of spatially tuned cells. However, 1) it is not enough to show that a control (null hypothesis) exists; to make strong claims about experimental data, one needs to test whether experimental observations are no better than that control; 2) the authors do not acknowledge the work that has been done, in many cases specifically to control for this null hypothesis or to test the sensory influences on these cells; and 3) the authors do not rigorously test the degree or source of spatial tuning of their units.