Hierarchical temporal prediction captures motion processing along the visual pathway
Abstract
Visual neurons respond selectively to features that become increasingly complex from the eyes to the cortex. Retinal neurons prefer flashing spots of light, primary visual cortical (V1) neurons prefer moving bars, and those in higher cortical areas favor complex features like moving textures. Previously, we showed that V1 simple cell tuning can be accounted for by a basic model implementing temporal prediction - representing features that predict future sensory input from past input (Singer et al., 2018). Here we show that hierarchical application of temporal prediction can capture how tuning properties change across at least two levels of the visual system. This suggests that the brain does not efficiently represent all incoming information; instead, it selectively represents sensory inputs that help in predicting the future. When applied hierarchically, temporal prediction extracts time-varying features that depend on increasingly high-level statistics of the sensory input.
Data availability
All custom code used in this study was implemented in Python. The code for the models and analyses shown in Figures 1-8 and associated sections can be found at https://bitbucket.org/ox-ang/hierarchical_temporal_prediction/src/master/. The V1 neural response data (Ringach et al., 2002) used for comparison with the temporal prediction model in Figure 6 came from http://ringachlab.net/ ("Data & Code", "Orientation tuning in Macaque V1"). The V1 image response data used to test the models included in Figure 9 were downloaded with permission from https://github.com/sacadena/Cadena2019PlosCB (Cadena et al., 2019). The V1 movie response data used to test these models were collected in the Laboratory of Dario Ringach at UCLA and downloaded from https://crcns.org/data-sets/vc/pvc-1 (Nahaus and Ringach, 2007; Ringach and Nahaus, 2009). The code for the models and analyses shown in Figure 9 and the associated section can be found at https://github.com/webstorms/StackTP and https://github.com/webstorms/NeuralPred. The movies used for training the models in Figure 9 are available at https://figshare.com/articles/dataset/Natural_movies/24265498.
Article and author information
Author details
Funding
Wellcome Trust (WT108369/Z/2015/Z)
- Ben DB Willmore
- Andrew J King
- Nicol S Harper
University of Oxford Clarendon Fund
- Yosef Singer
- Luke CL Taylor
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2023, Singer et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 1,095
- views
-
- 164
- downloads
-
- 9
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Biochemistry and Chemical Biology
- Neuroscience
The buildup of knot-like RNA structures in brain cells may be the key to understanding how uncontrolled protein aggregation drives Alzheimer’s disease.
-
- Neuroscience
Evidence accumulation models (EAMs) are the dominant framework for modeling response time (RT) data from speeded decision-making tasks. While providing a good quantitative description of RT data in terms of abstract perceptual representations, EAMs do not explain how the visual system extracts these representations in the first place. To address this limitation, we introduce the visual accumulator model (VAM), in which convolutional neural network models of visual processing and traditional EAMs are jointly fitted to trial-level RTs and raw (pixel-space) visual stimuli from individual subjects in a unified Bayesian framework. Models fitted to large-scale cognitive training data from a stylized flanker task captured individual differences in congruency effects, RTs, and accuracy. We find evidence that the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations, demonstrating how our framework can be used to relate visual representations to behavioral outputs. Together, our work provides a probabilistic framework for both constraining neural network models of vision with behavioral data and studying how the visual system extracts representations that guide decisions.