Vocalization categorization behavior explained by a feature-based auditory categorization model

Abstract

Vocal animals produce multiple categories of calls with high between- and within-subject variability, over which listeners must generalize to accomplish call categorization. The behavioral strategies and neural mechanisms that support this ability to generalize are largely unexplored. We previously proposed a theoretical model that accomplished call categorization by detecting features of intermediate complexity that best contrasted each call category from all other categories. We further demonstrated that some neural responses in the primary auditory cortex were consistent with such a model. Here, we asked whether a feature-based model could predict call categorization behavior. We trained both the model and guinea pigs on call categorization tasks using natural calls. We then tested categorization by the model and guinea pigs using temporally and spectrally altered calls. Both the model and guinea pigs were surprisingly resilient to temporal manipulations, but sensitive to moderate frequency shifts. Critically, the model predicted about 50% of the variance in guinea pig behavior. By adopting different model training strategies and examining features that contributed to solving specific tasks, we could gain insight into possible strategies used by animals to categorize calls. Our results validate a model that uses the detection of intermediate-complexity contrastive features to accomplish call categorization.
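
As a concrete illustration of the approach summarized above, the sketch below (Python with NumPy) shows one way a feature-based categorizer of this kind could be assembled: candidate spectrotemporal fragments act as template detectors, fragments are ranked by how informative their thresholded detection is about membership in a target category (one-vs-all), and a call is assigned to the category whose selected features are detected most strongly. The function names, the normalized-correlation detector, and the simple rank-by-information selection step are illustrative assumptions for this sketch, not the authors' implementation, which uses a greedy, mutual-information-maximizing search over intermediate-complexity features.

    import numpy as np

    def normalized_match(feature, spectrogram):
        """Best Pearson correlation between a spectrotemporal fragment and all
        equally sized patches of a call spectrogram (a simple template detector)."""
        fh, fw = feature.shape
        f = feature - feature.mean()
        f_norm = np.linalg.norm(f) + 1e-12
        best = -1.0
        for i in range(spectrogram.shape[0] - fh + 1):
            for j in range(spectrogram.shape[1] - fw + 1):
                patch = spectrogram[i:i + fh, j:j + fw]
                p = patch - patch.mean()
                denom = f_norm * (np.linalg.norm(p) + 1e-12)
                best = max(best, float(np.dot(f.ravel(), p.ravel()) / denom))
        return best

    def detection_mi(scores, labels, threshold):
        """Mutual information (bits) between thresholded detection and category label."""
        d = scores >= threshold
        mi = 0.0
        for dv in (False, True):
            for lv in (0, 1):
                p_joint = np.mean((d == dv) & (labels == lv))
                if p_joint > 0:
                    mi += p_joint * np.log2(p_joint / (np.mean(d == dv) * np.mean(labels == lv)))
        return mi

    def select_features(candidates, spectrograms, labels, n_features=5):
        """Rank candidate fragments by how informative their thresholded detection
        is about the target category (labels: 1 = target call type, 0 = all others)
        and keep the top n. Returns (fragment, threshold) pairs."""
        scored = []
        for frag in candidates:
            scores = np.array([normalized_match(frag, s) for s in spectrograms])
            thr_grid = np.quantile(scores, np.linspace(0.1, 0.9, 17))
            best_thr = max(thr_grid, key=lambda t: detection_mi(scores, labels, t))
            scored.append((detection_mi(scores, labels, best_thr), frag, best_thr))
        scored.sort(key=lambda x: x[0], reverse=True)
        return [(frag, thr) for _, frag, thr in scored[:n_features]]

    def categorize(spectrogram, feature_sets):
        """Assign a call to the category whose feature set yields the most
        above-threshold detections (ties broken by summed match strength).
        feature_sets: dict mapping category name to a list of (fragment, threshold)."""
        def evidence(features):
            hits = [(normalized_match(f, spectrogram), thr) for f, thr in features]
            return (sum(s >= thr for s, thr in hits), sum(s for s, _ in hits))
        return max(feature_sets, key=lambda cat: evidence(feature_sets[cat]))

In this simplified setting, training amounts to calling select_features once per call category against all other categories, and testing amounts to calling categorize on manipulated (e.g., time-compressed or frequency-shifted) call spectrograms to compare the model's choices with animal behavior.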

Data availability

All data generated or analyzed during this study are included in the manuscript and supporting files; source data files have been provided for Figures 3-12.

Article and author information

Author details

  1. Manaswini Kar
     Center for Neuroscience, University of Pittsburgh, Pittsburgh, United States
  2. Marianny Pernia
     Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States
     ORCID iD: 0000-0002-9889-3577
  3. Kayla Williams
     Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States
  4. Satyabrata Parida
     Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States
     ORCID iD: 0000-0002-2896-2522
  5. Nathan Alan Schneider
     Center for Neuroscience, University of Pittsburgh, Pittsburgh, United States
     ORCID iD: 0000-0002-9145-5427
  6. Madelyn McAndrew
     Center for the Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, United States
  7. Isha Kumbam
     Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States
  8. Srivatsun Sadagopan
     Center for Neuroscience, University of Pittsburgh, Pittsburgh, United States
     For correspondence: vatsun@pitt.edu
     ORCID iD: 0000-0002-1116-8728

Competing interests
The authors declare that no competing interests exist.

Funding

National Institutes of Health (R01DC017141)

  • Srivatsun Sadagopan

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Animal experimentation: All experimental procedures conformed to the NIH Guide for the Care and Use of Laboratory Animals and were approved by the institutional animal care and use committee of the University of Pittsburgh (protocol number 21069431).

Copyright

© 2022, Kar et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Manaswini Kar, Marianny Pernia, Kayla Williams, Satyabrata Parida, Nathan Alan Schneider, Madelyn McAndrew, Isha Kumbam, Srivatsun Sadagopan (2022) Vocalization categorization behavior explained by a feature-based auditory categorization model. eLife 11:e78278. https://doi.org/10.7554/eLife.78278
