Core genes can have higher recombination rates than accessory genes within global microbial populations

  1. Asher Preska Steinberg
  2. Mingzhi Lin
  3. Edo Kussell (corresponding author)
  1. New York University, United States

Abstract

Recombination is essential to microbial evolution and is involved in the spread of antibiotic resistance, antigenic variation, and adaptation to the host niche. However, assessing the impact of homologous recombination on accessory genes, which are present in only a subset of strains of a given species, remains challenging due to their complex phylogenetic relationships. Quantifying homologous recombination for accessory genes (which are important for niche-specific adaptations) in comparison to core genes (which are present in all strains and have essential functions) is critical to understanding how selection acts on variation to shape species diversity and genome structures of bacteria. Here, we apply a computationally efficient, non-phylogenetic approach to measure homologous recombination rates in the core and accessory genome using >100,000 whole genome sequences from Streptococcus pneumoniae and several additional species. By analyzing diverse sets of sequence clusters, we show that core genes often have higher recombination rates than accessory genes, and that for some bacterial species the effect sizes associated with these differences are pronounced. In a subset of species, we find that gene frequency and homologous recombination rate are positively correlated. For S. pneumoniae and several additional species, we find that while the recombination rate is higher for the core genome, the mutational divergence is lower, indicating that divergence-based homologous recombination barriers could contribute to differences in recombination rates between the core and accessory genome. Homologous recombination may therefore play a key role in increasing the efficiency of selection in the most conserved parts of the genome.
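
The core/accessory distinction used in the abstract can be illustrated with a minimal sketch. It assumes a gene presence/absence table is already available. The function names, the toy table, and the strict "present in every strain" cutoff for core genes are illustrative assumptions; this is not the authors' pipeline, and it does not measure recombination rates.

```python
from typing import Dict, List


def gene_frequencies(presence: Dict[str, List[bool]]) -> Dict[str, float]:
    """Return, for each gene, the fraction of strains that carry it."""
    return {gene: sum(row) / len(row) for gene, row in presence.items()}


def classify_genes(freqs: Dict[str, float]) -> Dict[str, str]:
    """Label a gene 'core' if every strain carries it, otherwise 'accessory'.

    Assumption: a strict 100% cutoff, following the abstract's description of
    core genes as present in all strains.
    """
    return {gene: "core" if f == 1.0 else "accessory" for gene, f in freqs.items()}


# Hypothetical toy table: one row per gene, one column per strain.
presence = {
    "geneA": [True, True, True, True],
    "geneB": [True, False, True, False],
    "geneC": [False, False, True, False],
}

freqs = gene_frequencies(presence)
labels = classify_genes(freqs)
print(freqs)
print(labels)
```

Gene frequency computed this way is the quantity that the abstract reports as positively correlated with homologous recombination rate in a subset of species.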

Data availability

Lists of SRA accession numbers corresponding to the raw reads used to build the multi-sequence alignments analyzed in this manuscript are included as Figure 2 - source data 1 and Figure 3 - source data 1. All SRA files, reference genomes, and complete genome assemblies are available through NCBI. All sequence collections used are listed in Supplementary File 5. For the PubMLST sequence collections, PubMLST was used to identify whole genome sequences (by filtering for strains in the 'Genome Collection' of each species where the sequence length is at least that of the reference genome), then the raw reads were downloaded from NCBI using their SRA numbers. Accession numbers for reference genomes used for each microbial species are also listed in Supplementary File 5.

All original code has been deposited at GitHub and is publicly available. Links are given below:

  • https://github.com/kussell-lab/mcorr
  • https://github.com/kussell-lab/mcorr-clustering
  • https://github.com/kussell-lab/ReferenceAlignmentGenerator
  • https://github.com/kussell-lab/PangenomeAlignmentGenerator
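
The length-based selection of whole genome sequences described in this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' code and not the PubMLST interface. The record fields, the placeholder accession strings, and the reference length are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IsolateRecord:
    """Hypothetical record for one strain in a species' genome collection."""
    isolate_id: str
    sequence_length: int  # total assembled sequence length, in base pairs
    sra_accession: str    # placeholder string, not a real SRA number


def select_whole_genomes(
    records: List[IsolateRecord], reference_length: int
) -> List[IsolateRecord]:
    """Keep records whose sequence length is at least the reference length."""
    return [r for r in records if r.sequence_length >= reference_length]


# Hypothetical values for illustration only.
REFERENCE_LENGTH = 2_000_000
records = [
    IsolateRecord("isolate-1", 2_100_000, "SRR-PLACEHOLDER-1"),
    IsolateRecord("isolate-2", 1_500_000, "SRR-PLACEHOLDER-2"),
    IsolateRecord("isolate-3", 2_000_000, "SRR-PLACEHOLDER-3"),
]

selected = select_whole_genomes(records, REFERENCE_LENGTH)
accessions = [r.sra_accession for r in selected]
print(accessions)
```

The resulting accession list is what would then be used to download raw reads from NCBI.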

Article and author information

Author details

  1. Asher Preska Steinberg

    Department of Biology, New York University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  2. Mingzhi Lin

    Department of Biology, New York University, New York, United States
    Competing interests
    The authors declare that no competing interests exist.
  3. Edo Kussell

    Department of Biology, New York University, New York, United States
    For correspondence
    edo.kussell@nyu.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-0590-4036

Funding

National Institutes of Health (R01-GM097356)

  • Edo Kussell

Simons Foundation (Simons Foundation Awardee of the Life Sciences Research Foundation)

  • Asher Preska Steinberg

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2022, Preska Steinberg et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Asher Preska Steinberg, Mingzhi Lin, Edo Kussell (2022) Core genes can have higher recombination rates than accessory genes within global microbial populations. eLife 11:e78533. https://doi.org/10.7554/eLife.78533
