Unifying the known and unknown microbial coding sequence space
Abstract
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40%-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction in analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS, and a demonstration of how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1,749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.
Data availability
We used public data as described in the Methods section and Appendix 1-table 5. The code used for the analyses in the manuscript is available at https://github.com/functional-dark-side/functional-dark-side.github.io/tree/master/scripts. A list with the program versions can be found in https://github.com/functional-dark-side/functional-dark-side.github.io/blob/master/programs_and_versions.txt. The code to create the figures is available at https://github.com/functional-dark-side/vanni_et_al-figures, and the data for the figure can be downloaded from https://doi.org/10.6084/m9.figshare.12738476.v2. A reproducible version of the workflow is available at https://github.com/functional-dark-side/agnostos-wf. The data is publicly available at https://doi.org/10.6084/m9.figshare.12459056.
Article and author information
Funding
Max Planck Society
- Chiara Vanni
European Union's Horizon 2020 (INMARE)
- Antonio Fernàndez-Guerra
Biotechnology and Biological Sciences Research Council
- Alex Mitchell
European Molecular Biology Laboratory
- Robert D Finn
Spanish Agency of Science MICIU/AEI (INTERACTOMA RTI2018-101205-B-I00)
- Emilio O Casamayor
Spanish Ministry of Economy and Competitiveness (MAGGY (CTM2017-87736-R))
- Silvia G Acinas
- Pablo Sánchez
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2022, Vanni et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 6,649 views
- 977 downloads
- 67 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.