Protein evidence of unannotated ORFs in Drosophila reveals diversity in the evolution and properties of young proteins
Abstract
De novo gene origination, where a previously non-genic genomic sequence becomes genic through evolution, has been increasingly recognized as an important source of evolutionary novelty across diverse taxa. Many de novo genes have been proposed to be protein-coding, and in several cases have been experimentally shown to yield protein products. However, the systematic study of de novo proteins has been hampered by doubts regarding the translation of their transcripts without the experimental observation of protein products. Using a systematic, ORF-focused mass-spectrometry-first computational approach, we identify almost 1000 unannotated open reading frames with evidence of translation (utORFs) in the model organism Drosophila melanogaster, 371 of which have canonical start codons. To quantify the comparative genomic similarity of these utORFs across Drosophila and to infer phylostratigraphic age, we further develop a synteny-based protein similarity approach. Combining these results with reference datasets on tissue- and life-stage-specific transcription and conservation, we identify different properties amongst these utORFs. Contrary to expectations, the fastest-evolving utORFs are not the youngest evolutionarily. We observed more utORFs in the brain than in the testis. Most of the identified utORFs may be of de novo origin, even accounting for the possibility of false-negative similarity detection. Finally, sequence divergence after an inferred de novo origin event remains substantial, raising the possibility that de novo proteins turn over frequently. Our results suggest that there is substantial unappreciated diversity in de novo protein evolution: many more may exist than have been previously appreciated; there may be divergent evolutionary trajectories; and de novo proteins may be gained and lost frequently. All in all, there may not exist a single characteristic model of de novo protein evolution, but instead, there may be diverse evolutionary trajectories for de novo proteins.
Data availability
Raw MS data are deposited in PRIDE under accession number PXD032197. Relevant scripts and intermediate files can be found in our Github repository https://github.com/LiZhaoLab/utORF_mass_spec.
Article and author information
Author details
Funding
National Institute of General Medical Sciences (R35GM133780)
- Li Zhao
National Institute of General Medical Sciences (T32GM007739)
- Eric B Zheng
Robertson Foundation
- Li Zhao
Rita Allen Foundation (Rita Allen Foundation Scholar)
- Li Zhao
Vallee Foundation (Vallee Scholar)
- Li Zhao
Monique Weill-Caulier Trust
- Li Zhao
Alfred P. Sloan Foundation (Alfred P. Sloan Research Fellowship)
- Li Zhao
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2022, Zheng & Zhao
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 1,746
- views
-
- 361
- downloads
-
- 13
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Evolutionary Biology
- Genetics and Genomics
Copy number variants (CNVs) are an important source of genetic variation underlying rapid adaptation and genome evolution. Whereas point mutation rates vary with genomic location and local DNA features, the role of genome architecture in the formation and evolutionary dynamics of CNVs is poorly understood. Previously, we found the GAP1 gene in Saccharomyces cerevisiae undergoes frequent amplification and selection in glutamine-limitation. The gene is flanked by two long terminal repeats (LTRs) and proximate to an origin of DNA replication (autonomously replicating sequence, ARS), which likely promote rapid GAP1 CNV formation. To test the role of these genomic elements on CNV-mediated adaptive evolution, we evolved engineered strains lacking either the adjacent LTRs, ARS, or all elements in glutamine-limited chemostats. Using a CNV reporter system and neural network simulation-based inference (nnSBI) we quantified the formation rate and fitness effect of CNVs for each strain. Removal of local DNA elements significantly impacts the fitness effect of GAP1 CNVs and the rate of adaptation. In 177 CNV lineages, across all four strains, between 26% and 80% of all GAP1 CNVs are mediated by Origin Dependent Inverted Repeat Amplification (ODIRA) which results from template switching between the leading and lagging strand during DNA synthesis. In the absence of the local ARS, distal ones mediate CNV formation via ODIRA. In the absence of local LTRs, homologous recombination can mediate gene amplification following de novo retrotransposon events. Our study reveals that template switching during DNA replication is a prevalent source of adaptive CNVs.
-
- Developmental Biology
- Evolutionary Biology
Seahorses, pipefishes, and seadragons are fishes from the family Syngnathidae that have evolved extraordinary traits including male pregnancy, elongated snouts, loss of teeth, and dermal bony armor. The developmental genetic and cellular changes that led to the evolution of these traits are largely unknown. Recent syngnathid genome assemblies revealed suggestive gene content differences and provided the opportunity for detailed genetic analyses. We created a single-cell RNA sequencing atlas of Gulf pipefish embryos to understand the developmental basis of four traits: derived head shape, toothlessness, dermal armor, and male pregnancy. We completed marker gene analyses, built genetic networks, and examined the spatial expression of select genes. We identified osteochondrogenic mesenchymal cells in the elongating face that express regulatory genes bmp4, sfrp1a, and prdm16. We found no evidence for tooth primordia cells, and we observed re-deployment of osteoblast genetic networks in developing dermal armor. Finally, we found that epidermal cells expressed nutrient processing and environmental sensing genes, potentially relevant for the brooding environment. The examined pipefish evolutionary innovations are composed of recognizable cell types, suggesting that derived features originate from changes within existing gene networks. Future work addressing syngnathid gene networks across multiple stages and species is essential for understanding how the novelties of these fish evolved.