Introduction

Organisms have evolved various strategies for the spatiotemporal regulation of gene expression 13. This is important because aberrant gene expression can result in phenotypic defects or diseases, while the variation and evolution of gene expression patterns frequently promote phenotypic diversification and adaptation 4. Although variations in mRNA abundance are widely observed within or between species, protein abundance tends to show stronger evolutionary constraint 5,6, as observed in yeasts 79, primates 10,11, and other organisms 1214. Nevertheless, the molecular mechanisms by which the conservation of protein abundance across species is achieved are largely unknown 5,6.

Eukaryotic mRNA translation is a crucial step in gene expression and is highly regulated by multilayered mechanisms 1517. Upstream open reading frames (uORFs), short open reading frames in the 5-terminal untranslated regions (5’UTRs) of eukaryotic mRNAs, play crucial roles in regulating mRNA translation. Approximately 50% of eukaryotic genes contain uORFs 18, and their evolution has been tightly shaped by natural selection 1923. The functions of uORFs have been explored in various contexts, including development 2330, disease 3135 and stress responses 3643. The prevailing consensus is that uORFs typically repress downstream coding sequence (CDS) translation by sequestering ribosomes, a process influenced by factors such as uORF length, position, and sequence context 34,4348. However, under stress conditions, certain uORFs can facilitate CDS translation by promoting ribosome reinitiation, illustrating their context-dependent functions 42,43,4852.

Gene expression noise, which arises from the inherent stochasticity of biological processes such as transcription and translation, is generally detrimental to organismal fitness 53 and is primarily determined at the translational level 54. Recent studies suggest uORFs might play essential roles in buffering translational noise and stabilizing protein expression. For example, Wu et al. (2022) demonstrated that uORFs reduce protein production rates to stabilize TOC1 protein levels, ensuring precise circadian clock function in plants 55. Similarly, Bottorff et al. (2022) used a human cell reporter system to show that a ribosome stall in cytomegaloviral UL4 uORFs buffers against CDS translation reductions 56. Under stress, translation initiation is typically downregulated, yet most human mRNAs resistant to this inhibition contain translated uORFs, with a single uORF often being sufficient for resistance 42. The computational model of Initiation Complexes Interference with Elongating Ribosomes (ICIER) suggests that derepression of downstream translation is a general mechanism of uORF-mediated stress resistance 43. Despite these findings, the current understanding of uORFs in stabilizing translation is limited to single-gene cases or stressed conditions. It remains unclear whether and how uORFs affect gene translation variability on a genome-wide scale during evolution and development, and whether the identified mechanisms are universal or vary significantly among different taxa. To address these questions, a combination of modeling, genome-wide analyses, and comparative studies across species is required.

In this study, we first adapted the ICIER framework 43 to simulate the translating ribosome on a mRNA to quantitatively measure the extent to which uORF translation reduces the translational variability of the downstream CDS under different translation contexts. We then compared the translatomes of two closely related Drosophila species, D. melanogaster and D. simulans, and further supported the notion that uORFs could buffer the fluctuations of CDS translation during the development and evolution of Drosophila. The patterns also reappeared among primates and human populations. We next knocked out the bcd uORF, a case showing significant buffering effect in our data, and observed wide changes in embryonic transcriptome and phenotypic defects in D. melanogaster. Together, our results demonstrate a novel role for uORFs by maintaining translation stabilization during Drosophila evolution and development.

Results

An extended ICIER model for quantifying uORF buffering in CDS translation

To quantitatively assess how uORF translation modulates the variability of downstream CDS translation, we adapted the ICIER model 43, originally grounded in the totally asymmetric simple exclusion process (TASEP). TASEP has been extensively utilized to model the stochastic nature of ribosome movement along mRNA, capturing the effects of ribosome traffic jams, where ribosomes may slow down or stall when a site ahead is occupied 5760. The ICIER model extends this by simulating the interplay between scanning (40S) and elongating (80S) ribosomes, particularly focusing on how uORFs impact the overall translation efficiency of the main CDS 43.

We extended the original ICIER model 43 with several major modifications (Fig. 1A). First, while the original ICIER model only considered the scenario where the elongating ribosome (80S) causes downstream scanning ribosomes (40S) to dissociate from the mRNA when they move along the mRNA and collide, recent findings have shown that upstream dissociation can also play a critical role in uORF-mediated regulation 56. To incorporate this, we accounted for more complex ribosome interactions, including three possible scenarios where the 80S collides with 40S: i) 80S only causes the downstream 40S to dissociate from the mRNA with a probability of Kdown (“downstream dissociation”, Kdown ranging from 0 to 1), following the original ICIER model; ii) the 80S only causes the upstream 40S to dissociate from the mRNA with a probability of Kup (“upstream dissociation”, Kup > 0 and Kdown = 0); and iii) a combination of the downstream and upstream dissociation models (“double dissociation”, Kup > 0 and Kdown > 0).

Modeling simulation of uORF-mediated translation buffering.

(A) Model schema of the modified ICIER model (on the top); the parameters are listed in the box below the schema. (B) Heatmap showing the CVs of CDS TE (NEC) under different ICDS (x-axis) and IuORF (y-axis) combinations with a uniform distribution of Rin input and the downstream dissociation model. The left panels elicited by the dotted lines from specific squares of right heatmap were two examples showing the distribution of NEC under ICDS = 0.8 & IuORF = 0 (top panel, without uORF) and ICDS = 0.8 & IuORF = 0.4 (bottom panel, with uORF). (C) Heatmap showing CVs of CDS TE (NEC) under different IuORF (x-axis) and IuORF (y-axis) combinations with a uniform distribution of Rin input and the downstream dissociation model. The left panels elicited by the dotted lines from specific squares of right heatmap were two examples showing the distribution of NEC under LuORF = 2 & IuORF = 0.2 (top panel) and LuORF = 30 & IuORF = 0.2 (bottom panel). (D) Heatmap showing median δ [log2NECNEU=)] under different ICDS (x-axis) and IuORF (y-axis) combinations with a uniform distribution of Rin input and the downstream dissociation model. The left panel elicited by the dotted line from a specific square of right heatmap was an example showing the distribution of δ under ICDS = 0.8 & IuORF = 0.2. The vertical dashed line indicated the median value of δ.

Second, the original ICIER model only considered the ribosome collision and dissociation in the uORF and counted the 40S scanning ribosome escaping from the uORF as a proxy of the CDS translation rate. In our extended model, we also considered the ribosome collision and dissociation events in the CDS that is downstream of the uORF. We recorded the number of 80S ribosomes that completed translation at the stop codon of a CDS (NEC) or uORF (NEu) during a given time interval, using these counts as proxies to quantify the translation efficiency (TE) of CDS or uORF. These indices allowed us to directly and quantitively measure the impact of uORF-mediated translational buffering.

Third, while the original ICIER model only considered the effect of a single uORF, we also considered the buffering effects of two uORFs, allowing us to test the possible combinatorial effects of uORF-mediated translation regulation in this more complex yet more common scenario, as previous studies have shown uORFs tend to be clustered in genes 18.

These extensions allow for a more comprehensive exploration of uORF-mediated translational buffering, offering deeper insights into how these regulatory elements might stabilize protein synthesis across varying translation contexts.

uORF-mediated buffering of CDS translation across different parameter settings

To systematically investigate the extent to which uORF translation modulates the variability of downstream CDS translation, we conducted simulations across various parameter settings using three dissociation models: upstream, downstream, and double dissociation, each reflecting different possible interactions between scanning and elongating ribosomes on mRNA. We considered a range of parameters crucial to the translation process (Table S1), including the length of the 5’ leader before the uORF (fixed at 150 nucleotides), the length of the uORF itself (ranging from 2 to 100 codons), the distance between the uORF stop codon and the CDS start codon (150 nucleotides), the length of the CDS (500 codons), and the length of the 3’ UTR (150 nucleotides). Additionally, we modeled the probabilities associated with ribosome movement and initiation, such as the probability of a 40S ribosome moving to the next nucleotide (vs = 0.3), the probability of an 80S ribosome moving to the next position within the uORF (vEu = 0.3) and within the CDS (vEC = 0.5), and the probability of loading a new 40S ribosome at the 5’ end of the mRNA (Rin). We also explored different probabilities of translation initiation at both the uORF start codon (IuORF) and the CDS start codon (ICDS). Key parameters were adapted from the original ICIER model 43, ensuring a robust basis for comparison while allowing exploration of additional variables that influence uORF-mediated translational buffering.

In our simulations, Rin values were varied to simulate fluctuations in translational resources, such as ribosome availability, that could arise from genetic differences or environmental changes during evolution or development. By generating 1,000 Rin values following either uniform or exponential distributions (ranging from 0 to 0.1, Fig. S1A), we aimed to capture the natural variability in ribosome loading rates that might occur across different species, individuals, or developmental stages. These values were then fed into the simulation models to evaluate their impact on both the level and variability of translation efficiency for CDSs, with the number of ribosomes completing translation on CDSs (NEC) in a given time interval used as proxies for translational efficiency (Fig. S1B).

Across all model settings, uORF translation (IuORF > 0) consistently reduced CDS translation efficiency (NEC) by about 30% to 80% as IuORF increased from 0.1 to 0.5, compared to scenarios where the uORF was absent or untranslated (IuORF = 0) (Fig. S2 and S3). This confirms the inhibitory effect of uORF on downstream CDS translation. Notably, the coefficient of variation (CV), a measurement of variability for translation efficiency (NEC), was lower when the uORF was translated (Fig. 1B). These CV values further decreased by approximately 10% to 25% as IuORF increased from 0.1 to 0.5 (Fig. 1B), and this buffering effect persisted across different parameter settings (Fig. S4 and S5). Moreover, the CV of translation efficiency (NEC) further decreased by about 6% to 30% as uORF length (IuORF) increased from 2 to 100 codons (Fig. 1C and S6-7). These simulation results indicate that uORF translation can reduce variability in downstream CDS translation, with the buffering capacity positively correlated with both uORF translation initiation efficiency and length under the applied simulation conditions.

To more quantitatively investigate the relationship between changes in uORF (NEU) and CDS (NEC) translation, for each parameter setting, we used the median value (Rin2) of the 1000 Rin inputs as the baseline, calculating the corresponding NEC2 and NEU2values. We then calculated the difference in changes in NEC (Δ NEC) and NEU (Δ NEU) relative to NEC2 and NEU2 for each Rin value. Across different parameter settings, ΔNEU was consistently and significantly positively correlated with ΔNEC (P < 0.001, Spearman’s correlation) (Fig. S8-9), indicating that fluctuations of the ribosome loading rate influence the translation of both uORFs and CDSs in the same direction. Nevertheless, our simulations showed that variations in Rin led to a larger change in NEU than in NEC, as the median value of δ [defined by log2(ΔNEC/ΔNEU)] was consistently less than 0 across various IuORF and ICDS combinations (Fig. 1D and Fig. S10-11). This finding suggests that uORFs buffer against upstream fluctuations, exhibiting greater translational fluctuations than downstream CDSs.

Simulations of mRNAs with two uORFs revealed patterns consistent with a buffering role for uORFs (Fig. S12). To compare the buffering effects of a single uORF versus two uORFs, we calculated the ratio of the CV of NEC with two uORFs to that with a single uORF. A ratio less than 1 suggests that two uORFs provide greater buffering than a single uORF. For comparability, we examined the CV of NEC where the IuORF in the single-uORF model equals IuORF= (IuORF of the first uORF) in the two-uORF model, both ranging from 0 to 0.5. The CV ratio consistently remained below 1 across a range of IuORF> (IuORF of the second uORF) (Fig. S13), indicating that two uORFs offer stronger buffering than a single uORF.

Collectively, these simulations collectively suggest: (1) uORF-mediated translational control buffers against CDS translation variability, (2) uORFs exhibit greater translational fluctuations than downstream CDSs, and (3) the buffering capacity of uORFs positively correlates with their translation initiation efficiency and length. Subsequently, we sought to validate these simulation results in a biological context by confirming the uORF-mediated buffering effect during organismal evolution and development, as both processes face environmental and/or genetic changes that frequently disturb mRNA translation.

Generating matched translatome data from two Drosophila species for comparative analysis

To validate the uORF-mediated translational buffering during Drosophila evolution, we performed a comparative analysis of the translatomes of two closely related Drosophila species, D. melanogaster and D. simulans, which diverged approximately 5.4 million years ago 61. We generated high-throughput sequencing data for D. simulans, including transcriptome (mRNA-Seq) and translatome (Ribo-Seq) profiles from various developmental stages and tissues, such as embryos at 0-2 h, 2-6 h, 6-12 h, and 12-24 h, third-instar larvae, P7-8 pupae, female and male bodies, and female and male heads. In total, we obtained approximately 786 million high-quality reads for D. simulans (Table S2). These datasets were designed to be directly comparable to the previously published D. melanogaster data 23, with identical embryonic stages and tissue types. This comprehensive comparative translatome analysis, utilizing matched developmental stages and tissues between the two species, allowed us to test the uORF-mediated translational buffering effects during evolution.

Translational conservation and dominance of uORFs between Drosophila species

Given that the translation of uORFs is a crucial determinant for their functional impact, we first characterized the translational profiles of uORFs in D. melanogaster and D. simulans to explore the evolutionary roles of uORFs in gene regulation. We identified 18,412 canonical uORFs shared between the two species (referred to as conserved uORFs hereafter), 2,789 uORFs specific to D. melanogaster and 2,440 uORFs specific to D. simulans. The translational efficiencies (TEs) of the conserved uORFs were highly correlated between the two species across all developmental stages and tissues examined, with Spearman correlation coefficients ranging from 0.478 to 0.573 (Fig. 2A). Notably, conserved uORFs exhibited significantly higher translational efficiencies (TEs) compared to species-specific uORFs in both species. The median TE of conserved uORFs was 1.62 times that of non-conserved uORFs in D. simulans, while the corresponding ratio in D. melanogaster was 1.52 (Fig. 2B).

Conservation and translation of uORFs between D. melanogaster and D. simulans.

(A) Spearman’s correlation coefficients (Rho, represented by the bars) of conserved TEs between Dm (D. melanogaster) and Ds (D. simulans). ***, P < 0.001. Data for the female head sample is shown as an example in the right panel. The x- and y-axes represent the uORF TEs in Dm and Ds. (B) The median of TE of conserved and species-specific uORFs in each sample. Each dot represents the median TE of a sample for a specific uORF class. Data from the female head sample is shown as an example in the right panel. P values were obtained from Wilcoxon rank sum tests. ***, P < 0.001. (C) uORFs were ranked by decreasing TEs within each gene. The uORF with the highest TE within each gene was defined as the dominant uORF (#1). EC2” represents the second highest uORF TE and the same goes for EC3” and “>3”. Each dot represents the median TE of a sample. (D) Fraction of conserved uORFs among dominant uORFs and other translated uORFs in each sample. The paired samples in Dm and Ds were linked together. The P value was obtained by the paired Wilcoxon signed rank test. ***, P < 0.001. (E) Absolute values of the interspecific TE fold changes (log2TE-FC) of dominant uORFs and the other translated uORFs in each sample. The paired samples in Dm and Ds were linked together. The median value of each sample is shown. The P value was obtained via the paired Wilcoxon signed rank test. ***, P < 0.001. Data from the female head sample were used as an example in the right panel.

In D. melanogaster, 7,259 (52.2%) genes had no uORFs, 2,687 (19.3%) had a single uORF, and 3,961 (28.5%) contained multiple uORFs. Among genes with multiple uORFs, one uORF generally emerged as dominant, displaying a higher TE than the others within the same gene (Fig. 2C). The median TE of the dominant uORF was 4.84 times that of the second-highest uORF within the same gene in D. melanogaster, and the corresponding ratio was 5.21 times in D. simulans (Fig. 2C). To assess the consistency of this dominance across different tissues and developmental stages, we identified 3,072 multiple-uORF genes in D. melanogaster with at least one translated uORF (TE > 0.1 in at least five stages/tissues). Of these, 569 genes consistently used the same dominant uORF across the measured samples, significantly higher than the number expected under randomness (5 genes, 95% confidence interval: 1-10) based on shuffling the TEs of uORFs 1,000 times (Fig. S14). This trend was also observed in D. simulans and persisted under different thresholds for defining “translated uORFs” (Fig. S14). Furthermore, the dominant uORFs showed a higher proportion of conserved uATGs than the other translated uORFs (median proportion 82.5% versus 78.8%, P < 0.001) (Fig. 2D). Additionally, TE fold-changes (|log2TE-FC|) between the two species were, on an average, 23.2% smaller for dominant uORFs than for other conserved uORFs (Fig. 2E), suggesting that dominant uORFs are more likely under stronger stabilizing selection. These findings suggest that, in genes with multiple uORFs, the dominantly translated uORF may play a more important role in regulating CDS translation than the other uORFs.”

uORFs and CDSs show correlated translation differences between D. melanogaster and D. simulans

To investigate the relationship between translational changes in uORFs and their downstream CDSs across species, we analyzed the translation efficiency (TE) of uORFs and corresponding downstream CDSs in D. melanogaster and D. simulans. Consistent with our simulations that the translation of a uORF is tightly linked to that of downstream CDS, uORFs exhibited a significant positive correlation with the TE of their downstream CDSs in all samples analyzed (P < 0.001, Spearman’s correlation) (Fig. 3A). We then compared the interspecific TE change of a uORF (βu = TEuORF,&.2/TEuORF,2CD) with that of its corresponding CDS (βc = TECDS,&.2/TECDS,2CD) between D. melanogaster and D. simulans. We found βu is significantly positively correlated with βc across all samples (P values < 0.001, Spearman’s correlation) (Fig. 3B). These results align well with our simulations, which showed that fluctuations in translational factors (such as ribosomes) influence both uORF and CDS translation in the same direction (Fig. S8-S9).

uORFs reduce CDS translational divergence between D. melanogaster and D. simulans.

(A) The correlation of uORF TEs and the corresponding CDS TEs in 10 samples of Dm (D. melanogaster) and Ds (D. simulans). The bars represent Spearman’s correlation coefficient (Rho). In all samples, we obtained both P values < 0.001. Data for the female head sample of Dm and Ds are shown as examples in the right panel. (B) Correlations between interspecific uORF TE changes (log2βu) and CDS TE changes (log2βC) in 10 samples. The x-axis was divided into 50 equal bins with increasing βu. Spearman’s correlation coefficients (Rho) are shown at the top left. ***, P < 0.001 in the correlation test. (C) Genes expressed in female heads (mRNA RPKM > 0.1 in both species) were classified into three classes according to whether a gene had a conserved and dominantly translated uORF or not. Boxplots showing interspecific CDS TE variability |log2(βc)| of different gene classes. P values were calculated using Wilcoxon rank sum tests between the neighboring groups. ***, P < 0.001. (D) Genes expressed in female heads were classified into three classes according to the length of translated uORFs. Boxplots showing interspecific CDS TE variability |log2(βc)| of different gene classes. P values were calculated using Wilcoxon rank sum tests between the neighboring groups. ***, P < 0.001.

uORFs buffer interspecific translational divergence of CDSs

While the direction of translational efficiency (TE) changes for uORFs and CDSs tends to be consistent, our simulations suggest that the magnitude of TE changes in CDSs is generally smaller due to the buffering effect of uORF translation (Fig. 1B, S4-5). To validate this, we first identified uORFs and CDSs with significant interspecific TE differences by assessing whether βu or βC significantly deviated from 1, using an established statistical framework 23. This analysis uncovered 1,151 to 4,189 CDSs with significant interspecific TE changes (FDR < 0.05) (Table 1), with genes involved in development, morphogenesis, and differentiation being significantly enriched during embryonic stages (Fig. S15). Conversely, genes related to metabolism, response to stimuli, and signaling were enriched in larval, pupal, and adult stages (Fig. S15). Additionally, we identified 144 to 1,193 uORFs with significant TE differences between species, accounting for approximately 1-15% of expressed uORFs (Table 1). The smaller number of uORFs showing significant TE changes compared to CDSs between D. melanogaster and D. simulans likely reflects their shorter length and reduced statistical power, rather than indicating that uORFs are less variable in translation than CDSs.

Numbers of genes showing different magnitudes of TE changes between uORFs and CDS at the interspecific level.

To further investigate, we quantitatively compared the magnitude of interspecific TE changes between uORFs and their corresponding CDSs using a previous method 23. We defined γ =βc/βu for a uORF-CDS pair within the same mRNA and tested whether γ was significantly different from 1 to identify pairs with differential TE changes between uORFs and CDSs (Fig. S16). When γ < 1 and βu >1, or γ > 1 and β$u<1, it indicates that the TE change for the CDS is smaller than that for the uORF (Fig. S16). Among CDS-uORF pairs where βu > 1, nearly all (8-487) showed a significant γ < 1 in each sample, except for one pair from the pupal stage where γ > 1 (Table 1). This suggests that the magnitude of TE changes in CDSs was generally smaller than in uORFs when uORF TE increased, and vice versa when uORF TE decreased (βu < 1) (Table 1). These comparative translatome analyses indicate that uORFs buffered mRNA translation changes during the evolutionary divergence of D. melanogaster and D. simulans.

uORF buffering is influenced by its conservation, dominance, and length

To investigate how the conservation level and translation patterns of uORFs influence their buffering capacity on CDS translation, we categorized genes expressed in each pair of samples into three classes: Class I, genes with uORFs conserved and dominantly translated in both Drosophila species; Class II, genes with conserved uORFs translated in both species but not dominantly in at least one; and Class III, the remaining expressed genes. We then compared the interspecific translation efficiency (TE) changes of the CDS (|βc|) across these three categories. Significant differences in |βc| were observed, with a consistent hierarchy of Class I < II < III across all pairs of samples (Fig. 3C and S17). On average, Class I genes exhibited an average of 8.18% and 23.8% lower |βc| values compared to Class II and Class III, respectively. This indicates that conserved and dominantly translated uORFs exert a stronger buffering effect on CDS translation.

To further validate the simulation results suggesting that longer uORFs have a stronger buffering effect (Fig. 1C and S6-7), we divided genes expressed in each pair of samples into three groups: those without translated uORFs (No), those with short uORFs (short, total length below the median), and those with long uORFs (long, total length above the median). Consistently, longer uORFs were associated with stronger buffering effects on CDS translation across all pairs of samples (Fig. 3D and S18). Specifically, genes with longer uORFs showed 12.7% and 26.5% lower |βc| values compared to genes with short uORFs or no uORFs, respectively.

Overall, these findings underscore that the buffering capability of a uORF is positively correlated with its conservation level, translation dominance, and length.

uORFs buffer translational fluctuations during Drosophila development

Gene expression undergoes dynamic changes during Drosophila development, with significant alterations in the translation program to meet developmental demands, including shifts in ribosome loading rates 24,62,63. Therefore, we extended our analysis to investigate the role of uORFs in buffering these translational fluctuations, hypothesizing that uORFs could mitigate them during development. According to our hypothesis, if a gene has a translated uORF in D. melanogaster but not in its orthologous gene in D. simulans, then the translation of this gene is likely more stable across developmental stages in D. melanogaster than its ortholog in D. simulans, and vice versa.

To test this hypothesis, we compared the CV of CDS TE across 10 developmental stages in D. melanogaster and in D. simulans, respectively. Genes with translated uORFs (TE>0.1 in at least one sample) in D. melanogaster exhibited significantly 22.5% smaller CVs in D. melanogaster than their orthologs lacking these uORFs in D. simulans, indicating more stable translation (Fig. 4A). Consistently, for genes with translated uORFs in D. simulans but not in D. melanogaster, the CVs were 13.3% lower in D. simulans compared to their orthologs lacking these uORFs in D. melanogaster (Fig. 4B). Consistent results were observed when a uORF is required to be translated (TE > 0.1) in all 10 samples (Fig. S19A), in the 4 embryonic stages (Fig. S19B) or in the 6 stages including embryos, larva, and pupa (Fig. S19C). Moreover, within each species, genes with translated uORFs also showed less variability in CDS TE across developmental stages compared to those without translated uORFs, with a 31.8% reduction in D. melanogaster and a 28.9% reduction in D. simulans (Fig. 4C). This effect was consistent across different thresholds for defining “translated uORFs” (Fig. S20).

uORFs could reduce CDS translational fluctuation during Drosophila development.

(A) The CV of TECDS across 10 Dm (D. melanogaster) samples and 10 Ds (D. simulans) samples. The selected gene with uORFs translated (TE > 0.1) in at least one Dm sample but its homologous gene without translated uORF in Ds samples. Each pair of dots linked by a gray line represents a pair of homologous genes in Dm and Ds. ***, P < 0.001, Wilcoxon signed-rank test. (B) The CV of TECDS across 10 Dm samples and 10 Ds samples. The selected gene with uORFs translated (TE > 0.1) in at least one Ds sample but its homologous gene without translated uORF in Dm samples. Each pair of dots linked by a gray line represents a pair of homologous genes in Dm and Ds. ***, P < 0.001, Wilcoxon signed-rank test. (C) Within each Drosophila species, the CV of TECDS of genes with translated uORFs compared to genes without the translated uORFs. The P values are obtained by the Wilcoxon rank sum test. ***, P < 0.001.

These results suggest that uORFs function as translational buffers, reducing gene translation fluctuations during Drosophila development.

Knocking out the uORF of bcd increased bcd CDS translation in D. melanogaster

After verifying the uORFs-mediated translational buffering during Drosophila evolution and development, we next aimed to directly explore the biological function of these buffering-capable uORFs in vivo. We first applied stringent criteria to identify uORFs with significant buffering effects on CDS translation between D. melanogaster and D. simulans. Specifically, we looked for 1) uORFs with significant TE changes (| log2 (βu)| > 1.5, adjusted P < 0.05), 2) negligible changes in its corresponding CDS translation (|log2(βu)| < 0.05, adjusted P > 0.05), and 3) a significant difference between the magnitude of these changes (|log₂(γ)| > 1.5, adjusted P < 0.05). We identified 131 uORF-CDS pairs in 103 genes that meet these criteria in at least one stage/tissue (Table S3), with a majority of the genes (67%, 69 out of 103) from embryonic stages (Fig. S21A), suggesting a crucial role for uORFs in maintaining translational stability during early development. Among these genes, one notable case is the bicoid (bcd) gene, a master regulator of anterior-posterior axis patterning during early embryogenesis 6466. The bicoid gene contains a 4-codon uORF (excluding the stop codon) in its 5’ UTR, and branch length score (BLS) analysis 18 showed that the start codon (uATG) of the bicoid uORF is highly conserved across the Drosophila phylogeny, with a BLS of 0.90 (on a scale from 0 to 1, where higher values indicate greater conservation) (Fig. 5A and S21B). Ribo-Seq data revealed that in 0-2 h embryos, the TE of the bcd uORF varied more than threefold, while the TE of the CDS was virtually the same between D. melanogaster and D. simulans (Fig. 5B and Table S3). This suggests that the bcd uORF buffers translation during early development.

The strong buffering uORF of bcd and it knockout.

(A) Multiple sequence alignment of the bcd uORF and partial CDS in D. melanogaster and 20 other Drosophila species. The uORF and CDS are boxed in green and purple, respectively. The start codons of the uORF and CDS are boxed in red. (B) The coverage of mRNA-Seq (top), Ribo-Seq (middle), and TEs (bottom) of the bcd uORF and CDS in 0-2 h embryos of D. melanogaster (red) and D. simulans (blue). The uORF and CDS are denoted at the lower panel with dark green triangles and purple boxes, respectively. The 2 dashed lines mark the CDS region. The uORF TE, CDS TE and their interspecific changes were labeled at the bottom. (C) Genotypes of WT and two uORF knock-out strains (uKO1 and uKO2) generated by CRISPR-Cas9 technology. The uORF is boxed in dark green, and the red ATG represents the start codon of the uORF in the D. melanogaster genome.

To investigate the regulatory role of the bcd uORF, we used CRISPR-Cas9 to knock out its start codon in D. melanogaster, generating two mutant homozygotes (uKO1/uKO1 and uKO2/uKO2) with a genetic background matched to that of the wild-type (WT) (Figs. 6A and S22). To determine whether these mutations enhanced bcd CDS translation, we performed ribosome fractionation followed by qPCR 6769, comparing bcd mRNA levels in polysome and monosome fractions (P-to-M ratio) from 0-2 h embryos of uKO1/uKO1, uKO2/uKO2, and WT flies (Fig. 6B). A larger P-to-M ratio means more mRNAs are enriched in the polysome fractions and bound by more ribosomes, thus indicative of higher translation efficiency. P-to-M ratios were higher in the mutants compared to WT at 29°C (Fig. 6C), with a similar but less pronounced trend at 25°C, where the difference between uKO2/uKO2 and WT was not statistically significant (Fig. 6C). These findings, along with the known impact of temperature on gene expression and phenotypic plasticity 70,71, suggest that the regulatory function of the uORF and overall translation efficiency are temperature-sensitive, highlighting a complex interplay between environmental conditions and gene regulation. To further confirm the bcd uORF’s regulatory function, we conducted dual-luciferase reporter assays. The 5’UTR from bcd mutant (uKO1 or uKO2) or WT was cloned into a Renilla luciferase reporter construct (Fig. 6D). Luciferase activity was significantly higher in uKO1 and uKO2 compared to the WT 5’UTR, confirming the repressive role of the bcd uORF.

Knocking out the bcd uORF increases CDS translation and perturbs the transcriptome during D. melanogaster embryogenesis.

(A) Dual-luciferase assay for bcd WT uORF and mutated uORF. The reporter structures of the WT and uORF mutants are illustrated on the left. The uORF mutant sequence was the same as that in the fly mutant created with CRISPR-Cas9 technology. The relative activity of Renilla luciferase was normalized to that of firefly luciferase. Error bars represent the S.E. of six biological replicates. Asterisks indicate statistical significance (***, P < 0.001). (B) Two ribosome fractions (monosome and polysome) of 0-2 h embryos were separated in a sucrose density gradient. Relative RNA abundance in the monosome and polysome fractions was quantified by real-time quantitative PCR. (C) P-to-M ratio of bcd mRNA (bcd mRNA abundance in polysome fraction/bcd mRNA abundance in monosome fraction) at 25°C (left) and 29°C (right). The P-to-M ratios of mutants were normalized to WT controls at 25°C and at 29°C, respectively. Error bars represent the S.E. of six biological replicates. Asterisks indicate statistical significance (*, P < 0.05; **, P < 0.01; ***, P < 0.001; n.s., P > 0.05). (D) The number of DEGs in each stage and their intersection with each other at 25°C (top) and 29°C (bottom). (E) Gene ontology analysis of DEGs at 29°C in each stage. The biological process (BP) terms with q-values < 0.05 in each stage are indicated in red and others are indicated in white.

bcd uORF mutants show wide transcriptomic alteration during Drosophila embryogenesis

Since Bcd regulates the expression of many zygotic genes 6466, we anticipated that the increased translation of bcd resulting from disrupting the uORF would influence Drosophila transcriptomes and phenotypes. To verify this notion, we performed RNA sequencing on embryos from WT and uKO2/uKO2 flies at four developmental stages (0-2 h, 2-6 h, 6-12 h, and 12-24 h) under both 25°C and 29°C conditions, using two biological replicates (Table S4 and Fig. S23). Differential expression analysis revealed widespread alterations in gene expression between WT and mutant embryos, with the number of differentially expressed genes (DEGs) increasing over developmental time and at higher temperature. At 25°C, we identified 674, 817, 1047, and 2,041 DEGs in 0-2 h, 2-6 h, 6-12 h, and 12-24 h embryos, respectively; while at 29°C, we detected 3,884, 2,358, 2,901, and 4,164 DEGs in the corresponding stages (Fig. 6D). The majority of DEGs were stage-specific, with only a small fraction consistently differentially expressed across all four stages. Functional enrichment analysis of the DEGs revealed distinct biological pathways affected at each stage, including cell morphogenesis and pattern specification in 0-2 h embryos, metabolic processes and tissue development in 2-6 h and 6-12 h embryos, and mitochondrial respiration in 12-24 h embryos (Fig. 6E). Notably, direct targets of Bcd 72 were significantly enriched among the DEGs in three out of the four stages, with the exception of 2-6 h embryos (Fig. S24). RT-qPCR validation of 20 target genes of Bcd confirmed the reliability of the RNA-seq differential expression analysis (Fig. S25). Together, these findings demonstrate that disruption of the bcd uORF leads to widespread transcriptional changes during Drosophila development, affecting processes ranging from embryogenesis to postembryonic metabolism.

bcd uORF mutants display decreased hatching rates and starvation resistance

Given the widespread transcriptome alterations, we anticipated phenotypic abnormalities in the bcd uORF mutants. As expected, both uKO1/uKO1 and uKO2/uKO2 mutants exhibited significantly lower hatching rates compared to WT (P = 1.4×10−6 and 7.9×10−6, respectively, Wilcoxon rank sum test [WRST]; Fig. 7A). At 25°C, uKO1/uKO1 mutants produced fewer offspring than WT flies (P < 0.001, WRST, Fig. 7B). Given that bcd is a maternal gene, we expected reciprocal crosses between uKO1/uKO1 mutants and WT flies to produce different outcomes. Indeed, crossing uKO1/uKO1 males with WT females resulted in offspring numbers similar to those from WT crosses, while crossing uKO1/uKO1 females with WT males yielded offspring numbers comparable to crosses between uKO1/uKO1 mutants (Fig. 7B). This verified that the reduction in offspring is due to maternal defects in the uKO1/uKO1 mutants. Similar patterns were observed for uKO2/uKO2 mutants at 25°C (Fig. 7B). Notably, the fecundity reduction in uORF-KO mutants was more pronounced at 29°C (Fig. 7C). Furthermore, crossing uKO1/uKO1 and uKO2/uKO2 mutants produced significantly fewer progeny than WT flies (Fig. 7C), ruling out genetic background or off-target effects. Collectively, these data demonstrate that disrupting the bcd uORF significantly impaired hatchability and fertility.

Knockout of the bcd uORF reduces offspring number and starvation resistance.

(A) Comparison of the hatching rates (%) of mutant and WT offspring (n=20, Wilcoxon rank sum test; ***, P < 0.001). (B) The offspring number per maternal parent in different crosses over 10 days at 25°C. Asterisks indicate significant differences between various crosses and crosses of WT females with WT males (n=20, Wilcoxon rank sum test; *, P < 0.05; **, P < 0.01; ***, P < 0.001; n.s., P > 0.05). The different crosses were denoted as the x-axis labels. (C) The offspring number per maternal parent in different crosses over 10 days at 29°C. (D) Survival curves of WT and mutant adult flies of females (left) and males (right) under starvation conditions. The black line represents the WT, the red line represents the uKO1/uKO1 mutant, and the blue line represents the uKO2/uKO2 mutant. Asterisks indicate significant differences compared to the WT. (n=200, log-rank test; ***, P < 0.001; n.s., P > 0.05).

We also found both uKO1/uKO1 and uKO2/uKO2 female mutants perished significantly faster than WT flies under starvation conditions (Fig. 7D). Males showed similar tendencies, although the difference was not statistically significant for uKO1/uKO1 mutants (Fig. 7D). These data suggest that the knockout of bcd uORF diminished starvation resistance in adults, likely due to embryogenesis abnormalities induced by the bcd uORF deletion, even in those that successfully developed to adulthood.

Conservation of uORF-mediated translational buffering in primates

To explore the generality of uORF-mediated translational buffering across evolutionary clades, we analyzed previously published transcriptome and translatome data from three tissues (brain, liver, and testis) in humans and macaques 73. We identified 33,680 canonical uORFs in humans and 29,516 in macaques, with 24,385 conserved between the two species. Despite the larger number of uORFs in primates compared to Drosophila due to differences in genome size and gene number, the median TE of conserved uORFs was 1.79 times that of non-conserved uORFs in humans, and the corresponding ratio was 3.43 in macaques (Fig. 8A and Fig. S26). TEs of uORFs were positively correlated between humans and macaques across all tissues (P < 0.001, Fig. 8B). Additionally, significant positive correlations were observed between the TEs of uORFs and their corresponding coding sequences (CDSs) in all tissues (Fig. S27). Although interspecific TE divergence of uORFs (βu) and CDSs (βC) were positively correlated (P < 0.001, Fig. 8C), uORFs generally exhibited larger divergence (Table S5). Notably, longer uORFs showed stronger buffering effects on CDS translation, reducing interspecific TE divergence by 12.9% and 38.0% compared to genes with short or no uORFs (Fig. 8D and Fig. S28).

uORFs function as translational buffers in primates.

(A) Boxplots showing the TEs of conserved and species-specific uORFs between Hs (H. sapiens) and Mm (M. mulatta). Data for the brain is shown as an example. Wilcoxon rank sum tests. ***, P < 0.001. (B) Spearman’s correlation coefficient (Rho) of uORFs’ TE between humans and macaques. The Rho values in the brain, liver, and testis were shown as bar plots. ***, P < 0.001. Data for the brain is shown as an example in the right panel. (C) Correlation between interspecific uORF TE changes (log2βu) and corresponding CDS TE changes (log2βC) in three tissues. The x-axis was divided into 50 equal bins with increasing βu. (D) Genes expressed in brains were classified into three classes according to the total length of translated uORFs. Boxplots showing interspecific CDS TE variability |log2(βc)| of different gene classes. P values were calculated using Wilcoxon rank sum tests between the neighboring groups. ***, P < 0.001. (E) Genes expressed in brains (mRNA RPKM > 0.1 in both species) were classified into three classes according to whether a gene had a conserved and dominantly translated uORF (TE > 0.1) in both species or not. Boxplots showing interspecific CDS TE variability |log2(βc)| of different gene classes. P values were calculated using Wilcoxon rank sum tests between the neighboring groups. ***, P < 0.001. (F) Boxplot showing the coefficients of variation (CVs) of CDS TE among the 69 lymphoblastoid cell lines (LCLs). Expressed genes (mean mRNA RPKM > 0.1) were divided into 20 bins with increased mRNA expression levels. In each bin, the genes were divided into two fractions according to whether the gene had a translated uORF or not. Wilcoxon rank sum tests. *, P < 0.05; **, P < 0.01; ***, P < 0.001.

As in Drosophila, we categorized expressed human genes into three classes based on uORF conservation and translation: Class I, genes with conserved and dominantly translated uORFs in both humans and macaques; Class II, genes with conserved uORFs translated in both species but not dominantly in at least one; and Class III, the remaining expressed genes (Fig. 8E and Fig. S29). Consistent with findings in Drosophila, significant differences in |βc| between humans and macaques were observed in the order of Class I < II < III, with Class I genes showing 9.8% and 17.1% lower |βc| values compared to Class II and Class III genes, respectively (Fig. 8E and Fig. S29). These similarities between primates and Drosophila—two clades that diverged over 700 million years ago 74 —suggest that uORF-mediated translational buffering is a widespread mechanism for stabilizing gene translation across evolutionary clades.

We also analyzed matched mRNA-Seq and Ribo-Seq data from 69 human lymphoblastoid cell lines 75,76 to test whether uORFs buffer against translational variability across different individuals within humans. Genes with translated uORFs exhibited, on average, 8.65% lower CVs in CDS TE across individuals than genes without translated uORFs (Fig. 8F), with longer uORFs showing stronger buffering effects (Fig. S30). Collectively, these findings suggest that uORFs play a crucial role in reducing translational variability both across evolutionary clades and within species.

Discussion

Translational control is vital for maintaining protein homeostasis and cellular activities 77,78. However, the mechanisms underlying the conservation of protein abundance across species remain largely unknown 5,6. In this study, we extended the ICIER model and conducted simulations to explore uORFs’ regulatory roles in buffering translation during evolution. Our simulations demonstrated that uORF translation reduces variability in downstream CDS translation, with buffering capacity positively correlated with uORF translation efficiency, length, and number. Comparative translatome analyses across developmental stages of two Drosophila species provided evidence that uORFs mitigate interspecific differences in CDS translation. Similar patterns were observed between humans and macaques, and between human individuals, suggesting that uORF-mediated buffering is an evolutionarily conserved mechanism in animals. Additionally, in vivo experiments showed that knocking out the bcd uORF led to aberrant embryogenesis and altered starvation resistance, with more pronounced phenotypic effects at higher temperatures, supporting the role of uORFs in fine-tuning translation and phenotypic outcomes.

While the prevailing consensus on uORFs’ functions is that they repress the translation of downstream CDS by sequestering ribosomes 34,4348, recent studies based on suggest uORFs might play essential roles in buffering translational noise and stabilizing protein expression 43,55,56. While these studies have primarily focused on the role of uORFs in single-gene cases or under stress conditions, our work expands this knowledge by demonstrating that uORFs act as a general mechanism for buffering translational variability on a genome-wide scale across multiple species and developmental stages. Our findings also suggest that uORF-mediated translational buffering is not merely a response to environmental stress but an evolutionarily conserved mechanism integral to maintaining protein homeostasis across species. Organisms can buffer phenotypic variation against environmental or genotypic perturbations through “canalization” 79. The discovery that uORFs are crucial for Drosophila development links them to canalization, providing evidence that uORFs reduce interspecific CDS translation differences and enhance phenotypic stability. While previous studies primarily focused on the stabilization of transcriptional level 8084 or protein level 55,82,8587, our study opens new avenues for exploring how organisms maintain phenotypic robustness at the translational level through molecular mechanisms like uORF-mediated buffering. This study lays the groundwork for future research into the evolutionary and functional roles of uORFs, particularly in the context of canalization and adaptive evolution.

The original ICIER model was devised to elucidate how a single uORF can confer resistance to global translation inhibition on its host gene during stress conditions 43. It proposed that an elongating ribosome (80S) causes downstream 40S subunits to dissociate from the mRNA after collision. Our revised model expands this scope to include bidirectional dissociation, where an 80S ribosome can release both upstream and downstream 40S subunits, reflecting recent findings that 80S primarily causes upstream 40S dissociation during 40S/80S collisions 56. Mechanistically, increasing the loading rate of 40S ribosomes onto the mRNA (Rin) elevates the density of 40S and 80S ribosomes scanning the uORF, thereby raising the likelihood of 40S-80S collisions and subsequent 40S dissociation. This leads to a reduced flow of 40S ribosomes reaching the CDS relative to the initial increase in 40S at the 5’ end of the mRNA and uORF. Conversely, a decrease in 40S ribosome loading rate reduces the density of scanning ribosomes, diminishing the probability of 40S-80S collisions and 40S dissociation, and partially restoring the flow of 40S ribosomes to the CDS. Overall, the 40S-80S collision mechanism and subsequent 40S dissociation at the uORF moderate downstream CDS translation less than changes in ribosome loading at the 5’ end of the mRNA or uORF. The revised model underscores the analogy of uORFs as “molecular dams” adeptly modulating the stream of ribosomes and cushioning downstream CDS translation against fluctuations. Further, the revised model integrates ribosomal collisions and separations within the CDS located downstream of the uORF, shifting the focus from merely tracking 40S escape rates from the uORF to directly quantifying completed translations by 80S ribosomes at the terminus of both uORFs and CDSs. This methodology provides a more direct and quantitative assessment of CDS translation efficiency, offering new insights into the buffering role of uORFs. Additionally, our study broadens its perspective to account for the potential interactive effects of multiple uORFs within a gene, recognizing the prevalent clustering of these regulatory elements and their combinatorial influence on translational control. In essence, our study presents a nuanced and comprehensive expansion of the ICIER model, paving the way for a deeper understanding of uORFs as pivotal elements in the maintenance of protein synthesis stability under variable translational conditions. However, we realize the existence of alternative mechanisms governing uORF function, such as those found on traditional repression models 55 or ribosome queuing and re-initiation strategies 56. As such, further computational studies are needed to dissect and understand the full spectrum of uORF-mediated translational regulation.

The bcd gene, crucial for Drosophila embryogenesis, has been well studied 65,66,8892, but its uORF remains underexplored. Our study showed that deleting the bcd uORF led to abnormal phenotypes and reduced fitness, underscoring the importance of uORF-mediated translation. These effects were more severe under heat stress (29℃) than at normal temperatures (25℃), suggesting uORFs play a vital role in buffering phenotypic variation. Rescue experiments, typically used to confirm that phenotypic changes are due to specific genetic edits, faced challenges in our study and previous uORF-KO experiments in animals 93,94 and plants 9598. We altered the start codon of the bcd uORF to minimize effects on the 5’ UTR, without impacting the coding sequence, making traditional rescue experiments impractical. To control for genetic background variations, we backcrossed two bcd uORF-KO mutant strains with w1118 flies for nine generations, using w1118 as a control. Both uORF-KO mutants (uKO1/uKO1 and uKO2/uKO2) consistently showed reduced progeny, ruling out background or off-target effects. Crosses between the two mutants produced fewer offspring than WT crosses, confirming that the reduced progeny was due to the uORF deletions, not genetic background. As bcd is maternally inherited, we expected defective phenotypes only in offspring from mutant females crossed with wild-type males. This was confirmed: uKO1/uKO1 males crossed with wild-type females had normal offspring, while uKO1/uKO1 females crossed with wild-type males had reduced offspring, similar to mutant crosses. Similar results were observed for uKO2/uKO2 mutants. These findings confirm that disrupting bcd uORFs significantly reduces hatchability and fertility, effects not attributable to genetic background. Our results highlight the critical role of bcd uORFs in development and fecundity, and provide a basis for future studies on uORF regulation in other genes across species.

Taken together, this study demonstrates the role of uORF-mediated translational buffering in mitigating variability in gene translation during species divergence and development, using large-scale comparative transcriptome and translatome data. Our work addresses gaps in understanding the stabilization of gene expression at the translational level and offers new insights into the evolutionary and functional significance of uORFs.

Materials and Methods

Modeling the uORF-mediated buffering effect on CDS translation

We adapted the stochastic ICIER framework developed by Andreev et al. 43 based on the TASEP model, with several major modifications. First, we concatenated a 500-codon CDS downstream of the uORF. A leaky scanning ribosome from the uORF would initiate translation at the CDS start codon with a probability of ICDS. The probability of elongating ribosomes moving along the CDS (vEC= 0.5) in each action was higher than that along uORFs (vEu= 0.3) in the model settings, considering that uORFs usually encode blocking peptides 99103 or contain stalling codons 56,104,105. Second, we recorded the number of elongating ribosomes that completed translation at the stop codon of a CDS (NEC) or uORF (NEu) during a given time period and regarded them as proxies for quantifying the CDS or uORF TE. Third, we considered three models for simulating the consequences of scanning ribosomes colliding with elongating ribosomes mentioned above: a downstream dissociation model, an upstream dissociation model, and a double dissociation model.

In detail, the mRNA molecule structure in the simulations consisted of five parts: a 5’-leader before the uORF (150 nucleotides), the uORF (ranging from 2 to 100 codons), a segment between the uORF and CDS (150 nucleotides), the CDS (500 codons) and the 3’UTR (150 nucleotides). The Rin determined the loading rate of the 40S scanning ribosomes on the mRNA 5’-terminus, and it roughly represented the availability of trans translational resources in the cell. We generated two sets of Rin values (Fig. S1A) to simulate the variation in the availability of translational resources (40S scanning ribosomes, etc.): i) 1000 values of Rin were generated from a random generator of a uniform distribution, U (0, 0.1); ii) 1000 values were first generated from a random generator of an exponential distribution, E (1), and then each of the 1000 values was divided by 70 to obtain 1000 values of Rin. IuORF and ICDS represent the probability of translation at the start codon of a uORF or CDS, respectively. We used different combinations of IuORF and ICDS to test how translational initiation strength influenced the buffering effect of uORFs. vS determined the scanning rate of the 40S ribosome, and vEuand vEC determined the elongation rate of the 80S ribosome at a uORF or CDS, respectively. All the parameters used in our simulation are listed in Table S1.

The mRNA molecule was modeled as an array, and the value of each position in the array represented the occupation status of ribosomes along the mRNA molecule. The process of translation was simulated by a series of discrete actions referred to Andreev et al. 43. Upon each action, there was a probability that certain events would occur, including the addition of a 40S ribosome to the 5’-terminus, the transformation of a 40S ribosome into an 80S ribosome at a CDS or uORF start codon, the movement of a 40S ribosome or 80S ribosome to the next codon in the CDS or uORF, etc. More specific operations in each action were described in Supplementary Materials and Methods. The simulation code is freely available at GitHub: https://github.com/lujlab/uORF_buffer.

After 1,000,000 actions, we recorded the number of 80S ribosomes that completed translation at the uORF (NEu) and CDS (NEC) for each Rin input, which was used to represent the protein production rates (i.e. translation efficiency) of the uORF and CDS, respectively. To test the effects of different factors, we used various combinations of the parameters in Table S1 in the simulations. For a given distribution of Rin input (1000 values) and the dissociation model, we obtained the corresponding 1000 NEC values and their CV values by calculating the ratio of the standard deviation to the mean of NEC.

In the double-uORF simulation, we only adopted a uniform distribution of Rin input and the downstream dissociation model. Most parameters and simulation processes were similar to those in the single-uORF simulation, except that an additional uORF was introduced upstream of the CDS with an initiation probability of IuORF2.

Annotation of uORFs in D. melanogaster and D. simulans

We downloaded the whole-genome sequence alignment (maf) of D. melanogaster (dm6) and 26 other insect species from the University of California Santa Cruz (UCSC) genome browser (genome.ucsc.edu) 106. To improve the genome correspondence between D. melanogaster and D. simulans, we adopted a newly published reference-quality genome of D. simulans (NCBI, ASM438218v1) 107 to replace the original D. simulans genome in UCSC maf. We softly masked repetitive sequences in the new genome by RepeatMasker 4.1.1 (http://www.repeatmasker.org) and aligned it to dm6 with lastz 108 in runLastzChain.sh following UCSC guidelines. The lastz alignment parameters and the scoring matrix were the same as the parameters of dm6 and droSim1 in UCSC. Chained alignments were processed into nets by the chainNet and netSyntenic programs. The alignment was integrated into the multiple alignments of 27 species with multiz 109.

Based on the genome annotation of D. melanogaster (FlyBase r6.04, https://flybase.org/), we used the Galaxy platform to parse the multiple sequence alignments of 5’UTRs in Drosophila. The 5’UTR sequences of each annotated transcript from D. melanogaster and the corresponding sequences in D. simulans were extracted from the maf. The start codons of putative uORFs were identified by scanning all the ATG triplets (uATGs) within the 5’UTRs of D. melanogaster and D. simulans. uATGs that overlapped with any annotated CDS region were removed. The presence or absence of each uATG of D. melanogaster was determined at orthologous sites in D. simulans based on multiple genome alignments, and vice versa. The conserved uORF was defined as a uORF where its uATG was present in both D. melanogaster and the corresponding orthologous positions of D. simulans. For each protein-coding gene, we only considered the canonical transcript. For transcripts containing multiple uORFs, we defined the uORF that showed the highest TE as the dominant uORF. The branch length score (BLS) of the uATGs was calculated as we previously described 18.

Fly materials and general raising conditions

The sim4 strain of D. simulans was used to generate all the libraries of D. simulans in this study. All flies were raised on standard corn medium and grown in 12 h light: 12 h dark cycles at 25 °C for general conditions or at 29 °C for specific experimental design. The samples of embryos at different stages, larva, pupa, bodies and heads were collected following a previous protocol 23.

Processing Drosophila mRNA-Seq and Ribo-Seq data

Ribo-Seq and matched mRNA-Seq libraries for different developmental stages and tissues of D. simulans were constructed as we previously described 23 and sequenced on an Illumina HiSeq-2500 sequencer (run type: single-end; read length: 50 nt) according to the manufacturer’s protocol. The 3’ adaptor sequences (TGGAATTCTCGGGTGCCAAGG) were trimmed using Cutadapt 3.0 with default parameters 110, and the NGS reads were mapped to the genomes of yeast, Wolbachia, Drosophila viruses and the sequences of tRNAs, ribosomal RNAs (rRNAs), small nuclear RNAs (snRNAs) or small nucleolar RNAs (snoRNAs) of D. melanogaster (FlyBase r6.04) and D. simulans (FlyBase r1.3) using Bowtie2 version 2.2.3 111 with the parameters -p8 --local -k1. The mapped reads of these genomes/sequences were further removed in the downstream analysis.

After filtering, the mRNA-Seq and Ribo-Seq reads were mapped to the reference genomes of D. melanogaster (FlyBase, r6.04) and D. simulans (NCBI, ASM438218v1) respectively using the Spliced Transcripts Alignment to a Reference (STAR) algorithm 112. For Ribo-Seq reads, we assigned a mapped RPF (27-34 nt in length) to its P-site using the psite script from Plastid 113. The uniquely mapped reads were extracted and then mapped to the CDS of D. melanogaster and D. simulans respectively using STAR 112. The P-sites of RPF or mRNA-Seq reads that overlapped with a CDS were counted separately. The reads that were not mapped to CDSs were then mapped to the 5’UTRs of D. melanogaster and D. simulans. The P-sites of RPF or mRNA-Seq reads that overlapped with uORFs were counted separately.

The TE of a given uORF or CDS was calculated as the RPKMP-site/RPKMmRNA ratio. For a few uORFs with RPKMmRNA = 0 but RPKMP-site > 0, a pseudocount of 0.1 was added to both RPKMmRNA and RPKMP-site to avoid dividing a positive value by zero. In each sample, expressed genes were defined as the genes with a CDS RPKMmRNA > 0.1 in both species, and translated uORFs were defined as uORFs with a TE > 0.1.

Testing the statistical significance of the difference in the interspecific TE change between a uORF and its downstream CDS

For this analysis, we adopted the methods developed by Zhang et al. 23. Briefly, for the samples of female bodies or male bodies of D. melanogaster with biological replicates, we obtained the log2(TE) and SE of the log2(TE) of a uORF or CDS by contrasting the RPF counts against mRNA-Seq read counts using DESeq2. Then, we fitted the SE values against the normalized mRNA counts and log2(TE) values using the gam function in the R package mgcv, with a log link to obtain the SE ∼ mRNA counts + log2TE functions. For other samples without biological replicates, we estimated the SE of the log2(TE) for a feature (CDS or uORF) by applying the fitted functions obtained based on the biological replicates of female and male bodies to the observed mRNA counts and log2(TE) values. We identified uORFs whose TEs differed significantly between the paired samples of D. melanogaster and D. simulans by testing whether the value obtained for log2 (βu) = log2 (TEuORF,sim) – log2 (TEuORF,mel) was significantly different from 0. Based on the SE of the log2(TE) derived as described above, the SE of the log2(βu) can be derived as follows:

As the Wald statistic, , follows a standard normal distribution under the null hypothesis of log2 (βu) = 0, we calculated the P value as follows: . The identification of CDSs whose TEs differed significantly between the paired samples of D. melanogaster and D. simulans was similar to the procedures above.

Then, we defined (where βC = TECDS,sim / TECDS,mel) and tested whether log2 (γ) was significantly different from 0 to determine whether the magnitude of interspecific TE changes in CDSs and uORFs was significantly different. We obtained log2TEuORF,sim, log2TEoORF,mel, log2TECDS,sim, and log2TECDS,mel as described above and estimated SE_log2TEuORF,sim, SE_log2TEuORF,mel, SE_log2TECDS,sim, and SE_log2TECDS,mel based on the biological replicates of female and male bodies of D. melanogaster. Finally, log2(γ) also follows a normal distribution with SE denoted as SE_log2(γ) =

As the Wald statistic, , follows a standard normal distribution under the null hypothesis that log2 (βu) = 0, we calculated the P value as follows: .

Knocking out a uORF with CRISPR-Cas9 technology in D. melanogaster

We searched for possible sgRNA target sites near the uATG start codon of a uORF using the Benchling website (https://www.benchling.com/crispr/) to design optimal single guide RNA (sgRNA) sequences with high specificity and low off-target effects. We then synthesized single-stranded complementary DNAs (ssDNAs) and annealed them to obtain double-stranded DNA (dsDNA), which served as the template for sgRNA expression. The template sequences of the sgRNA used for bcd uATG-KO are listed in Table S6. The dsDNA was then ligated into the BbsI-digested pU6B vector. The pU6B-sgRNA plasmid was purified and injected into the embryos of transgenic Cas9 flies collected within one hour of laying at the Tsinghua Fly Center as described in Ni et al. 114. The injected embryos were kept at 25°C and 60% humidity until adulthood (G0). The G0 adult flies that hatched from injected embryos were individually crossed with other flies (y sc v) to increase the number of offspring. Then, the F1 progeny were crossed with flies carrying an appropriate balancer (Dr, e/TM3, Sb). After F2 spawning, the F1 individuals were screened for mutations of interest by genotyping. The primers used for genotyping are listed in Table S6. The F2 progeny whose parents showed positive genotyping results were then screened for the yellow gene to separate the chromosome carrying nos-Cas9. The screened F2 males were crossed with the flies containing the same balancer as above. After F2 genotyping, the progeny (F3) of positive F2 individuals showing the same mutation status were crossed individually to generate homozygous mutants in the F4 generation. The original homozygous mutants were sequentially outcrossed with w1118 flies for 9 generations to purify the genetic background (Fig. S22B).

Ribosome fraction analysis by sucrose gradient fractionation and RT‒qPCR

The 0-2 h embryos obtained from w1118, uKO1/uKO1, and uKO2/uKO2 mutants raised at 25°C and 29°C were collected and homogenized in a Dounce homogenizer with lysis buffer [50 mM Tris pH 7.5, 150 mM NaCl, 5 mM MgCl2, 1% Triton X-100, 2 mM dithiothreitol (DTT), 20 U/ml SuperaseIn (Ambion), 0.5 tablets of proteinase inhibitor (Roche), 100 µg/ml emetine (Sigma Aldrich), and 50 µM guanosine 5′-[β,γ-imido]triphosphate trisodium salt hydrate (GMP-PNP) (Sigma Aldrich)] at 4°C. The lysates were clarified by centrifugation at 4°C and 20,000×g for 8 min, and the supernatants were transferred to new 1.5 ml tubes. 10-45% sucrose gradients were prepared in buffer (250 mM NaCl, 50 mM Tris pH 7.5, 15 mM MgCl2, 0.5 mM DTT, 12 U/ml RNaseOUT, 0.5 tablets of protease inhibitor, and 20 µg/ml emetine) using a Gradient Master (Biocomp Instruments) in ULTRA-CLEAR Thinwall Tubes (Beckman Coulter). A sample volume of up to 500 µl was applied to the top of each gradient. After ultracentrifugation with a Hitachi P40ST rotor at 35,000 × rpm for 3 h at 4°C, the monosome and polysome fractions were collected, flash-frozen in liquid nitrogen, and stored at −80°C until further use.

The RNA in the monosome and polysome fractions was extracted separately using TRIzol reagent (Life Technologies, Inc.) and chloroform (Beijing Chemical Works) following the manufacturer’s instructions and were reverse transcribed into cDNA using the PrimeScript™ II 1st Strand cDNA Synthesis Kit (Takara). RT‒qPCR analysis of bcd cDNA and its targets was performed using PowerUp™ SYBR™ Green Master Mix (Thermo Fisher) following the manufacturer’s instructions with rp49 as an internal control. The primer sequences employed for RT-qPCR are listed in Table S6. For each sample, the ratio of bcd mRNA abundance in the polysome fraction to that in the monosome fraction was calculated as the P-to-M ratio. Six biological replicates were performed for each sample.

Dual-luciferase reporter assays

The wild-type (WT) 5’UTR of bcd was cloned from cDNA by PCR, and uATG mutations were introduced into 5’UTR using specific amplification primers (listed in Table S6). The WT and mutated 5’UTR sequences were ligated into a linearized reporter plasmid (psiCHECK-2 vector, Promega). The whole sequence of all the plasmids was validated by Sanger sequencing.

Drosophila S2 cells were cultured in Schneider’s Insect Medium (Sigma) plus 10% (by volume) heat-inactivated fetal bovine serum, 100 U/ml penicillin and 100 µg/ml streptomycin (Thermo Fisher) at 25℃ without CO2 for 24 h to reach 1–2×106cells/ml before further treatments. Plasmid transfection was conducted with Lipofectamine 3000 (L3000001, Thermo Fisher) according to the supplier’s protocol. The Renilla luciferase activity associated with WT or uORF-mutated 5’UTRs was measured according to the manual of the Dual-Luciferase Reporter Assay System (Promega) 32 h after transfection and was normalized to the activity of firefly luciferase.

mRNA-Seq in the embryos of mutant and WT flies

We collected 0-2 h, 2-6 h, 6-12 h, and 12-24 h embryos of w1118 and uKO2/uKO2 mutants raised at 25°C and 29°C and conducted mRNA-Seq. Library construction and sequencing with PE150 were conducted by Annoroad on the Illumina Nova6000 platform. Two biological replicates were sequenced for each sample. The clean data were mapped to the reference genome of D. melanogaster (FlyBase, r6.04) using STAR 112. Reads mapped to the exons of each gene were tabulated with htseq-count 115. The differentially expressed genes were identified using DESeq2 116. The Gene Ontology (GO) analyses were conducted using the “clusterProfiler” package in R 117.

Measurement of embryo hatchability

We collected embryos from WT and mutant flies and manually seeded them in vials containing standard corn medium at a density of 30 embryos per vial and 20 vials per strain. The embryos were cultivated under standard conditions (60% humidity, 12 h light: 12 h dark cycles at 25°C). We counted the number of pupae in each vial after the completion of pupation and calculated the corresponding hatching rate = numberpupa/ 30

Quantification of offspring number per female fly

Newly hatched virgins were picked out and allowed to mature for two days in separate vials. They were then mated by placing one virgin female with three male flies for two days. After that, each female parent was transferred to a new vial to count the offspring number produced in each 10-day period at 25°C and 29°C. All assays were performed with 20 females per genotype.

Measurement of starvation resistance in adult flies

We selected 3- to 5-day-old adult males and females and placed them in the starvation medium (1.5% agar), with 10 flies per vial and 10 vials for both males and females from each strain. We made observations every 6 h or 12 h to count the number of deaths under starvation conditions until all flies had starved to death. The survival curves were plotted by the ggsurvplot package in R.

mRNA-Seq and Ribo-Seq data analysis in primates

The mRNA-Seq and Ribo-Seq data from the brains, livers, and testes of humans and macaques were downloaded from reference 73 with accession number E-MTAB-7247 (ArrayExpress). The mRNA-Seq and Ribo-Seq data of human lymphoblastoid cell lines (LCL) from Yoruba individuals were downloaded from references 75,76 under accession numbers GSE61742 and E-GEUV-1. The matched mRNA-Seq and Ribo-Seq libraries of 69 individuals were used.

The uORF annotation and downstream analysis procedures for the human and macaque data were similar to those applied in Drosophila as described above. The differential analysis of translational efficiency in humans and macaques was conducted by Xtail 118. In each pair of human-macaque samples, expressed genes were defined as the genes with a CDS RPKMmRNA > 0.1 in both species. The translated uORFs in a sample were defined as uORFs with a TE > 0.1. For the human cell line data, expressed genes were defined as genes with a mean CDS RPKMmRNA > 0.1 across the cell lines, and translated uORFs were defined as uORFs with a mean TE > 0.1

Acknowledgements

We thank Drs. Wei Xie, Weiwei Zhai, and Xionglei He for constructive comments and suggestions. This work was supported by grants from the Ministry of Science and Technology of the People’s Republic of China (2022YFE0132000), the Yunnan Provincial Science and Technology Project at Southwest United Graduate School (202302A0370006), the National Natural Science Foundation of China (32070597) and the Natural Science Foundation of Beijing (5212006). We thank the National Center for Protein Sciences at Peking University for their technical assistance. Some of the analyses were performed on the High-Performance Computing Platform of the Center for Life Sciences.

Declaration of interests

The authors declare no competing interests.

Data availability statement

All deep-sequencing data generated in this study, including single-ended mRNA-Seq and Ribo-Seq data of 10 developmental stages and tissues of Drosophila simulans and paired-end mRNA-Seq data of 0-2 h, 2-6 h, 6-12 h, and 12-24 h Drosophila melanogaster embryos, were deposited in the China National Genomics Data Center Genome Sequence Archive (GSA) under accession numbers CRA003198, CRA007425, and CRA007426. The mRNA-Seq and Ribo-Seq data for the different developmental stages and tissues of Drosophila melanogaster were published in our previous paper 23 and were deposited in the Sequence Read Archive (SRA) under accession number SRP067542.

Supporting information

Supplementary figures and tables

Supplementary Table 3