Author Response
The following is the authors’ response to the original reviews.
eLife assessment
This important study addresses the fundamentally unresolved question of why many thousands of small-effect loci contribute more to the heritability of a trait than the large-effect lead variants. The authors explore resource competition within the transcriptional machinery as one possible explanation with a simple theoretical model, concluding that the effects of resource competition would be too small to explain the heritability effects. The topic and approximation of the problem are very timely and offer an intuitive way to think about polygenic variation, but the analysis of the simple model appears to be incomplete, leaving the main claims only partially supported.
We thank eLife for recognizing the importance of our work. We hope the revised manuscript addresses the reviewers’ reservations.
Public Reviews:
Reviewer #1 (Public Review):
This study explores whether the extreme polygenicity of common traits can be explained in part by competition among genes for limiting molecular resources (such as RNA polymerases) involved in gene regulation. The authors hypothesise that such competition would cause the expression levels of all genes that utilise the same molecular resource to be correlated and could thus, in principle, partly explain weak trans-regulatory effects and the observation of highly polygenic architectures of gene expression. They study this hypothesis under a very simple model where the same molecule binds to regulatory elements of a large number m of genes, and conclude that this gives rise to trans-regulatory effects that scale as 1/m, and which may thus be negligible for large m.
We thank the reviewer for their thorough and thoughtful review of our manuscript.
The main limitation of this study lies in the details of the mathematical analysis, which does not adequately account for various small effects, whose magnitude scales inversely with the number m of genes that compete for the limiting molecular resource. In particular, the fraction of "free" molecule (which is unbound to any of the genes) also scales as 1/m, but is not accounted for in the analysis, making it difficult to assess whether the quantitative conclusions are indeed correct.
The fraction of free molecule is explicitly accounted for in the supplement.
Second, the questions raised in this study are better analysed in the framework of a sensitivity or perturbation analysis, i.e., by asking how changes in expression level or binding affinity at one gene (rather than the total expression level or total binding affinity) affect expression level at other genes.
In the context of complex traits, where an increase in gene expression can either increase or decrease the trait, we believe the most important quantity of interest is variation in expression and, therefore, trait variation. Nevertheless, our results do show that the relative change in expression due to competition is also small.
Thus, while the qualitative conclusion that resource competition in itself is unlikely to mediate trans-regulatory effects and explain highly polygenic architectures of gene expression traits probably holds, the mathematical reasoning used to arrive at this conclusion requires more care.
In my opinion, the potential impact of this kind of analysis rests at least partly on the plausibility of the initial hypothesis, namely whether most molecular resources involved in gene regulation are indeed "limiting resources". This is not obvious, and may require a careful assessment of existing evidence, e.g., what is the concentration of bound vs. unbound molecular species (such as RNA polymerases) in various cell types?
We intentionally considered the most extreme case of resource limitation. Since the effect is small even under extreme resource limitation, we conclude that it would also be small under weak resource limitation, when unbound molecules play an important role. We put more emphasis on this point in our revised text.
Reviewer #1 (Recommendations For The Authors):
While the main conclusion that resource competition in itself is unlikely to mediate trans effects and explain high levels of polygenicity may well be correct, I am not convinced that the mathematical reasoning presented in support of this conclusion is entirely correct. I will attempt to outline my concerns mainly in the context of section 2, since the arguments in sections 3 and 4 build upon this.
(a) The key assumption underlying the approximations in equations 3, 4, and 5 is that there is very little free polymerase, in other words [P]/[P]_0 is a small quantity. However, the second and third terms that emerge in equation 7 are also small quantities and (as far as I can see) of the same order as [P]/[P]_0. Thus, one cannot simply use equation 4 or 5 as a starting point to derive eq. 7 and should instead use the exact x_i = (g_i [G])/ (1+g_tot [G]), in order to make sure that all (and not just some) terms that are similar in order of magnitude are accounted for in the analysis.
The concentration of free polymerase is denoted [P], and we explicitly assume (just before eq. 2) that [P] << [P]_0, where [P]_0 is the overall concentration of polymerase. This is a conservative assumption: we consider extreme resource competition with little free polymerase, and since we see only a small effect in this extreme scenario, we expect the effect to be small in less extreme scenarios as well. We put more emphasis on this point in our revised text.
More concretely, the difference between the exact x_i = (g_i [G])/ (1+g_tot [G]) and the approximate x_i = (g_i / g_tot) is precisely 1/m (for large m) in the example considered line 246 onwards. Thus, I suspect that the conclusion that Var[x_i] = (1-1/m)Var[g_i] in that example is just an artefact of starting with eqs. 4 and 5. As a sanity check, it may be useful to actually simulate resource competition explicitly (maybe using a deterministic simulation) under the explicit model [PG_i] = g_i [P] [G] and [P]_0 = [P] + Sum[[PG_i], i=1,m] without making any further approximations to see if perturbations in g_i actually produce Order [1/m] effects in the variance of x_i for the example considered line 246 onwards (this would require simulating with a few different m and plotting Var[x_i] vs. m for example).
The exact equation the reviewer is alluding to describes a scenario of non-extreme resource competition. If g_tot [G] >> 1, i.e., if most polymerase is bound to a gene, then x_i is equal to g_i/g_tot; this is the extreme-competition scenario we consider. If g_tot [G] << 1, then x_i = g_i [G] and competition has no effect. While the intermediate case is interesting, we see no reason for the effects to be larger than in the extreme competition case.
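The two limits can be checked numerically. The snippet below is a minimal sketch, not code from the manuscript: the function name `occupancy`, the parameter `site_conc` (standing in for [G]), and the three activity values are all hypothetical choices made for illustration.

```python
def occupancy(g, site_conc):
    """Exact expression x_i = g_i*G / (1 + g_tot*G) for every gene i."""
    g_tot = sum(g)
    return [g_i * site_conc / (1.0 + g_tot * site_conc) for g_i in g]


g = [0.5, 1.0, 1.5]  # hypothetical activities g_i (illustrative values only)
g_tot = sum(g)

# Extreme competition: g_tot*G >> 1, so x_i should approach g_i / g_tot.
strong = occupancy(g, site_conc=1e6)
strong_limit = [g_i / g_tot for g_i in g]

# No competition: g_tot*G << 1, so x_i should approach g_i * G.
weak = occupancy(g, site_conc=1e-6)
weak_limit = [g_i * 1e-6 for g_i in g]

for name, exact, limit in (("strong", strong, strong_limit),
                           ("weak", weak, weak_limit)):
    worst = max(abs(a - b) for a, b in zip(exact, limit))
    print(f"{name}: largest gap between exact and limiting form = {worst:.2e}")
```

In the strong regime each x_i depends on every other g_j through g_tot, whereas in the weak regime it depends on g_i alone.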
We have added the results of simulations in the supplement to validate our arguments.
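A check of the kind the reviewer requested might look like the following. This is an illustrative sketch and not the simulation reported in the supplement. It assumes the g_i are drawn independently from a Gaussian with mean 1 and standard deviation 0.1, treats [G] as a fixed constant (`site_conc`), and rescales x_1 so that it is directly comparable to g_1. All names and parameter values are hypothetical.

```python
import random
import statistics


def occupancy_first_gene(g, site_conc):
    """Exact x_1 = g_1*G / (1 + g_tot*G), with no further approximation."""
    return g[0] * site_conc / (1.0 + sum(g) * site_conc)


def variance_ratio(m, site_conc, n_rep=20000, mean=1.0, sd=0.1, seed=0):
    """Var[rescaled x_1] / Var[g_1] across n_rep independent draws of m genes."""
    rng = random.Random(seed)
    # Rescaling factor that maps x_1 back to the units of g_1 at the mean.
    scale = (1.0 + m * mean * site_conc) / site_conc
    g1_values, x1_values = [], []
    for _ in range(n_rep):
        g = [rng.gauss(mean, sd) for _ in range(m)]
        g1_values.append(g[0])
        x1_values.append(scale * occupancy_first_gene(g, site_conc))
    return statistics.pvariance(x1_values) / statistics.pvariance(g1_values)


# Extreme competition (g_tot*G >> 1) for a few values of m.
results = {m: variance_ratio(m, site_conc=1e6) for m in (2, 5, 20)}
for m, ratio in results.items():
    print(f"m={m:3d}  simulated ratio={ratio:.3f}  first-order 1-1/m={1 - 1/m:.3f}")
```

Lowering `site_conc` moves the same code toward the regime with substantial free polymerase, so the intermediate case can be explored without changing the model.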
Lines 231-239: Because of the concerns highlighted above and questions about the validity of equation 7, I am not convinced that the interpretations given here and also in section 4 are correct.
(b) Lines 219-230 (including equations 6 and 7): I think to address the question of whether genetic changes in cis-regulatory elements for a given gene have an effect on other genes (under this model of resource competition), it is better to spell out the argument in terms of Var[ dx_i ] rather than Var[x_i], where dx_i is the change in expression level at gene i due to changes at all m genes, dg_i is the change in gene activity due to (genetic) changes in the relevant regulatory elements associated with gene i etc. Var[ dx_i ] can then be expressed as a sum of Var[dg_i], Var[dg_tot] and Cov[d g_i, dg_tot]. However, I suspect that to do this correctly, one should not start with the approximate x_i=g_i/g_tot : see previous comment.
The variance of the deviation from the mean is mathematically identical to the overall variance, Var[ dx_i ]= Var[ x_i ]. Our analysis is therefore equivalent to the suggested analysis.
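Since dx_i differs from x_i only by the constant E[x_i], the two variances coincide, and the decomposition the reviewer describes can be written out to first order. The derivation below is a sketch rather than the manuscript's own: it assumes the g_j are independent with common mean \mu and variance \sigma^2, holds [G] fixed, and introduces f (not a symbol from the manuscript) for the fraction of polymerase that is bound at the mean.

```latex
\begin{align*}
x_i &= \frac{g_i [G]}{1 + g_{\mathrm{tot}} [G]}, \qquad
f = \frac{m \mu [G]}{1 + m \mu [G]} \\[4pt]
dx_i &\approx \frac{[G]}{1 + m \mu [G]}
      \left( dg_i - \frac{f}{m}\, dg_{\mathrm{tot}} \right) \\[4pt]
\mathrm{Var}[dx_i] &\approx \left( \frac{[G]}{1 + m \mu [G]} \right)^{2}
      \left( \mathrm{Var}[dg_i]
      - \frac{2f}{m}\, \mathrm{Cov}[dg_i, dg_{\mathrm{tot}}]
      + \frac{f^{2}}{m^{2}}\, \mathrm{Var}[dg_{\mathrm{tot}}] \right) \\[4pt]
&= \left( \frac{[G]}{1 + m \mu [G]} \right)^{2} \sigma^{2}
      \left( 1 - \frac{2f - f^{2}}{m} \right)
\end{align*}
```

The last step uses Cov[dg_i, dg_tot] = \sigma^2 and Var[dg_tot] = m \sigma^2. At f = 1 the bracket reduces to 1 - 1/m, and because 2f - f^2 = 1 - (1 - f)^2 \le 1, the correction is no larger when some polymerase is free.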
Somewhere in all of this, there is also an implicit assumption that E[dg_i] is zero, i.e, mutations are as likely to increase as to decrease binding affinities so that one needs to only consider Var[dx_i] and not E[dx_i]; this assumption should be spelled out.
Our results concern the variation around trait means and therefore we have not included a possible mean effect of mutation, which would not affect the results but just shift the mean.
Some minor comments (mostly related to the introduction and general context):
- I think it would be worth connecting more with the literature on molecular competition and gene regulation (see e.g., How Molecular Competition Influences Fluxes in Gene Expression Networks, De Vos et al, Plos One 2011). Even though this literature does not frame questions in terms of "polygenicity of traits", these analyses address the same basic questions: to what extent do perturbations in gene expression at one gene affect other genes, or to what extent is there crosstalk between different genes or pathways?
We have expanded our introduction to refer to De Vos et al., as well as a few other papers we have recently become aware of (e.g., Jie Lin & Ariel Amir, Nature Communications volume 9, Article number: 4496 (2018)).
- Lines 88-89: "supports the network component of the model" is a vague phrase that does not convey much. It would be useful to clarify and make this more precise.
We have clarified this phrasing in the text.
- Lines 113-114: In the context of "selective constraint", it may also be worth discussing previous work by one of the authors: "A population genetic interpretation of GWAS findings for human quantitative traits". What implications would stabilizing selection on multiple traits (as opposed to simple purifying selection) have for the distribution of variances across trait loci and the extent to which trait architectures appear to be polygenic?
While most definitely of great interest to some of the authors, the distribution of variance across loci does not affect our results.
References: Barton and Etheridge 2018 in line 54 is not the correct reference; it should be Barton et al 2017 (paper with Amandine Veber). Fisher 1919 in line 52 is actually Fisher 1918. The formatting of references in the next paragraph (and in various other places in the paper) is also a bit unusual, with some authors referred to by their full names and others only by their last. I believe that it may be useful to crosscheck references throughout the paper.
We have crosschecked the references in the paper.
Line 164: Some word appears to be missing here. Maybe bound -> bound to ?
Fixed
Reviewer #2 (Public Review):
The question the authors pose is very simple and yet very important. Does the fact that many genes compete for Pol II to be transcribed explain why so many trans-eQTL contribute to the heritability of complex traits? That is, if a gene uses up a proportion of Pol II, does that in turn affect the transcriptional output of other genes relevant or even irrelevant for the trait in a way that their effect will be captured in a genome-wide association study? If yes, then the large number of genetic effects associated with variation in complex traits can be explained by such trans-propagating effects on the transcriptional output of many genes.
This is a very timely question given that we still don't understand how, mechanistically, so many genes can be involved in complex traits variation. Their approach to this question is very simple and it is framed in classic enzyme-substrate equations. The authors show that the trans-propagating effect is too small to explain the ~70% of heritability of complex traits that are associated with trans-effects. Their conclusion relies on the comparison of the order of magnitude of a) the quantifiable transcriptional effects due to Pol II competition, and b) the observed percentage of variance explained by trans effects (data coming from Liu et al 2019, from the same lab).
The results shown in this manuscript rule out that competition for limited resources in the cell (not restricted to Pol II, but applicable to any other cellular resource like ribosomes, etc) could explain the heritability of complex traits.
We thank the Reviewer for their resounding support of our paper!
Reviewer #2 (Recommendations For The Authors):
The authors rely on simulated data, and although the conclusions hold in a biologically-realistic scenario given the big difference in effect sizes, I wonder if the authors could provide data from the literature (if available) that give the reader a point of reference for the steady state of cells in terms of free/occupied Pol II molecules and/or free/occupied transcription binding sites. This information won't change the conclusion of the manuscript, but it will put it in the context of real biological data.
We have scoured the literature, but have not found readily available data with which to validate our results (beyond that which is already referenced).
Reviewer #3 (Public Review):
Human complex traits including common diseases are highly polygenic (influenced by thousands of loci). This observation is in need of an explanation. The authors of this manuscript propose a model that competition for a single global resource (such as RNA polymerase II) may lead to a highly polygenic architecture of traits. Following an analytical examination, the authors reject their hypothesis. This work is of clear interest to the field. It remains to be seen if the model covers the variety of possible competition models.
We thank the Reviewer for his assessment, support and comments.
Reviewer #3 (Recommendations For The Authors):
This manuscript provides a straightforward and elegant quantitative argument that the competition for the RNA polymerase is not a significant source of trans-eQTLs and, more generally, of genetic variance of complex polygenic phenotypes. This is an unusual manuscript because the authors propose a hypothesis that they confidently reject based on a calculation. This negative result is intuitive. Still, the manuscript is of interest. Progress in understanding the highly polygenic architecture of complex traits is welcome, and the resource competition hypothesis is quite natural. I have three specific comments/concerns listed below.
(1) The manuscript states that V(x_i)=V(g_i/g_tot). Unless I am missing something, this seems to result from a very strong implicit assumption that all genetic variance is due to variation in the binding of RNA polymerase, while x_i_max is a constant. I would expect that x_i_max may also be genetically variable due to many effects unrelated to the Pol II binding (e.g. transcription rate, bursting, presence of R-loops etc.). I guess that the assumption made by the authors is conservative.
Indeed. We made conservative assumptions throughout, aiming to consider the most extreme scenario in which resource competition may affect trait variation. Our logic is that if resource competition has only a small effect even under the most extreme scenario, then it has a small effect in all scenarios. We put more emphasis on this point in our revised text.
(2) The manuscript focuses on the competition for RNA polymerase but suggests that the lesson learned is highly generalizable. However, it is an example of a single global limiting resource resulting in first-order kinetics. What happens in a realistic scenario of competition for multiple resources associated with transcription and with downstream processes (free ribonucleotides, spliceosome, polyadenylation machinery, ribosome, post-translational modifications)? It is possible that in most cases a single resource is a limiting factor, but an investigation (or even a brief discussion) of this question would support the claim that the results are generalizable.
We expect competition for multiple resources to result in similarly weak effects. Since the number of such resources is not large, we do not expect it to change our qualitative result. We added language to that effect in the main text.
(3) Alternatively, what happens in a scenario of competition for multiple local resources shared by a few genes (co-factors, substrates, chaperones, micro-RNAs, post-translational modification factors such as kinases, degradation factors, scaffolding proteins)? In this case, each gene would compete for resources with a few other genes increasing polygenicity without a global competition with all other genes. Intuitively, a large set of such local competitions may lead to a highly polygenic architecture.
This is indeed a scenario in which competition may have a large effect, as we mention in our discussion: “the conclusions may differ in contexts where a very small number of genes compete for a highly limited resource, such as access to a particular molecular transporter.”