Trade-offs in modeling context dependency in complex trait genetics

eLife Assessment

It is known from model organisms that genes' effects on traits are often modulated by environmental variables, but similar gene-by-environment (GxE) interactions have been difficult to detect using statistical analyses of genomic data, e.g., in humans. This study introduces a new framework to estimate gene-by-environment effects, treating it as a bias-variance tradeoff problem. The authors convincingly show that greater statistical power can be achieved in detecting GxE if an underlying model of polygenic GxE is assumed. This polygenic amplification model is a truly novel view with fundamental promise for the detection of GxE in genomic datasets, especially with continued development to detect more complex signals of amplification.

https://doi.org/10.7554/eLife.99210.3.sa0

Significance of the findings:

Fundamental: Findings that substantially advance our understanding of major research questions

Strength of evidence:

Convincing: Appropriate and validated methodology in line with current state-of-the-art

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Genetic effects on complex traits may depend on context, such as age, sex, environmental exposures, or social settings. However, it often remains unclear whether the extent of context dependency, or gene-by-environment interaction (GxE), merits more involved models than the additive model typically used to analyze data from genome-wide association studies (GWAS). Here, we suggest considering the utility of GxE models in GWAS as a trade-off between bias and variance parameters. In particular, we derive a decision rule for choosing between competing models for the estimation of allelic effects. The rule weighs the increased estimation noise when context is considered against the potential bias when context dependency is ignored. In the empirical example of GxSex in human physiology, the increased noise of context-specific estimation often outweighs the bias reduction, rendering GxE models less useful when variants are considered independently. However, for complex traits, we argue that the joint consideration of context dependency across many variants mitigates both noise and bias. As a result, polygenic GxE models can improve both estimation and trait prediction. Finally, we exemplify (using GxDiet effects on longevity in fruit flies) how analyses based on independently ascertained ‘top hits’ alone can be misleading, and how considering polygenic patterns of GxE can improve interpretation.

Introduction

In organisms and study systems where the environment can be tractably manipulated, gene-by-environment interactions (GxE) are the rule, not the exception (El Soda et al., 2014; Vieira et al., 2000; Des Marais et al., 2013; Smith and Kruglyak, 2008; Paaby and Rockman, 2014). Yet, in complex (polygenic) human traits, there are but a few cases in which models that incorporate GxE explain data, such as genome-wide association study (GWAS) data, better than parsimonious models that assume additive contributions of genetic and environmental factors (Munafò et al., 2014; Kraft and Aschard, 2015; Sella and Barton, 2019). This is true not only for physical environments but also for other definitions of ‘E’, broadly construed to be any context that modifies genetic effects, such as age, sex, or social setting (Zhu et al., 2023; Schwaba et al., 2023; Elgart et al., 2022; Duncan and Keller, 2011; Gibson and Lacek, 2020; Brown et al., 2016; Ge et al., 2017; Balliu et al., 2021). GWAS commonly estimate marginal additive effects of an allele on a trait. The estimand here can be thought of as the average effect of the allele over a distribution of multidimensional contexts (Veller et al., 2023). With this view, some differences in allelic effects across contexts are likely omnipresent, but may very well be small, such that the cost of including additional parameters (for context-specific effects) outweighs the benefit of measuring heterogeneous effects. Here, we consider this problem and its connection to the currently underwhelming utility of GxE models in GWAS. First, we rigorously describe the statistical trade-off involved in estimating context specificity at the level of a single variant. Then, we highlight ways in which this trade-off might change as we consider GxE in complex traits, involving numerous genetic variants simultaneously. We begin by framing the problem of estimating context specificity at an individual variant as a bias-variance trade-off.
For example, consider the estimation of an allelic effect on lung cancer risk that depends on smoking status. When the allelic effect is estimated from a sample without considering smoking status, the estimate would be biased with respect to the true effect in smokers. We can estimate the effect separately in smokers and non-smokers to eliminate the bias, but the consideration of the additional parameters—smoking status-specific effects—has an associated cost of increasing the estimation variance, compared to an estimator that ignores smoking status. This bias-variance trade-off is closely related to the ‘signal-to-noise’ ratio, where the signal of interest is the true difference in context-specific allelic effects. To demonstrate this trade-off in real data, we consider sex-specific effects on physiological traits in humans. We show that for the majority of traits, it is typically unhelpful to model sex dependency for individual sites since the increase in noise vastly outweighs the signal. We then consider the extension to GxE in complex traits. Complex trait variation is primarily due to numerous genetic variants of small effects distributed throughout the genome (Fisher, 1930; Falconer and Mackay, 1996; Yengo et al., 2022; Zwick et al., 2000). Simultaneously considering GxE across multiple variants may decrease estimation noise if the extent and mode of context specificity is similar across numerous variants. This would tilt the scale in favor of context-dependent estimation. In addition, we show how conventional approaches for detecting and characterizing GxE, which focus on the most significant associations, may lead to erroneous conclusions. Finally, we discuss implications for complex trait prediction (with polygenic scores). We suggest a future focus on prediction methods that empirically learn the extent and nature of context dependency by simultaneously considering GxE across many variants.

Results and discussion

Modeling context-dependent effect estimation as a bias-variance trade-off

The problem setup

We consider a sample of $n + m$ individuals, each characterized as being in one of two contexts, $A$ or $B$: $n$ of the individuals are in context $A$, with the remaining $m$ individuals in context $B$. We measure a continuous trait for each individual, denoted by

\overbrace{y_{1}, \dots, y_{n}}^{A}, \overbrace{y_{n + 1}, \dots, y_{n + m}}^{B} .

We begin by considering the estimation of the effect of a single variant on the continuous trait. We assume a generative model of the form

y_{i} \sim \begin{cases} N (α_{A} + β_{A} g_{i}, σ_{A}^{2}) & \text{if } i \in \{1, \dots, n\} \\ N (α_{B} + β_{B} g_{i}, σ_{B}^{2}) & \text{if } i \in \{n + 1, \dots, n + m\}, \end{cases} \tag{1}

where $β_{A}$ and $β_{B}$ are fixed, context-specific effects of a reference allele at a biallelic, autosomal variant, and $g_{i} \in \{0, 1, 2\}$ is the observed reference allele count of individual $i$. $α_{A}$ and $α_{B}$ are the context-specific intercepts, corresponding to the mean trait for individuals with zero reference alleles in context $A$ and $B$, respectively. $σ_{A}^{2}$ and $σ_{B}^{2}$ are context-specific observation variances. We would like to estimate the allelic effects $β_{A}$ and $β_{B}$.

Estimation approaches

We compare two approaches to this estimation problem. The first approach, which we refer to as GxE estimation, is to stratify the sample by context and separately perform an ordinary least squares (OLS) regression in each stratum. This approach yields two estimates, ${\hat{β}}_{A}$ and ${\hat{β}}_{B}$, the OLS estimates of $β_{A}$ and $β_{B}$ of the generative model in Equation 1, respectively. This estimation model is equivalent to a linear model with a term for the interaction between context and reference allele count, in the sense that the context-specific allelic effects have the same maximum likelihood estimators in the two models (see Appendix 1). The second approach, which we refer to as additive estimation, is to perform an OLS regression on the entire sample and use the allelic effect estimate to estimate both $β_{A}$ and $β_{B}$. We denote this estimator as ${\hat{β}}_{A \cup B}$, to emphasize that the regression is run on all individuals from context $A$ and context $B$. This estimation model posits that for $i = 1, \dots, n + m$,

y_{i} \sim N (α_{A \cup B} + β_{A \cup B} g_{i}, σ_{A \cup B}^{2}), \tag{2}

where $α_{A \cup B}$ is the mean trait value for an individual with zero reference alleles, $β_{A \cup B}$ is the additive allelic effect, and $σ_{A \cup B}^{2}$ is the observation variance, which is independent of context. Notably, this model differs from the generative model assumed above: $β_{A \cup B}$ may not equal either $β_{A}$ or $β_{B}$; in addition, this model ignores heteroskedasticity across contexts.
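The two estimation approaches described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names, the allele frequency, the zero intercepts, and the sample sizes are all assumptions made for the example.

```python
import numpy as np

def ols_slope(g, y):
    """Slope of a simple OLS regression of y on g (intercept included)."""
    g_centered = g - g.mean()
    return float(g_centered @ (y - y.mean()) / (g_centered @ g_centered))

def simulate_two_contexts(n, m, beta_a, beta_b, freq=0.3,
                          sigma_a=1.0, sigma_b=1.0, seed=0):
    """Draw genotypes and traits from the two-context generative model.

    Assumptions for illustration: both intercepts are 0 and the allele
    frequency `freq` is the same in contexts A and B.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, freq, size=n + m).astype(float)
    in_a = np.arange(n + m) < n              # first n individuals are in context A
    beta = np.where(in_a, beta_a, beta_b)
    sigma = np.where(in_a, sigma_a, sigma_b)
    y = beta * g + rng.normal(0.0, sigma)    # context-specific effect and noise
    return g, y, in_a

def estimate_effects(g, y, in_a):
    """GxE estimation (stratified OLS) and additive estimation (pooled OLS)."""
    return {
        "gxe_A": ols_slope(g[in_a], y[in_a]),
        "gxe_B": ols_slope(g[~in_a], y[~in_a]),
        "additive": ols_slope(g, y),
    }

g, y, in_a = simulate_two_contexts(50_000, 50_000, beta_a=0.1, beta_b=0.3, seed=1)
print(estimate_effects(g, y, in_a))
```

With equal sample sizes and equal allele frequencies, the pooled estimate is expected to fall between the two context-specific estimates, which makes the bias of the additive estimator for $β_{A}$ visible directly.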

Error analysis

We focus on the mean squared error (MSE) of the additive and GxE estimators for the allelic effect in context $A$. The estimator minimizing the MSE may differ between contexts $A$ and $B$, but the analysis for context $B$ is analogous. When selecting between these two estimation approaches, a bias-variance decomposition of the MSE is useful. Based on OLS theory (Casella and Berger, 2021, Theorem 11.3.3), under the model specified above, we have

{\hat{β}}_{A} \sim N (β_{A}, V_{A}),

where $V_{A} = \frac{σ_{A}^{2}}{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}}$ and ${\bar{g}}_{A}$ is the mean genotype of individuals in context $A$. The unbiasedness of the GxE estimator implies

\mathrm{MSE} ({\hat{β}}_{A}, β_{A}) = V_{A},

where $\mathrm{MSE} ({\hat{β}}_{A}, β_{A})$ is the mean squared error of estimating $β_{A}$ with ${\hat{β}}_{A}$. The case of the additive estimator, ${\hat{β}}_{A \cup B}$, is a bit more involved. As we show in the Methods section, we can write

{\hat{β}}_{A \cup B} = ω_{A} {\hat{β}}_{A} + ω_{B} {\hat{β}}_{B} \tag{3}

for non-negative weights $ω_{A}$ and $ω_{B}$ (that need not sum to 1). Further, we show in Equation 7 of the Methods section that $ω_{A} \propto n H_{A}$ and $ω_{B} \propto m H_{B}$, where $H_{A}$ and $H_{B}$ are the sample heterozygosities in contexts $A$ and $B$, respectively. Using Equation 3, we may write

\begin{array}{ll} \mathrm{MSE} ({\hat{β}}_{A \cup B}, β_{A}) & = \mathrm{Bias}^{2} ({\hat{β}}_{A \cup B}, β_{A}) + \mathrm{Var} ({\hat{β}}_{A \cup B}) \\ & = ((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2} + ω_{A}^{2} V_{A} + ω_{B}^{2} V_{B}, \end{array}

where $V_{B}$ is defined analogously to $V_{A}$. Thus, with MSE as our metric for comparison, we prefer the GxE estimator in context $A$ when

\mathrm{MSE} ({\hat{β}}_{A \cup B}, β_{A}) > \mathrm{MSE} ({\hat{β}}_{A}, β_{A}),

or, if and only if

((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2} + ω_{A}^{2} V_{A} + ω_{B}^{2} V_{B} > V_{A} . \tag{4}

We refer to Equation 4 as the ‘decision rule’, since it guides us toward the more accurate estimator; to minimize the MSE, we will use the context-specific estimator if and only if the inequality is satisfied. To gain some intuition about the important parameters here, we first consider the case of equal allele frequencies (and hence equal heterozygosities) in both contexts and equal estimation variance in both contexts. In this case, the GxE estimator is advantaged by larger context specificity (larger $| β_{A} - β_{B} |$) and disadvantaged by larger estimation noise (larger $V_{A} = V_{B}$) (Figure 1). In fact, the decision boundary (i.e. the point at which the two models have equal MSE) can be written as a linear combination of $| β_{A} - β_{B} |$ and $\sqrt{V_{A}}$ (Figure 1C). In this special case, we show in the Methods section that Equation 4 is an equality when

\sqrt{\frac{m}{2 n}} | β_{A} - β_{B} | - \sqrt{V_{A}} = 0.
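As a rough numerical check of the decision rule and its special-case boundary, the sketch below evaluates both sides of the inequality directly. The function names are hypothetical, and the weights $ω_{A} = n / (n + m)$ and $ω_{B} = m / (n + m)$ are an assumption that corresponds to equal heterozygosity in the two contexts.

```python
import math

def mse_gxe(v_a):
    """MSE of the (unbiased) GxE estimator of beta_A."""
    return v_a

def mse_additive(beta_a, beta_b, v_a, v_b, w_a, w_b):
    """Squared bias plus variance of the additive estimator, as an estimate of beta_A."""
    bias = (w_a - 1.0) * beta_a + w_b * beta_b
    return bias ** 2 + w_a ** 2 * v_a + w_b ** 2 * v_b

def prefer_gxe(beta_a, beta_b, v_a, v_b, w_a, w_b):
    """Decision rule: True when the GxE estimator has the lower MSE in context A."""
    return mse_additive(beta_a, beta_b, v_a, v_b, w_a, w_b) > mse_gxe(v_a)

def boundary_noise_sd(beta_a, beta_b, n, m):
    """sqrt(V_A) at which both estimators have equal MSE, assuming
    equal heterozygosity and equal estimation variance across contexts."""
    return math.sqrt(m / (2 * n)) * abs(beta_a - beta_b)

# Hypothetical example: equal sample sizes, so both weights are 1/2.
n = m = 1000
w_a, w_b = n / (n + m), m / (n + m)
v = boundary_noise_sd(0.1, 0.3, n, m) ** 2
print(mse_additive(0.1, 0.3, v, v, w_a, w_b), mse_gxe(v))  # equal on the boundary
print(prefer_gxe(0.1, 0.3, 0.5 * v, 0.5 * v, w_a, w_b))    # less noise favors GxE
```

Halving the estimation variance relative to the boundary value tips the rule toward the GxE estimator, while doubling it tips the rule toward the additive estimator.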

Figure 1

Bias-variance trade-off for single-site estimation with equal estimation noise and equal heterozygosity across contexts.

The x-axis shows the difference in context-specific effects, while the y-axis shows the standard deviation of the context-specific estimators—both in raw measurement units. The color on the plot indicates the difference between the additive and gene-by-environment interaction (GxE) estimators in bias (A), variance (B), or mean squared error (MSE) (C). (A) Only the additive estimator is potentially biased. The bias is proportional to the difference in context-specific effects and independent of the estimation noise. (B) The difference in variance is proportional to context-specific estimation noise and independent of the difference of context-specific effects. (C) The decision boundary is linear in both the estimation noise and the difference between context-specific effects.

More generally, in the case where $H_{A} = H_{B}$ but $V_{A} \neq V_{B}$, we show in the Methods section that we can write Equation 4 as

\frac{(β_{A} - β_{B})^{2}}{V_{A}} > \frac{1 + ω_{A}}{1 - ω_{A}} - \frac{V_{B}}{V_{A}} .

This dimensionless re-parameterization of the decision rule makes explicit its dependence on three factors:

$\frac{(β_{A} - β_{B})^{2}}{V_{A}}$ can be viewed as the ‘signal-to-noise’ ratio: it captures the degree of context specificity (the signal) relative to the estimation noise in the focal context, $A$.
$\frac{1 + ω_{A}}{1 - ω_{A}}$ is the relative contribution to heterozygosity, which equals the relative contribution to variance in the independent variable of the OLS regression of Equation 2.
$\frac{V_{B}}{V_{A}}$ is the ratio of context-specific estimation noises.

In Appendix 1, we extend the decision rule for the case of a continuous context variable. For a given trait and context, we can consider the behavior of the decision rule across variants with variable allele frequencies and allelic effects. The ratio of estimation noises, $r := \frac{V_{A}}{V_{B}}$ (the reciprocal of the ratio appearing in the decision rule), will not be constant across variants. However, in some cases, considering a fixed $r$ across variants is a good approximation. In GWAS of complex traits, each variant often explains a small fraction of trait variance. As a result, the estimation noise is effectively a matter of trait variance and heterozygosity alone. If per-site heterozygosity is similar in strata $A$ and $B$, as it is, for example, for autosomal variants in biological males and females, $r$ is approximately fixed across variants (Zhu et al., 2023). Figure 2 illustrates the linearity of the decision boundary under the assumption that $r$ is fixed across variants. It also shows that the slope of the decision boundary changes as a function of $r$. Intuitively, we are less likely to prefer GxE estimation for the noisier context. In fact, for sufficiently small values of $r$ (e.g. $r < \frac{1}{3}$ for $ω_{A} = \frac{1}{2}$), the right-hand side of the decision rule, $\frac{1 + ω_{A}}{1 - ω_{A}} - \frac{V_{B}}{V_{A}}$, will be negative. This corresponds to the situation where $V_{A} ≪ V_{B}$, in which case the additive estimator is never preferable to the GxE estimator in estimating $β_{A}$, as the signal-to-noise ratio is always non-negative. Typically, this will also imply that the additive estimator is greatly preferable for estimating $β_{B}$, as ${\hat{β}}_{B}$ will be extremely noisy.
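The dimensionless form of the rule lends itself to a short numerical illustration. The sketch below uses hypothetical function names and assumes equal heterozygosity across contexts, with $r = V_{A} / V_{B}$ as defined in the text.

```python
def gxe_threshold(w_a, r):
    """Right-hand side of the dimensionless decision rule.

    w_a is the weight of context A in the additive estimator and
    r = V_A / V_B, so that V_B / V_A = 1 / r.
    """
    return (1.0 + w_a) / (1.0 - w_a) - 1.0 / r

def prefer_gxe_dimensionless(beta_a, beta_b, v_a, v_b, w_a):
    """True when the signal-to-noise ratio exceeds the threshold."""
    signal_to_noise = (beta_a - beta_b) ** 2 / v_a
    return signal_to_noise > gxe_threshold(w_a, v_a / v_b)

# With w_a = 1/2, the threshold changes sign at r = 1/3.
for r in (0.25, 1.0 / 3.0, 1.0, 2.0):
    print(r, gxe_threshold(0.5, r))
```

When the threshold is negative, any non-negative signal-to-noise ratio satisfies the inequality, so the GxE estimator is preferred in context $A$ even when the two context-specific effects are identical.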

Figure 2

The decision boundary with different ratios of context-specific estimation noises.

In all panels, the heterozygosity of the variant is assumed to be equal across contexts. The x and y axes are the same as in Figure 1. (A) Estimation noise in the focal context, $A$ , is half that of the other context, $B$ . (B) Estimation noise is equal in both contexts. (C) Estimation noise in focal context is double that of the other context.

It is natural to ask where the decision rule of Equation 4 falls with respect to empirical GWAS data. We considered the example of biological sex as the context (GxSex), and examined sex-stratified GWAS data across 27 continuous physiological traits in the UK Biobank (Bycroft et al., 2018; Zhu et al., 2023). For each of 9 million variants, we estimated the difference in sex-specific effects and the variance of each marginal effect estimator in males. Then, using an estimate of the ratio of sex-specific trait variances as a proxy for the ratio of estimation variances of males and females, we approximated the linear decision boundary between the additive and GxE estimators (Figure 3A and B; Appendix 1—figure 2, Appendix 1—figure 3). To demonstrate the accuracy of our decision rule, we employed a data-splitting technique where we estimate the MSE difference between estimators in a training set and evaluate the accuracy in a holdout set (Appendix 1—figure 1). For almost all traits examined, very few allelic effects in males are expected to be more accurately estimated using the male-specific estimator (usually between 0% and 0.1%). Notable exceptions to this rule are testosterone, sex hormone binding globulin (SHBG), and waist-to-hip ratio adjusted for body mass index, for which roughly 0.5% of allelic effects are expected to be better estimated with the GxE model (Figure 3C). However, when considering only SNPs that are genome-wide significant in males (marginal p-value $< 5 \times 10^{-8}$ in males), many traits show a much larger proportion of effects that would be better estimated by the GxE model. At an extreme, for testosterone, all genome-wide significant SNPs are expected to be better estimated by the GxE model. In addition, a large fraction of genome-wide significant effects are better estimated with the GxE model for creatinine (62%), arm fat-free mass (24%), waist-to-hip ratio (19%), and SHBG (18%) (Figure 3D).

Figure 3

Applying the decision rule to sex-dependent effects on human physiological traits.

(A, B) The x-axis shows the estimated absolute difference between the effect of variants in males and females. The y-axis shows the measured standard error for each variant in males, the focal context here. The dashed line shows the decision boundary for effect estimation in males. The difference in mean squared error (MSE) between estimation methods increases linearly with distance from the dashed line, as in Figure 2. If a variant falls above (below) the line, the additive (gene-by-environment interaction [GxE]) estimator has a lower MSE. (A) shows a random sample of 15K single nucleotide variants whereas (B) shows only variants with a marginal p-value less than $5 \times 10^{-8}$ in males. (C, D) The percent of effects in males which would be better estimated by the GxE estimator, across continuous physiological traits. (Note the difference in scale between the two panels.) To estimate these percentages, one single nucleotide variant is sampled from each of 1700 approximately independent autosomal linkage blocks, and this procedure is repeated 10 times. Shown are average percentages across the 10 iterations.

The decision rule we derived could potentially guide more accurate allelic effect estimation approaches. However, the consideration of GxE pattern sharing across many variants (polygenic GxE) can alter both bias and variance and therefore the trade-off. In our discussion of complex traits that follows, we therefore expand on the rule through qualitative consequences of polygenic GxE, and no longer stick to the analytical single variant rule.

Context dependency in complex traits

At the single variant level, and specifically when variants are considered independently from one another, we have discussed how the accurate estimation of allelic effects can be boiled down to a bias-variance trade-off. For complex traits, genetic variance is often dominated by the contribution of numerous variants of small effects that are best understood when analyzed jointly (Sella and Barton, 2019; Sinnott-Armstrong et al., 2021; Shi et al., 2016; Boyle et al., 2017; Liu et al., 2019; Wray et al., 2018; Yengo et al., 2022). It stands to reason that to evaluate context dependence in complex traits, we would also want to jointly consider polygenic patterns, rather than just the patterns at the loci most strongly associated with a trait (Urbut et al., 2019; Gibson and Lacek, 2020; Zhang et al., 2021; Paaby and Gibson, 2016; Aschard et al., 2017; Des Marais et al., 2013). Motivated by this rationale, we recently inferred polygenic GxSex patterns in human physiology (Zhu et al., 2023). One pattern that emerged as a common mode of GxSex across complex physiological traits is ‘amplification’: a systematic difference in the magnitude of genetic effects between the sexes. Moving beyond sex and considering any context, amplification can happen if, for example, many variants regulate a shared pathway that is moderated by a factor—and that factor varies in its distribution among contexts. Amplification is but one possible mode of polygenic GxE, but can serve as a guiding example for ways in which GxE may be pervasive but difficult to characterize with existing approaches (Zhu et al., 2023; Gibson and Dworkin, 2004; Miao et al., 2022; Balliu et al., 2021). In what follows, we will therefore use the example of pervasive amplification (across causal effects) to illustrate the interpretive advantage of considering context dependency across variants jointly, rather than independently.

A focus on ‘top hits’ may lead to mischaracterization of polygenic GxE

A common approach to the analysis of context dependency involves two steps. First, categorization of context dependency (or lack thereof) is performed for each variant independently. Second, variants falling under each category are counted and annotated across the genome. Some recent examples of this approach toward the characterization of GxE in complex traits include studies of GxSex effects on flight performance in Drosophila (Spierer et al., 2021), GxSex effects on various traits in humans (Traglia et al., 2022; Bernabeu et al., 2021), and GxDietxAge effects on body weight in mice (Wright et al., 2022). Characterizing polygenic trends by summarizing many independent hypothesis tests may miss GxE signals that are subtle and statistically undetectable at each individual variant, yet pervasive and substantial cumulatively across the genome. To characterize polygenic GxE based on just the ‘top hits’ may lead to ascertainment biases, with respect to both the pervasiveness and the mode of GxE across the genome. Much like the heritability of complex traits is thought to be due to the contribution of many small (typically sub-significant) effects (Boyle et al., 2017; Sinnott-Armstrong et al., 2021), when GxE is pervasive we may expect that the sum of many small differences in context-specific effects accounts for the majority of GxE variation. For concreteness, we consider in more depth one recent study characterizing GxDiet effects on longevity in Drosophila melanogaster (Pallares et al., 2023). In this study, Pallares et al. tracked caged fly populations given one of two diets: a ‘control’ diet and a ‘high-sugar’ diet. Across 271K single nucleotide variants, the authors tested for association between alleles and their survival to a sampling point (thought of as a proxy for ‘lifespan’ or ‘longevity’) under each diet independently. Then, they classified variants according to whether or not their associations with survivorship were significant under each diet as follows:

Significant under neither diet $\to$ classify as no effect.
Significant when fed the high-sugar diet, but not when fed the control diet $\to$ classify as high-sugar-specific effect.
Significant when fed the control diet, but not when fed the high-sugar diet $\to$ classify as control-specific effect.
Significant under both diets $\to$ classify as shared effect.

The authors’ choice of four categories into which a variant may fall may be motivated by the wish to test for the presence of ‘cryptic genetic variation’: genetic variation that is maintained in a context where it is functionally neutral but carries large effects in a new or stressful context (Gibson and Dworkin, 2004; Paaby and Rockman, 2014; Young et al., 2016; Des Marais et al., 2013). Indeed, of the variants Pallares et al. classified as having an effect (one-hundredth of variants tested), approximately 31% were high-sugar specific, while the remaining 69% of the variants were shared. Fewer than 1% were labeled as having control-specific effects. They concluded that high-sugar-specific effects on longevity are pervasive, compatible with the hypothesis of widespread cryptic genetic variation for longevity. This characterization of GxE, based on ‘top hits’, places an emphasis on the context(s) in which trait associations are statistically significant, rather than on estimating how the context-specific effects covary. In addition, this particular classification system does not cover all possible ways in which context-specific effects may differ. In Appendix 1, we discuss these interpretation difficulties further.

We next show that a generative model that differs qualitatively from the cryptic genetic variation model yields results that are highly similar to those observed by Pallares et al. We simulated data under pervasive amplification. Specifically, we sampled from a mixture of 40% of variants having no effect under either diet and 60% of variants having an effect under both diets—but exactly 1.4× larger under a high-sugar diet. We then simulated the noisy estimation of these effects and employed the classification approach of Pallares et al. to the simulated data (Methods). The patterns of allelic effects in the control compared to high-sugar contexts were qualitatively similar in the experimental data and our pervasive amplification simulation. This is true both genome-wide (Figure 4A compared to Figure 4B) and for the set of variants classified as significant with their classification approach (Figure 4C compared to Figure 4D). The similarity of ascertained variants further highlights caveats of interpretation based on the classification of ‘top hits’: despite the fact that we did not simulate any variants that only have an effect under the high-sugar diet, approximately 36% of significant variants were classified as specific to the high-sugar diet (green points in Figure 4D), comparable to the 31% of variants classified as high-sugar specific in the experimental data (Figure 4C). These variants simply have sub-significant associations in the control group and significant associations in the high-sugar group. In addition, every variant in the shared category (blue points in Figure 4D) in fact has a larger effect in the high-sugar diet than in the control diet, which cannot be captured by the classification system itself but represents the only mode of GxE in our simulation.
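The logic of this simulation can be sketched as follows. This is an illustrative toy version, not the procedure described in the Methods: the effect-size distribution, the standard error, and the significance threshold are hypothetical values chosen only to make the ascertainment effect visible.

```python
import numpy as np

def simulate_amplification(n_variants=100_000, frac_null=0.4, amplification=1.4,
                           effect_sd=1.0, se=1.0, seed=0):
    """Simulate noisy effect estimates under pervasive amplification.

    A fraction `frac_null` of variants has no effect under either diet; all
    others have an effect under both diets that is `amplification` times
    larger under high sugar. effect_sd and se are hypothetical.
    """
    rng = np.random.default_rng(seed)
    has_effect = rng.random(n_variants) >= frac_null
    beta_control = np.where(has_effect, rng.normal(0.0, effect_sd, n_variants), 0.0)
    beta_sugar = amplification * beta_control
    est_control = beta_control + rng.normal(0.0, se, n_variants)
    est_sugar = beta_sugar + rng.normal(0.0, se, n_variants)
    return est_control, est_sugar

def classify(est_control, est_sugar, se=1.0, z_threshold=3.0):
    """Four-way classification by per-diet significance (hypothetical threshold)."""
    sig_c = np.abs(est_control / se) > z_threshold
    sig_s = np.abs(est_sugar / se) > z_threshold
    labels = np.full(est_control.shape, "no effect", dtype="U24")
    labels[sig_s & ~sig_c] = "high-sugar specific"
    labels[sig_c & ~sig_s] = "control specific"
    labels[sig_c & sig_s] = "shared"
    return labels

est_control, est_sugar = simulate_amplification(seed=1)
labels = classify(est_control, est_sugar)
for name in ("no effect", "high-sugar specific", "control specific", "shared"):
    print(name, int((labels == name).sum()))
```

No variant in this toy simulation has an effect under only one diet, yet variants labeled as high-sugar specific are expected to outnumber those labeled as control specific, simply because amplified effects cross the significance threshold more often.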

Figure 4

A focus on top hits may lead to mischaracterization of polygenic gene-by-environment interactions (GxE).

(A) Data from an experiment measuring allelic effects on longevity in caged flies given one of two diets, ‘control’ and ‘high sugar’. Shown are allelic effect estimates under each diet for a random sample of approximately 12K variants. (B) Simulated data where all true allelic effects are exactly 1.4 times larger under a high-sugar diet. The effects are estimated with sampling noise mimicking the Pallares et al. data. (C) Allelic effect estimates of variants ascertained as significant and classified as ‘diet-specific’ or ‘shared’ by Pallares et al. (D) Simulated effects ascertained as significant and classified using a similar procedure to that applied in (C). While the generative mode of GxE we used in our simulations was not considered by Pallares et al., the simulation results (B, D) closely match the patterns observed in their data (A, C), both across all effects (A, B) and as reflected via their classification approach (C, D).

To recap, we simulated a mode of GxE that is not considered in Pallares et al. (i.e. pervasive amplification) and that is at odds with their conclusions about evidence for a large discrete class of SNPs with diet-specific effects (i.e. cryptic genetic variation). The close match of our simulation to the empirical results of Pallares et al. therefore illustrates that the characterization of GxE via hypothesis testing and classification at each variant independently may lead to erroneous interpretation when applied to empirical complex trait data as well. In Appendix 1, we show that a reanalysis of the Pallares et al. data that is based on estimating the covariance of allelic effects is directly consistent with pervasive amplification as well (Appendix 1—figure 4). In conclusion, the classification of ‘top hits’ alone may not be representative of the extent of GxE nor of the most pervasive modes of GxE.

The utility of modeling GxE for complex trait prediction

Modeling context dependency of genetic effects may hold the potential for constructing polygenic scores that are more accurate or more portable across contexts (Patel et al., 2022; Miao et al., 2022; Turley et al., 2018; Spence et al., 2022; Wang et al., 2024; Smith et al., 2025). Evidence for the utility of GxE models in polygenic score prediction, however, has been underwhelming and GxE models are still rarely applied (Zhu et al., 2023; Schwaba et al., 2023). A key reason behind this apparent discrepancy is the bias-variance trade-off for individual variants discussed above. If context-specific effects are similar (a likely possibility for highly polygenic traits with the majority of heritability owing to small causal effects), then additive models will tend to outperform GxE models (Fisher, 1930; Falconer and Mackay, 1996; Hill et al., 2008; Young, 2019). This is because the unbiasedness of GxE estimation does not make up for the cost of additional estimator variance, resulting from sample stratification by context or the addition of explicit interaction terms (Schwaba et al., 2023). We exemplify the relative importance of variance compared to bias in polygenic scoring using simulations. We continue with the generative model of pervasive amplification as an example. Namely, we simulated a GWAS of a continuous trait with independent effects in 2500 variants (50% of variants included in the GWAS). Effects were either the same in two contexts, $A$ and $B$, or $1.4$ times larger in context $B$. The GWAS is conducted with either a small sample size or a large sample size, conferring low or high statistical power, respectively. We then constructed polygenic scores using 833 variants (corresponding to one-third of the causal variants), which were ascertained as most significantly associated with the trait according to either the additive model (orange and red in Figure 5) or context-specific hypothesis tests (green and blue in Figure 5).

Figure 5

Polygenic score performance for context-dependent prediction models.

In each simulation, a genome-wide association study (GWAS) is performed on 5000 biallelic variants, half of which have no effect in either context. Of the other half, some percent of the variants (indicated on the x-axis) had effects 1.4× larger in one of the contexts and the remaining SNPs had equal effects in both contexts. The broad sense heritability was set to $0.4$ in all simulations. The y-axis shows the average, over $11,000$ simulations, of the out-of-sample Pearson correlation between polygenic score and trait value. (A) Results with a GWAS sample size of 1000 individuals. (B) Results with a GWAS sample size of $50,000$ individuals.

Even in settings with pervasive GxE, additive polygenic scores (red lines in Figure 5) outperformed context-specific scores (green lines in Figure 5). The advantage of the additive model is manifested in two ways: more accurate estimation, as discussed above, but also better identification of true associations with the trait. We considered the two advantages separately. It is sometimes better to ascertain variants using the lower variance approach and estimate effects using the lower-bias approach. In our simulations, this strategy (orange lines in Figure 5) was preferable to using the GxE model for both ascertainment and estimation (green line). It was not preferable to using the additive model (red line) for both approaches, but it was the preferable strategy under a slightly different parametric regime, corresponding to more GxE (Appendix 1—figure 5). Finally, we considered a polygenic GxE approach, as implemented in ‘multivariate adaptive shrinkage’ (mash) (Urbut et al., 2019), a method to estimate context-specific effects by leveraging common patterns of effect covariance between contexts observed across the genome. mash models the underlying distribution of effects in all contexts as a mixture of zero-centered multivariate normal distributions with different covariance structures (as well as the null matrix, to induce additional shrinkage). After estimating this distribution via maximum likelihood, mash uses it as a prior to obtain posterior effect estimates for each variant in each context. As a result, posterior effect estimates across contexts regress toward commonly observed patterns of covariance of allelic effects across contexts. In our simulations, in the presence of substantial amplification, the polygenic adaptive shrinkage approach outperformed all other methods as long as the study was adequately powered (Figure 5B). 
This is thanks to the unique ability (compared to the three other approaches) to leverage the sharing of signals across variants, including the extent and nature of context dependency. With low power, however, the additive model performed best (Figure 5A). We attribute this to the variance cost associated with the polygenic adaptive shrinkage approach—driven by the estimation of additional parameters for capturing the genome-wide covariance relationships.

Conclusion

When genetic variants are considered independently, the estimation of their effects in different contexts can be boiled down to a bias-variance trade-off. For complex traits, we show through example that further considering polygenic patterns of GxE can be key for understanding context-dependent genetic architecture and to aid in prediction. The notion that complex trait analyses should combine observations at top associated loci alongside polygenic trends has gained traction with additive models of trait variation; it may be similarly important in our understanding of context dependency.

Methods

Expressing the additive estimator as a linear combination of GxE estimators

In this section, we prove the result of Equation 3, stating that

{\hat{β}}_{A \cup B} = ω_{A} {\hat{β}}_{A} + ω_{B} {\hat{β}}_{B}

for some non-negative weights $ω_{A}$ and $ω_{B}$ . To do this, we will need some additional notation. Let ${\bar{g}}_{A}$ denote the average number of effect alleles in individuals in context $A$ , and let ${\bar{g}}_{A \cup B}$ denote the average effect allele count across all individuals. Similarly, let ${\bar{y}}_{A}$ denote the average trait value in context $A$ , and let ${\bar{y}}_{A \cup B}$ denote the average trait value across all individuals. As an OLS estimator, the context-specific estimator is defined as

\begin{array}{ll} {\hat{β}}_{A} = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) (y_{i} - {\bar{y}}_{A})}{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) y_{i} - \sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) {\bar{y}}_{A}}{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) y_{i} - {\bar{y}}_{A} \sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})}{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) y_{i}}{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}}, since \sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) = 0 . \end{array}

Similarly, the additive estimator can be written as

\begin{array}{ll} {\hat{β}}_{A \cup B} = \frac{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B}) (y_{i} - {\bar{y}}_{A \cup B})}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \\ = \frac{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} (by the same logic as above) \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A \cup B}) y_{i} + \sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} . \end{array}

We will show that the weights in Equation 3 depend on the effect allele frequency in the two contexts, $f_{A}$ and $f_{B}$ . We will assume mean-centered traits, such that $\sum_{i = 1}^{n} y_{i} = 0$ and $\sum_{i = n + 1}^{n + m} y_{i} = 0$ . We note that mean-centering is inconsequential for effect estimation. We can then write

\begin{array}{ll} {\hat{β}}_{A \cup B} = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A \cup B}) y_{i} + \sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A \cup B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} + \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A} + ({\bar{g}}_{A} - {\bar{g}}_{A \cup B})) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} + \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B} + ({\bar{g}}_{B} - {\bar{g}}_{A \cup B})) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) y_{i} + \sum_{i = 1}^{n} ({\bar{g}}_{A} - {\bar{g}}_{A \cup B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} + \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B}) y_{i} + \sum_{i = n + 1}^{n + m} ({\bar{g}}_{B} - {\bar{g}}_{A \cup B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) y_{i} + ({\bar{g}}_{A} - {\bar{g}}_{A \cup B}) \sum_{i = 1}^{n} y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} + \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B}) y_{i} + ({\bar{g}}_{B} - {\bar{g}}_{A \cup B}) \sum_{i = n + 1}^{n + m} y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} + \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} (by our assumption of mean centered traits) \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \cdot \frac{\sum_{i = 1}^{n} (g_{i} - 
{\bar{g}}_{A})^{2}}{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}} + \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B}) y_{i}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \cdot \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B})^{2}}{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A}) y_{i}}{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}} \cdot \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} + \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B}) y_{i}}{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B})^{2}} \cdot \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B})^{2}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} \\ = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} {\hat{β}}_{A} + \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B})^{2}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}} {\hat{β}}_{B} . \end{array}

Thus, $ω_{A} = \frac{\sum_{i = 1}^{n} (g_{i} - {\bar{g}}_{A})^{2}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}}$ and $ω_{B} = \frac{\sum_{i = n + 1}^{n + m} (g_{i} - {\bar{g}}_{B})^{2}}{\sum_{i = 1}^{n + m} (g_{i} - {\bar{g}}_{A \cup B})^{2}}$ in Equation 3. We note that the numerator of $ω_{A}$ is $n$ times the sample heterozygosity in context A, and the numerator of $ω_{B}$ is $m$ times the sample heterozygosity in context B. Thus, we have shown that

ω_{A} \propto n H_{A} and ω_{B} \propto m H_{B},

where $H_{A}$ and $H_{B}$ are the sample heterozygosities in context A and B, respectively. And, in the special case where $f_{A} = f_{B}$ , because this implies that the sample heterozygosities will be approximately equal across contexts, we have that

ω_{A} \approx \frac{n}{n + m} and ω_{B} \approx \frac{m}{n + m} .
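The identity proved in this section can be checked numerically. The following is a minimal sketch, assuming simulated genotypes and traits with arbitrary illustrative sample sizes, allele frequencies, and effect sizes (none of these values come from the analyses in this paper). It mean-centers the trait within each context, as assumed in the derivation, and compares the pooled OLS slope to the weighted combination of the context-specific slopes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 400, 600  # illustrative sample sizes in contexts A and B (assumption)

# Simulated effect allele counts and traits; all parameter values are illustrative.
g_a = rng.binomial(2, 0.30, size=n).astype(float)
g_b = rng.binomial(2, 0.35, size=m).astype(float)
y_a = 0.5 * g_a + rng.normal(size=n)
y_b = 0.7 * g_b + rng.normal(size=m)

# Mean-center the trait within each context, as assumed in the derivation.
y_a -= y_a.mean()
y_b -= y_b.mean()


def ols_slope(g, y):
    """OLS slope of y on g (with an intercept)."""
    gc = g - g.mean()
    return float(gc @ (y - y.mean()) / (gc @ gc))


beta_a = ols_slope(g_a, y_a)  # context-specific (GxE) estimates
beta_b = ols_slope(g_b, y_b)

g_all = np.concatenate([g_a, g_b])
y_all = np.concatenate([y_a, y_b])
beta_add = ols_slope(g_all, y_all)  # additive (pooled) estimate

# Weights: within-context sum of squares over the pooled sum of squares.
denom = np.sum((g_all - g_all.mean()) ** 2)
w_a = np.sum((g_a - g_a.mean()) ** 2) / denom
w_b = np.sum((g_b - g_b.mean()) ** 2) / denom

combined = w_a * beta_a + w_b * beta_b  # should match beta_add up to rounding
```

In this sketch the weights sum to slightly less than one whenever the mean allele counts differ between contexts, because the pooled sum of squares also contains a between-context component.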

Linearity of the decision rule

In Equation 5, under the assumption that $V_{A} = V_{B}$ and $H_{A} = H_{B}$ , the decision boundary is expressed as a linear function of $| β_{A} - β_{B} |$ and $\sqrt{V_{A}}$ as

\sqrt{\frac{m}{2 n}} | β_{A} - β_{B} | > \sqrt{V_{A}} .

Here, we prove that the linearity of the decision rule holds in the more general case where $\frac{V_{A}}{V_{B}} = r$ for some fixed value of $r$ . Equation 5 then follows as a special case of this fact when $r = 1$ . Starting from Equation 4, we prefer the GxE estimator to the additive estimator when estimating $β_{A}$ if

\begin{array}{ll} V_{A} < ω_{A}^{2} V_{A} + ω_{B}^{2} V_{B} + ((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2} \\ ⟺ V_{A} < ω_{A}^{2} V_{A} + \frac{ω_{B}^{2}}{r} V_{A} + ((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2} \\ ⟺ V_{A} - ω_{A}^{2} V_{A} - \frac{ω_{B}^{2}}{r} V_{A} < ((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2} \\ ⟺ (1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r}) V_{A} < ((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2} \\ ⟺ V_{A} < \frac{((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2}}{1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r}} (assuming 1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r} > 0) \\ ⟺ \sqrt{V_{A}} < \frac{| (ω_{A} - 1) β_{A} + ω_{B} β_{B} |}{\sqrt{1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r}}} (again assuming 1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r} > 0) \end{array}

If our assumption that $1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r} > 0$ does not hold, we note that the GxE model is always preferable and technically speaking there exists no decision rule between the two models. Now, when heterozygosities (and thus minor allele frequencies) are equal across contexts, then Equation 8 implies $ω_{A} + ω_{B} = 1$ . Therefore, we may write the decision rule as

\begin{array}{ll} \sqrt{V_{A}} < \frac{| (1 - ω_{B} - 1) β_{A} + ω_{B} β_{B} |}{\sqrt{1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r}}} \\ ⟺ \sqrt{V_{A}} < \frac{| ω_{B} (β_{B} - β_{A}) |}{\sqrt{1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r}}} \\ ⟺ \sqrt{V_{A}} < \frac{ω_{B}}{\sqrt{1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r}}} | β_{A} - β_{B} | (by properties of the absolute value) \\ ⟺ \sqrt{V_{A}} < \frac{1 - ω_{A}}{\sqrt{1 - ω_{A}^{2} - \frac{(1 - ω_{A})^{2}}{r}}} | β_{A} - β_{B} | . \end{array}

Here, we see that for any fixed $r$ the decision rule is linear with a slope determined by $r$ (Figure 2). Now, in the special case where $r = 1$ , we have

\begin{array}{ll} \sqrt{V_{A}} < \frac{1 - ω_{A}}{\sqrt{1 - ω_{A}^{2} - (1 - ω_{A})^{2}}} | β_{A} - β_{B} | \\ ⟺ \sqrt{V_{A}} < \frac{1 - ω_{A}}{\sqrt{1 - ω_{A}^{2} - 1 - ω_{A}^{2} + 2 ω_{A}}} | β_{A} - β_{B} | \\ ⟺ \sqrt{V_{A}} < \frac{1 - ω_{A}}{\sqrt{2 ω_{A} (1 - ω_{A})}} | β_{A} - β_{B} | \\ ⟺ \sqrt{V_{A}} < \sqrt{\frac{1 - ω_{A}}{2 ω_{A}}} | β_{A} - β_{B} | \end{array}

Now, substituting the definitions of $ω_{A}$ and $ω_{B}$ in the case of equal minor allele frequencies given in Equation 8, we can write

\begin{array}{ll} \sqrt{V_{A}} < \sqrt{\frac{1}{2}} \sqrt{\frac{1 - \frac{n}{n + m}}{\frac{n}{n + m}}} | β_{A} - β_{B} | \\ ⟺ \sqrt{V_{A}} < \sqrt{\frac{1}{2}} \sqrt{\frac{\frac{m}{n + m}}{\frac{n}{n + m}}} | β_{A} - β_{B} | \\ ⟺ \sqrt{V_{A}} < \sqrt{\frac{m}{2 n}} | β_{A} - β_{B} | . \end{array}

This is the condition stated in Equation 5, with equality corresponding to the decision boundary. Finally, again using the definition of $ω_{A}$ and $ω_{B}$ given in Equation 8, we note that our assumption that $1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r} > 0$ will always hold in the case of equal minor allele frequencies and $r = 1$ , as

\begin{array}{ll} 1 - ω_{A}^{2} - \frac{ω_{B}^{2}}{r} = 1 - \frac{n^{2}}{(n + m)^{2}} - \frac{m^{2}}{(n + m)^{2}} \\ = \frac{(n + m)^{2} - n^{2} - m^{2}}{(n + m)^{2}} \\ = \frac{2 n m}{(n + m)^{2}}, \end{array}

which is strictly positive.
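As a sanity check on the algebra above, the linear rule can be compared against a direct comparison of mean squared errors. The sketch below assumes $ω_{A} + ω_{B} = 1$ (equal heterozygosities); the function names and the numerical values are illustrative choices, not part of the original analysis.

```python
import math


def additive_mse(beta_a, beta_b, v_a, v_b, w_a, w_b):
    """MSE of the additive estimator as an estimate of beta_A: variance plus squared bias."""
    variance = w_a**2 * v_a + w_b**2 * v_b
    bias = (w_a - 1.0) * beta_a + w_b * beta_b
    return variance + bias**2


def boundary_slope(w_a, r):
    """Slope of the boundary sqrt(V_A) = slope * |beta_A - beta_B| when V_A / V_B = r.

    Returns None when 1 - w_a^2 - (1 - w_a)^2 / r <= 0, in which case the
    GxE estimator is always preferred and no boundary exists.
    """
    denom = 1.0 - w_a**2 - (1.0 - w_a) ** 2 / r
    if denom <= 0:
        return None
    return (1.0 - w_a) / math.sqrt(denom)


def prefer_gxe(beta_a, beta_b, v_a, w_a, r):
    """Linear decision rule: True if the GxE estimator has the lower MSE."""
    slope = boundary_slope(w_a, r)
    if slope is None:
        return True
    return math.sqrt(v_a) < slope * abs(beta_a - beta_b)


# Illustrative values (assumptions): n = 3000, m = 2000, equal heterozygosities.
n, m = 3000, 2000
w_a = n / (n + m)
slope_r1 = boundary_slope(w_a, r=1.0)  # should equal sqrt(m / (2 n))

# Direct MSE comparison for one illustrative variant with r = 2.
beta_a, beta_b, v_a, r = 0.10, 0.14, 1e-4, 2.0
direct = v_a < additive_mse(beta_a, beta_b, v_a, v_a / r, w_a, 1.0 - w_a)
linear = prefer_gxe(beta_a, beta_b, v_a, w_a, r)
```

For any fixed $r$, `boundary_slope` depends only on $ω_{A}$, which is what makes the rule linear in $| β_{A} - β_{B} |$.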

Re-parameterized decision rule in terms of unitless quantities

In Equation 6, under the assumption that $H_{A} = H_{B}$ , we re-state the decision rule in terms of the signal-to-noise ratio. Here, we prove this result. From Equation 4, we have that we should select the GxE model to estimate $β_{A}$ if and only if

\begin{array}{ll} V_{A} < ω_{A}^{2} V_{A} + ω_{B}^{2} V_{B} + ((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2} \\ ⟺ 1 < ω_{A}^{2} + ω_{B}^{2} \frac{V_{B}}{V_{A}} + \frac{((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2}}{V_{A}} \\ ⟺ 1 - ω_{A}^{2} < ω_{B}^{2} \frac{V_{B}}{V_{A}} + \frac{((ω_{A} - 1) β_{A} + ω_{B} β_{B})^{2}}{V_{A}} . \end{array}

Now, because $H_{A} = H_{B}$ , we know by Equation 8 that $ω_{A} + ω_{B} = 1$ . Then, we may write the decision rule as

\begin{array}{ll} 1 - ω_{A}^{2} < ω_{B}^{2} \frac{V_{B}}{V_{A}} + \frac{((1 - ω_{B} - 1) β_{A} + ω_{B} β_{B})^{2}}{V_{A}} \\ ⟺ 1 - ω_{A}^{2} < ω_{B}^{2} \frac{V_{B}}{V_{A}} + \frac{(ω_{B} (β_{B} - β_{A}))^{2}}{V_{A}} \\ ⟺ 1 - ω_{A}^{2} < ω_{B}^{2} \frac{V_{B}}{V_{A}} + ω_{B}^{2} \frac{(β_{A} - β_{B})^{2}}{V_{A}} \\ ⟺ \frac{1 - ω_{A}^{2}}{ω_{B}^{2}} < \frac{V_{B}}{V_{A}} + \frac{(β_{A} - β_{B})^{2}}{V_{A}} \\ ⟺ \frac{1 - ω_{A}^{2}}{(1 - ω_{A})^{2}} < \frac{V_{B}}{V_{A}} + \frac{(β_{A} - β_{B})^{2}}{V_{A}} \\ ⟺ \frac{1 - ω_{A}^{2}}{(1 - ω_{A})^{2}} - \frac{V_{B}}{V_{A}} < \frac{(β_{A} - β_{B})^{2}}{V_{A}} \\ ⟺ \frac{(1 - ω_{A}) (1 + ω_{A})}{(1 - ω_{A})^{2}} - \frac{V_{B}}{V_{A}} < \frac{(β_{A} - β_{B})^{2}}{V_{A}} \\ ⟺ \frac{1 + ω_{A}}{1 - ω_{A}} - \frac{V_{B}}{V_{A}} < \frac{(β_{A} - β_{B})^{2}}{V_{A}} \end{array}

as is stated in Equation 6.
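The unitless form of the rule is straightforward to evaluate. The following sketch, with hypothetical function names, computes the threshold on $(β_{A} - β_{B})^{2} / V_{A}$ above which the GxE estimator is preferred, again assuming $ω_{A} + ω_{B} = 1$.

```python
def snr_threshold(w_a, vb_over_va):
    """Right-hand threshold of the unitless rule: (1 + w_A) / (1 - w_A) - V_B / V_A."""
    return (1.0 + w_a) / (1.0 - w_a) - vb_over_va


def prefer_gxe_unitless(delta_sq_over_va, w_a, vb_over_va):
    """True if (beta_A - beta_B)^2 / V_A exceeds the threshold, i.e. GxE is preferred."""
    return snr_threshold(w_a, vb_over_va) < delta_sq_over_va


# Illustrative case (assumption): equal sample sizes and equal estimation noise,
# so w_A = 1/2 and V_B / V_A = 1.
threshold_equal = snr_threshold(0.5, 1.0)
```

In this equal-sample-size, equal-noise case the threshold is 2, so the GxE estimator is preferred only when $| β_{A} - β_{B} | > \sqrt{2 V_{A}}$, which agrees with the linear rule of the previous section for $n = m$.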

Simulation of GxDiet effects on longevity in Drosophila

In Figure 4, we compare the effect estimates of Pallares et al. to those obtained in simulations of pervasive amplification. Here, we detail the simulation approach. We first generated true effects under each diet. For variants $j = 1, \dots, 50{,}000$ , we sampled a true effect under the high-sugar diet ( $β_{h_{j}}$ ) and under the control diet ( $β_{c_{j}}$ ). A random 60% of variants were set to have no effect under either diet, with the effects of the remaining 40% of variants sampled as

[\begin{matrix} β_{c_{j}} \\ β_{h_{j}} \end{matrix}] \sim N ([\begin{matrix} - 0.125 \\ - 0.15 \end{matrix}], 0.01 \cdot [\begin{matrix} 1 & 1.4 \\ 1.4 & 1.96 \end{matrix}]) .

This corresponds to a systematic amplification of $1.4 \times$ in the high-sugar compared to the control diet. We selected these parameters based on inspection of the resulting distribution of effects and their correspondence to the Pallares et al. data. We then simulated the effect estimation. For each variant, the effect estimate was simulated as normally distributed with mean equal to the true effect and standard deviation equal to a randomly sampled (with replacement) standard error from the effect estimates of Pallares et al. That is, given the simulated values of the true effects $β_{c_{j}}$ and $β_{h_{j}}$ , we simulated effect estimates as

[\begin{matrix} {\hat{β}}_{c_{j}} \\ {\hat{β}}_{h_{j}} \end{matrix}] \sim N ([\begin{matrix} β_{c_{j}} \\ β_{h_{j}} \end{matrix}], [\begin{matrix} {\hat{s}}_{c_{k}}^{2} & 0 \\ 0 & {\hat{s}}_{h_{k}}^{2} \end{matrix}]),

where $k$ represents the index of a randomly selected variant from the empirical data of Pallares et al. and ${\hat{s}}_{c_{k}}$ and ${\hat{s}}_{h_{k}}$ are the corresponding estimated standard errors for the effect estimates in the control and high-sugar groups, respectively. This process yielded vectors of estimated effects in the high-sugar group and control group, ${\hat{β}}_{h}$ and ${\hat{β}}_{c}$ , respectively, and vectors of estimated standard errors in the high-sugar group and control group, ${\hat{s}}_{h}$ and ${\hat{s}}_{c}$ , respectively. We then performed a Z-test for each variant under each diet, yielding two vectors of p-values, $p_{h}$ and $p_{c}$ , corresponding to the high-sugar and control diets, respectively. Using these p-values, we followed a similar approach to Pallares et al. to classify the variants (Figure 4D). First, as in Pallares et al., we computed q-values separately for each diet (Storey, 2003), yielding $q_{h}$ and $q_{c}$ , corresponding to the q-values of non-zero effects in the high-sugar and control diets, respectively. Then, we employed the following classification scheme for each variant $j = 1, \dots, 50{,}000$ :

If $q_{h_{j}} \geq 0.01$ and $q_{c_{j}} \geq 0.01$ $\to$ classify as no effect.
If $q_{h_{j}} < 0.01$ and $p_{c_{j}} \geq 0.1$ $\to$ classify as high-sugar-specific effect.
If $q_{c_{j}} < 0.01$ and $p_{h_{j}} \geq 0.1$ $\to$ classify as control-specific effect.
If $q_{c_{j}} < 0.01$ and $q_{h_{j}} < 0.01$ $\to$ classify as shared effect.

We note that p-value and q-value cutoffs used are nominally different than those used in the Pallares et al. study.
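The simulation and classification steps described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Figure 4: the standard-error pools are placeholders (the empirical standard errors of Pallares et al. are not reproduced here), and Benjamini-Hochberg adjusted p-values stand in for Storey q-values. Variants that satisfy none of the four rules are labeled "unclassified", a label introduced here for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n_var = 50_000

# True effects: 60% null; the remaining 40% perfectly correlated across diets with
# standard deviations 0.10 (control) and 0.14 (high-sugar). Drawing a single
# standard normal per variant reproduces the rank-one covariance in the text.
nonnull = rng.permutation(n_var) < int(0.4 * n_var)
z = rng.standard_normal(n_var)
beta_c = np.where(nonnull, -0.125 + 0.10 * z, 0.0)
beta_h = np.where(nonnull, -0.150 + 0.14 * z, 0.0)

# ASSUMPTION: placeholder pools standing in for the empirical standard errors.
se_pool_c = rng.uniform(0.02, 0.06, size=10_000)
se_pool_h = rng.uniform(0.02, 0.06, size=10_000)
k = rng.integers(0, se_pool_c.size, size=n_var)  # resample with replacement
s_c, s_h = se_pool_c[k], se_pool_h[k]

beta_hat_c = rng.normal(beta_c, s_c)
beta_hat_h = rng.normal(beta_h, s_h)


def two_sided_p(beta_hat, se):
    """Two-sided Z-test p-value: erfc(|z| / sqrt(2))."""
    zstat = np.abs(beta_hat / se)
    return np.vectorize(math.erfc)(zstat / math.sqrt(2.0))


def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (ASSUMPTION: stand-in for Storey q-values)."""
    size = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order] * size / np.arange(1, size + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(size)
    q[order] = np.minimum(ranked, 1.0)
    return q


def classify(q_h, q_c, p_h, p_c, q_cut=0.01, p_cut=0.1):
    """Apply the four classification rules listed above."""
    labels = np.full(q_h.shape, "unclassified", dtype="U24")
    labels[(q_h >= q_cut) & (q_c >= q_cut)] = "no effect"
    labels[(q_h < q_cut) & (p_c >= p_cut)] = "high-sugar-specific"
    labels[(q_c < q_cut) & (p_h >= p_cut)] = "control-specific"
    labels[(q_c < q_cut) & (q_h < q_cut)] = "shared"
    return labels


p_c, p_h = two_sided_p(beta_hat_c, s_c), two_sided_p(beta_hat_h, s_h)
labels = classify(bh_qvalues(p_h), bh_qvalues(p_c), p_h, p_c)
names, counts = np.unique(labels, return_counts=True)
```

Because every non-null variant in this generative model has a nonzero effect under both diets, any variant labeled as diet-specific in the output reflects limited power rather than a truly context-specific effect.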

Polygenic score simulations

In Figure 5, we show the results of multiple simulations where we compute polygenic scores in each of two contexts under amplification. Here, we detail the generation of data in the simulations and the methods for constructing polygenic scores. As in Results and discussion, we assumed that we have $n + m$ observations of a continuous trait, where the first $n$ individuals are observed in context $A$ and the final $m$ are observed in context $B$ . For convenience, in this case we assumed $n = m$ . Now, for variants $j = 1, \dots, p$ we generated true effects in contexts $A$ and $B$ independently from the mixture model

[\begin{matrix} β_{A_{j}} \\ β_{B_{j}} \end{matrix}] \sim π_{0} δ_{0} + (1 - π_{0}) (α N ([\begin{matrix} 0 \\ 0 \end{matrix}], [\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix}]) + (1 - α) N ([\begin{matrix} 0 \\ 0 \end{matrix}], [\begin{matrix} \frac{3}{2} & 1 \\ 1 & \frac{2}{3} \end{matrix}])),

where $π_{0}$ (which we set to $0.5$ ) represents the proportion of SNPs with null effects in both contexts, $α$ represents the proportion of non-null SNPs which have exactly equal effects in both contexts, and $1 - α$ is the proportion of non-null SNPs which are generated as perfectly correlated but with $1.5 \times$ the standard deviation in context A. Let ${\vec{β}}_{A}$ and ${\vec{β}}_{B}$ represent the resulting p-vectors of true effects for contexts $A$ and $B$ , respectively. Next, we generated genotype counts for each of the $n + m$ individuals at all $p$ variants. Specifically, we independently generated genotypes as

\begin{aligned} f_{j} & \sim \frac{1}{2} \mathrm{Beta} (s_{1}, s_{2}) \quad \text{for } j = 1, \dots, p \end{aligned}

\begin{aligned} g_{i j} & \sim \mathrm{Binomial} (2, f_{j}) \quad \text{for } i = 1, \dots, n + m, \end{aligned}

where $f_{j}$ is the minor allele frequency at variant $j$ in the population, $s_{1}$ and $s_{2}$ are parameters controlling the distribution of minor allele frequencies in the population, and $g_{i j}$ is the observed genotype for individual $i$ at variant $j$ . Here, we set $s_{1} = 1$ and $s_{2} = 5$ . Let $G_{A}$ and $G_{B}$ represent the generated $n \times p$ matrices of genotypes in contexts A and B, respectively. Finally, we generated the observed continuous traits for context A ( ${\vec{y}}_{A}$ ) and context B ( ${\vec{y}}_{B}$ ) as

\begin{array}{ll} {\vec{y}}_{A} \sim N (G_{A} {\vec{β}}_{A}, σ_{A}^{2} I_{n}) \\ {\vec{y}}_{B} \sim N (G_{B} {\vec{β}}_{B}, σ_{B}^{2} I_{m}), \end{array}

where $σ_{A}^{2}$ and $σ_{B}^{2}$ are the observation variances in contexts A and B, respectively, and $I_{w}$ is the $w \times w$ identity matrix. In our simulations, we set $σ_{A}^{2}$ and $σ_{B}^{2}$ such that the narrow sense heritability is 40% in each context. So that we may later test the accuracy of our polygenic scores, we generated both a training set (consisting of $n$ individuals in each context, where $n = 1000$ in the low power simulation and $n = 50{,}000$ in the high power simulation) for effect estimation and a test set (consisting of 3000 individuals in each context) using the above distributions. Figure 5 compares four distinct approaches for constructing polygenic scores, derived from three allelic effect estimation approaches: additive estimation with shrinkage, GxE estimation with shrinkage, and mash. First, the additive and GxE estimates are derived independently for each variant as described in Results and discussion. Let ${\hat{β}}_{A}$ and ${\hat{β}}_{B}$ be the p-vectors of GxE estimates of effects in context $A$ and $B$ , respectively. Similarly, let ${\hat{s}}_{A}$ and ${\hat{s}}_{B}$ be the p-vectors of the standard errors of GxE estimates of effects in context A and B, respectively. Finally, let ${\hat{β}}_{A \cup B}$ be the p-vector of estimated effects from the additive model and ${\hat{s}}_{A \cup B}$ be the p-vector of standard errors of estimated effects from the additive model. Using the GxE estimates, we also constructed estimates of the effects in each context using mash. Specifically, we ran mash on the $p \times 2$ matrices $[\begin{matrix} {\hat{β}}_{A} & {\hat{β}}_{B} \end{matrix}]$ (of effects) and $[\begin{matrix} {\hat{s}}_{A} & {\hat{s}}_{B} \end{matrix}]$ (of standard errors). mash then yields $p ({\vec{β}}_{A} | {\hat{β}}_{A}, {\hat{s}}_{A})$ and $p ({\vec{β}}_{B} | {\hat{β}}_{B}, {\hat{s}}_{B})$ , the posterior distributions of the effects in contexts $A$ and $B$ , respectively.
To construct each polygenic score, we made two choices. The first choice was between the three sets of p-values (or pseudo p-values, see below) for thresholding; we included the 833 most significant variants (corresponding to one-third of the causal variants) in the polygenic score. The second choice was between the three sets of effect estimates to be used as weights in the polygenic score (Figure 5). For instance, when the GxE model was used for ascertainment, we selected the set of variants $Ω_{A} \subset \{1, \dots, p\}$ consisting of the variants with the 833 smallest p-values (derived from ${\hat{β}}_{A}$ and ${\hat{s}}_{A}$ ) and $Ω_{B} \subset \{1, \dots, p\}$ consisting of the variants with the 833 smallest p-values (derived from ${\hat{β}}_{B}$ and ${\hat{s}}_{B}$ ). Then, we predicted trait values (out of sample) by multiplying the effect estimates of our chosen ‘estimation method’ (for mash we use the posterior mean) by the effect allele count at each of the selected variants for the individual in question.
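The data-generating process and the ascertainment and estimation choices described above can be sketched as follows. This is a scaled-down illustration under stated assumptions: the sample sizes, number of variants, and value of $α$ are hypothetical, effect estimates are plain marginal OLS without shrinkage, and mash is omitted. Only prediction in context $A$ is shown.

```python
import numpy as np

rng = np.random.default_rng(2)
n_var, n, n_test, k = 600, 2000, 3000, 100  # hypothetical, scaled-down sizes
pi0, alpha, h2 = 0.5, 0.5, 0.4              # alpha = 0.5 is an illustrative choice

# True effects: null, equal across contexts, or perfectly correlated with
# variances 3/2 (context A) and 2/3 (context B).
z = rng.standard_normal(n_var)
comp = rng.choice(3, size=n_var, p=[pi0, (1 - pi0) * alpha, (1 - pi0) * (1 - alpha)])
beta_a = np.select([comp == 1, comp == 2], [z, np.sqrt(3 / 2) * z], default=0.0)
beta_b = np.select([comp == 1, comp == 2], [z, np.sqrt(2 / 3) * z], default=0.0)

f = 0.5 * rng.beta(1.0, 5.0, size=n_var)  # minor allele frequencies


def simulate(n_ind, beta):
    """Genotypes and a mean-centered trait with heritability h2."""
    G = rng.binomial(2, f, size=(n_ind, n_var)).astype(float)
    genetic_var = np.sum(2 * f * (1 - f) * beta**2)
    noise_sd = np.sqrt(genetic_var * (1 - h2) / h2)
    y = G @ beta + rng.normal(0.0, noise_sd, size=n_ind)
    return G, y - y.mean()


def marginal_ols(G, y):
    """Per-variant OLS slopes and standard errors; monomorphic variants get se = inf."""
    Gc, yc = G - G.mean(axis=0), y - y.mean()
    ss = (Gc**2).sum(axis=0)
    ok = ss > 0
    beta = np.zeros(G.shape[1])
    beta[ok] = (Gc[:, ok].T @ yc) / ss[ok]
    rss = yc @ yc - beta**2 * ss  # residual sum of squares of each simple regression
    se = np.full(G.shape[1], np.inf)
    se[ok] = np.sqrt(rss[ok] / (G.shape[0] - 2) / ss[ok])
    return beta, se


G_a, y_a = simulate(n, beta_a)
G_b, y_b = simulate(n, beta_b)
G_test, y_test = simulate(n_test, beta_a)  # held-out individuals in context A

est_gxe, se_gxe = marginal_ols(G_a, y_a)
est_add, se_add = marginal_ols(np.vstack([G_a, G_b]), np.concatenate([y_a, y_b]))


def top_k(beta_hat, se, size):
    """Indices of the `size` most significant variants (largest |z|)."""
    return np.argsort(-np.abs(beta_hat / se))[:size]


def pgs_correlation(selected, weights):
    """Out-of-sample Pearson correlation between polygenic score and trait."""
    score = G_test[:, selected] @ weights[selected]
    return float(np.corrcoef(score, y_test)[0, 1])


r_additive = pgs_correlation(top_k(est_add, se_add, k), est_add)
r_gxe = pgs_correlation(top_k(est_gxe, se_gxe, k), est_gxe)
r_hybrid = pgs_correlation(top_k(est_add, se_add, k), est_gxe)  # additive ascertainment, GxE weights
```

Separating `top_k` from `pgs_correlation` mirrors the two choices made in the text: which p-values are used for thresholding, and which estimates are used as weights.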

Appendix 1

1 Validating and applying the decision rule to real data

To validate the decision rule of Equations 4–6 of the main text, we considered the example of sex as a context for physiological traits using UK Biobank data (Bycroft et al., 2018). For each of 27 physiological quantitative traits, we first split the sample by sex chromosome karyotypes into XX individuals (females) and XY individuals (males). For each sex, we randomly split the sample into a training set with 80% of the individuals and a test set with 20% of the individuals. We estimated within-sex GWAS in the training and test sets separately. We denote by ${\hat{β}}_{Z_{i}}^{(t)}$ the marginal effect estimate of the $i$th SNP in sex $Z$ in set $t$ , and by ${\hat{s}}_{Z_{i}}^{(t)}$ the corresponding standard error. We denote by ${\hat{β}}_{{M \cup F}_{i}}^{(\mathrm{train})}$ the estimated effect of the $i$th SNP in the union of training sets (with both sexes included). In our validation procedure, we treated ${\hat{β}}_{M_{i}}^{(\mathrm{test})}$ as the ‘ground-truth’, i.e., as the true effect. While ${\hat{β}}_{M_{i}}^{(\mathrm{test})}$ is of course a very noisy estimate of $β_{M_{i}}$ , it is an unbiased estimate, so we expect that estimates that are closer to ${\hat{β}}_{M_{i}}^{(\mathrm{test})}$ in the aggregate will also be closer to $β_{M_{i}}$ . Using this assumption, we define two empirical quantities,

\begin{aligned} \mathrm{error} ({\hat{β}}_{M_{i}}^{(\mathrm{train})}) & = ({\hat{β}}_{M_{i}}^{(\mathrm{train})} - {\hat{β}}_{M_{i}}^{(\mathrm{test})})^{2} \end{aligned}

\begin{aligned} \mathrm{error} ({\hat{β}}_{{M \cup F}_{i}}^{(\mathrm{train})}) & = ({\hat{β}}_{{M \cup F}_{i}}^{(\mathrm{train})} - {\hat{β}}_{M_{i}}^{(\mathrm{test})})^{2} . \end{aligned}

Given $\sqrt{V_{M_{i}}}$ and $| β_{M_{i}} - β_{F_{i}} |$ , Equation 4 allows us to calculate the expected difference between $\mathrm{error} ({\hat{β}}_{M_{i}}^{(\mathrm{train})})$ and $\mathrm{error} ({\hat{β}}_{{M \cup F}_{i}}^{(\mathrm{train})})$ . However, we do not observe $\sqrt{V_{M_{i}}}$ or $| β_{M_{i}} - β_{F_{i}} |$ . Instead, we estimated these quantities and examined the agreement of the estimates in the two sets. We estimated $\sqrt{V_{M_{i}}}$ as ${\hat{s}}_{M_{i}}^{(\mathrm{train})}$ . Estimating $| β_{M_{i}} - β_{F_{i}} |$ is more difficult, as the intuitive $| {\hat{β}}_{M_{i}}^{(\mathrm{train})} - {\hat{β}}_{F_{i}}^{(\mathrm{train})} |$ is an upwardly biased estimator. Instead, we used an empirical shrinkage estimator (ash; Stephens, 2017) to estimate $| β_{M_{i}} - β_{F_{i}} |$ . In essence, ash takes in a vector of effect estimates $\hat{β}$ and a vector of corresponding standard errors $\hat{s}$ , estimates a prior distribution of the true effects, and then uses this prior to obtain a posterior distribution of true effects. Specifically, for a large random sample of sites, $i = 1, \dots, p$ , we ran ash on the vectors

\begin{array}{l} \hat{β} = ({\hat{β}}_{M_{1}}^{(\mathrm{train})} - {\hat{β}}_{F_{1}}^{(\mathrm{train})}, \dots, {\hat{β}}_{M_{p}}^{(\mathrm{train})} - {\hat{β}}_{F_{p}}^{(\mathrm{train})}) \\ \hat{s} = (\sqrt{{({\hat{s}}_{M_{1}}^{(\mathrm{train})})}^{2} + {({\hat{s}}_{F_{1}}^{(\mathrm{train})})}^{2}}, \dots, \sqrt{{({\hat{s}}_{M_{p}}^{(\mathrm{train})})}^{2} + {({\hat{s}}_{F_{p}}^{(\mathrm{train})})}^{2}}) \end{array}

and took the absolute value of ash’s output posterior mean estimates of $β_{M_{i}} - β_{F_{i}}$ as our estimate of $| β_{M_{i}} - β_{F_{i}} |$ for each site. With our estimates of $| β_{M_{i}} - β_{F_{i}} |$ and $\sqrt{V_{M_{i}}}$ in hand, we calculated the expected difference in squared errors using Equation 4 (x-axis in Figure 3A) and compared it with the actual difference between $\mathrm{error} ({\hat{β}}_{M_{i}}^{(\mathrm{train})})$ and $\mathrm{error} ({\hat{β}}_{{M \cup F}_{i}}^{(\mathrm{train})})$ (Figure 1). For Figure 3A and B, we used the same procedure as above to estimate the x and y axes, except we did not employ any data splitting.

2 Equivalence of regression coefficient estimates for explicit interaction term and stratified model

In the main text, we discuss the estimation of context-dependent effects through the estimation of two effects in subsamples stratified by context. The generative model defined in Equation 1 is

y_{i} \sim \begin{cases} N (α_{A} + β_{A} g_{i}, σ_{A}^{2}) & \text{if } i \in \{1, \dots, n\} \\ N (α_{B} + β_{B} g_{i}, σ_{B}^{2}) & \text{if } i \in \{n + 1, \dots, n + m\}, \end{cases}

where $β_{A}$ and $β_{B}$ are fixed, context-specific effects of a reference allele at a biallelic, autosomal variant, and $g_{i} \in \{0, 1, 2\}$ is the observed reference allele count for individual $i$ . $α_{A}$ and $α_{B}$ are the context-specific intercepts, corresponding to the mean trait value for individuals with zero reference alleles in context $A$ and $B$ , respectively. $σ_{A}^{2}$ and $σ_{B}^{2}$ are context-specific observation variances. Another common way to define a generative GxE model is with a linear genotype-by-context interaction term, namely,

y_{i} \sim N (α + β_{G} g_{i} + β_{E} e_{i} + β_{G x E} g_{i} e_{i}, σ^{2}) independently for i = 1, \dots, n + m,

where $α$ is the mean trait value for individuals with zero reference alleles in context $A$ , $β_{G}$ is the genetic effect, $β_{E}$ is the contextual effect, $β_{G x E}$ is the genotype-by-context interaction effect, $e_{i}$ is the contextual covariate for individual $i$ , defined to be 0 if the individual is in context $A$ and 1 if the individual is in context $B$ , and $σ^{2}$ is the homoskedastic noise variance. Here, we show that least squares estimation of the two models above yields the same context-specific effect estimates. The OLS estimates of the intercepts and context-specific allelic effects under the model described in Equation S1 are

\begin{array}{l} {\hat{α}}_{A}, {\hat{β}}_{A} = \operatorname{arg\,min}_{α_{A}, β_{A}} \sum_{i = 1}^{n} (y_{i} - (α_{A} + β_{A} g_{i}))^{2} \end{array}

\begin{array}{l} {\hat{α}}_{B}, {\hat{β}}_{B} = \operatorname{arg\,min}_{α_{B}, β_{B}} \sum_{i = n + 1}^{n + m} (y_{i} - (α_{B} + β_{B} g_{i}))^{2} . \end{array}

The OLS estimates under the model described in Equation S2 are

\hat{α}, {\hat{β}}_{G}, {\hat{β}}_{E}, {\hat{β}}_{G x E} = \operatorname{arg\,min}_{α, β_{G}, β_{E}, β_{G x E}} \sum_{i = 1}^{n + m} (y_{i} - (α + β_{G} g_{i} + β_{E} e_{i} + β_{G x E} g_{i} e_{i}))^{2} .

Since $e_{i} = 1$ if individual $i$ is in context $B$ and 0 otherwise, we can re-write Equation S5 as

\begin{array}{ll} \hat{α}, {\hat{β}}_{G}, {\hat{β}}_{E}, {\hat{β}}_{G x E} = \operatorname{arg\,min}_{α, β_{G}, β_{E}, β_{G x E}} (\sum_{i = 1}^{n} (y_{i} - (α + β_{G} g_{i}))^{2} + \sum_{i = n + 1}^{n + m} (y_{i} - (α + β_{G} g_{i} + β_{E} + β_{G x E} g_{i}))^{2}) \\ = \operatorname{arg\,min}_{α, β_{G}, β_{E}, β_{G x E}} (\sum_{i = 1}^{n} (y_{i} - (α + β_{G} g_{i}))^{2} + \sum_{i = n + 1}^{n + m} (y_{i} - ((α + β_{E}) + (β_{G} + β_{G x E}) g_{i}))^{2}) . \end{array}

By Equation S3 we have that $\sum_{i = 1}^{n} (y_{i} - (α + β_{G} g_{i}))^{2}$ is minimized by setting $α = {\hat{α}}_{A}$ and $β_{G} = {\hat{β}}_{A}$ . In turn, by Equation S4 we have that $\sum_{i = n + 1}^{n + m} (y_{i} - ((α + β_{E}) + (β_{G} + β_{G x E}) g_{i}))^{2}$ is minimized by setting $α + β_{E} = {\hat{α}}_{B}$ and setting $β_{G} + β_{G x E} = {\hat{β}}_{B}$ . Thus, both terms are simultaneously minimized by setting

\begin{aligned} α & = {\hat{α}}_{A} \end{aligned}

\begin{aligned} β_{G} & = {\hat{β}}_{A} \end{aligned}

\begin{aligned} β_{E} & = {\hat{α}}_{B} - {\hat{α}}_{A} \end{aligned}

\begin{aligned} β_{G x E} & = {\hat{β}}_{B} - {\hat{β}}_{A} . \end{aligned}

Then, under the explicit interaction model, for an individual in context $A$ the estimated intercept is ${\hat{α}}_{A}$ and the estimated genetic effect is ${\hat{β}}_{A}$ . And, for an individual in context $B$ , the estimated intercept is ${\hat{α}}_{A} + {\hat{α}}_{B} - {\hat{α}}_{A} = {\hat{α}}_{B}$ and the estimated genetic effect is ${\hat{β}}_{A} + {\hat{β}}_{B} - {\hat{β}}_{A} = {\hat{β}}_{B}$ . Thus, conditional on the context, both the explicit interaction model and the stratified model provide the same effect and intercept estimates. We note that there is still an important difference between the stratified and explicit interaction models, as the stratified model allows for heteroskedasticity between contexts where the explicit interaction model does not. We also note that the proof of equivalence relies on the context variable being binary.
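The equivalence shown above is easy to confirm numerically. The following minimal sketch, using simulated data with arbitrary illustrative parameters, fits the stratified models and the explicit interaction model by least squares and recovers the four relations between their coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 300, 500  # illustrative sample sizes in contexts A and B (assumption)

g = rng.binomial(2, 0.3, size=n + m).astype(float)  # reference allele counts
e = np.concatenate([np.zeros(n), np.ones(m)])       # 0 = context A, 1 = context B
# Illustrative intercepts and effects; noise is homoskedastic here.
y = np.where(e == 0, 1.0 + 0.5 * g, 2.0 + 0.8 * g) + rng.normal(size=n + m)


def fit(X, y):
    """Least squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(n + m)

# Stratified model: one regression per context.
alpha_a, beta_a = fit(np.column_stack([ones[:n], g[:n]]), y[:n])
alpha_b, beta_b = fit(np.column_stack([ones[n:], g[n:]]), y[n:])

# Explicit interaction model fitted to the pooled sample.
alpha, beta_g, beta_e, beta_gxe = fit(np.column_stack([ones, g, e, g * e]), y)
```

The point estimates agree up to numerical precision; the two models differ only in how residual variance is treated, as noted above.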

3 Continuous context variable

Here, we extend our analysis of single-site estimation in a binary context to the case of a continuous context variable. Specifically, for individuals indexed by $i = 1, \dots, n$ , we consider the generative model

y_{i} \sim N (β_{0} + β_{G} g_{i} + β_{E} e_{i} + β_{G \times E} g_{i} e_{i}, σ^{2}),

where $y_{i} \in \mathbb{R}$ is the continuous trait value for individual $i$ , $g_{i} \in \{0, 1, 2\}$ is the number of reference alleles carried by a diploid individual $i$ at a focal biallelic variant of interest, $e_{i} \in \mathbb{R}$ is the continuous context for individual $i$ , $β_{0}$ is the mean trait value for individuals with the context variable value of zero and zero reference alleles, $β_{G}$ is the main (additive) allelic effect, $β_{E}$ is the effect of the context, $β_{G \times E}$ is the main interaction effect, and $σ^{2}$ is the residual variance. For simplicity, we will assume $β_{0} = 0$ . For a binary context, we considered the difference in MSE of the ‘additive model’ and ‘GxE model’ in estimating the context-specific genetic effect. Analogously, for a continuous context we will consider the difference in MSE in estimating the total genetic effect conditional on a particular value of the context variable. To perform this analysis, we will consider the estimation of two models. The first model, which we will call the ‘GxE’ model, is described in Equation S6 above (where we omit the intercept because we assume it is 0). For convenience, we will use the following matrix notation:

y = [\begin{matrix} y_{1} \\ ⋮ \\ y_{n} \end{matrix}], X = [\begin{matrix} g_{1} & e_{1} & g_{1} e_{1} \\ ⋮ & ⋮ & ⋮ \\ g_{n} & e_{n} & g_{n} e_{n} \end{matrix}], β = [\begin{matrix} β_{G} \\ β_{E} \\ β_{G \times E} \end{matrix}] .

Then, we consider the standard least squares estimate of $\hat{β} = (X^{T} X)^{- 1} X^{T} y$ , for which

\hat{β} \sim N (β, σ^{2} (X^{T} X)^{- 1}) .

We will assume that the vector of genotypes $\vec{g}$ and the vector of context covariates $\vec{e}$ are orthogonal in our sample and are both mean centered. (In the following section, we discuss the case of gene-context correlation, where these assumptions are not met.) Then, the matrix $X^{T} X$ will be approximately diagonal. Under this approximation, we can write Equation S7 as

\hat{β} \sim N (β, Σ),

where

Σ \approx σ^{2} diag (\frac{1}{\sum_{i = 1}^{n} g_{i}^{2}}, \frac{1}{\sum_{i = 1}^{n} e_{i}^{2}}, \frac{1}{\sum_{i = 1}^{n} g_{i}^{2} e_{i}^{2}}) .

Now, we are interested in the error of the GxE model in estimating the genetic effect conditional on a particular value of the context $e$. By Equation S6, the true genetic effect conditional on a particular value of the context $e$ is $β_{G} + β_{G \times E} e$. Thus, we evaluate the error of the GxE model using the conditional expected squared error

E [(({\hat{β}}_{G} + {\hat{β}}_{G \times E} e) - (β_{G} + β_{G \times E} e))^{2} | e] .

By the definition of conditional variance, we can write Equation S9 as

\begin{aligned} & \mathrm{Var} [(β_{G} + β_{G \times E} e) - ({\hat{β}}_{G} + {\hat{β}}_{G \times E} e) \mid e] + (E [(β_{G} + β_{G \times E} e) - ({\hat{β}}_{G} + {\hat{β}}_{G \times E} e) \mid e])^{2} \\ & = \mathrm{Var} [(β_{G} + β_{G \times E} e) - ({\hat{β}}_{G} + {\hat{β}}_{G \times E} e) \mid e] && \text{(since } {\hat{β}}_{G} \text{ and } {\hat{β}}_{G \times E} \text{ are unbiased estimates)} \\ & = \mathrm{Var} [{\hat{β}}_{G} + {\hat{β}}_{G \times E} e \mid e] && \text{(since } β_{G} \text{ and } β_{G \times E} \text{ are fixed parameters)} \\ & \approx \frac{σ^{2}}{\sum_{i = 1}^{n} g_{i}^{2}} + \frac{σ^{2}}{\sum_{i = 1}^{n} g_{i}^{2} e_{i}^{2}} e^{2} && \text{(by Equation S8)} . \end{aligned}

We will also consider the estimation error associated with the ‘additive model’, which in this case estimates the model described in Equation S6 assuming that $β_{G \times E} = 0$. In effect, this assumes that the genetic effect is equal across all values of the context. Because $X^{T} X$ is approximately diagonal, removing the $G \times E$ covariate and regression coefficient has a negligible effect on both the least squares estimate of $β_{G}$ and its sampling distribution. Thus, we can approximate the error of the additive model by calculating the conditional expectation

E [((β_{G} + β_{G \times E} e) - {\hat{β}}_{G})^{2} | e]

under the model in Equation S6. Again, by the definition of conditional variance, we can write Equation S10 as

\begin{aligned} & \mathrm{Var} [(β_{G} + β_{G \times E} e) - {\hat{β}}_{G} \mid e] + (E [(β_{G} + β_{G \times E} e) - {\hat{β}}_{G} \mid e])^{2} \\ & = \frac{σ^{2}}{\sum_{i = 1}^{n} g_{i}^{2}} + β_{G \times E}^{2} e^{2} . \end{aligned}

Thus, conditional on a particular value of $e$ , we prefer the $G \times E$ estimator for estimating the genetic effect if and only if

\begin{matrix} \frac{σ^{2}}{\sum_{i = 1}^{n} g_{i}^{2}} + \frac{σ^{2}}{\sum_{i = 1}^{n} g_{i}^{2} e_{i}^{2}} e^{2} < \frac{σ^{2}}{\sum_{i = 1}^{n} g_{i}^{2}} + β_{G \times E}^{2} e^{2} \end{matrix}

\begin{matrix} ⟺ & \frac{σ^{2}}{\sum_{i = 1}^{n} g_{i}^{2} e_{i}^{2}} e^{2} - β_{G \times E}^{2} e^{2} < 0 \end{matrix}

\begin{matrix} ⟺ & \frac{σ^{2}}{\sum_{i = 1}^{n} g_{i}^{2} e_{i}^{2}} < β_{G \times E}^{2} \end{matrix}

This decision rule has a very intuitive interpretation: the $G \times E$ model is preferable when the estimation variance of the interaction term is smaller than the squared interaction effect size. While the exact difference in MSE between the additive and $G \times E$ models depends on the value of the context variable (see Equation S11), the decision rule (Equation S13) does not.
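
The decision rule and the two approximate MSE expressions derived above can be written out directly. This is a minimal sketch under the same assumptions as the derivation (mean-centered, orthogonal genotype and context vectors). The function names and the toy vectors are hypothetical, and it takes $σ^{2}$ and $β_{G \times E}$ as known, whereas in practice both would have to be estimated.

```python
def interaction_variance(g, e, sigma2):
    """Approximate sampling variance of the interaction estimate: sigma^2 / sum(g_i^2 e_i^2)."""
    return sigma2 / sum(gi ** 2 * ei ** 2 for gi, ei in zip(g, e))

def prefer_gxe(g, e, sigma2, beta_gxe):
    """Decision rule: prefer the GxE model iff the interaction variance is below beta_GxE^2."""
    return interaction_variance(g, e, sigma2) < beta_gxe ** 2

def mse_gxe(g, e, sigma2, e0):
    """Approximate MSE of the GxE model for the genetic effect at context value e0."""
    return sigma2 / sum(gi ** 2 for gi in g) + interaction_variance(g, e, sigma2) * e0 ** 2

def mse_additive(g, sigma2, beta_gxe, e0):
    """Approximate MSE of the additive model at context value e0 (variance plus squared bias)."""
    return sigma2 / sum(gi ** 2 for gi in g) + beta_gxe ** 2 * e0 ** 2

# Toy mean-centered vectors chosen so that X^T X is exactly diagonal (illustrative only).
g = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
e = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
sigma2 = 1.0

for beta_gxe in (0.4, 0.6):
    print(beta_gxe,
          prefer_gxe(g, e, sigma2, beta_gxe),
          mse_gxe(g, e, sigma2, e0=2.0),
          mse_additive(g, sigma2, beta_gxe, e0=2.0))
```

In this toy example the interaction variance is 0.25, so the rule favors the additive model at an interaction effect of 0.4 (squared: 0.16) and the GxE model at 0.6 (squared: 0.36). The sign of the MSE difference is the same for any non-zero `e0`.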

4 Gene-context correlation

The simple decision rule given in Equation S13 was derived using the assumption that the observed genotype vector and the observed context vector were orthogonal. In statistical terms, one may expect this condition to approximately hold when sampling from a population where $r (G, E)$ , the correlation between genotypes and the context, is 0. In general, this assumption may not be reasonable. When this assumption is violated, the design matrix $X$ will no longer form a diagonal matrix when left-multiplied by its transpose. This results in both (a) larger standard errors for each of the marginal coefficients and (b) non-zero sampling covariance between the coefficients. In addition, $| r (G, E) | > 0$ may result in omitted variable bias in the additive model. As a result, the decision rule of Equation S13 will not hold exactly under conditions of $| r (G, E) | > 0$ , and in fact may be far from correct when $| r (G, E) |$ is large. In general, creating a closed-form decision rule to choose between the GxE and additive models in this scenario is not possible. However, in this section we use a simulation study to gain intuition about the change in bias-variance trade-off in the presence of gene-context correlation. In our simulations, for a group of $n$ individuals, we generate genotypes as

g_{1}, \dots, g_{n} \overset{\text{iid}}{\sim} \text{Binomial} (2, p) .

Then, for some constant $c$ , we generate context covariates as

e_{i} \sim N (c \cdot g_{i}, σ_{E}^{2})

independently for $i = 1, \dots, n$. Finally, we generate trait values $y_{1}, \dots, y_{n}$ using Equation S6 for particular values of the remaining parameters. To explore how the bias-variance trade-off changes for different values of $r (G, E)$, we ran a set of simulations with different values of $c$ in the range of $-3.5$ to $3.5$. For each of these simulations, we generated a dataset of $n = 250$ individuals with $σ_{E}^{2} = 1$, $p = 0.1$, $β_{0} = 0$, $β_{G} = 0.25$, $β_{E} = 0$, $β_{G \times E} = -0.125$, and $σ_{y}^{2} = 7$. The results of the simulation are shown in Appendix 1—figure 6.

Non-zero $r (G, E)$ had a number of consequences. First, as the magnitude of $r (G, E)$ increases, the magnitude of the bias of the genetic effect estimate of the additive model also increases (Appendix 1—figure 6A). In our simulation, we set the genetic effect and the interaction effect to have opposite signs. When $r (G, E)$ is positive, it pulls the genetic effect estimate toward the interaction effect. In contrast, when $r (G, E)$ is negative, it pulls the genetic effect estimate away from the interaction effect. Since the $G \times E$ model includes all causal variables, it is unbiased. Second, as the magnitude of $r (G, E)$ increases, the variance of the genetic effect estimates of both the additive and $G \times E$ models increases (Appendix 1—figure 6B). This is because when variables in a regression are (positively or negatively) correlated, it is more difficult to distinguish between their effects. The $G \times E$ model appears to be affected more strongly here, which likely results from the inclusion of an additional interaction term that is highly correlated with both the genotype covariate and the context covariate. Third, the interaction term in the $G \times E$ model remains unbiased under non-zero $r (G, E)$, and its variance increases with larger magnitude of $r (G, E)$ (Appendix 1—figure 6C and D).
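
The data-generating process above can be sketched as follows, together with the gene-context correlation implied by a given value of $c$. This is an illustrative sketch rather than the authors' simulation code: the function names are hypothetical, and it assumes that $σ_{y}^{2}$ plays the role of the residual variance in Equation S6. The expression for the implied correlation follows from $Var (g) = 2 p (1 - p)$ and $Cov (g, e) = c \cdot Var (g)$.

```python
import math
import random

def simulate_dataset(n, p, c, sigma_e2, beta_g, beta_e, beta_gxe, sigma_y2, rng):
    """Draw genotypes, a genotype-dependent context, and trait values (beta_0 = 0)."""
    # Binomial(2, p) genotype as the sum of two Bernoulli(p) draws.
    g = [(rng.random() < p) + (rng.random() < p) for _ in range(n)]
    # Context is normal with mean c * g, which induces gene-context correlation.
    e = [rng.gauss(c * gi, math.sqrt(sigma_e2)) for gi in g]
    # ASSUMPTION: sigma_y2 is treated as the residual variance of the trait model.
    y = [rng.gauss(beta_g * gi + beta_e * ei + beta_gxe * gi * ei, math.sqrt(sigma_y2))
         for gi, ei in zip(g, e)]
    return g, e, y

def pearson(x, y):
    """Sample Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def expected_r(c, p, sigma_e2):
    """Population r(G, E) implied by the generative model for a given c."""
    var_g = 2 * p * (1 - p)
    return c * math.sqrt(var_g) / math.sqrt(c ** 2 * var_g + sigma_e2)

rng = random.Random(0)
for c in (-3.5, -1.0, 0.0, 1.0, 3.5):
    g, e, y = simulate_dataset(250, 0.1, c, 1.0, 0.25, 0.0, -0.125, 7.0, rng)
    print(c, round(expected_r(c, 0.1, 1.0), 3), round(pearson(g, e), 3))
```

With $p = 0.1$ and $σ_{E}^{2} = 1$, the endpoints $c = \pm 3.5$ correspond to a population correlation of roughly $\pm 0.83$. With only 250 individuals the sample correlation fluctuates around this value.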

5 Classification-based inference in Pallares et al., 2022

In the main text, we discuss the characterization of GxE by Pallares et al., based on significance testing under each of the diets. The authors classified variants according to whether or not their associations with survivorship were significant under each diet as follows:

Significant under neither diet $\to$ classify as no effect.
Significant when fed the high-sugar diet, but not when fed the control diet $\to$ classify as high-sugar-specific effect.
Significant when fed the control diet, but not when fed the high-sugar diet $\to$ classify as control-specific effect.
Significant under both diets $\to$ classify as shared effect.

Approximately 31% were high-sugar specific, while the remaining 69% of the variants were shared. Fewer than 1% were labeled as having control-specific effects. The authors concluded that high-sugar-specific effects on longevity are pervasive, compatible with the hypothesis of widespread cryptic genetic variation for longevity. This ‘top hits’ approach places an emphasis on the context(s) in which trait associations are statistically significant, rather than on estimating how the context-specific allelic effects covary. In addition, this particular classification system also does not cover all possible ways in which context-specific effects may differ. A key example is the case where true effects are concordant in sign but differ in magnitude. The strong positive covariance of estimated effects observed genome-wide (Figure 4A) suggests this case merits consideration. Such variants may fall into each of the existing categories, depending on the magnitude of effects and statistical power in each of the contexts. Three of the four possible classifications are clearly wrong, but what about the ‘shared’ category? The class ‘shared’ may be interpreted as suggesting lack of context dependency (Figure 1C in Pallares et al., 2023). However, it will tend to include variants having strong effects under both diets, regardless of whether or not the diet-specific effects are similar. As Pallares et al. also note, there are marked differences in the magnitude of diet-specific estimated effects of variants in the ‘shared’ category. Among the approximately 1500 variants labeled as shared, the estimated effect under the high-sugar diet is on average about $1.3 \times$ that of the estimated effect in the control diet. Notably, the classification as diet-specific does not imply that a variant has an unusually large effect under this diet. 
On average, variants classified as shared actually have a slightly larger estimated effect under the high-sugar diet than variants classified as high-sugar specific (t-test $p = 1.6 \times 10^{-5}$). Thus, instead of suggesting little-to-no effect under the control diet and a large effect under the high-sugar diet (as predicted by the hypothesis that cryptic genetic variation is pervasive), the classification as high-sugar specific may commonly just point us to intermediate-size effects: large enough to be significant in the context with systematically larger effects (where power is higher) yet too small to be significant in the context with systematically smaller effects (where power is lower).
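
The four-way classification described above, and the way a uniformly amplified effect can land in different classes depending on its size, can be sketched as follows. This is an illustration only. The significance threshold `alpha = 0.05`, the common standard error, and the use of a two-sided z-test are hypothetical choices, not the procedure of Pallares et al.; for simplicity the estimates are set equal to the true effects (no sampling noise), and the $1.3 \times$ amplification is borrowed from the average reported above for shared variants.

```python
import math

def two_sided_p(effect, se):
    """Two-sided p-value of a z-test for a single effect estimate."""
    z = abs(effect / se)
    return math.erfc(z / math.sqrt(2))

def classify_variant(p_control, p_high_sugar, alpha=0.05):
    """Classify a variant by significance under each diet (alpha is a hypothetical threshold)."""
    sig_c = p_control < alpha
    sig_hs = p_high_sugar < alpha
    if sig_c and sig_hs:
        return "shared"
    if sig_hs:
        return "high-sugar-specific"
    if sig_c:
        return "control-specific"
    return "no effect"

def classify_amplified(effect_control, se, amplification=1.3, alpha=0.05):
    """Classify a variant whose high-sugar effect is a fixed multiple of its control effect."""
    effect_hs = amplification * effect_control
    return classify_variant(two_sided_p(effect_control, se),
                            two_sided_p(effect_hs, se), alpha)

# Same amplification in every case; only the overall effect size changes.
for effect in (0.02, 0.08, 0.15):
    print(effect, classify_amplified(effect, se=0.05))
```

All three variants have the same relationship between diets, yet the small one is labeled as having no effect, the intermediate one as high-sugar-specific, and the large one as shared.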

6 Re-analysis of GxE in the Pallares et al. experiment

In the main text, we show that the classification of significantly associated variants in Pallares et al. is consistent with pervasive amplification, despite the fact that this mode of covariance was not one of the generative modes considered by the authors. Here, we re-estimate the modes of covariance of genetic effects under the two diets, using an approach adapted from the one we have previously used (Zhu et al., 2023). Specifically, we used the framework of ‘multivariate adaptive shrinkage’ (mash; Urbut et al., 2019) to model the covariance of effects between the high-sugar and control diets across all variants, fitting a multivariate normal mixture model to the summary statistics of Pallares et al. In short, mash takes in a set of multidimensional effect estimates and standard errors and estimates the true underlying distribution of effects as a mixture of zero-centered multivariate normal distributions and a point mass at 0. Each mixture component in the model must be pre-specified before the fitting procedure. Following Zhu et al., 2023, we pre-specified a dense grid of covariance matrices to be input to mash. In particular, each covariance matrix is of the form

[\begin{matrix} σ_{c}^{2} & ρ σ_{h s} σ_{c} \\ ρ σ_{h s} σ_{c} & σ_{h s}^{2} \end{matrix}],

where $σ_{c}^{2}$ is the variance of the true effects in the control group, $σ_{h s}^{2}$ is the variance of the true effects in the high-sugar group, and $ρ$ is the correlation between the effects in the high-sugar and control groups. To specify the set of covariance matrices, we formed a grid encompassing possible values of the correlation of allelic effects across diets and of the ratio of variances under the two diets. Specifically, we varied $ρ$ over the set of values $(- 1, - \frac{3}{4}, - \frac{1}{2}, - \frac{1}{4}, 0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1)$, and we varied the ratio of the variances, $\frac{σ_{c}^{2}}{σ_{h s}^{2}}$, on a logarithmic scale between 0.5 and 1.5. Taking the Cartesian product of these sets yielded a grid of covariance matrices. Because the number of pre-specified covariance matrices was large, and to avoid overfitting, we used a forward stepwise selection procedure (Hastie et al., 2009). We started with a model containing only the null covariance matrix (i.e. no effect under either diet), and then in each step greedily added one covariance matrix to the mixture, searching over all covariance matrices for the one that maximally improves the likelihood of the mixture model. In each step, we either included the new covariance matrix in the model and moved to the next step or, if the improvement in likelihood over the previous step was below a pre-specified threshold, stopped the procedure without including it. This threshold was determined by conducting a level $α = 0.05$ likelihood-ratio test. To mitigate possible linkage disequilibrium among the input SNPs, we performed the procedure described above on a subset of roughly 12K out of the roughly 270K variants in the data. Each of these SNPs was sampled at random from one of 12K roughly evenly spaced chromosomal blocks (in terms of physical distance).
Based on the output of this procedure—the mixture weights in the model ultimately chosen—92% of variants have non-zero effects under both diets but larger effects under the high-sugar diet (Figure 4). This suggests that instead of affecting just a small subset of variants, a high-sugar diet amplifies the effects of the vast majority of variants on lifespan. Moreover, mash assigned zero weight to covariance matrices where an effect is non-zero in one context but zero in another.
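
The grid of candidate covariance matrices described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the analysis. The number of variance ratios (five here) and the choice to fix the high-sugar variance at 1 are assumptions, since neither is specified above, and the function names are hypothetical. The off-diagonal covariance is taken to be $ρ$ times the product of the two standard deviations, as implied by $ρ$ being a correlation.

```python
import math

def log_spaced(lo, hi, k):
    """k values from lo to hi, evenly spaced on a logarithmic scale."""
    step = (math.log(hi) - math.log(lo)) / (k - 1)
    return [math.exp(math.log(lo) + j * step) for j in range(k)]

def covariance_matrix(rho, var_c, var_hs):
    """2x2 covariance of true effects, ordered (control, high-sugar)."""
    cov = rho * math.sqrt(var_c) * math.sqrt(var_hs)
    return [[var_c, cov], [cov, var_hs]]

def build_grid(rhos, ratios, var_hs=1.0):
    """Cartesian product of correlations and variance ratios (ratio = var_c / var_hs)."""
    grid = []
    for rho in rhos:
        for ratio in ratios:
            var_c = ratio * var_hs
            grid.append((rho, ratio, covariance_matrix(rho, var_c, var_hs)))
    return grid

rhos = [-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1]
ratios = log_spaced(0.5, 1.5, 5)  # ASSUMPTION: five ratio values; the text gives only the range
grid = build_grid(rhos, ratios)

print(len(grid), "candidate covariance matrices")
print(grid[-1])
```

In this parameterization, a variance ratio below 1 with $ρ = 1$ corresponds to perfectly correlated effects that are larger under the high-sugar diet, which is the amplification pattern discussed above.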

Appendix 1—figure 1

Out-of-sample validation of decision rule for sex-stratified genome-wide association studies (GWAS) for 27 complex traits.

For each trait, we use the procedure specified in the section Validating and applying the decision rule to real data. The x-axis shows the expected mean squared error (MSE) advantage of the additive estimator in predicting the male effect based on the decision rule, and the y-axis shows the actual MSE advantage of the additive model in predicting the male effect calculated using an independent sample. Values on the x-axis are grouped into nine evenly spaced bins, and the y-axis shows bin averages (with error bars indicating 1 standard deviation, where the sample size is equal to the number of SNPs falling into each bin). For ease of viewing, data above the 99th percentile and below the 1st percentile on the x-axis are removed.

Appendix 1—figure 2

Bias-variance trade-off at random variants in a sex-stratified genome-wide association study (GWAS).

This figure extends Figure 3A of the main text to all physiological traits analyzed. The x-axis shows the estimated absolute difference between the effect of variants in males and females. The y-axis shows the measured standard error for each variant in males, the focal context here. The dashed line shows the decision boundary for effect estimation in males. The difference in mean squared error (MSE) between estimation methods increases linearly with distance from the dashed line, as in Figure 2. If a variant falls above (below) the line, the additive (gene-by-environment interaction [GxE]) estimator has a lower MSE.

Appendix 1—figure 3

Bias-variance trade-off at genome-wide significant variants (p-value $< 5 \times 10^{- 8}$ in males) in a sex-stratified genome-wide association study (GWAS).

This figure extends Figure 3B of the main text to all physiological traits analyzed. The x-axis shows the estimated absolute difference between the effect of variants in males and females. The y-axis shows the measured standard error for each variant in males, the focal context here. The dashed line shows the decision boundary for effect estimation in males. The difference in mean squared error (MSE) between estimation methods increases linearly with distance from the dashed line, as in Figure 2. If a variant falls above (below) the line, the additive (gene-by-environment interaction [GxE]) estimator has a lower MSE.

Appendix 1—figure 4

Estimated mixture weights on covariance of true effects in Pallares et al.

The x-axis shows (symmetric) variance-covariance matrices for true effects in the high-sugar and control diets. The variance-covariance matrices displayed are the only matrices to which *mash* assigned non-zero weight (from a much larger set of possible covariance matrices, following a variable selection procedure). Variance-covariance matrices are scaled by a constant for ease of interpretability. Abbreviations: C–control diet; HS–high-sugar diet.

Appendix 1—figure 5

Polygenic score performance for context-aware models with greater amplification than in the main text.

This figure uses the same simulation parameters as Figure 5 of the main text, except that variant effects are amplified $1.5 \times$ instead of $1.4 \times$. In each simulation, a genome-wide association study (GWAS) is performed on 5000 biallelic variants, half of which have no effect in either context. Of the other half, some percentage of the variants (indicated on the x-axis) have effects $1.5 \times$ larger in one of the contexts and the remaining SNPs have equal effects in both contexts. The broad-sense heritability was set to $0.4$ in all simulations. The y-axis shows the average, over 1000 simulations, of the out-of-sample Pearson correlation between polygenic score and trait value. (A) Results with a GWAS sample size of 1000 individuals. (B) Results with a GWAS sample size of $50{,}000$ individuals. We note that unlike in Figure 5, for amplification percentages greater than or equal to 75%, the strategy using ascertainment based on the additive model with gene-by-environment interaction (GxE) effect estimation (orange) outperforms the strategy of using the additive model for both tasks (red).

Appendix 1—figure 6

Model performance under correlation between genotype and context.

(A) Bias of genetic effect estimate across different values of correlation. (B) Variance of genetic effect estimate across different values of correlation. (C) Bias of term estimating the interaction between genotype and context in an interaction model. (D) Variance of term estimating the interaction between genotype and context in an interaction model.

Data availability

All data used in this work was available via previously published studies.

The following previously published data sets were used

1. Zhu C
2. Ming MJ
3. Cole JM
4. Edge MD
5. Kirkpatrick M
6. Harpak A
(2023) Zenodo
Additive summary statistics.

https://doi.org/10.5281/zenodo.7508246
1. Pallares LF
2. Lea AJ
3. Han C
4. Filippova EV
5. Andolfatto P
6. Ayroles JF
(2022) NCBI BioProject
ID PRJNA725602. Dietary stress remodels the genetic architecture of lifespan variation in outbred Drosophila.

https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA725602
1. Zhu C
2. Ming MJ
3. Cole JM
4. Edge MD
5. Kirkpatrick M
6. Harpak A
(2022) Zenodo
Sex-specific summary statistics.

https://doi.org/10.5281/zenodo.7222725

References

1. Aschard H
2. Tobin MD
3. Hancock DB
4. Skurnik D
5. Sood A
6. James A
7. Vernon Smith A
8. Manichaikul AW
9. Campbell A
10. Prins BP
11. Hayward C
12. Loth DW
13. Porteous DJ
14. Strachan DP
15. Zeggini E
16. O’Connor GT
17. Brusselle GG
18. Boezen HM
19. Schulz H
20. Deary IJ
21. Hall IP
22. Rudan I
23. Kaprio J
24. Wilson JF
25. Wilk JB
26. Huffman JE
27. Hua Zhao J
28. de Jong K
29. Lyytikäinen L-P
30. Wain LV
31. Jarvelin M-R
32. Kähönen M
33. Fornage M
34. Polasek O
35. Cassano PA
36. Barr RG
37. Rawal R
38. Harris SE
39. Gharib SA
40. Enroth S
41. Heckbert SR
42. Lehtimäki T
43. Gyllensten U
44. Jackson VE
45. Gudnason V
46. Tang W
47. Dupuis J
48. Soler Artigas M
49. Joshi AD
50. London SJ
51. Kraft P
52. Understanding Society Scientific Group
(2017) Evidence for large-scale gene-by-smoking interaction effects on pulmonary function
International Journal of Epidemiology 46:894–904.

https://doi.org/10.1093/ije/dyw318
(2021) An integrated approach to identify environmental modulators of genetic risk factors for complex traits
American Journal of Human Genetics 108:1866–1879.

https://doi.org/10.1016/j.ajhg.2021.08.014
(2021) Sex differences in genetic architecture in the UK Biobank
Nature Genetics 53:1283–1289.

https://doi.org/10.1038/s41588-021-00912-0
(2017) An expanded view of complex traits: from polygenic to omnigenic
Cell 169:1177–1186.

https://doi.org/10.1016/j.cell.2017.05.038
(2016) Transethnic genetic-correlation estimates from summary statistics
American Journal of Human Genetics 99:76–88.

https://doi.org/10.1016/j.ajhg.2016.05.001
1. Bycroft C
2. Freeman C
3. Petkova D
4. Band G
5. Elliott LT
6. Sharp K
7. Motyer A
8. Vukcevic D
9. Delaneau O
10. O’Connell J
11. Cortes A
12. Welsh S
13. Young A
14. Effingham M
15. McVean G
16. Leslie S
17. Allen N
18. Donnelly P
19. Marchini J
(2018) The UK Biobank resource with deep phenotyping and genomic data
Nature 562:203–209.

https://doi.org/10.1038/s41586-018-0579-z
Book
1. Casella G
2. Berger RL
(2021)
Statistical Inference

Cengage Learning.
(2013) Genotype-by-environment interaction and plasticity: exploring genomic responses of plants to the abiotic environment
Annual Review of Ecology, Evolution, and Systematics 44:5–29.

https://doi.org/10.1146/annurev-ecolsys-110512-135806
1. Duncan LE
2. Keller MC
(2011) A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry
The American Journal of Psychiatry 168:1041–1049.

https://doi.org/10.1176/appi.ajp.2011.11020191
(2022) Correlations between complex human phenotypes vary by genetic background, gender, and environment
Cell Reports. Medicine 3:100844.

https://doi.org/10.1016/j.xcrm.2022.100844
(2014) Genotype×environment interaction QTL mapping in plants: lessons from Arabidopsis
Trends in Plant Science 19:390–398.

https://doi.org/10.1016/j.tplants.2014.01.001
Book
1. Falconer DS
2. Mackay TFC
(1996)
Introduction to Quantitative Genetics

Longman.
Book
1. Fisher RA
(1930)
The Genetical Theory of Natural Selection

Clarendon Press.
1. Ge T
2. Chen CY
3. Neale BM
4. Sabuncu MR
5. Smoller JW
(2017) Phenome-wide heritability analysis of the UK Biobank
PLOS Genetics 13:e1006711.

https://doi.org/10.1371/journal.pgen.1006711
1. Gibson G
2. Dworkin I
(2004) Uncovering cryptic genetic variation
Nature Reviews. Genetics 5:681–690.

https://doi.org/10.1038/nrg1426
1. Gibson G
2. Lacek KA
(2020) Canalization and robustness in human genetics and disease
Annual Review of Genetics 54:189–211.

https://doi.org/10.1146/annurev-genet-022020-022327
Book
(2009) The elements of statistical learning
Springer.

https://doi.org/10.1007/978-0-387-84858-7
(2008) Data and theory point to mainly additive genetic variance for complex traits
PLOS Genetics 4:e1000008.

https://doi.org/10.1371/journal.pgen.1000008
1. Kraft P
2. Aschard H
(2015) Finding the missing gene-environment interactions
European Journal of Epidemiology 30:353–355.

https://doi.org/10.1007/s10654-015-0046-1
1. Liu X
2. Li YI
3. Pritchard JK
(2019) Trans effects on gene expression can drive omnigenic inheritance
Cell 177:1022–1034.

https://doi.org/10.1016/j.cell.2019.04.014
Preprint
1. Miao J
2. Song G
3. Wu Y
4. Hu J
5. Wu Y
6. Basu S
7. Andrews JS
8. Schaumberg K
9. Fletcher JM
10. Schmitz LL
11. Lu Q
(2022) Reimagining gene-environment interaction analysis for human complex traits
bioRxiv.

https://doi.org/10.1101/2022.12.11.519973
(2014) Practitioner review: A critical perspective on gene-environment interaction models--what impact should they have on clinical perceptions and practice?
Journal of Child Psychology and Psychiatry, and Allied Disciplines 55:1092–1101.

https://doi.org/10.1111/jcpp.12261
1. Paaby AB
2. Rockman MV
(2014) Cryptic genetic variation: evolution’s hidden substrate
Nature Reviews. Genetics 15:247–258.

https://doi.org/10.1038/nrg3688
1. Paaby AB
2. Gibson G
(2016) Cryptic genetic variation in evolutionary developmental genetics
Biology 5:28.

https://doi.org/10.3390/biology5020028
1. Pallares LF
2. Lea AJ
3. Han C
4. Filippova EV
5. Andolfatto P
6. Ayroles JF
(2023) Dietary stress remodels the genetic architecture of lifespan variation in outbred Drosophila
Nature Genetics 55:123–129.

https://doi.org/10.1038/s41588-022-01246-1
1. Patel RA
2. Musharoff SA
3. Spence JP
4. Pimentel H
5. Tcheandjieu C
6. Mostafavi H
7. Sinnott-Armstrong N
8. Clarke SL
9. Smith CJ
10. Durda PP
11. Taylor KD
12. Tracy R
13. Liu Y
14. Johnson WC
15. Aguet F
16. Ardlie KG
17. Gabriel S
18. Smith J
19. Nickerson DA
20. Rich SS
21. Rotter JI
22. Tsao PS
23. Assimes TL
24. Pritchard JK
25. V.A. Million Veteran Program
(2022) Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits
American Journal of Human Genetics 109:1286–1297.

https://doi.org/10.1016/j.ajhg.2022.05.014
Preprint
1. Schwaba T
2. Mallard TT
3. Maihofer AX
4. Rhemtulla M
5. Lee PH
6. Smoller JW
7. Davis LK
8. Nivard MG
9. Grotzinger AD
10. Tucker-Drob EM
(2023) Comparison of the multivariate genetic architecture of eight major psychiatric disorders across sex
medRxiv.

https://doi.org/10.1101/2023.05.25.23290545
1. Sella G
2. Barton NH
(2019) Thinking about the evolution of complex traits in the era of genome-wide association studies
Annual Review of Genomics and Human Genetics 20:461–493.

https://doi.org/10.1146/annurev-genom-083115-022316
(2016) Contrasting the genetic architecture of 30 complex traits from summary association data
American Journal of Human Genetics 99:139–153.

https://doi.org/10.1016/j.ajhg.2016.05.013
(2021) GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background
eLife 10:e58615.

https://doi.org/10.7554/eLife.58615
1. Smith EN
2. Kruglyak L
(2008) Gene-environment interaction in yeast gene expression
PLOS Biology 6:e83.

https://doi.org/10.1371/journal.pbio.0060083
Preprint
1. Smith SP
2. Smith OS
3. Mostafavi H
4. Peng D
5. Berg JJ
6. Edge MD
7. Harpak A
(2025) A litmus test for confounding in polygenic scores
bioRxiv.

https://doi.org/10.1101/2025.02.01.635985
Preprint
(2022) A flexible modeling and inference framework for estimating variant effect sizes from GWAS summary statistics
bioRxiv.

https://doi.org/10.1101/2022.04.18.488696
(2021) Natural variation in the regulation of neurodevelopmental genes modifies flight performance in Drosophila
PLOS Genetics 17:e1008887.

https://doi.org/10.1371/journal.pgen.1008887
1. Stephens M
(2017) False discovery rates: a new deal
Biostatistics 18:275–294.

https://doi.org/10.1093/biostatistics/kxw041
1. Storey JD
(2003) The positive false discovery rate: a Bayesian interpretation and the q-value
The Annals of Statistics 31:2013–2035.

https://doi.org/10.1214/aos/1074290335
(2022) Sex-heterogeneous SNPs disproportionately influence gene expression and health
PLOS Genetics 18:e1010147.

https://doi.org/10.1371/journal.pgen.1010147
(2018) Multi-trait analysis of genome-wide association summary statistics using MTAG
Nature Genetics 50:229–237.

https://doi.org/10.1038/s41588-017-0009-4
(2019) Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions
Nature Genetics 51:187–195.

https://doi.org/10.1038/s41588-018-0268-8
Preprint
(2023) Causal interpretations of family GWAS in the presence of heterogeneous effects
bioRxiv.

https://doi.org/10.1101/2023.11.13.566950
1. Vieira C
2. Pasyukova EG
3. Zeng ZB
4. Hackett JB
5. Lyman RF
6. Mackay TFC
(2000) Genotype-environment interaction for quantitative trait loci affecting life span in Drosophila melanogaster
Genetics 154:213–227.

https://doi.org/10.1093/genetics/154.1.213
Preprint
1. Wang JY
2. Lin N
3. Zietz M
4. Mares J
5. Narasimhan VM
6. Rathouz PJ
7. Harpak A
(2024) Three open questions in polygenic score portability
bioRxiv.

https://doi.org/10.1101/2024.08.20.608703
1. Wray NR
2. Wijmenga C
3. Sullivan PF
4. Yang J
5. Visscher PM
(2018) Common disease is more complex than implied by the core gene omnigenic model
Cell 173:1573–1580.

https://doi.org/10.1016/j.cell.2018.05.051
1. Wright KM
2. Deighan AG
3. Di Francesco A
4. Freund A
5. Jojic V
6. Churchill GA
7. Raj A
(2022) Age and diet shape the genetic architecture of body weight in diversity outbred mice
eLife 11:e64329.

https://doi.org/10.7554/eLife.64329
1. Yengo L
2. Vedantam S
3. Marouli E
4. Sidorenko J
5. Bartell E
6. Sakaue S
7. Graff M
8. Eliasen AU
9. Jiang Y
10. Raghavan S
11. Miao J
12. Arias JD
13. Graham SE
14. Mukamel RE
15. Spracklen CN
16. Yin X
17. Chen S-H
18. Ferreira T
19. Highland HH
20. Ji Y
21. Karaderi T
22. Lin K
23. Lüll K
24. Malden DE
25. Medina-Gomez C
26. Machado M
27. Moore A
28. Rüeger S
29. Sim X
30. Vrieze S
31. Ahluwalia TS
32. Akiyama M
33. Allison MA
34. Alvarez M
35. Andersen MK
36. Ani A
37. Appadurai V
38. Arbeeva L
39. Bhaskar S
40. Bielak LF
41. Bollepalli S
42. Bonnycastle LL
43. Bork-Jensen J
44. Bradfield JP
45. Bradford Y
46. Braund PS
47. Brody JA
48. Burgdorf KS
49. Cade BE
50. Cai H
51. Cai Q
52. Campbell A
53. Cañadas-Garre M
54. Catamo E
55. Chai J-F
56. Chai X
57. Chang L-C
58. Chang Y-C
59. Chen C-H
60. Chesi A
61. Choi SH
62. Chung R-H
63. Cocca M
64. Concas MP
65. Couture C
66. Cuellar-Partida G
67. Danning R
68. Daw EW
69. Degenhard F
70. Delgado GE
71. Delitala A
72. Demirkan A
73. Deng X
74. Devineni P
75. Dietl A
76. Dimitriou M
77. Dimitrov L
78. Dorajoo R
79. Ekici AB
80. Engmann JE
81. Fairhurst-Hunter Z
82. Farmaki A-E
83. Faul JD
84. Fernandez-Lopez J-C
85. Forer L
86. Francescatto M
87. Freitag-Wolf S
88. Fuchsberger C
89. Galesloot TE
90. Gao Y
91. Gao Z
92. Geller F
93. Giannakopoulou O
94. Giulianini F
95. Gjesing AP
96. Goel A
97. Gordon SD
98. Gorski M
99. Grove J
100. Guo X
101. Gustafsson S
102. Haessler J
103. Hansen TF
104. Havulinna AS
105. Haworth SJ
106. He J
107. Heard-Costa N
108. Hebbar P
109. Hindy G
110. Ho Y-LA
111. Hofer E
112. Holliday E
113. Horn K
114. Hornsby WE
115. Hottenga J-J
116. Huang H
117. Huang J
118. Huerta-Chagoya A
119. Huffman JE
120. Hung Y-J
121. Huo S
122. Hwang MY
123. Iha H
124. Ikeda DD
125. Isono M
126. Jackson AU
127. Jäger S
128. Jansen IE
129. Johansson I
130. Jonas JB
131. Jonsson A
132. Jørgensen T
133. Kalafati I-P
134. Kanai M
135. Kanoni S
136. Kårhus LL
137. Kasturiratne A
138. Katsuya T
139. Kawaguchi T
140. Kember RL
141. Kentistou KA
142. Kim H-N
143. Kim YJ
144. Kleber ME
145. Knol MJ
146. Kurbasic A
147. Lauzon M
148. Le P
149. Lea R
150. Lee J-Y
151. Leonard HL
152. Li SA
153. Li X
154. Li X
155. Liang J
156. Lin H
157. Lin S-Y
158. Liu J
159. Liu X
160. Lo KS
161. Long J
162. Lores-Motta L
163. Luan J
164. Lyssenko V
165. Lyytikäinen L-P
166. Mahajan A
167. Mamakou V
168. Mangino M
169. Manichaikul A
170. Marten J
171. Mattheisen M
172. Mavarani L
173. McDaid AF
174. Meidtner K
175. Melendez TL
176. Mercader JM
177. Milaneschi Y
178. Miller JE
179. Millwood IY
180. Mishra PP
181. Mitchell RE
182. Møllehave LT
183. Morgan A
184. Mucha S
185. Munz M
186. Nakatochi M
187. Nelson CP
188. Nethander M
189. Nho CW
190. Nielsen AA
191. Nolte IM
192. Nongmaithem SS
193. Noordam R
194. Ntalla I
195. Nutile T
196. Pandit A
197. Christofidou P
198. Pärna K
199. Pauper M
200. Petersen ERB
201. Petersen LV
202. Pitkänen N
203. Polašek O
204. Poveda A
205. Preuss MH
206. Pyarajan S
207. Raffield LM
208. Rakugi H
209. Ramirez J
210. Rasheed A
211. Raven D
212. Rayner NW
213. Riveros C
214. Rohde R
215. Ruggiero D
216. Ruotsalainen SE
217. Ryan KA
218. Sabater-Lleal M
219. Saxena R
220. Scholz M
221. Sendamarai A
222. Shen B
223. Shi J
224. Shin JH
225. Sidore C
226. Sitlani CM
227. Slieker RC
228. Smit RAJ
229. Smith AV
230. Smith JA
231. Smyth LJ
232. Southam L
233. Steinthorsdottir V
234. Sun L
235. Takeuchi F
236. Tallapragada DSP
237. Taylor KD
238. Tayo BO
239. Tcheandjieu C
240. Terzikhan N
241. Tesolin P
242. Teumer A
243. Theusch E
244. Thompson DJ
245. Thorleifsson G
246. Timmers PRHJ
247. Trompet S
248. Turman C
249. Vaccargiu S
250. van der Laan SW
251. van der Most PJ
252. van Klinken JB
253. van Setten J
254. Verma SS
255. Verweij N
256. Veturi Y
257. Wang CA
258. Wang C
259. Wang L
260. Wang Z
261. Warren HR
262. Bin Wei W
263. Wickremasinghe AR
264. Wielscher M
265. Wiggins KL
266. Winsvold BS
267. Wong A
268. Wu Y
269. Wuttke M
270. Xia R
271. Xie T
272. Yamamoto K
273. Yang J
274. Yao J
275. Young H
276. Yousri NA
277. Yu L
278. Zeng L
279. Zhang W
280. Zhang X
281. Zhao J-H
282. Zhao W
283. Zhou W
284. Zimmermann ME
285. Zoledziewska M
286. Adair LS
287. Adams HHH
288. Aguilar-Salinas CA
289. Al-Mulla F
290. Arnett DK
291. Asselbergs FW
292. Åsvold BO
293. Attia J
294. Banas B
295. Bandinelli S
296. Bennett DA
297. Bergler T
298. Bharadwaj D
299. Biino G
300. Bisgaard H
301. Boerwinkle E
302. Böger CA
303. Bønnelykke K
304. Boomsma DI
305. Børglum AD
306. Borja JB
307. Bouchard C
308. Bowden DW
309. Brandslund I
310. Brumpton B
311. Buring JE
312. Caulfield MJ
313. Chambers JC
314. Chandak GR
315. Chanock SJ
316. Chaturvedi N
317. Chen Y-DI
318. Chen Z
319. Cheng C-Y
320. Christophersen IE
321. Ciullo M
322. Cole JW
323. Collins FS
324. Cooper RS
325. Cruz M
326. Cucca F
327. Cupples LA
328. Cutler MJ
329. Damrauer SM
330. Dantoft TM
331. de Borst GJ
332. de Groot LC
333. De Jager PL
334. de Kleijn DPV
335. Janaka de Silva H
336. Dedoussis GV
337. den Hollander AI
338. Du S
339. Easton DF
340. Elders PJM
341. Eliassen AH
342. Ellinor PT
343. Elmståhl S
344. Erdmann J
345. Evans MK
346. Fatkin D
347. Feenstra B
348. Feitosa MF
349. Ferrucci L
350. Ford I
351. Fornage M
352. Franke A
353. Franks PW
354. Freedman BI
355. Gasparini P
356. Gieger C
357. Girotto G
358. Goddard ME
359. Golightly YM
360. Gonzalez-Villalpando C
361. Gordon-Larsen P
362. Grallert H
363. Grant SFA
364. Grarup N
365. Griffiths L
366. Gudnason V
367. Haiman C
368. Hakonarson H
369. Hansen T
370. Hartman CA
371. Hattersley AT
372. Hayward C
373. Heckbert SR
374. Heng C-K
375. Hengstenberg C
376. Hewitt AW
377. Hishigaki H
378. Hoyng CB
379. Huang PL
380. Huang W
381. Hunt SC
382. Hveem K
383. Hyppönen E
384. Iacono WG
385. Ichihara S
386. Ikram MA
387. Isasi CR
388. Jackson RD
389. Jarvelin M-R
390. Jin Z-B
391. Jöckel K-H
392. Joshi PK
393. Jousilahti P
394. Jukema JW
395. Kähönen M
396. Kamatani Y
397. Kang KD
398. Kaprio J
399. Kardia SLR
400. Karpe F
401. Kato N
402. Kee F
403. Kessler T
404. Khera AV
405. Khor CC
406. Kiemeney LALM
407. Kim B-J
408. Kim EK
409. Kim H-L
410. Kirchhof P
411. Kivimaki M
412. Koh W-P
413. Koistinen HA
414. Kolovou GD
415. Kooner JS
416. Kooperberg C
417. Köttgen A
418. Kovacs P
419. Kraaijeveld A
420. Kraft P
421. Krauss RM
422. Kumari M
423. Kutalik Z
424. Laakso M
425. Lange LA
426. Langenberg C
427. Launer LJ
428. Le Marchand L
429. Lee H
430. Lee NR
431. Lehtimäki T
432. Li H
433. Li L
434. Lieb W
435. Lin X
436. Lind L
437. Linneberg A
438. Liu C-T
439. Liu J
440. Loeffler M
441. London B
442. Lubitz SA
443. Lye SJ
444. Mackey DA
445. Mägi R
446. Magnusson PKE
447. Marcus GM
448. Vidal PM
449. Martin NG
450. März W
451. Matsuda F
452. McGarrah RW
453. McGue M
454. McKnight AJ
455. Medland SE
456. Mellström D
457. Metspalu A
458. Mitchell BD
459. Mitchell P
460. Mook-Kanamori DO
461. Morris AD
462. Mucci LA
463. Munroe PB
464. Nalls MA
465. Nazarian S
466. Nelson AE
467. Neville MJ
468. Newton-Cheh C
469. Nielsen CS
470. Nöthen MM
471. Ohlsson C
472. Oldehinkel AJ
473. Orozco L
474. Pahkala K
475. Pajukanta P
476. Palmer CNA
477. Parra EJ
478. Pattaro C
479. Pedersen O
480. Pennell CE
481. Penninx BWJH
482. Perusse L
483. Peters A
484. Peyser PA
485. Porteous DJ
486. Posthuma D
487. Power C
488. Pramstaller PP
489. Province MA
490. Qi Q
491. Qu J
492. Rader DJ
493. Raitakari OT
494. Ralhan S
495. Rallidis LS
496. Rao DC
497. Redline S
498. Reilly DF
499. Reiner AP
500. Rhee SY
501. Ridker PM
502. Rienstra M
503. Ripatti S
504. Ritchie MD
505. Roden DM
506. Rosendaal FR
507. Rotter JI
508. Rudan I
509. Rutters F
510. Sabanayagam C
511. Saleheen D
512. Salomaa V
513. Samani NJ
514. Sanghera DK
515. Sattar N
516. Schmidt B
517. Schmidt H
518. Schmidt R
519. Schulze MB
520. Schunkert H
521. Scott LJ
522. Scott RJ
523. Sever P
524. Shiroma EJ
525. Shoemaker MB
526. Shu X-O
527. Simonsick EM
528. Sims M
529. Singh JR
530. Singleton AB
531. Sinner MF
532. Smith JG
533. Snieder H
534. Spector TD
535. Stampfer MJ
536. Stark KJ
537. Strachan DP
538. ’t Hart LM
539. Tabara Y
540. Tang H
541. Tardif J-C
542. Thanaraj TA
543. Timpson NJ
544. Tönjes A
545. Tremblay A
546. Tuomi T
547. Tuomilehto J
548. Tusié-Luna M-T
549. Uitterlinden AG
550. van Dam RM
551. van der Harst P
552. Van der Velde N
553. van Duijn CM
554. van Schoor NM
555. Vitart V
556. Völker U
557. Vollenweider P
558. Völzke H
559. Wacher-Rodarte NH
560. Walker M
561. Wang YX
562. Wareham NJ
563. Watanabe RM
564. Watkins H
565. Weir DR
566. Werge TM
567. Widen E
568. Wilkens LR
569. Willemsen G
570. Willett WC
571. Wilson JF
572. Wong T-Y
573. Woo J-T
574. Wright AF
575. Wu J-Y
576. Xu H
577. Yajnik CS
578. Yokota M
579. Yuan J-M
580. Zeggini E
581. Zemel BS
582. Zheng W
583. Zhu X
584. Zmuda JM
585. Zonderman AB
586. Zwart J-A
587. Chasman DI
588. Cho YS
589. Heid IM
590. McCarthy MI
591. Ng MCY
592. O’Donnell CJ
593. Rivadeneira F
594. Thorsteinsdottir U
595. Sun YV
596. Tai ES
597. Boehnke M
598. Deloukas P
599. Justice AE
600. Lindgren CM
601. Loos RJF
602. Mohlke KL
603. North KE
604. Stefansson K
605. Walters RG
606. Winkler TW
607. Young KL
608. Loh P-R
609. Yang J
610. Esko T
611. Assimes TL
612. Auton A
613. Abecasis GR
614. Willer CJ
615. Locke AE
616. Berndt SI
617. Lettre G
618. Frayling TM
619. Okada Y
620. Wood AR
621. Visscher PM
622. Hirschhorn JN
623. 23andMe Research Team
624. VA Million Veteran Program
625. DiscovEHR (DiscovEHR and MyCode Community Health Initiative)
626. eMERGE (Electronic Medical Records and Genomics Network)
627. Lifelines Cohort Study
628. PRACTICAL Consortium
629. Understanding Society Scientific Group
(2022) A saturated map of common genetic variants associated with human height
Nature 610:704–712.

https://doi.org/10.1038/s41586-022-05275-y
- PubMed
- Google Scholar
(2016) Multiple novel gene-by-environment interactions modify the effect of FTO variants on body mass index
Nature Communications 7:12724.

https://doi.org/10.1038/ncomms12724
- PubMed
- Google Scholar
1. Young AI
(2019) Solving the missing heritability problem
PLOS Genetics 15:e1008222.

https://doi.org/10.1371/journal.pgen.1008222
- PubMed
- Google Scholar
1. Zhang L
2. MacQueen A
3. Bonnette J
4. Fritschi FB
5. Lowry DB
6. Juenger TE
(2021) QTL × environment interactions underlie ionome divergence in switchgrass
G3 11:jkab144.

https://doi.org/10.1093/g3journal/jkab144
- PubMed
- Google Scholar
1. Zhu C
2. Ming MJ
3. Cole JM
4. Edge MD
5. Kirkpatrick M
6. Harpak A
(2023) Amplification is the primary mode of gene-by-sex interaction in complex human traits
Cell Genomics 3:100297.

https://doi.org/10.1016/j.xgen.2023.100297
- PubMed
- Google Scholar
(2000) Patterns of genetic variation in Mendelian and complex traits
Annual Review of Genomics and Human Genetics 1:387–407.

https://doi.org/10.1146/annurev.genom.1.1.387
- PubMed
- Google Scholar

Article and author information

Author details

Eric Weine
1. Department of Integrative Biology, The University of Texas at Austin, Austin, United States
2. Department of Population Health, The University of Texas at Austin, Austin, United States
3. Department of Human Genetics, University of Chicago, Chicago, United States
Contribution
Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0009-0001-7809-1649

Samuel Pattillo Smith
1. Department of Integrative Biology, The University of Texas at Austin, Austin, United States
2. Department of Population Health, The University of Texas at Austin, Austin, United States
Contribution
Formal analysis, Investigation, Methodology, Writing – review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-6269-0276

Rebecca Kathryn Knowlton

Department of Statistics and Data Sciences, The University of Texas at Austin, Austin, United States

Contribution
Data curation, Formal analysis, Investigation, Methodology

Competing interests
No competing interests declared
Arbel Harpak
1. Department of Integrative Biology, The University of Texas at Austin, Austin, United States
2. Department of Population Health, The University of Texas at Austin, Austin, United States
Contribution
Conceptualization, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
arbelharpak@utexas.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0002-3655-748X

Funding

National Institutes of Health (R35GM151108)

Arbel Harpak

National Institutes of Health (RF1AG073593)

Samuel Pattillo Smith

Pew Charitable Trusts (Pew Biomedical Scholarship)

Arbel Harpak

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Doc Edge, Marc Feldman, Mark Kirkpatrick, Molly Przeworski, Anil Raj, Elliot Tucker-Drob, and members of the Harpak Lab for comments on the manuscript. We thank Peter Andolfatto, Julien Ayroles, and Tom Juenger for helpful discussions. All authors were supported by NIH R35GM151108 to AH. SP Smith was also supported by NIH RF1AG073593. This study was conducted using the UK Biobank resource under application 61666, as approved by the University of Texas at Austin institutional review board (protocol 2019-02-0125).

Version history

Preprint posted: May 14, 2024
Sent for peer review: May 14, 2024
Reviewed Preprint version 1: July 17, 2024
Reviewed Preprint version 2: March 13, 2025
Version of Record published: April 10, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.99210. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

500 views
26 downloads
2 citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Cite this article

Eric Weine, Samuel Pattillo Smith, Rebecca Kathryn Knowlton, Arbel Harpak (2025) Trade-offs in modeling context dependency in complex trait genetics. eLife 13:RP99210. https://doi.org/10.7554/eLife.99210.3
