Rare variants contribute disproportionately to quantitative trait variation in yeast

Abstract

How variants with different frequencies contribute to trait variation is a central question in genetics. We use a unique model system to disentangle the contributions of common and rare variants to quantitative traits. We generated ~14,000 progeny from crosses among 16 diverse yeast strains and identified thousands of quantitative trait loci (QTLs) for 38 traits. We combined our results with sequencing data for 1011 yeast isolates to show that rare variants make a disproportionate contribution to trait variation. Evolutionary analyses revealed that this contribution is driven by rare variants that arose recently, and that negative selection has shaped the relationship between variant frequency and effect size. We leveraged the structure of the crosses to resolve hundreds of QTLs to single genes. These results refine our understanding of trait variation at the population level and suggest that studies of rare variants are a fertile ground for discovery of genetic effects.

https://doi.org/10.7554/eLife.49212.001

Introduction

A detailed understanding of the sources of heritable variation is a central goal of modern genetics. Genome-wide association studies (GWAS) in humans (Visscher et al., 2017) have implicated tens of thousands of DNA sequence variants in disease risk and quantitative trait variation, but these variants fail to account for the entire heritability of diseases and traits. One key question is the relative contribution of DNA sequence variants with different allele frequencies in a population to trait variation. GWAS by design only test common DNA sequence variants; however, recent studies underscore the likely importance of the contribution of rare variants to heritable variation (Wainschtein et al., 2019). Theoretical analyses have explored how factors such as mutational target size, pleiotropy, and the strength of selection shape the relationship between variant frequency and effect size (Eyre-Walker, 2010; Robinson et al., 2014; Simons et al., 2018). In particular, purifying selection against variants that negatively affect fitness is expected to keep them at low frequencies in a population, resulting in a predicted inverse relationship between effect sizes and allele frequencies for variants that influence fitness-related traits (Gibson, 2012; Goldstein et al., 2013; Kryukov et al., 2007; Pritchard, 2001).

Empirical results have been consistent with the theoretical expectation that rare variants should have larger effect sizes, or, equivalently, that variants implicated in trait variation should be shifted to lower frequencies relative to all variants. An increased burden of ultra-rare protein-truncating variants has been observed in human diseases (Ganna et al., 2018; Exome Aggregation Consortium et al., 2016), and multiple studies have found that GWAS variants with lower allele frequencies have larger effect sizes (Marouli et al., 2017; Park et al., 2011). A negative correlation between allele frequency and effect size has also been observed in maize GWAS (Wallace et al., 2014), and our previous work in yeast suggested that variants that contribute to trait variation are shifted to lower frequencies when compared to all sequence variants (Ehrenreich et al., 2012).

Recent studies employed indirect variance partitioning approaches to uncover appreciable contributions of lower frequency variants to heritability of complex traits in humans, including prostate cancer susceptibility (Mancuso et al., 2016), height (Wainschtein et al., 2019; Yang et al., 2015), and body mass index (Wainschtein et al., 2019). However, a direct comprehensive comparison of the effects of rare and common variants has been lacking in humans for two principal reasons. First, GWAS by design do not test rare variants, and sequencing studies have not reached sufficient sample sizes to find them with high statistical power (Zuk et al., 2014). As a result, most rare variants have to date escaped detection. Second, the power to detect a variant with any given effect size declines as the variant becomes less frequent in the study sample, simply because fewer individuals carry it (Zuk et al., 2014). As a consequence, the rare variants that are detected are those with larger effect sizes. This statistical artifact confounds effect size with allele frequency and biases any effort to measure the underlying relationship between the two.

Here, we report a comprehensive study in yeast designed to overcome these limitations. We built a mapping population consisting of approximately one thousand progeny from each of 16 biparental crosses. In this mapping population, even variants that are rare in the yeast population and occur in only a single parental strain are present in approximately 1000 progeny, resulting in high power to detect them. We mapped thousands of QTLs that account for most of the heritable variation in 38 quantitative traits and measured the QTL effect sizes. We then decoupled variant frequency from effect size by measuring the population allele frequencies of QTL lead variants detected in our panel in a separate large catalog of sequenced yeast isolates (Peter et al., 2018). Analysis of these large complementary data sets enabled us to directly and comprehensively examine the relationship between QTL effect sizes and variant frequency, characterize the genetic architecture of quantitative traits on a population scale, and improve mapping resolution, in many cases to single genes.

Results

To investigate the genetic basis of quantitative traits in the yeast population, we selected 16 highly diverse S. cerevisiae strains that capture much of the known genetic diversity of this species. Specifically, they contain both alleles at 82% of biallelic SNPs and small indels observed at minor allele frequency >5% in a collection of 1011 S. cerevisiae strains (Peter et al., 2018). We sequenced the 16 strains to high coverage in order to obtain a comprehensive set of genetic variants. We constructed a panel of 13,950 individual recombinant haploid yeast segregants by crossing each parental strain to two different strains and collecting an average of 872 progeny per cross (Figure 1; Figure 1—source data 1; Supplementary file 1). We genotyped these segregants by highly multiplexed whole-genome sequencing, with median 2.3-fold coverage per base per individual. Genotypes were called at 298,979 genetic variants, with an average of 71,117 genetic variants segregating in a single cross. We measured the growth of each segregant in 38 different environments in duplicate by automated assays and quantitative imaging (Materials and methods). Because the growth measurements in different environments are not strongly correlated, we treat them as separate phenotypes or traits (Bloom et al., 2013). The resulting genotype-by-phenotype matrix (over half a million phenotypic measurements and 158 billion combinations of genotype and phenotype) formed the basis for all downstream analyses.
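The scale of this matrix follows directly from the counts reported above. As a quick arithmetic check (a sketch only: the variable names are ours, and the duplicate measurements are not counted separately):

```python
# Counts taken from the text above; variable names are illustrative.
n_segregants = 13_950
n_traits = 38
n_variants = 298_979
n_crosses = 16

# One phenotype value per segregant per trait (replicates not counted).
n_phenotypes = n_segregants * n_traits

# Every genotyped variant paired with every segregant-by-trait measurement.
n_combinations = n_phenotypes * n_variants

# Average number of progeny per cross.
mean_progeny_per_cross = n_segregants / n_crosses

print(n_phenotypes)                    # 530100
print(f"{n_combinations:.3e}")         # 1.585e+11
print(round(mean_progeny_per_cross))   # 872
```

These values agree with the figures quoted in the text: just over half a million phenotypic measurements, roughly 158 billion genotype-phenotype combinations, and an average of 872 progeny per cross.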

Figure 1


Multiparental cross design with 16 diverse progenitor yeast strains.

16 parental strains were chosen to represent the diversity of the *S. cerevisiae* population, as illustrated by their positions on a neighbor-joining tree based on 1011 sequenced isolates (Peter et al., 2018). These strains were crossed in a single round-robin design, with each strain crossed to two other strains, as depicted by lines connecting the colored circles. Colors indicate the ecological origins of the parental strains.

https://doi.org/10.7554/eLife.49212.002

Figure 1—source data 1. Additional information on yeast crosses and phenotypes. Strain information for the 16 haploid parents and 16 F1 hybrids between them is listed. Additional information about the conditions tested is indicated. https://doi.org/10.7554/eLife.49212.003

We used a variance components model (Bloom et al., 2015; de los Campos et al., 2015; Yang et al., 2010) to show that, on average, additive genetic effects accounted for just over half of the total phenotypic variance, while pairwise genetic interactions accounted for 8%, approximately 1/6 as much as additive effects (Figure 2 inset; Supplementary file 2; Figure 2—source data 1). We carried out QTL mapping to find the specific loci contributing additively to trait variation. We used a joint mapping approach that leverages information across the entire panel of 13,950 segregants (Materials and methods). We mapped 4552 QTLs at a false discovery rate (FDR) of 5%, with an average of 120 (range 52–195) QTLs per trait (Supplementary file 3; Figure 3—source data 1). The detected QTLs explain a median of 73% of the additive heritability per trait and cross, showing that we can account for most of the genetic contribution to trait variation with specific loci (Figure 2; Figure 2—source data 1). We complemented the joint analysis with QTL mapping within each cross and found a median of 12 QTLs per trait at the same FDR of 5%. The detected loci explained a median of 68% of the additive heritability (Figure 2—source data 1). The joint analysis was more powerful, explaining an additional 5% of trait variance and uncovering 458 QTLs not detected within individual crosses. Consistent with the higher statistical power of the joint analysis, these additional QTLs had smaller effect sizes (median of 0.071 SD units vs 0.083 SD units; Wilcoxon rank sum test W = 1e6, p=9e-5). All subsequent results are based on the QTLs detected in the joint analysis.

Figure 2


Most heritable variation is explained by detected QTLs.

Whole-genome estimates of additive genetic variance (X-axis) are plotted against cross-validated estimates of trait variance explained by detected QTLs (Y-axis) for each trait-cross combination. Red points show values for the BY-RM cross. The diagonal line corresponds to detected QTLs explaining all of the estimated additive genetic variance, and is shown as a visual guide. (Inset) A histogram of the ratio of non-additive to additive genetic variance for each trait-cross combination, based on estimates from a variance component model.

https://doi.org/10.7554/eLife.49212.004

Figure 2—source data 1. Total variance explained by QTLs and within-cross variance component analyses. Results from within-cross variance components models and total variance explained by the QTL models are listed. https://doi.org/10.7554/eLife.49212.005

To investigate the relationship between variant frequency and QTL effects, we focused on biallelic variants observed in our panel whose frequency could be measured in a large collection of 1011 sequenced yeast strains. Based on their minor allele frequency (MAF) in this collection, we designated variants as rare (MAF <0.01) or common (MAF >0.01). By this definition, 27.8% of biallelic variants in our study were rare. For each trait, we computed the relative fraction of variance explained by these two categories of variants in the segregant panel (Materials and methods) (Yang et al., 2015). Across all traits, the median contribution of rare variants was 51.7%, despite the fact that they constituted only 27.8% of all variants and that a rare variant is expected to explain less variance than a common one with the same allelic effect size. These results are consistent with rare variants having larger effect sizes and making a disproportionate contribution to trait variation. Comparing different traits, we saw a wide range of the relative contribution of rare variants, from almost none for growth in the presence of copper sulfate and lithium chloride to over 75% for growth in the presence of cadmium chloride, in low pH, at high temperature, and on minimal medium (Figure 3A; Figure 3—figure supplement 1; Figure 3—source data 2). The results for copper sulfate and lithium chloride are consistent with GWAS for these traits in the 1011 sequenced yeast strains—these two traits had the most phenotypic variance explained by detected GWAS loci, which inherently correspond to common variants, with large contributions coming from known common copy-number variation at the CUP and ENA loci, respectively (Peter et al., 2018).
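The remark that a rare variant explains less variance than a common one with the same allelic effect follows from the variance of a biallelic genotype. As a minimal sketch, assuming a haploid sample in which an allele has frequency p and additive effect beta (in SD units), the variance it contributes is p(1 - p) * beta^2. The function names and example frequencies below are illustrative and are not taken from the study; the classifier uses the MAF threshold of 0.01 stated above.

```python
def classify_by_maf(maf: float, threshold: float = 0.01) -> str:
    """Label a variant 'rare' or 'common' from its population MAF.

    The text uses MAF < 0.01 for rare; treating MAF == threshold as
    common is an assumption made here so that every value gets a label.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    return "rare" if maf < threshold else "common"


def variance_explained(freq: float, beta: float) -> float:
    """Variance contributed by a biallelic variant in a haploid sample.

    freq is the allele frequency in the sample being analysed and beta
    is the additive effect. The genotype is a Bernoulli variable, so
    its variance is freq * (1 - freq).
    """
    return freq * (1.0 - freq) * beta ** 2


# Same effect size at different sample frequencies (illustrative values).
for freq in (0.5, 0.0625, 0.01):
    print(freq, variance_explained(freq, beta=0.3))
```

For a fixed effect size, the variance contributed shrinks as the allele becomes less frequent in the sample, so a frequency class can match the variance share of a more common class only through larger effects.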

Figure 3


Effect size and contribution to trait variation of rare and common variants.

(A) Stacked bar plots of additive genetic variance explained by rare (blue) and common (gray) variants. Error bars show ± s.e. (B) Minor allele frequency (X-axis) of the lead variant at each QTL (Peter et al., 2018) is plotted against QTL effect size (Y-axis). Red points show mean QTL effect sizes for groups of approximately 100 variants binned by allele frequency. Error bars show ± s.e.m. (C) Frequency of the derived allele of each QTL lead variant (X-axis), based on comparison with *S. paradoxus*, is plotted against QTL effect size (Y-axis). Negative values on the Y-axis correspond to variants with effects that are detrimental for growth.

https://doi.org/10.7554/eLife.49212.006

Figure 3—source data 1. Detected QTL. QTL mapping results are listed for both the within-cross and the joint analysis. https://doi.org/10.7554/eLife.49212.012
Figure 3—source data 2. Joint variance component estimates. Results for the joint variance component models are given. This includes results for a model with two allele frequency bins (Figure 3A; Figure 3—figure supplement 1A), seven allele frequency bins (Figure 3—figure supplement 1B), and seven allele frequency bins using only variants that are private to each of the 16 parents (Figure 3—figure supplement 1C). https://doi.org/10.7554/eLife.49212.013

In a complementary analysis, we investigated the relationship between the allele frequency of the lead variant at each QTL and the corresponding QTL effect size. Although the lead variant is not necessarily causal, in our study it is likely to be of similar frequency as the causal variant, and a simulation analysis showed that this approach largely preserves the relationship between frequency and effect size (Figure 3—figure supplement 2). Most QTLs had small effects (64% of QTLs had effects less than 0.1 SD units) and most lead variants were common (78%), consistent with previous linkage and association studies. We observed that QTLs with large effects were highly enriched for rare variants, and conversely, that rare variants were highly enriched for large effect sizes (Figure 3B; Figure 3—figure supplement 3; Figure 3—figure supplement 4). For instance, among QTLs with an absolute effect of at least 0.3 SD units, 145 of the corresponding lead variants were rare and only 90 were common. Rare variants were 6.7 times more likely to have an effect greater than 0.3 SD (Figure 3—source data 1, Fisher’s exact test, p<2e-16). Theoretical population genetics models show that for traits under negative selection, variant effect size is expected to be a decreasing function of minor allele frequency (Eyre-Walker, 2010; Pritchard, 2001). We empirically observe this relationship in our data for most of the traits examined, providing evidence that they have evolved under negative selection in the yeast population (Figure 3—figure supplement 5).

The existence of a close sister species of S. cerevisiae—S. paradoxus—allowed us to distinguish rare variants by their ancestral state. Variants that share the major allele with S. paradoxus are more likely to have arisen in the S. cerevisiae population recently than those that share the minor allele with S. paradoxus. We classified low-frequency variants as recent or ancient according to whether their major or minor allele was shared with S. paradoxus, respectively. Recently arising deleterious alleles have had less time to be purged by negative selection, and therefore recent variants are expected to have stronger effects on gene function, and hence manifest as QTLs with larger effects. Consistent with this expectation, we observed that recent variants were 1.8 times more likely than ancient variants to have an effect size greater than 0.1 SD units (Fisher’s exact test p=9e-5) (Figure 3C). We further examined the direction of QTL effects and found that recent variants were 1.5 times more likely to decrease fitness (Fisher’s exact test p=8e-3). Strikingly, no ancient variant decreased fitness by more than 0.5 SD units, whereas 41 recent variants did (Fisher’s exact test p=7e-3).
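The classification rule described above can be written down compactly. The following is a minimal sketch under our own assumptions: the function names are hypothetical, alleles are represented as plain strings, and variants whose outgroup allele matches neither *S. cerevisiae* allele are left unclassified. It also shows how a derived allele frequency, as plotted in Figure 3C, would follow from the same comparison.

```python
from typing import Optional


def classify_variant_age(major: str, minor: str, outgroup: str) -> Optional[str]:
    """Classify a variant as 'recent' or 'ancient' from the outgroup allele.

    'recent'  : the outgroup carries the major allele, so the minor
                allele is inferred to be derived.
    'ancient' : the outgroup carries the minor allele, so the major
                allele is inferred to be derived.
    None      : the outgroup matches neither allele (left unclassified;
                this handling is an assumption of the sketch).
    """
    if outgroup == major:
        return "recent"
    if outgroup == minor:
        return "ancient"
    return None


def derived_allele_frequency(maf: float, age: str) -> float:
    """Frequency of the inferred derived allele, given MAF and age class."""
    if age == "recent":
        return maf          # minor allele is derived
    if age == "ancient":
        return 1.0 - maf    # major allele is derived
    raise ValueError("age must be 'recent' or 'ancient'")


# Illustrative alleles and frequency, not data from the study.
age = classify_variant_age(major="A", minor="G", outgroup="A")
print(age, derived_allele_frequency(0.02, age))
```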

An understanding of trait variation at the level of molecular mechanisms requires narrowing QTLs to the underlying causal genes. Such fine-mapping is a challenge because genetic linkage causes variants across an extended region to show mapping signals of similar strength. Statistical fine-mapping aims to address this challenge by estimating the probability that each variant within a QTL region is causal based on the precise pattern of genotype-phenotype correlations (Farh et al., 2015; Pasaniuc and Price, 2017; Treusch et al., 2015). Our crossing design enables us to obtain higher resolution for QTLs observed in two crosses that share a parent strain by looking for consistent inheritance patterns in both. Specifically, we focused on QTLs with effects greater than 0.14 SD units and used a Bayesian framework (Farh et al., 2015) to compute the posterior probability that each variant is causal (Figure 4A). We then aggregated these probabilities to obtain causality scores for each gene in a QTL. With this approach, we resolved 427 QTLs to single causal genes at an FDR of 20%. Because some QTLs have pleiotropic effects on multiple traits, this gene set contains 195 unique genes, greatly expanding the repertoire of causal genes in yeast. We searched the literature and found that 26 of the 195 genes identified here are supported by previous experimental evidence as causal for yeast trait variation (Fay, 2013; Jerison et al., 2017; Sadhu et al., 2016; Treusch et al., 2015; Wang and Kruglyak, 2014) (Figure 4B; Figure 4—source data 1). At a more stringent FDR of 5%, we found 105 unique causal genes, which included 24 of the 26 genes with experimental evidence.
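To illustrate how per-variant evidence can be turned into gene-level scores, the sketch below normalises per-variant log-likelihoods into posterior probabilities and sums them within genes. It assumes exactly one causal variant per QTL region and a uniform prior over variants, which is a common simplification in statistical fine-mapping. It is not the authors' implementation: the function names, log-likelihood values, variant identifiers, and gene labels are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict


def posterior_probabilities(log_likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Convert per-variant log-likelihoods into posterior probabilities.

    Assumes a single causal variant in the region and a uniform prior,
    so each posterior is the variant's likelihood divided by the sum of
    likelihoods. The maximum is subtracted before exponentiating for
    numerical stability.
    """
    peak = max(log_likelihoods.values())
    weights = {v: math.exp(ll - peak) for v, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def gene_scores(posteriors: Dict[str, float],
                variant_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Sum variant posteriors within each gene; unassigned variants are skipped."""
    scores: Dict[str, float] = defaultdict(float)
    for variant, prob in posteriors.items():
        gene = variant_to_gene.get(variant)
        if gene is not None:
            scores[gene] += prob
    return dict(scores)


# Hypothetical inputs: four variants in one QTL region, one of them intergenic.
log_lik = {"v1": -10.0, "v2": -12.0, "v3": -10.5, "v4": -20.0}
genes = {"v1": "GENE_A", "v3": "GENE_A", "v2": "GENE_B"}

post = posterior_probabilities(log_lik)
scores = gene_scores(post, genes)
best_gene = max(scores, key=scores.get)
print(best_gene, round(scores[best_gene], 3))
```

In a design where two crosses share a parent, evidence from both crosses would need to be combined before normalisation; how that is done is specific to the study's model and is not reproduced here.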

Figure 4


QTL fine-mapping at gene-level resolution.

(A) Statistical fine-mapping of a QTL for growth in the presence of caffeine. Genetic mapping signal, shown as the coefficient of determination between genotype and phenotype (Y-axis, left), is plotted against genome position (X-axis) for crosses between 273614N and YJM981 (black) and YJM981 and CBS2888 (blue). The posterior probability of causality (PPC), plotted in red (Y-axis, right), localizes the QTL to a portion of the gene TOR1. (B) PPC is shown as black dots for 195 genes identified as causal at an FDR of 20%, sorted by PPC. Genes containing natural variants that have been experimentally validated as causal for trait variation in prior studies (Fay, 2013; Jerison et al., 2017; Sadhu et al., 2016; Treusch et al., 2015; Wang and Kruglyak, 2014) are shown in red and labeled with gene names.

https://doi.org/10.7554/eLife.49212.014

Figure 4—source data 1. Candidate causal genes and GO enrichments. Candidate causal genes per QTL are listed. GO enrichments for causal genes are listed. https://doi.org/10.7554/eLife.49212.015

Causal genes were highly enriched for GO terms related to the plasma membrane (45 of 522, 16.5 expected, q = 1.8e-7), metal ion transport (13 of 83, 2.6 expected, q = 0.0009), and positive regulation of nitrogen compound biosynthesis (28 of 393, 12.5 expected, q = 0.0076) (Figure 4—source data 1). Strikingly, five of the six genes involved in cAMP biosynthesis were identified as causal (IRA1, IRA2, BCY1, CYR1, and RAS1; 0.19 expected, q = 0.0002). Additional genes in the RAS/cAMP signaling pathway were also identified as causal, including GPR1, which is involved in glucose sensing, SRV2, which binds adenylate cyclase, and RHO3, which encodes a RAS-like GTPase. In yeast, the RAS/cAMP pathway regulates cell cycle progression, metabolism, and stress resistance (Tisi et al., 2014). Variation in many of these genes influenced growth on alternative carbon sources. We hypothesize that the yeast population contains abundant functional variation in genes that regulate the switch from glucose to alternative carbon sources through the RAS/cAMP pathway.
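The strength of these enrichments is easier to compare as fold enrichment, that is, observed divided by expected counts. The sketch below uses only the numbers reported above. It also computes the ratio of expected count to term size, on the assumption (ours, since the background gene set is not stated here) that the expected count is the term size multiplied by the fraction of background genes called causal.

```python
# (observed causal genes, term size, expected count) as reported in the text.
terms = {
    "plasma membrane": (45, 522, 16.5),
    "metal ion transport": (13, 83, 2.6),
    "positive regulation of nitrogen compound biosynthesis": (28, 393, 12.5),
    "cAMP biosynthesis": (5, 6, 0.19),
}


def fold_enrichment(observed: float, expected: float) -> float:
    """Ratio of observed to expected counts for a GO term."""
    return observed / expected


for name, (observed, size, expected) in terms.items():
    fold = fold_enrichment(observed, expected)
    # Fraction of the term expected to be causal by chance (assumed model).
    implied_fraction = expected / size
    print(f"{name}: {fold:.1f}-fold, implied background fraction {implied_fraction:.3f}")
```

Under that assumption, all four terms imply a similar background fraction of a little over 3%, and the cAMP biosynthesis term shows by far the largest fold enrichment.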

Discussion

We previously used a cross between lab (BY) and vineyard (RM) strains of yeast to show that the majority of heritable phenotypic differences arise from additive genetic effects, and we were able to detect, at genome-wide significance, specific loci that together account for the majority of quantitative trait variation (Bloom et al., 2015; Bloom et al., 2013). It has been argued that the BY lab reference strain (commonly known as S288c) used in those and many other yeast studies is genetically and phenotypically atypical compared to other yeast isolates (Warringer et al., 2011). Our results here, obtained from crosses among 16 diverse strains, generalize these findings to the S. cerevisiae population and show that S288c is not exceptional from the standpoint of genetic variation and quantitative traits. We believe that the findings that the majority of the genetic variance of most traits is additive, and that there is little additive ‘missing heritability’ in studies with sufficiently large sample sizes, will apply broadly beyond yeast.

We discovered over 4500 quantitative trait loci (QTLs) that influence yeast growth in a wide variety of conditions. These loci likely capture the majority of common variants that segregate in S. cerevisiae and have appreciable phenotypic effects on growth, and therefore provide a comprehensive starting point for more fine-grained analyses of the genetic contribution to quantitative trait variation. We were able to localize approximately 8% of the QTLs to single genes based on genetic mapping information alone. Interestingly, these genes cluster in specific functional categories and pathways, suggesting that different strains of S. cerevisiae may have evolved different strategies for nutrient sensing and response as a function of specializing in particular environmental niches (Chantranupong et al., 2015). In addition to the findings described here, we anticipate that our data set will be a useful resource for further dissecting the genetic basis of trait variation at the gene and variant level (Peltier et al., 2019), and for evaluating statistical methods aimed at inferring causal genes and variants. In particular, the set of loci and genes identified here provides an ideal starting point for massively parallel editing experiments that directly test the phenotypic consequences of sequence variants (Shendure and Fields, 2016).

By combining our results with deep population sequencing in yeast (Peter et al., 2018), we were able to examine the contributions of variants in different frequency classes to trait variation. This approach avoids statistical confounding between variant frequency and effect size that occurs when both are measured in the same study sample. We observed a broad range of genetic architectures across the traits studied here, with variation in some traits dominated by common variants, while variation in others is mostly explained by rare variants. Overall, rare variants made a disproportionate contribution to trait variation as a consequence of their larger effect sizes. A complementary mapping approach in an overlapping set of yeast isolates also revealed enrichment of rare variants with larger effects (Fournier et al., 2019). These results are consistent with the finding from GWAS that common variants have small effects, as well as with linkage studies that find rare variants with large effect sizes. Our study design also revealed a substantial component of genetic variation—variants with low allele frequency and small effect size—that has been refractory to discovery in humans because both GWAS and linkage studies lack statistical power to detect this class of variants. Recent work in humans has suggested that rare variants account for a substantial fraction of heritability of complex traits and diseases (Wainschtein et al., 2019). Our study presents a more direct and fine-grained view of this component of trait variation and implies that larger sample sizes and more complete genotype information will be needed for more comprehensive studies in other systems.
