Refining the resolution of the yeast genotype-phenotype map using single-cell RNA-sequencing

  1. Department of Cell and Systems Biology, University of Toronto, Ramsay Wright Laboratories, 25 Harbord St, M5S3G5, Toronto, Ontario, Canada
  2. Department of Biology, University of Toronto at Mississauga, 3359 Mississauga Rd, L5L 1C5, Mississauga, Ontario, Canada
  3. Department of Molecular Genetics, University of Toronto, Medical Science Building, Room 4386, 1 King’s College Cir, M5S1A8, Toronto, Ontario, Canada
  4. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Room 230, M5S3E1, Toronto, Ontario, Canada

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Vaughn Cooper
    University of Pittsburgh, Pittsburgh, United States of America
  • Senior Editor
    Detlef Weigel
    Max Planck Institute for Biology Tübingen, Tübingen, Germany

Reviewer #1 (Public Review):

In this paper, N'Guessan et al. report a study of expression QTL (eQTL) mapping in yeast using single cells. The authors make use of advances in single-cell RNAseq (scRNAseq) in yeast to increase the efficiency with which this type of analysis can be undertaken. Building on prior research led by the senior author that entailed genotyping and fitness profiling of almost 100,000 cells derived from a cross between two yeast strains (BY and RM), they performed scRNAseq on a subset of 4,489 individual cells. To address the sparsity of genotype data in the expression profiling, they used a Hidden Markov Model (HMM) to infer genotypes and then identify the most likely known lineage genotype from the original dataset. To address the relationship between variance in fitness and gene expression, the authors partition the variance to investigate the sources of variation. They then perform eQTL mapping and study the relationship between eQTL and fitness QTL identified in the earlier study.
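To make the genotype-inference step concrete, the following is a minimal sketch of a two-state HMM (BY or RM at each marker) decoded with the Viterbi algorithm. It is an illustration of the general technique only, not the authors' pipeline: the function name `viterbi_genotypes`, the allele error rate, and the per-interval recombination probability are all assumptions chosen for the example.

```python
import math

def viterbi_genotypes(obs, err=0.01, recomb=0.01):
    """Most likely parental state (0 = BY, 1 = RM) at each ordered marker.

    obs:    list of 0 (BY allele seen), 1 (RM allele seen) or None (no coverage).
    err:    assumed probability that an observed allele is wrong.
    recomb: assumed probability of a crossover between adjacent markers.
    """
    def emit(state, o):
        if o is None:
            return 0.0  # log(1): a marker without reads is uninformative
        return math.log(1 - err) if o == state else math.log(err)

    stay, switch = math.log(1 - recomb), math.log(recomb)
    # Uniform prior over the two parental states at the first marker.
    score = [math.log(0.5) + emit(s, obs[0]) for s in (0, 1)]
    back = []  # back[i][s] = best previous state when marker i+1 is in state s
    for o in obs[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (stay if p == s else switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + emit(s, o))
        score = new_score
        back.append(ptr)
    # Trace the best path back from the last marker.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

In this sketch, uncovered markers inherit the state of their covered neighbours, and an isolated discordant allele is treated as an error rather than as a double crossover, which is the behaviour one would want from sparse scRNAseq coverage.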

This paper seeks to address the challenging question of how quantitative trait variation and expression variation are related. scRNAseq represents an appealing approach to eQTL mapping as it is possible to simultaneously genotype individual cells and measure expression in the same cell. As eQTL mapping requires large sample sizes to identify statistical relationships, this approach is likely to dramatically increase the statistical power of such studies. However, there are several technical challenges associated with scRNAseq, and the authors' study is focused on addressing those challenges. Although the authors present results suggesting the feasibility of the approach, there are limitations in the conclusions that can be drawn in the current study owing to the lack of clarity in the presentation of the results. Ultimately, this study presents a proof of concept with limited novel biological insights that would nonetheless make a useful contribution to the literature if the following major points were addressed:

(1) There is insufficient information provided about the nature of the data. At a minimum, the following information should be provided to enable assessment of the study: What is the total library size? How many genes are identified per cell? How many UMIs are found per cell? What is the doublet rate, and how are doublets identified (e.g. on the basis of heterozygous calls at polymorphic loci)? How many times is each genotype observed? How many polymorphic sites per cell form the basis of the genotype inferences?

(2) The prior study analyzed 18 different conditions, whereas this study only assays expression in a single condition. However, the power of the authors' approach is that its efficiency enables testing eQTLs in multiple conditions. The study would be greatly strengthened through analysis of at least one more condition, and ideally several more conditions. The previous fitness study would be a useful guide for choosing additional conditions as identifying those conditions that result in the greatest contrasts in fitness QTL would be best suited to testing the generalizations that can be drawn from the study.

(3) Alternatively, the authors could demonstrate the power of their approach by applying it to a cross between two other yeast strains. As the cross between BY and RM has been exhaustively studied, applying this approach to a different cross would increase the likelihood of making novel biological discoveries.

(4) Figure 1 is misleading as A presents the original study from 2022 without important details such as how genotypes were identified. It is unclear what the barcode is in this study and how it is used in the analysis. Is the barcode for each lineage transcribed so that it is identified in the scRNAseq data? Or, does the barcode in B refer to the cell index barcode? A clearer presentation and explanation of terms are needed to understand the method.

(5) The rationale for the analysis reported in Figure 2B is unclear. The fitness data are from the previous study and the goal is to estimate the heritability using the genotyping data from the scRNAseq data. What is the explanation for why the data don't agree for only one condition, i.e. 37°C? And what are we to understand from the overall result?

(6) Figure 3 presents an analysis of variance partitioning as a Venn diagram. This summarized result is very hard to understand in the absence of any examples of what the underlying raw data look like. For example, what does trait variation look like if only genotype explains the variance or if only gene expression explains the variance? The presented highly summarized data is not intuitive and its presentation is poor - the result that is currently provided would be easier to read in a table format, but the reader needs more information to be able to interpret and understand the result.

(7) I am concerned about the conclusions that can be drawn about expression heritability. The authors claim that expression heritability is correlated with expression levels. It seems likely that this reflects differing statistical power. How can this possibility be excluded?

(8) Conversely, the authors claim that the genes with the lowest heritability are genes involved in the cell cycle. However, uniquely in scRNAseq, cell cycle regulated genes appear to have the highest variance in the data as they are only expressed in a subset of cells. Without incorporating this fact one would erroneously conclude that the variation is not heritable. To test the heritability of cell cycle regulation genes the authors should partition the cells into each cell cycle stage based on expression.

(9) I do not understand Figure S5 and how eQTL sites are assigned to these specific classes given that the authors say that causative variation cannot be resolved because of linkage disequilibrium.

(10) The paragraph starting at line 305 is very confusing. In particular, the authors state that they identify a hotspot of regulation at the mating type locus. It is not obvious why this would be the case. Moreover, they claim that they find evidence for both MATa and MATalpha gene expression. Information is not provided about how segregants were isolated, but assuming that the authors did not dissect 25,000 tetrads to obtain 100,000 segregants I would infer that random spore using SGA was used. In that case, all cells should be MATa. The authors should clarify and explain this observation.

(11) Ultimately, it is not clear what new biological findings the authors have made. There are no novel findings with respect to causative variation underlying eQTLs and I would encourage the authors to make clearer statements in their abstract, introduction, and conclusion about the key discoveries. E.g. What are the "new associations between phenotypic and transcriptomic variations" mentioned in the abstract?
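Major point (1) raises the possibility of identifying doublets from heterozygous calls at polymorphic loci. As a hedged illustration of that idea for haploid segregants, the sketch below computes the fraction of covered sites at which both parental alleles are observed and flags cells above a cutoff. The function names, the minimum coverage of two UMIs, and the 0.2 threshold are assumptions made for this example, not values from the manuscript; in practice ambient RNA and sequencing errors would need to be accounted for when choosing a cutoff.

```python
def heterozygous_fraction(allele_counts, min_coverage=2):
    """Fraction of sufficiently covered sites where both alleles are seen.

    allele_counts: list of (by_umis, rm_umis) tuples, one per polymorphic site.
    Returns None when no site reaches min_coverage.
    """
    covered = [(b, r) for b, r in allele_counts if b + r >= min_coverage]
    if not covered:
        return None
    mixed = sum(1 for b, r in covered if b > 0 and r > 0)
    return mixed / len(covered)

def flag_doublet(allele_counts, threshold=0.2):
    """Flag a cell as a putative doublet (threshold is an arbitrary assumption)."""
    frac = heterozygous_fraction(allele_counts)
    return frac is not None and frac > threshold

# Hypothetical cells: a clean haploid profile and a mixed profile.
singlet = [(3, 0), (0, 2), (1, 0), (4, 0)]
doublet = [(2, 1), (1, 1), (0, 3), (2, 2)]
```

A haploid singlet should show essentially no mixed sites, so the distribution of this fraction across cells would itself be a useful summary statistic to report.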

The following minor points should be addressed:

(1) The segregants should be referred to as F2 segregants as they are derived from an F1 cross.

(2) The connections to eQTLs in other organisms should be addressed in the introduction and conclusion. For example, in humans, there has been little evidence for trans eQTLs in contrast to what has been found in yeast.

(3) The authors state that an advantage of scRNAseq over bulk is that it captures rare cell populations (line 79), but this advantage is not exploited in this study.

(4) The authors use ~5% of the lineages from the original study. There is no rationale for why this is an appropriate sample size. Is there an argument for using more cells in eQTL mapping or conversely could the authors ask if fewer cells would provide similar conclusions by downsampling?

(5) I do not agree that the use of UMIs overcomes the challenges of low sequencing depth. UMIs mitigate the possible technical artifacts due to massive PCR amplification.

(6) There is inadequate reference to prior work on scRNAseq in yeast that established the methods used by the authors, and to prior work on eQTL mapping in human cells using scRNAseq.

(7) The use of empty quotes in Figure 4A is confusing and an alternative presentation method should be used.

(8) The authors speculate about the use of predicted fitness instead of observed fitness, but this is something they could explicitly address in their current study.
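Minor point (5) distinguishes PCR amplification artifacts from low sequencing depth. A small sketch of UMI counting may help make the distinction concrete: reads sharing the same cell barcode, gene, and UMI collapse to a single molecule, so amplification duplicates are removed, but the number of distinct molecules captured per cell is not increased. The function name and the example barcodes and gene labels are hypothetical.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads to molecule counts.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per sequenced read.
    Returns {(cell_barcode, gene): number of distinct UMIs}.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

# Hypothetical reads: one molecule amplified 50 times, plus two other molecules.
reads = (
    [("c1", "G1", "AAAC")] * 50
    + [("c1", "G1", "GGTA")]
    + [("c1", "G2", "TTTT")] * 3
)
counts = count_umis(reads)
```

Here 54 reads reduce to three molecules, which shows that deduplication corrects amplification bias while leaving the underlying sparsity of the data unchanged.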

Reviewer #2 (Public Review):

Summary:

The experiments and analysis appear to be carefully done. My concerns center on the impact of the work in its current form on the research community.

The focal yeast cross here has been the subject of many previous publications (for smaller sets of recombinant progeny), by the last author and others, including phenotyping, genotyping, transcriptomics, and proteomics. This mini-literature has proven relevant to the community because it has empirically pinpointed exactly how many variants underlie a given trait, both molecular and cellular. That is, whereas in more complex organisms we try our best to estimate/infer the full genetic architecture of varying traits from the results of mapping of necessarily weaker power, the highly-powered yeast system can access a more comprehensive mapping of the dozens of loci impinging on a given trait and learn from it. The question is: what exactly do we learn from the current study?

Strengths and weaknesses:

Most of the figures center on methods development and validation for the authors' single-cell RNA-seq in the yeast cross, including generating the large raw data set; analysis pipelines for mapping and genotyping (Figure 1); and higher-level analyses that recapitulate previously reported trends in heritability (Figure 2) and eQTL mapping (Figure 3 and Figure 4B-C). One potential novelty of the study is the methods per se: that is, showing that scRNA-seq works for concomitant genotyping and gene expression profiling in the natural variation context. The authors' rigor and effort notwithstanding, in my view this advance is modest in terms of principles. That is, the authors did a good job putting the scRNA-seq idea into practice, but their success is perhaps not surprising or highly relevant for work outside of yeast (as the discussion says).

The more substantive claim by the authors for the impact of the study is that they make new observations about the role of expression in phenotype (lines 333-335). The major display item of the manuscript on this theme is Figure 4A, reporting which loci that control growth phenotype (from an earlier paper) also control expression. This is solid but I regret to say that the results strike me as modest. The discussion makes some fairly big claims that the work has helped "bridge understanding of how genetic variation influences transcriptomic variation" and ultimately cellular phenotype. But with the data as they stand, the authors have missed an opportunity to crystallize exactly how a given variant affects expression (perhaps in waves of regulators affecting targets that affect more regulators) and then phenotype, except for the speculations in the text on lines 305-319. The field started down this road years ago with Bayesian causality inference methods applied to eQTL and phenotype mapping (via e.g. the work of Eric Schadt). The authors could now try Mendelian randomization-type fine-grained detailed models for more firepower toward the same end, and/or experimental tests of the genotype-to-expression-to-phenotype relationship. I would see these directions, motivated by fundamental questions that are relevant to the field at large, as leading to a major advance for this very crowded field. As it stands, I felt their absence in this manuscript, especially if the authors are selling principles about linking expression and phenotype as their take-home.

I also wonder whether the co-mapping of expression and growth traits in Figure 4A would have been possible with e.g. the bulk RNA-seq from Albert et al., 2018, and I recommend that the authors repeat the Figure 4A-type analyses with the latter to justify their statement that their massive scRNA-seq data set would actually be necessary for them to bear fruit (lines 386-388).

I also read the discussion of the manuscript as bringing to the fore some of the challenges a reader has in judging the current state of the results to be of actionable impact. The discussion, and the manuscript, will be improved if the authors can put the work in context, posing concrete questions from the field and stating how they are addressed here and what's left to do.
