Comparative analysis of multiplexed in situ gene expression profiling technologies

  1. New York Genome Center, New York, NY, USA
  2. Center for Genomics and Systems Biology, New York University, New York, NY, USA

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Editors

  • Reviewing Editor
    Jungmin Choi
    Korea University, Seoul, Republic of Korea
  • Senior Editor
    Murim Choi
    Seoul National University, Seoul, Republic of Korea

Reviewer #1 (Public Review):

Summary:

Hartman and Satija's manuscript constitutes a significant contribution to the field of imaging-based spatial transcriptomics (ST) through its comprehensive comparative analysis of six multiplexed in situ gene expression profiling technologies. The findings provide invaluable insights into the practical considerations and performance of these methods, offering robust evidence for researchers seeking an optimal ST technology. However, given that several similar preprints appeared simultaneously, readers should exercise caution when comparing findings across them. The authors should therefore revise their manuscript to ensure consistency among all compared ST technologies, taking findings from the other preprints into account where possible.

Strengths:

(1) The manuscript offers a comprehensive and systematic comparison of six in situ gene expression profiling technologies, including both commercially available and academically developed methods, making it the most extensive study of its kind in this field.

(2) The authors propose novel metrics that mitigate the influence of molecular artifacts and off-target signals, improving the accuracy of sensitivity and specificity comparisons across datasets. By emphasizing the importance of evaluating both sensitivity and specificity, the study addresses the difficulty of comparing standard metrics, such as the number of unique molecules detected per cell, given variations in panel composition and off-target molecular artifacts. This point connects directly to their use of alternative cell segmentation methods to improve specificity.

(3) Building on these analyses, the authors illustrate how molecular false positives can distort spatially aware differential expression analysis, underscoring the need for caution when interpreting downstream results.

(4) By offering guidance on the selection, processing, and interpretation of in situ spatial technologies, the study equips researchers in the field with valuable insights.

Weaknesses:

(1) Although focusing on mouse brain datasets enables a broad comparison of technologies, it confines the study to a single biological context. Discussing the limitations of this approach and advocating for future studies in diverse tissue types, especially clinical FFPE applications, would enrich the manuscript.

(2) Providing more explicit details on the criteria used to select the dataset for each technology would ensure a fair and unbiased comparison. Otherwise, the study risks reading like a hall of fame of champion datasets advertising a particular commercial product.

(3) The discussion would be enriched by addressing the origins of non-specific signals and molecular artifacts, alongside the challenges of cell segmentation across different tissue types and cell morphologies. Note that all of these datasets were obtained from thin mouse brain sections, which remain three-dimensional despite being only 10-20 µm thick. As a result, cells may partially overlap along the z-axis, potentially leading to transcript mixing. Additionally, many cells are cut during sectioning, so their measured transcriptomes are inherently partial, which makes a direct comparison to scRNA-seq unfair. These aspects should be addressed to ensure a fair comparison.

(4) Expanding on the implications of these findings for developing new computational methods that address non-specific biases in downstream analyses would augment the manuscript's impact and relevance.

Reviewer #2 (Public Review):

Summary:

In the manuscript, Hartman et al. present a detailed comparison of six distinct multiplexed in situ gene expression profiling technologies, including both academic and commercial systems.

The central idea of the study is to evaluate publicly accessible mouse brain datasets provided by the platforms' developers, where optimal performance in showcasing each technology is expected. The authors stress the difficulty of making comparisons with standard metrics, such as the count of total molecules per cell, given the differences in gene panel sizes across platforms. To enable a fairer comparison, the authors devised a specificity metric called MECR, the average co-expression rate of gene pairs expected to be mutually exclusive in the sample. The authors found that this rate depends largely on the choice of cell segmentation method, and therefore reanalyzed five of these datasets (excluding STARmap PLUS, due to the lack of molecule location information) with an independent cell segmentation algorithm (Baysor). Based on this reanalysis, the authors clearly suggest the best-performing platform at the end of the manuscript.
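
To make the metric concrete, below is a minimal sketch of how a mutually exclusive co-expression rate could be computed. This is an illustration under stated assumptions, not the authors' implementation: the cell-by-gene matrix layout, the conditioning on cells expressing either gene of a pair, the example gene pairs, and the function name are all hypothetical.

```python
import numpy as np
import pandas as pd

def mecr(counts: pd.DataFrame, exclusive_pairs: list[tuple[str, str]]) -> float:
    """Average co-expression rate over gene pairs assumed to be mutually
    exclusive (e.g., markers of disjoint cell types).

    counts: cells x genes matrix of molecule counts.
    exclusive_pairs: (gene_a, gene_b) pairs that should never be detected
    in the same cell; co-detection indicates segmentation errors or
    off-target signal.
    """
    rates = []
    for gene_a, gene_b in exclusive_pairs:
        a = counts[gene_a].to_numpy() > 0
        b = counts[gene_b].to_numpy() > 0
        either = (a | b).sum()
        if either == 0:
            continue  # neither gene detected; the pair is uninformative
        # Fraction of cells expressing both genes, among cells expressing either.
        rates.append((a & b).sum() / either)
    return float(np.mean(rates)) if rates else float("nan")

# Hypothetical usage: a lower MECR implies higher specificity.
# counts = pd.read_csv("cell_by_gene.csv", index_col=0)
# print(mecr(counts, [("Slc17a7", "Gad1"), ("Aqp4", "Cx3cr1")]))
```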

Strengths:

I consider that the paper is a valuable contribution to the community, for the following two reasons:

(1) As the authors mention, I fully agree that the spatial transcriptomics community needs better metrics for cross-technology comparison than traditional ones such as molecule counts per cell. In that regard, I believe the introduction of the new MECR metric is quite valuable.

(2) This work highlights how results differ depending on the cell segmentation chosen for each platform, which suggests that multiple segmentation algorithms should be tried to obtain reliable results. I believe this is an urgent warning that should be disseminated throughout the community as soon as possible.

Weaknesses:

I disagree with the conclusion of the manuscript where the authors compare the technologies and suggest the best-performing ones, because of the following major points:

(1) As the authors mention, MECR is a measure of "specificity", not "sensitivity"; the comparison of sensitivity was instead done with mean counts per cell (Figure 3e). However, I strongly disagree with using mean counts per cell as a measure of sensitivity, because the comparison was performed across different gene panels. Counts per cell can depend heavily on the choice of genes, particularly because of optical crowding.

(2) The authors compared sensitivity based on Baysor cell segmentation, but Baysor uses spatial gene expression for segmentation, which itself depends on the sensitivity of the platform. Assessing sensitivity with an algorithm whose output depends on that same sensitivity is therefore circular.

Author response:

We thank both reviewers for their constructive feedback. We were grateful to see that both reviewers found our work valuable to the field and agreed that new metrics (including our proposed MECR) are important for dataset evaluation. We briefly respond to two main points raised by the reviewers.

(1) Key findings from our manuscript. While we do evaluate publicly available datasets in our manuscript, the goal of our work is not to produce a definitive ranking of in situ technologies. As the reviewers point out, our comparative evaluation covers only a single biological context, and we further note that many of these in situ platforms are rapidly evolving with new chemistries and gene panels.

Instead, the purpose of our manuscript is to emphasize the need for new metrics when evaluating spatial datasets. We propose one such metric and demonstrate how cell segmentation affects not only technical metrics but also downstream biological analysis of in situ datasets.

(2) Comparing technologies with different gene panels. The reviewers correctly point out that comparing technologies that use different gene panels is not a perfect benchmark. We agree that differences in molecular counts could arise from biological differences in the abundance of the targeted genes.

We addressed this in Supplementary Figure 4, where we perform pairwise comparisons between technologies, computing metrics only on the overlapping genes measured by both technologies. These results are consistent with the analysis of the full gene panels.
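
To illustrate what such an overlap-restricted comparison might look like, here is a minimal sketch of comparing mean counts per cell on a shared gene panel. It is an assumption-laden illustration, not the analysis code behind Supplementary Figure 4; the function name, file names, and matrix layout are hypothetical.

```python
import pandas as pd

def mean_counts_on_shared_genes(counts_a: pd.DataFrame,
                                counts_b: pd.DataFrame) -> tuple[float, float]:
    """Compare per-cell sensitivity of two platforms using only the genes
    present in both panels. Inputs are cells x genes count matrices."""
    shared = counts_a.columns.intersection(counts_b.columns)
    if len(shared) == 0:
        raise ValueError("The two gene panels do not overlap.")
    # Mean total molecules per cell, restricted to the shared panel.
    mean_a = counts_a[shared].sum(axis=1).mean()
    mean_b = counts_b[shared].sum(axis=1).mean()
    return float(mean_a), float(mean_b)

# Hypothetical usage with two platforms' cell-by-gene tables:
# a = pd.read_csv("platform_a_cell_by_gene.csv", index_col=0)
# b = pd.read_csv("platform_b_cell_by_gene.csv", index_col=0)
# print(mean_counts_on_shared_genes(a, b))
```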

While we believe that regenerating in situ datasets with identical gene panels is beyond the scope of this work (and likely technically infeasible), we hope our findings remain valuable and informative to the growing spatial community.
