Affected cell types for hundreds of Mendelian diseases revealed by analysis of human and mouse single-cell data

Abstract

Mendelian diseases tend to manifest clinically in certain tissues, yet their affected cell types typically remain elusive. Single-cell expression studies showed that overexpression of disease-associated genes may point to the affected cell types. Here, we developed a method that infers disease-affected cell types from the preferential expression of disease-associated genes in cell types (PrEDiCT). We applied PrEDiCT to single-cell expression data of six human tissues, to infer the cell types affected in Mendelian diseases. Overall, we inferred the likely affected cell types for 328 diseases. We corroborated our findings by literature text-mining, expert validation, and recapitulation in mouse corresponding tissues. Based on these findings, we explored characteristics of disease-affected cell types, showed that diseases manifesting in multiple tissues tend to affect similar cell types, and highlighted cases where gene functions could be used to refine inference. Together, these findings expand the molecular understanding of disease mechanisms and cellular vulnerability.

Editor's evaluation

The study presents analyses linking cell-types to monogenic disorders using over-expression of known disease-associated genes in single-cell data to identify disease-affected cell types for 328 Mendelian diseases. Overall, this important study combines multiple data analyses to quantify the connection between cell types and human disorders. Compelling analyses using stringent and rigorous statistical methodologies support the conclusions of this study.

https://doi.org/10.7554/eLife.84613.sa0

Introduction

Hereditary diseases affect 6% of the world population and typically lack a cure. Identification of their genetic and molecular basis is challenging, limiting their diagnosis and treatment (Ferreira, 2019). Knowledge of the tissues and cell types that manifest with pathophysiological changes in patients was shown to facilitate the understanding of disease mechanisms (Hekselman and Yeger-Lotem, 2020). For example, transcriptomic analysis of muscle tissues helped to genetically diagnose patients with rare muscle disorders (Cummings et al., 2017). Likewise, identification of a rare cell type affected by cystic fibrosis opened new avenues for treatment (Montoro et al., 2018; Plasschaert et al., 2018). However, whereas affected tissues may be evident for many Mendelian diseases, the exact cell types that manifest with pathophysiological changes within affected tissues are often unknown.

Owing to massive single-cell profiling of mammalian tissues (Jones et al., 2022), investigation of diseases in cellular contexts has become feasible. In particular, it was shown that genes whose aberration leads to disease (i.e., disease genes) tend to be expressed preferentially in cell types that express pathology (denoted disease-affected cell types). For instance, 21/29 mouse homologs of human disease genes associated with nephrotic syndrome were shown to be upregulated in podocytes, the disease-relevant renal cell type (Park et al., 2018). In an inspiring manner, cell-type-specific expression of the cystic fibrosis gene CFTR in ionocytes of human and mouse airways revealed their role in cystic fibrosis (Montoro et al., 2018; Plasschaert et al., 2018). Likewise, the preferential expression of disease genes for 18 Mendelian muscle disorders revealed disease-affected muscle cell types (Eraslan et al., 2022). Thus, preferential expression of disease genes may serve as an indicator of disease-affected cell types, and has the potential to shed light on the cellular mechanisms that underlie Mendelian diseases.

Preferential expression of disease- and trait-associated genes has been used to illuminate cell types affected by complex traits (Dai et al., 2021; Eraslan et al., 2022; Jagadeesh et al., 2021; Kim-Hellmuth et al., 2020; Rouhana et al., 2021; Zhang et al., 2022). Rouhana et al., 2021, presented the ECLIPSER method, which tested whether the expression of genes mapped to trait-associated loci was enriched in specific cell types of a given tissue, when compared to a background set of genes associated with unrelated traits. The CSEA-DB repository applied a cell-type-specific enrichment analysis (CSEA): cell-type specificity of genes was assessed using t-statistics, and the top 5% cell-type-specific genes were then tested for overrepresentation among trait-associated genes (Dai et al., 2021). Eraslan et al., 2022, associated traits with relevant cell types by defining cell-type-specific gene modules and assessing their overlap with trait-associated genes. Zhang et al., 2022, introduced a polygenic single-cell disease relevance score (scDRS), which identified cells exhibiting excess expression of trait-associated genes. Interestingly, the expression of trait-associated genes was weighted by trait association and inversely by the technical noise level in single-cell expression data, and the resulting scDRS was compared to the empirical distribution of control gene sets and all cells. Jagadeesh et al., 2022, presented sc-linker, which constructed continuous-valued gene sets that were differentially expressed in a cell type from healthy samples versus other cell types or versus the same cell type from disease samples, and then linked them to trait-related SNPs. Additional studies compared the differential expression of genes in cell types of diseased versus healthy tissues (Mathys et al., 2019; Segerstolpe et al., 2016) and showed that many trait-associated genes were cell-type specific (Smillie et al., 2019). Guan et al., 2021, defined cell-type-specific gene interaction modules, which were overlapped with disease-associated genes. Lastly, the SC2disease database compared gene expression differences between cell types of diseased and healthy samples, among cell types of diseased samples, and among cell types of samples with varying degrees of disease severity (Zhao et al., 2021). However, these efforts rarely validated or corroborated the inferred associations extensively.

Here, we developed a method that infers disease-affected cell types from the Preferential Expression of Disease genes in Cell Types (PrEDiCT). The summarized preferential expression of disease genes was compared to the empirical distribution of control gene sets in the same cell type, allowing for the inference of associations between diseases and likely affected cell types. We applied PrEDiCT to 1,140 Mendelian diseases that manifest in six human tissues and inferred likely affected cell types for 328 diseases. We corroborated our findings by text-mining of PubMed records, expert curation, and their recapitulation in mice. The resulting scheme and large-scale resource allowed us to show that diseases manifesting in multiple tissues tend to affect similar cell types, to refine inference of disease-affected cell types by focusing on gene functions, and to explore characteristics of disease-affected cell types.

Results

Preferential expression of disease genes indicates disease-affected cell types

To identify the cell types affected by Mendelian diseases with known disease genes, we developed the PrEDiCT scoring scheme (Figure 1A). Below we describe the PrEDiCT workflow, including data acquisition, PrEDiCT score calculation, and validation.

Figure 1 with 4 supplements

Overview of PrEDiCT calculation and assessment.

(A) The PrEDiCT workflow. In step 1, we analyzed single-cell expression data from six human tissues and 129 cell types, and associated 1,140 Mendelian diseases with their affected tissues. In step 2, we calculated the preferential expression of disease genes in cell types of disease-affected tissues, used their median to produce the PrEDiCT score per disease and cell type, and assessed significance of each score. In step 3, we validated likely disease–cell-type associations (i.e., PrEDiCT ≥1, FDR <0.1) via literature text-mining, expert curation, and analysis of mouse single-cell expression data. (B) The distribution of PrEDiCT scores in human (median –0.25±0.93). (C) The preferential expression of genes causal for primary ciliary dyskinesia (PCD) and the PrEDiCT scores of PCD in lung cell types. Preferential expression values and the percentage of cells expressing a gene are indicated by the color and the size of each circle, respectively. The resulting PrEDiCT score is displayed on the right, colored by the score value. Bold outline marks likely disease-affected cell types. (D) PrEDiCT scores of Heinz body anemias across human bone marrow cell types depicted as described in panel C. (E) PrEDiCT scores of mitochondrial complex deficiencies across human skeletal muscle cell types depicted as described in panel C. Mitochondrial complex deficiencies were likely to affect slow and fast muscle cells, except for mitochondrial complex II deficiency whose PrEDiCT scores were highest yet insignificant in these cell types (FDR = 0.61 and 0.23, respectively). (F) False-positive (red) and false-negative (orange) rates of disease–cell-type associations (y-axis) across FDR thresholds (x-axis). Rates were estimated based on literature-supported pairs. The dashed line marks the FDR cutoff for likely associations. 
(G) The overlap between likely disease–cell-type associations (total of 489, left circle) and literature-supported associations (total of 229, right circle) out of all 34,249 possible associations. The overlap of 41 associations was significant (p<E-15, Fisher’s exact test), supporting the validity of likely associations.

Figure 1—source data 1 The Source Data file contains data used to generate Figure 1B-G.: https://cdn.elifesciences.org/articles/84613/elife-84613-fig1-data1-v2.xlsx

Data of annotated human single-cell transcriptomes were obtained from Tabula Sapiens (Jones et al., 2022). We focused on tissues with two or more samples with ≥800 sequenced cells that were also sequenced in mice [(Tabula Muris, 2018); Methods]. The tissues included bone marrow, lung, skeletal muscle, spleen, tongue and trachea, and altogether comprised 129 cell types. Next, we associated these tissues with Mendelian diseases that affect them, based on their phenotypic abnormality annotations in the Human Phenotype Ontology (HPO) database [(Köhler et al., 2021); Methods]. To assess the reliability of HPO annotations, we compared them to manually curated annotations that were available in the ODiseA database [(Hekselman et al., 2022); Methods]. This comparison confirmed 490 of 649 (76%) annotations available for these diseases in ODiseA, supporting the usage of HPO as an indicator of the disease-affected tissue. We then gathered the respective disease genes from the Online Mendelian Inheritance in Man (OMIM) database (Amberger et al., 2019). Overall, 1,140 diseases and 2,434 disease genes were associated with at least one affected tissue, with the majority associated with skeletal muscle (Figure 1—figure supplement 1 and Supplementary file 1).

To calculate PrEDiCT scores, we computed the preferential expression of disease genes in each cell type relative to other cell types of the disease-affected tissue (Methods). Next, we set the PrEDiCT score of a disease in each cell type to the median preferential expression of the respective disease genes. Statistical significance of the score was determined using a permutation test (Methods; Figure 1—figure supplement 2). In total, we analyzed 34,249 disease–cell-type associations (Supplementary file 2). PrEDiCT score distribution is shown in Figure 1B.
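The scoring step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact preferential-expression measure and permutation procedure are defined in Methods, so the per-gene z-score across cell types, the sampling of control gene sets from all genes, and every gene and cell-type name below are assumptions made only for illustration.

```python
import random
import statistics

def preferential_expression(mean_expr):
    """Map {gene: {cell_type: mean expression}} to per-gene z-scores across cell types.

    ASSUMPTION: a z-score across the cell types of one tissue stands in for the
    preferential-expression measure defined in Methods.
    """
    pref = {}
    for gene, by_cell in mean_expr.items():
        values = list(by_cell.values())
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        pref[gene] = {ct: (v - mu) / sd if sd > 0 else 0.0 for ct, v in by_cell.items()}
    return pref

def predict_score(pref, genes, cell_type):
    """Median preferential expression of a gene set in one cell type."""
    return statistics.median(pref[g][cell_type] for g in genes)

def permutation_p_value(pref, genes, cell_type, n_perm=1000, seed=0):
    """Empirical p-value against random control gene sets of the same size.

    ASSUMPTION: controls are sampled uniformly from all genes in the tissue.
    """
    rng = random.Random(seed)
    observed = predict_score(pref, genes, cell_type)
    all_genes = sorted(pref)
    hits = 0
    for _ in range(n_perm):
        control = rng.sample(all_genes, len(genes))
        if predict_score(pref, control, cell_type) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy tissue: three "disease genes" peak in ciliated cells,
# seventeen background genes are lowest in ciliated cells.
mean_expr = {f"DG{i}": {"ciliated": 10, "basal": 1, "goblet": 1, "club": 1} for i in range(3)}
for i in range(3, 20):
    mean_expr[f"BG{i}"] = {"ciliated": 1, "basal": 1 + i % 3, "goblet": 2, "club": 1 + i % 2}

pref = preferential_expression(mean_expr)
disease_genes = ["DG0", "DG1", "DG2"]
score, p_value = permutation_p_value(pref, disease_genes, "ciliated")
print(f"PrEDiCT-like score = {score:.2f}, empirical p = {p_value:.3f}")
```

In a full analysis, the p-values of all disease–cell-type pairs would additionally be corrected for multiple testing before applying the FDR cutoff; that step is omitted here.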

We identified 489 (1.4%) ‘likely’ associations (PrEDiCT ≥ 1, FDR <0.1), covering 328/1,140 diseases and 102/129 cell types. Although most likely associations involved diseases with a single disease gene, the fraction of diseases with likely associations was higher for diseases with multiple disease genes (Figure 1—figure supplement 3). As proof-of-concept, primary ciliary dyskinesia (PCD), which has 31 disease genes expressed in lung and is characterized by damaged ciliary machinery in lung ciliated cells (Leigh et al., 2019), was indeed likely associated with lung ciliated cells (Figure 1C). Additionally, Heinz body anemias, which are characterized by accumulation of inclusion bodies in erythrocytes (Herman et al., 2023), were indeed likely associated with bone marrow erythrocytes (Figure 1D). Lastly, compatible with the high demand for mitochondrial activity in muscle cells, four out of five mitochondrial complex deficiencies were likely associated with slow and fast muscle cells (Figure 1E). ‘Mitochondrial complex II deficiency’ also scored highest in slow and fast muscle cells, yet its PrEDiCT scores were insignificant, potentially due to the stringency of our statistical analysis (Figure 1E). As a negative control, gracile bone dysplasia, which might manifest with ankyloglossia (also known as ‘tongue-tie’), is not expected to affect cell types of tongue tissue, and indeed no likely affected cell type was detected (Supplementary file 2). Likewise, platelet-type bleeding disorder, which affects bone-marrow–derived platelets, had no likely affected cell type in bone marrow, since platelets and their precursor cells (megakaryocytes) were missing from the data of that tissue.

To assess at a larger scale whether likely associations indicate disease-affected cell types, we turned to literature text-mining and to expert annotations. For text-mining, we postulated that diseases and their affected cell types co-appear in the literature more frequently than expected by chance. To estimate co-appearance, we extracted PubMed records mentioning a disease or a cell type in our dataset and tested the significance of the co-appearance of all disease–cell-type pairs (Methods). Records were extracted using the Biopython package, which retrieves up to 9,999 records per term (Cock et al., 2009). We identified 229 ‘literature-supported’ pairs that co-appeared in the literature more often than expected by chance (adjusted p<0.001, Chi-squared test; Supplementary file 3). PCD, for example, co-appeared significantly with lung ciliated cells, as well as with respiratory goblet, mucous, and basal cells (the latter are precursors of the other three). Based on literature-supported pairs, we estimated the false-positive and false-negative rates. The false-positive rates for likely associations were low (Methods; Figure 1F). Likely disease–cell-type associations were enriched for literature-supported pairs (41/489, 8.4%) relative to all disease–cell-type associations (229/34,249, 0.7%; p<E-15, Fisher’s exact test; Figure 1G). Lastly, we repeated the above analyses using expert-curated annotations of disease-affected cell types, showing a low false-positive rate (Figure 1—figure supplement 4A) and a higher enrichment for expert-curated associations (p=1.3E-4, Fisher’s exact test; Figure 1—figure supplement 4B). Altogether, these results support likely associations as indicators of disease-affected cell types.
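The enrichment reported above can be checked directly from the four counts given in the text (41 overlapping pairs, 489 likely associations, 229 literature-supported pairs, 34,249 possible associations). The sketch below computes the fold enrichment and a one-sided Fisher's exact p-value as a hypergeometric upper tail, using only the Python standard library. The choice of a one-sided test is an assumption, as the text does not state the sidedness, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_upper_tail(overlap, likely, supported, total):
    """P(X >= overlap) when `likely` pairs are drawn without replacement from
    `total` pairs, `supported` of which are literature-supported.
    This equals the one-sided Fisher's exact p-value for the 2x2 table."""
    upper = min(likely, supported)
    tail = sum(
        comb(supported, k) * comb(total - supported, likely - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic first, a single float conversion at the end.
    return float(Fraction(tail, comb(total, likely)))

# Counts taken from the text.
overlap, likely, supported, total = 41, 489, 229, 34_249

fold_enrichment = (overlap / likely) / (supported / total)
p_value = hypergeom_upper_tail(overlap, likely, supported, total)
print(f"fold enrichment = {fold_enrichment:.1f}, one-sided p = {p_value:.2e}")
```

Summing exact binomial coefficients avoids the underflow that a naive floating-point product would hit for a tail probability this small.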

Disease-affected cell types are recapitulated in mouse

To further assess whether PrEDiCT scores indicate disease-affected cell types, we tested whether matching cell types were likely affected in mice. For this, we downloaded mouse single-cell transcriptomes for the six tissues from Tabula Muris, 2018. These data consisted of 46 annotated cell types and some unannotated subsets (Figure 2A).

Figure 2 with 1 supplement

Recapitulation of disease-affected cell types in mouse.

(A) The number of human cell types annotated by Tabula Sapiens [(Jones et al., 2022); red], and the number of mouse cell types annotated by [Tabula Muris, 2018; grey] and this study (blue). (B) The distribution of PrEDiCT scores in mouse. (C) The preferential expression of mouse orthologs of PCD disease genes and the PrEDiCT scores of PCD in mouse lung cell types. Preferential expression values and the percentage of cells expressing a gene are indicated by the color and the size of each circle, respectively. The resulting PrEDiCT score is indicated on the right colored by the score value. Bold outline marks likely disease-affected cell types. (D) PrEDiCT scores of Heinz body anemias across mouse bone marrow cell types depicted as described in panel C. (E) PrEDiCT scores of mitochondrial complex deficiencies across mouse skeletal muscle cell types depicted as described in panel C. Mitochondrial complex deficiencies were likely associated with striated muscle cells, except for mitochondrial complex II deficiency whose PrEDiCT scores were highest yet insignificant in these cell types (FDR = 0.65). (F) The correlation between PrEDiCT scores in human (X-axis) and mouse (Y-axis) cell types. Each dot represents a distinct pair. PrEDiCT scores of non-matching cell types did not correlate (left; r=−0.02, Spearman correlation), in contrast to PrEDiCT scores of matching cell types (right; r=0.38, p<E-15). (G) The cell types affected by the same disease in human and mouse tended to match each other (green) more than expected by chance (grey) according to 1,000 repeats in a permutation test. Error bars represent the standard deviation of the number of randomly matching cell types between the species. Adjusted **p<0.01 and ***p<0.001, permutation test.

Figure 2—source data 1 The Source Data file contains data used to generate Figure 2A-G.: https://cdn.elifesciences.org/articles/84613/elife-84613-fig2-data1-v2.xlsx

To improve the annotation of cell types, we reanalyzed the single-cell mouse transcriptomes (Methods). Altogether, we obtained 97 cell clusters across all six tissues (Supplementary file 4). Sixteen clusters overlapped considerably with cell types annotated by Tabula Muris and were thus annotated similarly. We annotated the 81 remaining clusters by careful examination of the expression of known cell-type marker genes (Methods), thereby improving cell type annotation per tissue (Figure 2A). For instance, the number of annotated cell types in skeletal muscle increased from six to 19. Newly annotated cell types included clinically relevant cell types, such as basement-membrane residing fibroblasts that are a main source of different collagens and are essential for skeletal muscle physiology (Kivirikko et al., 1995; Zou et al., 2008). Next, we aimed to identify similar cell types between human and mouse. This was based on expression of orthologous marker genes and the matchSCore2 package [Methods; (Mereu et al., 2020)]. As expected, a large variety of human cell types had a matching mouse cell type (82/129) and vice versa (70/97; Supplementary file 5). The impact of disease genes on the matching was negligible, as shown by repeating the identification of similar cell types upon excluding disease genes from the list of marker genes (Methods).

Next, we tested whether diseases affected matching cell types between human and mouse. For this, we calculated PrEDiCT scores in mouse cell types, based on expression of mouse orthologs of human disease genes. The distribution of PrEDiCT scores was similar between mouse and human, and included 380/24,638 (1.5%) likely disease–cell-type associations (PrEDiCT ≥ 1, FDR < 0.1; Figure 2B; Supplementary file 6). Compatible with our previous proof-of-concept cases in human, PCD likely affected mouse lung ciliated epithelial cells, which match human lung ciliated cells (Figure 2C), and Heinz body anemias likely affected mouse bone marrow reticulocytes and erythroblasts, which match human erythrocytes (Figure 2D). Likewise, four mitochondrial complex deficiencies likely affected mouse striated muscle cells, which matched human slow and fast muscle cells (Figure 2E). Similar to the results in human, ‘mitochondrial complex II deficiency’ scored highest in these cell types, yet the scores were insignificant (Figure 2E). We compared the PrEDiCT scores of all cell-type pairs of the corresponding tissues between the species. Whereas PrEDiCT scores of non-matching cell types did not correlate (r=−0.02, Spearman correlation), PrEDiCT scores of matching cell types were modestly correlated (r=0.38, p<E-15, Spearman correlation; Figure 2F). Of the 328 diseases with likely affected cell types in humans, 97 diseases (30%) affected matching cell types in mice, a fraction that was larger than expected by chance (adjusted p<0.01, permutation test; Figure 2G and Supplementary file 6; Methods). This enrichment, and the observation that commonly affected cell types were not biased toward specific cell types, support the validity and generality of the PrEDiCT scheme.

Diseases with multiple inflicted tissues affect similar cell types

Most diseases with a likely affected cell type manifested clinically in two or more tissues, and were denoted multi-tissue diseases (168/328, 51%; Figure 3A). We suspected that these diseases affect a cell type that is similar between the disease-manifesting tissues. To test this hypothesis, we identified matching cell types between tissues, and then examined whether cell types affected by the same disease were enriched for matching cell types. Matching cell types were identified using the matchSCore2 package [Methods; (Mereu et al., 2020)]. Overall, we identified 840 pairs of matching cell types between tissues, 52% of which were immunocytes (including macrophages, neutrophils, T cells, and B cells), alongside endothelia and fibroblasts. We found that 18% (30/168) of the diseases likely affected at least one pair of matching cell types, a fraction higher than expected by chance (adjusted p<0.01, permutation test; Figure 3B and Figure 3—figure supplement 1; Methods). For instance, chronic granulomatous disease (CGD) results in splenomegaly and pneumonia due to impaired phagocytes (i.e., neutrophils, macrophages, and monocytes; Anjani et al., 2020; Leiding and Holland, 1993). These cell types were indeed the likely affected cell types for CGD in both spleen and lung (Figure 3C). Also, CGD likely affected bone-marrow phagocytes, consistent with bone-marrow transplant being a curative treatment for this disease (Leiding and Holland, 1993).
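The permutation test for matching cell types between two tissues can be sketched as follows. This is a hedged illustration rather than the procedure in Methods: it assumes that the null model replaces each disease's likely affected cell types in the first tissue with the same number of randomly chosen cell types from that tissue, and all disease names, cell-type lists, and matches below are hypothetical toy data.

```python
import random

def count_matching_diseases(affected_a, affected_b, matches):
    """Number of diseases whose affected cell types in tissues A and B
    include at least one matching (A cell type, B cell type) pair."""
    return sum(
        any((a, b) in matches for a in affected_a[d] for b in affected_b.get(d, ()))
        for d in affected_a
    )

def permutation_test(affected_a, affected_b, cell_types_a, matches, n_perm=1000, seed=0):
    """Empirical p-value for the observed count.

    ASSUMPTION: under the null, each disease affects randomly chosen cell
    types of tissue A, keeping the number of affected cell types fixed.
    """
    rng = random.Random(seed)
    observed = count_matching_diseases(affected_a, affected_b, matches)
    hits = 0
    for _ in range(n_perm):
        shuffled = {d: rng.sample(cell_types_a, len(cts)) for d, cts in affected_a.items()}
        if count_matching_diseases(shuffled, affected_b, matches) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data for two tissues, A and B.
cell_types_a = ["neutrophil", "macrophage", "monocyte", "B cell", "T cell",
                "NK cell", "plasma cell", "endothelial cell", "erythrocyte", "fibroblast"]
matches = {("neutrophil", "neutrophil"), ("macrophage", "macrophage"), ("monocyte", "monocyte")}
affected_a = {"disease_1": ["neutrophil", "macrophage"], "disease_2": ["B cell"], "disease_3": ["monocyte"]}
affected_b = {"disease_1": ["neutrophil", "monocyte"], "disease_2": ["fibroblast"], "disease_3": ["monocyte"]}

observed, p_value = permutation_test(affected_a, affected_b, cell_types_a, matches)
print(f"diseases with matching affected cell types: {observed}, empirical p = {p_value:.3f}")
```

Running the same test with the roles of the two tissues swapped would give the second randomization shown in Figure 3B.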

Figure 3 with 1 supplement

Multi-tissue diseases tend to affect similar cell types in those tissues.

(A) The numbers of diseases with likely affected cell type across tissues (Y-axis) grouped by the number of affected tissues (X-axis). (B) The number of diseases that likely affect at least one pair of matching cell types between tissues (green). This number was higher than expected by chance (dark and light grey correspond to selecting cell types at random from the first tissue or the second one, respectively) according to 1,000 repeats in a permutation test. Only pairs of tissues with ≥2 shared diseases are shown. Error bars represent the standard deviation of the number of randomly matching cell types between the tissues. Shown are maximal adjusted p-value for each pairwise randomization: **p<0.01, ***p<0.001. (C) PrEDiCT scores of cell types affected by chronic granulomatous disease (CGD) in spleen, lung, and bone marrow. Bold outline marks likely affected cell types. Likely affected cell types that were matching among the tissues were connected by green lines.

Figure 3—source data 1 The Source Data file contains data used to generate Figure 3A-C.: https://cdn.elifesciences.org/articles/84613/elife-84613-fig3-data1-v2.xlsx

Cellular context prediction refined using gene functions

So far, we assumed that the genes underlying a disease work through a single cellular context. However, the same disease phenotype might arise from mechanisms with distinct cellular contexts. For example, the bone-marrow disease hyper-IgM immunodeficiency could arise from mutations in the CD40 gene, affecting B cells, or from mutations in the CD40 ligand gene (CD40LG), affecting CD4 T cells (Yazdani et al., 2019). In such cases, the PrEDiCT scheme might fail to infer all the affected cell types. Indeed, only CD4 T cells were predicted as likely affected by hyper-IgM immunodeficiency (PrEDiCT = 1.6, FDR <0.048). We hypothesized that the cellular context of such ‘multi-cellular diseases’ could be revealed by separately applying the PrEDiCT scheme to subsets of disease genes with distinct functions.

We first focused on diseases that alter intercellular communication. We identified seven diseases in our dataset that were caused by mutations in genes encoding ligands and their receptors. Next, we applied the PrEDiCT scheme separately to ligands and to receptors (Methods). As expected, hyper-IgM immunodeficiency was found to likely affect naïve B cells through CD40 (receptor; PrEDiCT = 2.9, FDR = 0.06), and CD4αβ T cells through CD40LG (ligand; PrEDiCT = 3.7, FDR <0.001; Figure 4A). Another example is autosomal recessive limb-girdle muscular dystrophy, which is caused by mutations in laminin-α2 (LAMA2) or its receptor α-dystroglycan (DAG1). By applying the PrEDiCT scheme without subdividing the disease genes, we could not infer any likely affected cell type. Yet, by separately applying the PrEDiCT scheme to LAMA2 (ligand) and DAG1 (receptor), we inferred mesenchymal stem cells and satellite stem cells as the likely affected cell types, respectively (PrEDiCT = 3.8 and 3.7, FDR <0.05; Figure 4B). In support of the prediction for DAG1, disruption of DAG1 in satellite stem cells has been associated with the defective muscle regeneration seen in patients (Cohn et al., 2002; Servián-Morilla et al., 2020). In support of the prediction for LAMA2, the disease was attenuated in ligand-knockout [Col6a1(-/-)] model mice by supplying wild-type mesenchymal-stem-cell derived fibroblasts. This treatment not only restored the collagen VI of the tissue, but also rescued defects in satellite stem cells (Urciuolo et al., 2013).

Figure 4


Refining cell-type inference using gene functions.

(A) PrEDiCT scores of cell types likely affected by hyper-IgM immunodeficiency in bone marrow, calculated separately for ligand- or receptor-encoding disease genes. (B) PrEDiCT scores of cell types likely affected by autosomal recessive limb-girdle muscular dystrophy in skeletal muscle, calculated separately for ligand- and receptor-encoding disease genes. (C) PrEDiCT scores of cell types likely affected by heritable cancers in lung and trachea, calculated separately for oncogenes and tumor suppressor genes. Likely affected cell types that matched between the tissues were connected by a green line. Bold outline marks likely-affected cell types.

Figure 4—source data 1 The Source Data file contains data used to generate Figure 4A-C.: https://cdn.elifesciences.org/articles/84613/elife-84613-fig4-data1-v2.xlsx

Next, we focused on heritable cancers, where disease genes could be divided by their function into oncogenes and tumor suppressor genes (TSGs). We retrieved from the Cancer Gene Census (Sondka et al., 2018) heritable cancers that manifest in any of the tissues in our dataset (Methods; Supplementary file 7). We first applied the PrEDiCT scheme jointly to all cancer-associated genes of the same tissue, and then applied it separately to oncogenes and to TSGs, which allowed us to infer distinct cell types for each group. For example, in airway cancers (lung and trachea), the joint application of the PrEDiCT scheme inferred both basal cells and intermediate monocytes as likely affected. The separate application inferred basal cells as likely affected by oncogenes, and intermediate and non-classical monocytes as likely affected by TSGs (Figure 4C).

Characteristics of disease-affected cell types

Are certain cell types more likely to be affected by Mendelian diseases? To answer this, we defined the ‘susceptibility’ of each cell type as the percentage of diseases affecting it out of all diseases that affect its tissue (Supplementary file 8). Most susceptible were capillary endothelial cells of the tongue (Susceptibility = 25%; Figure 5A), which were affected by diseases that result in morphologic aberrations of the tongue (e.g., telangiectasia and macroglossia). Next, we asked whether cell-type susceptibility was correlated with cell-type prevalence (i.e., the proportion of its cells out of all cells of its tissue; Supplementary file 8). The two measures did not correlate (r=−0.02, Pearson correlation; Figure 5A). For example, fast and slow muscle cells were the most susceptible cell types in skeletal muscle, although each accounted for less than 1% of the cells in that tissue.
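The two measures defined above are simple ratios, and their relationship can be tested with a Pearson correlation. The following sketch shows the arithmetic on a hypothetical toy tissue; the disease identifiers, cell counts, and function names are illustrative assumptions and do not reproduce Supplementary file 8.

```python
from math import sqrt

def susceptibility(diseases_by_cell_type, n_tissue_diseases):
    """Percentage of the tissue's diseases that likely affect each cell type."""
    return {ct: 100.0 * len(ds) / n_tissue_diseases for ct, ds in diseases_by_cell_type.items()}

def prevalence(cell_counts):
    """Proportion of each cell type's cells out of all cells of the tissue."""
    total = sum(cell_counts.values())
    return {ct: n / total for ct, n in cell_counts.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy tissue with eight associated diseases (d1..d8).
n_tissue_diseases = 8
likely_affected = {
    "capillary endothelial cell": {"d1", "d2"},
    "fibroblast": {"d3"},
    "macrophage": set(),
    "basal cell": {"d1"},
}
cell_counts = {"capillary endothelial cell": 50, "fibroblast": 400,
               "macrophage": 300, "basal cell": 250}

susc = susceptibility(likely_affected, n_tissue_diseases)
prev = prevalence(cell_counts)
cell_types = sorted(susc)
r = pearson([prev[ct] for ct in cell_types], [susc[ct] for ct in cell_types])
print(susc["capillary endothelial cell"], round(r, 2))
```

In this toy tissue the rarest cell type is the most susceptible, which mirrors the qualitative point in the text that prevalence does not determine susceptibility.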

Figure 5


Characteristics of disease-affected cell types.

(A) Cell-type susceptibility (Y-axis) did not correlate with cell-type prevalence (X-axis). The blue line represents linear correlation (r=−0.02, Pearson correlation). (B) Cell-type susceptibility varied between cell classes (p=1.5E-3, ANOVA test). Among the cell classes with many cell types shared among tissues, immunocytes and epithelia were the least susceptible, and endothelia were the most susceptible compared to other cell types (adjusted p<0.05, Mann-Whitney U test; Methods).

Figure 5—source data 1 The Source Data file contains data used to generate Figure 5A, B.: https://cdn.elifesciences.org/articles/84613/elife-84613-fig5-data1-v2.xlsx

Lastly, we assessed the susceptibility of cell classes (e.g., capillary endothelial cells were classified as endothelia; Supplementary file 8). The most common cell classes were immunocytes, epithelia and endothelia covering 61, 21, and 17 cell types, respectively, in over four tissues. Cell classes had varying susceptibility (Figure 5B; p=1.5E-3, ANOVA). Among the common cell classes, immunocytes and epithelia were the least susceptible, and endothelia were the most susceptible compared to other cell types (adjusted p<0.05, Mann-Whitney U test; Methods).

Discussion

Here, we presented the PrEDiCT scheme for identifying disease-affected cell types based on the cell-type preferential expression of Mendelian disease genes. Preferential expression of disease genes was previously explored in tissue contexts and was shown to characterize disease-affected tissues (Barshir et al., 2018; Barshir et al., 2014; Hekselman and Yeger-Lotem, 2020; Lage et al., 2008; Sonawane et al., 2017). Recently, cell-type preferential expression has been used to highlight potentially-affected cell types for Mendelian diseases and complex traits, often in combination with cell-type regulatory information and enrichment analyses (Dai et al., 2021; Eraslan et al., 2022; Jagadeesh et al., 2021; Kim-Hellmuth et al., 2020; Zhao et al., 2021; Rouhana et al., 2021; Zhang et al., 2022). However, large-scale in-silico validation was rarely conducted (Montoro et al., 2018; Plasschaert et al., 2018). Here, in contrast, we corroborated likely disease-affected cell types by literature text-mining (Figure 1), expert curation (Figure 1—figure supplement 4), and recapitulation in mouse (Figure 2).

The use of the PrEDiCT scheme to identify affected cell types has limitations. Aberrations in disease genes that are preferentially expressed in a cell type do not necessarily lead to disease phenotypes in that cell type, which can result in erroneous annotation of disease-affected cell types. For example, metabolic–myopathy-associated genes were upregulated in adipocytes of both muscle and breast, yet only muscle adipocytes showed myopathy phenotypes (Eraslan et al., 2022). To reduce the risk of erroneous annotations, we applied PrEDiCT only to cell types of disease-affected tissues (Figure 1—figure supplement 1). To enhance the robustness of likely associations, PrEDiCT scores included the cell-type preferential expression of all disease genes, similarly to Zhang et al., 2022. For PCD, the PrEDiCT score included 31 genes, and indeed pointed to known disease-affected cell types (Figure 1C). Additionally, similarly to other RNA-based schemes, PrEDiCT is oblivious to post-translational regulation, and, since most available single-cell transcriptomic datasets do not contain full-length gene reads, PrEDiCT is also oblivious to cell-type preferential expression of alternatively spliced transcripts. Lastly, preferential expression is just one of several mechanisms that lead to tissue-selective disease manifestations (Hekselman and Yeger-Lotem, 2020). Nevertheless, by applying PrEDiCT to 1,140 diseases and single-cell transcriptomes of six distinct tissues, we revealed affected cell types for 29% of the Mendelian diseases in our dataset (Supplementary file 2). Interestingly, this fits with previous observations that about 30% of Mendelian diseases manifest clinically in a tissue that overexpresses disease genes (Barshir et al., 2018; Barshir et al., 2014; Hekselman and Yeger-Lotem, 2020; Lage et al., 2008).

We supported likely disease–cell-type associations by three lines of evidence. The first was text-mining of literature for co-appearance of diseases and cell-types. Text-mining enabled large-scale in-silico assessment, yet co-appearance could also reflect negative and/or speculative results. Our second line of evidence was expert curation. This analysis, although on a smaller scale, provided additional support for the relevance of likely associations (Figure 1—figure supplement 4). The third line of evidence came from recapitulation of results using mouse single-cell data. Yet, since the patterns of variation across genes tend to be similar, mouse single-cell data did not provide statistically independent information. This could lead to more false-positive associations for diseases with a single disease gene. However, the fraction of associations that were recapitulated in mouse did not differ between diseases with a single or multiple disease genes, supporting this line of evidence (Figure 2—figure supplement 1). Another caveat in the comparison between human and mouse was that gene expression data drove both PrEDiCT calculation and human-mouse cell-type matching. This caveat too had limited impact, since matched cell-types were almost identical upon excluding disease genes. Notably, it would be intriguing to integrate data obtained from human and mouse to increase discovery power in future applications. Altogether, the three lines of evidence provided complementary support for likely associations.

Overall, 328 diseases affected 102/129 cell types. Interestingly, there was no correlation between cell type prevalence and its likelihood to be affected (Figure 5A). In particular, endothelia were more likely to be affected by diseases than other prevalent cell classes, whereas immunocytes and epithelia were the least likely to be affected among the prevalent cell classes (Figure 5B). This suggests that these cell types are either more resilient than other cell types, or, alternatively, that their impairment is lethal to the organism. Relevant to this, a cross-tissue study of 20 human tissues showed that immunocytes had similar expression signatures across tissues, in accordance with their common functions, whereas endothelia had tissue-specific expression signatures that reflect their tissue-specialized roles (Jones et al., 2022). Hence, it seems that germline impairment of immunocytes is more likely lethal, whereas the tissue-specialization of endothelia limits the impact of their germline impairment and thus facilitates overall survival. Yet, our analyses focused on diseases and cell types from six specific tissues and thus were limited in scope. The generalizability of our observations therefore awaits analysis of larger sets of diseases and cell types. In the future, once single-cell technologies offer comprehensive coverage of expressed genes per cell, it will also be intriguing to assess disease heterogeneity across cells within a cell type.

Our expansive resource of diseases and likely affected cell types could be used to interrogate disease etiologies. For example, we showed that mitochondrial diseases tend to affect muscle cells, in accordance with their energetic demands (Figure 1E). Additionally, we showed that diseases that manifest in multiple tissues likely affect similar cell types among those tissues, thereby providing a mechanistic explanation for this phenomenon (Figure 3). Lastly, we demonstrated that the inference of likely affected cell types could be refined by applying the PrEDiCT scheme separately to subsets of disease genes with distinct functions. For instance, by separately analyzing oncogenes and tumor suppressor genes (TSGs) of heritable lung and trachea cancers, we found that oncogenes likely affect basal cells, and TSGs likely affect monocytes (Figure 4C). This could suggest that tissue-constructive cells are more susceptible to oncogene mutations, whereas protective cells, such as monocytes, are more susceptible to TSG mutations. The latter is consistent with the role of monocytes in eliminating malignantly transformed cells in different tissues (Robinson et al., 2021). Another interesting subset of diseases is those that impair intercellular communication. A recent study explored the cell types affected by monogenic muscular disorders (Eraslan et al., 2022). Consistent with their results, we found that autosomal recessive limb-girdle muscular dystrophy disrupts intercellular communication among muscle cell types, via mutations in the DAG1 receptor (Figure 4B). Yet, by applying PrEDiCT separately to the ligand of DAG1, LAMA2, we also highlighted the involvement of mesenchymal stem cells in the disease (Urciuolo et al., 2013). By exploring disease genes in appropriate cellular context, we enhanced the mechanistic understanding of disease emergence.

The associations between diseases and affected cell types, though supported by literature and recapitulated in mice, remain putative. Experimental testing could be performed in human cell lines, or in mouse models, in light of the many shared cell types between human and mouse (Figure 2). These validation experiments have a huge potential to open new directions in disease research and accelerate cell-directed gene therapy.

Methods

Human single-cell transcriptomics analysis

Single-cell transcriptomes were downloaded from Tabula Sapiens (Jones et al., 2022). We focused on tissues that consisted of ≥2 samples with ≥800 cells and were sequenced in both human and mouse using microfluidic droplet-based 3’-end technology. These tissues included bone marrow, lung, skeletal muscle, spleen, tongue, and trachea. Analysis was done using the Seurat package v4.0.5 (Hao et al., 2021). Per tissue, gene expression levels were normalized cell-wise using the NormalizeData function. Henceforth, we considered only genes with normalized counts ≥0.05 in ≥10% of cells of at least one cell type (Supplementary file 9).
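The expression filter described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the Seurat-based code used in the study; the function name `passes_expression_filter` and the input layout (a mapping from cell type to the per-cell normalized values of one gene) are assumptions made for the example.

```python
def passes_expression_filter(values_by_cell_type, min_value=0.05, min_fraction=0.10):
    """Return True if, in at least one cell type, at least `min_fraction`
    of the cells have a normalized value >= `min_value` for this gene."""
    for values in values_by_cell_type.values():
        if not values:
            continue  # skip cell types with no cells
        n_expressing = sum(1 for v in values if v >= min_value)
        if n_expressing / len(values) >= min_fraction:
            return True
    return False


# Hypothetical gene: detected in 2 of 10 basal cells and in no monocytes.
example_gene = {
    "basal cell": [0.0] * 8 + [0.4, 1.2],
    "monocyte": [0.0] * 10,
}
print(passes_expression_filter(example_gene))
```

A gene is retained as soon as a single cell type satisfies both thresholds, so genes restricted to one rare cell type are not discarded.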

Mouse single-cell transcriptomics analysis

Single-cell transcriptomes were downloaded from Tabula Muris, 2018. To improve the annotation of mouse cells from Tabula Muris, we reanalyzed the transcriptomic profiles of each tissue. First, we selected 2,000 variably expressed genes using FindVariableFeatures function in Seurat. The minimum and maximum average normalized expression of genes across cells were set to 0.05 and 3, respectively (mean.cutoff=c(0.05,3)). We scaled and centered the expression values of the variably expressed genes using ScaleData function, while correcting for the different samples (vars.to.regress=‘mouse.id’). Then, we projected their expression on all significant principal components (PCs; p<0.001, JackStraw test) ordered by their explained variance.

Next, we applied a two-phase clustering process. We clustered cells using the Seurat FindNeighbors and FindClusters functions based on all the top significant PCs. To resolve over-clustering of cells, we hierarchically ordered cell clusters using the BuildClusterTree function. Then, we tested whether cells from different splits in the tree were distinguishable, according to the out-of-bag error of a random forest classifier that was trained on variably expressed genes. Indistinguishable cell clusters (p≥0.05) were merged. To estimate sample-based differences between related clusters, we repeated the hierarchical ordering of the merged cell clusters. Terminal splits of cell clusters that included uneven numbers of cells from different samples (adjusted p<0.001, Chi-square) were merged. Such differences were observed only in tongue and lung tissues. Gene expression normalization and average calculations were applied as described for human tissues. Henceforth, we considered only genes with normalized counts ≥0.05 in ≥10% of cells of at least one cell type (Supplementary file 10).

Annotations of mouse cell clusters

We compared the clusters obtained above to the cell type annotations of Tabula Muris. A cluster where a similar annotation was common to >90% of its cells, and where >90% of cells with that annotation were within that cluster, was annotated according to Tabula Muris. All other clusters were manually annotated based on highly expressed marker genes (Z score ≥2). We manually searched PubMed for evidence that any of these markers indicates a known cell type (including cell identity and/or function), preferably in the context of the relevant tissue. A cluster was annotated if at least two of its markers indicated the same cell type. To comply with other studies, cell types were named as in Cell Ontology (Diehl et al., 2016). Cell type annotations, relevant marker genes and supporting literature appear in Supplementary file 4.
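The reciprocal >90% rule for transferring Tabula Muris annotations can be illustrated as follows. This is a sketch under stated assumptions: "similar annotation" is simplified to an exact label match, the cluster's cells are assumed to be a subset of the tissue's cells, and the function name `transfer_annotation` is hypothetical.

```python
from collections import Counter


def transfer_annotation(cluster_labels, tissue_labels, threshold=0.90):
    """cluster_labels: reference labels of the cells in one cluster.
    tissue_labels: reference labels of all cells in the tissue
    (must include the cluster's cells).
    Returns the reference label if more than `threshold` of the cluster
    carries it AND more than `threshold` of all cells with that label
    fall inside the cluster; otherwise returns None."""
    if not cluster_labels:
        return None
    label, n_in_cluster = Counter(cluster_labels).most_common(1)[0]
    n_with_label = sum(1 for x in tissue_labels if x == label)
    frac_of_cluster = n_in_cluster / len(cluster_labels)
    frac_of_label = n_in_cluster / n_with_label
    if frac_of_cluster > threshold and frac_of_label > threshold:
        return label
    return None  # falls back to manual, marker-based annotation


# Hypothetical cluster: 19 of 20 cells are labeled "B cell" in the reference,
# and only one further "B cell" lies outside the cluster.
cluster = ["B cell"] * 19 + ["T cell"]
tissue = cluster + ["T cell"] * 30 + ["B cell"]
print(transfer_annotation(cluster, tissue))
```

Clusters that return `None` here correspond to those that were annotated manually from marker genes.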

Annotation of disease-affected tissues

Disease data were retrieved from OMIM (Amberger et al., 2019) and included phenotypes with a known molecular basis and phenotypic series. Each disease was associated with its disease genes and their mouse orthologs according to OMIM, and was associated with its affected tissue according to HPO phenotypic abnormality annotations (Köhler et al., 2021). We focused on diseases that were cataloged by HPO as having a main phenotypic abnormality in blood and blood-forming tissues (HP:0001871), lungs (HP:0002088), musculature (HP:0003011), spleen (HP:0001743), tongue (HP:0000157), and trachea (HP:0002778), in accordance with the six tissues that we analyzed. Phenotypic abnormalities that HPO categorized under each of these main terms were also included. We assessed the reliability of associations by comparing them to manually curated associations of diseases and their affected tissues from the ODiseA database (Hekselman et al., 2022). For this, we downloaded from ODiseA all the diseases affecting blood and bone marrow, lung, and trachea. ODiseA annotations indicating that a tissue is unaffected by a disease were excluded from further analysis.

PrEDiCT score calculation and significance assessment

For each disease, we calculated its cell-type PrEDiCT scores in all cell types of the disease-affected tissue(s). The calculation was divided into three steps: (i) calculating the cell-type preferential expression of each disease gene; (ii) calculating the cell-type PrEDiCT score; and (iii) assessing the statistical significance of the cell-type PrEDiCT score. The three steps are detailed below.

Step (i): The cell-type preferential expression of a gene was set to the Z-score of its average expression in that cell type relative to cell types of the same tissue (Equation 1). Given the sparsity of single-cell data, per tissue, only genes whose average expression in any cell type exceeded the median average expression across genes and cell types were retained. Preferential expression values for human and mouse are available in Supplementary files 11 and 12, respectively.

$$P_{g}^{(c)} = \frac{e_{g}^{(c)} - \mathrm{average}(e_{g})}{\mathrm{SD}(e_{g})}$$

$P_{g}^{(c)}$ denotes the preferential expression of gene g in cell type c. $e_{g}^{(c)}$ denotes the expression level e of gene g in cell type c. Average expression and standard deviation (SD) were calculated across all cell types of an affected tissue.
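As a worked illustration of Equation 1, the sketch below computes the preferential expression of a single gene across the cell types of one tissue. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function name and the example values are hypothetical.

```python
import statistics


def preferential_expression(avg_expr_by_cell_type):
    """Z-score of a gene's average expression in each cell type, relative
    to all cell types of the same tissue (Equation 1).
    Requires at least two cell types."""
    values = list(avg_expr_by_cell_type.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1)
    if sd == 0:
        # Uniform expression: no cell type is preferred.
        return {c: 0.0 for c in avg_expr_by_cell_type}
    return {c: (e - mean) / sd for c, e in avg_expr_by_cell_type.items()}


# Hypothetical average expression of one gene in three cell types.
z = preferential_expression(
    {"ciliated cell": 4.0, "basal cell": 1.0, "monocyte": 1.0}
)
for cell_type, score in z.items():
    print(cell_type, round(score, 3))
```

Because the score is relative to the other cell types of the same tissue, the same gene can receive different values for the same cell type in different tissues.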

Step (ii): The PrEDiCT score of disease D in cell type c was set to the median cell-type preferential expression of its disease genes ($g_{1}$ to $g_{n}$; Equation 2).

$$\mathrm{PrEDiCT}_{D}^{(c)} = \mathrm{median}\left[P_{g_{1}}^{(c)}, P_{g_{2}}^{(c)}, \dots, P_{g_{n}}^{(c)}\right]$$

Step (iii): To assess whether a certain PrEDiCT score was significantly higher than expected by chance for a gene set of the same size, we applied a permutation test. Given a disease D with n disease-associated genes, we selected at random n genes expressed in any cell type, and computed the PrEDiCT score for this random gene set in each cell type of the disease-affected tissue (referred to as ‘random score’). We repeated this procedure 1,000 times, resulting in 1,000 random scores per disease and cell type. The p-value of the PrEDiCT score of disease D in cell type c ($\mathrm{PrEDiCT}_{D}^{(c)}$) was set to the fraction of random scores in c that were at least as high as $\mathrm{PrEDiCT}_{D}^{(c)}$. P-values were then adjusted for multiple hypothesis testing per disease using the Benjamini-Hochberg procedure. The distributions of PrEDiCT scores were similar between tissues (Supplementary file 13).
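Steps (ii) and (iii) can be sketched together as follows. This is an illustrative re-implementation in plain Python, not the study's code: the data layout (`pref` maps each gene to its per-cell-type preferential expression), the function names, and the fixed random seed are assumptions, and the background gene pool is taken to be every gene present in `pref`.

```python
import random
import statistics


def predict_score(pref, genes, cell_type):
    """Median preferential expression of `genes` in `cell_type` (Equation 2)."""
    return statistics.median(pref[g][cell_type] for g in genes)


def permutation_p_value(pref, disease_genes, cell_type, n_perm=1000, seed=0):
    """Fraction of random gene sets of the same size whose score is at
    least as high as the observed PrEDiCT score."""
    rng = random.Random(seed)
    observed = predict_score(pref, disease_genes, cell_type)
    background = sorted(pref)  # assumption: all retained genes
    hits = 0
    for _ in range(n_perm):
        random_genes = rng.sample(background, len(disease_genes))
        if predict_score(pref, random_genes, cell_type) >= observed:
            hits += 1
    return hits / n_perm


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For one disease, `permutation_p_value` would be called once per cell type of the affected tissue, and the resulting list of p-values passed to `benjamini_hochberg`, matching the per-disease adjustment described above.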

Text-mining of PubMed records

We searched PubMed for records containing names of the diseases in our disease dataset or names of human cell types in the disease-affected tissues. For this, we used the eSearch function in the Biopython package (Cock et al., 2009). The maximum number of records retrieved was set to the maximum that Biopython supports (retmax = 9999). Then, per tissue, we intersected the list of records of each tissue-affecting disease d with the list of records of each tissue cell type c, to identify records that mention both. Diseases with fewer than three records mentioning them together with a specific cell type were excluded. Cell types that were not mentioned with any disease were excluded. Next, per tissue, we determined whether disease–cell-type pairs significantly co-appeared by applying a Fisher’s exact test. Pairs that co-appeared more often than expected (adjusted p<0.001, Bonferroni correction) were considered ‘literature-supported’.
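The co-appearance test can be illustrated with a one-sided Fisher's exact test computed directly from the hypergeometric distribution. The sketch below works on sets of record identifiers that are assumed to have been retrieved already; the choice of background (here, a caller-supplied `universe_pmids` set) is an assumption, since the text does not state which records formed the contingency table, and all function names are hypothetical.

```python
from math import comb


def fisher_exact_greater(both, disease_only, cell_only, neither):
    """One-sided Fisher's exact test: probability of observing at least
    `both` co-mentioning records if disease and cell type were independent."""
    row1 = both + disease_only   # records mentioning the disease
    row2 = cell_only + neither   # records not mentioning the disease
    col1 = both + cell_only      # records mentioning the cell type
    denom = comb(row1 + row2, col1)
    tail = sum(
        comb(row1, k) * comb(row2, col1 - k)
        for k in range(both, min(row1, col1) + 1)
    )
    return tail / denom


def coappearance_p_value(disease_pmids, cell_pmids, universe_pmids):
    """Build the 2x2 table from sets of record IDs and test for enrichment."""
    both = len(disease_pmids & cell_pmids)
    disease_only = len(disease_pmids - cell_pmids)
    cell_only = len(cell_pmids - disease_pmids)
    neither = len(universe_pmids - disease_pmids - cell_pmids)
    return fisher_exact_greater(both, disease_only, cell_only, neither)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

A pair would then be labeled literature-supported when its Bonferroni-adjusted p-value falls below 0.001.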

Expert curation and assessment of disease-affected cell types

We assessed whether the PrEDiCT score is indicative of true associations by manual curation of a subset of associations spanning a range of PrEDiCT scores. Per tissue, we sorted all pairs of diseases and cell types by their PrEDiCT scores, and then selected two pairs at each of the 0th, 25th, 50th, 75th, and 100th percentiles, resulting in a total of 10 pairs per tissue. Each pair was manually reviewed by a medical student to determine whether the cell type presents pathophysiological phenotypes in diseased patients. By using expert knowledge, literature, and OMIM records, each pair was designated as either affected, unaffected, or undetermined.

Determining matching cell types between tissues

For each pair of tissues, all cell types were compared to each other. For this, each cell type was associated with marker genes, namely genes with Z score ≥2. Markers of mouse cell types were converted to their human orthologous genes according to the Mouse Genome Informatics database (Bult et al., 2019). Next, we estimated the similarity between each pair of cell types using the matchSCore2 package, which compared the two marker lists using the Jaccard index (Mereu et al., 2020). Cell types were considered matching if their Jaccard index was ≥0.05 and in the top 10th percentile. To test the impact of disease genes on the matching of human and mouse cell types, we repeated the matching upon excluding disease genes from the list of marker genes. Most (109/122, 89%) of the cell-type pairs that matched originally also matched upon excluding disease genes.
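The marker-based matching can be sketched as below. This does not call matchSCore2; it re-implements the Jaccard comparison in plain Python for illustration. The reading of "top 10th percentile" as the top 10% of all pairwise scores (with at least one pair kept and ties included) is an assumption, as are the function names and the marker sets in the example.

```python
def jaccard_index(markers_a, markers_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(markers_a), set(markers_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matching_pairs(markers_1, markers_2, min_jaccard=0.05, top_fraction=0.10):
    """markers_1 / markers_2: cell type -> set of marker genes (human symbols).
    Returns the pairs whose Jaccard index is >= min_jaccard and lies
    within the top `top_fraction` of all pairwise scores."""
    scores = {
        (c1, c2): jaccard_index(m1, m2)
        for c1, m1 in markers_1.items()
        for c2, m2 in markers_2.items()
    }
    if not scores:
        return set()
    ranked = sorted(scores.values(), reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    cutoff = ranked[n_top - 1]
    return {pair for pair, s in scores.items() if s >= min_jaccard and s >= cutoff}


# Hypothetical marker sets; mouse genes are assumed already mapped to human orthologs.
human = {"ciliated cell": {"FOXJ1", "DNAH5", "CCDC39"}, "basal cell": {"KRT5", "TP63", "KRT15"}}
mouse = {"ciliated cell": {"FOXJ1", "DNAH5", "TPPP3"}, "basal cell": {"KRT5", "TP63", "KRT14"}}
print(sorted(matching_pairs(human, mouse)))
```

Rerunning `matching_pairs` after removing disease genes from every marker set corresponds to the robustness check described above.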

Permutation tests for similarity between likely affected cell types

For each pair of tissues, we denoted one as the reference tissue (Tr) and the other as the test tissue (Tt). Success was determined, per disease, if any of the likely affected cell types in Tt matched any of the likely affected cell types in Tr. We tested the null hypothesis that the number of diseases for which success was determined (num_s) was not higher than expected by chance. For this, we carried out a permutation test. Per disease, we randomly selected cell types from Tr, equal in number to the likely affected cell types in Tr, and repeated the success test; specifically, we checked whether the randomly selected cell types in Tr matched any of the likely affected cell types in Tt. We repeated this for all diseases with likely affected cell types in Tr, and recorded the number of randomly successful diseases (num_r). We repeated this procedure 1,000 times. Significance was calculated as the fraction of cases where num_r≥num_s. p-Values were adjusted for multiple comparisons by Benjamini-Hochberg correction.
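The permutation scheme above can be sketched as follows. The data structures are assumptions made for illustration: `affected_ref` and `affected_test` map each disease to its set of likely affected cell types in Tr and Tt, `matches` is a set of (Tr cell type, Tt cell type) pairs judged to be matching, and the random seed is fixed only for reproducibility.

```python
import random


def is_success(ref_cells, test_cells, matches):
    """True if any reference cell type matches any test cell type."""
    return any((r, t) in matches for r in ref_cells for t in test_cells)


def similarity_p_value(affected_ref, affected_test, ref_cell_types, matches,
                       n_perm=1000, seed=0):
    """Returns (num_s, p), where p is the fraction of permutations in which
    the number of randomly successful diseases (num_r) is >= num_s."""
    rng = random.Random(seed)
    pool = sorted(ref_cell_types)
    diseases = [d for d in affected_ref if affected_ref[d]]
    num_s = sum(
        is_success(affected_ref[d], affected_test.get(d, ()), matches)
        for d in diseases
    )
    exceed = 0
    for _ in range(n_perm):
        num_r = 0
        for d in diseases:
            # Draw as many Tr cell types as were likely affected in Tr.
            drawn = rng.sample(pool, len(affected_ref[d]))
            num_r += is_success(drawn, affected_test.get(d, ()), matches)
        if num_r >= num_s:
            exceed += 1
    return num_s, exceed / n_perm
```

One call handles a single ordered tissue pair; the p-values collected over all tissue pairs would then be adjusted with the Benjamini-Hochberg procedure.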

Analysis of likely affected cell classes

Susceptibility of a given cell type was set to the percentage of diseases affecting that cell type out of all diseases that affect that same tissue (Supplementary file 7). Cell types were grouped into one of six cell classes: fibroblasts, stem cells, myocytes, endothelia, epithelia and immunocytes, or were grouped as ‘other’. Then, we applied an analysis of variance (ANOVA) test to the proportions of diseases that likely affected each cell class using the aov function in R v4.1.1. To further determine whether specific cell classes were more susceptible than others, we compared the cell-type susceptibility of prevalent cell classes (>15 cell types across ≥4 tissues each) to all other cell types, by using the Mann-Whitney U test. p-Values were adjusted for multiple comparisons by Benjamini-Hochberg correction.

Ligands- and receptors-associated diseases

We extracted a list of 1625 pairs of ligands and their corresponding receptors from Jin et al., 2021. We retrieved all diseases with disease genes that included both a ligand and its receptor. We filtered this set to include diseases where both the ligand and its receptor were preferentially expressed in distinct cell types.
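The two filtering steps above can be sketched as follows. The inputs are assumed to be available as plain Python collections, and the function name is hypothetical. "Preferentially expressed in distinct cell types" is interpreted here as the existence of a ligand-preferring cell type that differs from a receptor-preferring cell type, which is an assumption rather than the study's stated criterion; the cell-type assignments in the example are illustrative only.

```python
def ligand_receptor_diseases(disease_genes, lr_pairs, preferred_cell_types):
    """disease_genes: disease -> set of disease genes.
    lr_pairs: iterable of (ligand, receptor) gene pairs.
    preferred_cell_types: gene -> set of cell types where it is
    preferentially expressed.
    Returns disease -> list of (ligand, receptor) pairs that are both
    disease genes and preferentially expressed in distinct cell types."""
    hits = {}
    for disease, genes in disease_genes.items():
        for ligand, receptor in lr_pairs:
            if ligand not in genes or receptor not in genes:
                continue
            lig_cells = preferred_cell_types.get(ligand, set())
            rec_cells = preferred_cell_types.get(receptor, set())
            if any(l != r for l in lig_cells for r in rec_cells):
                hits.setdefault(disease, []).append((ligand, receptor))
    return hits


# Illustrative input based on the LAMA2-DAG1 pair discussed in the text;
# the cell-type assignments below are placeholders, not study results.
diseases = {"muscular dystrophy (example)": {"LAMA2", "DAG1", "CAPN3"}}
pairs = [("LAMA2", "DAG1")]
preferred = {"LAMA2": {"mesenchymal stem cell"}, "DAG1": {"skeletal muscle cell"}}
print(ligand_receptor_diseases(diseases, pairs, preferred))
```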

Heritable cancers

Data of heritable cancers were downloaded from the Cancer Gene Census of the Catalogue of Somatic Mutations in Cancer [COSMIC; (Sondka et al., 2018)]. Specifically, we downloaded germline tumor type, associated genes, and role in cancer (oncogenes or tumor suppressor genes) from tiers 1 and 2. We manually annotated germline tumor types to the six tissues included in our study (Supplementary file 8).

Data availability

All data generated or analyzed during this study are included in the manuscript and in the supporting files. The Source Data files contain data used to generate all figures. Additional data and code to redo the analysis are available at GitHub https://github.com/hekselman/PrEDiCT (copy archived at Hekselman, 2024) and Dryad https://doi.org/10.5061/dryad.9w0vt4bm7.

The following data sets were generated

1. Hekselman I
2. Yeger-Lotem E
(2024) Dryad
Affected cell types for hundreds of Mendelian diseases revealed by analysis of human and mouse single-cell data.

https://doi.org/10.5061/dryad.9w0vt4bm7

The following previously published data sets were used

(2022) NCBI Gene Expression Omnibus
ID GSE201333. Tabula Sapiens.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE201333
1. Tabula Muris Consortium
(2018) NCBI Gene Expression Omnibus
ID GSE109774. Tabula Muris: Transcriptomic characterization of 20 organs and tissues from Mus musculus at single cell resolution.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE109774

References

(2019) OMIM.org: leveraging knowledge across phenotype-gene relationships
Nucleic Acids Research 47:D1038–D1043.

https://doi.org/10.1093/nar/gky1151
1. Anjani G
2. Vignesh P
3. Joshi V
4. Shandilya JK
5. Bhattarai D
6. Sharma J
7. Rawat A
(2020) Recent advances in chronic granulomatous disease
Genes & Diseases 7:84–92.

https://doi.org/10.1016/j.gendis.2019.07.010
(2014) Comparative analysis of human tissue interactomes reveals factors leading to tissue-specific manifestation of hereditary diseases
PLOS Computational Biology 10:e1003632.

https://doi.org/10.1371/journal.pcbi.1003632
(2018) Role of duplicate genes in determining the tissue-selectivity of hereditary diseases
PLOS Genetics 14:e1007327.

https://doi.org/10.1371/journal.pgen.1007327
(2019) Mouse genome database (MGD) 2019
Nucleic Acids Research 47:D801–D806.

https://doi.org/10.1093/nar/gky1056
1. Cock PJA
2. Antao T
3. Chang JT
4. Chapman BA
5. Cox CJ
6. Dalke A
7. Friedberg I
8. Hamelryck T
9. Kauff F
10. Wilczynski B
11. de Hoon MJL
(2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics
Bioinformatics 25:1422–1423.

https://doi.org/10.1093/bioinformatics/btp163
1. Cohn RD
2. Henry MD
3. Michele DE
4. Barresi R
5. Saito F
6. Moore SA
7. Flanagan JD
8. Skwarchuk MW
9. Robbins ME
10. Mendell JR
11. Williamson RA
12. Campbell KP
(2002) Disruption of DAG1 in differentiated skeletal muscle reveals a role for dystroglycan in muscle regeneration
Cell 110:639–648.

https://doi.org/10.1016/s0092-8674(02)00907-8
1. Cummings BB
2. Marshall JL
3. Tukiainen T
4. Lek M
5. Donkervoort S
6. Foley AR
7. Bolduc V
8. Waddell LB
9. Sandaradura SA
10. O’Grady GL
11. Estrella E
12. Reddy HM
13. Zhao F
14. Weisburd B
15. Karczewski KJ
16. O’Donnell-Luria AH
17. Birnbaum D
18. Sarkozy A
19. Hu Y
20. Gonorazky H
21. Claeys K
22. Joshi H
23. Bournazos A
24. Oates EC
25. Ghaoui R
26. Davis MR
27. Laing NG
28. Topf A
29. Genotype-Tissue Expression Consortium
30. Kang PB
31. Beggs AH
32. North KN
33. Straub V
34. Dowling JJ
35. Muntoni F
36. Clarke NF
37. Cooper ST
38. Bönnemann CG
39. MacArthur DG
(2017) Improving genetic diagnosis in Mendelian disease with transcriptome sequencing
Science Translational Medicine 9:eaal5209.

https://doi.org/10.1126/scitranslmed.aal5209
1. Dai Y
2. Hu R
3. Manuel AM
4. Liu A
5. Jia P
6. Zhao Z
(2021) CSEA-DB: an omnibus for human complex trait and cell type associations
Nucleic Acids Research 49:D862–D870.

https://doi.org/10.1093/nar/gkaa1064
1. Diehl AD
2. Meehan TF
3. Bradford YM
4. Brush MH
5. Dahdul WM
6. Dougall DS
7. He Y
8. Osumi-Sutherland D
9. Ruttenberg A
10. Sarntivijai S
11. Van Slyke CE
12. Vasilevsky NA
13. Haendel MA
14. Blake JA
15. Mungall CJ
(2016) The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability
Journal of Biomedical Semantics 7:44.

https://doi.org/10.1186/s13326-016-0088-7
1. Eraslan G
2. Drokhlyansky E
3. Anand S
4. Fiskin E
5. Subramanian A
6. Slyper M
7. Wang J
8. Van Wittenberghe N
9. Rouhana JM
10. Waldman J
11. Ashenberg O
12. Lek M
13. Dionne D
14. Win TS
15. Cuoco MS
16. Kuksenko O
17. Tsankov AM
18. Branton PA
19. Marshall JL
20. Greka A
21. Getz G
22. Segrè AV
23. Aguet F
24. Rozenblatt-Rosen O
25. Ardlie KG
26. Regev A
(2022) Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function
Science 376:eabl4290.

https://doi.org/10.1126/science.abl4290
1. Ferreira CR
(2019) The burden of rare diseases
American Journal of Medical Genetics. Part A 179:885–892.

https://doi.org/10.1002/ajmg.a.61124
1. Guan J
2. Lin Y
3. Wang Y
4. Gao J
5. Ji G
(2021) An analytical method for the identification of cell type-specific disease gene modules
Journal of Translational Medicine 19:20.

https://doi.org/10.1186/s12967-020-02690-5
1. Hao Y
2. Hao S
3. Andersen-Nissen E
4. Mauck WM
5. Zheng S
6. Butler A
7. Lee MJ
8. Wilk AJ
9. Darby C
10. Zager M
11. Hoffman P
12. Stoeckius M
13. Papalexi E
14. Mimitou EP
15. Jain J
16. Srivastava A
17. Stuart T
18. Fleming LM
19. Yeung B
20. Rogers AJ
21. McElrath JM
22. Blish CA
23. Gottardo R
24. Smibert P
25. Satija R
(2021) Integrated analysis of multimodal single-cell data
Cell 184:3573–3587.

https://doi.org/10.1016/j.cell.2021.04.048
1. Hekselman I
2. Yeger-Lotem E
(2020) Mechanisms of tissue and cell-type specificity in heritable traits and diseases
Nature Reviews. Genetics 21:137–150.

https://doi.org/10.1038/s41576-019-0200-9
1. Hekselman I
2. Kerber L
3. Ziv M
4. Gruber G
5. Yeger-Lotem E
(2022) The Organ-Disease Annotations (ODiseA) Database of Hereditary Diseases and Inflicted Tissues
Journal of Molecular Biology 434:167619.

https://doi.org/10.1016/j.jmb.2022.167619
Software
1. Hekselman I
(2024) Predict, version swh:1:rev:71b591e1f4a413f347e5bfc453a411edd5aeb514
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:dfed1277bd7649514be1994ee581757284c4fd70;origin=https://github.com/hekselman/PrEDiCT;visit=swh:1:snp:d5189c6ac53099ec298f1110ee5f0b006dffa18e;anchor=swh:1:rev:71b591e1f4a413f347e5bfc453a411edd5aeb514
(2023)
StatPearls

Heinz body, StatPearls, Treasure Island (FL) Ineligible Companies.
Preprint
1. Jagadeesh KA
2. Dey KK
3. Montoro DT
4. Mohan R
5. Gazal S
6. Engreitz JM
7. Xavier RJ
8. Price AL
9. Regev A
(2021) Identifying Disease-Critical Cell Types and Cellular Processes across the Human Body by Integration of Single-Cell Profiles and Human Genetics
bioRxiv.

https://doi.org/10.1101/2021.03.19.436212
1. Jagadeesh KA
2. Dey KK
3. Montoro DT
4. Mohan R
5. Gazal S
6. Engreitz JM
7. Xavier RJ
8. Price AL
9. Regev A
(2022) Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics
Nature Genetics 54:1479–1492.

https://doi.org/10.1038/s41588-022-01187-9
1. Jin S
2. Guerrero-Juarez CF
3. Zhang L
4. Chang I
5. Ramos R
6. Kuan C-H
7. Myung P
8. Plikus MV
9. Nie Q
(2021) Inference and analysis of cell-cell communication using CellChat
Nature Communications 12:1088.

https://doi.org/10.1038/s41467-021-21246-9
1. Jones RC
2. Karkanias J
3. Krasnow MA
4. Pisco AO
5. Quake SR
6. Salzman J
7. Yosef N
8. Bulthaup B
9. Brown P
10. Harper W
11. Hemenez M
12. Ponnusamy R
13. Salehi A
14. Sanagavarapu BA
15. Spallino E
16. Aaron KA
17. Concepcion W
18. Gardner JM
19. Kelly B
20. Neidlinger N
21. Wang Z
22. Crasta S
23. Kolluru S
24. Morri M
25. Pisco AO
26. Tan SY
27. Travaglini KJ
28. Xu C
29. Alcántara-Hernández M
30. Almanzar N
31. Antony J
32. Beyersdorf B
33. Burhan D
34. Calcuttawala K
35. Carter MM
36. Chan CKF
37. Chang CA
38. Chang S
39. Colville A
40. Crasta S
41. Culver RN
42. Cvijović I
43. D’Amato G
44. Ezran C
45. Galdos FX
46. Gillich A
47. Goodyer WR
48. Hang Y
49. Hayashi A
50. Houshdaran S
51. Huang X
52. Irwin JC
53. Jang S
54. Juanico JV
55. Kershner AM
56. Kim S
57. Kiss B
58. Kolluru S
59. Kong W
60. Kumar ME
61. Kuo AH
62. Leylek R
63. Li B
64. Loeb GB
65. Lu W-J
66. Mantri S
67. Markovic M
68. McAlpine PL
69. de Morree A
70. Morri M
71. Mrouj K
72. Mukherjee S
73. Muser T
74. Neuhöfer P
75. Nguyen TD
76. Perez K
77. Phansalkar R
78. Pisco AO
79. Puluca N
80. Qi Z
81. Rao P
82. Raquer-McKay H
83. Schaum N
84. Scott B
85. Seddighzadeh B
86. Segal J
87. Sen S
88. Sikandar S
89. Spencer SP
90. Steffes LC
91. Subramaniam VR
92. Swarup A
93. Swift M
94. Travaglini KJ
95. Van Treuren W
96. Trimm E
97. Veizades S
98. Vijayakumar S
99. Vo KC
100. Vorperian SK
101. Wang W
102. Weinstein HNW
103. Winkler J
104. Wu TTH
105. Xie J
106. Yung AR
107. Zhang Y
108. Detweiler AM
109. Mekonen H
110. Neff NF
111. Sit RV
112. Tan M
113. Yan J
114. Bean GR
115. Charu V
116. Forgó E
117. Martin BA
118. Ozawa MG
119. Silva O
120. Tan SY
121. Toland A
122. Vemuri VNP
123. Afik S
124. Awayan K
125. Botvinnik OB
126. Byrne A
127. Chen M
128. Dehghannasiri R
129. Detweiler AM
130. Gayoso A
131. Granados AA
132. Li Q
133. Mahmoudabadi G
134. McGeever A
135. de Morree A
136. Olivieri JE
137. Park M
138. Pisco AO
139. Ravikumar N
140. Salzman J
141. Stanley G
142. Swift M
143. Tan M
144. Tan W
145. Tarashansky AJ
146. Vanheusden R
147. Vorperian SK
148. Wang P
149. Wang S
150. Xing G
151. Xu C
152. Yosef N
153. Alcántara-Hernández M
154. Antony J
155. Chan CKF
156. Chang CA
157. Colville A
158. Crasta S
159. Culver R
160. Dethlefsen L
161. Ezran C
162. Gillich A
163. Hang Y
164. Ho P-Y
165. Irwin JC
166. Jang S
167. Kershner AM
168. Kong W
169. Kumar ME
170. Kuo AH
171. Leylek R
172. Liu S
173. Loeb GB
174. Lu W-J
175. Maltzman JS
176. Metzger RJ
177. de Morree A
178. Neuhöfer P
179. Perez K
180. Phansalkar R
181. Qi Z
182. Rao P
183. Raquer-McKay H
184. Sasagawa K
185. Scott B
186. Sinha R
187. Song H
188. Spencer SP
189. Swarup A
190. Swift M
191. Travaglini KJ
192. Trimm E
193. Veizades S
194. Vijayakumar S
195. Wang B
196. Wang W
197. Winkler J
198. Xie J
199. Yung AR
200. Artandi SE
201. Beachy PA
202. Clarke MF
203. Giudice LC
204. Huang FW
205. Huang KC
206. Idoyaga J
207. Kim SK
208. Krasnow M
209. Kuo CS
210. Nguyen P
211. Quake SR
212. Rando TA
213. Red-Horse K
214. Reiter J
215. Relman DA
216. Sonnenburg JL
217. Wang B
218. Wu A
219. Wu SM
220. Wyss-Coray T
221. Tabula Sapiens Consortium*
(2022) The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans
Science 376:eabl4896.

https://doi.org/10.1126/science.abl4896
1. Kim-Hellmuth S
2. Aguet F
3. Oliva M
4. Muñoz-Aguirre M
5. Kasela S
6. Wucher V
7. Castel SE
8. Hamel AR
9. Viñuela A
10. Roberts AL
11. Mangul S
12. Wen X
13. Wang G
14. Barbeira AN
15. Garrido-Martín D
16. Nadel BB
17. Zou Y
18. Bonazzola R
19. Quan J
20. Brown A
21. Martinez-Perez A
22. Soria JM
23. GTEx Consortium
24. Getz G
25. Dermitzakis ET
26. Small KS
27. Stephens M
28. Xi HS
29. Im HK
30. Guigó R
31. Segrè AV
32. Stranger BE
33. Ardlie KG
34. Lappalainen T
(2020) Cell type-specific genetic regulation of gene expression across human tissues
Science 369:eaaz8528.

https://doi.org/10.1126/science.aaz8528
(1995)
Distribution of type XV collagen transcripts in human tissue and their production by muscle cells and fibroblasts

The American Journal of Pathology 147:1500–1509.
1. Köhler S
2. Gargano M
3. Matentzoglu N
4. Carmody LC
5. Lewis-Smith D
6. Vasilevsky NA
7. Danis D
8. Balagura G
9. Baynam G
10. Brower AM
11. Callahan TJ
12. Chute CG
13. Est JL
14. Galer PD
15. Ganesan S
16. Griese M
17. Haimel M
18. Pazmandi J
19. Hanauer M
20. Harris NL
21. Hartnett MJ
22. Hastreiter M
23. Hauck F
24. He Y
25. Jeske T
26. Kearney H
27. Kindle G
28. Klein C
29. Knoflach K
30. Krause R
31. Lagorce D
32. McMurry JA
33. Miller JA
34. Munoz-Torres MC
35. Peters RL
36. Rapp CK
37. Rath AM
38. Rind SA
39. Rosenberg AZ
40. Segal MM
41. Seidel MG
42. Smedley D
43. Talmy T
44. Thomas Y
45. Wiafe SA
46. Xian J
47. Yüksel Z
48. Helbig I
49. Mungall CJ
50. Haendel MA
51. Robinson PN
(2021) The Human Phenotype Ontology in 2021
Nucleic Acids Research 49:D1207–D1217.

https://doi.org/10.1093/nar/gkaa1043
1. Lage K
2. Hansen NT
3. Karlberg EO
4. Eklund AC
5. Roque FS
6. Donahoe PK
7. Szallasi Z
8. Jensen TS
9. Brunak S
(2008) A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes
PNAS 105:20870–20875.

https://doi.org/10.1073/pnas.0810772105
Book
1. Leiding JW
2. Holland SM
(1993)
Chronic Granulomatous Disease

GeneReviews.
(2019) Primary Ciliary Dyskinesia (PCD): A genetic disorder of motile cilia
Translational Science of Rare Diseases 4:51–75.

https://doi.org/10.3233/TRD-190036
1. Mathys H
2. Davila-Velderrain J
3. Peng Z
4. Gao F
5. Mohammadi S
6. Young JZ
7. Menon M
8. He L
9. Abdurrob F
10. Jiang X
11. Martorell AJ
12. Ransohoff RM
13. Hafler BP
14. Bennett DA
15. Kellis M
16. Tsai L-H
(2019) Single-cell transcriptomic analysis of Alzheimer’s disease
Nature 570:332–337.

https://doi.org/10.1038/s41586-019-1195-2
1. Mereu E
2. Lafzi A
3. Moutinho C
4. Ziegenhain C
5. McCarthy DJ
6. Álvarez-Varela A
7. Batlle E
8. Grün D
9. Lau JK
10. Boutet SC
11. Sanada C
12. Ooi A
13. Jones RC
14. Kaihara K
15. Brampton C
16. Talaga Y
17. Sasagawa Y
18. Tanaka K
19. Hayashi T
20. Braeuning C
21. Fischer C
22. Sauer S
23. Trefzer T
24. Conrad C
25. Adiconis X
26. Nguyen LT
27. Regev A
28. Levin JZ
29. Parekh S
30. Janjic A
31. Wange LE
32. Bagnoli JW
33. Enard W
34. Gut M
35. Sandberg R
36. Nikaido I
37. Gut I
38. Stegle O
39. Heyn H
(2020) Benchmarking single-cell RNA-sequencing protocols for cell atlas projects
Nature Biotechnology 38:747–755.

https://doi.org/10.1038/s41587-020-0469-4
1. Montoro DT
2. Haber AL
3. Biton M
4. Vinarsky V
5. Lin B
6. Birket SE
7. Yuan F
8. Chen S
9. Leung HM
10. Villoria J
11. Rogel N
12. Burgin G
13. Tsankov AM
14. Waghray A
15. Slyper M
16. Waldman J
17. Nguyen L
18. Dionne D
19. Rozenblatt-Rosen O
20. Tata PR
21. Mou H
22. Shivaraju M
23. Bihler H
24. Mense M
25. Tearney GJ
26. Rowe SM
27. Engelhardt JF
28. Regev A
29. Rajagopal J
(2018) A revised airway epithelial hierarchy includes CFTR-expressing ionocytes
Nature 560:319–324.

https://doi.org/10.1038/s41586-018-0393-7
1. Park J
2. Shrestha R
3. Qiu C
4. Kondo A
5. Huang S
6. Werth M
7. Li M
8. Barasch J
9. Suszták K
(2018) Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease
Science 360:758–763.

https://doi.org/10.1126/science.aar2131
1. Plasschaert LW
2. Žilionis R
3. Choo-Wing R
4. Savova V
5. Knehr J
6. Roma G
7. Klein AM
8. Jaffe AB
(2018) A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte
Nature 560:377–381.

https://doi.org/10.1038/s41586-018-0394-6
1. Robinson A
2. Han CZ
3. Glass CK
4. Pollard JW
(2021) Monocyte Regulation in Homeostasis and Malignancy
Trends in Immunology 42:104–119.

https://doi.org/10.1016/j.it.2020.12.001
Preprint
1. Rouhana JM
2. Wang J
3. Eraslan G
4. Anand S
5. Hamel AR
6. Cole B
7. Regev A
8. Aguet F
9. Ardlie KG
10. Segrè AV
(2021) ECLIPSER: Identifying Causal Cell Types and Genes for Complex Traits through Single Cell Enrichment of e/sQTL-Mapped Genes in GWAS Loci
bioRxiv.

https://doi.org/10.1101/2021.11.24.469720
1. Segerstolpe Å
2. Palasantza A
3. Eliasson P
4. Andersson E-M
5. Andréasson A-C
6. Sun X
7. Picelli S
8. Sabirsh A
9. Clausen M
10. Bjursell MK
11. Smith DM
12. Kasper M
13. Ämmälä C
14. Sandberg R
(2016) Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes
Cell Metabolism 24:593–607.

https://doi.org/10.1016/j.cmet.2016.08.020
1. Servián-Morilla E
2. Cabrera-Serrano M
3. Johnson K
4. Pandey A
5. Ito A
6. Rivas E
7. Chamova T
8. Muelas N
9. Mongini T
10. Nafissi S
11. Claeys KG
12. Grewal RP
13. Takeuchi M
14. Hao H
15. Bönnemann C
16. Lopes Abath Neto O
17. Medne L
18. Brandsema J
19. Töpf A
20. Taneva A
21. Vilchez JJ
22. Tournev I
23. Haltiwanger RS
24. Takeuchi H
25. Jafar-Nejad H
26. Straub V
27. Paradas C
(2020) POGLUT1 biallelic mutations cause myopathy with reduced satellite cells, α-dystroglycan hypoglycosylation and a distinctive radiological pattern
Acta Neuropathologica 139:565–582.

https://doi.org/10.1007/s00401-019-02117-6
1. Smillie CS
2. Biton M
3. Ordovas-Montanes J
4. Sullivan KM
5. Burgin G
6. Graham DB
7. Herbst RH
8. Rogel N
9. Slyper M
10. Waldman J
11. Sud M
12. Andrews E
13. Velonias G
14. Haber AL
15. Jagadeesh K
16. Vickovic S
17. Yao J
18. Stevens C
19. Dionne D
20. Nguyen LT
21. Villani AC
22. Hofree M
23. Creasey EA
24. Huang H
25. Rozenblatt-Rosen O
26. Garber JJ
27. Khalili H
28. Desch AN
29. Daly MJ
30. Ananthakrishnan AN
31. Shalek AK
32. Xavier RJ
33. Regev A
(2019) Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis
Cell 178:714–730.

https://doi.org/10.1016/j.cell.2019.06.029
1. Sonawane AR
2. Platig J
3. Fagny M
4. Chen C-Y
5. Paulson JN
6. Lopes-Ramos CM
7. DeMeo DL
8. Quackenbush J
9. Glass K
10. Kuijjer ML
(2017) Understanding Tissue-Specific Gene Regulation
Cell Reports 21:1077–1088.

https://doi.org/10.1016/j.celrep.2017.10.001
1. Sondka Z
2. Bamford S
3. Cole CG
4. Ward SA
5. Dunham I
6. Forbes SA
(2018) The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers
Nature Reviews. Cancer 18:696–705.

https://doi.org/10.1038/s41568-018-0060-1
1. Tabula Muris Consortium
(2018) Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris
Nature 562:367–372.

https://doi.org/10.1038/s41586-018-0590-4
1. Urciuolo A
2. Quarta M
3. Morbidoni V
4. Gattazzo F
5. Molon S
6. Grumati P
7. Montemurro F
8. Tedesco FS
9. Blaauw B
10. Cossu G
11. Vozzi G
12. Rando TA
13. Bonaldo P
(2013) Collagen VI regulates satellite cell self-renewal and muscle regeneration
Nature Communications 4:1964.

https://doi.org/10.1038/ncomms2964
(2019) The hyper IgM syndromes: Epidemiology, pathogenesis, clinical manifestations, diagnosis and management
Clinical Immunology 198:19–30.

https://doi.org/10.1016/j.clim.2018.11.007
1. Zhang MJ
2. Hou K
3. Dey KK
4. Sakaue S
5. Jagadeesh KA
6. Weinand K
7. Taychameekiatchai A
8. Rao P
9. Pisco AO
10. Zou J
11. Wang B
12. Gandal M
13. Raychaudhuri S
14. Pasaniuc B
15. Price AL
(2022) Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data
Nature Genetics 54:1572–1580.

https://doi.org/10.1038/s41588-022-01167-z
1. Zhao T
2. Lyu S
3. Lu G
4. Juan L
5. Zeng X
6. Wei Z
7. Hao J
8. Peng J
(2021) SC2disease: a manually curated database of single-cell transcriptome for human diseases
Nucleic Acids Research 49:D1413–D1419.

https://doi.org/10.1093/nar/gkaa838
1. Zou Y
2. Zhang R-Z
3. Sabatelli P
4. Chu M-L
5. Bönnemann CG
(2008) Muscle interstitial fibroblasts are the main source of collagen VI synthesis in skeletal muscle: implications for congenital muscular dystrophy types Ullrich and Bethlem
Journal of Neuropathology and Experimental Neurology 67:144–154.

https://doi.org/10.1097/nen.0b013e3181634ef7

Article and author information

Author details

Idan Hekselman

Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Be’er Sheva, Israel

Contribution
Formal analysis, Visualization, Methodology, Writing – original draft

Competing interests
No competing interests declared

ORCID iD: 0000-0003-0529-7838

Assaf Vital

Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Be’er Sheva, Israel

Contribution
Methodology

Competing interests
No competing interests declared

Maya Ziv-Agam

Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Be’er Sheva, Israel

Contribution
Data curation

Contributed equally with
Lior Kerber

Competing interests
No competing interests declared

Lior Kerber

Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Be’er Sheva, Israel

Contribution
Data curation

Contributed equally with
Maya Ziv-Agam

Competing interests
No competing interests declared

ORCID iD: 0009-0005-0583-5618

Ido Yairi

Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Be’er Sheva, Israel

Contribution
Data curation, Formal analysis

Competing interests
No competing interests declared

Esti Yeger-Lotem
1. Department of Clinical Biochemistry and Pharmacology, Ben-Gurion University of the Negev, Be’er Sheva, Israel
2. The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Be’er Sheva, Israel
Contribution
Conceptualization, Supervision, Writing – original draft

For correspondence
estiyl@bgu.ac.il

Competing interests
No competing interests declared

ORCID iD: 0000-0002-8279-7898

Funding

Israel Science Foundation (317/19)

Esti Yeger-Lotem

Israel Science Foundation (401/22)

Esti Yeger-Lotem

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was funded by the Israel Science Foundation (grants 317/19 and 401/22 to E.Y.-L.).

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.