Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification

  1. Zhanying Feng
  2. Zhana Duren
  3. Jingxue Xin
  4. Qiuyue Yuan
  5. Yaoxi He
  6. Bing Su
  7. Wing Hung Wong (corresponding author)
  8. Yong Wang (corresponding author)
  1. CEMS, NCMIS, HCMS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
  2. School of Mathematics, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
  3. Center for Human Genetics and Department of Genetics and Biochemistry, Clemson University, United States
  4. Department of Statistics, Department of Biomedical Data Science, Bio-X Program, Stanford University, United States
  5. State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, China
  6. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, China
  7. Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
7 figures, 1 table and 2 additional files

Figures

Figure 1 with 1 supplement
Overview of SpecVar.

(a) SpecVar constructs an atlas of context-specific regulatory networks and regulatory categories. Then SpecVar represents genome-wide association studies (GWAS) summary statistics into relevance scores and SNP-associated regulatory subnetworks. (b) For a single phenotype, SpecVar can use relevance score and SNP-associated regulatory subnetworks to identify and interpret relevant tissues. (c) For multiple phenotypes, based on relevance score, SpecVar can reveal relevance correlation, common relevant tissues, and shared regulations.

Figure 1—figure supplement 1
Principal component analysis (PCA) plot of regulatory network atlas of 77 human tissues.

The trans-regulatory scores (TRS) across 77 tissues are used for PCA.
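As a rough illustration of the kind of projection this supplement shows, the sketch below runs PCA on a tissue-by-feature score matrix. The matrix here is synthetic, its shape (77 tissues by 500 features) is a hypothetical stand-in for the atlas, and `pca_scores` is an illustrative helper rather than part of SpecVar.

```python
import numpy as np

def pca_scores(matrix, n_components=2):
    """Project the rows of `matrix` (samples x features) onto the top principal components."""
    centered = matrix - matrix.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of total variance captured by each retained component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical stand-in for the atlas: 77 tissues x 500 synthetic TRS values.
rng = np.random.default_rng(0)
trs = rng.gamma(shape=2.0, scale=1.0, size=(77, 500))
scores, explained = pca_scores(trs)
```

Each row of `scores` gives the coordinates of one tissue in the plane of the first two components, which is what a PCA scatter plot displays.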

Figure 2 with 2 supplements
Comparison of heritability enrichment between SpecVar and four alternate methods: all regulatory elements (ARE), all accessible peaks (AAP), specifically accessible peaks (SAP), and specifically expressed genes (SEG).

(a) The heritability enrichment of low-density lipoprotein (LDL) in the ‘right lobe of liver’. (b) The heritability enrichment of total cholesterol in the ‘right lobe of liver’. (c) The heritability enrichment of educational attainment in the ‘frontal cortex’. (d) The heritability enrichment of cognitive performance in the ‘frontal cortex’. (e) The heritability enrichment of brain shape in cranial neural crest cell (CNCC). (f) The heritability enrichment of facial morphology in ‘CNCC’. The sample size of error bars for (a-f) is 200. (g) Boxplot of top 10 contexts’ heritability enrichment of six phenotypes for five methods.

Figure 2—figure supplement 1
SpecVar achieves higher heritability than other methods through regulatory network and specificity.

(a) The five brain tissues’ averaged heritability enrichment of educational attainment. (b) The five brain tissues’ averaged heritability enrichment of cognitive performance. (c) Schematic showing the relationship and differences of the five methods. (d) Three comparisons to conduct ablation analysis. For comparison between SpecVar and specifically accessible peaks (SAP), we exclude their shared genomic regions. And for comparison between SpecVar and specifically expressed genes (SEG), we also exclude their shared genomic regions. (e) The log2(Fold) of SpecVar vs all regulatory elements (ARE), SpecVar vs SAP, and SpecVar vs SEG for six phenotypes in their relevant tissues.

Figure 2—figure supplement 2
The heritability enrichment estimated by ‘pooled genome partition’.

(a) The heritability enrichment of low-density lipoprotein (LDL) in the ‘right lobe of liver’. (b) The heritability enrichment of total cholesterol in the ‘right lobe of liver’. (c) The heritability enrichment of educational attainment in the ‘frontal cortex’. (d) The heritability enrichment of cognitive performance in the ‘frontal cortex’. (e) The heritability enrichment of brain shape in cranial neural crest cell (‘CNCC’). (f) The heritability enrichment of facial morphology in ‘CNCC’. The sample size of error bars for (a-f) is 200.

Figure 3 with 5 supplements
Comparison of relevant tissue identification between SpecVar and two other LD Score regression (LDSC)-based methods: specifically accessible peaks (SAP) and specifically expressed genes (SEG).

The top five relevant tissues ranked by the relevance score of SpecVar, SAP, and SEG for (a) low-density lipoprotein (LDL), (b) total cholesterol, (c) educational attainment, (d) cognitive performance, (e) brain shape, and (f) facial morphology. The sample size of error bars for (a-f) is 100.

Figure 3—figure supplement 1
The top five relevant tissues ranked by the log-likelihood estimated by CoCoNet for (a) low-density lipoprotein (LDL), (b) total cholesterol, (c) educational attainment, (d) cognitive performance, (e) brain shape, and (f) facial morphology.
Figure 3—figure supplement 2
The top five relevant tissues ranked by -log(p-value) estimated by RolyPoly for (a) low-density lipoprotein (LDL), (b) total cholesterol, (c) educational attainment, (d) cognitive performance, and (e) facial morphology.
Figure 3—figure supplement 3
Top 10 contexts ranked by heritability enrichment in context-specific regulatory elements of (a) low-density lipoprotein (LDL), (b) total cholesterol, (c) educational attainment, (d) cognitive performance, (e) brain shape, and (f) facial morphology.
Figure 3—figure supplement 4
Top 10 contexts ranked by p-values of heritability enrichment in context-specific regulatory elements of (a) low-density lipoprotein (LDL), (b) total cholesterol, (c) educational attainment, (d) cognitive performance, (e) brain shape, and (f) facial morphology.
Figure 3—figure supplement 5
The top five relevant tissues ranked by the relevance score estimated by the group-based SpecVar.

The top five relevant tissues for (a) low-density lipoprotein (LDL), (b) total cholesterol, (c) educational attainment, (d) cognitive performance, (e) brain shape, and (f) facial morphology. (g) Fold change of first-ranked tissue and second-ranked tissue of group-based SpecVar and original SpecVar.

Figure 4 with 1 supplement
SpecVar uses SNP-associated regulation to interpret relevance to tissues.

(a) The SNP-associated regulatory subnetwork of brain shape in cranial neural crest cell (CNCC). The dashed arrows indicate significant SNPs that are not located in a regulatory element (RE) but lie near it. (b) SNP-associated regulation of FOXC2. A group of significant brain shape SNPs is located 650k downstream of FOXC2, and these SNPs are in high linkage disequilibrium. SpecVar prioritizes SNPs located in a CNCC-specific RE as causal genetic variants affecting brain shape through the regulation of FOXC2. The SNP locus and the promoter of FOXC2 are linked by a HiChIP loop of the brain tissues.

Figure 4—figure supplement 1
Distribution of A scores of (a) low-density lipoprotein (LDL) in the ‘right lobe of liver’, (b) total cholesterol in the ‘right lobe of liver’, (c) educational attainment in the ‘frontal cortex’, (d) cognitive performance in the ‘frontal cortex’, (e) brain shape in cranial neural crest cell (‘CNCC’), and (f) facial morphology in ‘CNCC’.
Figure 5 with 4 supplements
SpecVar defines relevance correlation to reveal association of phenotypes.

(a) Scatter plot of true phenotypic correlation against relevance correlation by SpecVar. Each point represents a pair of facial distances. (b) For all phenotype pairs of facial distances, the Pearson correlation coefficient (PCC) between phenotypic correlation and relevance correlation of three methods. (c) For highly correlated phenotype pairs of facial distances, the PCC between phenotypic correlation and relevance correlation of three methods. (d) For all pairs of UKBB phenotypes, the PCC between phenotypic correlation and relevance correlation of three methods. (e) For highly correlated pairs of UKBB phenotypes, the PCC between phenotypic correlation and relevance correlation of three methods. (f) For UKBB phenotype pairs with 25 different heritability thresholds, the PCC between phenotypic correlation and relevance correlation of four methods.
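The comparison in these panels can be sketched as follows, assuming that the relevance correlation of two phenotypes is a correlation between their relevance-score vectors across contexts. The captions do not state which correlation is used, so the Pearson form below is an assumption, and all trait names and numbers are toy values invented for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# Hypothetical relevance scores of three phenotypes across five contexts (toy values).
r_scores = {
    "trait_a": [9.0, 1.0, 2.0, 1.5, 0.5],
    "trait_b": [8.0, 1.2, 2.5, 1.0, 0.7],
    "trait_c": [0.5, 7.0, 1.0, 6.0, 0.8],
}
# Hypothetical measured phenotypic correlations for the same pairs (toy values).
phenotypic = {
    ("trait_a", "trait_b"): 0.85,
    ("trait_a", "trait_c"): -0.30,
    ("trait_b", "trait_c"): -0.25,
}

pairs = list(combinations(r_scores, 2))
# Relevance correlation per phenotype pair (assumed here to be Pearson over contexts).
relevance = {p: pearson(r_scores[p[0]], r_scores[p[1]]) for p in pairs}
# Agreement between the two kinds of correlation over all pairs: PCC and mean square error.
agreement = pearson([phenotypic[p] for p in pairs], [relevance[p] for p in pairs])
mse = sum((phenotypic[p] - relevance[p]) ** 2 for p in pairs) / len(pairs)
```

A higher `agreement` and a lower `mse` mean that the relevance correlation tracks the phenotypic correlation more closely.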

Figure 5—figure supplement 1
Mean square error (MSE) and mutual information (MI) metrics show that SpecVar achieves a better approximation of phenotypic correlation on the facial distance dataset.

(a) For all phenotype pairs of facial distances, the MSE between true phenotypic correlation and relevance correlation of SpecVar, specifically accessible peaks (SAP), and specifically expressed genes (SEG). (b) For highly correlated phenotype pairs of facial distances, the MSE between true phenotypic correlation and estimated phenotypic correlation of SpecVar, SAP, and SEG. (c) For all phenotype pairs of facial distances, the MI between true phenotypic correlation and relevance correlation of SpecVar, SAP, and SEG. (d) For highly correlated phenotype pairs of facial distances, the MI between true phenotypic correlation and estimated phenotypic correlation of SpecVar, SAP, and SEG.

Figure 5—figure supplement 2
SpecVar achieves a better and more robust approximation of phenotypic correlation on the UKBB dataset.

(a) For all phenotype pairs of UKBB phenotypes, the mean square error (MSE) between true phenotypic correlation and relevance correlation of three methods. (b) For highly correlated phenotype pairs of UKBB phenotypes, the MSE between true phenotypic correlation and relevance correlation of three methods. (c) For all phenotype pairs of UKBB phenotypes, the mutual information (MI) between true phenotypic correlation and relevance correlation of three methods. (d) For highly correlated phenotype pairs of UKBB phenotypes, the MI between true phenotypic correlation and relevance correlation of three methods. (e) For UKBB phenotype pairs with different heritability thresholds, the MSE between true phenotypic correlation and relevance correlation of three methods. (f) Boxplot of relevance correlation MSE under 25 different thresholds of phenotype heritability. SpecVar shows the smallest variance. (g) Boxplot of relevance correlation Pearson correlation coefficient (PCC) under 25 different thresholds of phenotype heritability. SpecVar and specifically accessible peaks (SAP) show the smallest variance.

Figure 5—figure supplement 3
SpecVar achieves a better approximation of phenotypic correlation than heritability enrichment and p-value on the facial distance dataset.

(a) Scatter plot of true phenotypic correlation against relevance correlation estimated by heritability enrichment. (b) Scatter plot of true phenotypic correlation against relevance correlation estimated by -log(p-value). Each point represents a pair of facial distances. (c) For all phenotype pairs, the mean square error (MSE) between phenotypic correlation and relevance correlation of SpecVar, heritability enrichment, and -log(p-value). (d) For all phenotype pairs, the Pearson correlation coefficient (PCC) between phenotypic correlation and relevance correlation of SpecVar, heritability enrichment, and -log(p-value). (e) For highly correlated phenotype pairs, the MSE between phenotypic correlation and relevance correlation of SpecVar, heritability enrichment, and -log(p-value). (f) For highly correlated phenotype pairs, the PCC between phenotypic correlation and relevance correlation of SpecVar, heritability enrichment, and -log(p-value).

Figure 5—figure supplement 4
Combination of SpecVar’s relevance correlation and LDSC-GC’s genetic correlation gives a more accurate estimation of phenotypic correlation.

(a) For phenotype pairs with 25 different heritability thresholds, the mean square error (MSE) between true phenotypic correlation and SpecVar’s relevance correlation, and the MSE between true phenotypic correlation and LDSC-GC’s genetic correlation. (b) For phenotype pairs with 25 different heritability thresholds, the Pearson correlation coefficient (PCC) between true phenotypic correlation and SpecVar’s relevance correlation, and the PCC between true phenotypic correlation and LDSC-GC’s genetic correlation. (c) The R² metric of regressions of phenotypic correlation on relevance correlation, genetic correlation, their linear combination, and their product on the facial distance dataset. (d) The PCC metric of the four regressions on the facial distance dataset. (e) The R² metric of the four regressions on the UKBB dataset. (f) The PCC metric of the four regressions on the UKBB dataset.
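The four regressions compared in panels (c) to (f) can be sketched with ordinary least squares. This is only an illustration on synthetic data: the helper `r_squared`, the simulated coefficients, and the reading of "product" as a single predictor equal to relevance correlation times genetic correlation are all assumptions, not details taken from the article.

```python
import numpy as np

def r_squared(y, predictors):
    """R^2 of an ordinary least-squares fit of y on the predictor columns plus an intercept."""
    y = np.asarray(y, dtype=float)
    columns = [np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)

# Synthetic toy data: 200 phenotype pairs whose phenotypic correlation depends on both inputs.
rng = np.random.default_rng(1)
relevance = rng.uniform(-1, 1, 200)
genetic = rng.uniform(-1, 1, 200)
phenotypic = 0.5 * relevance + 0.4 * genetic + rng.normal(0, 0.1, 200)

r2 = {
    "relevance": r_squared(phenotypic, [relevance]),
    "genetic": r_squared(phenotypic, [genetic]),
    "linear combination": r_squared(phenotypic, [relevance, genetic]),
    "product": r_squared(phenotypic, [relevance * genetic]),
}
```

Because the single-predictor models are nested inside the linear combination, its R² can never be lower than theirs on the same data; whether the gain is meaningful is what the figure panels assess.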

Figure 6
SpecVar uses common relevant tissues and shared SNP-associated regulatory networks to interpret relevance correlation.

(a) Scatter plot of R scores across 77 human contexts of ‘body mass index’ and ‘leg fat-free mass (right)’. (b) The SNP-associated regulatory networks of ‘body mass index’ and ‘leg fat-free mass (right)’ are significantly shared. (c) SNP-associated regulation of SH2B1. A significant SNP shared by ‘body mass index’ and ‘leg fat-free mass (right)’ is located 90k downstream of SH2B1, and a HiChIP loop links this locus to the promoter of SH2B1. SpecVar prioritizes the SNP located in a ‘frontal cortex’-specific regulatory element (RE) as a potential causal genetic variant affecting both ‘body mass index’ and ‘leg fat-free mass (right)’ through regulation of SH2B1.

Author response image 1
The top five relevant tissues of SpecVar, SAP, and SEG ranked by the relevance score estimated by ‘pooled genome partition’ for (a) LDL, (b) total cholesterol, (c) educational attainment, (d) cognitive performance, (e) brain shape, and (f) facial morphology.

Tables

Table 1
The total sample size, number of significant SNP associations, and SpecVar-identified relevant tissues of six phenotypes.

For each relevant tissue, we have two numbers in the bracket: the first is the R score and the second is its false discovery rate (FDR) q-value.

| Trait | Sample size | SNP associations | Relevant tissues (R score, FDR q-value) |
| --- | --- | --- | --- |
| Low-density lipoprotein | 173,082 | 3,077 | Right lobe of liver (722.74, 1.2e-3), frontal cortex (146.54, 3.3e-4), gastrocnemius medialis (128.02, 4.7e-4), fetal adrenal gland (123.52, 9.5e-4) |
| Total cholesterol | 187,365 | 4,169 | Right lobe of liver (714.75, 1.0e-2), fetal adrenal gland (216.74, 4.3e-17), H7-hESC (130.73, 2.1e-3) |
| Educational attainment | 1,070,751 | 30,519 | Frontal cortex (167.23, 3.7e-7) |
| Cognitive performance | 257,841 | 13,732 | Frontal cortex (216.62, 7.0e-28), Ammon’s horn (107.25, 2.3e-10) |
| Brain shape | 19,644 | 38,630 | CNCC (512.56, 2.7e-44), trophoblast (144.15, 3.8e-9) |
| Facial morphology | 10,115 | 495 | CNCC (134.95, 8.0e-10), fibroblast (128.81, 3.8e-26) |

CNCC, cranial neural crest cell.
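For readers unfamiliar with FDR q-values such as those reported in Table 1, the sketch below shows one common way to obtain them from p-values, the Benjamini-Hochberg adjustment. Whether the article uses this exact procedure is not stated here, so treat it as an assumption; `bh_qvalues` and the example p-values are illustrative only.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Toy p-values for four contexts (invented for illustration).
pvals = [0.001, 0.04, 0.03, 0.5]
qvals = bh_qvalues(pvals)
```

A context would then be called relevant when its q-value falls below a chosen FDR threshold.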

Additional files

Supplementary file 1

Supplementary tables.

(a) Data source, transcription factor (TF) number, target gene (TG) number, regulatory element (RE) number, RE number per TG, and group information of paired RNA-seq and ATAC-seq for 77 regulatory networks. (b) SpecVar-based heritability enrichment, and its standard error, p-value, and q-value. (c) R scores of six phenotypes in 77 human contexts. (d) 206 phenotypes selected from UKBB and their top relevant contexts. (e) Relevance correlation and the top three common relevant contexts of UKBB phenotypes. (f) Number of regions of five genome partition methods (Spec, ARE, SAP, AAP, SEG). ARE, all regulatory elements; AAP, all accessible peaks; SAP, specifically accessible peaks; SEG, specifically expressed genes.

https://cdn.elifesciences.org/articles/82535/elife-82535-supp1-v2.xlsx
MDAR checklist
https://cdn.elifesciences.org/articles/82535/elife-82535-mdarchecklist1-v2.docx

Cite this article

Zhanying Feng, Zhana Duren, Jingxue Xin, Qiuyue Yuan, Yaoxi He, Bing Su, Wing Hung Wong, Yong Wang (2022)
Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification
eLife 11:e82535.
https://doi.org/10.7554/eLife.82535