Research Article

A conditional gene-based association framework integrating isoform-level eQTL data reveals new susceptibility genes for schizophrenia

Program in Bioinformatics, Zhongshan School of Medicine and The Fifth Affiliated Hospital, Sun Yat-sen University, China
Key Laboratory of Tropical Disease Control (Sun Yat-sen University), Ministry of Education, China
Center for Precision Medicine, Sun Yat-sen University, China
Research Center of Medical Sciences, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, China
The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, China
Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, China

Apr 12, 2022

Open access
Copyright information

Abstract
Editor's evaluation
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Linkage disequilibrium and disease-associated variants in the non-coding regions make it difficult to distinguish the truly associated genes from the redundantly associated genes for complex diseases. In this study, we proposed a new conditional gene-based framework called eDESE that leveraged an improved effective chi-squared statistic to control the type I error rates and remove the redundant associations. eDESE initially performed the association analysis by mapping variants to genes according to their physical distance. We further demonstrated that the isoform-level eQTLs could be more powerful than the gene-level eQTLs in the association analysis using a simulation study. Then the eQTL-guided strategies, that is, mapping variants to genes according to their gene/isoform-level variant-gene cis-eQTLs associations, were also integrated with eDESE. We then applied eDESE to predict the potential susceptibility genes of schizophrenia and found that the potential susceptibility genes were enriched with many neuronal or synaptic signaling-related terms in the Gene Ontology knowledgebase and antipsychotics-gene interaction terms in the drug-gene interaction database (DGIdb). More importantly, seven potential susceptibility genes identified by eDESE were the target genes of multiple antipsychotics in DrugBank. Comparing the potential susceptibility genes identified by eDESE and other benchmark approaches (i.e., MAGMA and S-PrediXcan) implied that strategy based on the isoform-level eQTLs could be an important supplement for the other two strategies (physical distance and gene-level eQTLs). We have implemented eDESE in our integrative platform KGGSEE (http://pmglab.top/kggsee/#/) and hope that eDESE can facilitate the prediction of candidate susceptibility genes and isoforms for complex diseases in a multi-tissue context.

Editor's evaluation

This manuscript describes a new method of identifying disease-relevant risk genes from genome wide association studies by using a conditional gene-based method (named eDESE). The authors apply this new approach in an analysis of schizophrenia datasets, and identify meaningful biological processes and potential drug repurposing candidates. Thus, this new method could provide improved gene prioritization for fine-mapping and functional studies of specific diseases.

https://doi.org/10.7554/eLife.70779.sa0

Introduction

Genome-wide association studies (GWASs) have been used to identify genotype-phenotype associations for over a decade, and thousands of single-nucleotide polymorphisms (SNPs) have been revealed for their associations with hundreds or thousands of complex human diseases (Visscher et al., 2017; Gallagher and Chen-Plotkin, 2018). Nevertheless, conventional GWASs analyses have limited power to produce a complete set of susceptibility variants for complex diseases (Tam et al., 2019). Because most susceptibility SNPs only have small effects on a complex phenotype, conventional SNP-based association tests are generally underpowered to reveal susceptibility variants after multiple-testing corrections. Moreover, the susceptibility variants scattering randomly throughout the genome are often in strong linkage disequilibrium (LD) with numerous neutral SNPs, which makes the discrimination of truly causal variants from GWAS hits quite difficult (Tam et al., 2019). Finally, more than 90% of the disease-associated variants are in non-coding regions of the genome, and many of them are far from the nearest known gene, and it remains a challenge to link genes and a complex phenotype through the non-coding variants (Schaub et al., 2012; Maurano et al., 2012). Accordingly, corresponding methodological strategies have been proposed to make up, at least partly, for the issues mentioned above.

First, gene-based approaches can reduce the multiple-testing burdens by considering the association between a phenotype and all variants within a gene (Neale and Sham, 2004). Assigning a variant to a gene according to the physical distance of the variant from gene boundary is one of the most popular strategies for gene-based approaches. For example, MAGMA (Multi-marker Analysis of GenoMic Annotation), one of the most popular gene-based approaches, uses a multiple regression approach to incorporate LD between markers and detect multi-markers effects to perform gene-based analysis (de Leeuw et al., 2015). VEGAS, a versatile gene-based test for GWAS, incorporates information from a full set of markers (or a defined subset) within a gene and accounts for LD between markers by simulations from the multivariate normal distribution (Liu et al., 2010). GATES, a rapid gene-based association test that uses an extended Simes procedure to assess the statistical significance of gene-level associations (Li et al., 2011). SuSiE (sum of single effects), a novel and popular approach to variable selection in linear regression, can use summary statistics and LD to produce gene-level evidence of association in terms of Bayes Factor (Wang et al., 2020).

Second, evaluating the gene-phenotype associations at one gene conditioning on other genes can isolate true susceptibility genes from the redundant non-susceptibility genes (Li et al., 2019). Yang et al., 2012 proposed an approximate conditional and joint association analysis method based on linear regression analysis to estimate the individual causal variant with GWAS summary statistics. Our previously proposed conditional gene-based association approach based on effective chi-squared statistics (ECS) could remove redundantly associated genes based on the GWAS p-values of variants. Comparing the conditional gene-based association approach with ECS, MAGMA, and VEGAS suggested that the former might be more powerful to predict biologically sensible susceptibility genes (Li et al., 2019).

Third, the observation that variants in the non-coding regions were enriched in the transcriptional regulatory regions implied that these variants might affect the disease risk by altering the genetic regulation of target genes (Gallagher and Chen-Plotkin, 2018). Integration of expression quantitative trait loci (eQTL) studies and GWAS has been used to investigate the genetic regulatory effects on complex diseases. As many complex diseases manifested themselves in certain tissues, using the eQTLs of potentially phenotype-associated tissues might help identify the true susceptibility genes in the tissue context (Hekselman and Yeger-Lotem, 2020). Based on the framework of MAGMA, a method called eMAGMA integrated genetic and transcriptomic information (e.g. eQTLs) in a tissue-specific analysis to identify risk genes and was applied to identify novel genes underlying the major depression disorder (Gerring et al., 2021; Gerring et al., 2019). S-PrediXcan was developed for imputing the genetically regulated gene expression component based on GWAS summary statistics and transcriptome prediction models built from the eQTL/splicing QTL dataset of the Genotype Tissue Expression (GTEx) project (Barbeira et al., 2018). Researchers have applied S-PrediXcan to study genetic mechanisms of multiple complex traits (Gamazon et al., 2019; Gamazon et al., 2018; Huckins et al., 2019). In contrast to the considerable research focusing on integrating gene-level eQTLs with GWAS summary statistics, little attention has been paid to integrating isoform-level eQTLs with GWAS summary statistics. Michael Gandal et al. estimated the candidate risk genes of three psychiatric disorders based on GWAS summary statistics and isoform-level expression profiles. They emphasized the importance of isoform-level gene regulatory mechanisms in defining cell type and disease specificity (Gandal et al., 2018), and similar analyses and conclusions were generated for Alzheimer’s disease (Fan et al., 2021).

Although much achievement has been attained, identifying independently phenotype-associated genes with high reliability remains challenging, especially for complex diseases. Conventional gene-based approaches mainly focus on variants close (say ±5 kilo base pairs) to genes boundary and omit the distal but important variant-gene associations. Gene-based association approaches using eQTLs to identify candidate susceptibility genes and isoforms might raise the utilization rate of the distal but important variant-gene associations. The present study aimed to build a more powerful conditional gene-based framework based on a new ECS and mainly guided by eQTLs. We also investigated whether isoform-level eQTLs in the phenotype-associated tissues can help predict more significant susceptibility genes than gene-level eQTLs. The main assumption is that isoform-level eQTLs may reflect the more real regulatory relationship than gene-level eQTLs. Thus, using isoform-level eQTLs can help predict novel susceptibility genes and isoforms that the conventional gene-based approaches and gene-level eQTLs strategy cannot find. The following is the formation procedure of the assumption: Gene-level and isoform-level eQTLs are predicted based on the gene-level and isoform-level (or transcript-level) expression profiles, and the gene-level expression profiles are computed by averaging the expression of multiple isoforms produced by the gene. Thus, gene-level expression profiles may omit the expression heterogeneity among these isoforms, neutralize the opposite effects, and lower the power of gene-level eQTLs.

Results

Overview of the present study

We previously proposed an effective chi-squared statistic called ECS for the unconditional and conditional gene-based association analysis (Li et al., 2019). Then we built a unified framework called DESE to estimate the potentially phenotype-associated tissues based on the conditional gene-based association analysis with ECS and gene selective expression analysis (Jiang et al., 2019). However, we found that the previous ECS was hindered by a potential type I error inflation issue and further undermined the accuracy of DESE. Here we proposed a new conditional gene-based association framework, eDESE (eQTL-guided DESE), which could also perform conditional gene-based association analysis and geneselective expression analysis, to systematically explore the susceptibility genes and tissues associated with complex diseases by using the GWAS summary statistics and multiple gene-variant mapping strategies. eDESE inherited the framework of DESE but had two important advantages over DESE. First, eDESE is built based on a new ECS, with which the type I error could be controlled within a proper level. Second, eDESE expands the conditional gene-based association analysis of DESE by not only using physically nearby SNPs but also using the gene-level and isoform-level cis-eQTLs associations (Figure 1).

Figure 1

Download asset Open asset

The advantages of eDESE over ECS and DESE.

First, we proposed a new ECS and chose the best exponent c between 1 and 2 to properly control the type I error. Second, we first adopted three strategies to map SNP to genes to perform the unconditional gene-based association analysis with the improved ECS. Then the unconditional gene-based association analysis results were put into the conditional gene-based association analysis and the following iterative procedure. 5 kb: 5000 base pairs. 1 Mb: 10⁶ base pairs.

To evaluate the performance of the new ECS and eDESE, we performed extensive simulations and a real data application to schizophrenia. Specifically, we organized the present study in several sequential parts that cover the optimizing the exponent of chi-squared statistics to control the type I error rates, applying the new ECS to perform conditional gene-based association analysis in simulation data and real-world schizophrenia GWAS summary statistics data. For simplicity and clarity, the model integrating different mapping strategies, that is, physically nearby SNPs (distance), gene-level and isoform-level variant-gene cis-eQTLs associations, were named eDESE:dist, eDESE:gene and eDESE:isoform, respectively.

Choose the favorable exponent c for the correlation matrix of chi-squared statistics to control the type I error rates

We found that the exponent c in the correlation matrix of chi-squared statistics could determine the deviation of the p-values produced by the ECS tests against the uniform distribution. As shown in Figure 2, the c = 1.0 led to deflated p-values while the c = 2.0 led to inflated p-values in the upper tail of the Q-Q plot against the uniform distribution. This pattern was independent of sample sizes, variant number and phenotype distribution (for binary or continuous traits) (Figure 2). The stable trend determined by the c value also implied that the favorable c, which could properly control the type I error rate, measured by the minimal mean log fold change (MLFC), must be within range 1 and 2. Besides, our theoretical derivation also demonstrated that the c value should be within range 1 and 2. Moreover, it seemed that given the c value, the distributions of p-values were similar at different sample sizes and phenotype distributions. Figure 3 shows the favorable c obtained by the grid search algorithm at 84 different scenarios. As shown in Figure 3, most majority p-values at the sample size 10,000 and 40,000 of binary or continuous phenotypes are overlapped. Again, the favorable c values were approximately independent of trait types, sample sizes and variant number. For the sake of simplicity, we proposed to use the averaged favorable c value, 1.432, and integrated it into the improved ECS.

Figure 2

Download asset Open asset

Q-Q plots of the p-value of the ECS test under null hypothesis based on the two extreme exponents (i.e. 1 and 2).

(a), (b), and (c) represent the variant number of 50, 100, and 500, respectively.

Figure 3

Download asset Open asset

The boxplots of the favorable c values at different simulation scenarios.

(a) Binary and continuous phenotypes; (b) Different sample sizes; (c) Different variant number.

The type I error and power of the conditional gene-based association analysis based on the new effective chi-squared statistics (ECS)

We then investigated the type I error and power of the conditional gene-based association analysis based on the improved ECS with the favorable exponent c value. As shown in Figure 4, in six different scenarios, the conditional p-values of the genes without truly casual loci approximately follow the uniform distribution U[0,1], regardless of the variance explained by its nearby genes. Moreover, the distribution of conditional p-value was similar to that produced by the conventional likelihood ratio test for the nested linear regression models. These results suggested that the conditional gene-based association analysis based on the improved ECS could produce valid p-values for statistical inference. In contrast, the unconditional association test based on the improved ECS produced an inflated p-value due to the indirect associations produced by the nearby causal genes in the LD block. Concerning the statistical power, we found that conditional gene-based association analysis based on the improved ECS produced smaller p-values than the likelihood ratio test (Figure 5), suggesting a higher statistical power of the former. Another advantage of conditional gene-based association analysis based on the improved ECS over the likelihood ratio test was that the former did not require individual genotypes. The reason might be that the degree of freedom in the latter was inflated by the LD among variants.

Figure 4

Download asset Open asset

Q-Q plots of the conditional, unconditional gene-based association test and likelihood-ratio test under the null hypothesis.

(a) and (d) two gene-variant pairs with the similar variant number (SIPA1L2 with 29 variants and LOC729336 with 30 variants). (b) and (e) two gene-variant pairs with the different variant number, and the first is larger than the second (CACHD1 with 41 variants and RAVER2 with eight variants). (c) and (f) Two gene-variant pairs with the different variant number, and the second is larger than the first (LOC647132 with five variants and FAM5C with 48 variants). (a), (b) and (c) The former gene has no QTL, and QTL explained 0.5 % of heritability in the latter gene. (d), (e) and (f) The former gene has no QTL, and QTL explained 1 % of heritability in the latter gene. Ten thousand phenotype datasets were simulated for each scenario. Unconditional Eff. Chi. (the red) represents unconditional association analysis at the former gene by the improved ECS. Conditional Eff. Chi (the blue) represents conditional association analysis at the former gene conditioning on the latter gene by the improved ECS. The likelihood ratio test (the yellow) was conducted based on the nested linear regression models.

Figure 5

Download asset Open asset

Q-Q plots of the conditional gene-based association test and likelihood-ratio test at different representative gene-variant pairs.

The variant number of the two gene-variant pairs involved in (a)-(f) are the same as that in Figure 4 legend. The difference is: in (a)-(c), the QTL in either gene (former and latter) explained 0.25% of heritability. In (d)-(f), the QTL in either gene explained 0.5% of heritability. One thousand phenotype datasets were simulated for each scenario.

Apply eDESE:dist to predict the potential susceptibility genes of schizophrenia

We had demonstrated that the conditional gene-based analysis based on the improved ECS was more powerful than the likelihood ratio test in each simulation scenario. To evaluate the performance of the ECS and eDESE in the real-world data, we used a recent large-scale GWAS summary statistics dataset (Trubetskoy et al., 2022) and gene expression profiles (GTEx v8) of multiple human tissues (Consortium, 2020) to identify the susceptibility genes of schizophrenia. We found that the improved ECS identified 739 significant genes without conditioning on gene-expression profiles. Furthermore, we also found 205 significant genes out of the above 739 genes identified by eDESE:dist based on the improved ECS by conditioning on the gene-level expression profiles (see details in Supplementary file 1a).

We then compared the significant susceptibility genes identified by eDESE:dist with that of MAGMA. We identified 619 significant susceptibility genes based on MAGMA. The significant gene count of MAGMA was about three times larger than that of eDESE:dist, which might partly result from the conditional gene-based analysis’s advantage of removing the redundantly associated genes. Besides, more than half of the significant susceptibility genes identified by eDESE:dist were also identified by MAGMA (Figure 6a).

Figure 6

Download asset Open asset

The comparison of the potential susceptibility genes for schizophrenia identified by MAGMA and eDESE:dist.

(a) The Venn diagram shows the overlapped and unique genes identified by MAGMA and eDESE:dist. (b) The bar plot shows the top GO enrichment terms of the overlapped genes. MF: Molecular Function of GO. BP: Biological Process terms of GO. CC: Cellular Component terms of GO. The x-axis label represents the top ( ≤ 10) significant GO enrichment terms (MF, BP, and CC). The y-axis label represents the negative log10 of the adjusted p-value of each term. See also Figure 6—source data 1, Figure 6—source data 2 and Figure 6—source data 3.

Figure 6—source data 1 Significant genes identified by MAGMA.: https://cdn.elifesciences.org/articles/70779/elife-70779-fig6-data1-v1.xlsx
Download elife-70779-fig6-data1-v1.xlsx
Figure 6—source data 2 Significant genes identified by eDESE:dist.: https://cdn.elifesciences.org/articles/70779/elife-70779-fig6-data2-v1.xlsx
Download elife-70779-fig6-data2-v1.xlsx
Figure 6—source data 3 Enrichment results of the common 105 genes identified by MAGMA and eDESE:dist.: https://cdn.elifesciences.org/articles/70779/elife-70779-fig6-data3-v1.xlsx
Download elife-70779-fig6-data3-v1.xlsx

Further, we performed Gene Ontology (GO) enrichment analysis to study the functional annotations of these significant genes. Interestingly, we found that most GO:BP and GO:CC enrichment terms of the overlapped genes were neuronal-, dendrite- or synaptic signaling-related terms. The GO:MF enrichment terms of the overlapped genes were all about signaling transduction (see examples in Figure 6b and details in Supplementary file 1b). We then found that the unique genes identified by eDESE:dist were enriched with three GO:CC terms, that is, dendrite, dendritic tree and distal axon, which were all dendrite-related terms. However, although the unique genes identified by MAGMA were enriched with thirty-one GO terms, none of these terms were neuronal-, dendrite-, or synaptic signaling-related terms. Moreover, systematic text-mining results in PubMed showed that 67 of the 205 (~32.7%) and 170 of the 619 (~27.5%) potential susceptibility genes had at least two search hits for eDESE:dist and MAGMA, respectively (see details in Supplementary file 1c). The GO enrichment results and the text-mining results both implied the utility of eDESE:dist.

Evaluate the type I error and power of gene-level eQTLs and isoform-level eQTLs in association analysis based on ECS

Next, we mapped variants to genes (or isoforms) according to their variant-gene/isoform eQTLs associations. Since the isoform-level and corresponding gene-level expression profiles were quantified based on the same RNA-sequencing data, we then investigated the type I error and power of the association analysis by ECS based on the gene-level and isoform-level eQTLs.

We considered the multiple different scenarios that variants affected phenotype through regulating the gene expression. We simulated genotype data, gene-level and isoform-level expression data and corresponding phenotype data. In Table 1, AllVar means that all variants are used in the gene-based association analysis based on ECS. IsoeQTL denotes that the eQTL is associated with a susceptibility isoform. GeneQTL denotes that the eQTL is associated with a gene whose expression is averaged from three susceptibility isoforms (homogeneity). Gen3eQTL and Gen6eQTL denote that the eQTL is associated with a gene whose expression is averaged by three and six isoforms, one of which is the susceptibility isoform (heterogeneity), respectively.

Table 1

Type I error and power of different simulation scenarios in association analysis.

Scenarios	Important parameters			Binary trait					Continuous trait
	Eg	Vgp	Vge	AllVar	IsoeQTL	GeneQTL	Gen3eQTL	Gen6eQTL	AllVar	IsoeQTL	GeneQTL	Gen3eQTL	Gen6eQTL
				Type I error
1	0	0	0.05	0	0	0	0	0.002	0	0	0.001	0.002	0.003
2	0	0	0.15	0	0.002	0.001	0	0	0	0	0	0.001	0
3	0	0	0.3	0	0	0	0.002	0	0	0.001	0.001	0.002	0.002
				Power
4	0	0.005	0.05	0.251	0.036	0.022	0.034	0.031	0.246	0.032	0.019	0.032	0.038
5	0	0.005	0.15	0.219	0.021	0.013	0.023	0.032	0.301	0.025	0.017	0.037	0.043
6	0	0.005	0.3	0.229	0.028	0.017	0.021	0.034	0.282	0.024	0.017	0.025	0.039
7	0.1	0	0.05	0	0.017	0.019	0.006	0.001	0	0.017	0.027	0.009	0.001
8	0.1	0	0.15	0	0.213	0.221	0.113	0.054	0.002	0.245	0.245	0.132	0.068
9	0.1	0	0.3	0.018	0.704	0.659	0.581	0.388	0.027	0.72	0.686	0.607	0.446
10	0.1	0.005	0.05	0.288	0.052	0.076	0.043	0.043	0.313	0.063	0.091	0.05	0.041
11	0.1	0.005	0.15	0.403	0.33	0.302	0.199	0.134	0.46	0.357	0.334	0.229	0.136
12	0.1	0.005	0.3	0.569	0.778	0.738	0.677	0.485	0.62	0.805	0.774	0.712	0.512

Eg denotes the effect size of gene expression on phenotype. Vgp denotes phenotype variance explained by all variants. Vge denotes gene expression variance explained by all variants.

As shown in Table 1, our simulations' type I error rates are controlled within 0% ~ 0.3% (on average <0.1%) according to the p-value threshold 0.001 (scenarios 1–3). As expected, in scenarios 4–6, where gene expression cannot affect the phenotype (Eg = 0 in Table 1), AllVar is much more powerful than eQTLs. Further, in scenarios 7–12, where gene expression can affect the phenotype, our results suggest that the powers of eQTLs roughly increase with the phenotype variance explained by all variants (Vgp in Table 1) and gene expression variance explained by all variants (Vge in Table 1). Moreover, in scenarios 7–12, associations test totally based on the susceptibility isoforms (IsoeQTL and GeneQTL in Table 1) are more powerful than those based on the gene-level eQTLs. While IsoeQTL is computed based on fewer (i.e. one) susceptibility isoforms than GeneQTL (i.e. three), the power of IsoeQTL is the equal of GeneQTL. Thus, our simulation results revealed that isoform-level eQTLs were more powerful than gene-level eQTLs in association analysis in the scenarios that gene expression could affect the phenotype.

Estimate the potentially phenotype-associated tissues for schizophrenia

Like DESE, eDESE can produce phenotype-associated genes and tissues. Therefore, we firstly adopted eDESE:dist to predict the phenotype-associated tissues of schizophrenia and found that all thirteen brain regions were significantly associated with schizophrenia and ranked the top based on the gene-level and isoform-level expression profiles, respectively (see details in Supplementary file 1d).

Since all thirteen brain regions were predicted as the potentially phenotype-associated tissues of schizophrenia by eDESE:dist, removing the possible false positives would be necessary. Then we resorted to the eQTLs and assumed that if a tissue (say T₁) is a phenotype-associated tissue, potential susceptibility genes identified based on the eQTLs of T₁ will be more likely to be phenotype-associated genes and selectively express in T₁ or similar tissues. We then computed the gene-level and isoform-level eQTLs of all thirteen brain regions and predicted the potentially phenotype-associated tissues using eDESE:gene and eDESE:isoform, respectively. Our results showed that the brain (all thirteen brain regions as a whole) was predicted as the schizophrenia-associated tissue based on the gene-level eQTLs of twelve brain regions, respectively (Table 2). However, on the more precise resolution, we found brain was predicted as schizophrenia-associated tissue only based on the isoform-level eQTLs of five brain regions, respectively (Table 2). Thus, the five brain regions were collectively optimized as the potentially phenotype-associated tissues of schizophrenia by eDESE:dist, eDESE:gene, and eDESE:isoform.

Table 2

The result about whether the brain was optimized as the schizophrenia-associated tissue based on each brain region’s gene/isoform-level eQTLs.

Brain regions	Gene-level eQTL	Isoform-level eQTL
Brain-Anterior cingulate cortex (BA24)	Yes	Yes
Brain-Cerebellum	Yes	Yes
Brain-Frontal Cortex (BA9)	Yes	Yes
Brain-Hippocampus	Yes	Yes
Brain-Spinal cord (cervical c-1)	Yes	Yes
Brain-Amygdala	Yes	No
Brain-Caudate (basal ganglia)	Yes	No
Brain-Cerebellar Hemisphere	Yes	No
Brain-Cortex	Yes	No
Brain-Hypothalamus	Yes	No
Brain-Nucleus accumbens (basal ganglia)	Yes	No
Brain-Putamen (basal ganglia)	No	No
Brain-Substantia nigra	Yes	No

“Yes” denotes that brain (i.e., all thirteen brain tissues) was estimated as the significantly schizophrenia-associated tissue based on the gene/isoform-level eQTLs of the tissue. “No” denotes the contrary. The font names of the optimized brain regions are bold. See also Table 2—source data 1, Table 2—source data 2, Table 2—source data 3 and Table 2—source data 4.

Table 2—source data 1 Tissue significance estimated by eDESE:dist based on the gene-level expression profiles.: https://cdn.elifesciences.org/articles/70779/elife-70779-table2-data1-v1.xlsx
Download elife-70779-table2-data1-v1.xlsx
Table 2—source data 2 Tissue significance estimated by eDESE:dist based on the isoform-level expression profiles.: https://cdn.elifesciences.org/articles/70779/elife-70779-table2-data2-v1.xlsx
Download elife-70779-table2-data2-v1.xlsx
Table 2—source data 3 Tissue significance estimated by eDESE:gene based on the gene-level eQTLs of each brain region.: https://cdn.elifesciences.org/articles/70779/elife-70779-table2-data3-v1.xlsx
Download elife-70779-table2-data3-v1.xlsx
Table 2—source data 4 Tissue significance estimated by eDESE:isoform based on the isoform-level eQTLs of each brain region.: https://cdn.elifesciences.org/articles/70779/elife-70779-table2-data4-v1.xlsx
Download elife-70779-table2-data4-v1.xlsx

In contrast, we predicted schizophrenia-associated tissues based on the gene-level and isoform-level eQTLs of Muscle Skeletal and Skin Sun Exposed Lower Leg, whose sample sizes were bigger than all the other tissues in GTEx(v8). The prediction results of Muscle Skeletal and Skin Sun Exposed Lower Leg showed that neither of them was predicted to be significant tissues of schizophrenia by using their gene-level and isoform-level eQTLs (see details in Supplementary file 1e), which demonstrated, at least partly, our assumption. Thus, our results showed the utility of integrating the three models of eDESE for optimizing the phenotype-associated tissues.

The comparison of eDESE:isoform versus eDESE:gene and S-PrediXcan

eDESE:isoform can identify more potential susceptibility genes

Our simulation study demonstrated that association analysis based on the improved ECS with the isoform-level eQTLs was more powerful than with the gene-level eQTLs. We further tested this finding in the real data and identified the potential susceptibility genes for schizophrenia using the eDESE:isoform and eDESE:gene, respectively. We found the number of potential susceptibility genes identified by eDESE:isoform was larger than that of eDESE:gene and S-PrediXcan under the same adjustment filter cutoff (Figure 7a, see details in Supplementary file 1fg and h). We further combined the potential susceptibility genes of the five optimized brain regions identified by S-PrediXcan, eDESE:gene and eDESE:isoform, respectively. Still, we found that the susceptibility genes exclusively identified by eDESE:isoform were the most among the three models (Figure 7b).

Figure 7

Download asset Open asset

Comparison of the potential susceptibility genes identified by S-PrediXcan, eDESE:gene, and eDESE:isoform.

(a) The bar plot shows the count of potential susceptibility genes in each of the five optimized brain regions. (b) The Venn diagram shows the count of the overlapped and unique genes identified by S-PrediXcan, eDESE:gene and eDESE:isoform in the five optimized brain regions. See also Figure 7—source data 1 and Figure 7—source data 2.

Figure 7—source data 1 The count of potential susceptibility genes in each of the five optimized brain regions.: https://cdn.elifesciences.org/articles/70779/elife-70779-fig7-data1-v1.xlsx
Download elife-70779-fig7-data1-v1.xlsx
Figure 7—source data 2 The potential susceptibility genes identified by S-PrediXcan, eDESE:gene and eDESE:isoform in the five optimized brain regions.: https://cdn.elifesciences.org/articles/70779/elife-70779-fig7-data2-v1.xlsx
Download elife-70779-fig7-data2-v1.xlsx

Potential susceptibility genes identified by eDESE:isoform were supported by more published studies

Then we searched the PubMed database with the combined susceptibility gene set (i.e. 391 genes for S-PrediXcan, 633 genes for eDESE:gene, and 854 genes for eDESE:isoform, Figure 7b) of all five optimized brain regions. We found that 135 genes for S-PrediXcan, 170 genes for eDESE:gene and 247 genes for eDESE:isoform had at least one search hit which reported the associations of these genes with schizophrenia in PubMed database, respectively (see details in Supplementary file 1ij and k).

We also found 138 of the 524 (26.3%) potential susceptibility genes exclusively predicted by eDESE:isoform each had a least one search hit. Moreover, we found that 19 of the 524 (3.6%) potential susceptibility genes each had at least 10 supported papers in PubMed (Table 3, see details in ). Interestingly, TCF4 (transcription factor 4), RGS4 (regulator of G protein signaling 4) and RANGAP1 (Ran GTPase activating protein 1) were reported by more than 100 papers. RGS4 is reported to be biased expressed in brain (O’Leary et al., 2016). TCF4 and RANGAP1 are broadly expressed in the brain, and TCF4 may play an important role in nervous system development (O’Leary et al., 2016).

Table 3

The important examples of potential susceptibility genes exclusively predicted by eDESE:isoform.

Gene name	# of hits in PubMed
RGS4	> 100
TCF4	> 100
RANGAP1	> 100
GRIA1	80
GRM3	76
TSPO	39
TPH2	35
FEZ1	31
ZDHHC8	24
VRK2	23
KCNN3	20
NCAM1	20
MIP	15
SLC39A8	14
DLG1	14
BDNF-AS	13
FGA	13
ADRA1A	12
MAPT	10

Table 3—source data 1 The PubMed search hits of the unique potential susceptibility genes of schizophrenia identified by eDESE:isoform (compared with S-PrediXcan and eDESE:gene).: https://cdn.elifesciences.org/articles/70779/elife-70779-table3-data1-v1.xlsx
Download elife-70779-table3-data1-v1.xlsx

Furthermore, we applied the Hetionet (v1.0) (Himmelstein et al., 2017), which encodes knowledge from millions of biomedical studies to connect diseases, genes, anatomies and more, to investigate the above top associations. We set the source node as the susceptibility gene and the target node as schizophrenia using the ‘Connectivity search’ function. We found that RGS4 and RANGAP1 both had multiple significant meta path types indicating their potential associations and mechanisms associated with schizophrenia.

Potential susceptibility genes identified by eDESE:isoform were enriched with more biologically sensible GO terms

Next, we performed the GO enrichment analysis and found that the potential susceptibility genes identified by eDESE guided by the eQTLs (eDESE:gene and eDESE:isoform) had more biologically sensible GO enrichment terms than S-PrediXcan (Table 4). In addition, the GO enrichment results also showed that the potential susceptibility genes identified by eDESE:isoform were enriched with more neuronal, dendritic or synaptic signaling-related biological process GO terms than S-PrediXcan and eDESE:gene.

Table 4

The GO enrichment terms of the potential susceptibility genes in each optimized brain region identified by S-PrediXcan, eDESE:gene and eDESE:isoform.

Tissue name	*S-PrediXcan	eDESE:gene	eDESE:isoform
Brain-Anterior cingulate cortex (BA24)	-	Regulation of gap junction assembly (BP)	Potassium ion transmembrane transporter activity (MF); potassium: chloride symporter activity (MF); nervous system development (BP); generation of neurons (BP); neurogenesis (BP)
Brain-Cerebellum	Ion binding (MF); cation binding (MF); metal ion binding (MF); intracellular organelle (CC)	Dendrite (CC); dendritic tree (CC); neuron projection (CC); postsynapse (CC); synapse (CC)	Voltage-gated calcium channel activity involved in cardiac muscle cell action potential (MF); nervous system development (BP)
Brain-Frontal Cortex (BA9)	Intracellular organelle (CC); intracellular membrane-bounded organelle (CC)	Somatodendritic compartment (CC); dendrite (CC); dendritic tree (CC); synaptic vesicle membrane (CC); exocytic vesicle membrane (CC)	-
Brain-Hippocampus	Ion binding (MF)	-	Postsynaptic density (CC); asymmetric synapse (CC)
Brain-Spinal cord (cervical c-1)	-	High voltage-gated calcium channel activity (MF); voltage-gated calcium channel activity involved in AV node cell action potential (MF); regulation of B cell tolerance induction (BP); positive regulation of B cell tolerance induction (BP); L-type voltage-gated calcium channel complex (CC)	Nitrogen compound transport (BP); organelle (CC)

*

MF: Molecular Function terms of GO. BP: Biological Process terms of GO. CC: Cellular Component terms of GO.

We further performed the GO enrichment analysis based on the combined gene lists of all five optimized brain regions for S-PrediXcan, eDESE:gene and eDESE:isoform. However, we found that the 55 genes commonly identified by the three models were enriched with no GO term. Furthermore, the 240 unique genes identified by S-PrediXcan were also enriched with no GO term. On the other hand, the 319 unique genes identified by the eDESE:gene were enriched with ‘integral component of synaptic vesicle membrane’ (CC) and several general GO terms. In comparison, the 524 unique genes identified by eDESE:isoform were enriched with a considerable number of GO terms, in which a few neuronal, dendritic or synaptic signaling-related GO terms were found (Supplementary file 1l).

Potential susceptibility genes identified by eDESE:isoform were significantly enriched in a biologically sensible consensus module in the brain weighted gene co-expression network

We then tested the enrichment of the potential susceptibility genes in the consensus modules of the brain weighted gene co-expression network. We found that the potential susceptibility genes identified by eDESE:gene and eDESE:isoform based on the gene-level and isoform-level eQTLs of Brain-Cerebellum were both enriched with a brain consensus module (colored ‘turquoise’) with statistical significance, that is, adjusted p-value = 0.0046 and 0.0061, respectively. Besides, the potential susceptibility genes identified by eDESE:gene based on the gene-level eQTLs of Brain-Frontal Cortex (BA9) were also enriched (adjusted p-value = 0.024) with the consensus module colored ‘turquoise’. However, no enriched consensus module was found for the potential susceptibility genes identified by S-PrediXcan. Moreover, we found that the ‘turquoise’ consensus module contained 7,726 genes and was enriched with plenty of neuronal and synaptic signaling-related GO terms (Figure 8, see details in Supplementary file 1m).

Figure 8

Download asset Open asset

The GO enrichment terms of the genes in the consensus module (colored ‘turquoise’).

The x-axis label represents the top (≤10) significant GO enrichment terms (in MF, BP, and CC). The y-axis label represents the negative log10 of the adjusted p-value of each term. See also Figure 8—source data 1 and Figure 8—source data 2.

Figure 8—source data 1 Genes in the consensus module (colored "turquoise").: https://cdn.elifesciences.org/articles/70779/elife-70779-fig8-data1-v1.xlsx
Download elife-70779-fig8-data1-v1.xlsx
Figure 8—source data 2 Enrichment results of the genes in the consensus module (colored ‘turquoise’).: https://cdn.elifesciences.org/articles/70779/elife-70779-fig8-data2-v1.xlsx
Download elife-70779-fig8-data2-v1.xlsx

eDESE:isoform can predict the potential susceptibility isoforms of corresponding phenotype-associated tissues

As shown in Figure 7b, 55 genes were collectively predicted to be susceptible to schizophrenia by S-PrediXcan, eDESE:gene, and eDESE:isoform. As eDESE:isoform could output susceptibility gene-isoform pairs for corresponding tissues, we further got the corresponding susceptibility isoforms of the 55 genes (see details in Supplementary file 1n). Interestingly, we found that the number or type of susceptibility isoforms for a gene varied greatly in the different optimized brain regions. For example, NAGA and CHRNA2 were reported to be associated with schizophrenia by three and seven research papers in the PubMed database, respectively. ENST00000407991 of CHRNA2 was significantly associated with schizophrenia only in Brain-Cerebellum, while ENST00000396398 of NAGA was significantly associated with schizophrenia in all the five optimized brain regions. We also found that different isoforms of the same gene were predicted to be significantly associated with schizophrenia in the different optimized brain regions. For example, SNX19 was reported to be associated with schizophrenia by seven research papers. ENST00000527116 of SNX19 in Brain-Anterior cingulate cortex (BA24), ENST00000528555 of SNX19 in Brain-Cerebellum, ENST00000265909 of SNX19 in Brain-Frontal Cortex (BA9) and Brain-Hippocampus, and ENST00000526579 of SNX19 in Brain-Spinal cord (cervical c-1) were significantly associated with schizophrenia, respectively.

The above comparisons suggested that incorporating isoform-level eQTLs can help eDESE predict more potential susceptibility genes than gene-level eQTLs and S-PrediXcan in each optimized brain region. Our results further pointed that eDESE:isoform could help find some novel, biologically sensible susceptibility genes which S-PrediXcan and eDESE:gene cannot find. Moreover, we also found that the potential susceptibility genes identified by eDESE were enriched with a consensus module in the brain weighted gene co-expression network, which was significantly enriched with plenty of biologically sensible GO terms. Based on the isoform-level eQTLs of each phenotype-associated brain region, eDESE:isoform can also help gain insight into the potential susceptibility isoforms. Thus, our results revealed the potential advantages of eDESE:isoform, at least partly, over eDESE:gene and S-PrediXcan.

The druggability of the potential susceptibility genes

Since drug target genes with genetic support are twice or as likely to be approved than target genes with no known genetic associations (King et al., 2019; Nelson et al., 2015), we searched the DrugBank database (Wishart et al., 2018) and found that seven potential susceptibility genes identified by eDESE in total (0.976% for eDESE:dist, 0.632% for eDESE:gene and 0.703% for eDESE:isoform) were the target genes of multiple antipsychotics (Table 5). Several popular target genes, such as DRD2 and ADRA1A, were identified by different eDESE models. Besides, we found that three genes (0.485%) of the 619 potential susceptibility genes identified by MAGMA were also the target genes of multiple antipsychotics. One gene (0.256%) of the 391 potential susceptibility genes identified by S-PrediXcan was the target gene of an atypical antipsychotic (i.e. paliperidone). The results suggested that the three models of eDESE could complement each other to identify more susceptibility genes which could be the target genes of the therapeutic drugs.

Table 5

The target genes of the antipsychotics predicted as the potential susceptibility genes by MAGMA, S-PrediXcan, and eDESE.

Target gene	Models
DRD2	eDESE:dist & eDESE:gene & eDESE:isoform & MAGMA
ADRA1A	eDESE:isoform
CHRM3	eDESE:gene & eDESE:isoform
CHRM4	eDESE:gene & eDESE:isoform& MAGMA
OPRD1	eDESE:dist
GABRD	eDESE:isoform
CYP2D6	eDESE:gene & eDESE:isoform& MAGMA & S-PrediXcan

To further investigate the drug-gene interactions of the potential susceptibility genes, we searched the Drug Gene Interaction database (DGIdb v4.2.0) (Freshour et al., 2021) and kept the drug-gene interaction terms with at least one supported PubMed paper. After the filtration, we kept 30,072 unique drug-gene interaction terms and found 679 unique drug-gene interaction terms for 34 antipsychotics (see details in Supplementary file 1o). Then we put the combined potential susceptibility gene list (identified by MAGMA, S-PrediXcan, eDESE:dist, eDESE:gene and eDESE:isoform, respectively) into DGIdb to investigate if the ‘antipsychotic’-‘potential susceptibility gene’ interaction terms were enriched in the known drug-gene interaction database, that is, DGIdb. As shown in Table 6, we found that ‘antipsychotic’ - ‘potential susceptibility gene’ interaction terms identified by MAGMA and the three models of eDESE were all significantly enriched in DGIdb. We also found 452 drug-gene interaction terms for the susceptibility gene identified by S-PrediXcan. However, only 12 ‘antipsychotic’- ‘potential susceptibility gene’ interaction terms were found (hypergeometric distribution test p-value = 0.33).

Table 6

The enrichment of drug-gene interaction terms in DGIdb for the susceptibility genes identified by MAGMA, S-PrediXcan and eDESE.

Models	# of antipsychotics-gene interaction terms	# of total drug-gene interaction terms	Enrichment p*
MAGMA	57	937	1.62e-11
S-PrediXcan	12	452	0.33
eDESE:dist	34	279	1.56e-15
eDESE:gene	56	968	1.65e-10
eDESE:isoform	70	1,104	8.74e-15

*

Enrichment p denotes the p-value of the hypergeometric distribution test. See also Table 6—source data 1, Table 6—source data 2, Table 6—source data 3, Table 6—source data 4, and Table 6—source data 5.

Table 6—source data 1 The drug-gene interaction term results of the potential susceptibility genes of schizophrenia identified by MAGMA in DGIdb.: https://cdn.elifesciences.org/articles/70779/elife-70779-table6-data1-v1.xlsx
Download elife-70779-table6-data1-v1.xlsx
Table 6—source data 2 The drug-gene interaction term results of the potential susceptibility genes of schizophrenia identified by S-PrediXcan in DGIdb.: https://cdn.elifesciences.org/articles/70779/elife-70779-table6-data2-v1.xlsx
Download elife-70779-table6-data2-v1.xlsx
Table 6—source data 3 The drug-gene interaction term results of the potential susceptibility genes of schizophrenia identified by eDESE:dist in DGIdb.: https://cdn.elifesciences.org/articles/70779/elife-70779-table6-data3-v1.xlsx
Download elife-70779-table6-data3-v1.xlsx
Table 6—source data 4 The drug-gene interaction term results of the potential susceptibility genes of schizophrenia identified by eDESE:gene in DGIdb.: https://cdn.elifesciences.org/articles/70779/elife-70779-table6-data4-v1.xlsx
Download elife-70779-table6-data4-v1.xlsx
Table 6—source data 5 The drug-gene interaction term results of the potential susceptibility genes of schizophrenia identified by eDESE:isoform in DGIdb.: https://cdn.elifesciences.org/articles/70779/elife-70779-table6-data5-v1.xlsx
Download elife-70779-table6-data5-v1.xlsx

Moreover, we investigated the potential druggability of the potential susceptibility genes. Among the 42 potentially druggable gene categories in DGIdb, we found that the top potentially druggable category for the potential susceptibility genes identified by MAGMA, S-PrediXcan and eDESE all were the “DRUGGABLE GENOME”. The number of “DRUGGABLE GENOME” genes were 168 (27.1%) for MAGMA, 86 (22.0%) for S-PrediXcan, 62 (30.2%) for eDESE:dist, 136 (21.5%) for eDESE:gene and 207 (24.2%) for eDESE:isoform.

Taken together, compared with MAGMA and S-PrediXcan, our results showed that the potential susceptibility genes identified by eDESE were more likely to be the target genes of therapeutic drugs. Besides, the application of eQTLs (especially the isoform-level eQTLs) could aid eDESE in identifying more potentially druggable genes.

Discussion

In this study, we proposed a multi-strategy conditional gene-based association framework, eDESE, based on a new effective chi-squared statistic (ECS) to identify the potential susceptibility tissues, genes, and isoforms for complex diseases. Compared with the unconditional association test based on the new ECS and likelihood ratio test, the conditional association test based on the new ECS showed a lower type I error rate and higher statistical power. Except the improved ECS, eDESE has another advantage of using three mapping strategies, that is, mapping based on physical position and the gene-level and isoform-level variant-gene eQTLs associations, to perform the gene-based association analysis. We implemented the three mapping strategies in corresponding three conditional gene-based association models, that is, eDESE:dist, eDESE:gene, and eDESE:isoform.

eDESE:dist and conventional MAGMA both map variants to genes based on their physical distance to perform gene-based association analysis. eDESE:dist differs from MAGMA mainly in that the former is a conditional gene-based association analysis based on the improved ECS. Although eDESE:dist predicted a smaller size of potential susceptibility genes than MAGMA, more than half of the genes identified by eDESE:dist were also identified by MAGMA. The overlapped potential susceptibility genes (accounting for 51.2% in eDESE:dist and 17.0% in MAGMA) were enriched with plenty of neuronal- or synaptic signaling-related GO terms. Similar results of a recent large-scale GWAS were obtained by gene-set analyses (Trubetskoy et al., 2022; Legge et al., 2021). The comparison of MAGMA and eDESE:dist revealed the advantage of conditional association analysis, that is, removing the redundantly associated genes. In addition, our results showed that the unique potential susceptibility genes identified by eDESE:dist were enriched with more biologically sensible GO terms than MAGMA. Moreover, the proportion of the susceptibility genes identified by eDESE:dist supported by research papers were more than that of MAGMA. Besides, eDESE:dist could produce a potential susceptibility tissue list for complex diseases because of the involvement of gene selective expression analysis.

Nevertheless, eDESE:dist might omit some distal but important variant-gene associations. We thus expanded eDESE by using eDESE:gene and eDESE:isoform. Using a simulation study, we firstly demonstrated that isoform-level eQTLs were more powerful than gene-level eQTLs in association analysis based on the improved ECS. Then taking schizophrenia as an example, we predicted the potential susceptibility tissues by integrating the results of eDESE:dist, eDESE:gene and eDESE:isoform based on the principle of ‘wisdom of the crowds’ and finally optimized five brain regions. Interestingly, four of the five optimized brain regions (i.e. Brain-Anterior cingulate cortex (BA24), Brain-Cerebellum, Brain-Frontal Cortex (BA9), and Brain-Hippocampus) were also reported by the Schizophrenia Working Group of the Psychiatric Genomics Consortium, in which the genes with high relative specificity for bulk expression were strongly enriched with schizophrenia associations (Trubetskoy et al., 2022).

Furthermore, we compared the potential susceptibility genes identified by eDESE:isoform with eDESE:gene and S-PrediXcan, based on the eQTLs of the five optimized brain regions, to demonstrate its potential gains and insights in the biology of schizophrenia. We found that eDESE:isoform could identify more susceptibility genes supported by the published papers and enriched with biologically sensible GO biological process terms. Interestingly, we also found susceptibility genes identified by eDESE:isoform in Brain-Cerebellum and eDESE:gene in both Brain-Cerebellum and Brain-Frontal Cortex (BA9), to be enriched with the same consensus module, which was stunningly enriched with plenty of neuronal- or synaptic signaling-related GO terms. Moreover, to our best knowledge, eDESE:isoform is the first conditional gene-based association model to produce a list of phenotype-associated isoforms (or transcripts) for complex diseases. The comparisons among eDESE:isoform, eDESE:gene, and S-PrediXcan revealed that our model, especially the eDESE:isoform, can gain more potential insights for schizophrenia biology.

To further investigate if eDESE can help gain more insights into the gene druggability, we compared the drug-gene interaction terms and potentially druggable category of the susceptibility genes identified by MAGMA, S-PrediXcan and eDESE. Seven susceptibility genes identified by eDESE were found to be the target genes of multiple FDA-approved antipsychotics, and six (85.7%) of the seven target genes were found by eDESE:isoform. On the other hand, only one and three susceptibility genes identified by S-PrediXcan and MAGMA were found to be the target genes of several FDA-approved antipsychotics. In addition, the ‘antipsychotics’-‘susceptibility gene’ interactions terms for MAGMA and eDESE were both significantly enriched in the known drug-gene interaction database (DGIdb). Moreover, we found that the number of potential susceptibility genes identified by eDESE:isoform and denoted as ‘DRUGGABLE GENOME’ in DGIdb, were the most among MAGMA, S-PrediXcan, and eDESE. The comparison revealed that the druggablilty of the potential susceptibility genes identified by eDESE, especially the eDESE:isoform, provided more credible supports for the utility of eDESE.

Our framework might have three potential applications. First, eDESE can be used to predict the potential susceptibility genes and isoforms for other complex diseases. Second, eDESE can help optimize the more biologically sensible phenotype-associated tissues for other complex diseases. Third, the expression profiles used by eDESE can be replaced with other profiles, such as drug-induced gene expression profiles, to investigate the potential drug mechanism of action (MOA).

The present study was also limited by several factors. First, the moderate sample size in GTEx (v8) might lead the gene/isoform-level eQTLs to be underpowered. Future genetic studies based on the increased sample sizes might alleviate this problem. Second, the size of the susceptibility genes identified by the eDESE:gene and eDESE:isoform was a little larger than that of the conventional studies. The main reasons might lie in the gene-level and isoform-level eQTLs selected based on a lenient threshold (p-value < 0.01) to involve more eQTLs in the association analysis, which was taken as the remedy against the underpowered eQTLs. As shown in the present study, conventional MAGMA also identified 619 significant genes. The comparison of MAGMA and eDESE showed the advantages of eDESE in GO enrichment analysis, text-mining analyses and druggability. Although these potential susceptibility genes identified by eDESE lacked systematically experimental validation, we shared these genes in Supplementary file 1ag and h and encouraged the follow-up studies to evaluate their functions and roles in the development of schizophrenia. Future studies based on increased sample sizes with a stricter threshold to select the eQTLs can reduce the potential susceptibility gene count, and this is also a topic for our future exploration. Third, the performance of conditional association analysis for fine-mapping can be greatly influenced by the gene orders of entering the analysis. Although the improved effective chi-squared test with eDESE can work well in the real data of schizophrenia, integrating some non-conditional fine-mapping methods, such as SuSiE, with eDESE is worth trying. Fourth, the optimal c value of the effective chi-squared test is still empirical although we derived its range and relevant factors. The optimal c value can be improved to be better suited for other specific application scenarios.

In conclusion, in this study, we proposed a new statistical framework to predict potential susceptibility genes for complex diseases based on the GWAS summary statistics and three different variant-gene mapping strategies. The application of our framework to schizophrenia revealed many novel susceptible and druggable candidate genes. Besides, our results suggested that the usage of isoform-level eQTLs can be an important supplement for the conventional gene-based approaches. The framework was packaged and implemented in our integrative platform KGGSEE. We hope our framework can facilitate researchers to gain more insights into the susceptibility genes, isoforms and tissues for complex diseases.

Share this article

Cite this article

The advantages of eDESE over ECS and DESE.

Q-Q plots of the p-value of the ECS test under null hypothesis based on the two extreme exponents (i.e. 1 and 2).

The boxplots of the favorable c values at different simulation scenarios.

Q-Q plots of the conditional, unconditional gene-based association test and likelihood-ratio test under the null hypothesis.

Q-Q plots of the conditional gene-based association test and likelihood-ratio test at different representative gene-variant pairs.

The comparison of the potential susceptibility genes for schizophrenia identified by MAGMA and eDESE:dist.

Figure 6—source data 1

Figure 6—source data 2

Figure 6—source data 3

Type I error and power of different simulation scenarios in association analysis.

The result about whether the brain was optimized as the schizophrenia-associated tissue based on each brain region’s gene/isoform-level eQTLs.

Table 2—source data 1

Table 2—source data 2

Table 2—source data 3

Table 2—source data 4

Comparison of the potential susceptibility genes identified by S-PrediXcan, eDESE:gene, and eDESE:isoform.

Figure 7—source data 1

Figure 7—source data 2

The important examples of potential susceptibility genes exclusively predicted by eDESE:isoform.

Table 3—source data 1

The GO enrichment terms of the potential susceptibility genes in each optimized brain region identified by S-PrediXcan, eDESE:gene and eDESE:isoform.

The GO enrichment terms of the genes in the consensus module (colored ‘turquoise’).

Figure 8—source data 1

Figure 8—source data 2

The target genes of the antipsychotics predicted as the potential susceptibility genes by MAGMA, S-PrediXcan, and eDESE.

The enrichment of drug-gene interaction terms in DGIdb for the susceptibility genes identified by MAGMA, S-PrediXcan and eDESE.

Table 6—source data 1

Table 6—source data 2

Table 6—source data 3

Table 6—source data 4

Table 6—source data 5

The effect sizes of variants to the correlation of chi-squares.

Author details

Xiangyi Li

Contribution

Contributed equally with

Competing interests

Lin Jiang

Contribution

Contributed equally with

Competing interests

Chao Xue

Contribution

Competing interests

Mulin Jun Li

Contribution

Competing interests

Miaoxin Li

Contribution

For correspondence

Competing interests

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism

Further reading