Sparse dimensionality reduction approaches in Mendelian randomization with highly correlated exposures

  1. Vasileios Karageorgiou (corresponding author)
  2. Dipender Gill
  3. Jack Bowden
  4. Verena Zuber
  1. University of Exeter, United Kingdom
  2. Imperial College London, United Kingdom

Abstract

Multivariable Mendelian randomization (MVMR) is an instrumental variable technique that generalizes the MR framework to multiple exposures. Framed as a linear regression problem, it is subject to the pitfall of multicollinearity. The bias and efficiency of MVMR estimates thus depend heavily on the correlation of the exposures. Dimensionality reduction techniques such as principal component analysis (PCA) provide transformations of all the included variables that are effectively uncorrelated. We propose the use of sparse PCA (sPCA) algorithms that create principal components from subsets of the exposures, with the aim of providing more interpretable and reliable MR estimates. The approach consists of three steps. We first apply a sparse dimension reduction method and transform the variant-exposure summary statistics to principal components. We then choose a subset of the principal components based on data-driven cutoffs, and estimate their strength as instruments with an adjusted F-statistic. Finally, we perform MR with these transformed exposures. This pipeline is demonstrated in a simulation study of highly correlated exposures and in an applied example using summary data from a genome-wide association study of 97 highly correlated lipid metabolites. As a positive control, we tested the causal associations of the transformed exposures with coronary heart disease (CHD). Compared to the conventional inverse-variance weighted MVMR method and a weak-instrument robust MVMR method (MR GRAPPLE), sparse component analysis achieved a superior balance of sparsity and biologically insightful grouping of the lipid traits.
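
The three steps described above can be illustrated with a minimal sketch. It is not the authors' SCA implementation: the sparse step is replaced by a simple soft-thresholded power iteration that extracts a single component, the component-selection and adjusted F-statistic steps are omitted, and the final step is a univariable inverse-variance weighted (IVW) regression. All function names, the threshold parameter `lam` and the toy numbers are assumptions made for illustration only.

```python
import math


def sparse_first_component(beta_x, lam=0.3, n_iter=100):
    """One sparse loading vector via soft-thresholded power iteration.

    beta_x: variant-by-exposure association estimates (list of rows).
    lam: relative threshold, assumed to satisfy 0 <= lam < 1.
    """
    k = len(beta_x[0])
    # Gram matrix (exposures x exposures) of the variant-exposure associations.
    gram = [[sum(row[a] * row[b] for row in beta_x) for b in range(k)]
            for a in range(k)]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(n_iter):
        u = [sum(gram[a][b] * v[b] for b in range(k)) for a in range(k)]
        # Shrink every entry towards zero; small loadings become exactly zero.
        thr = lam * max(abs(x) for x in u)
        u = [math.copysign(max(abs(x) - thr, 0.0), x) for x in u]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    return v


def component_scores(beta_x, loadings):
    """Project each variant's exposure associations onto the component."""
    return [sum(b * l for b, l in zip(row, loadings)) for row in beta_x]


def ivw_estimate(scores, beta_y, se_y):
    """Weighted regression through the origin of beta_y on the scores."""
    weights = [1.0 / s ** 2 for s in se_y]
    num = sum(w * x * y for w, x, y in zip(weights, scores, beta_y))
    den = sum(w * x * x for w, x in zip(weights, scores))
    return num / den


# Toy summary statistics (invented): exposures 0 and 1 are highly
# correlated across variants, exposure 2 is largely unrelated to them.
beta_x = [
    [0.10, 0.11, 0.00],
    [0.20, 0.19, 0.01],
    [0.30, 0.31, 0.00],
    [0.00, 0.01, 0.20],
    [0.01, 0.00, 0.10],
    [0.15, 0.14, 0.02],
]
# Hypothetical outcome associations driven by exposures 0 and 1 only.
beta_y = [0.4 * (row[0] + row[1]) for row in beta_x]
se_y = [0.02] * len(beta_x)

loadings = sparse_first_component(beta_x)
scores = component_scores(beta_x, loadings)
estimate = ivw_estimate(scores, beta_y, se_y)
print("loadings:", [round(x, 3) for x in loadings])
print("IVW estimate for the component:", round(estimate, 3))
```

In this toy setting the thresholding drives the loading of the unrelated exposure to zero, so the component groups only the two correlated exposures, and the estimate is interpreted per unit of that component rather than per unit of any single exposure.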

Data availability

The GWAS summary statistics for the metabolites (http://www.computationalmedicine.fi/data/NMR_GWAS/) and CHD (http://www.cardiogramplusc4d.org/) are publicly available. We provide code for the SCA function, the simulation study and related documentation on GitHub (https://github.com/vaskarageorg/SCA_MR/).

Article and author information

Author details

  1. Vasileios Karageorgiou

    University of Exeter, Exeter, United Kingdom
    For correspondence
    vk282@exeter.ac.uk
    Competing interests
    No competing interests declared.
    ORCID: 0000-0002-7173-9967
  2. Dipender Gill

    Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
    Competing interests
    Dipender Gill is a part-time employee of Novo Nordisk.
  3. Jack Bowden

    University of Exeter, Exeter, United Kingdom
    Competing interests
    No competing interests declared.
  4. Verena Zuber

    Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
    Competing interests
    No competing interests declared.

Funding

State Scholarships Foundation

  • Vasileios Karageorgiou

Expanding Excellence in England

  • Vasileios Karageorgiou
  • Jack Bowden

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2023, Karageorgiou et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Vasileios Karageorgiou, Dipender Gill, Jack Bowden, Verena Zuber (2023) Sparse dimensionality reduction approaches in Mendelian randomization with highly correlated exposures. eLife 12:e80063. https://doi.org/10.7554/eLife.80063
