Multicellular factor analysis of single-cell data for a tissue-centric understanding of disease
Abstract
Biomedical single-cell atlases describe disease at the cellular level. However, analyses of these data commonly focus on cell-type-centric, pairwise cross-condition comparisons, disregarding the multicellular nature of disease processes. Here we propose multicellular factor analysis for the unsupervised analysis of samples from cross-condition single-cell atlases and the identification of multicellular programs associated with disease. Our strategy, which repurposes group factor analysis as implemented in multi-omics factor analysis, incorporates the variation of patient samples across cell types or other tissue-centric features, such as cell compositions or spatial relationships, and enables the joint analysis of multiple patient cohorts, facilitating the integration of atlases. We applied our framework to a collection of acute and chronic human heart failure atlases and described multicellular processes of cardiac remodeling, independent of cellular compositions and their local organization, that were conserved in independent spatial and bulk transcriptomics datasets. In sum, our framework serves as an exploratory tool for the unsupervised analysis of cross-condition single-cell atlases and allows the measurements of patient cohorts to be integrated across distinct data modalities.
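The multi-view layout described above can be illustrated with a minimal sketch: cells are summed into one samples-by-genes matrix per cell type (a "view"), and the views are then decomposed jointly so that each factor has one score per sample and one loading vector per cell type. This is not the authors' implementation. The function names `pseudobulk_views` and `joint_factors`, the synthetic data, and the cell-type labels are hypothetical, and a truncated SVD on the concatenated, standardized views is used here as a simple stand-in for the group factor analysis model in multi-omics factor analysis.

```python
import numpy as np


def pseudobulk_views(counts, sample_ids, cell_types):
    """Sum cell-level counts into one (samples x genes) matrix per cell type."""
    samples = sorted(set(sample_ids))
    types = sorted(set(cell_types))
    sample_ids = np.asarray(sample_ids)
    cell_types = np.asarray(cell_types)
    views = {}
    for ct in types:
        mat = np.zeros((len(samples), counts.shape[1]))
        for i, s in enumerate(samples):
            mask = (sample_ids == s) & (cell_types == ct)
            mat[i] = counts[mask].sum(axis=0)
        views[ct] = mat
    return samples, views


def joint_factors(views, n_factors=2):
    """Stand-in for group factor analysis: truncated SVD on stacked views.

    Each view is log-transformed and standardized per gene, the views are
    concatenated along the gene axis, and the leading singular vectors give
    shared sample scores plus view-specific gene loadings.
    """
    names = sorted(views)
    blocks = []
    for ct in names:
        x = np.log1p(views[ct])
        x = x - x.mean(axis=0)
        sd = x.std(axis=0)
        blocks.append(x / np.where(sd > 0, sd, 1.0))
    stacked = np.hstack(blocks)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    scores = u[:, :n_factors] * s[:n_factors]
    loadings, start = {}, 0
    for ct, block in zip(names, blocks):
        width = block.shape[1]
        loadings[ct] = vt[:n_factors, start:start + width].T
        start += width
    return scores, loadings


# Synthetic example: 6 samples, 2 hypothetical cell types, 20 cells each, 5 genes.
rng = np.random.default_rng(0)
sample_ids, cell_types = [], []
for s in range(6):
    for ct in ("CM", "Fib"):
        sample_ids += [f"sample_{s}"] * 20
        cell_types += [ct] * 20
counts = rng.poisson(5.0, size=(len(sample_ids), 5))

samples, views = pseudobulk_views(counts, sample_ids, cell_types)
scores, loadings = joint_factors(views, n_factors=2)
```

In this sketch, `scores` holds one row per sample and `loadings[ct]` holds one row per gene for cell type `ct`. Unlike the probabilistic model the article builds on, the SVD stand-in has no sparsity priors and does not handle samples that lack a given cell type.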
Data availability
The datasets and computer code produced in this study are available in the following databases:
- All scripts related to this manuscript can be consulted here: https://github.com/saezlab/MOFAcell
- The R package implementing multicellular factor analysis can be found at: https://github.com/saezlab/MOFAcellulaR
- The Python implementation of multicellular factor analysis is available here: https://liana-py.readthedocs.io/en/latest/notebooks/mofacellular.html
- A Zenodo entry containing data associated with this manuscript can be accessed here: https://zenodo.org/record/8082895
- Multiplexing droplet-based single cell RNA-sequencing using genetic barcodes. Gene Expression Omnibus, GSE96583.
- Spatial multi-omic map of human myocardial infarction. Human Cell Atlas Data Portal, e9f36305-d857-44a3-93f0-df4e6007dc97.
- Cells of the Adult Heart. ad98d3cd-26fb-4ee3-99c9-8a2ab085e737.
- The Reference of the Transcriptional Landscape of Human End-Stage Heart Failure. Zenodo, doi: 10.5281/zenodo.3797044.
- Pathogenic variants damage cell composition and single cell transcription in cardiomyopathies. cellxgene, e75342a8-0f3b-4ec5-8ee1-245a23e0f7cb.
Article and author information
Author details
Funding
DFG CRC 1550 (464424253)
- Ricardo Omar Ramirez Flores
- Julio Saez-Rodriguez
Informatics for Life
- Jan David Lanzer
- Julio Saez-Rodriguez
EU ITN Marie Curie StrategyCKD (860329)
- Daniel Dimitrov
- Julio Saez-Rodriguez
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2023, Ramirez Flores et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 4,031 views
- 518 downloads
- 16 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.