MGPfactXMBD: A Model-Based Factorization Method for scRNA Data Unveils Bifurcating Transcriptional Modules Underlying Cell Fate Determination

Jun Ren; Ying Zhou; Yudi Hu; Jing Yang; Hongkun Fang; Xuejing Lyu; Jintao Guo; Xiaodong Shi; Qiyuan Li

doi:10.7554/eLife.97424.2

eLife Assessment

MGPfactXMBD is a novel computational method for investigating cell evolutionary trajectory for scRNA-seq samples. It is important, with several potential future applications. The authors benchmarked this method using synthetic and real-world samples and showed superior performance for some of the tasks in cell trajectory analysis compared to other methods with compelling evidence.

https://doi.org/10.7554/eLife.97424.2.sa3

Significance of findings

important: Findings that have theoretical or practical implications beyond a single subfield

landmark
fundamental
important
valuable
useful

Strength of evidence

compelling: Evidence that features methods, data and analyses more rigorous than the current state-of-the-art

exceptional
compelling
convincing
solid
incomplete
inadequate

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife assessments

Abstract

Manifold-learning is particularly useful to resolve the complex cellular state space from single-cell RNA sequences. While current manifold-learning methods provide insights into cell fate by inferring graph-based trajectory at cell level, challenges remain to retrieve interpretable biology underlying the diverse cellular states. Here, we described MGPfact^XMBD, a model-based manifold-learning framework and capable to factorize complex development trajectories into independent bifurcation processes of gene sets, and thus enables trajectory inference based on relevant features. MGPfact^XMBD offers more nuanced understanding of the biological processes underlying cellular trajectories with potential determinants. When bench-tested across 239 datasets, MGPfact^XMBD showed advantages in major quantity-control metrics, such as branch division accuracy and trajectory topology, outperforming most established methods. In real datasets, MGPfact^XMBD recovered the critical pathways and cell types in microglia development with experimentally valid regulons and markers. Furthermore, MGPfact^XMBD discovered evolutionary trajectories of tumor-associated CD8⁺ T cells and yielded new subtypes of CD8⁺ T cells with gene expression signatures significantly predictive of the responses to immune checkpoint inhibitor in independent cohorts. In summary, MGPfact^XMBD offers a manifold-learning framework in scRNA-seq data which enables feature selection for specific biological processes and contributing to advance our understanding of biological determination of cell fate.

Introduction

Data-mining of single-cell RNA sequencing (scRNA-seq) is often transformed into learning of lower-dimensional embedding (Becht et al., 2019; Haghverdi et al., 2015; Van der Maaten and Hinton, 2008) of the expression vectors, which represents the variation in the cellular space and helps explain the biological background. Previous single-cell studies used various embedding methods to characterize and visualize clustering of cells with unique biological functions(Saelens et al., 2019). Among the existing methods, graph-based embedding can better capture nonlinear biological signals among cells hence yielded more insights of the diversity of cells. More recent studies also use graph-based embedding (Costa et al., 2018) to reveal the dependency among cells and thereby reconstruct the evolutionary trajectory in the cellular space, which helps in understanding the determination of cell fate in development, differentiation and cancer.

To date, more and more manifold-learning methods are developed to infer lower-dimensional graphic embedding (manifolds) of scRNA-seq data, and yielded a number of trajectories corresponding to important cellular processes, such as TSCAN (Ji and Ji, 2016), DPT (Haghverdi et al., 2016), and scShaper (Smolander et al., 2022) belong to linear topological classes and reveal major linear pathways based on embedding spaces or cell clustering. TSCAN employs the construction of minimum spanning trees to discover pathways, while DPT reconstructs cellular trajectories using random walks, and scShaper integrates multiple pseudo-temporal solutions to deduce the shortest trajectory within a linear context. Additionally, there have been many approaches capable of inferring complex tree topological structures, such as the widely used Monocle series of algorithms includes Monocle 2 and 3 (Cao et al., 2019; Qiu et al., 2017b, 2017a). They leverage sophisticated graphing techniques to map intricate cell hierarchies; Monocle 2 creates DDRtree based on reversed graph embedding techniques, while Monocle 3 utilizes UMAP(Becht et al., 2019) for embedding. TinGa (Todorov et al., 2020) and scFates (Faure et al., 2023) represent more recent innovations. TinGa utilizes the Growing Neural Gas (GNG) algorithm (Fritzke, 1994) to construct an adaptive graph structure that effectively captures the density structure of the dataset. scFates, streamlines pseudotime analysis with flexible tree learning options, advanced feature extraction tasks, and specific functionalities for fork analysis.

Moreover, recent studies based on RNA velocity has provided insights into cell state transitions. These methods measure RNA synthesis and degradation rates based on the abundance of spliced and unspliced mRNA, such as CellRank (Lange et al., 2022). Nevertheless, current RNA velocity analyses are still unable to resolve cell-fates with complex branching trajectory. Deep learning methods such as scTour (Li, 2023) and TIGON (Sha, 2024) circumvent some of these limitations, offering continuous state assumptions or requiring prior cell sampling information.

Despite these advances, trajectory prediction remains a major challenge in single cell analysis. Firstly, graph-based trajectories represent synergic effects multiple biological processes, making it difficult to disentangle the effects of specific process, hence limited model interpretability and the power to gain novel biological insights. Secondly, the inference of trajectory is highly dependent on the biases in the gene-selection, whereas conventional statistical feature-selection methods are less efficient for the learning of complex topologies, and adds to the difficulty of suggesting candidate genes for downstream functional study. Additionally, many approaches require additional prior information, which further limits the applicability.

Here, we describe MGPfact^XMBD (Factorization based on Mixtures of Gaussian Processes), a model-based, unsupervised manifold-learning method which factorize complex cellular trajectories into interpretable bifurcation Gaussian processes of transcription, and thereby enable discovery of specific biological determinants of cell fate. In the validation datasets, MGPfact recapitulated developmental trajectory of microglia and recovered key regulatory factors which have been proved experimentally. Moreover, MGPfact discovered highly specific subtypes of tumor-associated CD8⁺ T cells which associated with benefit to cancer immunotherapy.

Bring together, MGPfact is a knowledge discovery tool which conducts manifold-learning and factorization simultaneously. MGPfact offers two advantages in future scRNA-seq analyses: first, it provides highly interpretable, factorizable cellular trajectories with matched gene expression modules; then, it provides efficient feature-selection for graph-based embedding, thus enhancing our understanding of the determination of cell fate.

Results

Design of MGPfact

The analytical pipeline of MGPfact consists two major stages (Fig. 1). An algorithmic description is given in Algorithm 1.

Overview of MGPfact workflow.
The complete workflow comprises two major stages: MURP downsampling with preprocessed data and trajectory reconstruction. In the stage of trajectory reconstruction, the scRNA-seq data were first factorized into independent bifurcation processes based on mixtures of Gaussian processes, which were then merged into a consensus trajectory.

First, we performed downsampling of the preprocessed scRNA-seq data Y′ to yield a M-by-N expression matrix Y′ based on the “minimum unbiased representative points” (MURP) as described previously (Ren et al., 2022), where M representative points were considered as landmarks of the cellular trajectory and N is the number of genes. Then, we computed L Principal Components (PCs) of the downsampled expression matrix to obtain the matrix (M-by-L),

where v_l is projection vector,serve as the l-th initial state of embedding.

Next, we used typical Gaussian Process Regression of on pseudotime T:

where f(T) is a Gaussian Process (GP) with covariance matrix S.

And for all features:

where p(Y^∗|f(T)) is defined as follow:

Specifically, we consider S as a mixture of L independent bifurcation Gaussian processes(Schulz et al., 2018),

To cope with the bifurcation processes in cell fate, each of the Gaussian processes is defined with a bifurcation point at b_l, branching labels c_l, and the necessary hyperparameters. The branching labels c_l ∈ {0,1,2}, correspond to different phases and states of cell fate, where c_l = 0 corresponds the phase before branching, and c_l ∈ {1,2}. corresponds to the two cellular states of the bifurcation process, respectively. For any landmark (MURP) x,

The covariance matrix s_l for trajectory l can be expressed as follows,

where K is a kernel function. We employ radial basis function (rbf) and polynomial kernel function (pl). We chose these two kernel functions for the effectiveness in handling nonlinear and polynomial relationships, achieving a balance between model performance and computational efficiency.

And the 𝒦(t_x, t_y) is calculated as follows:

Therefore, p(Y^∗|f(T)) is updated as follows:

We infer all parameters by maximizing the posterior likelihood using Markov Chain Monte Carlo (MCMC) methods available in Mamba(B J, 2014). The posterior distribution of pseudotime T can be represented as:

where p(Y^∗ | f(T)) is the likelihood function of the observed data Y^∗, and p(f(T)) is the prior distribution of the Gaussian process. This posterior distribution integrates the observed data with model priors, enabling inference of pseudotime and trajectory simultaneously. Due to the high autocorrelation of T in the posterior distribution, we use Adaptive Metropolis within Gibbs (AMWG) sampling (Roberts and Rosenthal, 2009; Tierney, 1994). Other parameters are estimated using the more efficient SLICE sampling technique (Neal, 2003).

Algorithm 1 MGPfact: infer cell fate trajectory Algorithm 1 MGPfact: infer cell fate trajectory

Then, MGPfact can identify genes that have significant impacts on the branching events in the trajectories. we introduce a rotation matrix R = {r₁, r₂, …, r_L} to obtain factor score w_l for each trajectory l by rotating Y^∗.

For all trajectories,

where e_l is the error term for the l-th trajectory.

Specifically, the factor scores w_l for each gene onto the l-th trajectory can be expressed using equations (1) and (14) as follows,

Here, u_l is used to represent the contribution (gene weight) of each gene to the l-th trajectory, thus enabling gene-selection based on the inferred trajectories.

Additionally, we can combine independent bifurcation processes to form a consensus diffusion tree to represent the trajectory of cell fate (Supplementary Methods, Supplementary Table 1).

MGPfact outperformed state-of-the-art methods in F1_branches.
P-values based one-sided paired t-tests suggest that the F1_branches scores of MGPfact were significantly higher than those of the other methods for different trajectory types in the test set.

Performance evaluation of MGPfact

Robustness analysis of MGPfact

Before the performance evaluation, we performed a grid search on the number of independent trajectories in 100 training datasets and selected L = 3 for downstream testing (Supplementary Fig. 1, Supplementary Fig. 2, Methods).

To further validate the efficacy of MURP downsampling step in MGPfact, we employed an alternative downsampling using randomly selected cells for trajectory inference (Methods). This comparison revealed that the prediction accuracy substantially diminished without MURP, evidenced by a notable reduction in branch assignment (F1_branches, 20.5%) and cell ordering (cor_dist, 64.9%) (Supplementary Fig. 3). In contrast, trajectory predictions utilizing MURP-based downsampling realized an overall score increase of 5.31% to 185%, underscoring the indispensable role of MURP in the trajectory inference capabilities of MGPfact.

Furthermore, we performed a robustness analysis on the topological consistency of the predicted consensus trajectory by comparing prediction results from randomly sampled subsets of the original data. As a result, the consensus trajectory predicted from random subsets by MGPfact retained a high degree of congruence with those from the original datasets (Supplementary Fig. 4, HIM_mean = 0.686). This outcome attests to MGPfact’s robustness and generalizability under varying data conditions, hence the capability of retrieving conserved bifurcative trajectories in the data.

MGPfact predicts cellular trajectories

Then, we assessed the performance of MGPfact for prediction of cellular trajectories in 239 test datasets, including 171 synthetic and 68 real datasets, alongside with another 7 existing algorithms (Supplementary Fig. 1). For overall performance score, MGPfact (Overall_mean = 0.534) ranked second only to TinGa (Overall_mean = 0.563) and outperformed the rest of 6 algorithms (Fig. 2a). Particularly, MGPfact demonstrated the highest accuracy in predicting cell fate in branching trajectory (Fig. 2b, ). As for other three individual metrics, MGPfact ranked the 4^th in HIM (HIM_mean = 0.606), the 6^th in , and the 4^th in (Fig.2c-e).

Trajectory inference (TI) performance of state-of-the-art methods in 239 test datasets.
a. Overall scores; b. F1_branches; c. *HIM*; d. *cor*_dist; e. *wcor*_features. All results are color-coded based on the trajectory types, with the black line representing the mean value. The “Overall” assessment is calculated as the geometric mean of all four metrics.

Next, we compared the performance of MGPfact with the other algorithms in 9 different trajectory types, respectively, for predicting differentiated cell fate (F1_branches). As a result, MGPfact significantly outperformed more than half of the algorithms tested in the following trajectory types (T-test P<0.1, Table 1): disconnected graph (N=5), linear (N=5), bifurcation (N=4), multifurcation (N=4) and tree (N=4). As for the other three metrics, MGPfact also showed advantages in HIM in linear (N=5), bifurcation (N=3), convergence (N=3); and in wcor_features in multifurcation (N=6), bifurcation (N=5). Nevertheless, MGPfact showed limited predictive performance for cor_dist (Supplementary Table 2).

Comparison of the explanatory power for CD8⁺ T cell fate for MGPfact and three other different methods.
Adjusted R-squared values and P-values based on F-tests demonstrate the relative performance of MGPfact, Monocle 2, Monocle 3, and scFates Tree in fitting the experimentally characterized and annotated CD8+ T cell subtypes.

It is also worth noting that in 68 test datasets of real cell populations, MGPfact ranked the top of all 7 algorithms for “overall score” (Fig. 3a). As for the individual metric, MGPfact ranked to top for predicting trajectory topology (HIM_mean = 0.721, Fig. 3c); and the 2^nd in trajectory branching , Fig. 3b) with no significant difference with that of the top predictor (scShaper, T-test, P = 0.829). These data show that MGPfact can effectively reconstruct the trajectory of cell fate and retrieve relevant biological processes. As for the other metrics in real test datasets, MGPfact was the 5^th in the similarity of cell , and the 3^rd in the similarity of gene significance . The differences between the metrics of MGPfact and those of the top performers were subtle (Fig. 3d-e, ).

Trajectory inference (TI) performance of state-of-the-art methods in 68 test datasets of real cell population.
a. Overall scores; b. F1_branches; c. *HIM*; d. *cor*_dist; e. *wcor*_features. All results are color-coded based on the trajectory types, with the black line representing the mean value for ranking all methods. The “Overall” assessment is calculated as the geometric mean of all four metrics.

We also analyzed three real-world datasets (Saelens et al., 2019), each representing a unique topology of trajectory: linear, single bifurcation, and multiple bifurcations. MGPfact excelled in capturing key developmental trajectory with branch points (Supplementary Fig. 5). In the linear trajectory, MGPfact accurately predicted the absence of bifurcations, aligning well with the ground truth (overall = 0.871). For the bifurcation trajectory, MGPfact successfully identified the main bifurcation (overall = 0.636). As for the multifurcation trajectory, MGPfact’s prediction is also close to the ground truth, as reflected by the overall score (overall = 0.566).

In summary, our data suggest that MGPfact is highly efficient in predicting cell fate in branching trajectory (F1_branches) and topological structure (HIM). These capabilities align with the primary objectives of the algorithm, namely, effective identification of the branching events in the development processes of cells. In addition, MGPfact performed better in real datasets, suggesting its robustness to noise from real experimental conditions. Nevertheless, trajectory inference by MGPfact is based on factorization of the covariance matrix, hence less performance in wcor_features and cor_dist than methods based on full covariance matrix.

Comparative of Time Efficiency and Memory Consumption

We also compared the runtime and memory usage of different algorithms across 239 test datasets (Supplementary Table 3). MGPfact’s average maximum memory consumption is comparable to those of the other algorithms . As a trade-off for its advantages in feature-selection and factorization, MGPfact requires moderately longer execution time than the other algorithms (time_mean = 3.42 min).

MGPfact recovers the trajectory of early postnatal microglia development

The main advantage of MGPfact lies in the capability to factorize a complex cellular trajectory into bifurcation processes of selected co-expressed genes. To illustrate how MGPfact elucidates the biological process underlying cell fate determination, we applied MGPfact to a scRNA-seq data of microglia development and validated the results with experimental evidences (Li et al., 2019).

MGPfact recovers the determinants of the microglia development

Utilizing MGPfact, we reconstructed the developmental trajectories of microglia from immature microglia (IM at pseudotime 0) to homeostatic microglia (HM) and proliferative-region-associated microglia (PAM) (Fig. 4a-c, left panel). MGPfact identified three bifurcation processes (Supplementary Table 4), each corresponding to 74-90 highly weighted genes (HWG, absolute gene weight > 0.05) (Fig. 4a-c, right panel).

MGPfact reconstructed the developmental trajectory of microglia, recovering known determinants of microglia fate.
**a-c**. The inferred independent bifurcation processes with respect to the unique cell types (color-coded) of microglia development, where phase 0 corresponds to the state before bifurcation; and phases 1 and 2 correspond to the states post-bifurcation. Each colored dot represents a metacell of unique cell type defined by MURP. The most highly weighted regulons in each trajectory were labeled by the corresponding transcription factors (left panels). The HWG of each bifurcation process include a set of highly weighted genes, of which the expression levels differ significantly among phases 1, 2, and 3 (right panels). d. The most highly weighted regulons influencing the three developmental trajectories of microglia. e. The expression levels of the transcription factors of highly weighted regulons in each trajectory significantly differ among different phases. f. The consensus developmental trajectory by merging the three bifurcation processes. Point 0 denotes the initial of differentiation, whereas the notion of “*n-m*” denotes the m-th branch from the branching point n. Each colored circle represents a landmark (MURP) of the trajectory, showing the fraction of cell types. The transcription factors of highly weighted regulons in each bifurcation process were used to label each branching point. Particularly, PAM-T1 and PAM-T2 are the two newly defined subtypes of PAM. g. Selected differently expressed genes between PAM-T1 and PAM-T2 (|logfc| > 0.25, adjusted P-value < 0.1) are shown by colored-dots corresponding to the mean expression levels in either cell type. The IDs validated marker genes for PAM are labeled in green. In all box plots, the horizontal line represents the median value, and the whisker extends to the furthest data point within 1.5 times the interquartile range. Significance is denoted as: ns, not significant, * P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001, Wilcoxon rank-sum test.

The first bifurcation determines the differentiated cell fates of PAM and HM, which involves a set of notable marker genes of both cell types, such as Apoe, Selplg (HM), and Gpnmb (PAM). The second bifurcation determines the proliferative status, which is crucial for the development and function of PAM and HM (Guzmán, n.d.; Li et al., 2019). The genes affected by the second bifurcation are associated with cell cycle and proliferation, such as Mki67, Tubb5, Top2a. The third bifurcation influences the development and maturity of microglia, of which the highly weighted genes, such as Tmem119, P2ry12, and Sepp1 are all previously annotated markers for establishment of the fates of microglia(Anderson et al., 2022; Li et al., 2019) (Supplementary Table 4).

Moreover, we retrieved highly active regulons from the HWG by MGPfact, of which the significance is quantified by the overall weights of the member genes. These data unveiled highly active transcription regulations in each bifurcation processes, which further traced back to the influential transcription factors as determinants of microglia development, such as Hif1a, E2f5, and Nfkb1 (Fig. 4d-e, Methods). Specifically, Hif1a is crucial for microglial activation and directly linked to neurodegenerative disease progression (Wang et al., 2022). Our data showed an upregulation of Hif1a in the PAM-branch (phase 1) of the first bifurcation, reaffirming the role of Hif1a in PAM differentiation. The other two transcription factors, E2f5 and Nfkb1, were active in phase 2 of the 2^nd bifurcation and the 3^rd bifurcation, respectively. Both are known for the roles in microglial development (Dresselhaus and Meffert, 2019; Nawal, n.d.).

Using consensus trajectories to delineate the development of microglial cells

We generated a consensus trajectory of microglial development from three independent bifurcation processes (Methods). Of note, the consensus trajectory revealed two distinct subtypes of proliferative-region-associated microglia (PAM), PAM-T1 (Hif1a+/E2f5-/Nfkb1+) and PAM-T2 (Hif1a+/E2f5-/Nfkb1-) (Fig. 4f). Particularly, the highly expressed genes in PAM-T2, including Spp1, Gpnmb, Lgals1, and Cd63, are previously identified in disease-associated microglia (DAM)(Li et al., 2019). Thus, our finding reaffirmed the connection between the two cell types, DAM and PAM, and suggested Nfkb1 as a potential determinant of differentiation of PAM (Fig. 4g, Supplementary Table 5).

In conclusion, MGPfact reconstructed the cellular trajectory of microglial development, identified distinct cell types with marker genes and key regulators which are highly consistent to the experimental evidences (Dresselhaus and Meffert, 2019; Li et al., 2019; Nawal, n.d.; Wang et al., 2022).

MGPfact to decipher the evolution of tumor-associated CD8⁺ T cells

Next, we applied MGPfact to two populations of tumor-associated CD8+ T cells from non-small cell lung cancer (NSCLC)(Guo et al., 2018) and colorectal cancer (CRC)(Zhang et al., 2018), respectively. Using the same analytical pipeline as above, we identified a set of CD8⁺ T cell gene expression signatures (GES) from MGPfact-inferred trajectories, which are significantly predictive of clinical outcomes and immune treatment responses. Additionally, our data unveiled novel subtypes of tumor-associated CD8⁺ T cells with strong clinical implications.

MGPfact better explains the fate of tumor-associated CD8⁺ T cells

We assessed the goodness-of-fit (adjusted R-square) of the consensus trajectory derived by MGPfact and three methods (Monocle 2, Monocle 3 and scFates Tree) for the CD8⁺ T cell subtypes described in the original studies (Guo et al., 2018; Zhang et al., 2018). The data showed that MGPfact significantly improved the explanatory power for most CD8+ T cell subtypes over Monocle 2, which was used in the original studies (P < 0.05, see Table 2 and Supplementary Table 6), except for the CD8-GZMK cells in the CRC dataset. Additionally, MGPfact demonstrated better explanatory power in specific cell types when compared to Monocle 3 and scFates Tree. For instance, in the NSCLC dataset, MGPfact exhibited higher explanatory power for CD8-LEF1 cells (Table 2, R-squared = 0.935), while Monocle 3 and scFates Tree perform better in other cell types.

MGPfact identifies T-cell gene expression signatures with clinical implications

MGPfact discerned the different cellular fates of tumor-associated CD8⁺ T cells by distinct bifurcation processes. To reveal the clinical implications of these bifurcation processes, we retrieved the mean expression vectors corresponding to either phase (branch) as Gene Expression Signatures (GES), and stratified cancer cohorts by quantifying the propensities to certain destiny of CD8⁺ T cells (Methods). Our data demonstrated pronounced disparities in the clinical outcomes associated with different fates of CD8⁺ T cells among patients (Fig. 5a-b, Supplementary Fig. 8e). In NSCLC, trajectory 1 is associated with cytotoxic T cells (96%, phase 1) and higher overall survival in TCGA-lung adenocarcinoma (LUAD) patients. Trajectory 2 is associated with exhausted T cells (91%, phase 2), and lower overall survival in the same cohort (Fig. 5a, Supplementary Fig. 6a-b). Similarly, in the CRC, trajectory 1 is associated with exhausted T cells (95%, phase 1) and poor overall survival in TCGA-COAD patients (Fig. 5b, Supplementary Fig. 8a-b).

Highly weighted genes (HWG) of the bifurcation processes of CD8⁺ T cells serve as reliable indicators for clinical outcome and ICI treatment response.
a. Gene expression signatures (GES) corresponding to HWG in CD8⁺ T cells trajectory 1 and 2 in NSCLC predict overall survival of the TCGA-LUAD cohort. b. Gene expression signatures (GES) corresponding to HWG in CD8⁺ T cells trajectory 1 in CRC predict overall survival of the TCGA-COAD cohort. c. ROC curve showing the weighted mean of HWG in Trajectories 1 and 2 in NSCLC significantly associated with ICI response across 3 independent studies. d. ROC curve showing the weighted mean of HWG in trajectories 1 and 2 in CRC significantly associated with ICI response across 4 independent studies.

MGPfact serves as an effective approach for characterization of new cellular subtypes.
a. The consensus trajectory of tumor-associated CD8⁺ T cells in NSCLC identified CD8-ZNF683-T1 and CD8-ZNF683-T2 as two subtypes of CD8-ZNF683, which are influenced by TBX21. b. Selected differently expressed genes between CD8-ZNF683-T1 and CD8-ZNF683-T2 (|logfc| > 0.25, adjusted P-value < 0.1). c. High expression of CD8-ZNF683-T1 signatures predicts good overall survival in the TCGA LUAD cohort (Methods). P-values were calculated through multivariate Cox regression analysis, and HR represents hazard ratio. d. The consensus trajectory of tumor-associated CD8⁺ T cells in CRC identified CD8-GZMK-T1 and CD8-GZMK-T2 as two subtypes of CD8-GZMK. e. Selected differently expressed genes between CD8-GZMK-T1 and CD8-GZMK-T2 (|logfc| > 0.25, adjusted P-value < 0.1). f. ROC curve showing high expression of CD8-GZMK-T1 signature associated with ICI treatment response in three independent studies. The consensus trajectory is formed by merging three bifurcation processes. Each colored circle represents a landmark (MURP), indicating the of cell type.

In addition, for each trajectory identified in NSCLC and CRC, we selected a set of HWG (absolute gene weight > 0.05) to characterize the underlying biological processes and key transcription factors determining the bifurcation (Supplementary Table 7-8, Supplementary Fig. 6c-d, Supplementary Fig. 8c-d). In NSCLC, the HWG of trajectory 1 are primarily implicated in immune responses associated with antigen processing and presentation (Supplementary Table 9), while trajectory 2-HWG involve processes of immune cell migration (Supplementary Table 9). For CRC, trajectory 1 is enriched for genes of T cell activation and regulation (Supplementary Table 10).

Notably, our data showed that the weighted mean expression of the HWG of CD8⁺ T cell trajectories (Methods) are associated with responses to immune checkpoint inhibitors (ICIs) in multiple independent cohorts (Fig. 5c-d, Supplementary Fig. 8f). For instance, the weighted means of HWG of trajectories 1 and 2 in NSCLC which are associated with high activities of cytotoxic T cells, predicted better responses to anti-PD-130 and anti-CTLA-4 (Cho et al., 2020; Liu et al., 2018), as well as their combination therapies (Auslander et al., 2018) (AUC ∈ {0.813, 0.964}, P < 0.1, Supplementary Fig. 7). Similarly, HWG pertaining to trajectory 1 in CRC is associated with high proportion of EMRA (87%, phase 1), hence better responses to immunotherapies in 4 cohorts (Auslander et al., 2018; Cho et al., 2020; Jung et al., 2019; Liu et al., 2018) (AUC ∈ {0.796, 0.964}, P < 0.01, Supplementary Fig. 9).

Taken together, the factorization of scRNA-seq data by MGPfact provides highly relevant gene expression signatures of the fate of tumor associated CD8⁺ T cells, which advances the understanding of the evolution of tumor immune microenvironment (TIME) and predicts clinical outcomes.

MGPfact identified new subtypes of CD8⁺ T cells with clinical implications

Furthermore, the consensus trajectories of tumor-associated CD8⁺ T cells inferred by MGPfact from NSCLC and CRC revealed new subtypes of lymphocytes. In NSCLC, we characterized CD8-ZNF683-T1 (LEF+/TBX21-) and CD8-ZNF683-T2 (LEF+/TBX21+) from CD8-ZNF683 (Fig. 6a, Supplementary Fig. 6c-d). The CD8-ZNF683-T2 cells highly expressed genes associated with “pre-exhausted” state, such as ITGAL, SAMD3, and SLAMF7 (Pritchard et al., 2023), many of which are target genes of TBX21. In contrast, CD8-ZNF683-T1 showed lower expression of these genes, hence repellency to the “pre-exhausted” state (Fig. 6b, Supplementary Table 11). In CRC, we identified two subtypes of effector memory T cells (CD8-GZMK), CD8-GZMK-T1 (EOMES-/BHLHE40+) and CD8-GZMK-T2 (EOMES-/BHLHE40-) (Fig. 6d, Supplementary Fig. 8c-d). CD8-GZMK-T2 strongly resembles CD8-GZMK, and potentially differentiating into CD8-CD6 (resident memory T cells, T_RM) and CD8-CD160 (intraepithelial lymphocytes, IEL); whereas CD8-GZMK-T1 cells demonstrated higher expression of GZMB, indicating an active cytotoxic cell-killing capability (Trapani, 2001). Simultaneously, these cells also marked by high expression levels of immune related genes, including OASL, RBPJ, and CTLA4, which are known targets of BHLHE40 (Lutter et al., 2022; Salmon et al., 2022), implying that BHLHE40 is a modulator of the higher effector activity in CD8-GZMK-T1 (Fig. 6e, Supplementary Table 12).

We further derived scores based on the differentially expressed genes of CD8-ZNF683-T1 and CD8-GZMK-T1 (Methods), as measures of the fraction of each subtype in cancer cohorts. In the LUAD cohort of TCGA, increased fraction of CD8-ZNF683-T1 in TIME was associated with favorable outcomes (Fig. 6c). And increased fractions of CD8-GZMK-T1 in TIME were associated with better responses to ICI therapy across 3 independent cohorts^31–33 (Fig. 6f, Supplementary Fig. 10, AUC ∈ {0.782, 0.917}, P < 0.1), which were treated with anti-PD-1, anti-CTLA-4 and their combination therapies. In conclusion, our data showed the cellular trajectory inferred by MGPfact can be used to elucidate the complex evolutionary processes of tumor-associated CD8⁺ T cells, and further inform the characterization of new subtypes of T cells with significant clinical implications.

Discussion

Single-cell RNA sequencing (scRNA-seq) provides a direct, quantitative snapshot of a population of cells in certain biological conditions, thereby revealing the actual cell states and functions. Although existing clustering and embedding algorithms can effectively reveal discrete biological states of cells, these methods become less efficient when depicting continuous evolving of cells over the temporal domain. The introduction of manifold learning offers a new dimension for discovery of relevant biological knowledge in cell fate determination, allowing for a better representation of continuous changes in cells, especially in time-dependent processes such as development, differentiation, and clonal evolution. However, current manifold learning methods face major limitations, such as the need for prior information on pseudotime and cell clustering, and lack of explainability, which restricts their applicability. Additionally, many existing trajectory inference methods do not support gene selection, making it difficult to annotate the results to known biological entities, thereby hindering the interpretation of results and subsequent functional studies.

We developed MGPfact to overcome the limitations of the existing methods. Inspired by recent studies, MGPfact model the cell fate as mixture of Gaussian processes, which accommodate both continuous evolution pathway and biphasic destiny of cell fate. Thus, MGPfact is capable to distinguish discrete and continuous events in the same trajectory. In addition, by factorizing the mixture Gaussian processes, MGPfact offers the advantage to select genes corresponding to each bifurcation process and thereby enable full biological annotation and interpretation of the trajectory. As a validation, we showed that gene-selection by MGPfact consistently recapitulated the development of microglia and tumor-associated CD8⁺ T cells; and recovered key regulators of distinct cell fate. So far, MGPfact is the only model-based manifold-learning framework which factorizes complex development trajectories into independent bifurcation processes of gene sets.

We conducted a comprehensive comparison of MGPfact with existing TI methods from various perspectives. This comparison included the correlation of cell sorting, accuracy of branch allocation, similarity of topological structures and differentially expressed features. It ought to be mentioned that number of principal components used should be determined by the intrinsic biological characteristics of the cell fate-determination. Our experiment based on a limited number of datasets may not represent more complex scenarios in other cell types. For the overall TI-performance, MGPfact demonstrated leading performance across 239 datasets, second only to TinGa. For the performance in branch allocation, which directly reflect the fitness to the outcomes of cell-fate, MGPfact outperformed its counterparts, especially in the topology groups of linear and bifurcation. As for wcor_features and cor_dist, MGPfact performed less well mainly for two reasons. First, MGPfact is designed for bifurcation topology in the cellular trajectory, hence less efficient in inferring complex topologies. Then, MGPfact inference is based on selected gene sets instead of the whole transcriptome, the resulted trajectory correspond only to the bifurcation processes of interest hence does not necessarily reflect the whole topology of cellular trajectory. Furthermore, MGPfact performed significantly better in trajectory prediction in real cell population compared to the synthetic ones, suggesting the algorithm fit better to the true biological variation and noises.

To reconstruct the trajectory of cell fate, we merged all the bifurcation processes into a consensus trajectory. In the validation by microglia and tumor-associated CD8⁺ T cells, the consensus trajectory revealed highly consistent findings recovering known biology and the marker genes of specific cell states which further inform the transcription factor (TF) determining the fate of cell. In addition, the consensus trajectory revealed new subtypes of cells demonstrating highly relevant transcriptional characteristics. Particularly, we reported new subtypes of tumor associated CD8⁺ T cells characterized by different TBX21 and BHLHE40 activity, both of which are known regulators of CD8⁺ T cell functionality (Lutter et al., 2022; Pritchard et al., 2023; Salmon et al., 2022; Trapani, 2001). These data suggest that MGPfact is capable to discover gene modules with strong and consistent transcriptional background hence better interpretability. Moreover, the results of MGPfact demonstrated strong clinical relevance. Using MGPfact we retrieved gene expression signatures (GES) which quantitatively measure the propensity and fraction of different fates of CD8⁺ T cell. These signatures correspond to important biological processes of T cell activity; and predict clinical outcome and ICI treatment responses from transcriptome data of bulk tumor biopsies, independent of any endogenous feature of the tumor cells.

Nevertheless, MGPfact also has some limitations, which shall be addressed in future study. Firstly, the complex definition of the bifurcation kernel introduces unfavorable singularity to the Gaussian kernel when considering highly complex trajectories. Additionally, the current trajectory inference by MGPfact is solely based on the temporal domain, neglecting bifurcation processes occurring in space. To overcome these limitations, future models should incorporate spatial dynamics of transcription and RNA velocity data to provide more comprehensive insights of cell fate. Moreover, the reconstruction of cellular trajectories by MGPfact implies independence of each bifurcation processes, which may not reflect real cellular behavior. Therefore, predictions from MGPfact should be interpreted with caution and validated experimentally.

Methods

Benchmarking MGPfact to state-of-the-art methods

We adopted a comprehensive evaluation framework from previous scRNA-seq study to assess the TI performance of MGPfact (Saelens et al., 2019; Smolander et al., 2022; Todorov et al., 2020). The validation dataset comprises 110 real data and 229 synthetic data, encompassing 9 different cellular trajectory topologies. The ground truth of cellular trajectories of each dataset were inferred and validated by the original study (Saelens et al., 2019). The synthetic datasets were generated using four simulators: dyngen (Saelens et al., 2019), dyntoy (Saelens et al., 2019), PROSSTT (Papadopoulos et al., 2019), and Splatter (Zappia et al., 2017), each modeling different trajectory topologies such as linear, branching, and cyclic. Splatter simulates branching events by setting expression states and transition probabilities, dyntoy generates random expression gradients to reflect dynamic changes, and dyngen focuses on complex branching structures within gene regulatory networks.

The evaluation of TI performance was based on five metrics.

1) The Hamming-Ipsen-Mikhailov (HIM) distance is a metric for assessing similarity between two topological structures. It integrates the normalized Hamming distance, which highlights differences in edge lengths, with the Ipsen-Mikhailov distance, which focuses on similarities in degree distributions. By linearly combining these two measures, the HIM distance offers a comprehensive evaluation of both local and global structural differences.

2) The F1_branches is a metric used to evaluate a model’s accuracy in branch allocation. It represents the harmonic mean of precision and recall, effectively capturing the performance of branch identification. In trajectory inference, F1_branches are calculated by assessing the similarity between predicted and actual trajectory branches, emphasizing the Jaccard similarity of branch pairs.

3) This cor_dist metric measures similarity in intercellular distances between predicted and actual trajectories. It evaluates model accuracy in cell ordering by comparing relative positions of paired cells, highlighting changes in cell positions across states and reflecting cell differentiation dynamics.

4) The wcor_features metric evaluates the similarity of key features, such as differentially expressed genes, between predicted and actual trajectories. Using weighted Pearson correlation, it highlights consistent features, reflecting the model’s ability to capture biological variation. This helps identify crucial genes in trajectories and understand the molecular mechanisms of cell state transitions.

5) The overall score is calculated by taking the geometric mean of the four aforementioned metrics, providing an assessment of overall performance.

The dataset is divided into two groups: a training set and a testing set (Supplementary Fig. 1).

We use 100 training datasets to perform the following tasks:

1) Determine the optimal number of trajectories: With 3 set as the default for the number of factorized trajectories, we tested other values (1, 2, 4, and 5) and used paired T-tests to assess whether there are significant changes in MGPfact’s prediction results under different parameter settings.

2) Verify the critical role of MURP: Randomly select 20, 40, 60, 80, and 100 cells for trajectory inference, map the inference results back to the original data using the KNN graph structure, and compare the prediction results with those obtained through MURP downsampling.

3) Robustness analysis of the consensus trajectory topology: Perform 60%, 70%, 80%, and 90% sampling on the original data, and then calculate the HIM similarity between the consensus trajectory predictions of MGPfact with and without sampling. A higher score indicates better robustness of the method.

Subsequently, in 239 test datasets, we compare MGPfact with 7 state-of-the-art TI methods using the aforementioned metrics, including Monocle DDRTree (Qiu et al., 2017b, 2017a), TSCAN (Ji and Ji, 2016), and DPT (Haghverdi et al., 2016), as well as four new methods from recent studies: Monocle 3 (Cao et al., 2019), scShaper (Smolander et al., 2022), scFates Tree (Faure et al., 2023), and TinGa (Todorov et al., 2020).

The experimental comparisons were conducted on a CentOS system equipped with 48 CPU cores running at 2.2GHz and 250GB of memory. To ensure a uniform comparison, all experiments were performed using a single CPU core. For MGPfact, we tested each resulted trajectory and selected the one with the best “overall” score for comparison. For the other 7 methods, default settings are used unless otherwise specified.

Application of MGPfact in a Microglia Single-cell RNA-seq Dataset

We utilized the MGPfact reconstructed the developmental trajectory of microglia from a scRNA-seq dataset, including immature microglia (IM), proliferation-associated microglia (PAM), and homeostatic microglia (HM) (Li et al., 2019). In this analysis, we provided a detailed explanation of the analytical steps of MGPfact and the key results. Firstly, we identified three independent developmental pathways and pinpointed HWG associated with each bifurcation process. Then, we retrieved highly active regulons within each bifurcation process, tracing back to the potential influential determinants (transcription factors) in the development of microglia. Finally, we combined all the bifurcation processes into a consensus trajectory (Supplementary methods), which recovered the known biology of disease-related microglia (PAM), as represented by distinct cellular state and marker genes.

Predicting the evolutionary trajectory of tumor-associated CD8⁺ T cells

We utilized MGPfact to conducted an exploratory analysis of the evolution of tumor-associated CD8⁺ T cells of non-small cell lung cancer (NSCLC) and colorectal cancer (CRC). We evaluated the goodness-of-fitness of the consensus trajectories from MGPfact to the CD8⁺ T cell subtypes identified in the original studies. For comparison, we used Monocle 2 (Qiu et al., 2017b, 2017a) as a baseline model.

For the survival analysis, we extracted gene expression signatures (GES) from each independent bifurcation process to develop classifiers for evolutionary propensity of CD8⁺ T cells towards specific fates, based on which we stratified TCGA cancer cohorts and verified their association with clinical outcome.

To evaluated the association to ICI responses, we used HWG to retrieve key transcription factors related to each bifurcation process and characterized the underlying biological processes. We then assessed their connection to immunotherapy response using weighted means of the HWG.

Finally, we identified new subtypes based on distinct endpoints of the consensus trajectory and validated their association with clinical outcome and immunotherapy response using mean of the differently expressed gene (DEG, Supplementary Methods).

Single-cell sequencing data processing

We obtained the original mouse developmental microglia single-cell sequencing data from the GEO accession number GSE123025 (Li et al., 2019). Using Seurat (Butler et al., 2018; Stuart et al., 2019), we replicated the processing steps described in the original study: 1) Normalization by dividing gene expression values by total RNA count, followed by log2 transformation; 2) Selection of highly variable genes (HVGs) using Seurat’s mean.var.plot function, with controlled average expression [0.0125,3] and variance [0.5, ∞]; 3) Scaling and centering of the normalized matrix for HVGs, with regression of cell cycle effects. After preprocessing, we grouped cells into IM (P7-C0), PAM (P7-C1 and P7-GPNMB⁺CLEC7A⁺), and HM (P60), resulting in a 4,889-gene expression matrix across 1,009 cells.

For analyzing tumor-associated CD8⁺ T cells, we utilized scRNA-seq data from lung cancer (GSE99254) (Guo et al., 2018) and colorectal cancer (GSE108989) (Zhang et al., 2018) in the GEO database. We extracted preprocessed and centralized gene expression matrices of CD8⁺ T cells and analyzed them using the same genes and the same method (Monocle 2^9,10) as in the original papers or MGPfact for trajectory construction for a direct comparison. The NSCLC data yielded an 888-gene expression matrix across 3,700 cells, while the CRC data resulted in a 700-gene expression matrix across 3,177 cells.

Functional enrichment of highly weighted genes

For the highly weighted genes (HWG, absolute gene weight > 0.05) obtained from independent bifurcation processes, we utilized the R package clusterprofiler (Yu et al., 2012) to perform functional annotation using GO terms (Consortium, 2004), including biological process (BP), cellular component (CC), and molecular function (MF). The results with a Benjamini–Hochberg-adjusted P value less than 0.05 were retained.

Transcription factor program analysis

To comprehensively assess key regulatory factors within each independent trajectory, we performed SCENIC (Aibar et al., 2017) transcription factor regulatory program estimation for each analysis case. GENIE3 (Huynh-Thu et al., 2010) was used to identify co-expressed modules from the results of MURP (Ren et al., 2022) downsampling. RCisTarget (Aerts et al., 2010; Aibar et al., 2017) was then used to identify regulons before AUCell (Aibar et al., 2017) was used to estimate the activity of each regulon. Each regulon comprises a specific transcription factor and its target genes. Finally, we utilize gene weights obtained from MGPfact analysis to evaluate the distinct impact of top regulons on each trajectory.

Generating the consensus trajectory

Following MGPfact decomposition, we obtained multiple independent bifurcative trajectories, each corresponds to a binary tree within the temporal domain. These trajectories were then merged to construct a coherent diffusion tree, representing the consensus trajectory of cells’ fate. The merging process involves initially sorting all trajectories by their bifurcation time. The first (earliest) bifurcative trajectory is chosen as the initial framework, and subsequent trajectories are integrated to the initial framework iteratively by adding the corresponding branches at the bifurcation timepoints. As a result, the trajectories are ultimately merged into a comprehensive binary tree, serving as the consensus trajectory.

Assessing consistency of MGPfact-derived CD8⁺ T Cell subtype trajectories

In the case study of CD8⁺ T cells, by combining independent trajectories, we derive a consensus trajectory representing the complex developmental pathway. To assess the goodness-of-fitness to the CD8⁺ T cell subtypes from the original study, we classified the trajectories into several states based on bifurcation points, each corresponding to a distinct stage of the evolutionary process. Then, we evaluated the interactive effects between the states and pseudotime on the fraction of the cell types using F-test (ANOVA). The resulted R-squared (R2), P-values, and F-statistics were used to evaluate the goodness-of-fitness of the models tested hence the explanatory power. For comparison, we used the Monocle 2 as the baseline model for trajectory inference. The differentiation trajectories of Monocle 2 were replicated following the workflow in the original study (Guo et al., 2018; Zhang et al., 2018).

Survival analysis

We assessed the association of bifurcation processes and specific cell types with the clinical outcomes two cohorts of lung adenocarcinoma (LUAD, N=563) and colon adenocarcinoma (COAD, N=320) data from The Cancer Genome Atlas (TCGA). The gene expression and clinical data were downloaded from UCSC Xena platform (http://xena.ucsc.edu/).

We assessed the survival impacts of NSCLC and CRC bifurcation processes in the TCGA LUAD and COAD cohorts, respectively. For each independent bifurcation process, we defined Gene Expression Signatures (GES) by the mean expression vectors of all trajectories in phase 1 and 2, respectively. Subsequently, we calculated the Peason’s correlation coefficients for each individual expression profile in TCGA LUAD or COAD cohorts to the phase 1 and 2 GES, where higher correlation correspond to stronger propensity to specific cell fate. This allowed us to classify patients into two groups: those exhibiting propensity to phase 1 and those exhibiting propensity to phase 2. To assess the survival impacts of specific cell states defined by the consensus trajectory, we developed a CD8-ZNF683-T1 score based on the signed average expression level of DEG associated with CD8-ZNF683-T1 (Supplementary methods). Subsequently, we classified the TCGA LUAD cohorts into two groups using the median of CD8-ZNF683-T1 scores, identifying those demonstrating a propensity towards CD8-ZNF683-T1 and those demonstrating a propensity towards CD8-ZNF683-T2.

To adjust for possible confounding effects, the relevant clinical features including age, sex and tumor stage were used as covariates. The Cox regression model was implemented using R-4.2 package “survival”. And we generated Kaplan-Meier survival curves based on different classifiers to illustrate differences in survival time and report the statistical significance based on Log-rank test.

Immune-checkpoint inhibitor treatment response analysis

For the prediction of Immune-checkpoint inhibitor treatment response, we collected four datasets containing ICI treatment responses. These datasets consist of two non-small cell lung cancer related datasets (GSE135222 (Jung et al., 2019), n=27; GSE126044 (Cho et al., 2020), n=16) and two melanoma related datasets (GSE115821 (Auslander et al., 2018), n=14; TCGA-MENO (Liu et al., 2018), n=10). All data were processed with DESeq2 (Love et al., 2014) to fit gene dispersion to a negative binomial distribution, normalize raw counts, and stabilize variance, achieving standardization.

To validate the bifurcation processes identified by MGPfact predicting patients’ response to immune-checkpoint inhibitor (ICI) treatments, we selected highly weighted genes (HWG) with absolute weights greater than 0.05 from each independent bifurcation process. Then, we calculated the weighted mean expression of HWG in each ICI dataset to generate ROC curves for patient drug response.

To validate the specific cell states defined by the consensus trajectory predicting patients’ response to ICI treatments, we used CD8-GZMK-T1 score based on the average expression level differences of upregulated and downregulated genes associated with CD8-GZMK-T1. Then, we calculated the CD8-GZMK-T1 score in each ICI dataset to generate ROC curves for patient drug response.

Additional information

Acknowledgments

We would like to thank Qingyun Li for providing the cell labels for the microglial dataset(Li et al., 2019). We would also like to express our gratitude to Nengming Xiao, and Guo Fu for their valuable input and constructive suggestions during the preparation of the manuscript.

Code availability

We have developed a comprehensive workflow for MGPfact. Firstly, a Docker container enables one-click program execution (details at: https://github.com/renjun0324/ti_mgpfact). Additionally, to fully harness MGPfact’s capabilities, we have created the R package MGPfactR, accessible at: https://github.com/renjun0324/MGPfactR. Within this workflow, MCMC sampling for model parameter estimation is carried out using the Mamba library in the Julia environment. The related Julia package can be found here: https://github.com/renjun0324/MGPfact.jl. Additionally, other analysis scripts can be found on GitHub at https://github.com/renjun0324/mgpfact_paper. And the scFates Tree used in this paper is available as a performance comparison Docker container, constructed using the dendritic trajectory process in scFates (Faure et al., 2023), accessible at: https://github.com/renjun0324/ti_scfates_tree.

Data availability

The datasets used for performance comparison are archived on Zenodo by Saelens et al. (Saelens et al., 2019), with processed real and synthetic datasets available at https://doi.org/10.5281/zenodo.1443566. Data for specific cell instances of microglia and CD8⁺ T cells can be obtained from the GEO database with the following accession numbers: GSE123025 (Li et al., 2019), GSE99254 (Guo et al., 2018), and GSE108989 (Zhang et al., 2018). Expression matrices related to immune-checkpoint inhibitors (ICI) and clinical response information are downloadable from the CTR-DB (https://ctrdb.cloudna.cn).

Sources of Funding

This work was supported by the Fundamental Research Funds for the National Natural Science Foundation of China [82272944 to QL]; the National Natural Science Foundation of China [82203420 to JG].

Author Contributions

The project was conceived by Qiyuan Li. The model construction, data collection and analytical validation was done by Jun Ren, with the assistance of Yudi Hu and Jintao Guo. The manuscript was written by Jun Ren and Xuejing Lyu, with the assistance of Ying Zhou and Xiaodong Shi. All authors read and approved the final manuscript.

Supporting information

supplementary methods and figures

supplementary tables

References

1. Aerts S
2. Quan X-J
3. Claeys A
4. Naval Sanchez M
5. Tate P
6. Yan J
7. Hassan BA
2010Robust Target Gene Discovery through Transcriptome Perturbations and Genome-Wide Enhancer Predictions in Drosophila Uncovers a Regulatory Basis for Sensory SpecificationPLoS Biol 8https://doi.org/10.1371/journal.pbio.1000435
1. Aibar S
2. González-Blas CB
3. Moerman T
4. Huynh-Thu VA
5. Imrichova H
6. Hulselmans G
7. Rambow F
8. Marine J-C
9. Geurts P
10. Aerts J
11. van den Oord J
12. Atak ZK
13. Wouters J
14. Aerts S
2017SCENIC: single-cell regulatory network inference and clusteringNat Methods 14:1083–1086https://doi.org/10.1038/nmeth.4463
1. Anderson SR
2. Roberts JM
3. Ghena N
4. Irvin EA
5. Schwakopf J
6. Cooperstein IB
7. Bosco A
8. Vetter ML
2022Neuronal apoptosis drives remodeling states of microglia and shifts in survival pathway dependenceElife 11
1. Auslander N
2. Zhang G
3. Lee JS
4. Frederick DT
5. Miao B
6. Moll T
7. Tian T
8. Wei Z
9. Madan S
10. Sullivan RJ
11. Boland G
12. Flaherty K
13. Herlyn M
14. Ruppin E
2018Robust prediction of response to immune checkpoint blockade therapy in metastatic melanomaNat Med 24:1545–1549https://doi.org/10.1038/s41591-018-0157-9
1. Smith BJ
2014Mamba: Markov Chain Monte Carlo for Bayesian Analysis in julia
1. Becht E
2. McInnes L
3. Healy J
4. Dutertre C-A
5. Kwok IWH
6. Ng LG
7. Ginhoux F
8. Newell EW
2019Dimensionality reduction for visualizing single-cell data using UMAPNat Biotechnol 37:38–44https://doi.org/10.1038/nbt.4314
1. Butler A
2. Hoffman P
3. Smibert P
4. Papalexi E
5. Satija R
2018Integrating single-cell transcriptomic data across different conditions, technologies, and speciesNat Biotechnol 36:411–420https://doi.org/10.1038/nbt.4096
1. Cao J
2. Spielmann M
3. Qiu X
4. Huang X
5. Ibrahim DM
6. Hill AJ
7. Zhang F
8. Mundlos S
9. Christiansen L
10. Steemers FJ
11. Trapnell C
12. Shendure J
2019The single-cell transcriptional landscape of mammalian organogenesisNature 566:496–502https://doi.org/10.1038/s41586-019-0969-x
1. Cho J-W
2. Hong MH
3. Ha S-J
4. Kim Y-J
5. Cho BC
6. Lee I
7. Kim HR
2020Genome-wide identification of differentially methylated promoters and enhancers associated with response to anti-PD-1 therapy in non-small cell lung cancerExp Mol Med 52:1550–1563https://doi.org/10.1038/s12276-020-00493-8
1. Consortium GO
2004The Gene Ontology (GO) database and informatics resourceNucleic acids research 32:D258–D261
1. Costa F
2. Grün D
3. Backofen R
2018GraphDDP: a graph-embedding approach to detect differentiation pathways in single-cell-data using prior class knowledgeNat Commun 9https://doi.org/10.1038/s41467-018-05988-7
1. Dresselhaus EC
2. Meffert MK
2019Cellular Specificity of NF-κB Function in the Nervous SystemFront Immunol 10https://doi.org/10.3389/fimmu.2019.01043
1. Faure L
2. Soldatov R
3. Kharchenko PV
4. Adameyko I
2023scFates: a scalable python package for advanced pseudotime and bifurcation analysis from single-cell dataBioinformatics 39https://doi.org/10.1093/bioinformatics/btac746
1. Fritzke B
1994A growing neural gas network learns topologiesAdvances in neural information processing systems 7
1. Guo X
2. Zhang Y
3. Zheng L
4. Zheng C
5. Song J
6. Zhang Q
7. Kang B
8. Liu Z
9. Jin L
10. Xing R
11. Gao R
12. Zhang L
13. Dong M
14. Hu X
15. Ren X
16. Kirchhoff D
17. Roider HG
18. Yan T
19. Zhang Z
2018Global characterization of T cells in non-small-cell lung cancer by single-cell sequencingNat Med 24:978–985https://doi.org/10.1038/s41591-018-0045-3
1. Guzmán AU
2022Single-cell RNA sequencing of spinal cord microglia in a mouse model of neuropathic painMcGill University
1. Haghverdi L
2. Buettner F
3. Theis FJ
2015Diffusion maps for high-dimensional single-cell analysis of differentiation dataBioinformatics 31:2989–2998https://doi.org/10.1093/bioinformatics/btv325
1. Haghverdi L
2. Büttner M
3. Wolf FA
4. Buettner F
5. Theis FJ
2016Diffusion pseudotime robustly reconstructs lineage branchingNat Methods 13:845–848https://doi.org/10.1038/nmeth.3971
1. Hugo W
2. Zaretsky JM
3. Sun L
4. Song C
5. Moreno BH
6. Hu-Lieskovan S
7. Berent-Maoz B
8. Pang J
9. Chmielowski B
10. Cherry G
11. Seja E
12. Lomeli S
13. Kong X
14. Kelley MC
15. Sosman JA
16. Johnson DB
17. Ribas A
18. Lo RS
2016Genomic and Transcriptomic Features of Response to Anti-PD-1 Therapy in Metastatic MelanomaCell 165:35–44https://doi.org/10.1016/j.cell.2016.02.065
1. Huynh-Thu VA
2. Irrthum A
3. Wehenkel L
4. Geurts P
2010Inferring Regulatory Networks from Expression Data Using Tree-Based MethodsPLoS ONE 5https://doi.org/10.1371/journal.pone.0012776
1. Ji Z
2. Ji H
2016TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysisNucleic Acids Res 44:e117–e117https://doi.org/10.1093/nar/gkw430
1. Jung H
2. Kim HS
3. Kim JY
4. Sun J-M
5. Ahn JS
6. Ahn M-J
7. Park K
8. Esteller M
9. Lee S-H
10. Choi JK
2019DNA methylation loss promotes immune evasion of tumours with high mutation and copy number loadNat Commun 10https://doi.org/10.1038/s41467-019-12159-9
1. Lange M
2. Bergen V
3. Klein M
4. Setty M
5. Reuter B
6. Bakhti M
7. Lickert H
8. Ansari M
9. Schniering J
10. Schiller HB
11. Pe’er D
12. Theis FJ
2022CellRank for directed single-cell fate mappingNat Methods 19:159–170https://doi.org/10.1038/s41592-021-01346-6
1. Li Q
2023scTour: a deep learning architecture for robust inference and accurate prediction of cellular dynamicsGenome Biology
1. Li Q
2. Cheng Z
3. Zhou L
4. Darmanis S
5. Neff NF
6. Okamoto J
7. Gulati G
8. Bennett ML
9. Sun LO
10. Clarke LE
11. Marschallinger J
12. Yu G
13. Quake SR
14. Wyss-Coray T
15. Barres BA
2019Developmental Heterogeneity of Microglia and Brain Myeloid Cells Revealed by Deep Single-Cell RNA SequencingNeuron 101:207–223https://doi.org/10.1016/j.neuron.2018.12.006
1. Liu Jianfang
2. Lichtenberg T
3. Hoadley KA
4. Poisson LM
5. Lazar AJ
6. Cherniack AD
7. Kovatich AJ
8. Benz CC
9. Levine DA
10. Lee AV
11. Omberg L
12. Wolf DM
13. Shriver CD
14. Thorsson V
15. Hu H
16. Caesar-Johnson SJ
17. Demchok JA
18. Felau I
19. Kasapi M
20. Ferguson ML
21. Hutter CM
22. Sofia HJ
23. Tarnuzzer R
24. Wang Z
25. Yang L
26. Zenklusen JC
27. Zhang J
28. Chudamani S
29. Jia Liu
30. Lolla L
31. Naresh R
32. Pihl T
33. Sun Q
34. Wan Y
35. Wu Y
36. Cho J
37. DeFreitas T
38. Frazer S
39. Gehlenborg N
40. Getz G
41. Heiman DI
42. Kim J
43. Lawrence MS
44. Lin P
45. Meier S
46. Noble MS
47. Saksena G
48. Voet D
49. Hailei Zhang
50. Bernard B
51. Chambwe N
52. Dhankani V
53. Knijnenburg T
54. Kramer R
55. Leinonen K
56. Liu Y
57. Miller M
58. Reynolds S
59. Shmulevich I
60. Thorsson V
61. Zhang W
62. Akbani R
63. Broom BM
64. Hegde AM
65. Ju Z
66. Kanchi RS
67. Korkut A
68. Li J
69. Liang H
70. Ling S
71. Liu W
72. Lu Y
73. Mills GB
74. Ng K-S
75. Rao A
76. Ryan M
77. Jing Wang
78. Weinstein JN
79. Zhang J
80. Abeshouse A
81. Armenia J
82. Chakravarty D
83. Chatila WK
84. De Bruijn I
85. Gao J
86. Gross BE
87. Heins ZJ
88. Kundra R
89. La K
90. Ladanyi M
91. Luna A
92. Nissan MG
93. Ochoa A
94. Phillips SM
95. Reznik E
96. Sanchez-Vega F
97. Sander C
98. Schultz N
99. Sheridan R
100. Sumer SO
101. Sun Y
102. Taylor BS
103. Jioajiao Wang
104. Hongxin Zhang
105. Anur P
106. Peto M
107. Spellman P
108. Benz C
109. Stuart JM
110. Wong CK
111. Yau C
112. Hayes DN
113. Parker JS
114. Wilkerson MD
115. Ally A
116. Balasundaram M
117. Bowlby R
118. Brooks D
119. Carlsen R
120. Chuah E
121. Dhalla N
122. Holt R
123. Jones SJM
124. Kasaian K
125. Lee D
126. Ma Y
127. Marra MA
128. Mayo M
129. Moore RA
130. Mungall AJ
131. Mungall K
132. Robertson AG
133. Sadeghi S
134. Schein JE
135. Sipahimalani P
136. Tam A
137. Thiessen N
138. Tse K
139. Wong T
140. Berger AC
141. Beroukhim R
142. Cherniack AD
143. Cibulskis C
144. Gabriel SB
145. Gao GF
146. Ha G
147. Meyerson M
148. Schumacher SE
149. Shih J
150. Kucherlapati MH
151. Kucherlapati RS
152. Baylin S
153. Cope L
154. Danilova L
155. Bootwalla MS
156. Lai PH
157. Maglinte DT
158. Van Den Berg DJ
159. Weisenberger DJ
160. Auman JT
161. Balu S
162. Bodenheimer T
163. Fan C
164. Hoadley KA
165. Hoyle AP
166. Jefferys SR
167. Jones CD
168. Meng S
169. Mieczkowski PA
170. Mose LE
171. Perou AH
172. Perou CM
173. Roach J
174. Shi Y
175. Simons JV
176. Skelly T
177. Soloway MG
178. Tan D
179. Veluvolu U
180. Fan H
181. Hinoue T
182. Laird PW
183. Shen H
184. Zhou W
185. Bellair M
186. Chang K
187. Covington K
188. Creighton CJ
189. Dinh H
190. Doddapaneni H
191. Donehower LA
192. Drummond J
193. Gibbs RA
194. Glenn R
195. Hale W
196. Han Y
197. Hu J
198. Korchina V
199. Lee S
200. Lewis L
201. Li W
202. Liu X
203. Morgan M
204. Morton D
205. Muzny D
206. Santibanez J
207. Sheth M
208. Shinbro E
209. Wang L
210. Wang M
211. Wheeler DA
212. Xi L
213. Zhao F
214. Hess J
215. Appelbaum EL
216. Bailey M
217. Cordes MG
218. Ding L
219. Fronick CC
220. Fulton LA
221. Fulton RS
222. Kandoth C
223. Mardis ER
224. McLellan MD
225. Miller CA
226. Schmidt HK
227. Wilson RK
228. Crain D
229. Curley E
230. Gardner J
231. Lau K
232. Mallery D
233. Morris S
234. Paulauskis J
235. Penny R
236. Shelton C
237. Shelton T
238. Sherman M
239. Thompson E
240. Yena P
241. Bowen J
242. Gastier-Foster JM
243. Gerken M
244. Leraas KM
245. Lichtenberg TM
246. Ramirez NC
247. Wise L
248. Zmuda E
249. Corcoran N
250. Costello T
251. Hovens C
252. Carvalho AL
253. De Carvalho AC
254. Fregnani JH
255. Longatto-Filho A
256. Reis RM
257. Scapulatempo-Neto C
258. Silveira HCS
259. Vidal DO
260. Burnette A
261. Eschbacher J
262. Hermes B
263. Noss A
264. Singh R
265. Anderson ML
266. Castro PD
267. Ittmann M
268. Huntsman D
269. Kohl B
270. Le X
271. Thorp R
272. Andry C
273. Duffy ER
274. Lyadov V
275. Paklina O
276. Setdikova G
277. Shabunin A
278. Tavobilov M
279. McPherson C
280. Warnick R
281. Berkowitz R
282. Cramer D
283. Feltmate C
284. Horowitz N
285. Kibel A
286. Muto M
287. Raut CP
288. Malykh A
289. Barnholtz-Sloan JS
290. Barrett W
291. Devine K
292. Fulop J
293. Ostrom QT
294. Shimmel K
295. Wolinsky Y
296. Sloan AE
297. De Rose A
298. Giuliante F
299. Goodman M
300. Karlan BY
301. Hagedorn CH
302. Eckman J
303. Harr J
304. Myers J
305. Tucker K
306. Zach LA
307. Deyarmin B
308. Hu H
309. Kvecher L
310. Larson C
311. Mural RJ
312. Somiari S
313. Vicha A
314. Zelinka T
315. Bennett J
316. Iacocca M
317. Rabeno B
318. Swanson P
319. Latour M
320. Lacombe L
321. Têtu B
322. Bergeron A
323. McGraw M
324. Staugaitis SM
325. Chabot J
326. Hibshoosh H
327. Sepulveda A
328. Su T
329. Wang T
330. Potapova O
331. Voronina O
332. Desjardins L
333. Mariani O
334. Roman-Roman S
335. Sastre X
336. Stern M-H
337. Cheng F
338. Signoretti S
339. Berchuck A
340. Bigner D
341. Lipp E
342. Marks J
343. McCall S
344. McLendon R
345. Secord A
346. Sharp A
347. Behera M
348. Brat DJ
349. Chen A
350. Delman K
351. Force S
352. Khuri F
353. Magliocca K
354. Maithel S
355. Olson JJ
356. Owonikoko T
357. Pickens A
358. Ramalingam S
359. Shin DM
360. Sica G
361. Van Meir EG
362. Hongzheng Zhang
363. Eijckenboom W
364. Gillis A
365. Korpershoek E
366. Looijenga L
367. Oosterhuis W
368. Stoop H
369. Van Kessel KE
370. Zwarthoff EC
371. Calatozzolo C
372. Cuppini L
373. Cuzzubbo S
374. DiMeco F
375. Finocchiaro G
376. Mattei L
377. Perin A
378. Pollo B
379. Chen C
380. Houck J
381. Lohavanichbutr P
382. Hartmann A
383. Stoehr C
384. Stoehr R
385. Taubert H
386. Wach S
387. Wullich B
388. Kycler W
389. Murawa D
390. Wiznerowicz M
391. Chung K
392. Edenfield WJ
393. Martin J
394. Baudin E
395. Bubley G
396. Bueno R
397. De Rienzo A
398. Richards WG
399. Kalkanis S
400. Mikkelsen T
401. Noushmehr H
402. Scarpace L
403. Girard N
404. Aymerich M
405. Campo E
406. Giné E
407. Guillermo AL
408. Van Bang N
409. Hanh PT
410. Phu BD
411. Tang Y
412. Colman H
413. Evason K
414. Dottino PR
415. Martignetti JA
416. Gabra H
417. Juhl H
418. Akeredolu T
419. Stepa S
420. Hoon D
421. Ahn K
422. Kang KJ
423. Beuschlein F
424. Breggia A
425. Birrer M
426. Bell D
427. Borad M
428. Bryce AH
429. Castle E
430. Chandan V
431. Cheville J
432. Copland JA
433. Farnell M
434. Flotte T
435. Giama N
436. Ho T
437. Kendrick M
438. Kocher J-P
439. Kopp K
440. Moser C
441. Nagorney D
442. O’Brien D
443. O’Neill BP
444. Patel T
445. Petersen G
446. Que F
447. Rivera M
448. Roberts L
449. Smallridge R
450. Smyrk T
451. Stanton M
452. Thompson RH
453. Torbenson M
454. Yang JD
455. Zhang L
456. Brimo F
457. Ajani JA
458. Angulo Gonzalez AM
459. Behrens C
460. Bondaruk J
461. Broaddus R
462. Czerniak B
463. Esmaeli B
464. Fujimoto J
465. Gershenwald J
466. Guo C
467. Lazar AJ
468. Logothetis C
469. Meric-Bernstam F
470. Moran C
471. Ramondetta L
472. Rice D
473. Sood A
474. Tamboli P
475. Thompson T
476. Troncoso P
477. Tsao A
478. Wistuba I
479. Carter C
480. Haydu L
481. Hersey P
482. Jakrot V
483. Kakavand H
484. Kefford R
485. Lee K
486. Long G
487. Mann G
488. Quinn M
489. Saw R
490. Scolyer R
491. Shannon K
492. Spillane A
493. Stretch J
494. Synott M
495. Thompson J
496. Wilmott J
497. Al-Ahmadie H
498. Chan TA
499. Ghossein R
500. Gopalan A
501. Levine DA
502. Reuter V
503. Singer S
504. Singh B
505. Tien NV
506. Broudy T
507. Mirsaidi C
508. Nair P
509. Drwiega P
510. Miller J
511. Smith J
512. Zaren H
513. Park J-W
514. Hung NP
515. Kebebew E
516. Linehan WM
517. Metwalli AR
518. Pacak K
519. Pinto PA
520. Schiffman M
521. Schmidt LS
522. Vocke CD
523. Wentzensen N
524. Worrell R
525. Yang H
526. Moncrieff M
527. Goparaju C
528. Melamed J
529. Pass H
530. Botnariuc N
531. Caraman I
532. Cernat M
533. Chemencedji I
534. Clipca A
535. Doruc S
536. Gorincioi G
537. Mura S
538. Pirtac M
539. Stancul I
540. Tcaciuc D
541. Albert M
542. Alexopoulou I
543. Arnaout A
544. Bartlett J
545. Engel J
546. Gilbert S
547. Parfitt J
548. Sekhon H
549. Thomas G
550. Rassl DM
551. Rintoul RC
552. Bifulco C
553. Tamakawa R
554. Urba W
555. Hayward N
556. Timmers H
557. Antenucci A
558. Facciolo F
559. Grazi G
560. Marino M
561. Merola R
562. De Krijger R
563. A-P Gimenez-Roqueplo
564. Piché A
565. Chevalier S
566. McKercher G
567. Birsoy K
568. Barnett G
569. Brewer C
570. Farver C
571. Naska T
572. Pennell NA
573. Raymond D
574. Schilero C
575. Smolenski K
576. Williams F
577. Morrison C
578. Borgia JA
579. Liptay MJ
580. Pool M
581. Seder CW
582. Junker K
583. Omberg L
584. Dinkin M
585. Manikhas G
586. Alvaro D
587. Bragazzi MC
588. Cardinale V
589. Carpino G
590. Gaudio E
591. Chesla D
592. Cottingham S
593. Dubina M
594. Moiseenko F
595. Dhanasekaran R
596. Becker K-F
597. Janssen K-P
598. Slotta-Huspenina J
599. Abdel-Rahman MH
600. Aziz D
601. Bell S
602. Cebulla CM
603. Davis A
604. Duell R
605. Elder JB
606. Hilty J
607. Kumar B
608. Lang J
609. Lehman NL
610. Mandt R
611. Nguyen P
612. Pilarski R
613. Rai K
614. Schoenfield L
615. Senecal K
616. Wakely P
617. Hansen P
618. Lechan R
619. Powers J
620. Tischler A
621. Grizzle WE
622. Sexton KC
623. Kastl A
624. Henderson J
625. Porten S
626. Waldmann J
627. Fassnacht M
628. Asa SL
629. Schadendorf D
630. Couce M
631. Graefen M
632. Huland H
633. Sauter G
634. Schlomm T
635. Simon R
636. Tennstedt P
637. Olabode O
638. Nelson M
639. Bathe O
640. Carroll PR
641. Chan JM
642. Disaia P
643. Glenn P
644. Kelley RK
645. Landen CN
646. Phillips J
647. Prados M
648. Simko J
649. Smith-McCune K
650. VandenBerg S
651. Roggin K
652. Fehrenbach A
653. Kendler A
654. Sifri S
655. Steele R
656. Jimeno A
657. Carey F
658. Forgie I
659. Mannelli M
660. Carney M
661. Hernandez B
662. Campos B
663. Herold-Mende C
664. Jungk C
665. Unterberg A
666. Von Deimling A
667. Bossler A
668. Galbraith J
669. Jacobus L
670. Knudson M
671. Knutson T
672. Ma D
673. Milhem M
674. Sigmund R
675. Godwin AK
676. Madan R
677. Rosenthal HG
678. Adebamowo C
679. Adebamowo SN
680. Boussioutas A
681. Beer D
682. Giordano T
683. Mes-Masson A-M
684. Saad F
685. Bocklage T
686. Landrum L
687. Mannel R
688. Moore K
689. Moxley K
690. Postier R
691. Walker J
692. Zuna R
693. Feldman M
694. Valdivieso F
695. Dhir R
696. Luketich J
697. Mora Pinero EM
698. Quintero-Aguilo M
699. Carlotti CG
700. Dos Santos JS
701. Kemp R
702. Sankarankuty A
703. Tirapelli D
704. Catto J
705. Agnew K
706. Swisher E
707. Creaney J
708. Robinson B
709. Shelley CS
710. Godwin EM
711. Kendall S
712. Shipman C
713. Bradford C
714. Carey T
715. Haddad A
716. Moyer J
717. Peterson L
718. Prince M
719. Rozek L
720. Wolf G
721. Bowman R
722. Fong KM
723. Yang I
724. Korst R
725. Rathmell WK
726. Fantacone-Campbell JL
727. Hooke JA
728. Kovatich AJ
729. Shriver CD
730. DiPersio J
731. Drake B
732. Govindan R
733. Heath S
734. Ley T
735. Van Tine B
736. Westervelt P
737. Rubin MA
738. Lee JI
739. Aredes ND
740. Mariamidze A
2018An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome AnalyticsCell 173:400–416https://doi.org/10.1016/j.cell.2018.02.052
1. Love M
2. Anders S
3. Huber W
2014Differential analysis of count data–the DESeq2 packageGenome Biol 15:10–1186
1. Lutter L
2. Van Der Wal MM
3. Brand EC
4. Maschmeyer P
5. Vastert S
6. Mashreghi M
7. Van Loosdregt J
8. Van Wijk F
2022Human regulatory T cells locally differentiate and are functionally heterogeneous within the inflamed arthritic jointClin & Trans Imm 11https://doi.org/10.1002/cti2.1420
1. Nawal HS
A Systems Biology Perspective of Stem Cell Differentiation into Microglia
1. Neal RM
2003Slice samplingThe annals of statistics 31:705–767
1. Papadopoulos N
2. Gonzalo PR
3. Söding J
2019PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processesBioinformatics 35:3517–3519https://doi.org/10.1093/bioinformatics/btz078
1. Pritchard GH
2. Phan AT
3. Christian DA
4. Blain TJ
5. Fang Q
6. Johnson J
7. Roy NH
8. Shallberg L
9. Kedl RM
10. Hunter CA
2023Early T-bet promotes LFA1 upregulation required for CD8+ effector and memory T cell developmentJournal of Experimental Medicine 220https://doi.org/10.1084/jem.20191287
1. Qiu X
2. Hill A
3. Packer J
4. Lin D
5. Ma Y-A
6. Trapnell C
2017Single-cell mRNA quantification and differential analysis with CensusNat Methods 14:309–315https://doi.org/10.1038/nmeth.4150
1. Qiu X
2. Mao Q
3. Tang Y
4. Wang L
5. Chawla R
6. Pliner HA
7. Trapnell C
2017Reversed graph embedding resolves complex single-cell trajectoriesNat Methods 14:979–982https://doi.org/10.1038/nmeth.4402
1. Ren J
2. Zhang Q
3. Zhou Y
4. Hu Y
5. Lyu X
6. Fang H
7. Yang J
8. Yu R
9. Shi X
10. Li Q
2022A downsampling method enables robust clustering and integration of single-cell transcriptome dataJournal of Biomedical Informatics 130https://doi.org/10.1016/j.jbi.2022.104093
1. Roberts GO
2. Rosenthal JS
2009Examples of adaptive MCMCJournal of computational and graphical statistics 18:349–367
1. Saelens W
2. Cannoodt R
3. Todorov H
4. Saeys Y
2019A comparison of single-cell trajectory inference methodsNat Biotechnol 37:547–554https://doi.org/10.1038/s41587-019-0071-9
1. Salmon AJ
2. Shavkunov AS
3. Miao Q
4. Jarjour NN
5. Keshari S
6. Esaulova E
7. Williams CD
8. Ward JP
9. Highsmith AM
10. Pineda JE
11. Taneja R
12. Chen K
13. Edelson BT
14. Gubin MM
2022BHLHE40 Regulates the T-Cell Effector Function Required for Tumor Microenvironment Remodeling and Immune Checkpoint Therapy EfficacyCancer Immunology Research 10:597–611https://doi.org/10.1158/2326-6066.CIR-21-0129
1. Schulz E
2. Speekenbrink M
3. Krause A
2018A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functionsJournal of Mathematical Psychology 85:1–16https://doi.org/10.1016/j.jmp.2018.03.001
1. Sha Y
2024Reconstructing growth and dynamic trajectories from single-cell transcriptomics dataNature Machine Intelligence 6:25–39
1. Smolander J
2. Junttila S
3. Venålåinen MS
4. Elo LL
2022scShaper: an ensemble method for fast and accurate linear trajectory inference from single-cell RNA-seq dataBioinformatics 38:1328–1335https://doi.org/10.1093/bioinformatics/btab831
1. Stuart T
2. Butler A
3. Hoffman P
4. Hafemeister C
5. Papalexi E
6. Mauck WM
7. Hao Y
8. Stoeckius M
9. Smibert P
10. Satija R
2019Comprehensive Integration of Single-Cell DataCell 177:1888–1902https://doi.org/10.1016/j.cell.2019.05.031
1. Tierney L
1994Markov chains for exploring posterior distributionsthe Annals of Statistics :1701–1728
1. Todorov H
2. Cannoodt R
3. Saelens W
4. Saeys Y
2020TinGa: fast and flexible trajectory inference with Growing Neural GasBioinformatics 36:i66–i74https://doi.org/10.1093/bioinformatics/btaa463
1. Trapani JA
2001Granzymes: a family of lymphocyte granule serine proteasesGenome Biol 2https://doi.org/10.1186/gb-2001-2-12-reviews3014
1. Van der Maaten L
2. Hinton G
2008Visualizing data using t-SNEJournal of machine learning research 9
1. Wang Q
2. Lu M
3. Zhu X
4. Gu X
5. Zhang T
6. Xia C
7. Yang L
8. Xu Y
9. Zhou M
2022The role of microglia immunometabolism in neurodegeneration: Focus on molecular determinants and metabolic intermediates of metabolic reprogrammingBiomedicine & Pharmacotherapy 153https://doi.org/10.1016/j.biopha.2022.113412
1. Yu G
2. Wang L-G
3. Han Y
4. He Q-Y
2012clusterProfiler: an R package for comparing biological themes among gene clustersOmics: a journal of integrative biology 16:284–287
1. Zappia L
2. Phipson B
3. Oshlack A
2017Splatter: simulation of single-cell RNA sequencing dataGenome Biol 18https://doi.org/10.1186/s13059-017-1305-0
1. Zhang L
2. Yu X
3. Zheng L
4. Zhang Y
5. Li Y
6. Fang Q
7. Gao R
8. Kang B
9. Zhang Q
10. Huang JY
11. Konno H
12. Guo X
13. Ye Y
14. Gao S
15. Wang S
16. Hu X
17. Ren X
18. Shen Z
19. Ouyang W
20. Zhang Z
2018Lineage tracking reveals dynamic relationships of T cells in colorectal cancerNature 564:268–272https://doi.org/10.1038/s41586-018-0694-x

Article and author information

Author information

Jun Ren
School of Informatics, Xiamen University, Xiamen, China, National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, China, Department of Hematology, The First Affiliated Hospital of Xiamen University and Institute of Hematology, School of Medicine, Xiamen University, Xiamen, China
Ying Zhou
National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, China, Department of Hematology, The First Affiliated Hospital of Xiamen University and Institute of Hematology, School of Medicine, Xiamen University, Xiamen, China
Yudi Hu
National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, China
Jing Yang
National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, China
Hongkun Fang
National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, China
Xuejing Lyu
National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, China
Jintao Guo
National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, China
Xiaodong Shi
School of Informatics, Xiamen University, Xiamen, China
Qiyuan Li
National Institute for Data Science in Health and Medicine, School of Medicine, Xiamen University, Xiamen, China, Department of Hematology, The First Affiliated Hospital of Xiamen University and Institute of Hematology, School of Medicine, Xiamen University, Xiamen, China
ORCID iD: 0000-0002-8934-8948
- For correspondence: qiyuan.li@xmu.edu.cn

Version history

Sent for peer review: April 2, 2024
Preprint posted: April 4, 2024
Reviewed Preprint version 1: August 21, 2024
Reviewed Preprint version 2: December 18, 2024

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

views: 188
downloads: 4
citations: 0

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Significance of findings

Strength of evidence

Abstract

Introduction

Results

Design of MGPfact

Overview of MGPfact workflow.

Algorithm 1 MGPfact: infer cell fate trajectory Algorithm 1 MGPfact: infer cell fate trajectory

MGPfact outperformed state-of-the-art methods in F1branches.

Performance evaluation of MGPfact

Robustness analysis of MGPfact

MGPfact predicts cellular trajectories

Trajectory inference (TI) performance of state-of-the-art methods in 239 test datasets.

Comparison of the explanatory power for CD8+ T cell fate for MGPfact and three other different methods.

Trajectory inference (TI) performance of state-of-the-art methods in 68 test datasets of real cell population.

Comparative of Time Efficiency and Memory Consumption

MGPfact recovers the trajectory of early postnatal microglia development

MGPfact recovers the determinants of the microglia development

MGPfact reconstructed the developmental trajectory of microglia, recovering known determinants of microglia fate.

Using consensus trajectories to delineate the development of microglial cells

MGPfact to decipher the evolution of tumor-associated CD8+ T cells

MGPfact better explains the fate of tumor-associated CD8+ T cells

MGPfact identifies T-cell gene expression signatures with clinical implications

Highly weighted genes (HWG) of the bifurcation processes of CD8+ T cells serve as reliable indicators for clinical outcome and ICI treatment response.

MGPfact serves as an effective approach for characterization of new cellular subtypes.

MGPfact identified new subtypes of CD8+ T cells with clinical implications

Discussion

Methods

Benchmarking MGPfact to state-of-the-art methods

Application of MGPfact in a Microglia Single-cell RNA-seq Dataset

Predicting the evolutionary trajectory of tumor-associated CD8+ T cells

Single-cell sequencing data processing

Functional enrichment of highly weighted genes

Transcription factor program analysis

Generating the consensus trajectory

Assessing consistency of MGPfact-derived CD8+ T Cell subtype trajectories

Survival analysis

Immune-checkpoint inhibitor treatment response analysis

Additional information

Acknowledgments

Code availability

Data availability

Sources of Funding

Author Contributions

Supporting information

References

Article and author information

Author information

Jun Ren

Ying Zhou

Yudi Hu

Jing Yang

Hongkun Fang

Xuejing Lyu

Jintao Guo

Xiaodong Shi

Qiyuan Li