Introduction

Astrocytes are the most abundant cells in the brain. They play important roles in the central nervous system in supporting neuronal survival and synaptic activities, including the regulation of ionic homeostasis, providing energetic support, elimination of oxidative stress, and neurotransmitter removal and recycle (Verkhratsky and Nedergaard 2018). Abnormalities in astrocytes have been linked to various neurodegenerative and neurodevelopmental disorders, such as Parkinson’s disease, Alzheimer’s disease, Huntington’s disease, autism spectrum disorders and Alexander’s disease (Molofsky et al. 2012; Phatnani and Maniatis 2015; Booth et al. 2017). There is therefore a growing interest in using human pluripotent stem cell (PSC)-derived astrocytes for disease modelling in vitro (Chandrasekaran et al. 2016).

Contrary to the widely held belief that astrocytes in the brain are largely identical, recent studies revealed diversity in their transcriptomic profile, physiological properties and function (Oberheim et al. 2009; Schober et al. 2022). For example, single cell and spatial transcriptomic studies have identified several astrocyte subpopulations in the mouse cortex (Zhu et al. 2018; Batiuk et al. 2020; Bayraktar et al. 2020). In human, although astrocyte heterogeneity remains largely elusive, heterogeneity in radial glia across brain regions and within midbrain has been reported (La Manno et al. 2016; Li et al. 2023). Furthermore, different molecular and physiological features and distinct responses to stimuli were also observed in astrocytes from different mouse brain regions (Takata and Hirase 2008; Chai et al. 2017; Morel et al. 2017; Itoh et al. 2018; Kostuk et al. 2019; Makarava et al. 2019; Xin et al. 2019; Lozzi et al. 2020). Indeed, astrocyte heterogeneity has been suggested to underly the regional susceptibility to human diseases (Schober et al. 2022). Therefore, recapitulating astrocyte regional specificity in PSC-derived astrocytes is generally accepted as an important prerequisite.

Several reports have described the generation of regional astrocytes from human embryonic stem cells or induced pluripotent stem cells (iPSCs) which include forebrain (Krencik et al. 2011; Zhou et al. 2016; Tcw et al. 2017; Lin et al. 2018; Bradley et al. 2019; Hedegaard et al. 2020; Peteri et al. 2021), ventral midbrain (Booth et al. 2019; Barbuti et al. 2020; Crompton et al. 2021; de Rus Jacquet et al. 2021), hindbrain and spinal cord (Roybon et al. 2013; Serio et al. 2013; Holmqvist et al. 2015; Bradley et al. 2019; di Domenico et al. 2019; Yun et al. 2019). The regional identity of these astrocytes is typically evaluated at the stage of early neural progenitors generated via cell type- or region-directed neural patterning protocols, with the assumption that the positional characteristics will be faithfully preserved in the final astrocyte products. However, astrocyte production in vitro involves an extended period of astrocytic fate induction and progenitor expansion using FGF and EGF while substantial literature reported alterations in region-specific gene expression and/or neurogenic competence in expanded neural progenitors (Jain et al. 2003; Sun et al. 2008; Koch et al. 2009; Falk et al. 2012). Therefore, better characterisations of PSC-derived astrocytes and their lineage-specific features are needed to advance our knowledge about the molecular heterogeneity of human astrocytes.

Using a human iPSC line that allows the tracing of LMX1A expressing midbrain floor plate neural progenitors and their differentiated progeny (Cardo et al. 2023), we discovered an unexpected gradual depletion of LMX1A+ progenitor progeny during astrocyte induction from a bulk population of ventral midbrain patterned progenitors despite LMX1A+ progenitors being the predominant starting population. LMX1A+ progenitor derived astrocytes can however be generated if astrocytic induction is initiated from purified LMX1A+ progenitors, indicating that the positional constituents of the founding cell population may not be preserved faithfully in the derived astrocytes. Single cell RNA sequencing (scRNAseq) of astrocytes derived from both parental populations identified distinct transcriptomic signatures, providing a useful resource for assessment of human PSC-derived midbrain astrocytes.

Results

Depletion of LMX1A+ progenitors and/or derivatives in ventral midbrain patterned neural progenitor cultures during astrogenic induction

To investigate whether regionally patterned neural progenitors retain their lineage identity during astrogenic induction and glial progenitor expansion, we made use of the LMX1A-Cre/AAVS1-BFP iPSCs tracer line, which enables the tracking of LMX1A+ midbrain floor plate progenitors and their differentiated progeny(Cardo et al. 2023). We differentiated the LMX1A-Cre/AAVS1-BFP iPSCs towards the ventral midbrain fate following a modified protocol based on Jaeger et al. and Nolbrant et al.(Jaeger et al. 2011; Nolbrant et al. 2017) (Figure 1A). Immunocytochemistry of day (d)19 cultures confirmed a high proportion of cells expressing BFP (96.01±0.42%) and ventral midbrain progenitor markers LMX1A (92.94±0.91%), FOXA2 (94.76±0.57%) and OTX2 (97.82±0.28%; Figure 1B-D with the original images shown in Figure S1A). Most cells (91.26±1.64%) co-expressed LMX1A and FOXA2 (Figure 1B and 1D). At this stage, all BFP+ cells also stained positive to a pan neural progenitor marker NESTIN (Figure 1C). We detected a small proportion of cells expressing midbrain basal plate marker NKX6.1 (2.03±0.47%, Figure 1B-C) whose expression domain in the early developing ventral midbrain partially overlap with that of (Andersson et al. 2006)(Andersson et al. 2006). However, few PAX6+ cells were present (Figure 1C), which marks the forebrain and (Duan et al. 2013)(Duan et al. 2013). Immunocytochemical analysis of FACS-sorted BFP+ cells confirmed highly enriched expression of LMX1A (95.51±0.09%) and FOXA2 (95.35±0.14%), and co-expression of both markers (94.68±0.10%; Figure 1B and 1D). In contrast, only a small number of LMX1A+ cells (4.96±0.70%) were present in the sorted BFP- population (Figure 1B and1D). These findings provide further support that BFP expression faithfully recapitulate LMX1A expression and that LMX1A+ ventral midbrain progenitors represent the major cell population in d19(Cardo et al. 2023)(Cardo et al. 2023).

Depletion of LMX1A+ progenitors and their derivatives during astrogenic induction in ventral midbrain patterned neural progenitor cultures.

A, Schematic diagram of ventral midbrain neural differentiation and astrogenic induction. B-C, Representative view of immunocytochemistry of ventral midbrain neural progenitor markers and other regional markers in d19 unsorted, sorted BFP+, and sorted BFP- population. Scale bar represents 100 µm. Images shown were cropped to 300 µm×300 µm by randomly selecting the region of interest in the nuclei-only channel (uncropped greyscale images are shown in Figure S1A). D, Quantification of marker expression in unsorted, sorted BFP+, and sorted BFP- population. Error bars represent the standard error of means (SEM) of three independent experiments. E, Flow cytometry quantification of unsorted and BFP+ population during astrogenic induction and progenitor expansion. Each data point represents one biological replicate. The gating strategy used is shown in Figure S1B.

The d19 cells were then induced to undergo astrogenic switch in media containing FGF2 and EGF with the BFP content monitored by flow cytometry at each weekly passaging (Figure 1A, representative gating strategy is shown in Figure S1B-E). We found, unexpectedly, a dramatic decrease of BFP+ cell proportion from the starting point of 88.20±1.10% to only 27.59±8.28% at d43 and to nearly absent as the differentiation continued (Figure 1E). We did not observe evident cell death during culture and replating; thus, the absence of BFP could be either due to the silencing of BFP expression in the derivatives of LMX1A+ progenitors or the loss of these cells through growth competition. To address this question, we performed progenitor expansion and astrogenic induction under the same culture condition with purified d19 BFP+ progenitors isolated using fluorescence-activated cell sorting (FACS). Interestingly, the sorted BFP+ cells exhibited similar population growth rate to that of unsorted cultures, and the proportion of BFP+ cells remained at approximately 90% throughout the astrogenic induction and glial progenitor expansion period (Figure 1E). This observation demonstrates that BFP expression can be maintained in the derivatives of LMX1A+ midbrain floorplate progenitors, and that the loss of BFP+ cells in the unsorted culture is likely due to their growth disadvantage compared to the derivatives of LMX1A- progenitors.

Our finding is unexpected and demonstrates that the regional or lineage identity of PSC-derived astrocytic cells should not be assumed merely based on the dominant regional property of their cellular origin, given that no in vitro fate specification paradigm is 100% efficient.

Astrogenic switch occurred earlier in derivatives of LMX1A+ midbrain progenitors

Since sorted BFP+ (LMX1A+ or their derivatives) and unsorted astrocytic cultures differ distinctively in BFP expression soon after the initiation of astrogenic induction, for simplicity, these cultures are hereafter referred to as the BFP+ and BFP- cultures, respectively. Wondering whether the two cell populations behave differently in the process of astrogenic switch, we examined astrocytic marker expression in these cultures at d45 and d98 by immunocytochemistry. SOX9 and NFIA are transcription factors crucial for the initiation of astrogenesis and acquisition of astrogenic competence in the developing central nervous system (Stolt et al. 2003; Deneen et al. 2006), while CD44 identifies astrocyte-restricted precursors (Liu et al. 2004). We found that all these markers were more abundantly detected in the BFP+ cultures (NFIA: 65.89±2.81%; SOX9: 57.19±4.25%) than the BFP- cultures (NFIA: 4.26±1.28%; SOX9: 8.88±1.82%) at d45 (Two-way ANOVA with post-hoc Turkey test, NFIA: p=2.52×10-5, SOX9: p=9.21×10-6; Figure 2A-B with the original images shown in Figure S2). Although the number of NFIA+ and SOX9+ cells significantly increased in the BFP- cultures by d98 (NFIA: 44.07±4.56% on d98, p=3.58×10-4; SOX9: 44.28±2.84% on d98, p=1.21×10-4), the BFP+ cultures still contained more cells expressing NFIA (65.71±4.25%; p=2.06×10-2) and SOX9 (73.25±2.12%; p=5.51×10-4) than BFP-cultures (Figure 2A and 2C).

Early astrogenic switch and astrocyte maturation in derivatives of LMX1A+ midbrain progenitors.

A, Representative view of immunocytochemistry of astrogenic marker expression in BFP+ and unsorted progenitors at day 45 and 98. Scale bar represents 100 µm. Images shown were cropped to 462 µm×462 µm by randomly selecting the region of interest in the nuclei-only channel (uncropped greyscale images are shown in Figure S2). B, Representative view of immunocytochemistry of astrocyte marker expression in early and late, BFP+ and BFP- (unsorted) astrocytes. Scale bar represents 100 µm. Images shown were cropped to 300 µm×300 µm by randomly selecting the region of interest in the nuclei-only channel (uncropped greyscale images are shown in Figure S3). C, Quantification of immunocytochemistry of astrogenic marker in BFP+ and BFP- progenitors at day 45 and 98 shown in Panel A. Error bars represent the standard error of means (SEM) of three independent experiments. Two-way ANOVA was performed to compare between lineages (NFIA: p=5.389×10-6, df=1, effect size=3.62; SOX9: p=1.96×10-6, df=1, effect size=4.77) and days of differentiation (NFIA: p=7.82×10-5, df=1, effect size=1.99; SOX9: p=2.62×10-5, df=1, effect size=2.99) D, Quantification of immunocytochemistry of astrocyte marker expression in astrocytes. Error bars represent SEM of three independent experiments. Kruskal-Wallis test results following Bonferroni correction are shown on the top of the figure (AQP4: p.adjust=0.12, df=3, H=8.95; EAAT2: p.adjust=1.00, df=3, H=0.95; GFAP: p.adjust=0.06, df=3, H=10.38; S100B: p.adjust=0.11, df=3, H=9.05). E, Averaged trace of ATP-induced Ca2+ response assayed using FLIPR. Drugs or DMSO were applied at 1 minute of the assay. The line represents the average fluorescence change (ΔF/F0) in at least three independent experiments each with at least three replicate wells. The shaded area represents the SEM across at least three independent experiments. F, Quantitative comparison of the peak amplitude of ATP-induced Ca2+ response among conditions (two-way ANOVA, p<2.2×10-16, df=2, effect size=2.54) and samples (p=2.87×10-14, df=3, effect size=2.17). Error bars represent the SEM across at least independent experiments. G: Quantitative comparison of the rise time of ATP-induced Ca2+ response among conditions (two-way ANOVA, p=2.19×10-13, df=2, effect size =1.958) and samples (p=0.064, df=3, effect size=0.76). Intergroup comparison was performed using Post-hoc Tukey test. Error bars represent the SEM across at least three independent experiments. (****: p<0.0001, ***: p<0.001, **: p<0.01, *: p<0.05, ns: not significant).

To investigate whether the temporal difference in astrocytic switch between the BFP+ and BFP- cultures affects maturation and functionality of the derived astrocytes, we initiated astrocyte terminal differentiation by exposing the BFP+ and BFP- astrocyte precursors to CNTF and BMP4 (Krencik et al. 2011; Bradley et al. 2019) from d87 (referred to as early astrocytes) and d136 (late astrocytes). Both BFP+ and BFP-cultures exhibited a similar expression profile of classic astrocyte markers, including AQP4, EAAT2, and S100B, but few GFAP+ cells at both time points (Figure 2B and 2D with the original images shown in Figure S3).

As a reference, we also generated neural progenitors without employing any patterning cues and induced astrogenic switch and astrocyte differentiation from these non-patterned neural progenitors (Figure S4A-B). We found that, while the astrocyte cultures derived from the non-patterned progenitors contained a similar proportion of cells expressing AQP4, EAAT2 and S100B compared to the BFP+ and BFP- astrocyte cultures, there are more GFAP+ cells in the non-patterned astrocyte preparations (17.89±5.4%, Figure S4A-B).

Functional astrocytes exhibit transient calcium (Ca2+) spikes upon chemical stimulation, such as ATP (Zhang et al. 2016). Using a FLIPR Ca2+ release assay, we observed a sharp increase in the intracellular Ca2+ concentration upon ATP administration in both the early and late BFP+ astrocyte populations (early BFP+: p=5.37×10-12; late BFP+: p<2.2×10-16; Figure 2E-F). ATP induced Ca2+ release is partially mediated by inositol trisphosphate. Indeed, addition of an inositol trisphosphate receptor antagonist 2-aminoethoxydiphenylborate (2-APB) reduced the amplitude (early BFP+: p=5.88×10-5; late BFP+: p=9.22×10-10; Figure 2F) and rise time (early BFP+: p=2.63×10-5; late BFP+: p=3.59×10-4; late BFP-: p=0.028; Figure 2G) of ATP-induced Ca2+ response in both the early and late BFP+ astrocytes. Interestingly, the early BFP+ astrocytes had significantly lower peak amplitude than that observed in late BFP+ astrocytes (p=6.92×10-9; Figure 2F), despite their similar level of astrocyte marker expression, suggesting a difference in maturity at the functional level. Early and late BFP- astrocytes exhibited a similar profile of time-dependent increase in the amplitude of ATP-induced Ca2+ response but did not reach statistical significance (Figure 2E-F). However, late BFP- astrocytes showed a significantly lower peak amplitude than late BFP+ astrocytes (p<2.2×10-16; Figure 2E-F).

Taken together, our data demonstrate that astrogenesis occurred earlier in BFP+ cultures than BFP- cells. This temporal difference is also reflected in functional maturity of the derived astrocytes, despite a similar expression profile of classic astroglial markers.

Single cell RNA sequencing confirms the authenticity of PSC-derived astrocytes

To further characterize the PSC-derived astrocytes, we performed full-length scRNAseq on early and late BFP+ and BFP- astrocytes using the iCELL8 platform and SMART-seq technology, with non-patterned astrocytes derived from the LMX1A-Cre/AAVS1-BFP tracer line as a control. A sample of iPSC-derived neurons was included to facilitate downstream cell type identification. We profiled FACS purified astrocytes expressing CD49f as well as unsorted cultures for comparison of all three astrocyte populations (Barbar et al. 2020); Figure S4C-H). After stringent filtering (Figure S5A-C, see Methods and Materials on filtering), we obtained a total of 17478 protein-coding genes in 1786 qualifying cells, with an average of 6326 protein-coding genes detected per cell.

Unsupervised Louvain clustering identified 12 cell clusters (Figure 3A). Cells clustered mainly based on sample type (astrocytes and neurons; Figure S6A) and the estimated cell cycle phase (Figure S6B), while sorted CD49f+ and unsorted astrocytes were clustered together (Figure S6A). TagBFP was detected at a higher level in BFP+ astrocyte samples than in BFP- and NP samples, while TagBFP expression was negligible in the neuronal samples derived from an iPSC line without BFP transgene (Figure S6C-D). Using a set of known astrocyte and neuronal signature genes (Figure 3B), we identified cells in clusters 0, 1, and 5-11 as astrocytes (Figure 3B), which were enriched in expression of SOX9, NFIA, NFIB, HES1, S100A13, S100A16, EGFR, CD44 and GJA1 (Figure 3B). These transcripts were also detected at high levels in clusters 2 and 4, which were mostly estimated to be in cell cycle phases G2, M and S (Figure S6B). In addition, clusters 2 and 4 showed high levels of proliferation-related transcripts, such as TOP2A, MKI67, CDK1 and AURKA (Figure 3B), and are thus defined as astrocyte precursors. In contrast, Cluster 3 contains mostly cells from the neuronal sample (Figure 3A and Figure S6A) and indeed expressed high levels of genes closely related to neuronal structure and function (such as STMN2, SYT1, DCX, MAPT, and SNAP25; Figure 3B). We did not detect transcripts indicative of endoderm (GATA4), mesoderm (TBXT and TBX6), and oligodendrocyte progenitors (SOX10 and PDGFRA) in any of these clusters (Figure S6C).

Single cell RNA sequencing confirms the authenticity of PSC-derived astrocytes.

A, Uniform manifold approximation and projection plot of unbiased clustering, coloured by clusters. B, Heatmap of the normalised expression of selected markers in different clusters. The assigned identity to each cluster is shown at the top of the plot. C, Sankey plot summarising the result of reference mapping of cells in different clusters to eight published reference human brain scRNAseq datasets. The thickness of the thread is proportional to the number of cells mapped to the same identity in the reference datasets (predicted identity). Detailed results of referencing mapping to each reference datasets are shown in Figure S7A-H and prediction score shown in Figure S7I. (PRE: precursors; N: neurons)

To determine the authenticity of these PSC-derived astrocytes, we mapped our data to published scRNAseq datasets obtained from five foetal, an adult human brain and a PSC-derived astrocyte study using Seurat integration (La Manno et al. 2016; Sloan et al. 2017; Zhong et al. 2018; Polioudakis et al. 2019; Agarwal et al. 2020; Fan et al. 2020; Bhaduri et al. 2021; Eze et al. 2021). We found that cells in clusters annotated as astrocytes (clusters 0, 1, and 5-11) were indeed predominantly mapped to the reference astrocyte or astrocyte precursor populations with high confidence (prediction score over 0.5; Figure 3C and Figure S7). In contrast, neuronal cluster 2 was mapped to neurons of the foetal reference datasets, while the astrocyte precursor clusters (2 and 4) were mapped to progenitor populations in the foetal reference datasets (Figure 3C and Figure S7). These findings demonstrate that our iPSC-derived astrocytes closely resemble those in the human brains.

Distinct transcriptome fingerprints of LMX1A+ midbrain floor plate-derived astrocytes

Significant advance has been made recently in understanding the molecular profiles of midbrain dopamine neurons. However, our knowledge about midbrain astrocytes in this regard remains limited and does not inform anatomic or lineage origin of the cells. In this regard, the BFP+ astrocytes provide a unique resource to determine the transcriptomic characteristics of human midbrain floor plate derived astrocytes. By performing pairwise differential gene expression (details described in Methods and Materials), we identified 1153 genes differentially expressed (DEGs; adjusted p values less than 0.05 and log2 fold change over 0.25) in BFP+ astrocytes when compared to either BFP- or non-patterned astrocyte populations (Supplementary Data 1). Of these, 159 were unique to BFP+ astrocytes (BFP+ enriched, Figure 4A), which include genes associated with midbrain dopamine neuron development such as SULF1, LMO3, NELL2, and RCAN2 (Figure 4B) (Strelau et al. 2000; La Manno et al. 2016; Bifsha et al. 2017; Ahmed et al. 2021). Interestingly, LMX1A and FOXA2, which was used to evaluate PSC-derived midbrain astrocytes in previous studies, were not detected in BFP+ astrocytes (Figure S6D). We also identified 530 DEGs enriched in BFP- astrocytes only (BFP- enriched, Figure 4A and Supplementary Data 1), which include those known to be expressed in the ventrolateral - dorsal domain of the midbrain and hindbrain, such as IRX3, IRX5, PAX3, and PAX7 (Figure 4B)(Houweling et al. 2001; Matsunaga et al. 2001). This transcription profile supports the notion that the BFP- astrocytes are descendants of the initial minor populations of lateral midbrain progenitors. Moreover, 72 DEGs were shared by BFP+ and BFP- astrocytes against the non-patterned astrocytes (midbrain enriched, Figure 4A and Supplementary Data 1). This set of genes includes NR2F1, NR2F2, ZEB2, KCNJ6 and SRPX (Figure 4B), which have been reported to be signatures of mouse midbrain astrocytes (Endo et al. 2022). Together, our findings provide a new entry to transcriptomic characteristics of midbrain astrocytes and specifically a gene expression map of midbrain floor plate derived human astrocyte lineage.

Distinct transcriptome fingerprints of LMX1A+ midbrain floor plate-derived astrocytes

A, Heatmap of the normalised expression of population-specific genes in different populations of astrocytes. B, Violin plots of the normalised expression of selected candidate markers for BFP+, BFP-, and non-patterned (NP) astrocytes. C-D, Representative GO terms significantly enriched in BFP+ (C) and BFP- (D) enriched genes. Semantically similar representative terms were shown with the same colour.

Gene ontology (GO) enrichment analysis was performed on the 1153 DEGs enriched in BFP+ astrocytes (Supplementary Data 2). The significantly enriched GO terms were mainly related to various aspects of metabolism, stress response, biosynthesis, lysosomal activity, and cellular respiration (Figure 4C and Supplementary Data 3). These biological processes have previously been shown to be disrupted by several mutations causing familial Parkinson’s disease (di Domenico et al. 2019; Barbuti et al. 2020; Sonninen et al. 2020). In contrast, the GO terms associated with 530 BFP- astrocyte enriched DEGs were mostly related to formation of extracellular matrix and tissue development (Figure 4D and Supplementary Data 2-3). The differential enrichment of GO terms implicates functional difference between the BFP+ and BFP- astrocytes and supports the need for generating regional specific astrocytes for disease modelling.

Discussion

Despite the general belief that recapitulating astrocyte lineage heterogeneity is necessary for stem cell-based disease modelling and cell transplantation, the extent of astrocyte heterogeneity in different brain regions and subregions remains largely elusive. By harnessing an LMX1A based lineage tracing human iPSC line and cutting edge scRNAseq technology, we show that astrocytes derived from the LMX1A+ midbrain floor plate progenitors, the same cells giving rise to midbrain dopaminergic neurons, possess distinct transcriptional landmarks from those derived from non-midbrain patterned neural progenitors as well as other midbrain patterned progenitors. Moreover, we discovered unexpected negative selection against derivatives of LMX1A+ progenitors during astrocyte induction and progenitor expansion. Our study highlights the need for careful characterisation of PSC-derived astrocytes and provides a transcriptomic fingerprint for midbrain floor plate derived astrocytes.

Using a popular astrocyte in vitro differentiation paradigm, we found that astrocytes descended from the LMX1A+ midbrain progenitors could only be obtained from purified progenitors. In contrast, astrocytes derived from bulk midbrain patterned progenitors exhibits transcriptomic profiles of the lateral-dorsal midbrain, despite LMX1A+ midbrain progenitors being the predominant starting population. Our findings demonstrate that the lineage composition of the parent progenitors may not be faithfully preserved during astrocyte induction and progenitor expansion. FGF is the most popular inductive molecule used for astrocyte differentiation from stem cells (Chandrasekaran et al. 2016). It is evident however that FGF expanded neural progenitors, originated either from the brain or neutralized PSCs, exhibit restricted regional competence and positional gene expression. For example, bulk expanded human ventral midbrain neural progenitors (Jain et al. 2003), and fetal forebrain or spinal cord derived neural stem (NS) cells only give rise to GABAergic neurons (Sun et al. 2008); likewise, lt-NES cells display anterior hindbrain-like positional profile (Falk et al. 2012), while their antecedents, PSC-derived neural rosettes and early passage derivatives, express anterior forebrain markers (Koch et al. 2009). It is not clear whether this is due to deregulation of the original patterning at the level of gene expression or the loss of associated cell population (Gabay et al. 2003). In this study, since BFP+ astrocytes can be generated under the same culture condition with purified LMX1A+ progenitors, we reasoned that the loss of their derivatives in unsorted cultures was possibly due to differential growth capacity.

Our study highlights the need for careful assessment of astrocyte positional identity. A common practice in this regard is to confirm the regional characteristics of the founder progenitors following fate directed neural induction, with the assumption that the dominant positional features will be maintained by the astrocyte progeny (Krencik et al. 2011). This strategy is at least partly dictated by our limited knowledge in gene expression signatures of regional- and/or lineage-specific astrocytes. Hence, an end point evaluation of PSC-derived astrocytes often relies on region-specific markers defined in the developmental brain during the neurogenic period. For example, LMX1A and FOXA2 expression were used as criteria for midbrain astrocytes in previous studies (Barbuti et al. 2020; Crompton et al. 2023). However, scRNAseq of human fetal ventral midbrain and adult substantia nigra revealed negligible expression of these transcripts in astrocytes (La Manno et al. 2016; Agarwal et al. 2020; Kamath et al. 2022). Consistent with these findings, we did not detect LMX1A or FOXA2 in neither the BFP+ nor BFP- astrocytes. Our analysis however identified new positive and negative markers that could be applied to confirm the regional identity of ventral midbrain astrocytes.

In addition to distinct transcriptomic profile, BFP+ and BFP- astrocytes may also be functionally different. Astrocytes generated from progenitors broadly patterned to the dorsal forebrain, ventral forebrain and spinal cord have been shown to exhibiting different GO enrichment profile as well as different physiological and functional properties (Bradley et al. 2019). In comparison to the BFP- and non-patterned astrocytes, the current study revealed that GO terms enriched in BFP+ astrocytes, which originated from the same progenitor giving rise to midbrain dopaminergic neurons, were closely related to various biological processes disrupted in astrocytes carrying familial Parkinson’s disease mutations (di Domenico et al. 2019; Barbuti et al. 2020; Sonninen et al. 2020). Such a distinct enrichment profile implicates BFP+ astrocytes being functionally adapted to supporting midbrain dopaminergic neurons compared to BFP- and non-patterned astrocytes.

In conclusion, this study lends further support on regional diversity of astrocytes and identified a set of midbrain enriched genes. Crucially, the transcriptomic fingerprint of human midbrain floor plate-derived astrocytes described here offers a much-needed resource for assessing the authenticity of stem cell derived astrocytes in studies associated with Parkinson’s disease.

Methods and Materials

Stem cell culture and astrocyte differentiation

KOLF2 human iPSCs were maintained in E8 flex media (ThermoFisher) and manually dissociated using Gentle Cell Dissociation Reagent (STEMCELL Technologies) as previously described (Cardo et al. 2023). Astrocytes were differentiated using a three-stage stepwise strategy consisting neural induction and regional patterning, astrogenic switch and progenitor expansion, and astrocyte terminal differentiation. Midbrain floor plate progenitors were generated as previously described (Cardo et al. 2023). At day 19, cells were replated as single cells onto poly-D-lysine-laminin-coated plates at 1×106 cells/cm2 for astrogenic switch and progenitor expansion in N2B27 media supplemented with 10 ng/mL FGF2 (Peprotech) and 10 ng/mL Human EGF (Peprotech) and replated every 6-8 days. For astrocyte terminal differentiation, expanded neural progenitors were re-plated at a density of 3×104 cells/cm2 in expansion media and 24 hours later switched to N2B27 supplemented with 10 ng/mL human recombinant CNTF (Peprotech) and 10 ng/mL human recombinant BMP4 (Peprotech) for 7 days followed by media containing CNTF alone for another 13 days. 10 µM Y-27632 was used for 24 hours before and after each replating. The protocol for generating non-patterned astrocytes was the same as for floor plate-derived astrocyte except the neural progenitors were derived with duo-SMAD inhibitors only without ventral patterning reagents.

Flow cytometry analysis and cell isolation

Cells were dissociated in Accutase as described above and washed twice with DPBS by centrifugation for 5 minutes at 200 rcf. For evaluating BFP expression, dissociated cells were resuspended in 0.5 mM EDTA in DPBS (Sigma-Aldrich) and analysed on a BD LSRFortessa cell analyser (BD Biosciences). For purifying BFP+ cells, dissociated cells were resuspended in the same cell culture media. Background autofluorescence was compensated for using KOLF2 parental cell line at a similar stage of differentiation to define BFP- gating. For purifying CD49f+ astrocytes, dissociated cells were stained with Alexa Fluor 647-conjugated rat anti-CD49f antibody (5% v/v in a 100 µL reaction; BD Biosciences) for 25 minutes at 37°C on an orbital shaker at 200 rcf and resuspended in DPBS containing 0.5% bovine serum albumin and 50 units/mL DNase I (Sigma Aldrich). Background autofluorescence was compensated for using KOLF2 parental cell line at a similar stage of differentiation to define BFP- gating and unstained astrocytes to define CD49f- gating. Cell sorting was performed on a BD FACSAria III (BD Biosciences) using an 80 µm nozzle. Sorted cells were collected in the same media as for resuspension. Flow cytometry data were analysed in FlowJo v10.8.1 (BD Biosciences) as shown in Figure S1B-E. Briefly, non-debris events were selected using the eclipse gates on dot graphs of SSC-A versus FSC-A. Singlet events were sequentially gated using polygonal gates on dot graphs of FSC-H versus FSC-A and SSC-H versus SSC-A by selecting the events in the diagonal region. The positive and negative gates in the fluorescence channel were set as bifurcate gates at a minimum of 99.9% percentile (usually at 99.99% percentile) on the histogram of the fluorescence intensity of the negative control sample of the same flow cytometry experiment and applied to all samples of the same flow cytometry experiment.

Immunocytochemistry

Cultures were fixed with 3.7% PFA for 15-20 min at 4 °C. For nuclear antigen detection, an additional fixation with methanol gradient was performed, which include 5 mins each in 33% and 66% methanol at room temperature followed by 100% methanol for 20 min at -20°C. Cultures were then returned to PBST via inverse gradient and were then permeabilized with three 10-minute washes in 0.3% Triton-X-100 in PBS (PBS-T) and then blocked in PBS-T containing 1% BSA and 3% donkey serum. Cells were incubated with primary antibodies in blocking solution overnight at 4°C. Following three PBS-T washes, Alexa-Fluor secondary antibodies (Thermo Fisher Scientific) were added at 1:1000 PBS-T for 1 hour at ambient temperature in the dark. Three PBS-T washes were then performed that included once with DAPI (Molecular Probes). Images were taken on a Leica DMI6000B inverted microscope. Quantification was carried out in Cell Profiler (Stirling et al. 2021) or manually using ImageJ (Schindelin et al. 2012) by examining at least four randomly selected fields from three independent experiments. The antibodies used are provided in the Supplementary Table 1. Representative images shown in main figures were cropped by randomly selecting the region of interest in the DAPI-stained channel only, with the original unedited images shown in Figure S1A, S2 and S3.

Single-cell RNA-sequencing

Cells were dissociated with Accutase with 10 units/mL of papain (Sigma-Aldrich) for 10 minutes at 37°C and resuspended in 0.5% bovine serum albumin (Sigma Aldrich) with 50 units/mL DNase I in DPBS without calcium or magnesium (Gibco) and passed through a cell strainer with 35 µm mesh. Cells were stained with 1 µM SYTO16 (Invitrogen) and 0.08% (v/v) propidium iodide (part of the Invitrogen ReadyProbes™ Cell Viability Imaging Kit, Blue/Red) for 20 minutes on ice and then dispensed into the nano-well plates of the ICELL8® cx Single-Cell System (Takara). Wells containing single viable cells were automatically selected using the ICELL8 cx CellSelect v2.5 Software (Takara) with the green and not red logic. Manual triage was performed to recover additional candidate wells that contain viable single cells. The library for sequencing was then prepared using the SMART-Seq ICELL8 application kit (Takara) following the manufacturer’s recommended protocol. Next-generation sequencing was performed using the NovaSeq6000 and the Xp Workflow on a S4 flow cell for 200 cycles of pair-end sequencing.

Single-cell RNA-sequencing analysis

Using the Cogent NGS Analysis Pipeline software v 1.5.1 (Takara), FASTQ files containing all indices for each chip were demultiplexed to FASTQ files containing one index per file. Adaptor sequences were removed using cutadapt 3.2 with the following settings: -m 15 --trim-n --max-n 0.7 -q 20. Trimmed FASTQ files were aligned to the Homo sapiens GRCh38.106 primary assembly with the BFP reporter gene attached to the end of the genome, using STAR 2.7.9a (Dobin et al. 2013) with the following settings: --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM -- outReadsUnmapped Fastx --outSAMstrandField intronMotif --chimSegmentMin 12 -- chimJunctionOverhangMin 8 --chimOutJunctionFormat 1 --alignSJDBoverhangMin 10 --alignMatesGapMax 100000 --alignIntronMax 100000 --alignSJstitchMismatchNmax 5 -1 5 5 --chimMultimapScoreRange 3 --chimScoreJunctionNonGTAG -4 -- chimMultimapNmax 20 --chimNonchimScoreDropMin 10 --peOverlapNbasesMin 12 - -peOverlapMMp 0.1 –alignInsertionFlush Right -- alignSplicedMateMapLminOverLmate 0 --alignSplicedMateMapLmin 30. After alignment, gene-level quantification was performed using featureCounts from subread 2.0.0 (Liao et al. 2013) with the following settings: -t exon --primary -R CORE -F GTF -Q 0 -B -g gene_id. The count matrix of each index was combined in R 4.2.0 (Team 2023).

All downstream analysis was performed in R 4.3.0 using Seurat 4.3.0 (Stuart et al. 2019). Gene-level filtering was applied by including only protein-coding genes with at least five total counts across all cells and being expressed in at least 1% of all cells. Poor quality cells were then identified using the is.outlier function from the scater 1.28.0 (McCarthy et al. 2017). Poor quality cells were defined as having a high percentage of mitochondrial gene count, or high or low the total number of genes detected, or high total gene counts. The thresholds of each metrics for each sample were determined as twice the median absolute deviation of the sample. Raw gene counts were log normalised with a scale.factor setting of 1×105. Data from the two batches of experiments were integrated using the FindIntegrationAnchors and IntegrateData based on the common top 2000 highly variable genes and the first 30 dimensions of principal components. The percentage of mitochondrial gene count and total gene count were regressed out using the ScaleData function. Principal component analysis (PCA) was performed on the top 2000 high variable genes and the number of principal components used for uniform manifold approximation and projection (UMAP) was determined using the JackStraw method (Chung and Storey 2015). UMAP and unbiased Louvain clustering was performed on the first 33 principal components. Pairwise differential gene expression analysis was performed using the MAST method (Finak et al. 2015) with “Chip” as the latent variable. Gene ontology enrichment analysis was performed using enrichGO function in the clusterProfiler 4.10.0 package (Stirling et al. 2021) with all genes in the filtered dataset as the background. GO term database was downloaded using the org.Hs.eg.db 3.18.0 package (Carlson 2019). Revigo v1.8.1 was used to group the representative GO terms based on semantic similarity using a size setting of 0.5, Homo sapiens database, and SimRel method for semantic similarity (Schlicker et al. 2006; Supek et al. 2011).

Published datasets were downloaded from NCBI’s Gene Expression Omnibus (Clough and Barrett 2016) and processed in R 4.2.0. Gene level filtering was performed by retaining only protein-coding genes with more than five total counts across all cells. Gene counts were normalised using the NormalizeData function (scale.factor settings are listed in Supplementary Table 2). PCA was performed based on the top 2000 highly variable genes to obtain the first 50 PCs. Visual inspection of the elbow plot was used to determine the number of PCs for downstream analysis. Batch effect between subjects was evaluated on the two-dimensional PC2∼PC1 plot. Where inter-subject batch effect was observed, Harmony integration was performed based on the PCs selected in the previous step (Supplementary Table 2). UMAP was performed based on either the PCA or Harmony reduction (using the top 30 dimensions), and Louvain clustering was performed (settings shown in Supplementary Table 2). Cluster identities were verified against the reported annotation where possible. For datasets without detailed annotation published or astrocyte lineage reported (Supplementary Table 2), reannotation was performed based on the expression of known markers (Figure S7A-E). Reference mapping was performed using FindTransferAnchors and TransferData function in Seurat based on the first 30 dimensions of either the PCA or Harmony loadings of the reference dataset.

Statistical analyses

All data were collected from at least three independent experiments and presented as mean ± standard error of means unless otherwise specified. Data were tested for normality with the Shapiro-Wilk test and for equal variance with Levene test before performing statistical analyses by two-way ANOVA with post-hoc Tukey test for multiple comparisons where relevant. Kruskal-Wallis test with post-hoc Dunn’s test for pairwise comparison was used where parametric test was not suitable. Effect size was calculated as Cohen’s f for ANOVA or eta squared based on the H-statistic for Kruskal-Wallis test. All statistical tests were performed in R4.3.0.

Supplementary Information

Original images of immunocytochemistry of d19 progenitors and gating strategy of BFP flow cytometry analysis.

A, the original images shown in Figure 1B-C. The gating strategy used for BFP flow cytometry analysis are shown in B-E and described in Methods and Materials. B, scatter plot of SSC-A versus FSC-A. C, scatter plot of FSC-H versus FSC-A. D-E, histogram of BFP fluorescence in the negative control (D) and BFP-expressing (E) samples.

Original images of immunocytochemistry of astrogenic markers shown in Figure 2A.

Original images of immunocytochemistry of astrocyte markers shown in Figure 2B.

Non-patterned astrocytes and fluorescence-activated cell sorting of astrocytes for single cell RNA sequencing.

A, Representative view of immunocytochemistry of astrocyte marker expression in non-patterned astrocytes (scale bar represents 100 µm). B, Quantification of astrocyte marker expression in astrocytes. Error bars represent SEM of three independent experiments. C-E, negative control samples; F-H, one sample of BFP+ astrocytes). C and F shows the dot plot of SSC-A versus FSC-A. D and G shows the dot plot of FSC-H versus FSC-A. E and H shows the dot plot of Alexa Fluor-647-A (labelling CD49f) versus DAPI-A (labelling BFP). P3 was used to isolate CD49f+ population (including both BFP+ and BFP-), while P4 was used to isolate CD49f+/BFP+ population.

Processing of single cell RNA sequencing data.

A-C, Violin plots showing the results of adaptive cell-level filtering based on the number of genes detected per cell (A), percentage of mitochondrial gene counts per cell (B), and total gene count per cell (C). D-E, UMAP plot of all filtered cells before (D) and after integration (E) coloured by chip. F-G, UMAP plot of the subset of sorted BFP+ astrocytes on Chip A and B derived from the same astrocyte differentiation before (F) and after integration (G) coloured by chip. (N: neuron; SBN: sorted BFP-; SBP: sorted BFP+; SNP: sorted non-patterned; UBN: unsorted BFP-; UBP: unsorted BFP+; UNP: unsorted non-patterned; numbers represent independent samples)

Marker expression in scRNAseq.

A-B, UMAP plot of all filtered cells after integration coloured by sample type and sorting status (A) and estimated cell cycle phase (B). C, Raincloud plots of normalised expression of BFP. D, Histogram of the distribution of raw gene count of BFP. E, Violin plots of the normalised expression of endoderm, mesoderm, neuroectoderm, and oligodendrocyte progenitor markers. F, Violin plots of the normalised expression of classic ventral midbrain markers.

Reference mapping to human brain single cell RNA sequencing datasets.

A-H, Sankey plot summarising the result of reference mapping of cells in different clusters to eight published reference human brain scRNAseq datasets. The thickness of the thread is proportional to the number of cells mapped to the same identity in the reference datasets (predicted identity). Cluster IDs in this study are shown on the left. Heatmap in Panel A-E shows the expression of marker genes in different clusters in the re-annotated reference datasets. I, Heatmap of prediction score from Seurat integration.

Antibodies used in this study.

Settings used for processing published datasets.

Acknowledgements

We would like to thank Mark Bishop and Joanne Morgan for conducting FACS and next generation sequencing, respectively. We thank Kathryn Peall and Laura Abram for providing iPSC-derived neurons for scRNAseq. We also thank the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via the Welsh Government.

Author Contribution

Z.L. and M.L. conceived the study and designed the experiments. Z.L. performed all cell experiments and analysed all data. L.C. and M.L. designed the lineage tracing system, and L.C. generated the cell line with assistance from Z.L. M.R. performed iCELL8 library preparation. J.M.S., F.W., and V.V. provided guidance and contributed to general discussions on scRNAseq data analysis. C.W. contributed to the design of scRNAseq and general discussions throughout the work. Z.L. and M.L wrote the paper. All authors edited and approved the paper.

Funding

This work was supported by the UK Dementia Research Institute, jointly funded by the UK Medical Research Council, Alzheimer’s Society and Alzheimer’s Research UK, to C.W. (MC_PC_17112) and Z.L. (DRI-TRA2021-02), and a seed corn fund to Z.L. from the Neuroscience and Mental Health Innovation Institute, Cardiff University. Z.L. was funded by a UK Dementia Research Institute PhD studentship.

Data Availability

The sequencing data discussed in this publication have been deposited in NCBI’s Gene Expression Omnibus (Clough and Barrett 2016) and are accessible through a GEO Series accession number which will be made available upon final publication.

Declaration of interests

The authors declare no competing interests.