Quantifying changes in the T cell receptor repertoire during thymic development

Abstract
Editor's evaluation
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

One of the feats of adaptive immunity is its ability to recognize foreign pathogens while sparing the self. During maturation in the thymus, T cells are selected through the binding properties of their antigen-specific T-cell receptor (TCR), through the elimination of both weakly (positive selection) and strongly (negative selection) self-reactive receptors. However, the impact of thymic selection on the TCR repertoire is poorly understood. Here, we use transgenic Nur77-mice expressing a T-cell activation reporter to study the repertoires of thymic T cells at various stages of their development, including cells that do not pass selection. We combine high-throughput repertoire sequencing with statistical inference techniques to characterize the selection of the TCR in these distinct subsets. We find small but significant differences in the TCR repertoire parameters between the maturation stages, which recapitulate known differentiation pathways leading to the CD4⁺ and CD8⁺ subtypes. These differences can be simulated by simple models of selection acting linearly on the sequence features. We find no evidence of specific sequences or sequence motifs or features that are suppressed by negative selection. These results favour a collective or statistical model for T-cell self non-self discrimination, where negative selection biases the repertoire away from self recognition, rather than ensuring lack of self-reactivity at the single-cell level.

Editor's evaluation

This paper addresses an important question within adaptive immunity, namely whether the T cell receptor (TCR) repertoire of negatively selected thymocytes shares common features. The authors analyze T cell receptor sequences from mice as they progress through positive selection, CD4/CD8 lineage commitment, and negative selection. Thereby they find small but consistent differences between the repertoires at these selection stages, providing arguments that their findings do not indicate any sequence-specific selection.

https://doi.org/10.7554/eLife.81622.sa0

Introduction

In order to protect themselves against infection, jawed vertebrates have evolved an adaptive immune system. T lymphocytes play a leading role in this system. Each T lymphocyte expresses a unique T-cell receptor (TCR) capable of binding short protein fragments presented by the host’s Major Histocompatibility Complexes (MHC), subsequently triggering clonal expansion and differentiation of immune effector function. The T cell system discriminates pathogen derived ‘foreign’ proteins from the body’s own ‘self’ proteins, in such a way that an immune response is usually triggered only by peptides from exposure to a potentially harmful threat. We ask if we can identify specific TCR features which allow the system to discriminate foreign and self-peptides.

TCRs are generated in a stochastic assembly process based on random recombinations of genomic templates and additional non-templated insertions and deletions (Hozumi and Tonegawa, 1976). The ability to discriminate between self and non-self targets cannot therefore be exclusively inherited, but must at least in part be learned afresh in each individual. This process is widely believed to occur during the development of haemopoetic precursors into mature T cells, which occurs in a specialized microenvironment within the thymus. This process has been studied in considerable detail. T cells precursors first produce a β chain and if the generated chain is functional, the cell proliferates and an α chain is generated. While the TCR chains are being assembled, CD4 and CD8 surface markers are expressed as precursor cells transit to the Double Positive state (DP). DP TCR are subject to thymic selection, a process that tests receptor binding by presenting them with the organism’s own proteins, and eliminates very weak binders (positive selection), but also too strongly self-reactive receptors (negative selection) (Yates, 2014). During thymic selection, DP cells differentiate into CD4⁺ or CD8⁺ cells by keeping expression of only one of these molecules, which determines their function. While this picture is well-established and the maturation trajectory has a well established gene expression signature (Park et al., 2020), the TCR sequences removed during thymic selection, which should be manifested as ‘holes’ in the repertoire, have never been directly observed. The lack of quantifiable signatures of thymic selection, differentiation and proliferation hinders a dynamic description of TCR maturation (Robert et al., 2021).

Positive and negative selection imposes upper and lower boundaries on the binding energy of the interaction between TCR and self peptide-MHC complexes (Kosmrlj et al., 2009). However, it remains unclear whether every thymocite is exposed to every self-antigen, or how efficient the process of selection is. Negative selection is known to be leaky (Yu et al., 2015), letting auto-reactive cells differentiate into regulatory cells (Bains et al., 2013; Wing and Sakaguchi, 2010). The efficiency of negative selection for the naive conventional (non-regulatory) effector T cell compartment remains unclear (Yu et al., 2015; Gallegos and Bevan, 2006). Partial or incomplete negative selection may limit its impact on the repertoire.

The difficulty of characterizing selection is partly due to survivor bias when sampling functional immune repertoires in the periphery (Madi et al., 2017; Madi et al., 2014; Izraelson et al., 2018; Sethna et al., 2017). To overcome this limitation, we sequenced the TCR repertoire of thymocyte subpopulations isolated from mice carrying a reporter transgene linked to Nur77, a marker of T cell activation both within the thymus and in the periphery. Nur77 expression, in combination with Annexin V, a marker of cell death, allows us to identify cells that are more likely to pass thymic selection, and those that are most likely not to pass selection. Although the CD4+CD8+Annexin V population may still contain some cells which will be negatively selected, but have not yet expressed Annexin V, the overall strategy provides us with a window into the repertoire at various stages of selection. By comparing the sequenced repertoires to statistical models of mouse TCR generation (Sethna et al., 2017), and subset-specific models of thymic selection, we searched for specific TCR sequence features that correlate with the different stages of intra-thymic T-cell developement.

Results

Tracking T cell development stages by flow cytometry

To identify specific sequence features of TCR during each step of thymic selection, we performed high-throughput sequencing of TCR repertoires from different subpopulations of thymocytes from transgenic Nur77 reporter expressing mice. These mice carry a fluorescent reporter gene which is co-expressed with Nur77, a marker of T cell activation (Liebmann et al., 2018). Three genetically identical Nur77 reporter mice were sacrificed at the age of 6 weeks, when thymus development is completed and its cell population is stable (Gray et al., 2006). All animals were handled according to Weizmann Institute’s Animal Care guidelines, in compliance with national and international regulations. Thymus and spleen were removed, and stained for fluorescence-activated cell sorting (see Materials and Methods). The cells were sorted based on Nur77 reporter expression (to detect activation), Annexin V (to detect early apoptosis) in combination with CD3, CD4, and CD8 cell surface markers. We used the gating strategy illustrated in Figure 1A, B and C to isolate double positive DP cells preceding selection (CD4⁺CD8⁺, Nur77^-, Annexin V^-: DP pre), DP cells in the process of being positively selected (CD4⁺CD8⁺, Nur77⁺ Annexin V^-: DP pos), DP cells dying by neglect or possibly by damage during the preparation (CD4⁺CD8⁺, Nur77 Annexin V⁺: DP dbn); and single positive (SP) cells: CD4⁺CD8^-, Annexin V⁺ (CD4 apo), and CD4^-CD8⁺, Annexin V⁺ cells (CD8 apo). The Annexin V staining was not very strong and did not give a very clear separation between positive and negative populations. In addition, Annexin V⁺ subsets may be contaminated by cells that are dying for other reasons than negative selection. Nevertheless, we may still assume that the two apo subsets are enriched in negatively selected cells. In addition, we sequenced the repertoires of mature (post-selection) single positive SP CD4⁺ and CD8⁺ cells from the spleen (CD4 spl and CD8 spl). The proposed differentiation pathway between these populations at different maturation stages are schematically represented in Figure 1D. Together, these seven repertoires should contain both the selected thymocytes and the pre-selection repertoires, as well as the thymocytes that fail either positive or negative selection and die in the thymus.

Figure 1 with 1 supplement see all

Download asset Open asset

Experiment outline and repertoire sampling.

(A) Flow cytometry scatterplots of T cell population from the thymus according to the markers CD4 and CD8. (B) The DP population is separated from DN according to CD3 expression (insert). Cells are then FACS sorted according to the expression of Nur77 and AnnexinV. (C) CD4 cells in the spleen (above) and CD8 (below) are FACS sorted according to the expression of CD3 and AnnexinV. (D) Schematic evolution of the sampled cell types during thymic maturation. (E) Analysis workflow: annotated reads in sampled repertoires are input for model inference (see Materials and Methods). Out-of-frame TCR sequences are pooled from all mice and stages to learn a generation model. In-frame sequences are used to learn maturation stage specific selection models with the generation model as background.

TCR repertoire sequencing

We sequenced and annotated the TCR repertoires of each subset as described in Materials and Methods. The cDNA of individual α and β genes (TRA and TRB) were barcoded with unique molecular identifiers (UMI) in order to allow for correction of sequencing errors and PCR bias. However, in this analysis we focused on unique sequences (discarding count information) to avoid expression and amplification biases. As a quality control of the whole procedure, we showed that the number of α and β sequences within each population was highly correlated (Figure 1—figure supplement 1A). We further verified that the relative fraction of TCRα sequences associated with iNKT cells (identified by TRAV11 and TRAJ18 genes [Garner et al., 2018]) is higher in CD4 than in CD8 cells (see Figure 1—figure supplement 1B).

We obtained seven datasets for both chains and for each of the three mice. A small fraction of sequences contain stop codons, usually because of a frameshift in the CDR3. These sequences likely come from transcription from a chromosome carrying a nonproductive chain, which is known to persist despite allelic exclusion acting on the TRB locus. The rest of the sequences are assumed to be productive. Since nonproductive TCR owe their survival to the productive gene on the other chromosome, they are not affected by selection. We combined all nonproductive sequences from all subsets to infer a generative mechanistic model of the V(D)J recombination process using IGoR (Marcou et al., 2018). Once trained, the model can be used to assign a generation probability $P_{gen}$ to any TCR sequence observed (Murugan et al., 2012; Marcou et al., 2018; see Materials and Methods and Figure 1E).

The datasets contain ∼1000–50,000 unique productive sequences per subset (Figure 1—figure supplement 1C for the α chain and Figure 1—figure supplement 1D for the β chain). Since the 3 mice were isogenic and shared the same MHC haplotype, we expect their repertoires to be subject to the same processes of recombination and selection (Madi et al., 2014). Unless specified otherwise, all downstream analyses were therefore carried out on pooled productive TCR sequences from each population from the three individuals to increase statistical power.

Repertoires from different T cell populations have different statistical parameters

To assess how selection acts at the different maturation stages, we studied the distribution of sequence features in TCRα repertoires. We compared TRAV and TRAJ gene usage at the different maturation stages with each other and with their excepted frequency from the generation model learned from nonproductive sequences, which we will refer to as the pre-selection model or $P_{gen}$ . TRAV usage broadly follows the pattern of the pre-selection model (Figure 2A), although SP CD4⁺ repertoires have a lower proportion of TRAV12-2, and most populations have an increased proportion of TRAV7-2. TRAJ gene usage also broadly agrees with the pre-selection model predictions (Figure 2—figure supplement 2A), although SP CD8⁺ repertoires have a lower proportion of TRAJ31, SP CD4⁺ repertoires have an increased proportion of TRAJ27 and TRAJ32 which is underrepresented in all cell types. For both V and J genes, we see little difference between the repertoires of spleen CD4 and CD8 cells, and their discarded counterparts in the thymus (apo). We also observe strong similarities between all the DP subsets. TRB gene usage follows similar trends, although there are some differences in J gene usage between selected and unselected SP CD4⁺ and CD8⁺ cells. Overall, the biases of the recombination process dominate any effects of selection on V and J region usage (Figure 2—figure supplement 1A, Figure 2—figure supplement 2B).

Figure 2 with 3 supplements see all

Download asset Open asset

Properties of the α chain sequence (the analogous plot for the β chain is showed in Figure 2—figure supplement 1).

The color code is common to all subplots. (A) TRAV gene distribution at different maturation stages compared to the pre-selection model distribution $P_{gen}$ (see Figure 2—figure supplement 2A for TRAJ). Only the most frequent according to the $P_{gen}$ model are reported. Errorbars correspond to the empirical standard deviation across the three different mice. (B) CDR3 length distribution of TCRα sequences. The errors associated with mouse variability are minor and illustrated via the shaded curves. See Figure 2—figure supplement 3A for individual curves. The dashed line is the average CDR3 length from the $P_{gen}$ model. Standard deviations of the average length distributions are shown at right. (C) Distribution of the most frequent amino acids at different maturation stages. The counts correspond to the number of observations within the CDR3 (i.e. excluding the first two and the last positions), summed for all the sequences in the subpopulation. Error bars represent the empirical standard deviation across mice. (D) Principal component analysis of the TRAV gene distribution at each maturation stage. Insert: projection on the principal axis of the five most abundant TRAV genes (see Materials and Methods). Analogous results for TRAJ are shown in Figure 2—figure supplement 2C. Source code available at https://github.com/statbiophys/thymic_development_2022/blob/main/fig2.ipynb.

For both chains, CDR3 amino acid length of SP CD4⁺ and CD8⁺ has a sharper distribution compared to earlier maturation stages (DP) (see Figure 2B, Figure 2—figure supplement 1B and Figure 2—figure supplement 3). This has previously been interpreted as a signature of selection due to structural constraints on the pMHC-TCR complex (Madi et al., 2017; Lu et al., 2019; Carter et al., 2019). We also compared the single amino acid usage (excluding the constant regions) across the different repertoires (Figure 2C for α chain, Figure 2—figure supplement 1C for β chain). We observe similarities between the DP stages, the CD4 stages and the CD8 stages, as observed for the gene usage. The repertoires from different maturation stages cannot be distinguished by any one individual feature discussed above. However, Principal Component Analysis (PCA) on the TRAV gene usage distributions in individual mice at different stages identified clusters of related cell types (Figure 2D). The DP Nur77^- populations cluster with the pre-selection model, the SP CD4⁺ and CD8⁺ populations form distinct clusters, and the DP pos Nur77⁺ cells, which we hypothesise are cells in the process of positive selection, occupy an intermediate position between these three clusters. This pattern is consistent with the known developmental trajectory as illustrated by the arrows in Figure 2D. PCA of TRAJ usage also shows similar clustering patterns (Figure 2—figure supplement 2C). The PCA of TRBV and TRBJ usage also discriminates between SP CD4⁺ and CD8⁺ populations, and from the pre-selection populations, although the overall pattern is less clear (Figure 2—figure supplement 1D and Figure 2—figure supplement 2D). The overall distribution of TCR generation probabilities, $P_{gen}$ , does not change from the pre-selection and post-selection thymic stages to the mature peripheral SP repertories (Figure 2—figure supplement 2E, F), consistent with previous reports comparing thymic and peripheral repertoires (Sethna et al., 2017). In summary, the effects of selection impose subtle changes on the pattern of TCR variable gene usage, which cannot be adequately captured by looking at any single V or J gene, but only by a combination of features.

V and J gene usage, and CDR3 length are coarse grained measures of a TCR repertoire. We therefore explored whether the repertoires of different maturation stages could be linked to more precise features of the TCR sequence, in particular incorporating the sequence of the CDR3. We encoded each TCR as a sparse {0,1} binary vector $\vec{σ}$ which captures V gene, J gene and CDR3 amino acid sequence (for details see Materials and Methods). We then trained a logistic regression model on the set of $\vec{σ}$ from repertoires of different subsets. We trained and tested the classifier to distinguish pairs of repertoires from different subsets. The classifier achieved only moderate Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) scores (Figure 3A for TCRα, and Figure 3—figure supplement 1A for TCRβ), in agreement with previous studies (Emerson et al., 2013; Isacchini et al., 2021). We verify that this result is not an artifact introduced by pooling repertoires of different mice, by testing the same techniques on the individual with the largest datasets (mouse 3). The AUC scores for the α and β repertoires are shown respectively in Figure 3—figure supplement 1C, D.

Figure 3 with 2 supplements see all

Download asset Open asset

AUC scores for the pooled TCRα datasets.

(A) Area under the curves (AUC) values computed from Receiver Operating Characteristic (ROC) curves of linear classifiers of TCRα between two subsets. The training/testing set is a random subsample containing 70%/30% of the full dataset at a given maturation stage. (B) ROC curves for classifying a group of $m$ sequences from the same maturation stage, between CD4 spl and CD4 apo (red frame in A), illustrating the improvement with increasing number of TCRs. See Figure 3—figure supplement 1 and (B) for the analogous analysis on TCRβ. Source code available at https://github.com/statbiophys/thymic_development_2022/blob/main/fig3.ipynb.

Controls in which population labels were shuffled, resulted in AUC close to 0.5 (Figure 3—figure supplement 2A, B for the α chain, Figure 3—figure supplement 2C, D for β). The results shown in Figure 3 indicate that the TCR populations differ at a statistical level (i.e. have different distributions of sequence features), but that each individual TCR is only a weak predictor of repertoire class. However, better classification efficiencies can be achieved by combining the predictions from sets of TCRs. For example, multiplying the predictions from 30 TCR sequences from the same repertoire (Figure 3B), we can distinguish CD4 spl and CD4 apo TCRα with an AUC score of >0.85; see Figure 3—figure supplement 1B for TCRβ. Thus, statistical properties of a repertoire can distinguish it from another repertoire, even when the feature distributions of individual TCRs are largely overlapping.

Selection models and n-grams capture the relations between the stages of thymic development

A number of studies have highlighted the importance of short amino acid motifs ( $k$ -mers or $n$ -grams) within the CDR3 sequence in determining TCR specificity (Thomas et al., 2014; Sun et al., 2017; Cinelli et al., 2017) (see Figure 4A). Specifically, $n$ -grams can be used to reduce the dimensionality of the TCR space, while capturing amino acid correlations or patterns which might play a role in antigen recognition. We therefore counted the frequency of $n$ -grams in each repertoire. We excluded from the analysis the most conserved regions (the first two positions in the CDR3 that are usually a cysteine and alanine, and the last one, typically a phenylalanine). We then used these $n$ -gram frequency distributions to calculate the diversity of the repertoire as quantified by the Shannon entropy $S$ (see Materials and Methods). In practice, the Shannon entropy is computationally too expensive to calculate exactly for very large data sets, and we therefore restricted our analysis to $n$ -grams of length 4 or less, using the approximate Nemenman-Shafee-Bialek (NSB) entropy estimator (Nemenman et al., 2002) to correct for finite sampling bias (see Materials and Methods). This estimator outcompetes alternative entropy estimators on synthetic data (Figure 4—figure supplement 1). Our analysis combines together CDR3 of different amino acid lengths which may influence the entropy measurements. However, detailed analysis of the entropy of DP repertoires, using different CDR3 lengths separately, demonstrated that the differences observed due to the lengths effects were small compared to error due to sequencing (Figure 4—figure supplement 1). Another advantage of the Nemenman-Shafee-Bialek estimator is that it was shown to converge at the sizes of the smallest datasets (∼10³-10⁴ clonotypes), as reported in Figure 4—figure supplement 2. Once computed the set of entropy measurements based on $n$ -gram frequencies for each different repertoire, we compared the data-derived entropy measurements with the prediction of a simple generative model of each repertoire which treated each feature of each TCR (V gene, J gene and each CDR3 amino acid) as independent. Taking the set of TCR vectors $\vec{σ}$ we fitted a set of parameters $E_{stage}$ by maximising the posterior probability over all of the TCRs for each repertoire separately $P_{stage} (\vec{σ}) = (1 / Z) e^{- E_{stage} (\vec{σ})} P_{gen} (\vec{σ})$ , where $P_{gen} (\vec{σ})$ are the pre-selection generative probabilities for all the TCRs, $E_{stage} (\vec{σ})$ is a linear function of the features (Elhanati et al., 2014; Sethna et al., 2020), and $Z$ is a normalization factor (Figure 1E and Materials and Methods). The enrichment factors $E_{stage} (\vec{σ})$ encode the intuition that due to selection, a given TCR in a given repertoire is seen with higher or lower frequency than expected by the pre-selection generation model. Once we had learnt the enrichment factors for each repertoire, we used the resulting model to generate in silico synthetic repertoires of 3 × 10⁶ TCRs, and recalculated $n$ -gram frequency distributions and entropy estimates for each synthetic repertoire.

Figure 4 with 8 supplements see all

Download asset Open asset

$n$ -gram frequency discriminates between repertoires.

(A) $n$ -gram definition. We count how many times $n$ -gram amino acid subsequences are seen in the CDR3 across a repertoire. (B) Shannon entropy $S$ of the $n$ -gram distributions normalized by $n$ for the maturation stages. The entropy is estimated with the Nemenman et al., 2002 estimator and it is expressed in bits. The error on the estimated Shannon entropy from data is estimated from the sequencing error (see Materials and Methods). (C) Clustering according to Jensen-Shannon divergence between the 3-gram distributions computed from the selection model $P_{stage}$ on synthetic repertoires. Dendrogram are computed with the Ward method (see Materials and Methods). (D) Clustering based on Jensen-Shannon divergence for the full $P_{stage}$ selection model using $P_{stage}$ . Source code available at https://github.com/statbiophys/thymic_development_2022/blob/main/fig4.ipynb.

The comparison of the estimated entropy for each $n$ -gram length, and each subpopulation of T cells, using both directly data-derived and model-derived repertoires is illustrated for TCRα (Figure 4B) and TCRβ (Figure 4—figure supplement 3A) chains. An upper bound for the entropy is given by uniformly distributed amino acids, $S_{\max} / n = \log_{2} 20 \sim 4.3$ bits, while amino acids distributed according to their frequency in the overall vertebrate proteome gives a slightly smaller value of ∼4.2 bits per position (King and Jukes, 1969). Both the observed and model-derived entropies are less than this maximum even for single amino acids ( $n$ -grams of length $n = 1$ ), and decrease further with $n$ -gram length (see Figure 4—figure supplement 3B, C). This reflects strong bias on the abundance of individual amino acids, and strong correlations between amino acids within the CDR3 which are observed in all CDR3 repertoires, and are captured by the frequency distribution of the longer $n$ -grams. Two additional important points can be observed. First, the entropy of the repertoires after selection and lineage commitment (in the single positive populations) is less than the earlier pre-selection DP repertoires, which match closely the entropy of the pre-selection generative model (shown by the dotted line for each $n$ -gram length). This decrease becomes more evident with longer $n$ -gram length (the circles lie below the dotted lines). Thus, as predicted, selection does impose some decrease in repertoire diversity, although this is a much smaller effect than the decrease in diversity imposed by the generation process itself. The second key observation is that the entropy calculated directly from $n$ -gram frequency in the data is very similar to that of the synthetic repertoires generated using the linear generative models in which individual TCR amino acids are treated as independent variables. Thus, at least at the level of diversity of $n$ -grams, there is no evidence that selection at any step involves complex sequence motifs, or amino acid interactions, which would not be captured by the linear model. We looked in more detail at the $n$ -gram ( $n = 3$ ) distributions derived by the linear selection models for the different maturation stages. A plot of the Jensen-Shannon divergences (JSD) between all pairwise comparisons largely recapitulated the expected relationships between the subsets, with DP pre and DP dbn clustering with the pre-selection generative model, while the single positive CD4 and CD8 populations clustered separately, and DP pos have an intermediate position (Figure 4C). A comparison for both TCRα and β for different $n$ is shown in Figure 4—figure supplement 4. Since some differences between populations were seen even for amino acid usage ( $n = 1$ ), we compared the discriminatory power of models based on $n$ -grams with $n = 1$ and $n = 3$ (Figure 4—figure supplement 5). The 3-gram model outperformed the 1-gram model in almost all cases. We can go beyond $n$ -grams and use the subset-specific $P_{stage}$ models to predict the entropy of the full sequence (Materials and Methods), shown in Figure 4—figure supplement 6A for $α$ and Figure 4—figure supplement 6C for β. This entropy is substantially reduced from generation to the DP stages, and further reduced in the single positive stages, especially in CD4⁺ subsets. We also computed the $JSD$ of the distributions $P_{stage}$ between subtypes (Figure 4C for TCRα and Figure 4—figure supplement 6D for TCRβ). These JSD showed similar patterns as with $n$ -grams, except for CD8⁺ spleen cells showing more similarity to the $P_{gen}$ distribution in TCRβ. Note that the absolute values of the entropies and JSD are larger, since they include information about longer sequences, with additional V and J gene usage information. In summary, we fitted the data with a set of stage-specific generative models based on linear weighted combinations of TCR sequence features. The repertoires generated by this model accurately estimate the sequence and $n$ -gram entropy derived directly from the data, and generate repertoires which differ in a small but reproducible manner from each other. The magnitude of these differences reflect the expected developmental relationships between the different populations.

Models capture modulations of hydrophobic residues in different subpopulations

We inspected single amino acid usage in terms of the model marginals to check for relative positional enrichments between pairs of repertoires (Equation 3), but we did not observe any striking signal for amino acid charge properties. The logo plots with a visualization of this results are shown in Figure 4—figure supplement 7. Hydrophobic residues in the central positions of the CDR3 have been reported to be enriched in the TCRs of regulatory versus conventional T cells (Lagattuta et al., 2022). This suggests hydrophobicity may function as a proxy for auto-reactivity, and might be enriched in cells selected for negative selection (Stadinski et al., 2016; Daley et al., 2019). To test this idea, we defined a stage-specific hydrophobicity score $U$ , obtained by summing the enrichment factors of hydrophobic residues CFILMWY at central positions of the CDR3 as learnt by our model at each stage (see Equation 4 in Materials and Methods).

We observe a clear increase of this score from DP pre to DP pos, suggesting that positive selection introduces a bias toward more hydrophobic TCRs (Figure 4—figure supplement 7 and B). The score also decreases in the single positive sets (CD4 and CD8), in agreement with the known role of negative selection to prune too strongly self-reactive T cells (Butler et al., 2013). Finally, AnnexinV+ single positive sets (‘apo’) show a slightly higher score than their respective spleen (‘spl’) sets (with the exception of the CD8 α chain scores). Overall, these changes in hydrophobicity are consistent with the hypothesised position of the different populations defined in our study in the stages of TCR selection.

Note that this score (like other scores found in the literature Isacchini et al., 2021; Lagattuta et al., 2022) is statistical and can not be used to classify individual sequences. To assess how much of single-sequence discriminability is explained by the presence of hydrophobic residues, we then introduced an empirical ‘hydrophobicity index’ $u$ , here defined as the number of hydrophobic residues (again CFILMWY) contained in the CDR3, normalized by its amino acid length (see Materials and Methods, Equation 5). The classifiers using this feature yielded poor performance (Figure 4—figure supplement 8C, D), worse than the 1-gram models (Figure 4—figure supplement 5A, B).

Discriminatory power of thymic selection

The stage-specific enrichment factors in the generative models described above can be considered as capturing the combination of features which drives a particular selection step. A prediction of this idea is that, at each selection point, the TCRs which are selected and those which are not would have a distribution of model probabilities ( $P_{stage}$ ) which are anti-correlated. For example, a TCR that is present in the DP pos repertoire but “forbidden” from the CD4 repertoire (e.g. because of cross-reactivity to a Class II self pMHC) would be expected to have a large positive $P_{DP pos}$ and a $P_{CD4 spl} \approx 0$ , reflecting the large enrichment factor between these two populations. A toy example illustrating this idea is illustrated in (Figure 5A). We consider a simple model in which TCRs are selected according to their CDR3 length into a ‘long’ population with probability $P (long | L) = L^{h} / (L^{h} + L_{0}^{h})$ and into a ‘short’ population otherwise. We apply this selection process in silico to $P_{gen}$ -generated TCRs, and fit a separate $P_{population}$ model on the synthetic sequences found in each subset. We then calculate $E_{population}$ for each TCR according to both subset models, and plot these values against each other. The distribution of enrichment strengths according to the two models are clearly anti-correlated (Figure 5A). In other words, if a TCR is more likely to be classified as a ‘long’ sequence, it is in general less likely to be classified as a ‘short’ one. Interestingly, however, the enrichment strengths distributions from the two models are significantly overlapping. As a result, attempts to classify individual TCRs according to their enrichment strengths is poor, AUC∼ 0.57. We then consider a different toy model where TCRs are selected according to the empirical hydrophobicity index $u$ (Equation 5). Similarly, we choose to generate a synthetic ‘high hydrophobicity’ ( $HH$ ) population filtering $P_{gen}$ -generated TCRs with a probability $P (HH | u) = u^{h} / (u^{h} + u_{0}^{h})$ , otherwise ‘low hydrophobicity’. Repeating the analysis performed with the length example, we observe in the corresponding scatterplot that enrichment strengths are again anti-correlated (Figure 5B).

Figure 5 with 1 supplement see all

Download asset Open asset

Density scatter-plots of TCRα sequences comparing the selection energies learnt at two different stages.

(A) Synthetic example of soft discrimination between ‘short’ and ‘long’ CDR3, where sequences are randomly assigned into either of the two populations with a bias that depends on their CDR3 length. The density scatter plot shows a clear anti-correlation between the selection energies learnt from these two populations. Yet, sequence classification is imprecise, as quantified by the low AUC = 0.57. The parameters chosen for the filter in this example are $L_{0} = 13$ and $h = 2$ . (B) Synthetic example of soft discrimination between ‘low’ and ‘high hydrophobic’ CDR3 showing clear anti-correlation between these two populations. Sequence classification is again poor AUC = 0.60. The parameters chosen for the filter on the ‘hydrophobic index’ $u$ in this example are $u_{0} ≃ 0.2$ (the median value over a set of $P_{gen}$ -distributed sequences) and $h = 1$ . (C) The differential enrichment parameter of each TCR calculated according to $P_{DP dbn}$ model is plotted against the energy calculated against the $P_{DP pos}$ model. To correct for bias imposed by the TCRα generation process, the DP pre energy, which encodes background selection common to both stages, is subtracted. The black line is the direction of the major eigenvector of the dots moments matrix. The value $r$ reported in each plot is the Pearson’s correlation coefficient (see Materials and Methods). (D) Differential enrichment parameter according to CD4 spl and CD8 spl models, relative to DP pos. (E) Differential enrichment parameter according to CD4 spl and CD4 apo models, relative to DP pos. (F) Differential enrichment parameter according to CD8 spl and CD8 apo models, relative to DP pos. Source code available at https://github.com/statbiophys/thymic_development_2022/blob/main/fig5.ipynb.

We extended this approach to look for relationships between enrichment strengths for TCRs at different developmental stages. Since all cells pass through a preceding selection stage, we must consider it as a common background distribution for all the successive thymic stages. We therefore considered the differential enrichment parameter $E_{stage} - E_{pre - stage}$ , a linear operator which predicts whether a sequence is more or less likely to be present in a particular developmental stage as compared to the previous stage. We generated a set of sequences using the generation model $P_{gen}$ (thus with no selection bias), and then computed differential enrichment parameters for each TCR according to all the stage specific models. The full set of pairwise correlations between enrichment values for the different populations relative to $P_{DP pre}$ are shown in (Figure 5—figure supplement 1A for TCRα and Figure 5—figure supplement 1C for TCRβ). The DP dbn repertoire showed a narrow distribution of values, which was uncorrelated to any other subset, in particular to DP pos (Figure 5C). This would be consistent with the DP dbn repertoire containing a random sample of the DP pre repertoire, unrelated to its TCR sequence. To check if the signal coming from DP pos stage is the principal cause of the high correlation between the single positive stages, we repeated the analysis for CD4 and CD8 using $P_{DP pos}$ as the common background distribution (the full set of scatterplots for TCRα is shown in Figure 5—figure supplement 1B, in Figure 5—figure supplement 1D for TCRβ). There was therefore no evidence of selection pressure operating on TCR sequence to distinguish these two populations. The correlation between the CD4⁺ and CD8⁺ subsets was negligible ( $r \sim 0$ ), suggesting that the selection pressures operating on the two populations are distinct (Figure 5D). The spleen SP and the thymic apo populations were also highly correlated for both CD4⁺ and CD8⁺ cells ( $r = 0.79$ for CD4 spl vs CD4 apo, in Figure 5E; $r = 0.66$ for CD8 spl vs CD8 apo, in Figure 5F). Similar results are obtained for the sequences of the β chain (Figure 5—figure supplement 1C D). In contrast to the examples illustrated above, most plots showed a positive correlation between enrichment values for two models. Thus a common dominant selection process is driving the repertoire shift between the DP pos and all subsequent stages, which dominates the impact of individual stage-specific selection processes. In summary, the TCR enrichment value distributions differ between different thymic populations, but do not show evidence of dominant exclusive sequence-based selection operating at any step of the selection process.

Discussion

Thymic selection is often portrayed as a simple discrimination process that eliminates TCRs capable of strongly binding any self-peptide, while promoting TCRs that bind them weakly. However, this simple picture has been challenged and the fidelity of the negative selection process and the proportion of the self-repertoire which can effectively be scanned by each individual thymocyte during the window of negative selection remains incompletely understood (Yu et al., 2015; Gallegos and Bevan, 2006). If significant number of T cells escape negative selection and enter the peripheral repertoire, no sequence feature will unambiguously distinguish TCRs from pre and post-selection repertoires. Many efforts have been made to connect TCR sequences to peptide recognition (Weber et al., 2021; Montemurro et al., 2021; Isacchini et al., 2021). However, these approaches cannot yet be used to define the target peptidome of entire repertoires. Here we take the complementary approach, by looking for TCR sequence features that are linked to thymic selection.

Although there has been a lot of work on understanding and modeling thymic development (Yates, 2014; Robert et al., 2021) our study presents the first comprehensive analysis of TCR repertoire of different developmental stages of thymic maturation. By incorporating a reporter for the activation marker Nur77, which is activated during thymic selection, and an early marker of apoptosis, Annexin V, we were able to enrich for identifying subpopulations during the process of positive or negative selection. Although this more sophisticated strategy in principal allows the unbiased isolation of the major stages of thymic selection, some limitations remain. For example, the time interval during which negatively selected cells survive after they received their instruction to go into apoptosis may depend on signal strength. If strong TCR singal strength translates into short subsequent lifetime, then the AnnexinV+ cells sorted may be enriched for cells receiving a rather weaker negative signal. We examined the repertoires from two perspectives. In the first part of the paper, we compare statistical properties of the sequences of the repertoires using features of different dimensionalities, which include V gene, J gene and CDR3 length frequency distributions, and individual CDR3 sequences represented as sparse {0,1} binary vectors. The analysis incorporated both coarse-grained (V, J and CDR3 length) and fine-grained (individual CDR3 sequence) features, and the results were remarkably consistent. No single feature adequately discriminated between any pair of repertoires. Combination of features when averaged across a repertoire did show subtle but reproducible differences between repertoires, which could be used to discriminate between subpopulations using both unsupervised (PCA) and supervised (logistic regression) analysis. Furthermore, the difference between these statistical parameters captured the known developmental trajectory of thymic development, illustrated schematically in Figure 1D. Interestingly, the smallest distances observed were between mature CD4 or CD8 cells, and their thymic SP negatively selected (apo) counterparts. This suggests that negative selection of single positives is only weakly associated with the sequence properties of single TCRs, or at least single chains. It is in principle possible that larger differences exist in the paired α-β repertoires, which would not be detectable in either the alpha or beta repertoires alone, but previous work on the functional alpha-beta repertoire has suggested that pairing was largely random, with weak associations between some germline genes (Grigaityte et al., 2017; Dupic et al., 2019).

An additional possibility which must be considered is that Annexin V staining does not exclusively capture the negatively selected population, but also identifies cells which were damaged during the preparation. Contamination of the AnnexinV+ population by these damaged or dying cells will weaken the selection signature observed, although the fact we do manage to discriminate between the apo and spleen subpopulations (Figure 3A) indicates that these differences, however small, do exist. Conversely, cells marked for deletion may not have the time to express Annexin V, so that the DPpos subset may contain cells that are being negatively selected against, in addition to cells that are being positively selected (Stritesky et al., 2013).

A limitation of our sorting strategy is that we do not identify Treg from conventional CD4⁺ T cells. It has been suggested that regulatory T cells (Tregs), which are more auto-reactive and should thus bear the same marks as the cells that fail negative selection, have distinctive TCR features, notably the presence of hydrophobic residues at key positions (Stadinski et al., 2016; Daley et al., 2019). TCR scores based on more detailed features than hydrophobicity have been proposed (Isacchini et al., 2021; Lagattuta et al., 2022). We note that these scores are statistical and do not classify individual sequences. Consistent with these previous results, we can project our model parameters to build a single hydrophobicity index, which we observe to be significantly increased in positively selected cells (DP pos) versus DP pre, and decreased in single positive sets (Figure 4—figure supplement 8A, B). Beyond hydrophobicity, it remains an open question whether the features that drive Treg fate are the same that drive negative selection.

Although the statistical properties of the repertoires differed between subpopulations, it was not possible to classify individual TCRs at high accuracy. As discussed above, this may in part be due to the fact that the populations we define only imperfectly correlate with their fate and self-reactivity. However, the differences between CD4⁺ and CD8⁺repertoires, which are much less likely to be affected by issues of functional or physical cross-contamination, are also seen only at a statistical population level, and not an individual TCR sequence level. Learning the collective properties of at least a few dozen TCRs was required in order to achieve good discrimination between repertoires.

The statistical population-level differences between populations of thymocytes and mature T cells which we observe is reminiscent of previous models emphasising the importance of collective, rather than individual T cell behavior. Butler et al., 2013 proposed that a minimum number of T cells must collectively recognize a peptide to trigger a response, proposing quorum sensing as a mechanistic explanation of this collective decision making. Recent experiments have confirmed that quorum sensing between TCRs can occur, mediated via cytokine signaling (Polonsky et al., 2018), and estimating a minimum quorum size of activated T cells to be 30. Our results suggest that thymic selection imposes only a rather weak selective pressure on the repertoire, which is consistent with Butler et al., 2013 ’s hypothesis that most self-peptides are not screened by TCR during negative selection. Our results are consistent with their model, in which even a subtle depletion, rather than complete elimination of non-self TCRs, may still translate into robust self/non-self discrimination in populations of reactive TCRs. Self versus foreign peptide discrimination by TCR is somewhat the conjugate task of self-reactive versus a non self-reactive T cell discrimination during negative selection. While the performance of the two tasks cannot be directly compared at first sight, they are related in that both are impaired by a factor $(1 - f)$ due to partial screening of self peptides, where $f$ is the fraction of self-peptides that are presented during thymic developement. The common point is that even when $f$ is small, the law of large numbers can rescue the discrimination task when there are multiple observations. In Materials and Methods, we argue using the model of Butler et al., 2013 how the idealized performance of repertoire discrimination using multiple ( $m$ ) TCR (akin to the task of Figure 3B) may be compared to the task of telling self from foreign peptides in the periphery, when the number of T cells specific to one particular peptide and recruited to the site of infection is $m \times \bar{n}$ , where $\bar{n}$ is the average number of self-peptides recognized by a random TCR. While those numbers cannot be applied directly to the results of Figure 3B, which are based on an imperfect classifier from a single chain, they give a sense of how the same principle of discrimination apply to both cases.

In the second part of the study we explore in more detail whether we can discover any evidence that thymic selection depends on specific sequence motifs (i.e. a strong correlative structure between CDR3 amino acids). For this purpose, we build on our previous work which have established a framework for the development of generative statistical models of repertoire generation, based firmly on a mechanistic understanding of TCR generation and selection. Specifically, we construct models which incorporate only linear combinations of CDR3 sequences to capture the selective process which can transform one repertoire into another. These models produce an ‘enrichment factor’ for each TCR which estimates its relative likelihood of being in a particular stage-specific population. Intuitively, one can consider these factors as capturing the probable enrichment or depletion of a TCR with a particular sequence when comparing two repertoires. We demonstrate that these linear models effectively capture the progressive decrease in repertoire diversity which we observe in the preselected DP to the SP transition. They also effectively capture the known developmental relationships between the thymic subpopulations. Thus we find no evidence that complex non-linear amino acid sequence interactions are required to explain the observed changes in repertoire observed in our data. We also compared the distributions of enrichment factors between populations. We demonstrate that, contrary to the predictions of a strong binary selection model, we do not observe any negative correlation between enrichment factor distributions between selected and non-selected repertoires. Instead, we observe a set of positive correlations, revealing a dominant conserved selection process spanning the developmental stages between pre-selection DP and mature SP. Consistent with the clustering data discussed above, we find strong correlation between the enrichment factor distributions of mature SP and thymic negatively selected population, and no evidence of binary selection between these two populations.

In conclusion, we report a comprehensive analysis of the TCR repertoire at various stages of thymic development. We then combine data-driven and model-based analysis of these repertoires. Our conclusions are incompatible with a model of thymic developments which involves a sequence of clear-cut binary selection processes, based on TCR sequence features. Rather, our data suggest a probabilistic fuzzy decision making process at each selection step. We propose that this model is compatible with robust self/non-self discrimination, if T cell responses to antigen are governed by collective quorum based decision making. Further experimental and theoretical work is required to test these hypotheses, which have fundamental implications for strategies to modulate the immune response for prophylaxis or therapy of human disease.

Materials and methods

Animals

The experiment was carried out using three 6-week-old male inbred Nur77-GFP/Foxp3-mCherry (C57BL/6 background) (Moran et al., 2011). The cross was bred and maintained at the Weizmann institute. This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All of the animals were handled according to approved institutional animal care and use committee (IACUC) protocols (#21661115–2) of the Weizmann Institute of Science. The protocol was approved by the Committee on the Ethics of Animal Experiments of the Weizmann Institute of Science. Every effort was made to minimize suffering.

Share this article

Cite this article

Experiment outline and repertoire sampling.

Properties of the α chain sequence (the analogous plot for the β chain is showed in Figure 2—figure supplement 1).

AUC scores for the pooled TCRα datasets.

n-gram frequency discriminates between repertoires.

Density scatter-plots of TCRα sequences comparing the selection energies learnt at two different stages.

Cell sorting based on fluorochrome-labeled mouse antibodies.

Author details

Francesco Camaglia

Contribution

Contributed equally with

Competing interests

Arie Ryvkin

Contribution

Contributed equally with

Competing interests

Erez Greenstein

Contribution

Competing interests

Shlomit Reich-Zeliger

Contribution

Competing interests

Benny Chain

Contribution

Competing interests

Thierry Mora

Contribution

Contributed equally with

For correspondence

Competing interests

Aleksandra M Walczak

Contribution

Contributed equally with

For correspondence

Competing interests

Nir Friedman (deceased)

Contribution

Contributed equally with

Competing interests

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism

Further reading

$n$ -gram frequency discriminates between repertoires.