Conserved biophysical compatibility among the highly variable germline-encoded regions shapes TCR-MHC interactions

  1. Christopher T Boughter (corresponding author)
  2. Martin Meier-Schellersheim (corresponding author)
  1. Computational Biology Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, United States
16 figures, 1 table and 3 additional files

Figures

Figure 1
A breakdown of the canonical docking orientation adopted by TCRs over the pMHC complex in all solved crystal structures to date highlights the strong structural conservation of the TCR-pMHC interaction.

(A, B) Example renderings of TCR-pMHC complexes for class I (A, PDB: 6MTM) and class II (B, PDB: 1J8H) MHC molecules. The CDR loops are largely representative of the placement over the MHC helices and peptide for the vast majority of TCR-pMHC complexes. (C, D) Polar coordinate plots of TCR docking angles over class I (C) and class II (D) MHC molecules (data via the TCR3D database; Gowthaman and Pierce, 2019). Note that these docking angles do not perfectly overlay onto the structures of panels A and B, as there are also slight deviations in the location of the TCR center of mass over the MHC complex.

Figure 2 with 4 supplements
Quantification of diversity of TRAV, TRBV, and HLA alleles reveals limited sites for fully conserved interactions.

(A) Visualization of the TRAV amino acid sequences paired with a subsampled HLA class I dataset. (B) This subsampling is repeated 1000 times to generate the average population Shannon entropy for each allelic subset as a function of position in the matrix of (A). (C) Visualization of the alignment-encoded matrices of HLA class IIa and class IIb datasets. (D) Calculated position-sensitive entropy for the HLA class IIa and class IIb datasets using the same subsampling scheme as (B). The color bar for panels (A) and (C) gives the amino acid key for the matrix encoding, where each individual color represents an individual amino acid, and gaps in the matrix are colored white. In panels (B) and (D), the maximal entropy for the 20 possible amino acids at a given site is 4.3 bits, whereas an entropy of 0 bits represents a fully conserved amino acid. Variation in the entropies over subsampling repetitions is represented by the standard deviation as shadowed regions around the solid line averages.
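The position-sensitive entropy and repeated subsampling described in this legend can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `column_entropy` and `subsampled_entropy`, the toy sequences, and the choice to treat a gap character as just another symbol are all hypothetical.

```python
import math
import random
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def subsampled_entropy(sequences, sample_size, n_repeats=1000, seed=0):
    """Mean and standard deviation of per-position entropy over random subsamples.

    Assumes all sequences are already aligned to the same length.
    """
    rng = random.Random(seed)
    length = len(sequences[0])
    per_repeat = []
    for _ in range(n_repeats):
        subset = rng.sample(sequences, sample_size)
        per_repeat.append(
            [column_entropy([seq[i] for seq in subset]) for i in range(length)]
        )
    means = [sum(rep[i] for rep in per_repeat) / n_repeats for i in range(length)]
    sds = [
        math.sqrt(sum((rep[i] - means[i]) ** 2 for rep in per_repeat) / n_repeats)
        for i in range(length)
    ]
    return means, sds

# Upper bound for a site where all 20 amino acids are equally likely.
max_entropy_bits = math.log2(20)  # about 4.32 bits
```

A fully conserved column gives 0 bits, and a column in which all 20 amino acids are equally frequent gives log2(20), which rounds to the 4.3 bits quoted above.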

Figure 2—figure supplement 1
Quantification of mutual information (MI) between germline encoded TCR regions and MHC α-helices shows no strong signal.

Peptide-contacting residues of the MHC platform domain are included as a reference. (A) Mutual information calculated between TRAV germline encoded regions and specific structural features of class I HLA molecules. The boxed red region highlights the TCR-MHC mutual information, which is then further characterized in (B). Within-molecule mutual information (i.e. TCR-TCR and MHC-MHC) shows clear signals of co-variation over each dataset, contrasting strongly with the TCR-MHC mutual information. (B) Difference calculated between TRAV-MHC mutual information (MI) and TRBV-MHC mutual information (MI) to search for patterns in co-varying residues that differ between the two datasets. Positive values signify increased TRAV-MHC MI in that region, whereas negative values signify increased TRBV-MHC MI in that region. We would expect that TRBV-MHC MI should be higher (blue) in the α1-helix region, whereas TRAV-MHC MI should be higher (red) in the α2-helix region. Neither trend is seen strongly here.
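For readers unfamiliar with site-wise mutual information, the quantity can be illustrated with the standard identity MI(X;Y) = H(X) + H(Y) - H(X,Y) applied to two alignment columns. The sketch below is an assumption-laden toy, not the AIMS code: the function names and the example columns are hypothetical, and the two columns are assumed to come from the same ordered set of paired sequences.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    symbols = list(symbols)
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def mutual_information(col_x, col_y):
    """MI (in bits) between two columns observed over the same paired samples."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must contain the same number of paired samples")
    joint = list(zip(col_x, col_y))  # joint symbol for each paired sample
    return entropy(col_x) + entropy(col_y) - entropy(joint)

# Toy columns: the first pair co-varies perfectly, the second is independent.
covarying = mutual_information("AABB", "CCDD")
independent = mutual_information("AABB", "CDCD")
```

In this toy case the perfectly co-varying pair carries 1 bit of mutual information and the independent pair carries 0 bits. A fully conserved column also yields 0 bits against any partner, because it has no entropy to share.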

Figure 2—figure supplement 2
Information theoretic analysis of TCR-MHC pairs from a range of mammalian species shows that strongly co-varying residues between TCR and MHC germline sequences are rare.

TCR-MHC pairs are matched for each bootstrapped sample, as discussed in the above methods. (A) Average bootstrapped Shannon entropy of CDR1, CDR2, and CDR3 for TRAV and TRBV amino acid sequences coupled with class I sequences. (B) Average bootstrapped Shannon entropy of CDR1, CDR2, and CDR3 for TRAV and TRBV amino acid sequences coupled with class II sequences. The standard deviation is given by the width of the lines. Site-wise mutual information is calculated from the entropies in panel (A), with class I - TRAV MI calculated in (C) and class I - TRBV MI in (D).

Figure 2—figure supplement 3
Mutual information differences between calculated TRAV and TRBV quantities from Figure 2—figure supplement 2C, D (A), or from class IIa (B), or class IIb (C) quantities.

Positions shaded blue represent a stronger MHC-TRAV signal, whereas red positions represent stronger MHC-TRBV signals.

Figure 2—figure supplement 4
Information theoretic metrics for crystallized TCR-MHC pairs validate the combinatorial calculation approaches of Figure 2 and Figure 2—figure supplements 1–3.

(A) AIMS encoding of the germline-encoded amino acids of crystallized class I complexes deposited to the PDB. (B) Shannon entropy calculations show roughly similar MHC diversity to the full dataset but substantially decreased CDR diversity. Despite this, we still find substantial TRAV-TRAV and TRBV-TRBV mutual information (C). However, a zoomed-in view of the mutual information between TRAV/TRBV and MHC (D) shows no clear signal for a preference between TRAV-α2 and TRBV-α1 interactions.

Figure 3 with 2 supplements
Position sensitive biophysical characterization of germline-encoded CDR loops highlights increased variation when compared to KIR or MHC sequences.

Position sensitive amino acid charge averaged over all germline-encoded TCR CDR loops (n=48) (A), MHC class I α-helices (n=882) (B), KIR MHC-contacting regions (n=31) (C), or HLA-C α-helices (n=265) (D). Solid lines represent averages over unique amino acid sequences, while the standard deviations about these averages are given by the shaded regions. Visualization of specific interactions for KIR-MHC (E) or TCR-MHC (F) and their relative conservation across each respective dataset. Interactions are represented either in a single sequence context where a known binding network exists or as columns of residues present across polymorphisms or across TRBV genes. Key in the center gives color coding for either the biophysical properties of each amino acid (letters) or the relative contribution of these amino acids to an interaction interface (lines). Charges are normalized to a mean of 0 and a standard deviation of 1. Only polymorphisms across two-domain KIRs with well-characterized binding partners (KIR2DL1, 2DL2, 2DL3, 2DS1, 2DS2) are considered.
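The normalization to mean 0 and standard deviation 1, followed by a per-position average with its spread, can be sketched as below. Everything here is an illustrative assumption: the simplified charge table (K and R as +1, D and E as -1, all others including His as 0), the choice to standardize across the 20 amino acids, and the function names are hypothetical and may differ from the property tables used in the study.

```python
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Simplified, assumed charges at neutral pH (His treated as neutral here).
RAW_CHARGE = {aa: 0.0 for aa in AMINO_ACIDS}
RAW_CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})

def standardize(scale):
    """Rescale a per-residue property to mean 0 and standard deviation 1."""
    values = list(scale.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return {aa: (value - mean) / sd for aa, value in scale.items()}

def position_profile(sequences, scale):
    """Per-position mean and standard deviation of a property.

    Assumes gap-free sequences of equal length.
    """
    means, sds = [], []
    for column in zip(*sequences):
        scores = [scale[aa] for aa in column]
        means.append(statistics.fmean(scores))
        sds.append(statistics.pstdev(scores))
    return means, sds

z_charge = standardize(RAW_CHARGE)
profile_means, profile_sds = position_profile(["KD", "KE", "RD"], z_charge)
```

The means correspond to the solid lines and the standard deviations to the shaded regions in plots of this kind. A position with a large standard deviation is one where the property varies strongly across sequences.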

Figure 3—figure supplement 1
Position-sensitive hydropathy shows similar trends to the position-sensitive charge, showing higher variability in the germline-encoded TCR CDR loops than all other tested datasets.

Position sensitive amino acid hydropathy averaged over all germline-encoded TCR CDR loops (n=48) (A), MHC class I α-helices (n=882) (B), KIR MHC-contacting regions (n=31) (C), or HLA-C α-helices (n=265) (D). Solid lines represent averages over unique amino acid sequences, while the standard deviations about these averages are given by the shaded regions. A positive hydropathy score corresponds to a hydrophilic residue, while a negative score corresponds to a hydrophobic residue. Scores are normalized to a mean of 0 and a standard deviation of 1.

Figure 3—figure supplement 2
The killer cell immunoglobulin-like receptors (KIRs) and their recognition of HLA-C represent a suitable comparison to the germline-encoded CDR loops and their recognition of MHC.

(A) Structure of KIR2DL2 in complex with HLA-C*07:02 [PDB: 6PA1] highlights the conserved binding mode of KIR. (B) Inset highlighting the salt bridge and hydrogen bonding network in the KIR-MHC interface shown in (A). Red lines represent electrostatic interactions. (C) Matrix encoding and alignment of the MHC-contacting regions of the KIR. Each color represents a unique amino acid. (D) Position-sensitive Shannon entropy from the matrix in (C) highlighting the exceptionally low diversity of the KIRs in the MHC-contacting regions.

Figure 4 with 2 supplements
Interaction score between every TRBV (A) or TRAV (B) sequence and HLA allele for all four germline-encoded CDR loops.

The x-axis moves across each productive germline-encoded TCR gene (see Supplementary file 2 for key), while the y-axis is grouped by each broad HLA class I allele group. The color bar gives the interaction potential (unitless, higher potentials suggest stronger interactions). Alignments of TRBV (C) or TRAV (D) sequences highlighting genes with a range of interaction potentials, colored by biophysical property. Gene names for each sequence are colored by interaction potential: high-black, moderate-gray, and negligible-light gray. Color coding for alignment: gray - hydrophilic, blue - positive, red - negative, orange - hydrophobic, white - non-interacting. Gaps in an alignment are denoted by a dot (.).

Figure 4—figure supplement 1
Per-gene interaction potentials averaged over all class I HLA molecules for TRAV genes (A) and TRBV genes (B).

The variance across all of these interaction potentials, that is, over all genes, is reported within each plot. Reported values give the interaction potential averaged over both HLA helices, not just the helix each TCR chain “canonically” interacts with, and over both CDR1 and CDR2.

Figure 4—figure supplement 2
AIMS interaction potentials for CDR1 and CDR2 β and α chains with HLA class II molecules.

These calculations are made separately for class IIa (A) and class IIb (B) alleles. The interaction potential scale is given in the center of each plot. The key for which TRAV or TRBV gene corresponds to which number on the x-axis of each plot can be found in Supplementary file 2.

Figure 5 with 3 supplements
Comparison of interaction predictions to crystallized TCR-pMHC complexes.

(A) Heat map representation of the AIMS interaction potential (version 2), normalized by mapping negative interaction potentials to 0. (B) A similar heat map representation for an empirical comparison to crystallized TCR-pMHC, showing a symmetrized version of the count matrix of crystal contacts between germline-encoded sidechains. Colorbars give either the AIMS interaction potential (A) or the raw contact count (B). A more rigorous quantitative comparison between the AIMS interaction potential and crystallized TCR-pMHC complexes gives these contact counts as violin plots for TRAV (C) or TRBV (D) encoded CDRs predicted to be weak (TRAV: n = 32; TRBV: n = 38), moderate (TRAV: n = 68; TRBV: n = 82), or strong (TRAV: n = 40; TRBV: n = 20) binders to MHC. Dashed inner lines give the quartiles of each distribution. Statistics determined using a non-parametric permutation test, * - p<0.05, ** - p<0.01, ns - not significant. X-axis Key: SC-SC:Sidechain-Sidechain, SC-Back:TCR Sidechain-MHC Backbone, Back-SC: TCR Backbone-MHC Sidechain, Back-Back:Backbone-Backbone.
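A non-parametric permutation test of the kind named in this legend can be sketched as follows. The legend does not specify the test statistic, so the absolute difference in group means used here is an assumption, as are the function name, the number of permutations, and the toy contact counts.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical contact counts for predicted weak and strong binders.
weak = [1, 2, 3, 4, 5]
strong = [11, 12, 13, 14, 15]
p_value = permutation_test(weak, strong, n_permutations=2000)
```

Because the test only shuffles group labels, it makes no assumption about the shape of the contact-count distributions, which is useful for skewed count data.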

Figure 5—figure supplement 1
Non-normalized profiles of the data in Figure 5.

(A) The raw AIMS interaction potential (V2) highlights the nuanced details in the amino acid interactions. (B) The raw (non-symmetrized) crystal contact count matrix for sidechain interactions. (C) Comparison of the frequency of each amino acid in either the studied crystallized structures (Crystal MHC and TCR) or in the unique amino acid sequences available via IMGT (All MHC and TCR). (D) These expected IMGT frequencies can be used to create a background-subtracted version of panel B, highlighting over-represented interactions (like Y-Q).

Figure 5—figure supplement 2
The (non-symmetrized, symmetrized) crystal contact count matrices for (A, B) TCR Backbone - MHC Sidechain; (C, D) TCR Sidechain - MHC Backbone; and (E, F) TCR Backbone - MHC Backbone interactions.
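One simple way to symmetrize a contact count matrix is to pool the counts for a residue pair regardless of which molecule contributed which residue. The sketch below uses that convention as an assumption; the legend does not say whether counts were summed or averaged, and the function name and toy counts are hypothetical.

```python
def symmetrize(counts, residues):
    """Pool counts for (a, b) and (b, a); diagonal entries are kept as they are.

    `counts` maps (tcr_residue, mhc_residue) pairs to raw contact counts.
    """
    symmetric = {}
    for a in residues:
        for b in residues:
            ab = counts.get((a, b), 0)
            ba = counts.get((b, a), 0)
            symmetric[(a, b)] = ab if a == b else ab + ba
    return symmetric

# Toy counts: Tyr (TCR) with Gln (MHC) seen 5 times, the reverse pairing 2 times.
raw_counts = {("Y", "Q"): 5, ("Q", "Y"): 2, ("Y", "Y"): 3}
sym_counts = symmetrize(raw_counts, residues="YQ")
```

Comparing the raw and pooled matrices shows whether a residue pair is contacted preferentially from the TCR side or the MHC side, information that the symmetrized version discards.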

Contact counts are solely for class I TCR-MHC complexes.

Figure 5—figure supplement 3
Rendering of a TCR-MHC complex dominated by CDR3-peptide interactions (PDB:2AK4).

Interactions with CDR3 or peptide are rendered in gray, germline-encoded interactions are colored cyan, and the peptide is shown in yellow.

Figure 6 with 1 supplement
Positive interactions between each germline-encoded TCR CDR loop and all HLA alleles.

Violin plots give the distribution of interaction scores, with dashed lines separating quartiles of the distributions. The individual interaction potentials are shown for the individual CDR loop interactions with MHC Class I α1-helix (A), MHC Class I α2-helix (B), MHC Class II α-helix (C), and MHC Class II β-helix (D). Reported values are averages for each individual TRAV or TRBV gene and their interactions with each individual MHC allele over both CDR loops. The number of points in each violin plot is given as follows (MHC: n = #TRAV, #TRBV): HLA-A: n = 10755, 11472; HLA-B: n = 17010, 18144; HLA-C: n = 11925, 12720; HLA-DQα: n = 945, 1008; HLA-DRα: n = 45, 48; HLA-DPα: n = 990, 1056; HLA-DQβ: n = 5985, 6384; HLA-DRβ: n = 8550, 9120; HLA-DPβ: n = 4860, 5184.

Figure 6—figure supplement 1
AIMS clash potentials calculated for all possible CDR-MHC helix interactions.

Clashes are estimated from single sites in the CDR and MHC that should form unfavorable interactions (as shown in Supplementary file 1). Calculations are shown for HLA class I interactions with the α1-helix (A), the α2-helix (B), and for HLA class II interactions with the α-helix (C) and the β-helix (D). Reported values are averages for each individual TRAV or TRBV gene and their proposed possible clashes with each individual MHC polymorphism. Clashes are, on the whole, rare between germline-encoded CDR loops and MHC α-helices. The number of points in each violin plot is given as follows (MHC: n = #TRAV, #TRBV): HLA-A: n = 10755, 11472; HLA-B: n = 17010, 18144; HLA-C: n = 11925, 12720; HLA-DQα: n = 945, 1008; HLA-DRα: n = 45, 48; HLA-DPα: n = 990, 1056; HLA-DQβ: n = 5985, 6384; HLA-DRβ: n = 8550, 9120; HLA-DPβ: n = 4860, 5184.

Figure 7
Breakdown of the interaction potential for every exposed residue on HLA surfaces.

Color coded structures of HLA class I (A, PDB: 6MTM) and class II (B, PDB: 1J8H) provide the structural context to the bioinformatic results. The colors of the amino acid side-chains match the labels of the later panels. The averaged interaction potential across TRAV and TRBV genes with each individual MHC helix is shown in the top panels for HLA class I (C) and HLA class II (D). Lower panels of (C) and (D) compare these interaction potentials to total crystal contacts with these same residues (n = 149 HLA class I structures, n = 44 HLA class II structures).

Figure 8 with 1 supplement
Identification of well-conserved regions of low interaction potential finalizes a working model for the root cause of canonical TCR-MHC docking orientations.

Position-sensitive Shannon entropy (top) and normalized amino acid hydropathy (bottom) for class I (A) and class II (B) HLA molecules. Red lines in the hydropathy plots indicate an average over all HLA molecules, while gray lines give the position-sensitive biophysical properties of individual molecules. Alignments of class I (C) or class II (D) HLA alleles from a subsampling of parental alleles, colored by biophysical property. Color coding for alignment: gray - hydrophilic, blue - positively charged, red - negatively charged, orange - hydrophobic, white - non-interacting. Renders of class I (E, PDB: 6MTM) and class II (F, PDB: 1J8H) HLA molecules with α-helices colored by interaction potential. Green - regions of high interaction potential, cyan - regions of moderate interaction potential, black - regions of negligible interaction potential. Orange ovals give probable contact regions for TCRβ, while purple ovals give probable contact regions for TCRα, defining canonical docking orientations.

Figure 8—figure supplement 1
Across a range of mammalian species, the regions of low interaction potential on class I and class II MHC α-helices are very well conserved.

Species used in this analysis are highlighted in (A) as found in the IPD-MHC Database. Color-coded matrix alignments of MHC class I (B), MHC class IIa (C), MHC class IIb (D), TRAV (E), and TRBV (F) highlight the extent of this conservation. Specifically for the MHC alignments, the regions of conserved low interaction potential have the extent of conservation quantified, along with the amino acid identity at these sites. Colors of the named amino acids match the colors in the matrices. By comparison, we can see by eye that outside of the TRBV CASS motif and the TRAV AV motif in the germline-encoded region of CDR3, there is almost no clear conservation in the germline-encoded CDR loops.

Figure 9 with 1 supplement
Molecular simulations of the so-called “knob-in-hole” interaction (Garcia, 2012) highlight the dynamic nature of protein sidechains.

(A) The starting crystal structure (PDB:1FYT) suggests a tight packing between TYR50 on CDR2β and GLN57 and ALA61 on the class II α-chain. (B) Short all-atom simulations show that this suggestive tight packing is due to the static nature of crystal structures, with GLN57 freely adopting alternate conformations over the course of the simulation.

Figure 9—figure supplement 1
Triplicate molecular dynamics simulations highlight the short-lived nature of the “knob-in-hole” interaction.

(A) Structural visualization of the atoms used to measure distances and track the motion of the Tyr-Ala-Gln sidechain interaction trio (PDB: 1FYT). (B) The Tyr-Gln O-O distance tracks the strong deviations made by Gln 52 over the course of simulated trajectories, causing the “knob-in-hole” to become a flatter interface that no longer appears specially evolved solely for interactions with Tyr. Measurements of Tyr-Ala and Tyr-Gln atomic distances closer to the MHC molecule backbone (C, D) show that these measured Gln sidechain deviations are not due to strong variation in the overall TCR-pMHC interaction, but are instead due almost solely to side-chain flexibility. The black lines in each figure give the crystal structure distances as a reference. Each line in panels B-D represents the atomic distances measured in one of the three replicate (Rep) simulations.
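Tracking an interatomic distance over a trajectory and comparing it with the crystal-structure value can be sketched as below. The coordinates are made-up placeholders rather than values from PDB 1FYT or from the simulations, and the function names, reference distance, and tolerance are hypothetical.

```python
import math

def distance_series(atom_a_frames, atom_b_frames):
    """Distance between two atoms in each frame (same units as the coordinates)."""
    return [math.dist(a, b) for a, b in zip(atom_a_frames, atom_b_frames)]

def fraction_beyond(distances, reference, tolerance):
    """Fraction of frames in which the distance exceeds reference + tolerance."""
    far = sum(1 for d in distances if d > reference + tolerance)
    return far / len(distances)

# Placeholder coordinates for two atoms over four frames (arbitrary units).
atom_a = [(0.0, 0.0, 0.0)] * 4
atom_b = [(3.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 3.0), (6.0, 8.0, 0.0)]

distances = distance_series(atom_a, atom_b)
fraction_far = fraction_beyond(distances, reference=3.0, tolerance=1.0)
```

Applying the same measurement to an atom pair near the backbone, as in panels (C, D), separates sidechain flexibility from rigid-body drift of the whole interface.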

Author response image 1
Class I Interaction Scoring with V0 Matrix (Compare to Figure 4).
Author response image 2
Class IIa Interaction Scoring matrix with V0 scoring (Compare to Figure S8A).
Author response image 3
Class IIb interaction scoring matrix with V0 scoring (Compare to Figure S8B).
Author response image 4
The V0 scoring matrix (for direct comparison with Supplemental Figure 9A).
Author response image 5
By-gene interaction scores with “proper” helix interactions (i.e. TRAV interactions with Class I alpha2 helix, TRBV interactions with Class I alpha1 helix).
Author response image 6
Structural information used to generate the cartoons of Figure 8.

Overlay of a range of class I TCR-MHC structures with strong, moderate, and weak TRAV/TRBV AIMS-predicted binding propensity. While there is variation in the precise location of the CDR loops, they typically occupy a similar space over the class I MHC surface (colored to match the class I MHC in Figure 8E). A dashed red circle highlights the area that CDR loops tend to avoid docking directly over, likely due to the region of low interaction potential.

Author response image 7

Tables

Author response table 1
Comparison of CDR loop sequences of TCR 1G4 and the TCR-like antibody that binds to the same pMHC complex.

Sequences identified via PMID 19307587

Loop          CDR1A/H    CDR2A/H    CDR1B/L          CDR2B/L
TCR Seq       DSAIYN     IQSSQRE    MNHEY            SVGAGI
Antibody Seq  GFTFSTYQ   IVSSGGST   TGTSRDVGGYNYVS   DVIERSS

Additional files

Supplementary file 1

Table used for the second version of the AIMS scoring of pairwise amino acid interactions.

The table attempts to recapitulate the interactions between amino acids at the level of an introductory biochemistry course.

https://cdn.elifesciences.org/articles/90681/elife-90681-supp1-v2.csv
Supplementary file 2

Key for Figure 4 and Figure 4—figure supplement 1 relating the numbers on the X-axis of each plot to the corresponding TRAV or TRBV gene.

Note, the pairing of TRAV and TRBV genes to a specific X-axis number has no meaningful relation. Genes are listed in the same order as found on IMGT, with pseudogenes not included.

https://cdn.elifesciences.org/articles/90681/elife-90681-supp2-v2.csv
MDAR checklist
https://cdn.elifesciences.org/articles/90681/elife-90681-mdarchecklist1-v2.docx

Cite this article

  1. Christopher T Boughter
  2. Martin Meier-Schellersheim
(2023)
Conserved biophysical compatibility among the highly variable germline-encoded regions shapes TCR-MHC interactions
eLife 12:e90681.
https://doi.org/10.7554/eLife.90681