Abstract
Antibodies are a major component of adaptive immunity against invading pathogens. Here we explore possibilities for an analytical approach to characterize the antigen-specific antibody repertoire directly from the secreted proteins in convalescent serum. This approach aims to perform simultaneous antibody sequencing and epitope mapping using a combination of single particle cryo-electron microscopy (cryoEM) and bottom-up proteomics techniques based on mass spectrometry (LC-MS/MS). We evaluate the performance of the deep-learning tool ModelAngelo in determining de novo antibody sequences directly from reconstructed 3D volumes of antibody-antigen complexes. We demonstrate that while map quality is a critical bottleneck, it is possible to sequence antibody variable domains from cryoEM reconstructions with accuracies of up to 80-90%. While the rate of errors exceeds the typical levels of somatic hypermutation, we show that the ModelAngelo-derived sequences can be used to assign the used V-genes. This provides a functional guide to assemble de novo peptides from LC-MS/MS data more accurately and improves the tolerance to a background of polyclonal antibody sequences. Following this proof-of-principle, we discuss the feasibility and future directions of this approach to characterize antigen-specific antibody repertoires.
Introduction
Adaptive immunity to invading pathogens is mediated to an important degree by antibodies1–4. The available repertoire of antibodies is unique to each individual and constantly shifting in response to immunological pressure under which activation, selection, proliferation, maturation, and differentiation of the antibody-producing B-cells take place4–6. Therefore, understanding the molecular mechanisms of antibody-mediated immunity requires knowledge of how a complex repertoire of polyclonal antibodies targets a diverse landscape of epitopes on their respective antigens.
The analytical challenge at hand is to resolve antigen-antibody interactions down to the pairwise contacts between specific amino acid residues in the epitope and paratope regions, respectively. This would enable reconstruction of the evolutionary pathways of somatic recombination and hypermutation that lead to high-affinity antigen binding. Conversely, this would also reveal how this selective immune pressure drives the evolutionary pathways of antigenic drift in targeted pathogens7–10. Knowledge of the precise antibody sequences, together with near-atomic details of the epitope-paratope interaction, is thus a prerequisite for understanding the coevolution between replicating pathogens and antibody-mediated adaptive immunity in the host.
Current methods to determine the antigen-specific antibody repertoire rely on single (memory) B-cell sorting, followed by targeted sequencing of the coding mRNAs for the heavy and light chains11,12. This enables the production of recombinant monoclonal antibodies, whose epitopes may be mapped to near-atomic structural detail by X-ray crystallography and cryo-electron microscopy (cryoEM). This approach has generated a wealth of information about antibody-antigen interactions, though it is biased by the limited pool of memory B-cells that it probes. Antibodies function as circulating glycoproteins in bodily fluid, secreted from plasma cells which are in turn derived from diverse pools of memory B cells located in various tissues, including bone marrow, spleen, lymph nodes, and only to a minor degree in blood13–15. Serological assays that determine binding and neutralization titers specifically look at the secreted antibody in bodily fluid, and it remains an outstanding question how this ‘serum compartment’ of the antibody repertoire relates both qualitatively and quantitatively to the minor population of memory B-cells found in peripheral blood. This calls for new analytical approaches that can derive both antibody sequence and epitope information straight from the secreted antibody product.
Such approaches have been developed in recent years, based on mass spectrometry and electron microscopy. First, using a bottom-up proteomics approach, antibody-derived peptides can be sequenced de novo from fragmentation spectra and assembled into full heavy/light chain sequences12,16. Sequence accuracy is such that functional monoclonal antibodies can be reconstructed from the input data, and several reports have described successful sequencing efforts of human serum, milk, and urine-derived antibodies17–32. Second, both hydrogen-deuterium exchange mass spectrometry and electron microscopy have been used to resolve a complex landscape of epitopes targeted by polyclonal antibody mixtures33–42. Ward and colleagues have reported that with the latter approach, which they coined Electron Microscopy based Polyclonal Epitope Mapping (EMPEM), they obtained near-atomic resolution reconstructions by cryoEM to resolve side-chain densities in the epitope-paratope region33. This opens the possibility to derive antibody sequence information and integrate this into the structural modelling of the interaction. In essence, the reconstructions might reveal which antibodies from the complex polyclonal mixture bind which epitopes on the antigens. The improved resolutions of current cryoEM approaches thus allow for a type of visual proteomics in which protein identity (i.e. antibody sequence) may be directly inferred from the reconstructed 3D volumes43–51. While the sequencing accuracy of these approaches is inherently limited by resolution and map quality, many tools have recently been developed to infer protein identity in an automated fashion, including cryoID, DeepTracer-ID, findmysequence, and most recently ModelAngelo52–55.
Here we explore the use of ModelAngelo to derive de novo antibody sequences from experimental cryoEM density maps of antibody-antigen complexes. We have previously developed the software tool Stitch, which sorts and assembles MS-derived peptide sequences into full heavy/light chain sequences across complex repertoires56,57. We adapted Stitch to perform the same task on ModelAngelo-derived de novo models and test the accuracy of the approach on a benchmark of 164 publicly available cryoEM maps of monoclonal antibody-antigen pairs. We demonstrate that map quality is a critical bottleneck, but that antibody sequences can be derived with up to 80-90% accuracy. We test the utility of these sequences for assigning the used V-genes, which together with reconstruction of CDRH3 may offer a useful guide to assemble more accurate MS-derived peptide sequences. We show that such EM-derived templates indeed improve MS-based sequencing accuracy in the context of complex antibody mixtures and that publicly available EMPEM reconstructions are of sufficient quality to leverage this approach. This proof-of-principle offers a promising perspective to integrate cryoEM and MS methods for a comprehensive characterization of the antibody repertoire on both sequence and epitope levels.
Results
To assess the feasibility of deriving de novo antibody sequences from experimental cryoEM density maps, we assembled a benchmark dataset from the Electron Microscopy Data Bank (see Supplementary Table S1). To infer the antibody sequences, we chose the recently published deep-learning tool ModelAngelo, developed by Jamali, Kimanius, Scheres, and colleagues, as it is capable of inferring complete de novo models in cryoEM density maps without the need for user input sequences or main chain models52–55. We searched the EMDB for published maps containing antigen-binding fragments (Fabs) of any species, at nominal resolutions of ≤ 4 Å, released after the training data for ModelAngelo was obtained. These results were filtered for maps that included a deposited atomic model and which contained only a single unique monoclonal Fab (of which multiple copies may be bound in the reconstructed antigen complex). The final benchmark includes 164 maps, including Fabs from human, rabbit, mouse, and macaque species. The maps were used as input data for ModelAngelo without user-provided sequences, yielding completely de novo atomic models of both antigen and Fabs.
The output models from ModelAngelo are typically fragmented to varying degrees depending on the local quality of the map. In addition, maps may contain multiple copies of the same unique Fab molecule. We therefore aimed to consolidate all fragments for the built Fabs into a single consensus sequence for the antibody variable domains of the heavy and light chains. We have previously developed the software Stitch, which performs assembly of LC-MS/MS derived de novo peptide reads into the correct framework of the heavy and light chains by alignment to germline-template sequences from the ImMunoGeneTics (IMGT) database58–60. We adapted Stitch to use models from ModelAngelo (or any mmCIF file) as input data, extract the amino acid sequences, and perform the same template-based assembly with the resulting reads. Of the 164 input maps, 141 and 144 yielded a non-zero alignment score for the heavy and light chains in the Stitch result, respectively. These add up to a total of 152 maps which were analyzed further to evaluate the sequence accuracy of this approach (including 134 maps with a non-zero alignment score for both heavy and light chain).
Results for one of the top-scoring entries in the benchmark, an Influenza B virus neuraminidase tetramer in complex with four identical copies of a neutralizing Fab61, are shown in Figure 1. The consensus sequences generated in Stitch have complete coverage of both heavy and light chain variable domains, including both CDRL3 and CDRH3. The de novo determined sequence is 84% and 86% identical to the true heavy and light chain sequences, respectively. The resulting error rate of roughly 15% exceeds the typical levels of somatic hypermutation observed in mature antibody sequences, which are on the order of 1-10%. While the derived sequence should therefore not be taken at face value to reconstruct recombinant monoclonal antibodies, we reasoned that the accuracy is nevertheless likely sufficient to correctly infer the corresponding germline V-genes of the mature antibodies.
For each of the 152 maps, we compared the pairwise sequence identity between the top scoring V-genes from the ModelAngelo input in Stitch with the top scoring V-genes from the true sequences (as deposited in the corresponding PDB entry). For reference, we also calculated the pairwise sequence identities of all available V-gene templates per species, reflecting what a completely random draw from the available V-gene repertoire would look like. As shown in Figure 2, the top scoring V-genes from the ModelAngelo sequences have significantly higher pairwise identities than a random draw from the V-gene repertoire in IMGT for both the heavy and light chains (p < 0.0001 in unpaired, two-tailed Kolmogorov-Smirnov tests). Furthermore, the identity of the inferred V-gene with the true sequence scales with the alignment scores in Stitch, making the alignment score a valuable metric for the quality of the V-gene inference. The mean/median V-gene identity in a random draw from the IMGT repertoire is 0.59/0.52 and 0.56/0.53 for the heavy and light chains, respectively. With the ModelAngelo-derived sequences this improves to 0.78/0.87 and 0.75/0.85 for the heavy and light chains in the full dataset. This gradually improves with higher alignment scores to 0.89/1.00 and 0.88/1.00 for heavy and light chains, starting at a cutoff of 80 (representing 92/141 and 80/134 maps for heavy and light chain, respectively). The complete complementarity determining regions CDRH3 and CDRL3 were covered in 66 and 68 maps for the heavy and light chain, respectively (see Supplementary Figure S1). The length of the complete antigen binding loops was estimated with an average error of 0.5 ± 3.3 or 1.7 ± 6.0 residues for heavy and light chain, with average sequence identities of 0.63 and 0.41.
We found the global FSC resolution of the input map to be a poor predictor of both the Stitch alignment score and the inferred V-gene identity, likely because it is dominated by the bulk of the antigen and not representative of the local resolution in the epitope-paratope region (see Supplementary Figure S2). These results demonstrate that candidate V-genes for the antibodies resolved in cryoEM densities can be accurately narrowed down using ModelAngelo and Stitch, and that a limited subset of maps contains accurate information on CDR3 sequence and length.
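The V-gene comparison described above rests on two simple quantities: a pairwise sequence identity and a two-sample Kolmogorov-Smirnov comparison of identity distributions. The sketch below illustrates both in plain Python. It is not the deposited analysis script: the function names are hypothetical, the identity is assumed to be computed over pre-aligned sequences of equal length, and only the KS statistic (not the p-value) is calculated.

```python
from bisect import bisect_right
from typing import Sequence


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical columns between two pre-aligned sequences.

    Assumption: both sequences are already aligned to equal length,
    with '-' marking gaps. Columns that are gaps in both are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return matches / len(columns)


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest absolute difference
    between the empirical cumulative distributions of both samples."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    largest = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest
```

In practice a statistics library would be used to obtain the p-value as well; the pure-Python version only makes explicit what is being compared, namely the distribution of identities for inferred V-genes against that of a random draw.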
In the context of polyclonal antibody mixtures, this analysis suggests that cryoEM densities of antigen-antibody complexes from EMPEM experiments can be leveraged to guide sequence assembly from complementary proteomics-based profiling of the same sample (see Figure 3). In such an experiment, reconstructed cryoEM densities would be used as input data for ModelAngelo, from which the sequences are extracted and run through Stitch to select the top-scoring V-gene and construct a placeholder sequence for CDR3 of both the heavy and light chain. These reconstructed variable domains may then act as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.
As a proof-of-principle, we tested this on the monoclonal antibody CR3022, for which both a cryoEM reconstruction and LC-MS/MS data are publicly available. This antibody was isolated from a convalescent survivor of a SARS-CoV infection and targets a cryptic, but conserved epitope at the base of the Spike receptor-binding domain, with cross-neutralization to SARS-CoV-262,63. The antibody consists of an IGHV5-51 heavy chain, paired with an IGKV4-1 light chain. When complexed with full-length SARS-CoV-2 Spike, its Fab induces an odd rearrangement of the Spike protomers to yield an antiparallel dimer of S1 subunits in the cryoEM reconstructions64,65. When using this map as input for ModelAngelo and subsequently Stitch, the IGHV5-51 and IGKV4-1 germline sequences are correctly identified based on their alignment scores. Moreover, complete sequences for CDR3 of the heavy and light chains are built. When using these reconstructed variable domains as templates to guide assembly of the de novo peptide reads from the LC-MS/MS data published by Person and colleagues66, the final consensus sequences are 96% and 99% identical to the true heavy and light chain respectively. Of note, the only three remaining errors in the six CDR sequences are I/L assignments, which have identical masses and are notoriously challenging for MS-based sequencing.
For this CR3022 dataset, derived from the monoclonal antibody, the LC-MS/MS data alone assembled with Stitch against the full range of V-genes already yields a similar accuracy of 97% and 98% for the heavy and light chain, respectively. For the case of an EMPEM experiment, the challenge would rather be to correctly assemble the CR3022 sequence against a background of unrelated antibody sequences in a complex mixture. We therefore tested the utility of the ModelAngelo-derived templates for sequence assembly by also mixing input reads from LC-MS/MS data of unrelated antibodies (Figure 4). First, we mixed in a background of whole IgG from a hospitalized COVID-19 patient56, representing a diffuse polyclonal background in approximately a 1:1 ratio to the target input reads. Second, we mixed in a background of five additional unrelated anti-Influenza-HA monoclonal antibodies from the same study of Person and colleagues66, amounting to a 5:1 ratio of background to target input reads. In our previous work on sequencing serum-derived antibodies by bottom-up proteomics with Stitch, we found that assembly of the consensus sequences becomes much more tolerant to background data if, beyond the top scoring V-gene, the remaining unrelated V-genes are also included as decoys for the final template matching step56. The use of these decoy sequences is also included in the comparison here. Furthermore, we also included the use of the true CR3022 sequences as templates, serving as a best-case scenario, positive control. The analysis confirmed that even without ModelAngelo-derived templates, the sequence assembly with Stitch is already tolerant to the diffuse polyclonal background from the whole IgG fraction, yielding a similar accuracy as in the absence of these background peptides. By contrast, the sequence accuracy plummets to below 0.6 when the background consists of the five additional monoclonal antibodies in a 5:1 ratio to the target input reads. 
The accuracy is recovered to >0.95 when using either the true CR3022 sequences or the ModelAngelo-derived templates. There is also a gain in accuracy by using decoy templates with the LC-MS/MS data alone, though smaller compared to the use of ModelAngelo-derived templates. These results demonstrate that ModelAngelo-derived templates are useful for sequence assembly against complex polyclonal backgrounds.
We have demonstrated that cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences. We also evaluated whether EMPEM data is indeed of sufficient quality to infer V-genes from automated de novo modeling in the maps. We downloaded all published EMPEM maps from EMDB of which 23 were of sufficient quality to give a non-zero alignment score when using the ModelAngelo results in Stitch (see Supplementary Table S2). Analysis of the benchmark of monoclonal antigen-antibody complexes showed that the quality of the V-gene inference scales with the alignment scores. In this set of EMPEM reconstructions, the alignment scores range from ca. 20 to 300. Of the 23 maps, 8 have alignment scores above 50 for both heavy and light chain, at which point we estimate the mean/median V-gene identity to be approximately 0.85/0.90. This analysis shows that experimental EMPEM studies may yield sufficiently detailed reconstructions of the antigen-antibody complexes to narrow down the candidate V-genes of the resolved Fabs accurately.
Discussion
The development of EMPEM and MS-based polyclonal antibody sequencing now makes it possible to profile the antigen-specific antibody repertoire straight from the secreted pool of antibodies in bodily fluids. This approach bridges the gap between established single B-cell sequencing approaches and serological assays to probe binding and neutralization titers. The present work demonstrates that epitope and sequence information can be integrated using ModelAngelo and Stitch. This approach holds promise to better understand the serum compartment of the antibody repertoire. Use of the cryoEM data in these workflows complements the MS data beyond epitope mapping in several significant ways. First, we demonstrated that narrowing down the candidate V-genes improves the tolerance of LC-MS/MS-derived peptide sequence assembly to background in a complex antibody mixture. Second, heavy-light chain pairing is a problematic blind spot to proteomics sequencing, as the antibodies are denatured and digested as part of the sample workup. In contrast, this pairing is trivial in cryoEM data as the chains are in direct contact in the resolved Fabs in the map. Finally, reconstruction of CDRH3 is especially challenging with proteomics data alone, as it spans the junction between the recombined V-, D-, and J-segments, of which the D-segment is short and hypervariable to a point that germline sequences do not provide a functional template for sequence assembly. While CDRH3 coverage is also limited in ModelAngelo data, it can be built in many cases and a correct estimate of CDRH3 length is already useful to guide assembly of the de novo peptide reads.
Here we have extracted the flat sequences from ModelAngelo output to use as input for the template matching step in Stitch, using a modified Smith-Waterman Alignment. We believe the template matching step in Stitch could be further improved in several ways. First, ModelAngelo has a built-in database search based on HMMer using profile Hidden Markov models (HMM) encoding the amino acid probabilities across the alphabet at each position67. While the ModelAngelo search currently does not consolidate the fragmented output models into a single search to build consensus sequences, we anticipate that implementing a similar HMM profile search in Stitch may further improve the V-gene inference. Similarly, from the benchmark dataset analyzed here we might learn what sequencing mistakes are common in ModelAngelo data. This could be used to adjust the conventional Smith-Waterman Alignment in Stitch accordingly, analogous to recent improvements in the alignment algorithm we implemented for MS data57. Finally, the template matching step in Stitch is now solely based on the sequence of the ModelAngelo models, but we may expand this to a structure-based alignment to better place the error-prone sequence reads in the correct framework of the Ig-domains.
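For readers unfamiliar with the template matching step, the sketch below shows a textbook Smith-Waterman local alignment score and how it can rank candidate templates for a read. This is not the modified algorithm implemented in Stitch: the match, mismatch, and linear gap values are arbitrary illustrative choices, and the function names and toy sequences are hypothetical.

```python
from typing import Dict


def smith_waterman_score(query: str, template: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between query and template
    (textbook Smith-Waterman with a linear gap penalty)."""
    previous = [0] * (len(template) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, t in enumerate(template, start=1):
            diagonal = previous[j - 1] + (match if q == t else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def best_template(read: str, templates: Dict[str, str]) -> str:
    """Name of the template with the highest local alignment score."""
    return max(templates, key=lambda name: smith_waterman_score(read, templates[name]))


# Toy example with made-up template fragments (not real germline entries).
toy_templates = {"template_A": "QVQLVQSG", "template_B": "DIVMTQSP"}
print(best_template("VQLVESG", toy_templates))
```

A position-specific scheme, such as the profile HMM search discussed above, would replace the fixed match and mismatch values with per-position amino acid probabilities.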
Next steps in our efforts are to bring together the EMPEM work and MS-based polyclonal antibody sequencing on antigen-Fab complexes purified from patient serum. The goal is to reconstruct functional monoclonal antibodies from these analyses as ultimate proof for the accuracy of the derived antibody sequences and then start working on the throughput and depth of coverage in the repertoire. We believe that both EMPEM and MS-based polyclonal antibody sequencing are still limited to the top 1-10 antibodies in the polyclonal mixture. The EMPEM approach is biased towards bigger and well-ordered target antigens, which calls for additional complementary approaches like HDX-MS for a comprehensive polyclonal epitope mapping exercise. Bringing together these perspectives in an integrated structural biology approach promises new insights into the serum compartment of the antibody repertoire to better understand the coevolutionary processes of antibody maturation and antigenic drift.
Materials and Methods
Benchmark of monoclonal antibody-antigen pairs and EMPEM maps from EMDB
We searched the EMDB for published maps containing antigen-binding fragments (Fabs) of any species, at nominal resolutions of ≤ 4 Ångström, released after the training data for ModelAngelo was obtained (April 1st 2022). The search was performed on February 11th 2023 at https://www.ebi.ac.uk/emdb/ using the search term “antibody fab resolution:[* TO 4] AND release_date:[2022-03-31T00:00:00Z TO 2023-11-02T00:00:00Z] sample_name:*fab* fitted_pdbs:[* TO *]”. These results were filtered for maps that included a deposited atomic model and which contained only a single unique monoclonal Fab (of which multiple copies may be bound in the reconstructed antigen complex). When multiple redundant maps from the same study were included, we selected the single representative map of the highest quality, based on manual inspection. Global FSC resolution was not a good indicator as the maps were often dominated by the bound antigen, which may be better resolved than the bound antibody. Typically, the selected map was the focused/local refinement around the epitope-paratope region, despite its lower nominal resolution. A full overview of the selected maps is provided in Supplementary Table S1. The EMPEM maps were compiled based on a literature survey, complemented with a search of the EMDB using the term “polyclonal”. A full overview of the selected EMPEM maps is provided in Supplementary Table S2.
Changes made to Stitch to take CIF input
Stitch was extended to allow mmCIF files as input. From these files all chains were extracted as separate amino acid sequences to align in Stitch. ModelAngelo outputs a confidence score per residue in the B-factor column of the mmCIF file, which was used as a local confidence for the sequence. First, each polypeptide is assigned an Average Local Confidence (ALC) score, calculated as the average across all its residues, which can be used as an input filter on the data (along with polypeptide length). Second, the local confidence is used as a weight in determining the consensus sequence of overlapping polypeptides, following assembly in Stitch.
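The two uses of the local confidence described above can be sketched as follows. This is an illustration under stated assumptions rather than the Stitch implementation: the function names are hypothetical, the confidence is assumed to be on a 0-100 scale, reads are assumed to be placed on the template without gaps, and uncovered positions are marked with 'X'.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Read = Tuple[str, Sequence[float]]               # (sequence, per-residue confidence)
PlacedRead = Tuple[int, str, Sequence[float]]    # (start position on template, sequence, confidence)


def average_local_confidence(confidences: Sequence[float]) -> float:
    """ALC: mean of the per-residue confidence values of one polypeptide."""
    return sum(confidences) / len(confidences)


def filter_reads(reads: List[Read], cutoff_alc: float = 80.0, min_length: int = 5) -> List[Read]:
    """Keep reads that pass both the length and the ALC input filters."""
    return [
        (seq, conf) for seq, conf in reads
        if len(seq) >= min_length and average_local_confidence(conf) >= cutoff_alc
    ]


def weighted_consensus(placed_reads: List[PlacedRead], template_length: int) -> str:
    """Per position, pick the residue with the highest summed confidence.

    Assumption: gapless placement; positions without coverage become 'X'.
    """
    votes = [defaultdict(float) for _ in range(template_length)]
    for start, seq, conf in placed_reads:
        for offset, (residue, weight) in enumerate(zip(seq, conf)):
            position = start + offset
            if 0 <= position < template_length:
                votes[position][residue] += weight
    return "".join(max(col, key=col.get) if col else "X" for col in votes)
```

Weighting by confidence means that a single well-resolved copy of a Fab can outvote several poorly resolved copies at the same position, which matters when multiple copies of the same Fab are built at different local map quality.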
Analysis of monoclonal antigen-antibody and EMPEM benchmarks
The deposited model mmCIF files and EM maps for the full benchmark were downloaded, and the maps were run through ModelAngelo (version 1.01) in ‘build_no_seq’ mode. For each entry in the benchmark the deposited mmCIF was run with Stitch (version 1.5.0-rc.1+6d3b540) using CutoffALC 80, minimum length 5 and TemplateMatching CutoffScore 8. The same was done for each mmCIF file produced by ModelAngelo but with an additional segment containing the antigen and Ig constant domain template sequences. From these runs the consensus sequence and highest scoring germline for IGHV and IGLV (lambda + kappa) were retrieved. For each consensus sequence the CDR3 was determined if the flanking cysteine on the V gene and the tryptophan or phenylalanine on the J gene were present. As these conserved residues were not all positioned correctly in the IMGT database, the data were manually fixed based on the same rules. The Stitch runs on the deposited and ModelAngelo-produced models were compared to obtain the identity between the consensus sequences, the distance between the inferred germlines, and the identity between the CDR3s. The script used for this analysis is deposited as Supplementary Data. The results generated by this analysis are included in Supplementary Table S1. The EMPEM benchmark was downloaded and run through ModelAngelo and subsequently Stitch with identical parameters as above. In contrast to the benchmark detailed before, the ground-truth sequences of these antibodies are not known.
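The CDR3 rule described above (a flanking cysteine on the V gene and a tryptophan or phenylalanine on the J gene) can be illustrated with a short sketch. This is not the deposited analysis script, which works from template positions in Stitch. As a stand-in, the sketch locates the J anchor through the conserved [W/F]-G-X-G motif and takes the last cysteine before it, and it assumes that the CDR3 excludes both anchor residues. All names and the example sequence are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: the J anchor (W or F) is followed by the conserved G-X-G motif.
J_MOTIF = re.compile(r"[WF]G.G")


def locate_anchors(consensus: str) -> Optional[Tuple[int, int]]:
    """Indices of the conserved Cys and the J anchor, or None if either is missing."""
    matches = list(J_MOTIF.finditer(consensus))
    if not matches:
        return None
    j_anchor = matches[-1].start()            # use the most C-terminal motif
    cys = consensus.rfind("C", 0, j_anchor)   # last Cys before the J anchor
    if cys == -1:
        return None
    return cys, j_anchor


def extract_cdr3(consensus: str) -> Optional[str]:
    """Residues strictly between the two anchors, or None if an anchor is absent."""
    anchors = locate_anchors(consensus)
    if anchors is None:
        return None
    cys, j_anchor = anchors
    return consensus[cys + 1:j_anchor]


# Made-up heavy chain fragment for illustration only.
print(extract_cdr3("AVYYCARDRGYFDYWGQGTLVTVSS"))
```

Returning `None` when an anchor is missing mirrors the condition in the text that a CDR3 is only reported when both conserved residues are present in the consensus sequence.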
Analysis of CR3022 data
The EM data for CR3022 was downloaded from EMDB (EMD-11648) and run with ModelAngelo (version 1.01) in ‘build_no_seq’ mode. The raw data for monoclonal antibodies CR3022, 107, 1028, 2771, 3576, and 3634 from PRIDE PXD030094 was downloaded and analyzed with PEAKS 10+. The five additional monoclonal antibodies were chosen because their sequences are the most distant from CR3022 and therefore present the biggest challenge to sequence assembly in Stitch (version 1.5.0-rc.1+6d3b540). The PEAKS analysis for the COVID-19 data from PRIDE PXD031941 was downloaded. Three sets of input were prepared: ‘no background’ consisting of only the CR3022 data, ‘+whole IgG’ consisting of the CR3022 and the COVID-19 data, and ‘+5 mAbs’ which consists of all mAbs from the CR3022 study. Three Stitch configurations were prepared: ‘True’ using the known CR3022 sequence as template as retrieved from PDB 7A5R, ‘IMGT’ using the conventional configuration of Stitch with all IMGT germlines as templates, and ‘MA’ using the closest V gene germline to the ModelAngelo consensus sequence of CR3022 together with the CDR3 sequence present. Each of these configurations was run with the Stitch Recombine Decoy option off and on. With decoys on, the full IMGT germline database was added for ‘True’ and ‘MA’, and for ‘IMGT’ the Stitch parameter Decoy in Recombine was set, allowing any unused germline from the Template Matching step to be matched in the Recombine step. For all Stitch runs the CutoffALC was 90 and the TemplateMatching CutoffScore 10. The resulting consensus sequences from these 18 Stitch runs were then compared with the known CR3022 sequence to determine the identity. The full script used for this analysis is included in the deposited Supplementary Data.
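The 18 Stitch runs follow from crossing the three input sets with the three template configurations and the two decoy settings. The sketch below only enumerates this design to make the bookkeeping explicit. The labels are taken from the text above, while the dictionary layout is a hypothetical convenience and does not reflect the Stitch batch file format.

```python
from itertools import product

INPUT_SETS = ["no background", "+whole IgG", "+5 mAbs"]
TEMPLATE_CONFIGS = ["True", "IMGT", "MA"]
DECOY_SETTINGS = [False, True]

# One entry per Stitch run: 3 input sets x 3 template configurations x 2 decoy settings.
runs = [
    {"input": input_set, "templates": templates, "decoy": decoy}
    for input_set, templates, decoy in product(INPUT_SETS, TEMPLATE_CONFIGS, DECOY_SETTINGS)
]

print(len(runs))
```

Each run yields one heavy and one light chain consensus sequence, which is then compared with the known CR3022 sequence.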
Data and code availability
Stitch is available at https://github.com/snijderlab/stitch. The CR3022 and COVID-19 whole IgG LC-MS/MS data were taken from PRIDE Archive (http://www.ebi.ac.uk/pride/archive/) via the PRIDE partner repository with the data set identifiers PXD030094 and PXD031941, respectively. All ModelAngelo and Stitch results, including a script to reproduce the full analysis, are made available on Zenodo under 10.5281/zenodo.12207014.
Acknowledgements
This research was funded by the Dutch Research Council NWO Gravitation 2013 BOO, Institute for Chemical Immunology (ICI; 024.002.009), and the European Research Council Executive Agency HORIZON ERC-2022-STG (FLAVIR; 101077640).
Additional files
Supplementary Table S1. Overview and results of maps for benchmark.
Supplementary Table S2. Overview and results of maps for EMPEM benchmark.
References
- (1) Adaptive Immunity. J. Allergy Clin. Immunol. 125:S33–S40. https://doi.org/10.1016/j.jaci.2009.09.017
- (2) Understanding the Human Antibody Repertoire. mAbs 12. https://doi.org/10.1080/19420862.2020.1729683
- (3) Antiviral Neutralizing Antibodies: From in Vitro to in Vivo Activity. Nat. Rev. Immunol. 23:720–734. https://doi.org/10.1038/s41577-023-00858-w
- (4) B Cell Activation and Response Regulation During Viral Infections. Viral Immunol. 33:294–306. https://doi.org/10.1089/vim.2019.0207
- (5) Plasma Cells: The Programming of an Antibody-secreting Machine. Eur. J. Immunol. 49:30–37. https://doi.org/10.1002/eji.201847517
- (6) B-Cell Biology and Development. J. Allergy Clin. Immunol. 131:959–971. https://doi.org/10.1016/j.jaci.2013.01.046
- (7) How Repertoire Data Are Changing Antibody Science. J. Biol. Chem. 295:9823–9837. https://doi.org/10.1074/jbc.REV120.010181
- (8) Co-Evolution of Immunity and Seasonal Influenza Viruses. Nat. Rev. Microbiol. 21:805–817. https://doi.org/10.1038/s41579-023-00945-8
- (9) Progress toward Improved Understanding of Antibody Maturation. Curr. Opin. Struct. Biol. 67:226–231. https://doi.org/10.1016/j.sbi.2020.11.008
- (10) B-Cell Memory Responses to Variant Viral Antigens. Viruses 13. https://doi.org/10.3390/v13040565
- (11) The Promise and Challenge of High-Throughput Sequencing of the Antibody Repertoire. Nat. Biotechnol. 32:158–168. https://doi.org/10.1038/nbt.2782
- (12) Next-Generation Sequencing and Protein Mass Spectrometry for the Comprehensive Analysis of Human Cellular and Serum Antibody Repertoires. Curr. Opin. Chem. Biol. 24:112–120. https://doi.org/10.1016/j.cbpa.2014.11.007
- (13) Memory B Cells. Nat. Rev. Immunol. 24:5–17. https://doi.org/10.1038/s41577-023-00897-3
- (14) An Atlas of B-Cell Clonal Distribution in the Human Body. Nat. Biotechnol. 35:879–884. https://doi.org/10.1038/nbt.3942
- (15) B Cell Memory: Building Two Walls of Protection against Pathogens. Nat. Rev. Immunol. 20:229–238. https://doi.org/10.1038/s41577-019-0244-2
- (16) A Perspective toward Mass Spectrometry-Based de Novo Sequencing of Endogenous Antibodies. mAbs 14. https://doi.org/10.1080/19420862.2022.2079449
- (17) A Robust Pipeline for Rapid Production of Versatile Nanobody Repertoires. Nat. Methods 11:1253–1260. https://doi.org/10.1038/nmeth.3170
- (18) Reverse Engineering the Anti-MUC1 Hybridoma Antibody 139H2 by Mass Spectrometry-Based de Novo Sequencing. bioRxiv. https://doi.org/10.26508/lsa.202302366
- (19) Into the Dark Serum Proteome: Personalized Features of IgG1 and IgA1 Repertoires in Severe COVID-19 Patients. Mol. Cell. Proteomics 23. https://doi.org/10.1016/j.mcpro.2023.100690
- (20) Direct Mass Spectrometry-Based Detection and Antibody Sequencing of Monoclonal Gammopathy of Undetermined Significance from Patient Serum: A Case Study. J. Proteome Res. 22:3022–3028. https://doi.org/10.1021/acs.jproteome.3c00330
- (21) Complete De Novo Assembly of Monoclonal Antibody Sequences. Sci. Rep. 6. https://doi.org/10.1038/srep31730
- (22) Human Plasma IgG1 Repertoires Are Simple, Unique, and Dynamic. Cell Syst. 12:1131–1143. https://doi.org/10.1016/j.cels.2021.08.008
- (23) Primary Sequence Determination of a Monoclonal Antibody against α-Synuclein Using a Novel Mass Spectrometry-Based Approach. Int. J. Mass Spectrom. 312:61–69. https://doi.org/10.1016/j.ijms.2011.05.005
- (24) Automated Antibody De Novo Sequencing and Its Utility in Biopharmaceutical Discovery. J. Am. Soc. Mass Spectrom. 28:803–810. https://doi.org/10.1007/s13361-016-1580-0
- (25) Combining Phage Display with de Novo Protein Sequencing for Reverse Engineering of Monoclonal Antibodies. mAbs 8:501–512. https://doi.org/10.1080/19420862.2016.1145865
- (26) Database-Independent Protein Sequencing (DiPS) Enables Full-Length de Novo Protein and Antibody Sequence Determination. Mol. Cell. Proteomics 16:1151–1161. https://doi.org/10.1074/mcp.O116.065417
- (27) Mass Spectrometry-Based De Novo Sequencing of Monoclonal Antibodies Using Multiple Proteases and a Dual Fragmentation Scheme. J. Proteome Res. 20:3559–3566. https://doi.org/10.1021/acs.jproteome.1c00169
- (28) De Novo MS/MS Sequencing of Native Human Antibodies. J. Proteome Res. 16:45–54. https://doi.org/10.1021/acs.jproteome.6b00608
- (29) A Proteomics Approach for the Identification and Cloning of Monoclonal Antibodies from Serum. Nat. Biotechnol. 30:447–452. https://doi.org/10.1038/nbt.2167
- (30) Resurrection of a Clinical Antibody: Template Proteogenomic de Novo Proteomic Sequencing and Reverse Engineering of an Anti-lymphotoxin-α Antibody. Proteomics 11:395–405. https://doi.org/10.1002/pmic.201000487
- (31) Automated de Novo Protein Sequencing of Monoclonal Antibodies. Nat. Biotechnol. 26:1336–1338. https://doi.org/10.1038/nbt1208-1336
- (32) De Novo Sequencing of Antibody Light Chain Proteoforms from Patients with Multiple Myeloma. Anal. Chem. 93:10627–10634. https://doi.org/10.1021/acs.analchem.1c01955
- (33) From Structure to Sequence: Antibody Discovery Using cryoEM. Sci. Adv. 8. https://doi.org/10.1126/sciadv.abk2039
- (34) High-Resolution Structural Analysis of Enterovirus-Reactive Polyclonal Antibodies in Complex with Whole Virions. PNAS Nexus 1. https://doi.org/10.1093/pnasnexus/pgac253
- (35) Structural Mapping of Antibody Landscapes to Human Betacoronavirus Spike Proteins. Sci. Adv. 8. https://doi.org/10.1126/sciadv.abn2911
- (36) Quadrivalent Influenza Nanoparticle Vaccines Induce Broad Protection. Nature 592:623–628. https://doi.org/10.1038/s41586-021-03365-x
- (37) High-Resolution Mapping of the Neutralizing and Binding Specificities of Polyclonal Sera Post-HIV Env Trimer Vaccination. eLife 10. https://doi.org/10.7554/eLife.64281
- (38) Epitope Mapping of Human Polyclonal Antibodies to the fHbp Antigen of a Neisseria Meningitidis Vaccine by Hydrogen-Deuterium Exchange Mass Spectrometry (HDX-MS). Mol. Cell. Proteomics 23. https://doi.org/10.1016/j.mcpro.2024.100734
- (39)Polyclonal Epitope Mapping Reveals Temporal Dynamics and Diversity of Human Antibody Responses to H5N1 VaccinationCell Rep 34https://doi.org/10.1016/j.celrep.2020.108682
- (40)Mapping Polyclonal Antibody Responses in Non-Human Primates Vaccinated with HIV Env Trimer Subunit VaccinesCell Rep 30:3755–3765https://doi.org/10.1016/j.celrep.2020.02.061
- (41)Epitope Mapping of Polyclonal Antibodies by Hydrogen–Deuterium Exchange Mass Spectrometry (HDX-MS)Anal. Chem 93:11669–11678https://doi.org/10.1021/acs.analchem.1c00696
- (42)Rapid Screening for Potential Epitopes Reactive with a Polycolonal Antibody by Solution-Phase H/D Exchange Monitored by FT-ICR Mass SpectrometryJ. Am. Soc. Mass Spectrom 24:1016–1025https://doi.org/10.1007/s13361-013-0644-7
- (43)The Molecular Sociology of the CellNature 450:973–982https://doi.org/10.1038/nature06523
- (44)Structural Specializations of the Sperm TailCell 186:2880–2896https://doi.org/10.1016/j.cell.2023.05.026
- (45)Ciliary Central Apparatus Structure Reveals Mechanisms of Microtubule PatterningNat. Struct. Mol. Biol 29:483–492https://doi.org/10.1038/s41594-022-00770-2
- (46)Structures of Radial Spokes and Associated Complexes Important for Ciliary MotilityNat. Struct. Mol. Biol 28:29–37https://doi.org/10.1038/s41594-020-00530-0
- (47)Delineating Organizational Principles of the Endogenous L-A Virus by Cryo-EM and Computational Analysis of Native Cell ExtractsCommun. Biol 7https://doi.org/10.1038/s42003-024-06204-7
- (48)Structural Basis of Integrator-Dependent RNA Polymerase II TerminationNature 629:219–227https://doi.org/10.1038/s41586-024-07269-4
- (49)Cryo-EM Analysis of Pseudomonas Phage Pa193 Structural ComponentsApril 12https://doi.org/10.21203/rs.3.rs-4189479/v1
- (50)Amyloid Fibrils in FTLD-TDP Are Composed of TMEM106B and Not TDP-43Nature 605:304–309https://doi.org/10.1038/s41586-022-04670-9
- (51)FilamentID Reveals the Composition and Function of Metabolic Enzyme Polymers during GametogenesisCell 187:3303–3318https://doi.org/10.1016/j.cell.2024.04.026
- (52)Bottom-up Structural Proteomics: cryoEM of Protein Complexes Enriched from the Cellular MilieuNat. Methods 17:79–85https://doi.org/10.1038/s41592-019-0637-y
- (53)DeepTracer-ID: De Novo Protein Identification from Cryo-EM MapsBiophys. J 121:2840–2848https://doi.org/10.1016/j.bpj.2022.06.025
- (54)findMySequence: A Neural-Network-Based Approach for Identification of Unknown Proteins in X-Ray Crystallography and Cryo-EMIUCrJ 9:86–97https://doi.org/10.1107/S2052252521011088
- (55)Automated Model Building and Protein Identification in Cryo-EM MapsNature 628:450–457https://doi.org/10.1038/s41586-024-07215-4
- (56)Template-Based Assembly of Proteomic Short Reads For De Novo Antibody Sequencing and Repertoire ProfilingAnal. Chem 94:10391–10399https://doi.org/10.1021/acs.analchem.2c01300
- (57)A Handle on Mass Coincidence Errors in de Novo Sequencing of Antibodies by Bottom-up ProteomicsFebruary 22https://doi.org/10.1101/2024.02.20.581155
- (58)IMGT® Databases, Related Tools and Web Resources through Three Main Axes of Research and DevelopmentNucleic Acids Res 50:D1262–D1272https://doi.org/10.1093/nar/gkab1136
- (59)Immunoglobulin and T Cell Receptor Genes: IMGT® and the Birth and Rise of ImmunoinformaticsFront. Immunol 5https://doi.org/10.3389/fimmu.2014.00022
- (60)IMGT®, the International ImMunoGeneTics Information System® 25 Years OnNucleic Acids Res 43:D413–D422https://doi.org/10.1093/nar/gku1056
- (61)A Pan-Influenza Antibody Inhibiting Neuraminidase via Receptor MimicryNature 618:590–597https://doi.org/10.1038/s41586-023-06136-y
- (62)Human Monoclonal Antibody Combination against SARS Coronavirus: Synergy and Coverage of Escape MutantsPLoS Med 3https://doi.org/10.1371/journal.pmed.0030237
- (63)A Highly Conserved Cryptic Epitope in the Receptor Binding Domains of SARS-CoV-2 and SARS-CoVScience 368:630–633https://doi.org/10.1126/science.abb7269
- (64)Antibody-Mediated Disruption of the SARS-CoV-2 Spike GlycoproteinNat. Commun 11https://doi.org/10.1038/s41467-020-19146-5
- (65)Neutralization of SARS-CoV-2 by Destruction of the Prefusion SpikeCell Host Microbe 28:445–454https://doi.org/10.1016/j.chom.2020.06.010
- (66)Template-Assisted De Novo Sequencing of SARS-CoV-2 and Influenza Monoclonal Antibodies by Mass SpectrometryJ. Proteome Res 21:1616–1627https://doi.org/10.1021/acs.jproteome.1c00913
- (67)Challenges in Homology Search: HMMER3 and Convergent Evolution of Coiled-Coil RegionsNucleic Acids Res 41:e121–e121https://doi.org/10.1093/nar/gkt263
Copyright
© 2025, Schulte et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.