Simultaneous polyclonal antibody sequencing and epitope mapping by cryo electron microscopy and mass spectrometry

eLife Assessment

The paper addresses the problem of optimising the mapping of serum antibody responses against a known antigen. The manuscript describes a method using EM polyclonal epitope mapping to help elucidate endogenous antibodies. The work is interesting and valuable to the fields of immunology and serology, and the strength of evidence to support its findings is considered solid.

https://doi.org/10.7554/eLife.101322.3.sa0

Significance of the findings:

Valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence:

Solid: Methods, data and analyses broadly support the claims with only minor weaknesses

Abstract

Antibodies are a major component of adaptive immunity against invading pathogens. Here, we explore possibilities for an analytical approach to characterize the antigen-specific antibody repertoire directly from the secreted proteins in convalescent serum. This approach aims to perform simultaneous antibody sequencing and epitope mapping using a combination of single particle cryo-electron microscopy (cryoEM) and bottom-up proteomics techniques based on mass spectrometry (LC-MS/MS). We evaluate the performance of the deep-learning tool ModelAngelo in determining de novo antibody sequences directly from reconstructed 3D volumes of antibody-antigen complexes. We demonstrate that while map quality is a critical bottleneck, it is possible to sequence antibody variable domains from cryoEM reconstructions with accuracies of up to 80–90%. While the rate of errors exceeds the typical levels of somatic hypermutation, we show that the ModelAngelo-derived sequences can be used to assign the used V-genes. This provides a functional guide to assemble de novo peptides from LC-MS/MS data more accurately and improves the tolerance to a background of polyclonal antibody sequences. Following this proof-of-principle, we discuss the feasibility and future directions of this approach to characterize antigen-specific antibody repertoires.

Introduction

Adaptive immunity to invading pathogens is mediated to an important degree by antibodies (Bonilla and Oettgen, 2010; Rees, 2020; Burton, 2023; Lam et al., 2020). The available repertoire of antibodies is unique in individuals and constantly shifting in response to immunological pressure under which activation, selection, proliferation, maturation, and differentiation of the antibody-producing B-cells takes place (Lam et al., 2020; Tellier and Nutt, 2019; Pieper et al., 2013). Therefore, understanding the molecular mechanisms of antibody-mediated immunity requires knowledge of how a complex repertoire of polyclonal antibodies targets a diverse landscape of epitopes on their respective antigens.

The analytical challenge at hand is to resolve antigen-antibody interactions down to the pairwise contacts between specific amino acid residues in the epitope and paratope regions, respectively. This would enable reconstruction of the evolutionary pathways of somatic recombination and hypermutation that lead to high-affinity antigen binding. Conversely, this would also reveal how this selective immune pressure drives the evolutionary pathways of antigenic drift in targeted pathogens (Marks and Deane, 2020; Han et al., 2023; Vajda et al., 2021; White, 2021). Knowledge of the precise antibody sequences, as well as near-atomic details of the epitope-paratope interaction, are thus prerequisites to understand the coevolution between replicating pathogens and antibody-mediated adaptive immunity in the host.

Current methods to determine the antigen-specific antibody repertoire rely on single (memory) B-cell sorting, followed by targeted sequencing of the coding mRNAs for the heavy and light chains (Georgiou et al., 2014; Lavinder et al., 2015). This enables the production of recombinant monoclonal antibodies, whose epitopes may be mapped to near-atomic structural detail by X-ray crystallography and cryo electron microscopy (cryoEM). This approach has generated a wealth of information about antibody-antigen interactions, though it is biased by the limited pool of memory B-cells that it probes. Antibodies function as circulating glycoproteins in bodily fluid, secreted from plasma cells which are in turn derived from diverse pools of memory B cells located in various tissues, including bone marrow, spleen, lymph nodes, and only to a minor degree in blood (Inoue and Kurosaki, 2024; Meng et al., 2017; Akkaya et al., 2020). Serological assays aimed to determine binding and neutralization titers specifically look at the secreted antibody in bodily fluid, and it remains an outstanding question how this ‘serum compartment’ of the antibody repertoire relates both qualitatively and quantitatively to the minor population of memory B-cells found in peripheral blood. This calls for new analytical approaches that can derive both antibody sequence and epitope information straight from the secreted antibody product.

Such approaches have been developed in recent years, based on mass spectrometry and electron microscopy. First, using a bottom-up proteomics approach, antibody-derived peptides can be sequenced de novo from fragmentation spectra and assembled into full heavy/light chain sequences (Lavinder et al., 2015; de Graaf et al., 2022). Sequence accuracy is such that functional monoclonal antibodies can be reconstructed from the input data, and several reports have described successful sequencing efforts of human serum, milk, and urine-derived antibodies (Fridy et al., 2014; Peng et al., 2023b; Bondt et al., 2024; Peng et al., 2023a; Tran et al., 2016; Bondt et al., 2021; Sousa et al., 2012; Sen et al., 2017; Rickert et al., 2016; Savidor et al., 2017; Peng et al., 2021; Guthals et al., 2017; Cheung et al., 2012; Castellana et al., 2011; Bandeira et al., 2008; Dupré et al., 2021; Peng et al., 2025). Second, both hydrogen-deuterium exchange mass spectrometry and electron microscopy have been used to resolve a complex landscape of epitopes targeted by polyclonal antibody mixtures (Antanasijevic et al., 2022a; Antanasijevic et al., 2022b; Bangaru et al., 2022; Boyoglu-Barnum et al., 2021; Dingens et al., 2021; Grauslund et al., 2024; Han et al., 2021; Nogal et al., 2020; Ständer et al., 2021; Zhang et al., 2013; Vorauer et al., 2024). Ward and colleagues have reported that with the latter approach, which they coined Electron Microscopy based Polyclonal Epitope Mapping (EMPEM), they obtained near-atomic resolution reconstructions by cryoEM to resolve side-chain densities in the epitope-paratope region (Antanasijevic et al., 2022a). This opens the possibility to derive antibody sequence information and integrate this into the structural modelling of the interaction. In essence, the reconstructions might reveal which antibodies from the complex polyclonal mixture bind which epitopes on the antigens. 
The improved resolutions of current cryoEM approaches thus allow for a type of visual proteomics in which protein identity (i.e. antibody sequence) may be directly inferred from the reconstructed 3D volumes (Robinson et al., 2007; Leung et al., 2023; Gui et al., 2022; Gui et al., 2021; Schmidt et al., 2024; Fianu et al., 2024; Cingolani et al., 2024; Jiang et al., 2022; Hugener et al., 2024). While the pure sequencing accuracy of these approaches is obviously limited by resolution/map quality, many tools have recently been developed to infer protein identity in an automated fashion, including cryoID, DeepTracer-ID, findmysequence, and most recently ModelAngelo (Ho et al., 2020; Chang et al., 2022; Chojnowski et al., 2022; Jamali et al., 2024).

Here, we explore the use of ModelAngelo to derive de novo antibody sequences from experimental cryoEM density maps of antibody-antigen complexes. We have previously developed the software tool Stitch, which sorts and assembles MS-derived peptide sequences into full heavy/light chain sequences across complex repertoires (Schulte et al., 2022; Schulte and Snijder, 2024). We adapted Stitch to perform the same task on ModelAngelo-derived de novo models and test the accuracy of the approach on a benchmark of 164 publicly available cryoEM maps of monoclonal antibody-antigen pairs. We demonstrate that map quality is a critical bottleneck, but that antibody sequences can be derived with up to 80–90% accuracy. We test the utility of these sequences for assigning the used V-genes, which together with reconstruction of CDRH3 may offer a useful guide to assemble more accurate MS-derived peptide sequences (see Figure 1). We show that such EM-derived templates indeed improve MS-based sequencing accuracy in the context of complex antibody mixtures and that publicly available EMPEM reconstructions are of sufficient quality to leverage this approach. This proof-of-principle offers a promising perspective to integrate cryoEM and MS methods for a comprehensive characterization of the antibody repertoire on both sequence and epitope levels.

Figure 1

Schematic workflow to estimate de novo antibody sequences by the integration of cryoEM and LC-MS/MS data with Stitch.

Results

To assess the feasibility of deriving de novo antibody sequences from experimental cryoEM density maps, we assembled a benchmark dataset from the Electron Microscopy Data Bank (see Supplementary file 1). To infer the antibody sequences, we chose the recently published deep-learning tool ModelAngelo, developed by Jamali, Scheres, and colleagues, as it is capable of inferring complete de novo models in cryoEM density maps without the need for user input sequences or main chain models (Ho et al., 2020; Chang et al., 2022; Chojnowski et al., 2022; Jamali et al., 2024). We searched the EMDB for published maps containing antigen-binding fragments (Fabs) of any species, at nominal resolutions of ≤4 Å, released after the training data for ModelAngelo was obtained. These results were filtered for maps that included a deposited atomic model and which contained only a single unique monoclonal Fab (of which multiple copies may be bound in the reconstructed antigen complex). The final benchmark includes 164 maps, including Fabs from human, rabbit, mouse, and macaque species. The maps were used as input data for ModelAngelo without user-provided sequences, yielding completely de novo atomic models of both antigen and Fabs.

The output models from ModelAngelo are typically fragmented to varying degrees depending on the local quality of the map. In addition, maps may contain multiple copies of the same unique Fab molecule. We therefore aimed to consolidate all fragments for the built Fabs into a single consensus sequence for the antibody variable domains of the heavy and light chains. We have previously developed the software Stitch, which performs assembly of LC-MS/MS derived de novo peptide reads into the correct framework of the heavy and light chains by alignment to germline-template sequences from the ImMunoGeneTics (IMGT) database (Manso et al., 2022; Lefranc, 2014; Lefranc et al., 2015). We adapted Stitch to use models from ModelAngelo (or any mmCIF file) as input data, extract the amino acid sequences, and perform the same template-based assembly with the resulting reads. Of the 164 input maps, 141 and 144 yielded a non-zero alignment score for the heavy and light chains in the Stitch result, respectively. These add up to a total of 152 maps which were analyzed further to evaluate the sequence accuracy of this approach (including 134 maps with a non-zero alignment score for both heavy and light chain).

Results for one of the top-scoring entries in the benchmark, an Influenza B virus neuraminidase tetramer in complex with four identical copies of a neutralizing Fab (Momont et al., 2023), are shown in Figure 2. The consensus sequences generated in Stitch have complete coverage of both heavy and light chain variable domains, including both CDRL3 and CDRH3. The de novo determined sequence is 84% and 86% identical to the true heavy and light chain sequences, respectively. The resulting error rate of approximately 15% exceeds the typical levels of somatic hypermutation observed in mature antibody sequences, which are on the order of 1–10%. While the derived sequence should therefore not be taken at face value to reconstruct recombinant monoclonal antibodies, we reasoned that the accuracy is nevertheless likely sufficient to correctly infer the corresponding germline V-genes of the mature antibodies.

Figure 2

Determining de novo antibody sequences from cryoEM data with ModelAngelo.

(A) Exemplary map (with top 10% alignment scores of 1076/1174 for HC/LC) from the benchmark dataset, representing an Influenza B virus neuraminidase (NA) in complex with four copies of a neutralizing Fab, at global FSC resolution of 2.3 Å. Shown are the deposited map, model, and the de novo model generated by ModelAngelo, along with a detailed view of CDRH3. (B) Consensus sequences for heavy and light chains as generated by Stitch compared to the true sequences. Sequencing errors are indicated by an asterisk (*).

For each of the 152 maps, we calculated the pairwise sequence identity between the top-scoring V-genes from the ModelAngelo input in Stitch and the top-scoring V-genes from the true sequences (as deposited in the corresponding PDB entry). For reference, we also calculated the pairwise sequence identities of all available V-gene templates per species, reflecting what a completely random draw from the available V-gene repertoire would look like. As shown in Figure 3, the top-scoring V-genes from the ModelAngelo sequences have significantly higher pairwise identities than a random draw from the V-gene repertoire in IMGT for both the heavy and light chains (p<0.0001 in unpaired, two-tailed Kolmogorov-Smirnov tests). Furthermore, the identity of the inferred V-gene with the true sequence scales with the alignment scores in Stitch, making the alignment score a valuable metric for the quality of the V-gene inference. The mean/median V-gene identity in a random draw from the IMGT repertoire is 0.59/0.52 and 0.56/0.53 for the heavy and light chains, respectively. With the ModelAngelo-derived sequences this improves to 0.78/0.87 and 0.75/0.85 for the heavy and light chains in the full dataset. This gradually improves with higher alignment scores to 0.89/1.00 and 0.88/1.00 for heavy and light chains, starting at a cutoff of 80 (representing 92/141 and 80/134 maps for heavy and light chain, respectively). The complete complementarity determining regions CDRH3 and CDRL3 were covered in 66 and 68 maps for the heavy and light chain, respectively (see Figure 4). The length of the complete antigen binding loops was estimated with an average error of 0.5±3.3 and 1.7±6.0 residues for heavy and light chain, respectively, with average sequence identities of 0.63 and 0.41. 
While CDRH3 is the more challenging region in MS-based approaches to antibody sequencing, we believe that the moderately better length and sequence accuracy of CDRH3 compared to CDRL3 in ModelAngelo output reflects the CDRH3’s notoriously tight involvement in antigen binding, hence a greater relative stability in the antibody-antigen complex, resulting in better order in the reconstructed EM density maps. We found the global FSC resolution of the input map to be a poor predictor of both the Stitch alignment score and the inferred V-gene identity, likely because it is dominated by the bulk of the antigen and not representative of the local resolution in the epitope-paratope region as shown in Figure 5. These results demonstrate that candidate V-genes for the antibodies resolved in cryoEM densities can be accurately narrowed down using ModelAngelo and Stitch and that a limited subset of maps contains accurate information on CDR3 sequence and length.
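
As an illustration of the random-draw baseline described above, the sketch below computes all pairwise identities within a set of pre-aligned V-gene templates and summarizes them by mean and median. The template names and sequences are invented toy data, and the identity definition (matches over columns without gaps) is an assumption; the definition used in the actual analysis may differ.

```python
from itertools import combinations
from statistics import mean, median


def aligned_identity(a, b, gap="-"):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def pairwise_identities(templates):
    """Identity of every unordered pair of templates (name -> aligned sequence)."""
    names = sorted(templates)
    return [aligned_identity(templates[p], templates[q]) for p, q in combinations(names, 2)]


# Hypothetical toy templates, NOT real IMGT germline sequences.
TOY_TEMPLATES = {
    "toyV1": "QVQLVQSGAE",
    "toyV2": "QVQLVESGGG",
    "toyV3": "EVQLLESGGG",
}

baseline = pairwise_identities(TOY_TEMPLATES)
print(f"mean={mean(baseline):.2f} median={median(baseline):.2f}")
```

The same `aligned_identity` function can be applied to the inferred versus true V-gene of each map, so that both distributions are measured on the same scale before they are compared.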

Figure 3

V-gene assignment from ModelAngelo data.

(A) Correlation between Stitch alignment score and sequence identity between the top-scoring V-gene of the ModelAngelo vs PDB sequence of the heavy and light chain variable domains, as indicated with the non-parametric Spearman correlation coefficient. (B) Distribution of V-gene sequence identity for progressive alignment score cutoffs, compared to the pairwise V-gene sequence identity in the IMGT repertoire.

Figure 4

Analysis of de novo CDR3 modeling in ModelAngelo-Stitch.

(A) Coverage of CDR3 for the heavy and light chain. CDR3 was counted for coverage if the de novo sequence spanned the flanking cysteine on the V gene and the tryptophan or phenylalanine on the J gene. Proportion of maps with CDR3 coverage in red/blue, maps with missing CDR3 in grey. (B) Difference in length between de novo modelled CDR3 vs. true sequence. (C) Sequence identity of de novo modelled CDR3 vs. true sequence.

Figure 5

Correlation between global FSC resolution and Stitch alignment score (A) or inferred V-gene identity (B).

The non-parametric Spearman correlation coefficient is indicated for heavy and light chain.

In the context of polyclonal antibody mixtures, this analysis suggests that cryoEM densities of antigen-antibody complexes from EMPEM experiments can be leveraged to guide sequence assembly from complementary proteomics-based profiling of the same sample (see Figure 1). In such an experiment, reconstructed cryoEM densities would be used as input data for ModelAngelo, from which the sequences are extracted and run through Stitch to select the top-scoring V-gene and construct a placeholder sequence for CDR3 of both the heavy and light chain. These reconstructed variable domains may then act as templates to guide the assembly of de novo peptides from LC-MS/MS data to improve the accuracy of the candidate sequence.

As a proof-of-principle, we tested this on the monoclonal antibody CR3022, for which both a cryoEM reconstruction and LC-MS/MS data are publicly available (Figure 6). This antibody was isolated from a convalescent survivor of a SARS-CoV infection and targets a cryptic, but conserved epitope at the base of the Spike receptor-binding domain, with cross-neutralization to SARS-CoV-2 (ter Meulen et al., 2006; Yuan et al., 2020). The antibody consists of an IGHV5-51 heavy chain, paired with an IGKV4-1 light chain. When complexed with full-length SARS-CoV-2 Spike, its Fab induces an odd rearrangement of the Spike protomers to yield an antiparallel dimer of S1 subunits in the cryoEM reconstructions (Wrobel et al., 2020; Huo et al., 2020). When using this map as input for ModelAngelo and subsequently Stitch, the IGHV5-51 and IGKV4-1 germline sequences are correctly identified based on their alignment scores. Moreover, complete sequences for CDR3 of the heavy and light chains are built. When using these reconstructed variable domains as templates to guide assembly of the de novo peptide reads from the LC-MS/MS data published by Person and colleagues (Gadush et al., 2022), the final consensus sequences are 96% and 99% identical to the true heavy and light chain respectively. Of note, the only three remaining errors in the six CDR sequences are I/L assignments, which have identical masses and are notoriously challenging for MS-based sequencing.
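
To make the I/L point concrete: leucine and isoleucine are isomers with the same residue mass (about 113.084 Da monoisotopic), so a mass-based read cannot tell them apart. The sketch below scores identity both strictly and with I and L treated as equivalent. The function name and the example sequences are hypothetical and are not taken from the CR3022 data.

```python
# Monoisotopic residue mass shared by Leu and Ile (C6H11NO), in Da.
LEU_ILE_RESIDUE_MASS = 113.08406


def identity(called, truth, merge_il=False):
    """Fraction of matching positions between two equal-length sequences.

    With merge_il=True, I and L are treated as the same residue, reflecting
    that they are indistinguishable by mass alone.
    """
    if len(called) != len(truth):
        raise ValueError("sequences must have equal length")
    if not truth:
        return 0.0
    if merge_il:
        called = called.replace("I", "L")
        truth = truth.replace("I", "L")
    return sum(x == y for x, y in zip(called, truth)) / len(truth)


# Hypothetical toy loop with two swapped I/L assignments.
toy_truth = "ARDILTGY"
toy_called = "ARDLITGY"
print(identity(toy_called, toy_truth))
print(identity(toy_called, toy_truth, merge_il=True))
```

Reporting both numbers separates errors that are inherent to mass-based sequencing from errors that the templates could in principle correct.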

Figure 6

Sequencing CR3022 with integrated cryoEM and LC-MS/MS data.

(A) Shown are the deposited cryoEM map (global FSC resolution 4.1 Å), model, and de novo ModelAngelo output for the CR3022 Fab in complex with the SARS-CoV-2 Spike S1 subunit. The sequences were extracted from the de novo model and used as input for Stitch, resulting in the identification of the indicated V-genes and CDR3 sequences. These variable domains were used as templates in Stitch to assemble the LC-MS/MS derived de novo peptides. (B) Consensus sequences for CR3022 from the integrated cryoEM-MS data in Stitch compared to the true heavy and light chain sequences. Sequencing errors are indicated with an asterisk (*).

For this CR3022 dataset, derived from the monoclonal antibody, the LC-MS/MS data alone assembled with Stitch against the full range of V-genes already yields a similar accuracy of 97% and 98% for the heavy and light chain, respectively. For the case of an EMPEM experiment, the challenge would rather be to correctly assemble the CR3022 sequence against a background of unrelated antibody sequences in a complex mixture. We therefore tested the utility of the ModelAngelo-derived templates for sequence assembly by also mixing input reads from LC-MS/MS data of unrelated antibodies (Figure 7). First, we mixed in a background of whole IgG from a hospitalized COVID-19 patient (Schulte et al., 2022), representing a diffuse polyclonal background in approximately a 1:1 ratio to the target input reads. Second, we mixed in a background of five additional unrelated anti-Influenza-HA monoclonal antibodies from the same study of Person and colleagues (Gadush et al., 2022), amounting to a 5:1 ratio of background to target input reads. In our previous work on sequencing serum-derived antibodies by bottom-up proteomics with Stitch, we found that assembly of the consensus sequences becomes much more tolerant to background data if, beyond the top scoring V-gene, the remaining unrelated V-genes are also included as decoys for the final template matching step (Schulte et al., 2022). The use of these decoy sequences is also included in the comparison here. Furthermore, we also included the use of the true CR3022 sequences as templates, serving as a best-case scenario, positive control. The analysis confirmed that even without ModelAngelo-derived templates, the sequence assembly with Stitch is already tolerant to the diffuse polyclonal background from the whole IgG fraction, yielding a similar accuracy as in the absence of these background peptides. 
By contrast, the sequence accuracy plummets to below 0.6 when the background consists of the five additional monoclonal antibodies in a 5:1 ratio to the target input reads. The accuracy is recovered to >0.95 when using either the true CR3022 sequences or the ModelAngelo-derived templates. There is also a gain in accuracy by using decoy templates with the LC-MS/MS data alone, although smaller compared to the use of ModelAngelo-derived templates. These results demonstrate that ModelAngelo-derived templates are useful for sequence assembly against complex polyclonal backgrounds.

Figure 7

Targeted sequencing of CR3022 against a complex background of other antibodies.

Plotted are the de novo consensus sequence identities derived from the LC-MS/MS data using either the true sequences, the full IMGT repertoire, or the ModelAngelo-derived variable domains as templates. We compare the output from the CR3022 dataset alone (‘No backgr.’) with the output after adding either a diffuse polyclonal IgG background from a COVID-19 patient (‘+whole IgG’) or full datasets from five additional anti-Influenza-HA monoclonals (‘+5 mAbs’). Use of decoy sequences as indicated by dark/light colors.

Figure 8

Inferring V-genes from published EMPEM data.

(A) Plotted are all non-zero alignment scores in Stitch from published EMPEM maps. (B) Views of the variable domains of EMPEM maps with alignment scores >50 for both heavy and light chains. The EMDB identifiers are indicated at each panel.

We have demonstrated that cryoEM reconstructions of monoclonal antigen-antibody complexes may contain sufficient information to accurately narrow down candidate V-genes and that this can be integrated with proteomics data to improve the accuracy of candidate sequences. We also evaluated whether EMPEM data is indeed of sufficient quality to infer V-genes from automated de novo modeling in the maps (Figure 8). We downloaded all 46 published EMPEM maps from EMDB, of which 23 were of sufficient quality to give a non-zero alignment score when using the ModelAngelo results in Stitch (see Supplementary file 2). Analysis of the benchmark of monoclonal antigen-antibody complexes showed that the quality of the V-gene inference scales with the alignment scores. In this set of EMPEM reconstructions, the alignment scores range from ca. 20–300. Of the 23 maps, 8 have alignment scores above 50 for both heavy and light chain, at which point we estimate the mean/median V-gene identity to be approximately 0.85/0.90. This analysis shows that experimental EMPEM studies may yield sufficiently detailed reconstructions of the antigen-antibody complexes to narrow down the candidate V-genes of the resolved Fabs accurately.

Discussion

The development of EMPEM and MS-based polyclonal antibody sequencing now makes it possible to profile the antigen-specific antibody repertoire straight from the secreted pool of antibodies in bodily fluids. This approach bridges the gap between established single B-cell sequencing approaches and serological assays to probe binding and neutralization titers. While sample complexity remains an important bottleneck, and questions remain about the dynamic range of the true serum antibody repertoire and the depth of coverage from these novel experimental approaches, several studies have recently reached the important milestone of reconstructing functional antibodies from direct measurements of the secreted serum components (Fridy et al., 2014; Bondt et al., 2024; Peng et al., 2023a; Bondt et al., 2021; Sousa et al., 2012; Rickert et al., 2016; Guthals et al., 2017; Castellana et al., 2011; Antanasijevic et al., 2022a; Ferguson et al., 2024). The present work demonstrates that epitope and sequence information can be integrated using ModelAngelo and Stitch. This approach holds promise to better understand the serum compartment of the antibody repertoire. Use of the cryoEM data in these workflows complements the MS data beyond epitope mapping in several significant ways. First, we demonstrated that narrowing down the candidate V-genes improves the tolerance of LC-MS/MS-derived peptide sequence assembly to background in a complex antibody mixture. Second, heavy-light chain pairing is a problematic blind spot of proteomics sequencing, as the antibodies are denatured and digested as part of the sample workup. In contrast, this pairing is trivial in cryoEM data as the chains are in direct contact in the resolved Fabs in the map. 
Finally, reconstruction of CDRH3 is especially challenging with proteomics data alone, as it spans the junction between the recombined V-, D-, and J-segments, of which the D-segment is short and hypervariable to a point that germline sequences do not provide a functional template for sequence assembly. While CDRH3 coverage is also limited in ModelAngelo data, it can be built in many cases and a correct estimate of CDRH3 length is already useful to guide assembly of the de novo peptide reads.

Here, we have extracted the flat sequences from ModelAngelo output to use as input for the template matching step in Stitch, using a modified Smith-Waterman Alignment. We believe the template matching step in Stitch could be further improved in several ways. First, ModelAngelo has a built-in database search based on HMMer using profile Hidden Markov Models (HMM) encoding the amino acid probabilities across the alphabet at each position (Mistry et al., 2013). While the ModelAngelo search currently does not consolidate the fragmented output models into a single search to build consensus sequences, we anticipate that implementing a similar HMM profile search in Stitch may further improve the V-gene inference. Indeed, a recent report by Ward and colleagues demonstrates that ModelAngelo and its HMMer search could be implemented to correlate high-resolution EMPEM data with B-cell receptor mRNA sequencing data to speed up and improve success rates of antibody discovery compared to previously reported data analysis pipelines (Ferguson et al., 2024). In addition, from the benchmark dataset analyzed here we might learn what sequencing mistakes are common in ModelAngelo data. This could be used to adjust the conventional Smith-Waterman Alignment in Stitch accordingly, analogous to recent improvements in the alignment algorithm we implemented for MS data (Schulte and Snijder, 2024). Finally, the template matching step in Stitch is now solely based on the sequence of the ModelAngelo models, but we may expand this to a structure-based alignment to better place the error-prone sequence reads in the correct framework of the Ig-domains.
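
For orientation, the textbook Smith-Waterman local alignment on which the template matching builds can be sketched as follows. This is not the modified algorithm implemented in Stitch: the flat match, mismatch, and gap scores are placeholder assumptions, and the function name is hypothetical.

```python
def smith_waterman(read, template, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_read, aligned_template) for the best local alignment."""
    rows, cols = len(read) + 1, len(template) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic programming matrix; scores are floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if read[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in the template
                score[i][j - 1] + gap,      # gap in the read
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    out_read, out_template = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if read[i - 1] == template[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_read.append(read[i - 1])
            out_template.append(template[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_read.append(read[i - 1])
            out_template.append("-")
            i -= 1
        else:
            out_read.append("-")
            out_template.append(template[j - 1])
            j -= 1
    return best, "".join(reversed(out_read)), "".join(reversed(out_template))


# Toy read placed on a toy template fragment (illustrative sequences only).
print(smith_waterman("SGAE", "QVQLVQSGAEVKK"))
```

Adjusting the alignment to common ModelAngelo mistakes, as proposed above, would amount to replacing the flat `match`/`mismatch` values with a substitution table learned from the benchmark.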

Next steps in our efforts are to bring together the EMPEM work and MS-based polyclonal antibody sequencing on antigen-Fab complexes purified from patient serum. The goal is to reconstruct functional monoclonal antibodies from these analyses as ultimate proof for the accuracy of the derived antibody sequences and then start working on the throughput and depth of coverage in the repertoire. We believe that both EMPEM and MS-based polyclonal antibody sequencing are still limited to the top 1–10 antibodies in the polyclonal mixture. The EMPEM approach is biased towards bigger and well-ordered target antigens, which calls for additional complementary approaches like HDX-MS for a comprehensive polyclonal epitope mapping exercise. Bringing together these perspectives in an integrated structural biology approach promises new insights into the serum compartment of the antibody repertoire to better understand the coevolutionary processes of antibody maturation and antigenic drift.

Materials and methods

Benchmark of monoclonal antibody-antigen pairs and EMPEM maps from EMDB

We searched the EMDB for published maps containing antigen-binding fragments (Fabs) of any species, at nominal resolutions of ≤4 Ångström, released after the training data for ModelAngelo was obtained (April 1st 2022). The search was performed on February 11th 2023 at https://www.ebi.ac.uk/emdb/ using the search term ‘antibody fab resolution:[* TO 4] AND release_date:[2022-03-31T00:00:00Z TO 2023-11-02T00:00:00Z] sample_name:*fab* fitted_pdbs:[* TO *]’. These results were filtered for maps that included a deposited atomic model and which contained only a single unique monoclonal Fab (of which multiple copies may be bound in the reconstructed antigen complex). When multiple redundant maps from the same study were included, we selected the single representative map of the highest quality, based on manual inspection. Global FSC resolution was not a good indicator as the maps were often dominated by the bound antigen, which may be better resolved than the bound antibody. Typically, the selected map was the focused/local refinement around the epitope-paratope region, despite its lower nominal resolution. A full overview of the selected maps is provided in Supplementary file 1. The EMPEM maps were compiled based on a literature survey, complemented with a search of the EMDB using the term ‘polyclonal’. A full overview of the selected EMPEM maps is provided in Supplementary file 2.

Changes made to Stitch to take CIF input

Stitch was extended to allow mmCIF files as input. From these files all chains were extracted as separate amino acid sequences to align in Stitch. ModelAngelo writes a per-residue confidence score to the B-factor column of its mmCIF output, which was used as a local confidence for the sequence. First, each polypeptide is assigned an Average Local Confidence (ALC) score based on the average across all residues, which can be used as an input filter on the data (along with polypeptide length). Second, the local confidence is used as a weight in determining the consensus sequence of overlapping polypeptides, following assembly in Stitch.
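
The two uses of the local confidence described above can be sketched as follows. This is a simplified illustration, not the Stitch implementation: reads are assumed to be already placed without gaps at a known offset on the template, confidences are assumed to lie on a 0–100 scale, and all function and parameter names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_local_confidence(confidences):
    """ALC of one polypeptide: the mean of its per-residue confidence scores."""
    return mean(confidences)


def weighted_consensus(placed_reads, length, min_alc=80.0, min_length=5, unknown="X"):
    """Confidence-weighted consensus over gapless, pre-placed reads.

    placed_reads: iterable of (offset, sequence, confidences) tuples, where
    offset is the template position of the first residue of the read.
    """
    votes = [defaultdict(float) for _ in range(length)]
    for offset, sequence, confidences in placed_reads:
        # Input filter on polypeptide length and ALC.
        if len(sequence) < min_length or average_local_confidence(confidences) < min_alc:
            continue
        # Each residue votes for its amino acid with its local confidence as weight.
        for k, (residue, confidence) in enumerate(zip(sequence, confidences)):
            position = offset + k
            if 0 <= position < length:
                votes[position][residue] += confidence
    return "".join(max(v, key=v.get) if v else unknown for v in votes)


# Toy fragments (illustrative only).
toy_reads = [
    (0, "QVQLV", [90, 90, 90, 90, 90]),
    (2, "QLVQS", [95, 95, 60, 95, 95]),
    (4, "AQSGA", [99, 99, 99, 99, 99]),
    (2, "ELVES", [50, 50, 50, 50, 50]),  # dropped: ALC below cutoff
    (5, "QSG", [99, 99, 99]),            # dropped: too short
]
print(weighted_consensus(toy_reads, length=9))
```

At position 4 of this toy example, two reads supporting V (combined weight 150) outvote a single high-confidence A (weight 99), which shows how overlapping copies of the same Fab can compensate for local errors in an individual fragment.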

Analysis of monoclonal antigen-antibody and EMPEM benchmarks

Software used in this project was curated by SBGrid (Morin et al., 2013). The deposited model mmCIF files and EM maps for the full benchmark were downloaded and run with ModelAngelo (version 1.01) in ‘build_no_seq’ mode. For each entry in the benchmark, the deposited mmCIF was run with Stitch (version 1.5.0-rc.1+6d3b540) using CutoffALC 80, minimum length 5, and TemplateMatching CutoffScore 8. The same was done for each mmCIF file produced by ModelAngelo, but with an additional segment containing the antigen and Ig constant domain template sequences. From these runs, the consensus sequence and highest-scoring germline for IGHV and IGLV (lambda + kappa) were retrieved. For each consensus sequence, the CDR3 was determined if the flanking cysteine on the V gene and the tryptophan or phenylalanine on the J gene were present. As these conserved residues were not all positioned correctly in the IMGT database, the data were manually corrected based on the same rules. The results of the Stitch runs on the deposited and ModelAngelo-produced models were compared to determine the identity between the consensus sequences, the distance between the inferred germlines, and the identity between the CDR3s. The script used for this analysis is deposited as Supplementary Data. The results generated by this analysis are included in Supplementary file 1. The EMPEM benchmark was downloaded and run through ModelAngelo and subsequently Stitch with the same parameters as above. In contrast to the monoclonal benchmark, the ground-truth sequences of these antibodies are not known.
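
The CDR3 rule above (a flanking cysteine from the V gene and a tryptophan or phenylalanine from the J gene) can be sketched in Python as follows. This is an illustrative heuristic, not the deposited analysis script: the function name is hypothetical, and locating the J-gene residue through a W/F-G-X-G motif and taking the last cysteine before it are assumptions that can fail, for example when the CDR3 itself contains a cysteine.

```python
import re

# Assumed J-region anchor: conserved W or F followed by G-X-G.
J_MOTIF = re.compile(r"[FW]G.G")


def find_cdr3(variable_domain):
    """Return the residues between the conserved C and the conserved W/F.

    Returns None when either flanking residue cannot be located, mirroring
    the rule that the CDR3 is only determined if both are present.
    """
    matches = list(J_MOTIF.finditer(variable_domain))
    if not matches:
        return None
    j_anchor = matches[-1].start()  # position of the conserved W or F
    cysteine = variable_domain.rfind("C", 0, j_anchor)
    if cysteine == -1:
        return None
    return variable_domain[cysteine + 1 : j_anchor]


# Hypothetical heavy-chain fragment used only to exercise the function.
print(find_cdr3("DTAVYYCARDRGYSSGWYFDYWGQGTLVTVSS"))
```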

Analysis of CR3022 data

The EM data for CR3022 were downloaded from EMDB (EMD-11648) and run with ModelAngelo (version 1.01) in ‘build_no_seq’ mode. The raw data for monoclonal antibodies CR3022, 107, 1028, 2771, 3576, and 3634 from PRIDE PXD030094 were downloaded and analyzed with PEAKS 10+. The five antibodies other than CR3022 were chosen because they are the most distant sequences from CR3022 and therefore present the biggest challenge to sequence assembly in Stitch (version 1.5.0-rc.1+6d3b540). The PEAKS analysis for the COVID-19 data from PRIDE PXD031941 was downloaded. Three sets of input were prepared: ‘no background’, consisting of only the CR3022 data; ‘+whole IgG’, consisting of the CR3022 and the COVID-19 data; and ‘+5 mAbs’, consisting of all mAbs from the CR3022 study. Three Stitch configurations were prepared: ‘True’, using the known CR3022 sequence as template as retrieved from PDB 7A5R; ‘IMGT’, using the conventional configuration of Stitch with all IMGT germlines as templates; and ‘MA’, using the closest V gene germline to the ModelAngelo consensus sequence of CR3022 together with the CDR3 sequence present. Each of these configurations was run with the Stitch Recombine Decoy option off and on. With the option on, the full IMGT germline database was added for ‘True’ and ‘MA’, whereas for ‘IMGT’ the Stitch parameter Decoy in Recombine was set, allowing any unused germline from the Template Matching step to be matched in the Recombine step. For all Stitch runs, the CutoffALC was 90 and the TemplateMatching CutoffScore 10. The resulting consensus sequences from these 18 Stitch runs were then compared with the known CR3022 sequence to determine the identity. The full script used for this analysis is included in the deposited Supplementary Data.
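
The 18 Stitch runs follow from crossing the three input sets, the three template configurations, and the two decoy settings. The Python sketch below enumerates that design. The dictionary keys and variable names are hypothetical and chosen for illustration, and the sketch only lists the runs without invoking Stitch.

```python
from itertools import product

INPUT_SETS = ("no background", "+whole IgG", "+5 mAbs")
TEMPLATE_CONFIGS = ("True", "IMGT", "MA")
DECOY_SETTINGS = (False, True)

# One entry per Stitch run; the cutoffs are the values stated for all runs.
RUNS = [
    {
        "input": input_set,
        "templates": templates,
        "decoy": decoy,
        "cutoff_alc": 90,
        "template_matching_cutoff_score": 10,
    }
    for input_set, templates, decoy in product(
        INPUT_SETS, TEMPLATE_CONFIGS, DECOY_SETTINGS
    )
]

for run in RUNS:
    print(run["input"], run["templates"], "decoy on" if run["decoy"] else "decoy off")
```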

Data availability

Stitch is available at https://github.com/snijderlab/stitch (copy archived at Schulte et al., 2024). The CR3022 and COVID-19 whole IgG LC-MS/MS data were taken from PRIDE Archive (https://www.ebi.ac.uk/pride/archive/) via the PRIDE partner repository with the data set identifiers PXD030094 and PXD031941, respectively. All ModelAngelo and Stitch results, including a script to reproduce the full analysis, are made available on Zenodo under https://zenodo.org/records/12207014.

The following data sets were generated

(2024) Zenodo
Supplementary data - Simultaneous polyclonal antibody sequencing and epitope mapping by cryo electron microscopy and mass spectrometry - a perspective.

https://doi.org/10.5281/zenodo.12207014

The following previously published data sets were used

1. Faull P
2. Person M
(2022) PRIDE
ID PXD030094. De Novo Sequencing of SARS-CoV-2 and influenza monoclonal antibodies by mass spectrometry using HCD and EThcD fragmentation and Supernovo software.

https://www.ebi.ac.uk/pride/archive/projects/PXD030094
1. Snijder J
(2022) PRIDE
ID PXD031941. Template-based assembly of proteomic short reads for de novo antibody sequencing and repertoire profiling.

https://www.ebi.ac.uk/pride/archive/projects/PXD031941

References

(2020) B cell memory: building two walls of protection against pathogens
Nature Reviews. Immunology 20:229–238.

https://doi.org/10.1038/s41577-019-0244-2
1. Antanasijevic A
2. Bowman CA
3. Kirchdoerfer RN
4. Cottrell CA
5. Ozorowski G
6. Upadhyay AA
7. Cirelli KM
8. Carnathan DG
9. Enemuo CA
10. Sewall LM
11. Nogal B
12. Zhao F
13. Groschel B
14. Schief WR
15. Sok D
16. Silvestri G
17. Crotty S
18. Bosinger SE
19. Ward AB
(2022a) From structure to sequence: Antibody discovery using cryoEM
Science Advances 8:eabk2039.

https://doi.org/10.1126/sciadv.abk2039
(2022b) High-resolution structural analysis of enterovirus-reactive polyclonal antibodies in complex with whole virions
PNAS Nexus 1:pgac253.

https://doi.org/10.1093/pnasnexus/pgac253
1. Bandeira N
2. Pham V
3. Pevzner P
4. Arnott D
5. Lill JR
(2008) Automated de novo protein sequencing of monoclonal antibodies
Nature Biotechnology 26:1336–1338.

https://doi.org/10.1038/nbt1208-1336
1. Bangaru S
2. Antanasijevic A
3. Kose N
4. Sewall LM
5. Jackson AM
6. Suryadevara N
7. Zhan X
8. Torres JL
9. Copps J
10. de la Peña AT
11. Crowe JE Jr
12. Ward AB
(2022) Structural mapping of antibody landscapes to human betacoronavirus spike proteins
Science Advances 8:eabn2911.

https://doi.org/10.1126/sciadv.abn2911
1. Bondt A
2. Hoek M
3. Tamara S
4. de Graaf B
5. Peng W
6. Schulte D
7. van Rijswijck DMH
8. den Boer MA
9. Greisch JF
10. Varkila MRJ
11. Snijder J
12. Cremer OL
13. Bonten MJM
14. Heck AJR
(2021) Human plasma IgG1 repertoires are simple, unique, and dynamic
Cell Systems 12:1131–1143.

https://doi.org/10.1016/j.cels.2021.08.008
1. Bondt A
2. Hoek M
3. Dingess K
4. Tamara S
5. de Graaf B
6. Peng W
7. den Boer MA
8. Damen M
9. Zwart C
10. Barendregt A
11. van Rijswijck DMH
12. Schulte D
13. Grobben M
14. Tejjani K
15. van Rijswijk J
16. Völlmy F
17. Snijder J
18. Fortini F
19. Papi A
20. Volta CA
21. Campo G
22. Contoli M
23. van Gils MJ
24. Spadaro S
25. Rizzo P
26. Heck AJR
(2024) Into the dark serum proteome: personalized features of IgG1 and IgA1 repertoires in severe COVID-19 patients
Molecular & Cellular Proteomics 23:100690.

https://doi.org/10.1016/j.mcpro.2023.100690
1. Bonilla FA
2. Oettgen HC
(2010) Adaptive immunity
The Journal of Allergy and Clinical Immunology 125:S33–S40.

https://doi.org/10.1016/j.jaci.2009.09.017
1. Boyoglu-Barnum S
2. Ellis D
3. Gillespie RA
4. Hutchinson GB
5. Park YJ
6. Moin SM
7. Acton OJ
8. Ravichandran R
9. Murphy M
10. Pettie D
11. Matheson N
12. Carter L
13. Creanga A
14. Watson MJ
15. Kephart S
16. Ataca S
17. Vaile JR
18. Ueda G
19. Crank MC
20. Stewart L
21. Lee KK
22. Guttman M
23. Baker D
24. Mascola JR
25. Veesler D
26. Graham BS
27. King NP
28. Kanekiyo M
(2021) Quadrivalent influenza nanoparticle vaccines induce broad protection
Nature 592:623–628.

https://doi.org/10.1038/s41586-021-03365-x
1. Burton DR
(2023) Antiviral neutralizing antibodies: from in vitro to in vivo activity
Nature Reviews. Immunology 23:720–734.

https://doi.org/10.1038/s41577-023-00858-w
1. Castellana NE
2. McCutcheon K
3. Pham VC
4. Harden K
5. Nguyen A
6. Young J
7. Adams C
8. Schroeder K
9. Arnott D
10. Bafna V
11. Grogan JL
12. Lill JR
(2011) Resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody
PROTEOMICS 11:395–405.

https://doi.org/10.1002/pmic.201000487
1. Chang L
2. Wang F
3. Connolly K
4. Meng H
5. Su Z
6. Cvirkaite-Krupovic V
7. Krupovic M
8. Egelman EH
9. Si DD-I
(2022) DeepTracer-ID: De novo protein identification from cryo-EM maps
Biophysical Journal 121:2840–2848.

https://doi.org/10.1016/j.bpj.2022.06.025
1. Cheung WC
2. Beausoleil SA
3. Zhang X
4. Sato S
5. Schieferl SM
6. Wieler JS
7. Beaudet JG
8. Ramenani RK
9. Popova L
10. Comb MJ
11. Rush J
12. Polakiewicz RD
(2012) A proteomics approach for the identification and cloning of monoclonal antibodies from serum
Nature Biotechnology 30:447–452.

https://doi.org/10.1038/nbt.2167
(2022) findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM
IUCrJ 9:86–97.

https://doi.org/10.1107/S2052252521011088
Preprint
1. Cingolani G
2. Iglesias S
3. Hou CF
4. Lemire S
5. Soriaga A
6. Kyme P
(2024) Cryo-EM analysis of Pseudomonas phage Pa193 structural components
Research Square.

https://doi.org/10.21203/rs.3.rs-4189479/v1
1. de Graaf SC
2. Hoek M
3. Tamara S
4. Heck AJR
(2022) A perspective toward mass spectrometry-based de novo sequencing of endogenous antibodies
mAbs 14:2079449.

https://doi.org/10.1080/19420862.2022.2079449
1. Dingens AS
2. Pratap P
3. Malone K
4. Hilton SK
5. Ketas T
6. Cottrell CA
7. Overbaugh J
8. Moore JP
9. Klasse PJ
10. Ward AB
11. Bloom JD
(2021) High-resolution mapping of the neutralizing and binding specificities of polyclonal sera post-HIV Env trimer vaccination
eLife 10:e64281.

https://doi.org/10.7554/eLife.64281
1. Dupré M
2. Duchateau M
3. Sternke-Hoffmann R
4. Boquoi A
5. Malosse C
6. Fenk R
7. Haas R
8. Buell AK
9. Rey M
10. Chamot-Rooke J
(2021) De Novo sequencing of antibody light chain proteoforms from patients with multiple myeloma
Analytical Chemistry 93:10627–10634.

https://doi.org/10.1021/acs.analchem.1c01955
1. Ferguson JA
2. Raghavan SSR
3. Peña Alzua G
4. Bhavsar D
5. Huang J
6. Rodriguez AJ
7. Torres JL
8. Bottermann M
9. Han J
10. Krammer F
11. Batista FD
12. Ward AB
(2024) Functional and epitope specific monoclonal antibody discovery directly from immune sera using cryoEM
bioRxiv.

https://doi.org/10.1101/2024.12.06.627063
1. Fianu I
2. Ochmann M
3. Walshe JL
4. Dybkov O
5. Cruz JN
6. Urlaub H
7. Cramer P
(2024) Structural basis of Integrator-dependent RNA polymerase II termination
Nature 629:219–227.

https://doi.org/10.1038/s41586-024-07269-4
1. Fridy PC
2. Li Y
3. Keegan S
4. Thompson MK
5. Nudelman I
6. Scheid JF
7. Oeffinger M
8. Nussenzweig MC
9. Fenyö D
10. Chait BT
11. Rout MP
(2014) A robust pipeline for rapid production of versatile nanobody repertoires
Nature Methods 11:1253–1260.

https://doi.org/10.1038/nmeth.3170
(2022) Template-assisted De Novo sequencing of SARS-CoV-2 and influenza monoclonal antibodies by mass spectrometry
Journal of Proteome Research 21:1616–1627.

https://doi.org/10.1021/acs.jproteome.1c00913
(2014) The promise and challenge of high-throughput sequencing of the antibody repertoire
Nature Biotechnology 32:158–168.

https://doi.org/10.1038/nbt.2782
1. Grauslund LR
2. Ständer S
3. Veggi D
4. Andreano E
5. Rand KD
6. Norais N
(2024) Epitope mapping of human polyclonal antibodies to the fHbp antigen of a neisseria meningitidis vaccine by hydrogen-deuterium exchange mass spectrometry (HDX-MS)
Molecular & Cellular Proteomics 23:100734.

https://doi.org/10.1016/j.mcpro.2024.100734
1. Gui M
2. Ma M
3. Sze-Tu E
4. Wang X
5. Koh F
6. Zhong ED
7. Berger B
8. Davis JH
9. Dutcher SK
10. Zhang R
11. Brown A
(2021) Structures of radial spokes and associated complexes important for ciliary motility
Nature Structural & Molecular Biology 28:29–37.

https://doi.org/10.1038/s41594-020-00530-0
1. Gui M
2. Wang X
3. Dutcher SK
4. Brown A
5. Zhang R
(2022) Ciliary central apparatus structure reveals mechanisms of microtubule patterning
Nature Structural & Molecular Biology 29:483–492.

https://doi.org/10.1038/s41594-022-00770-2
1. Guthals A
2. Gan Y
3. Murray L
4. Chen Y
5. Stinson J
6. Nakamura G
7. Lill JR
8. Sandoval W
9. Bandeira N
(2017) De Novo MS/MS sequencing of native human antibodies
Journal of Proteome Research 16:45–54.

https://doi.org/10.1021/acs.jproteome.6b00608
1. Han J
2. Schmitz AJ
3. Richey ST
4. Dai YN
5. Turner HL
6. Mohammed BM
7. Fremont DH
8. Ellebedy AH
9. Ward AB
(2021) Polyclonal epitope mapping reveals temporal dynamics and diversity of human antibody responses to H5N1 vaccination
Cell Reports 34:108682.

https://doi.org/10.1016/j.celrep.2020.108682
(2023) Co-evolution of immunity and seasonal influenza viruses
Nature Reviews. Microbiology 21:805–817.

https://doi.org/10.1038/s41579-023-00945-8
1. Ho CM
2. Li X
3. Lai M
4. Terwilliger TC
5. Beck JR
6. Wohlschlegel J
7. Goldberg DE
8. Fitzpatrick AWP
9. Zhou ZH
(2020) Bottom-up structural proteomics: cryoEM of protein complexes enriched from the cellular milieu
Nature Methods 17:79–85.

https://doi.org/10.1038/s41592-019-0637-y
1. Hugener J
2. Xu J
3. Wettstein R
4. Ioannidi L
5. Velikov D
6. Wollweber F
7. Henggeler A
8. Matos J
9. Pilhofer M
(2024) FilamentID reveals the composition and function of metabolic enzyme polymers during gametogenesis
Cell 187:3303–3318.

https://doi.org/10.1016/j.cell.2024.04.026
1. Huo J
2. Zhao Y
3. Ren J
4. Zhou D
5. Duyvesteyn HME
6. Ginn HM
7. Carrique L
8. Malinauskas T
9. Ruza RR
10. Shah PNM
11. Tan TK
12. Rijal P
13. Coombes N
14. Bewley KR
15. Tree JA
16. Radecke J
17. Paterson NG
18. Supasa P
19. Mongkolsapaya J
20. Screaton GR
21. Carroll M
22. Townsend A
23. Fry EE
24. Owens RJ
25. Stuart DI
(2020) Neutralization of SARS-CoV-2 by destruction of the prefusion spike
Cell Host & Microbe 28:445–454.

https://doi.org/10.1016/j.chom.2020.06.010
1. Inoue T
2. Kurosaki T
(2024) Memory B cells
Nature Reviews. Immunology 24:5–17.

https://doi.org/10.1038/s41577-023-00897-3
1. Jamali K
2. Käll L
3. Zhang R
4. Brown A
5. Kimanius D
6. Scheres SHW
(2024) Automated model building and protein identification in cryo-EM maps
Nature 628:450–457.

https://doi.org/10.1038/s41586-024-07215-4
1. Jiang YX
2. Cao Q
3. Sawaya MR
4. Abskharon R
5. Ge P
6. DeTure M
7. Dickson DW
8. Fu JY
9. Ogorzalek Loo RR
10. Loo JA
11. Eisenberg DS
(2022) Amyloid fibrils in FTLD-TDP are composed of TMEM106B and not TDP-43
Nature 605:304–309.

https://doi.org/10.1038/s41586-022-04670-9
(2020) B cell activation and response regulation during viral infections
Viral Immunology 33:294–306.

https://doi.org/10.1089/vim.2019.0207
(2015) Next-generation sequencing and protein mass spectrometry for the comprehensive analysis of human cellular and serum antibody repertoires
Current Opinion in Chemical Biology 24:112–120.

https://doi.org/10.1016/j.cbpa.2014.11.007
1. Lefranc MP
(2014) Immunoglobulin and T cell receptor genes: IMGT and the birth and rise of immunoinformatics
Frontiers in Immunology 5:22.

https://doi.org/10.3389/fimmu.2014.00022
(2015) IMGT, the international ImMunoGeneTics information system 25 years on
Nucleic Acids Research 43:D413–D422.

https://doi.org/10.1093/nar/gku1056
1. Leung MR
2. Zeng J
3. Wang X
4. Roelofs MC
5. Huang W
6. Zenezini Chiozzi R
7. Hevler JF
8. Heck AJR
9. Dutcher SK
10. Brown A
11. Zhang R
12. Zeev-Ben-Mordehai T
(2023) Structural specializations of the sperm tail
Cell 186:2880–2896.

https://doi.org/10.1016/j.cell.2023.05.026
1. Manso T
2. Folch G
3. Giudicelli V
4. Jabado-Michaloud J
5. Kushwaha A
6. Nguefack Ngoune V
7. Georga M
8. Papadaki A
9. Debbagh C
10. Pégorier P
11. Bertignac M
12. Hadi-Saljoqi S
13. Chentli I
14. Cherouali K
15. Aouinti S
16. El Hamwi A
17. Albani A
18. Elazami Elhassani M
19. Viart B
20. Goret A
21. Tran A
22. Sanou G
23. Rollin M
24. Duroux P
25. Kossida S
(2022) IMGT databases, related tools and web resources through three main axes of research and development
Nucleic Acids Research 50:D1262–D1272.

https://doi.org/10.1093/nar/gkab1136
1. Marks C
2. Deane CM
(2020) How repertoire data are changing antibody science
The Journal of Biological Chemistry 295:9823–9837.

https://doi.org/10.1074/jbc.REV120.010181
1. Meng W
2. Zhang B
3. Schwartz GW
4. Rosenfeld AM
5. Ren D
6. Thome JJC
7. Carpenter DJ
8. Matsuoka N
9. Lerner H
10. Friedman AL
11. Granot T
12. Farber DL
13. Shlomchik MJ
14. Hershberg U
15. Luning Prak ET
(2017) An atlas of B-cell clonal distribution in the human body
Nature Biotechnology 35:879–884.

https://doi.org/10.1038/nbt.3942
1. Mistry J
2. Finn RD
3. Eddy SR
4. Bateman A
5. Punta M
(2013) Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions
Nucleic Acids Research 41:e121.

https://doi.org/10.1093/nar/gkt263
1. Momont C
2. Dang HV
3. Zatta F
4. Hauser K
5. Wang C
6. di Iulio J
7. Minola A
8. Czudnochowski N
9. De Marco A
10. Branch K
11. Donermeyer D
12. Vyas S
13. Chen A
14. Ferri E
15. Guarino B
16. Powell AE
17. Spreafico R
18. Yim SS
19. Balce DR
20. Bartha I
21. Meury M
22. Croll TI
23. Belnap DM
24. Schmid MA
25. Schaiff WT
26. Miller JL
27. Cameroni E
28. Telenti A
29. Virgin HW
30. Rosen LE
31. Purcell LA
32. Lanzavecchia A
33. Snell G
34. Corti D
35. Pizzuto MS
(2023) A pan-influenza antibody inhibiting neuraminidase via receptor mimicry
Nature 618:590–597.

https://doi.org/10.1038/s41586-023-06136-y
1. Morin A
2. Eisenbraun B
3. Key J
4. Sanschagrin PC
5. Timony MA
6. Ottaviano M
7. Sliz P
(2013) Collaboration gets the most out of software
eLife 2:e01456.

https://doi.org/10.7554/eLife.01456
1. Nogal B
2. Bianchi M
3. Cottrell CA
4. Kirchdoerfer RN
5. Sewall LM
6. Turner HL
7. Zhao F
8. Sok D
9. Burton DR
10. Hangartner L
11. Ward AB
(2020) Mapping polyclonal antibody responses in non-human primates vaccinated with HIV env trimer subunit vaccines
Cell Reports 30:3755–3765.

https://doi.org/10.1016/j.celrep.2020.02.061
(2021) Mass spectrometry-based de novo sequencing of monoclonal antibodies using multiple proteases and a dual fragmentation scheme
Journal of Proteome Research 20:3559–3566.

https://doi.org/10.1021/acs.jproteome.1c00169
1. Peng W
2. den Boer MA
3. Tamara S
4. Mokiem NJ
5. van der Lans SPA
6. Bondt A
7. Schulte D
8. Haas P-J
9. Minnema MC
10. Rooijakkers SHM
11. van Zuilen AD
12. Heck AJR
13. Snijder J
(2023a) Direct mass spectrometry-based detection and antibody sequencing of monoclonal gammopathy of undetermined significance from patient serum: a case study
Journal of Proteome Research 22:3022–3028.

https://doi.org/10.1021/acs.jproteome.3c00330
Preprint
1. Peng W
2. Giesbers KCAP
3. Šiborová M
4. Beugelink JW
5. Pronker MF
6. Schulte D
7. Hilkens J
8. Janssen BJC
9. Strijbis K
10. Snijder J
(2023b) Reverse engineering the anti-MUC1 hybridoma antibody 139H2 by mass spectrometry-based de Novo sequencing
bioRxiv.

https://doi.org/10.1101/2023.07.05.547778
Preprint
1. Peng W
2. Šiborová M
3. Wu X
4. Du W
5. Schulte D
6. Pronker MF
7. de Haan CAM
8. Snijder J
(2025) Structural basis for postfusion-specific binding to respiratory syncytial virus f protein by the canonical antigenic site I antibody 131-2a
bioRxiv.

https://doi.org/10.1101/2025.01.04.631317
(2013) B-cell biology and development
The Journal of Allergy and Clinical Immunology 131:959–971.

https://doi.org/10.1016/j.jaci.2013.01.046
1. Rees AR
(2020) Understanding the human antibody repertoire
mAbs 12:1729683.

https://doi.org/10.1080/19420862.2020.1729683
1. Rickert KW
2. Grinberg L
3. Woods RM
4. Wilson S
5. Bowen MA
6. Baca M
(2016) Combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies
mAbs 8:501–512.

https://doi.org/10.1080/19420862.2016.1145865
(2007) The molecular sociology of the cell
Nature 450:973–982.

https://doi.org/10.1038/nature06523
1. Savidor A
2. Barzilay R
3. Elinger D
4. Yarden Y
5. Lindzen M
6. Gabashvili A
7. Adiv Tal O
8. Levin Y
(2017) Database-independent protein sequencing (DiPS) enables full-length de Novo protein and antibody sequence determination
Molecular & Cellular Proteomics 16:1151–1161.

https://doi.org/10.1074/mcp.O116.065417
1. Schmidt L
2. Tüting C
3. Kyrilis FL
4. Hamdi F
5. Semchonok DA
6. Hause G
7. Meister A
8. Ihling C
9. Stubbs MT
10. Sinz A
11. Kastritis PL
(2024) Delineating organizational principles of the endogenous L-A virus by cryo-EM and computational analysis of native cell extracts
Communications Biology 7:557.

https://doi.org/10.1038/s42003-024-06204-7
(2022) Template-based assembly of proteomic short reads for De Novo antibody sequencing and repertoire profiling
Analytical Chemistry 94:10391–10399.

https://doi.org/10.1021/acs.analchem.2c01300
Preprint
1. Schulte D
2. Snijder J
(2024) A Handle on Mass Coincidence Errors in de Novo sequencing of antibodies by bottom-up proteomics
bioRxiv.

https://doi.org/10.1101/2024.02.20.581155
Software
(2024) Stitch, version swh:1:rev:f92145a5e2938da75a6c47d6b900ec704a8a174f
Software Heritage.

https://archive.softwareheritage.org/swh:1:dir:850238ba7f8e3492e84f204726f28e832bb998b9;origin=https://github.com/snijderlab/stitch;visit=swh:1:snp:e7896e1c729296b8dddcccfa9715bbdc7fc20f3b;anchor=swh:1:rev:f92145a5e2938da75a6c47d6b900ec704a8a174f
1. Sen KI
2. Tang WH
3. Nayak S
4. Kil YJ
5. Bern M
6. Ozoglu B
7. Ueberheide B
8. Davis D
9. Becker C
(2017) Automated antibody de novo sequencing and its utility in biopharmaceutical discovery
Journal of the American Society for Mass Spectrometry 28:803–810.

https://doi.org/10.1007/s13361-016-1580-0
1. Sousa E
2. Olland S
3. Shih HH
4. Marquette K
5. Martone R
6. Lu Z
7. Paulsen J
8. Gill D
9. He T
(2012) Primary sequence determination of a monoclonal antibody against α-synuclein using a novel mass spectrometry-based approach
International Journal of Mass Spectrometry 312:61–69.

https://doi.org/10.1016/j.ijms.2011.05.005
(2021) Epitope mapping of polyclonal antibodies by hydrogen-deuterium exchange mass spectrometry (HDX-MS)
Analytical Chemistry 93:11669–11678.

https://doi.org/10.1021/acs.analchem.1c00696
1. Tellier J
2. Nutt SL
(2019) Plasma cells: the programming of an antibody-secreting machine
European Journal of Immunology 49:30–37.

https://doi.org/10.1002/eji.201847517
1. ter Meulen J
2. van den Brink EN
3. Poon LLM
4. Marissen WE
5. Leung CSW
6. Cox F
7. Cheung CY
8. Bakker AQ
9. Bogaards JA
10. van Deventer E
11. Preiser W
12. Doerr HW
13. Chow VT
14. de Kruif J
15. Peiris JSM
16. Goudsmit J
(2006) Human monoclonal antibody combination against SARS coronavirus: synergy and coverage of escape mutants
PLOS Medicine 3:e237.

https://doi.org/10.1371/journal.pmed.0030237
1. Tran NH
2. Rahman MZ
3. He L
4. Xin L
5. Shan B
6. Li M
(2016) Complete de novo assembly of monoclonal antibody sequences
Scientific Reports 6:31730.

https://doi.org/10.1038/srep31730
(2021) Progress toward improved understanding of antibody maturation
Current Opinion in Structural Biology 67:226–231.

https://doi.org/10.1016/j.sbi.2020.11.008
1. Vorauer C
2. Boniche-Alfaro C
3. Murphree T
4. Matsui T
5. Weiss T
6. Fries BC
7. Guttman M
(2024) Direct mapping of polyclonal epitopes in serum by HDX-MS
Analytical Chemistry 96:16758–16767.

https://doi.org/10.1021/acs.analchem.4c03274
1. White HN
(2021) B-Cell memory responses to variant viral antigens
Viruses 13:565.

https://doi.org/10.3390/v13040565
1. Wrobel AG
2. Benton DJ
3. Hussain S
4. Harvey R
5. Martin SR
6. Roustan C
7. Rosenthal PB
8. Skehel JJ
9. Gamblin SJ
(2020) Antibody-mediated disruption of the SARS-CoV-2 spike glycoprotein
Nature Communications 11:5337.

https://doi.org/10.1038/s41467-020-19146-5
1. Yuan M
2. Wu NC
3. Zhu X
4. Lee CCD
5. So RTY
6. Lv H
7. Mok CKP
8. Wilson IA
(2020) A highly conserved cryptic epitope in the receptor binding domains of SARS-CoV-2 and SARS-CoV
Science 368:630–633.

https://doi.org/10.1126/science.abb7269
1. Zhang Q
2. Noble KA
3. Mao Y
4. Young NL
5. Sathe SK
6. Roux KH
7. Marshall AG
(2013) Rapid screening for potential epitopes reactive with a polycolonal antibody by solution-phase H/D exchange monitored by FT-ICR mass spectrometry
Journal of the American Society for Mass Spectrometry 24:1016–1025.

https://doi.org/10.1007/s13361-013-0644-7

Article and author information

Author details

Douwe Schulte

Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan, Utrecht, Netherlands

Contribution
Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing

Competing interests
No competing interests declared
Marta Šiborová

Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan, Utrecht, Netherlands

Contribution
Data curation, Investigation, Writing – review and editing

Competing interests
No competing interests declared
Lukas Käll

Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, Royal Institute of Technology – KTH, Solna, Sweden

Contribution
Conceptualization, Investigation, Writing – review and editing

Competing interests
No competing interests declared
Joost Snijder

Biomolecular Mass Spectrometry and Proteomics, Bijvoet Center for Biomolecular Research and Utrecht Institute of Pharmaceutical Sciences, Utrecht University, Padualaan, Utrecht, Netherlands

Contribution
Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing

For correspondence
j.snijder@uu.nl

Competing interests
No competing interests declared

ORCID iD: 0000-0002-9310-8226

Funding

Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Gravitation 2013 BOO, Institute for Chemical Immunology (ICI; 024.002.009))

Douwe Schulte
Marta Šiborová
Joost Snijder

European Research Council

https://doi.org/10.13039/100019180

Douwe Schulte
Joost Snijder

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This research was funded by the Dutch Research Council NWO Gravitation 2013 BOO, Institute for Chemical Immunology (ICI; 024.002.009), and the European Research Council Executive Agency HORIZON ERC-2022-STG (FLAVIR; 101077640).

Version history

Preprint posted: June 27, 2024
Sent for peer review: November 4, 2024
Reviewed Preprint version 1: January 16, 2025
Reviewed Preprint version 2: March 18, 2025
Version of Record published: April 23, 2025

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.101322. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.