Barcode-free multiplex plasmid sequencing using Bayesian analysis and nanopore sequencing
Figures

Workflow for Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You (SAVEMONEY).
The algorithm consists of three steps: pre-survey, sample submission, and post-analysis. The pre-survey step identifies the optimal combination of plasmids that will permit suitable accuracy for the classification step of the post-analysis. Plasmids with divergent sequences are grouped together, and those with very similar sequences are classified into different groups. After sample submission and sequencing, the post-analysis component, which consists of three different steps, is performed to deconvolve the obtained results. Reads (query sequences) are first classified based on their similarity to the plasmid blueprint/map (reference sequence). Reads are then aligned against reference sequences. Finally, consensus sequences and quality scores are calculated based on base calls, quality scores from each read, and the reference sequence, using Bayesian analysis.

Examples of the pre-survey outputs.
Sequences of 14 different plasmids were analyzed with the indicated and values. Levenshtein distance between each plasmid pair is displayed in heatmaps, which were subsequently used to generate the dendrograms displayed on the right side of the heatmap. The dotted red lines in the dendrogram represent the values, enabling visualization of the results of clustering, i.e., plasmids with distances less than the red lines were classified in the same cluster. Based on this clustering results of similar plasmids, plasmids were classified into groups for sequencing submission, with groups displayed in red and blue (a), red, blue, and magenta (b), or red, blue, magenta, and cyan (c). Levenshtein distance between each plasmid is also emphasized by the colored frames within each group, and plasmids classified into the same cluster are grouped in different groups. Note that P1–P14 here are different from example plasmids used in Figures 3 and 4.
-
Figure 2—source data 1
Results of pre-survey used to make Figure 2a.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig2-data1-v1.txt
-
Figure 2—source data 2
Results of pre-survey used to make Figure 2b.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig2-data2-v1.txt
-
Figure 2—source data 3
Results of pre-survey used to make Figure 2c.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig2-data3-v1.txt

Example results after the classification step for a set of six moderately related plasmids.
(a) Results of the pre-survey against plasmid sets used in (b, c). (b) Read length and the quality score distributions. (c) Scatter plots of normalized alignment scores. Normalized alignment scores were calculated for each read over all reference plasmids and displayed as scatter plots. Density plots of the normalized alignment scores are also displayed for each reference plasmid in the diagonal panel, which is the projection of each scatter plots against horizontal axes. The y-axis ranges of these diagonal panels are shared. The vertical and the horizontal positions of dashed lines correspond to value. These data depict results of classification performed with a value of 0.5. Note that P1–P6 here are different from example plasmids used in Figures 2 and 4.
-
Figure 3—source data 1
Fastq file containing original data used to make Figure 3b and c.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig3-data1-v1.zip
-
Figure 3—source data 2
Csv file containing raw data used to make Figure 3b.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig3-data2-v1.csv
-
Figure 3—source data 3
Csv file containing raw data used to make Figure 3c.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig3-data3-v1.csv

Example results after the classification step for closely related plasmids.
(a–c) Results of the pre-survey against plasmid sets. (d–f) Scatter plots of normalized alignment scores for the plasmid pairs. The vertical and the horizontal positions of dashed lines correspond to values. These data depict results of classification performed with a value of 0.5. (g–i) Breakdowns of reads covering the regions where the plasmids differ in sequence. In the rotated heatmap at the bottom of (g), the axis labeled as ‘P1 match’ represents the number of bases matching the P1 sequence in the regions where the sequences of P1 and P2 differ, whereas the axis labeled as ‘P2 match’ represents the equivalent for P2. The values in each cell represent the number of observed reads matching the values of the two axes at that position. The subtraction of ‘P1 match’ from ‘P2 match’ is represented by the horizontal axis, which is also shared with the x-axis of the histogram on top, where the sum projection of the heatmap is displayed. In the histogram, the breakdown of classification is represented by color: blue for reads classified to P1, orange for reads classified to P2, gray for unclassified reads with a score above , and light gray for unclassified reads with a score below . Because the normalized alignment scores for the two plasmids are the same where the value on the horizontal axis is 0, reads are not classified to either plasmid; therefore, the middle bar of the histogram is colored with either gray or light gray. The same interpretation applies to (h) and (i). (j–l) Summary of the fitting results. Based on the estimated parameters displayed in Table 1, the rotated heatmaps representing the breakdowns of reads originating from P1, P3, and P5 (upper panels) and P2, P4, and P6 (lower panels) were generated. Note that P1–P6 here are different from example plasmids used in Figures 2 and 3.
-
Figure 4—source data 1
Fastq file containing original data used to make Figure 4d and g.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig4-data1-v1.zip
-
Figure 4—source data 2
Fastq file containing original data used to make Figure 4e and h.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig4-data2-v1.zip
-
Figure 4—source data 3
Fastq file containing original data used to make Figure 4f and i.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig4-data3-v1.zip
-
Figure 4—source data 4
Csv file containing raw data used to make Figure 4d and g.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig4-data4-v1.csv
-
Figure 4—source data 5
Csv file containing raw data used to make Figure 4e and h.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig4-data5-v1.csv
-
Figure 4—source data 6
Csv file containing raw data used to make Figure 4f and i.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig4-data6-v1.csv

Characteristics of base calling used for prior information.
(a) Grid showing error ratios for each base calling event. Based on the results obtained from samples analyzed by R10.4.1 flow cells with V14 library preparation chemistry by Oxford Nanopore Technologies via the Plasmidsaurus service, the frequency was analyzed for base calling of each pore (column labels) and the results of the consensus sequence (row labels) at each position. In the context of base calling, ‘–’ represents bases that were base-called in the consensus sequence but skipped in the reads from each pore. In the context of consensus sequencing, ‘–’ represents bases that do not appear in the consensus sequence but were base-called from pores. The color of the diagonal panels is saturated because of the contrast range focusing on subtle differences of the non-diagonal panels. Of note, the sum of the rows is 1, but the sum of the displayed numbers may be slightly different from 1 because the fourth decimal place is rounded in the grid. (b) Quality score distributions for each base calling event. The base calling of each pore (column labels) and the results of the consensus sequence (row labels) at each position were classified, and probability density plots and quality scores were calculated and displayed. The y-axis is shared by all panels and is scaled to focus on panels in which the true base and base calling are not the same (incorrect base calling, blue). Therefore, the density of maximum quality score is out of the range of the display area in the diagonal panels (correct base calling, orange), and full-size plots are provided to the right. Note that there are no density plots when base calling was skipped in the reads from each pore (column corresponding to the label ‘–’ in (a)).
-
Figure 5—source data 1
Text file containing raw data used to make Figure 5.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig5-data1-v1.txt

Analysis for the maximum similarity between plasmids that can be mixed and the minimum number of required reads.
(a) Density plot of the rate of reads with correct base calling. The rate was calculated at each position of plasmids and displayed using representative nanopore sequencing results. (b) Averaged quality score distribution of reads with correct rate of less than 0.7 (upper panel) and more than 0.9 (bottom panel). The corresponding regions are displayed with dashed red frames in (a). Of note, ‘omitted’ represents reads that did not cover the focused position. (c) Probability logo plot. Statistical significance (−log10[p-value]) was calculated for a 5-mer around the positions that showed correct rate lower than 0.7 in (a) using those that showed more than 0.9 as a background. Enriched residues are stacked on the top, whereas depleted residues are stacked on the bottom. (d) Estimated probability of incorrect classification. Based on the match/mismatch/deletion ratio of reads obtained in the ‘worst-case scenario’, i.e., top panel in (b), the probability of incorrect classification of a read was calculated assuming that two plasmids that differ by the indicated base(s) were mixed. (e) Estimated probability of correct/incorrect consensus base calling. Based on the quality score distribution obtained in the ‘worst-case scenario’, i.e., top panel in (b), the indicated number of reads were generated in silico, and the consensus base calling was calculated using Simple Algorithm for Very Efficient Multiplexing of Oxford Nanopore Experiments for You (SAVEMONEY). The simulation was performed 10,000 times for each condition to calculate the probability of correct/incorrect consensus base calling.
-
Figure 6—source code 1
Source code used to make Figure 6d.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig6-code1-v1.zip
-
Figure 6—source code 2
Source code used to make Figure 6e.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig6-code2-v1.zip
-
Figure 6—source data 1
Csv file containing raw data used to make Figure 6a.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig6-data1-v1.csv
-
Figure 6—source data 2
Csv file containing raw data used to make Figure 6b.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig6-data2-v1.csv
-
Figure 6—source data 3
Text file containing raw data used to make Figure 6c.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig6-data3-v1.txt
-
Figure 6—source data 4
Csv file containing raw data used to make Figure 6e.
- https://cdn.elifesciences.org/articles/88794/elife-88794-fig6-data4-v1.csv
Tables
The fitting results summarized in Figure 4j–l.
The values for the fitted parameters are displayed in ‘Error rate’ and ‘Total reads’ columns. The rotated heatmaps in Figure 4j–l were generated based on these estimated parameters. For P1, P3, and P5, the sum of values in heatmap cells whose location on the horizontal axis are above 0, below 0, and 0 are shown as ‘Correctly classified reads’, ‘Wrongly classified reads’, and ‘Reads not classified’ columns, respectively. For P2, P4, and P6, below 0, above 0, and 0 are shown as ‘Correctly classified reads’, ‘Wrongly classified reads’, and ‘Reads not classified’ columns, respectively. Finally, the values in ‘Rate of incorrect classification’ columns for each plasmid were calculated by dividing the values of ‘Wrongly classified reads’ for the other plasmids in the same set by the total number of reads estimated to be classified to the focusing plasmid, which is different from values displayed in the ‘Total reads’ column. Specific equations are provided in the footnote to the table.
Error rate | Total reads | Correctly classified reads | Wrongly classified reads | Reads not classified | Rate of incorrect classification | ||
---|---|---|---|---|---|---|---|
Set 1 | P1 | 0.0188 | 163.607 | 160.527a1 | 1.027b1 | 2.054 | 0.002939c1 |
P2 | 75.393 | 73.973a2 | 0.473b2 | 0.946 | 0.013691c2 | ||
Set 2 | P3 | 0.0155 | 124.826 | 123.531a3 | 0.010b3 | 1.285 | 0.000089c3 |
P4 | 137.223 | 136.000a4 | 0.011b4 | 1.413 | 0.000074c4 | ||
Set 3 | P5 | 0.0213 | 1131.757 | 1131.249a5 | 0.170b5 | 0.338 | 0.000143c5 |
P6 | 1077.833 | 1077.349a6 | 0.162b6 | 0.322 | 0.000158c6 |
-
c1=b2/(a1+b2); c2=b1/(a2+b1); c3=b4/(a3+b4); c4=b3/(a4+b3); c5=b6/(a5+b6); c6=b5/(a6+b5).
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Recombinant DNA reagent | pCDH1-lyn10-mCherry-LOVPLD* (plasmid) | This paper | P1 in Figure 2; Plasmid used in PMID:38559292 with silent mutation | |
Recombinant DNA reagent | pCDH1-lyn10-mCherry-LOVPLD (plasmid) | PMID:38559292 | P2 in Figure 2 | |
Recombinant DNA reagent | pCDH1-lyn10-mCherry-LOVPLD** (plasmid) | This paper | P3 in Figure 2; Plasmid used in PMID:38559292 with silent mutation | |
Recombinant DNA reagent | pCDH1-lyn10-mCherry-LOVPLD*** (plasmid) | This paper | P4 in Figure 2; Plasmid used in PMID:38559292 with silent mutation | |
Recombinant DNA reagent | pET-17b-mod2_6xHis-PLDs48-HiBiT (plasmid) | This paper | P5 in Figure 2 | |
Recombinant DNA reagent | pET-17b-mod2_6xHis-PLD-HiBiT (plasmid) | This paper | P6 in Figure 2 | |
Recombinant DNA reagent | pET-17b-mod2_6xHis-PLDs4-HiBiT (plasmid) | This paper | P7 in Figure 2 | |
Recombinant DNA reagent | pET-17b-mod_6xHis-PLDs48**-HiBiT (plasmid) | This paper | P8 in Figure 2 | |
Recombinant DNA reagent | pET-17b-mod_6xHis-PLDs48*-HiBiT (plasmid) | This paper | P9 in Figure 2 | |
Recombinant DNA reagent | pET-17b-mod_6xHis-PLDs48-HiBiT (plasmid) | This paper | P10 in Figure 2 | |
Recombinant DNA reagent | pET-17b-mod_6xHis-PLDs27L484F-HiBiT (plasmid) | This paper | P11 in Figure 2 | |
Recombinant DNA reagent | pET-17b-mod_6xHis-PLD-HiBiT (plasmid) | This paper | P12 in Figure 2 | |
Recombinant DNA reagent | pET-17b-mod_6xHis-PLDs4A326T-HiBiT (plasmid) | This paper | P13 in Figure 2, P1 in Figure 3 | |
Recombinant DNA reagent | pET-17b-mod_6xHis-PLDs4-HiBiT (plasmid) | This paper | P14 in Figure 2, P2 in Figure 3 | |
Recombinant DNA reagent | pmNeonGreen-N1 (plasmid) | Other | P1 in Figure 3; a gift from the Lammerding Laboratory, Cornell University, Ithaca, NY, USA | |
Recombinant DNA reagent | mCherry-Spo20 (plasmid) | Other | P2 in Figure 3; a gift from the Frohman Laboratory, Stony Brook University, Stony Brook, NY, USA | |
Recombinant DNA reagent | GFP-PASS (plasmid) | Other | P3 in Figure 3; a gift from the Du Laboratory, The University of Texas Health Science Center at Houston, Houston, TX, USA | |
Recombinant DNA reagent | iRFP-PASS (plasmid) | PMID:31999306 | P4 in Figure 3 | |
Recombinant DNA reagent | pcDNA3_P18-CIBN-P2A-CRY2-mCherry-PLD(1-17) (plasmid) | PMID:37217787 | P5 in Figure 3 | |
Recombinant DNA reagent | pcDNA3_CRY2-mCherry-PLD(2-27)-P2A-CIBN-CAAX (plasmid) | PMID:37217787 | P6 in Figure 3 | |
Recombinant DNA reagent | PLD-mCherry-Rab7 (plasmid) | Other | P3 in Figure 3; constructed by Reika Tei (Baskin Lab) | |
Recombinant DNA reagent | dPLD-mCherry-Rab7 (plasmid) | Other | P4 in Figure 3; constructed by Reika Tei (Baskin Lab) | |
Recombinant DNA reagent | pGFPN1-PL5(143–271)-EGFP-S161D | PMID:39209962 | P5 in Figure 3 | |
Recombinant DNA reagent | pGFPN1-PL5(143–271)-EGFP | PMID:35952650 | P6 in Figure 3 | |
Software, algorithm | SAVEMONEY | This paper | version 0.3.4 |
Additional files
-
Supplementary file 1
All sequences of plasmids used in this study.
- https://cdn.elifesciences.org/articles/88794/elife-88794-supp1-v1.zip
-
MDAR checklist
- https://cdn.elifesciences.org/articles/88794/elife-88794-mdarchecklist1-v1.pdf