Research Article

Freshwater monitoring by nanopore sequencing

European Bioinformatics Institute, Wellcome Genome Campus, United Kingdom
Department of Plant Sciences, University of Cambridge, United Kingdom
Department of Earth Sciences, University of Cambridge, United Kingdom
Department of Engineering, University of Cambridge, United Kingdom
Department of Biochemistry, University of Cambridge, United Kingdom
Wellcome Sanger Institute, Wellcome Trust Genome Campus, United Kingdom
Department of Physics, University of Cambridge, United Kingdom
Department of Physiology, Development & Neuroscience, University of Cambridge, United Kingdom
Department of Veterinary Medicine, University of Cambridge, United Kingdom

Jan 19, 2021

Open access

Abstract

While traditional microbiological freshwater tests focus on the detection of specific bacterial indicator species, including pathogens, direct tracing of all aquatic DNA through metagenomics offers a powerful alternative. Yet, in situ metagenomic water surveys face substantial challenges in cost and logistics. Here, we present a simple, fast, cost-effective and remotely accessible freshwater diagnostics workflow centred around portable nanopore sequencing technology. Using defined compositions and spatiotemporal microbiota from surface water of an example river in Cambridge (UK), we provide optimised experimental and bioinformatics guidelines, including a benchmark of twelve taxonomic classification tools for nanopore sequences. We find that nanopore metagenomics can depict the hydrological core microbiome and fine temporal gradients in line with complementary physicochemical measurements. In a public health context, these data feature relevant sewage signals and pathogen maps at species level resolution. We anticipate that this framework will gather momentum for new environmental monitoring initiatives using portable devices.

eLife digest

Many water-dwelling bacteria can cause severe diseases such as cholera, typhoid or leptospirosis. One way to prevent outbreaks is to test water sources to find out which species of microbes they contain, and at which levels.

Traditionally, this involves taking a water sample, followed by growing a few species of ‘indicator bacteria’ that help to estimate whether the water is safe. An alternative technique, called metagenomics, has been available since the mid-2000s. It consists of reading (or ‘sequencing’) the genetic information of most of the bacteria present in the water, which allows scientists to spot harmful species. Both methods, however, require well-equipped laboratories with highly trained staff, making them challenging to use in remote areas.

The MinION is a pocket-sized device that – when paired with a laptop or mobile phone – can sequence genetic information ‘on the go’. It has already been harnessed during Ebola, Zika or SARS-CoV-2 epidemics to track the genetic information of viruses in patients and environmental samples. However, it is still difficult to use the MinION and other sequencers to monitor bacteria in water sources, partly because the genetic information of the microbes is highly fragmented during DNA extraction.

To address this challenge, Urban, Holzer et al. set out to optimise hardware and software protocols so the MinION could be used to detect bacterial species present in rivers. The tests focussed on the River Cam in Cambridge, UK, a waterway which faces regular public health problems: local rowers and swimmers often contract waterborne infections, sometimes leading to river closures.

For six months, Urban, Holzer et al. used the MinION to map out the bacteria present across nine river sites, assessing the diversity of species and the presence of disease-causing microbes in the water. In particular, the results showed that optimising the protocols made it possible to tell the difference between closely related species, an important feature since harmful and harmless bacteria can sometimes be genetically close. The data also revealed that the levels of harmful bacteria were highest downstream of urban river sections, near a water treatment plant and river barge moorings. Together, these findings demonstrate that optimising MinION protocols can turn this device into a useful tool to easily monitor water quality.

Around the world, climate change, rising urbanisation and the intensification of agriculture all threaten water quality. In fact, access to clean water is one of the United Nations sustainable development goals for 2030. Using the guidelines developed by Urban, Holzer et al., communities could harness the MinION to monitor water quality in remote areas, offering a cost-effective, portable DNA analysis tool to protect populations against deadly diseases.

Introduction

The global assurance of safe drinking water and basic sanitation has been recognised as a United Nations Millennium Development Goal (Bartram et al., 2005), particularly in light of the pressures of rising urbanisation, agricultural intensification, and climate change (Haddeland et al., 2014; Schewe et al., 2014). Waterborne diseases represent a particular global threat, with zoonotic diseases such as typhoid fever, cholera, or leptospirosis resulting in hundreds of thousands of deaths each year (Prüss et al., 2002; Prüss-Ustün et al., 2019).

To control the risk of infection by waterborne diseases, microbial assessments can be conducted. While traditional microbial tests focus on the isolation of specific bacterial indicator organisms through selective media outgrowth in a diagnostic laboratory, this cultivation process is all too often time consuming, infrastructure-dependent and lags behind in automation (Salazar and Sunagawa, 2017; Tringe and Rubin, 2005). Environmental metagenomics, the direct tracing of DNA from environmental samples, constitutes a less organism-tailored, data-driven monitoring alternative. Such approaches have been demonstrated to provide robust measurements of relative taxonomic species composition as well as functional diversity in a variety of environmental contexts (Almeida et al., 2019; Bahram et al., 2018; Tara Oceans coordinators et al., 2015), and overcome enrichment and resolution biases common to culturing (Salazar and Sunagawa, 2017; Tringe and Rubin, 2005). However, they usually depend on expensive stationary equipment, specialised operational training and substantial time lags between fieldwork, sample preparation, raw data generation and access. Taken together, there is an increasing demand for freshwater monitoring frameworks that unite the advantages of metagenomic workflows with high cost effectiveness, fast technology deployability, and data transparency (Gardy and Loman, 2018).

In recent years, these challenges have been revisited with the prospect of mobile DNA analysis. The main driver of this is the ‘portable’ MinION device from Oxford Nanopore Technologies (ONT), which enables real-time DNA sequencing using nanopores (Jain et al., 2016). Nanopore read lengths can be comparatively long, currently up to ~2×10⁶ bases (Payne et al., 2019), which is enabled by continuous electrical sensing of sequential nucleotides along single DNA strands. Connected to a laptop that translates the raw voltage signal into nucleotides, nanopore sequencing can be used to rapidly monitor long DNA sequences in remote locations. Although there are still common concerns about the technology's base-level accuracy, mobile MinION setups have already been transformative for real-time tracing and rapid data sharing during bacterial and viral pathogen outbreaks (Boykin et al., 2019; Chan et al., 2020; Faria et al., 2018; Faria et al., 2017; Kafetzopoulou et al., 2019; Quick et al., 2015; Quick et al., 2016). In the context of freshwater analysis, a MinION whole-genome shotgun sequencing protocol has recently been leveraged for a comparative study of 11 rivers (Reddington et al., 2020). This report highlights key challenges which emerge in serial monitoring scenarios of a relatively low-input DNA substrate (freshwater), for example large sampling volumes (2–4 l) and small shotgun fragments (mean < 4 kbp). We reasoned that targeted DNA amplification may be a suitable means to bypass these bottlenecks and assess river microbiomes with nanopore sequencing.

Here, we report a simple, cost-effective workflow to assess and monitor microbial freshwater ecosystems with targeted nanopore DNA sequencing. Our benchmarking study involves the design and optimisation of essential experimental steps for multiplexed MinION usage in the context of local environments, together with an evaluation of computational methods for the bacterial classification of nanopore sequencing reads from metagenomic libraries. To showcase the resolution of sequencing-based aquatic monitoring in a spatiotemporal setting, we combine DNA analyses with physicochemical measurements of surface water samples collected at nine locations within a confined ~12 km reach of the River Cam passing through the city of Cambridge (UK) in April, June, and August 2018.

Results

Experimental design and computational workflows

Using a bespoke workflow, nanopore full-length (V1-V9) 16S ribosomal RNA (rRNA) gene sequencing was performed on all location-barcoded freshwater samples at each of the three time points (Figure 1; Supplementary file 1; Materials and methods). River isolates were multiplexed with negative controls (deionised water) and mock community controls composed of eight bacterial species in known mixture proportions.

Figure 1 with 1 supplement

Freshwater microbiome study design and experimental setup.

(a) Schematic map of Cambridge (UK), illustrating sampling locations (colour-coded) along the River Cam. Geographic coordinates of latitude and longitude are expressed in decimal degrees, as used by the global positioning system. (b) Laboratory workflow to monitor bacterial communities from freshwater samples using nanopore sequencing (Materials and methods).

To obtain valid taxonomic assignments from freshwater nanopore sequencing profiles, twelve different classification tools were compared through several performance metrics (Figure 2; Figure 2—figure supplement 1; Materials and methods). Our comparison included established classifiers such as RDP (Wang et al., 2007), Kraken (Wood and Salzberg, 2014), and Centrifuge (Kim et al., 2016), as well as more recently developed methods optimised for higher sequencing error rates such as IDTAXA (Murali et al., 2018) and Minimap2 (Li, 2018). An Enterobacteriaceae overrepresentation was observed across all replicates and classification methods, pointing towards a consistent Escherichia coli amplification bias potentially caused by skewed taxonomic specificities of the selected 16S primer pair 27F and 1492R (Frank et al., 2008; Figure 2b). Root mean square errors (RMSE) between observed and expected read fractions of the mock community differed only slightly across classifiers (Figure 2c). Robust quantifications were obtained by Minimap2 alignments against the SILVA v.132 database (Quast et al., 2013), for which 99.68% of classified reads aligned to the expected mock community taxa (mean sequencing accuracy 92.08%; Figure 2—figure supplement 2c). Minimap2 classifications reached the second lowest RMSE (excluding Enterobacteriaceae), and relative quantifications were highly consistent between mock community replicates. Benchmarking of the classification tools on one aquatic sample further confirmed Minimap2's reliable performance in a complex bacterial community (Figure 2d), although other tools such as MAPseq (Matias Rodrigues et al., 2017), SPINGO (Allard et al., 2015), or IDTAXA also produced highly concordant results, despite memory usage and runtime varying over several orders of magnitude (Figure 2—figure supplement 1b).
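The RMSE comparison between observed and expected mock community fractions can be sketched as follows. This is a minimal illustration assuming per-family read fractions are held in plain dictionaries; the helper names and the example fractions are hypothetical, and renormalising the remaining families after dropping Enterobacteriaceae (the "RMSE reduced" variant) is our assumption rather than a documented step of the original analysis.

```python
import math

def rmse(observed, expected):
    """Root mean squared error between observed and expected read fractions.

    Both arguments map family name -> fraction; families absent from
    `observed` are counted as 0.
    """
    sq_errors = [(observed.get(fam, 0.0) - exp) ** 2 for fam, exp in expected.items()]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def drop_and_renormalise(fractions, family):
    """Remove one family and rescale the remaining fractions to sum to 1
    (assumed handling for the 'RMSE reduced' variant)."""
    kept = {fam: v for fam, v in fractions.items() if fam != family}
    total = sum(kept.values())
    return {fam: v / total for fam, v in kept.items()}

# Hypothetical three-family example (not data from this study).
expected = {"Enterobacteriaceae": 0.25, "Bacillaceae": 0.25, "Listeriaceae": 0.50}
observed = {"Enterobacteriaceae": 0.45, "Bacillaceae": 0.20, "Listeriaceae": 0.35}

rmse_full = rmse(observed, expected)
rmse_reduced = rmse(drop_and_renormalise(observed, "Enterobacteriaceae"),
                    drop_and_renormalise(expected, "Enterobacteriaceae"))
```

In this toy case the reduced RMSE is much smaller than the full one, which is the pattern one would expect if a single over-amplified family dominates the error.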

Figure 2 with 2 supplements

Benchmarking of classification tools with nanopore full-length 16S sequences.

(a) Schematic of mock community quantification performance testing. (b) Observed vs. expected read fractions of bacterial families present in 10,000 nanopore reads randomly drawn from mock community sequencing data. Example representation of Minimap2 (k-mer length 15) quantifications with (upper) and without (lower) *Enterobacteriaceae* (Materials and methods). (c) Mock community classification output summary for twelve classification tools tested against the same 10,000 reads. Root mean squared errors between observed and expected bacterial read fractions are provided with (RMSE) and without *Enterobacteriaceae* (RMSE reduced). (d) Classification output summary for 10,000 reads randomly drawn from an example freshwater sample (Materials and methods). ‘Overlapping’ fractions (red) represent agreements of a classification tool with the majority of tested methods on the same reads, while ‘non-overlapping’ fractions (light blue) represent disagreements. Dark green sets highlight rare taxon assignments not featured in any of the 10,000 majority classifications, while dark blue bars show unclassified read fractions.
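The overlapping, non-overlapping and rare categories in panel (d) rest on a per-read majority vote across tools. The sketch below assumes each tool's output is a list of labels aligned by read (None for unclassified); treating ties as "no majority" and counting rare assignments as their own category are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter

def majority_labels(calls):
    """Per-read majority label across tools.

    calls: dict tool -> list of labels, one per read (None = unclassified).
    Unclassified calls do not vote; ties yield None (no majority).
    """
    tools = list(calls)
    n_reads = len(calls[tools[0]])
    majority = []
    for i in range(n_reads):
        votes = Counter(calls[t][i] for t in tools if calls[t][i] is not None)
        top = votes.most_common(2)
        if not top or (len(top) == 2 and top[0][1] == top[1][1]):
            majority.append(None)
        else:
            majority.append(top[0][0])
    return majority

def agreement_summary(calls):
    """Fraction of each tool's reads that agree with the majority
    ('overlapping'), disagree ('non-overlapping'), carry a label never chosen
    by the majority on any read ('rare'), or are unclassified."""
    majority = majority_labels(calls)
    majority_set = {m for m in majority if m is not None}
    summary = {}
    for tool, labels in calls.items():
        counts = Counter()
        for label, maj in zip(labels, majority):
            if label is None:
                counts["unclassified"] += 1
            elif label == maj:
                counts["overlapping"] += 1
            elif label not in majority_set:
                counts["rare"] += 1
            else:
                counts["non-overlapping"] += 1
        summary[tool] = {cat: n / len(labels) for cat, n in counts.items()}
    return summary

# Hypothetical calls from three tools on four reads.
calls = {"tool_1": ["A", "A", None, "B"],
         "tool_2": ["A", "B", "C", "B"],
         "tool_3": ["A", "A", "C", "D"]}
summary = agreement_summary(calls)
```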

Diversity analysis and river core microbiome

Using Minimap2 classifications within our bioinformatics consensus workflow (Figure 1—figure supplement 1; Materials and methods), we then inspected sequencing profiles of three independent MinION runs for a total of 30 river DNA isolates and six controls. This yielded ~8.3 million sequences with exclusive barcode assignments (Figure 3a; Supplementary file 2). Overall, 82.9% (n = 6,886,232) of raw reads could be taxonomically assigned to the family level (Figure 3b). To account for variations in sample sequencing depth, rarefaction with a cut-off at 37,000 reads was applied to all samples. While preserving ~90% of the original family level taxon richness (Mantel test, R = 0.814, p = 2.1×10⁻⁴; Figure 3—figure supplement 1a–b), this conservative threshold resulted in the exclusion of 14 samples, mostly from the June time point, from subsequent high-resolution analyses. The 16 remaining surface water samples revealed moderate levels of microbial heterogeneity (Figure 3b; Figure 3—figure supplement 1c): microbial family alpha diversity (Simpson index) ranged between 0.46 (June-6) and 0.92 (April-7), indicating low-level evenness, with a few taxonomic families accounting for the majority of the metagenomic signal.
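Rarefaction to a fixed depth and the Simpson index can both be sketched with the Python standard library. This assumes family-level read counts per sample stored in a dictionary, and it uses the 1 − Σp² (Gini-Simpson) form of the index, for which higher values mean a more even community; whether the original analysis used exactly this variant is an assumption, and the example counts are hypothetical.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=None):
    """Randomly subsample `depth` reads without replacement.

    counts: dict family -> read count for one sample.
    Returns None when the sample has fewer than `depth` reads, i.e. it would
    be excluded at this cut-off.
    """
    if sum(counts.values()) < depth:
        return None
    reads = [fam for fam, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))

def simpson(counts):
    """Gini-Simpson diversity, 1 - sum(p_i^2); 0 for a single family."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Hypothetical family counts for one sample (not data from this study).
sample = {"Burkholderiaceae": 30_000, "Spirosomaceae": 12_000, "Arcobacteraceae": 3_000}
rarefied = rarefy(sample, 37_000, seed=1)
diversity = simpson(rarefied)
```

Fixing the seed makes a single rarefaction reproducible; repeating it with several seeds gives the kind of spread used for the error bars in Figure 5b.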

Figure 3 with 1 supplement

Bacterial diversity of the River Cam.

(a) Nanopore sequencing output summary. Values in the centre of the pie charts depict total numbers of classified nanopore sequences per time point. Percentages illustrate representational fractions of locations and control barcodes (negative control and mock community). (b) Read depth and bacterial classification summary. Upper bar plot shows the total number of reads, and the number of reads classified to any taxonomic level, to at least bacterial family level, to the ten most abundant bacterial families across all samples, or to other families. Rarefaction cut-off displayed at 37,000 reads (dashed line). Lower bar plot shows fractions of the ten most abundant bacterial families across the samples with more than 100 reads. Bars for samples with fewer than 37,000 reads are shown in transparent colours.

Hierarchical clustering of taxon profiles showed a dominant core microbiome across all aquatic samples (clusters C2 and C4, Figure 4a). The most common bacterial families observed were Burkholderiaceae (40.0%), Spirosomaceae (17.7%), and NS11-12 marine group (12.5%), followed by Arcobacteraceae (4.8%), Sphingomonadaceae (2.9%), and Rhodobacteraceae (2.5%) (Figure 4b). Members of these families are commonly associated with aquatic environments; for example, major fractions of Burkholderiaceae reads originated from genera such as Limnohabitans, Rhodoferax, Polynucleobacter, or Aquabacterium (Figure 4—figure supplement 1), which supports the suitability of this nanopore metagenomics workflow. Hierarchical clustering additionally showed that two biological replicates collected at the same location and time point (April samples 9.1 and 9.2) grouped with high concordance; this indicates that spatiotemporal trends are discernible even within a highly localised context.
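Hierarchical clustering of family profiles, as used to delineate the core microbiome clusters, can be illustrated with a small average-linkage implementation. The Euclidean distance on log10 relative abundances, the pseudocount, the example fractions and all function names are assumptions for illustration; the original analysis may have used a different metric, linkage method or library.

```python
import math
from itertools import combinations

def log10_profile(fractions, pseudocount=1e-4):
    """log10 relative abundances with a small pseudocount for zeros (assumed)."""
    return [math.log10(f + pseudocount) for f in fractions]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(profiles, k):
    """Agglomerative clustering with average linkage until k clusters remain.

    profiles: dict name -> list of values (e.g. log10 abundance per sample).
    Returns a list of clusters, each a sorted list of names.
    """
    names = list(profiles)
    dist = {frozenset((a, b)): euclidean(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            pair_d = [dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            d = sum(pair_d) / len(pair_d)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical relative abundances of four families across three samples.
fractions = {
    "Burkholderiaceae": [0.40, 0.38, 0.42],
    "Spirosomaceae": [0.18, 0.20, 0.15],
    "Carnobacteriaceae": [0.001, 0.0, 0.0],
    "Halomonadaceae": [0.0, 0.002, 0.0],
}
profiles = {fam: log10_profile(v) for fam, v in fractions.items()}
clusters = average_linkage(profiles, k=2)
```

Abundant families end up together and rare families form their own group, mirroring in miniature the split between core (C2, C4) and accessory (C1) clusters.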

Figure 4 with 1 supplement

Core microbiome of the River Cam.

(a) Hierarchical clustering of bacterial family abundances across freshwater samples after rarefaction, together with the mock community control. Four major clusters of bacterial families occur, with two of these (C2 and C4) corresponding to the core microbiome of ubiquitously abundant families, one (C3) corresponding to the main mock community families and one (C1) corresponding to the majority of rare accessory taxa. (b) Detailed river core microbiome. Violin plots summarise fractional representation of bacterial families from clusters C2 and C4 (log₁₀ scale of relative abundance [%] across all samples, n_April = 7, n_June = 2, n_August = 7), sorted by median total abundance. Vertical dashed lines depict 0.1% proportion.

Besides the dominant core microbiome, microbial profiles showed a marked time dependence, with water samples from April grouping further apart from those of June and August. Principal component analysis (PCA) illustrates the seasonal divergence among the three sampling months (Figure 5a; Figure 5—figure supplement 1). The strongest differential abundances along the seasonal axis of variation (PC3) derived from Carnobacteriaceae (Figure 5b), a trend also highlighted by taxon-specific log-normal mixture model decomposition between the two seasons (April vs. June/August; p < 0.01; Materials and methods). Indeed, members of this bacterial family have been primarily isolated from cold substrates (Lawson and Caldwell, 2014).
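A PCA of this kind can be computed from a samples-by-families abundance matrix through a singular value decomposition. This sketch, which requires NumPy, centres each family but applies no scaling or transformation; that preprocessing choice, the function name and the random example matrix are assumptions. The returned loadings correspond to the per-family contributions plotted in Figure 5b.

```python
import numpy as np

def pca(matrix, n_components=4):
    """PCA via singular value decomposition.

    matrix: samples x features array (e.g. family relative abundances).
    Returns (scores, loadings, explained_variance_ratio) for the first
    n_components principal components.
    """
    X = np.asarray(matrix, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components]            # feature contributions per PC
    explained = S ** 2 / np.sum(S ** 2)     # variance fraction per PC
    return scores, loadings, explained[:n_components]

# Hypothetical 6 samples x 4 families matrix of relative abundances.
rng = np.random.default_rng(0)
abundances = rng.dirichlet(np.ones(4), size=6)
scores, loadings, explained = pca(abundances, n_components=3)
```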

Figure 5 with 1 supplement

Spatiotemporal axes of taxonomic diversity in the River Cam.

(a) PCA of bacterial composition across locations, indicating community dissimilarities along the main time (PC3) and spatial (PC4) axes of variation; dots coloured according to time points. Kruskal-Wallis test on PC3 component contributions, with post-hoc Mann-Whitney U rank test (April vs. August): p = 2.2×10⁻³. (b) Contribution of individual bacterial families to the PCs in (a). Error bars represent the standard deviation of these families across four independent rarefactions.
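The caption reports a Kruskal-Wallis test followed by a post-hoc Mann-Whitney U test. The sketch below covers only the pairwise step, in an exact permutation form that is practical for groups as small as those here; in practice a library routine such as `scipy.stats.mannwhitneyu` would usually be used. The function names and the PC3 scores in the example are hypothetical, and the example is not expected to reproduce the reported p-value.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def exact_two_sided_p(x, y):
    """Exact two-sided permutation p-value for the Mann-Whitney U test.

    Enumerates every split of the pooled values into groups of the original
    sizes, so it is only practical for small samples (7 vs 7 gives 3,432 splits).
    """
    pooled = list(x) + list(y)
    n_x = len(x)
    centre = n_x * len(y) / 2.0  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [v for i, v in enumerate(pooled) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(mann_whitney_u(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical PC3 scores for seven April and seven August samples.
april = [1.2, 0.9, 1.5, 1.1, 0.8, 1.3, 1.0]
august = [-0.7, -0.4, -1.1, -0.2, -0.9, -0.5, -0.3]
p = exact_two_sided_p(april, august)
```

With complete separation of two groups of seven, only the two most extreme splits qualify, so the smallest attainable two-sided p-value is 2/3,432.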

Hydrochemistry and seasonal profile of the River Cam

While a seasonal difference in bacterial composition can be expected due to increasing water temperatures in the summer months, additional changes may also have been caused by alterations in river hydrochemistry and flow rate (Figure 6a; Figure 6—figure supplement 1; Supplementary file 1). To assess this effect in detail, we measured the pH and a range of major and trace cations in all river water samples using inductively coupled plasma-optical emission spectroscopy (ICP-OES), as well as major anions using ion chromatography (Materials and methods). As with the bacterial composition dynamics, we observed significant temporal variation in water chemistry, superimposed on a spatial gradient of generally increasing sodium and chloride concentrations along the river reach (Figure 6b–c). This spatially consistent effect is likely attributable to wastewater and agricultural discharge inputs in and around Cambridge city. A comparison of the major element chemistry in the River Cam transect with the world's 60 largest rivers further corroborates the likely impact of anthropogenic pollution in this fluvial ecosystem (Gaillardet et al., 1999; Figure 6d; Materials and methods).

Figure 6 with 1 supplement

Geological and hydrochemical profile of the River Cam and its basin.

(a) Outline of the Cam River catchment surrounding Cambridge (UK), and its corresponding lithology. Overlay of bedrock geology and superficial deposits (British Geological Survey data: DiGMapGB-50, 1:50,000 scale) is shown as visualised by GeoIndex. Bedrock is mostly composed of subtypes of Cretaceous limestone (chalk), gault (clay, sand), and mudstone. Approximate sampling locations are colour-coded as in Figure 1. (b) Principal component analysis of measured pH and 13 inorganic solute concentrations of this study's 30 river surface water samples. PC1 (~49% of variance) displays a strong, continuous temporal shift in hydrochemistry. (c) Parameter contributions to PC1 in (b), highlighting a reduction in water hardness (Ca²⁺, Mg²⁺) and an increase in pH towards the summer months (June and August). (d) Mixing diagram with Na⁺-normalised molar ratios, representing the inorganic chemistry of the world's 60 largest rivers; open circles represent polluted rivers with total dissolved solid (TDS) concentrations > 500 mg l⁻¹. Cam River ratios are superimposed as 50% confidence ellipses, each computed from the ten samples of one month. Separate data points for all samples from August are also shown and colour-coded, indicating the upstream-to-downstream increase in Na⁺ (also observed in April and June). End-member signatures show the typical chemistry of small rivers draining exclusively these lithologies (carbonate, silicate and evaporite).
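Converting measured concentrations into the Na⁺-normalised molar ratios of a mixing diagram is simple arithmetic: divide each mass concentration by its molar mass, then by the molar sodium concentration. The sketch assumes inputs in mg per litre and uses standard molar masses; the function names and example concentrations are hypothetical and are not River Cam measurements.

```python
# Standard molar masses in g/mol (HCO3 as the bicarbonate ion).
MOLAR_MASS = {"Na": 22.990, "K": 39.098, "Ca": 40.078, "Mg": 24.305,
              "Cl": 35.453, "HCO3": 61.017}

def to_micromolar(mg_per_l):
    """Convert mg/l to µmol/l for each ion."""
    return {ion: mg / MOLAR_MASS[ion] * 1000.0 for ion, mg in mg_per_l.items()}

def na_normalised_ratios(mg_per_l):
    """Molar ratios X/Na for every ion other than Na."""
    umol = to_micromolar(mg_per_l)
    na = umol["Na"]
    return {f"{ion}/Na": value / na for ion, value in umol.items() if ion != "Na"}

# Hypothetical concentrations in mg/l (not River Cam measurements).
sample = {"Na": 30.0, "Ca": 110.0, "Mg": 6.0, "HCO3": 280.0}
ratios = na_normalised_ratios(sample)
```

Ratios such as Ca/Na and Mg/Na can then be placed on the same axes as the end-member signatures, so a rising Na⁺ load moves a sample towards lower ratios.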

Maps of potential bacterial pathogens at species level resolution

Freshwater sources throughout the United Kingdom have been notorious for causing bacterial infections such as leptospirosis (Public Health England, 2016; Public Health England, 2019). In line with the physicochemical profile of the River Cam, we therefore next determined the spatiotemporal enrichment of potentially important functional bacterial taxa through nanopore sequencing. We retrieved 55 potentially pathogenic bacterial genera through integration of species known to affect human health (Jin et al., 2018; Wattam et al., 2017), and also 13 wastewater-associated bacterial genera (Global Water Microbiome Consortium et al., 2019; Supplementary file 3). Of these, 21 potentially pathogenic and 8 wastewater-associated genera were detected across the river samples (Figure 7; Materials and methods). Many of these signals were stronger downstream of urban sections, within the mooring zone for recreational and residential barges (location 7; Figure 1a) and in the vicinity of sewage outflow from a nearby wastewater treatment plant (location 8). The most prolific candidate pathogen genus observed was Arcobacter, which features multiple species implicated in acute gastrointestinal infections (Kayman et al., 2012).
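Screening classified reads against a curated genus list, as done here for the pathogen and wastewater genera, amounts to a set lookup followed by per-location normalisation. This sketch assumes genus-level read counts per location; using all genus-level reads of a location as the denominator is an assumption, and the function names and counts are hypothetical.

```python
def detected_targets(genus_counts, targets):
    """Target genera with at least one read in any location.

    genus_counts: dict location -> dict genus -> read count.
    targets: set of genus names (e.g. a pathogen or wastewater list).
    """
    return {g for counts in genus_counts.values()
            for g, n in counts.items() if g in targets and n > 0}

def target_fractions(genus_counts, targets):
    """Relative abundance of each detected target genus per location."""
    result = {}
    for location, counts in genus_counts.items():
        total = sum(counts.values())
        result[location] = {g: n / total for g, n in counts.items()
                            if g in targets and n > 0}
    return result

# Hypothetical read counts for two locations.
genus_counts = {"location_7": {"Arcobacter": 30, "Limnohabitans": 70},
                "location_1": {"Limnohabitans": 100, "Legionella": 0}}
targets = {"Arcobacter", "Legionella", "Leptospira"}
found = detected_targets(genus_counts, targets)
fractions = target_fractions(genus_counts, targets)
```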

Figure 7

Potentially pathogenic and wastewater treatment related bacteria in the River Cam.

Boxplots on the left show the abundance distribution across locations per bacterial genus. Error bars represent Q1 – 1.5×IQR (lower) and Q3 + 1.5×IQR (upper); Q1: first quartile, Q3: third quartile, IQR: interquartile range. The central table depicts the categorisation of subsets of genera as waterborne bacterial pathogens (WB), drinking water pathogens (DWP), potential drinking water pathogens (pDWP), human pathogens (HP), and core genera from wastewater treatment plants (WW) (dark grey: included, light grey: excluded) (Supplementary file 3). The right-hand circle plot shows the distribution of bacterial genera across locations of the River Cam. Circle sizes represent overall read fractions, while circle colours (sigma scheme) represent the standard deviation from the observed mean relative abundance within each genus.
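The whisker bounds and the sigma colour scheme can be computed directly from per-location abundances. The sketch uses the `statistics` module; choosing the "inclusive" quartile method (linear interpolation, as in NumPy's default percentile) and the sample standard deviation are assumptions, since plotting libraries differ in both respects. Function names and values are hypothetical.

```python
import statistics

def whisker_bounds(values):
    """Lower and upper bounds Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def sigma_scores(abundance_by_location):
    """Deviation of each location from the genus mean, in standard deviations."""
    values = list(abundance_by_location.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumed)
    return {loc: (v - mean) / sd for loc, v in abundance_by_location.items()}

# Hypothetical relative abundances (%) of one genus at five locations.
genus = {"location_1": 0.02, "location_4": 0.03, "location_6": 0.05,
         "location_7": 0.20, "location_8": 0.35}
low, high = whisker_bounds(list(genus.values()))
sigma = sigma_scores(genus)
```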

In general, much of the taxonomic variation across all samples was driven by sample April-7 (PC1 explains 27.6% of the overall variance in bacterial composition; Figure 5—figure supplement 1a–b). Its profile was characterised by an unusual dominance of Caedibacteraceae, Halomonadaceae and others (Figure 5—figure supplement 1c). Isolate April-8 also showed a highly distinct bacterial composition, with some families occurring almost exclusively in this sample (outlier analysis; Materials and methods). The most predominant bacteria in this sewage pipe outflow are typically found in wastewater sludge or have been shown to contribute to nutrient pollution from wastewater plant effluents, such as Haliangiaceae, Nitrospiraceae, Rhodocyclaceae, and Saprospiraceae (Nielsen et al., 2012; Global Water Microbiome Consortium et al., 2019; Figure 7).

Using multiple sequence alignments between nanopore reads and pathogenic species references, we further resolved the phylogenies of three common potentially pathogenic genera occurring in our river samples, Legionella, Salmonella, and Pseudomonas (Figure 8a–c; Materials and methods). While Legionella and Salmonella diversities presented negligible levels of known harmful species, a cluster of reads in downstream sections indicated a low abundance of the opportunistic, environmental pathogen Pseudomonas aeruginosa (Figure 8c).

Figure 8

High-resolution phylogenetic clustering of candidate pathogenic genera in the River Cam.

Phylogenetic trees illustrating multiple sequence alignments of exemplary River Cam nanopore reads (black branches) classified as (a) *Legionella*, (b) *Salmonella*, (c) *Pseudomonas*, or (d) *Leptospira*, together with known reference species sequences ranging from pathogenic to saprophytic taxa within the same genus (coloured branches). Reference species sequences are numbered in clockwise orientation around the tree (Supplementary file 4). Nanopore reads highlighted in light violet background display close clustering with pathogenic isolates of (b) *Salmonella spp.* and (c) *Pseudomonas aeruginosa*.

Along the river course investigated here, we also found significant variations in relative abundances of the Leptospira genus, which was recently described to be enriched in wastewater effluents in Germany (Numberger et al., 2019; Figure 8d). Indeed, the peak of River Cam Leptospira reads fell into an area of increased sewage influx (~0.1% relative abundance; Figure 7). The Leptospira genus contains several potentially pathogenic species capable of causing life-threatening leptospirosis through waterborne infections, but also includes closely related saprophytic and ‘intermediate’ taxa (Vincent et al., 2019; Wynwood et al., 2014). To resolve its complex phylogeny in River Cam surface water, we aligned Leptospira reads from all samples together with many reference sequences assigned to pre-classified pathogenic, saprophytic and other environmental Leptospira species (Figure 8d; Supplementary file 4; Materials and methods). Despite the presence of nanopore sequencing errors (Figure 2—figure supplement 2c) and correspondingly inflated read divergence, we could pinpoint spatial clusters and a distinctly higher similarity of our amplicons to saprophytic rather than pathogenic Leptospira species. These findings were subsequently validated by targeted, Leptospira species-specific qPCR (Supplementary file 5; Materials and methods), confirming that R9.4.1 nanopore sequencing quality is already high enough to yield indicative results for bacterial monitoring workflows at the species level.
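The study placed reads by multiple sequence alignment and phylogenetic trees. As a much simpler stand-in, not the authors' method, the idea of asking whether a read is closer to saprophytic or pathogenic references can be sketched with pairwise edit-distance identity. The group names, toy sequences and helper functions are hypothetical; real full-length 16S reads (~1,500 bp) and large reference sets would call for a proper aligner rather than this quadratic-time comparison.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]

def identity(a, b):
    """Fraction identity: 1 - edit distance / length of the longer sequence."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

def closest_group(read, references):
    """references: dict group -> list of reference sequences.
    Returns (group, identity) of the best-matching single reference."""
    return max(((group, identity(read, ref))
                for group, refs in references.items() for ref in refs),
               key=lambda pair: pair[1])

# Toy sequences for illustration only.
references = {"saprophytic": ["ACGTTGCA"], "pathogenic": ["ACGAAGTA"]}
group, best_identity = closest_group("ACGTTGCT", references)
```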

Discussion

Using a cost-effective, easily adaptable and scalable framework, we provide the first spatiotemporal nanopore sequencing atlas of bacterial microbiota throughout the course of a river. Our results suggest that this workflow allows for robust assessments of both the core microbiome of an example fluvial ecosystem and heterogeneous bacterial compositions in the context of supporting physical (temperature, flow rate) and hydrochemical (pH, inorganic solutes) parameters. We show that the technology's current sequencing accuracy of ~92% is sufficient to detect significant shifts in human pathogen communities along rural-to-urban river transitions, as illustrated by downstream increases in the abundance of pathogen candidates.

Our assessment of bioinformatics workflows for taxonomic classification highlights current challenges with error-prone nanopore sequences. A number of recent reports feature bespoke 16S read classification schemes centred around a single software tool (Acharya et al., 2019; Benítez-Páez et al., 2016; Kerkhof et al., 2017; Nygaard et al., 2020), while others integrated outputs from two methods (Cuscó et al., 2018). Through systematic benchmarking of twelve different classification tools, using matched mock community and river water datasets with respect to the SILVA v.132 reference database, we reveal key differences in these methods' read (mis)classification rates, consensus agreement, speed and memory usage. For example, our results indicate that very fast implementations like Kraken 2 or Centrifuge yield less accurate classifications than slightly slower and more memory-demanding frameworks such as Minimap2 (Figure 2; Figure 2—figure supplement 1).

Using Minimap2, on average 16.2% of freshwater-derived sequencing reads were assigned to a bacterial species, which favours automated analyses at the genus (65.6% assigned) or family level (76.6% assigned). As nanopore sequencing quality continues to increase through refined pore chemistries, basecalling algorithms and consensus sequencing workflows (Calus et al., 2018; Karst et al., 2021; Latorre-Pérez et al., 2020; Rang et al., 2018; Santos et al., 2020; Zurek et al., 2020), future bacterial taxonomic classifications are likely to improve and advance opportunities for species discovery.
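Per-rank assignment rates such as those quoted above can be tallied from per-read lineages. This sketch assumes each read's classification is a list of taxon names ordered from domain downwards and truncated at the deepest assigned rank; the rank list, helper name and example lineages are hypothetical and do not reflect the SILVA output format.

```python
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]

def assignment_rates(lineages):
    """Fraction of reads assigned at least to each rank.

    lineages: list of per-read lineages, each a list of taxon names from
    domain downwards, truncated at the deepest assigned rank
    (an empty list means the read is unclassified).
    """
    n = len(lineages)
    return {rank: sum(len(lineage) > depth for lineage in lineages) / n
            for depth, rank in enumerate(RANKS)}

# Hypothetical lineages for three reads.
lineages = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
     "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
    ["Bacteria", "Bacteroidetes", "Bacteroidia", "Cytophagales", "Spirosomaceae"],
    [],
]
rates = assignment_rates(lineages)
```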

We show that nanopore amplicon sequencing data can resolve the core microbiome of a freshwater body, as well as its temporal and spatial fluctuations. Common freshwater bacteria account for the vast majority of taxa in the River Cam; this includes Sphingomonadaceae, which had also been previously found at high abundance in source water from the same river (Rowe et al., 2016). Our findings suggest that the differential abundances of Carnobacteriaceae most strongly contribute to seasonal loadings in the River Cam. Carnobacteriaceae have been previously associated with a range of low-temperature environments (Lawson and Caldwell, 2014), and we found these taxa to be more abundant in colder April samples (mean 11.3°C, vs. 15.8°C in June and 19.1°C in August). This might help to further establish this family as an indicator of bacterial community shifts associated with temperature fluctuations, although the influence of co-occurring hydrochemical trends such as changes in water hardness, dissolved carbon or flow speed should also be considered (Figure 6b–d; Figure 6—figure supplement 1d).

Most routine freshwater surveillance frameworks focus on semi-quantitative diagnostics of only a limited number of target taxa, such as pathogenic Salmonella, Legionella and faecal coliforms (Ramírez-Castillo et al., 2015; Tan et al., 2015), whereas metagenomics approaches can give a comprehensive and detailed overview of environmental microbial diversity. Alongside nanopore shotgun sequencing (Reddington et al., 2020), our proof-of-principle analysis highlights that targeted full-length 16S rRNA gene MinION sequencing is a suitable complement to hydrochemical controls in pinpointing relatively contaminated freshwater sites, some of which, in the case of the River Cam, had previously been highlighted for their pathogen diversity and abundance of antimicrobial resistance genes (Rowe et al., 2017; Rowe et al., 2016). Nanopore amplicon sequencing has here allowed us to reliably distinguish closely related pathogenic and non-pathogenic bacterial species of the common Legionella, Salmonella, Pseudomonas, and Leptospira genera. For Leptospira bacteria, which are of particular interest to communal stakeholders of the River Cam, we validated nanopore sequencing results through the gold standard qPCR workflow of Public Health England (Supplementary file 5). To also study the potential viability and functional implications of sequenced pathogen candidates for public health, we encourage future studies to combine nanopore-based freshwater metagenomics with targeted follow-up measurements of living pathogens by established microbiological approaches, including species-specific isolation and subsequent culturing.

Several experimental intricacies should be addressed when applying our approach to nanopore freshwater sequencing, chiefly by scrutinising water DNA extraction yields, PCR biases and molar imbalances in barcode multiplexing (Figure 3a; Figure 2—figure supplement 2a–b; Supplementary file 2). As with other organic substrates, raw microbial DNA extraction protocols require careful pre-testing and optimisation for the physicochemical composition of a given freshwater source, in order to avoid both taxonomic enrichment biases and drop-offs in total yield. One example is the optimisation of the filtrate volume: in this study, membrane DNA extraction from 400 mL of River Cam water was sufficient to yield valuable insights, whereas as much as 10,000 mL was used in a previous study of the same river (Rowe et al., 2016). Moreover, dissolved compounds that may inhibit DNA extraction, as well as sample cooling and storage chains, should be thoroughly considered for larger and remote river monitoring projects. We observed that yield variations can disturb the molar balance of barcoded nanopore sequencing runs, as illustrated by elevated sample dropouts in June 2018, emphasising the need for highly accurate concentration measurement and scaling when dozens of input DNA samples are pooled. Our study further shows that MinION (R9.4.1) flow cell throughput can fluctuate by an order of magnitude, which, under a conservative read threshold, led to the exclusion of some measurements. We reason that real-time selective nanopore sequencing could be a powerful means of improving barcode balance in multiplexed 16S analyses (Loose et al., 2016), although such approaches are still undergoing computational optimisation (Kovaka et al., 2020; Payne et al., 2020).
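The concentration scaling needed before pooling barcoded libraries amounts to an equimolar pooling calculation. The Python sketch below illustrates it under stated assumptions and is not the authors' protocol: it assumes an average mass of 650 g/mol per base pair of double-stranded DNA, an amplicon length of approximately 1,500 bp for full-length 16S, a hypothetical target of 50 fmol per barcode and a hypothetical maximum of 20 µL per sample. Samples that cannot reach the target are flagged rather than silently under-loaded.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Common approximation for the mass of one base pair of dsDNA (assumption).
DS_DNA_G_PER_MOL_PER_BP = 650.0


@dataclass
class Sample:
    barcode: str
    conc_ng_per_ul: float  # measured concentration, e.g. fluorometric
    length_bp: int = 1500  # approximate full-length 16S amplicon (assumption)


def fmol_per_ul(sample: Sample) -> float:
    """Convert a mass concentration (ng/uL) into a molar one (fmol/uL)."""
    # ng -> g is 1e-9 and mol -> fmol is 1e15, giving a net factor of 1e6.
    return sample.conc_ng_per_ul * 1e6 / (sample.length_bp * DS_DNA_G_PER_MOL_PER_BP)


def pooling_volumes(samples: List[Sample], target_fmol: float = 50.0,
                    max_volume_ul: float = 20.0) -> Tuple[Dict[str, float], List[str]]:
    """Volume (uL) of each sample needed to add target_fmol to the pool.

    target_fmol and max_volume_ul are hypothetical defaults. Samples that would
    need more than max_volume_ul (or have no measurable DNA) are flagged, since
    such barcodes are the most likely to drop out of a multiplexed run.
    """
    volumes: Dict[str, float] = {}
    flagged: List[str] = []
    for s in samples:
        molarity = fmol_per_ul(s)
        if molarity <= 0:
            flagged.append(s.barcode)
            continue
        volume = target_fmol / molarity
        if volume > max_volume_ul:
            flagged.append(s.barcode)
        else:
            volumes[s.barcode] = volume
    return volumes, flagged


# Hypothetical concentrations for three barcodes.
samples = [Sample("BC01", 10.0), Sample("BC02", 2.5), Sample("BC03", 0.1)]
volumes, flagged = pooling_volumes(samples)
for barcode, vol in volumes.items():
    print(f"{barcode}: {vol:.2f} uL")
print("flagged for re-extraction or re-amplification:", flagged)
```

With these placeholder values, BC01 and BC02 need 4.875 µL and 19.5 µL, whereas BC03 would need 487.5 µL and is flagged; in practice, the mass-per-base-pair constant, amplicon length and target amount should follow the library preparation kit in use.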

Our results show that it would already be theoretically feasible to obtain meaningful river microbiota profiles from more than 100 barcoded samples on a single nanopore flow cell, enabling water monitoring projects with large sample collections at costs below £20 per sample (Supplementary file 6). In line with this, ONT has already released several commercial 96-barcode multiplexing kits for PCR- and non-PCR-based applications, as well as the smaller ‘Flongle’ flow cell, which costs considerably less than the standard MinION flow cell. On the other hand, shotgun nanopore sequencing approaches may bypass pitfalls associated with amplicon sequencing, namely taxon-specific primer biases (Frank et al., 2008), variation in 16S rRNA gene copy number between species (Darby et al., 2013) or the omission of functionally relevant sequence elements. With adjusted sampling protocols, shotgun nanopore sequencing could moreover be used for the serial monitoring of eukaryotic microorganisms and viruses in freshwater ecosystems (Reddington et al., 2020).
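The per-sample cost argument rests on simple arithmetic: the fixed costs of a run are shared among all multiplexed samples, while the number of samples is capped by flow cell yield divided by the minimum number of reads each sample must receive. The Python sketch below makes this explicit. The read threshold, the barcode-imbalance factor and all prices in it are hypothetical placeholders, not values from this study (the authors' own breakdown is given in Supplementary file 6).

```python
import math


def max_samples(flow_cell_reads: int, min_reads_per_sample: int,
                balance_factor: float = 1.0) -> int:
    """Largest number of barcodes that can each still reach min_reads_per_sample.

    balance_factor (0 < f <= 1) is a hypothetical allowance for barcode
    imbalance: the weakest barcode is assumed to receive f times the mean
    share of reads, so n must satisfy f * reads / n >= min_reads_per_sample.
    """
    if not 0 < balance_factor <= 1:
        raise ValueError("balance_factor must be in (0, 1]")
    return math.floor(flow_cell_reads * balance_factor / min_reads_per_sample)


def cost_per_sample(n_samples: int, flow_cell_cost: float,
                    library_prep_cost: float, per_sample_cost: float) -> float:
    """Run-level costs are split across samples; per-sample costs are not."""
    if n_samples < 1:
        raise ValueError("need at least one sample")
    return (flow_cell_cost + library_prep_cost) / n_samples + per_sample_cost


# Hypothetical placeholder values, not yields or prices from the study.
reads, threshold = 1_000_000, 5_000
n = max_samples(reads, threshold, balance_factor=0.5)
print(n, "samples;", round(cost_per_sample(n, 500.0, 100.0, 5.0), 2), "per sample")
```

With a hypothetical yield of 1,000,000 reads, a 5,000-read threshold and the weakest barcode receiving half the mean share, `max_samples` returns 100; the `balance_factor` is only a crude stand-in for the barcode imbalances and throughput fluctuations discussed above.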

Since the commercial launch of the MinION in 2015, a wide range of microbial nanopore sequencing applications in rRNA gene (Benítez-Páez et al., 2016; Cuscó et al., 2018; Kerkhof et al., 2017; Nygaard et al., 2020) and shotgun (Leggett et al., 2020; Nicholls et al., 2019; Reddington et al., 2020; Stewart et al., 2019) metagenomics has attracted the interest of a growing user community. Two independent case studies have recently provided decomposition analyses of faecal bacterial pathogens in MinION libraries derived from river and spring waters in Montana, USA (Hamner et al., 2019) and the Kathmandu Valley, Nepal (Acharya et al., 2019). Although short-read metagenomics will likely continue to provide valuable environmental insights, as illustrated by global cataloguing efforts of ocean (Tara Oceans coordinators et al., 2015) and wastewater (Global Water Microbiome Consortium et al., 2019) microbiomes, the large size and fixed costs of these traditional platforms make them unsuitable for monitoring remote environments, especially in low-resource settings. We reason that the ease of handling the MinION, together with the development of portable DNA purification methods (Boykin et al., 2019; Gowers et al., 2019), will make such endeavours increasingly accessible to citizens and public health organisations around the world, ultimately democratising the opportunities and benefits of DNA sequencing.

Tool	Version	Commands
BLASTN (Altschul et al., 1990; Camacho et al., 2009)	v.2.9.0+	blastn -task 'blastn' -db silva.fa -query Cam16S.fa -out Cam16S.out -outfmt '6'
Centrifuge (Kim et al., 2016)	v.1.0.4	centrifuge -x centrifuge_silva -U Cam16S.fq -S Cam16S.out --report-file Cam16S.report
IDTAXA (Murali et al., 2018)	Implemented in R DECIPHER v.2.10.2 (Wright, 2016)	load('SILVA_SSU_r132_March2018.RData'); IdTaxa(Cam16S.fa, trainingSet, strand = 'both', threshold = 0)
Kraken 2 (Wood et al., 2019; Wood and Salzberg, 2014)	v.2.0.7	kraken2 --db kraken2_silva --output Cam16S.out --report Cam16S.report Cam16S.fa
MAPseq (Matias Rodrigues et al., 2017)	v.1.2.3	mapseq Cam16S.fa silva.fa > Cam16S.out
MegaBLAST (Camacho et al., 2009; Morgulis et al., 2008)	v.2.9.0+	blastn -task 'megablast' -db silva.fa -query Cam16S.fa -out Cam16S.out -outfmt '6'
Minimap2 (Li, 2018)	v.2.13-r852-dirty	minimap2 -ax map-ont -L silva.mmi Cam16S.fa > Cam16S.sam
Mothur (Schloss et al., 2009)	v.1.43.0	align.seqs(candidate = Cam16S.fa, template = mothur.silva.nr_v132.align, processors = 1, ksize = 6, align = needleman)
QIIME 2 (Bolyen et al., 2019)	v.2019.7	qiime feature-classifier classify-consensus-blast --i-query Cam16S.qza --i-reference-reads silva.qza --i-reference-taxonomy silva_tax.qza --o-classification Cam16S.out
RDP (Wang et al., 2007)	Implemented in R DADA2 v.1.12.1 (Callahan et al., 2016)	assignTaxonomy(seqs = Cam16S.fa, refFasta = 'silva_nr_v132_train_set.fa.gz', tryRC = TRUE, outputBootstraps = TRUE, minBoot = 0)
SINTAX (Edgar, 2016)	Implemented in VSEARCH v.2.13.3 (Rognes et al., 2016)	vsearch -sintax Cam16S.fa -db silva.udb -tabbedout Cam16S.out -strand both -sintax_cutoff 0.5
SPINGO (Allard et al., 2015)	v.1.3	spingo -d silva.fa -k 8 -a -i Cam16S.fa > Cam16S.out
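The BLASTN and MegaBLAST commands above write tabular output (`-outfmt '6'`), which lists every hit per read and still has to be reduced to one taxonomic call per read. The Python sketch below shows one plausible reduction, not necessarily the procedure used in the benchmark: it keeps the highest-bitscore hit per read and looks up its reference ID in a taxonomy dictionary, which is assumed here to have been built beforehand (for example from SILVA FASTA headers). The `min_pident` cut-off and the example rows are hypothetical.

```python
import csv
from typing import Dict, Iterable

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str]) -> Dict[str, dict]:
    """Return the highest-bitscore hit for each query read (first hit wins ties)."""
    best: Dict[str, dict] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(OUTFMT6_FIELDS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["pident"] = float(hit["pident"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


def assign_taxonomy(best: Dict[str, dict], taxonomy: Dict[str, str],
                    min_pident: float = 0.0) -> Dict[str, str]:
    """Map each read to its best hit's taxonomy, or 'unclassified'."""
    result: Dict[str, str] = {}
    for read, hit in best.items():
        if hit["pident"] >= min_pident:
            result[read] = taxonomy.get(hit["sseqid"], "unclassified")
        else:
            result[read] = "unclassified"
    return result


# Hypothetical example rows and reference taxonomy.
example = [
    "read1\trefA\t95.0\t1400\t70\t0\t1\t1400\t1\t1400\t0.0\t2300",
    "read1\trefB\t90.0\t1400\t140\t0\t1\t1400\t1\t1400\t0.0\t2100",
    "read2\trefB\t80.0\t1000\t200\t0\t1\t1000\t1\t1000\t1e-100\t900",
]
taxonomy = {"refA": "Bacteria;Proteobacteria", "refB": "Bacteria;Firmicutes"}
best = best_hits(example)
print(assign_taxonomy(best, taxonomy, min_pident=85.0))
```

For a real run, an open file handle such as `open("Cam16S.out")` can be passed to `best_hits` in place of the example list. Keeping the first of several tied hits is a simplification; a lowest-common-ancestor rule over tied hits would be a more conservative alternative.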


Freshwater microbiome study design and experimental setup.

Benchmarking of classification tools with nanopore full-length 16S sequences.

Bacterial diversity of the River Cam.

Core microbiome of the River Cam.

Spatiotemporal axes of taxonomic diversity in the River Cam.

Geological and hydrochemical profile of the River Cam and its basin.

Potentially pathogenic and wastewater treatment related bacteria in the River Cam.

High-resolution phylogenetic clustering of candidate pathogenic genera in the River Cam.

Author details

Lara Urban (contributed equally; corresponding author)

Andre Holzer (contributed equally; corresponding author)

J Jotautas Baronas

Michael B Hall

Philipp Braeuninger-Weimer

Michael J Scherm

Daniel J Kunz

Surangi N Perera

Daniel E Martin-Herranz

Edward T Tipper

Susannah J Salter

Maximilian R Stammnitz (corresponding author)
