Figures and data

Study design and analysis pipeline.
(A) Blood samples from all participants from 4 villages of the Upper River Region of The Gambia were collected up to 16 times over 2.5 years. The peak of clinical malaria cases occurs at the end of, and right after, the rainy season. Made with Natural Earth. (B) Overall, 522 blood samples (307 fingerprick and 215 venous blood) were genotyped and/or whole genome sequenced, resulting in 425 high-quality barcodes and 199 high-quality genomes. Additionally, 6 drug resistance markers were successfully genotyped and/or called from whole genomes, in a total of 438 isolates. (C) High-quality barcodes and genomes were sampled in 4 villages over 16 time points between December 2014 and May 2017.

Low parasite inbreeding level in 4 villages in The Gambia inferred from inter-individual genetic relatedness.
The genetic relatedness of parasites was assessed from barcodes sampled between December 2014 and December 2016, keeping just one barcode per continuous infection, which resulted in 284 remaining barcodes. (A) Distribution of IBD values between barcodes (left panel), and with a cap at 500 pairs to highlight related barcodes at lower frequency (right panel). (B) Relatedness network of 284 isolates with barcodes represented as nodes and IBD values represented as edges. Barcodes are grouped into clusters using the compound spring embedder layout algorithm from Cytoscape (version 3.10.1).

Combined effects of spatial and temporal distances on parasite relatedness.
The proportions of related barcodes (IBD ≥ 0.5) between each pair of households are binned into time intervals of various lengths such that the number of observations in each bin is similar. Each box shows the interquartile range (25th to 75th percentile) of the percentage of related barcodes, with the horizontal bar marking the median. Groups of related barcodes were compared with Welch t-tests (*: p-value < 0.05, ****: p-value < 0.00005, ns: non-significant). (A) All pairs of isolates were grouped together in the same time interval. (B) Pairs of isolates were grouped by their relative spatial distance.
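As an illustration of the comparison named above, the following is a minimal sketch of the Welch t statistic and its Welch-Satterthwaite degrees of freedom, using only the Python standard library. The function name `welch_t` and the input values are hypothetical, and the p-value lookup (which needs a Student t distribution) is deliberately left out; this is not the authors' analysis code.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    Each sample is a sequence of at least two numbers, for example the
    percentages of related barcodes observed in two time-interval bins.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical percentages of related barcodes in two bins.
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(round(t_stat, 3), round(dof, 1))
```

Unlike Student's t-test, this form does not assume equal variances in the two groups, which suits bins that differ in spread.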

Effect of seasonality on parasite inbreeding level.
(A) Proportion of related barcodes (IBD ≥ 0.5) between all sample collections from December 2014 to December 2016. Each square represents a set of pairs of time points within the same seasonal pair. There were 5 successive transmission seasons during the study, hence 15 unique seasonal pairs. Annotations are present for seasonal pairs corresponding to a season compared with itself (‘within high’ and ‘within low’) and to a season compared with the preceding/succeeding one (‘low to high’ and ‘high to low’). (B) Proportion of related barcodes between pairs of sample collections within the same season (‘within high’ and ‘within low’) and one season apart when the high transmission season precedes the low transmission season and conversely (respectively ‘high to low’ and ‘low to high’). Genetic similarities were compared between the ‘low to high’ group and all other groups with Welch t-tests (*: p-value < 0.05, ***: p-value < 0.0005).
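The count of 15 seasonal pairs follows from taking every unordered pair of the 5 seasons, including each season paired with itself. A minimal sketch of that enumeration and of the four annotations is given below; the season labels, their alternating order and the helper names are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Assumed alternating order of the 5 successive transmission seasons.
SEASON_KINDS = ["high", "low", "high", "low", "high"]


def seasonal_pairs(n_seasons):
    """All unordered pairs of season indices, self-pairs included."""
    return list(combinations_with_replacement(range(n_seasons), 2))


def label_pair(i, j, kinds=SEASON_KINDS):
    """Annotation for a pair of season indices with i <= j.

    Returns 'within high', 'within low', 'high to low' or 'low to high',
    or None when the seasons are more than one season apart.
    """
    if i == j:
        return "within " + kinds[i]
    if j - i == 1:
        return kinds[i] + " to " + kinds[j]
    return None


pairs = seasonal_pairs(len(SEASON_KINDS))
print(len(pairs))  # 5 * 6 / 2 pairs
for i, j in pairs:
    print(i, j, label_pair(i, j))
```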

Prevalence of 5 drug resistance-related haplotypes for each time point with at least 10 observations over the study time period.
The 5 variants are the non-synonymous changes AAT1 S528L, CRT K76T, DHFR S108N, DHPS A437G and MDR1 N86Y, which are known to reduce susceptibility to multiple antimalarials. The prevalence of the Kelch13 C580Y mutation was zero and is not shown here. For all markers except DHPS A437G, the proportions of all variants remained stable between the start and end of the study period. Error bars represent the 95 % Wilson’s confidence intervals.
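For readers unfamiliar with the error bars, a minimal sketch of the Wilson score interval for a proportion follows. The function name and the example counts are hypothetical; no continuity correction is applied, and the legend does not state whether one was used.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    successes: number of isolates carrying the variant
    n:         number of isolates with a call at that time point
    z:         standard normal quantile (default gives a 95 % interval)
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical time point: 7 resistant haplotypes among 12 isolates.
low, high = wilson_interval(7, 12)
print(round(low, 3), round(high, 3))
```

The interval stays within 0 and 1 and remains informative when the observed prevalence is 0 or 1, which is useful for time points with few observations.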

Continuous P. falciparum infections with the same dominant genotype.
A total of 32 individuals were infected with highly related barcodes (IBD ≥ 0.9) at two or more time points. Individuals are ranked by their duration of continuous infection from the longest to the shortest, with a black line linking identical ‘dominant’ genotypes (black dots) that are the farthest away from each other. Additional genotypes (grey dots) are different (IBD < 0.9) from the dominant genotype but may still be related (IBD ≥ 0.5). For two individuals (ranked 24th and 25th), two different parasite strains were present concurrently, represented by curved lines. Barcodes and P. falciparum infection status are shown up to 90 days before or after the inferred continuous infection.

Steps of the combined barcode- and genome-analysis pipeline using 522 P. falciparum-positive blood isolates.

Comparison of heterozygous locus calls between each molecular barcode (obtained by genotyping) and its corresponding genomic barcode (built from allelic frequencies).
Genome loci were considered heterozygous if their within-sample Minor Allele Frequency (MAF) was above a threshold ranging from 0 to 0.5. A MAF threshold of 0.2 was retained because it yields a high number of matches and a low number of mismatches in heterozygous locus calls, making the genomic barcodes as close as possible to their molecular counterparts.
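The thresholding rule can be sketched as follows. This is an illustrative reading of the legend, not the authors' code: the input is assumed to be the within-sample frequency of the alternative allele at one locus, and the output symbols ("N" for heterozygous, "X" for unknown, "ref" and "alt" for homozygous calls) and function names are hypothetical.

```python
def call_locus(alt_freq, maf_threshold=0.2):
    """Call one genomic barcode locus from a within-sample allele frequency.

    alt_freq: frequency of the alternative allele in the sample (0 to 1),
              or None when the locus has no usable data.
    """
    if alt_freq is None:
        return "X"  # unknown call
    maf = min(alt_freq, 1.0 - alt_freq)  # within-sample minor allele frequency
    if maf > maf_threshold:
        return "N"  # heterozygous: minor allele above the threshold
    return "alt" if alt_freq > 0.5 else "ref"


def het_call_agreement(molecular_is_het, alt_freqs, maf_threshold):
    """Count loci where the heterozygous status matches or mismatches.

    molecular_is_het: list of booleans from the genotyped barcode
    alt_freqs:        list of allele frequencies from the genome (or None)
    """
    matches = mismatches = 0
    for mol_het, freq in zip(molecular_is_het, alt_freqs):
        if freq is None:
            continue  # incomparable locus
        gen_het = call_locus(freq, maf_threshold) == "N"
        if mol_het == gen_het:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches


# Hypothetical sample scanned over a few candidate thresholds.
mol = [False, True, False, True]
freqs = [0.02, 0.35, 0.10, None]
for threshold in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(threshold, het_call_agreement(mol, freqs, threshold))
```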

Comparison of molecular barcode loci (built from genotyped SNPs) and consensus barcode loci (combining molecular and WGS loci) with WGS loci for two groups of samples clustered by their collection dates.
(A) For each sample with both a barcode and a genome, pairs of corresponding loci are colour-coded as ‘matching’ (green), ‘mismatching’ (red), ‘heterozygous call matching’ (light green), ‘heterozygous-homozygous call mismatching’ (light red) or ‘incomparable when at least one call is unknown’ (grey). Barcode loci are sorted based on their agreement with WGS loci (proportion of matching loci over all comparable loci) from low to high. WGS loci were considered heterozygous if the within-sample MAF was above 0.2 (Figure S2). (B) Average agreement between molecular barcode loci and WGS loci coloured by their average agreement in samples collected after May 2016 (< 0.8 coloured in red, ≥ 0.8 coloured in green). Barcode loci are sorted by their chromosomal location. Overall, loci match between the two methods for samples collected up to May 2016 (all loci have an agreement above 0.79). However, 21 genotyped loci show a consistent mismatch (each has an agreement below 0.7) with WGS loci for all samples collected after May 2016. These 21 loci were incorrectly genotyped because of a failure of one of the multiplexes during the genotyping procedure. To increase the number of usable barcode loci, all unknown and incorrectly genotyped loci were replaced by WGS loci, resulting in a ‘consensus barcode’.
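A minimal sketch of the agreement measure and of the replacement rule described in this legend is given below. Barcodes are assumed to be equal-length strings with one character per locus and "X" for an unknown call; the function names, the encoding and the example strings are hypothetical.

```python
UNKNOWN = "X"  # assumed symbol for a missing call


def agreement(barcode, wgs):
    """Proportion of matching loci over all comparable loci.

    A locus is comparable when neither call is unknown. Returns None
    when no locus is comparable.
    """
    comparable = [
        (b, g) for b, g in zip(barcode, wgs) if UNKNOWN not in (b, g)
    ]
    if not comparable:
        return None
    return sum(b == g for b, g in comparable) / len(comparable)


def consensus_barcode(molecular, wgs, unreliable_loci=frozenset()):
    """Replace unknown and incorrectly genotyped loci with the WGS call.

    unreliable_loci: 0-based positions flagged as incorrectly genotyped
                     (for example loci from a failed multiplex).
    """
    if len(molecular) != len(wgs):
        raise ValueError("barcodes must have the same number of loci")
    out = []
    for i, (m, g) in enumerate(zip(molecular, wgs)):
        use_wgs = m == UNKNOWN or i in unreliable_loci
        out.append(g if use_wgs else m)
    return "".join(out)


# Hypothetical 4-locus example: locus 1 is unknown, locus 2 is flagged.
molecular, wgs = "AXCT", "AGGT"
print(agreement(molecular, wgs))
print(consensus_barcode(molecular, wgs, unreliable_loci={2}))
```

In this sketch a flagged locus always takes the WGS call, even an unknown one, so that a genotype known to be unreliable never survives into the consensus.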

Number of high-quality barcodes (425 isolates) and genomes (199 isolates) successfully sequenced over 16 time points between December 2014 and May 2017.

Number of homozygous SNPs per barcode.
Among 425 barcodes, 216 were improved using corresponding genomic calls (named ‘consensus barcodes’ on this figure), resulting in an increase in the number of homozygous SNPs (average: 76 SNPs, median: 82 SNPs). The remaining 209 molecular barcodes had a lower number of homozygous SNPs (average: 54 SNPs, median: 53 SNPs). Overall, the 425 consensus and molecular barcodes (referred to as ‘consensus barcodes’ in the main text) had an average of 65 SNPs and a median of 64 SNPs.

Correlation between Fws values and percentage of heterozygous loci.
(A) The percentage of heterozygous loci in barcodes ranges from 0 to more than 40 %, highlighting their high variability among P. falciparum strains. The percentage of heterozygous loci shows a strong negative correlation with the corresponding Fws value (R² = 0.83, p-value < 10⁻¹⁵). Considering isolates with Fws values available, only 15 % (20/132) of monoclonal isolates (heterozygous site proportion below 0.005) have a Fws value below 0.95 and 6 % (4/64) of polyclonal isolates (heterozygous site proportion above 0.005) have a Fws value above 0.95. (B) The percentage of heterozygous loci in genomes ranges from 0 to less than 15 %, which is expected as the 27,577 SNPs had various population-level MAF. The percentage of heterozygous loci shows a strong negative correlation with the corresponding Fws value (R² = 0.92, p-value < 10⁻¹⁵). Computing the proportion of heterozygous loci in both genome and barcode data (84 SNPs) enables accurate estimation of the level of clonality of an isolate.

Proportion of polyclonal isolates over time estimated by Fws (Fws < 0.95) and the proportion of heterozygous loci (more than 0.5 % of available sites) on barcodes and genomes.
The percentage of polyclonal isolates is stable over time for all methods (around 40 % for Fws, 39 % using heterozygous loci of genomes, 34 % using heterozygous loci of barcodes). Error bars represent the 95 % Wilson’s confidence intervals.
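The two classification rules named in this legend can be sketched as below. The barcode encoding ("N" for a heterozygous call, "X" for an unavailable site) and the function names are assumptions for illustration; only the cut-offs (Fws < 0.95, more than 0.5 % heterozygous sites) come from the legend.

```python
def het_proportion(calls, het="N", unknown="X"):
    """Proportion of heterozygous calls among available (non-unknown) sites.

    Returns None when no site is available.
    """
    available = [c for c in calls if c != unknown]
    if not available:
        return None
    return sum(c == het for c in available) / len(available)


def polyclonal_from_het(calls, cutoff=0.005):
    """Polyclonal when more than 0.5 % of available sites are heterozygous."""
    proportion = het_proportion(calls)
    return None if proportion is None else proportion > cutoff


def polyclonal_from_fws(fws, cutoff=0.95):
    """Polyclonal when the Fws value is below 0.95."""
    return fws < cutoff


# Hypothetical isolates.
print(polyclonal_from_het("ACGTACGT"))  # no heterozygous site
print(polyclonal_from_het("ACNTAXGT"))  # one heterozygous site out of 7
print(polyclonal_from_fws(0.87))
```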

IBD calculated from consensus barcodes highly correlates with IBD from genomes.
Identity By Descent was calculated pairwise between the 425 high-quality consensus barcodes (‘barcode-IBD’) with at least 30 comparable sites and 10 informative positions (79564/90100 pairs remaining) and between the 199 high-quality genomes (‘genome-IBD’) with at least 100 informative positions (19700/19701 pairs remaining). The accuracy of the classification of related (IBD ≥ 0.5) and unrelated (IBD < 0.5) isolates from barcode-IBD was assessed using genome-IBD as the gold standard. Overall, the barcode-IBD classification was in strong agreement with the genome-IBD classification (Cohen’s kappa of 0.839) and had high values of specificity (0.997), sensitivity (0.841) and precision (0.843). Among the 364 pairs of samples with a barcode-IBD above 0.5, 314 were indeed related (genome-IBD above 0.5) while 50 were actually not related (genome-IBD below 0.5), although 24 of these showed a genome-IBD value between 0.4 and 0.5. Regarding the 18626 unrelated pairs according to the barcode-IBD classification, only 58 were called as related by the genome-IBD classification.
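To make the quantities in this legend concrete, the sketch below computes the number of unordered pairs among n isolates and the standard agreement metrics from a 2 x 2 confusion table, with genome-IBD as the reference. Function names are hypothetical. The counts in the usage example are the ones quoted above; because the legend does not state how pairs exactly at the 0.5 threshold were handled, values recomputed this way need not reproduce the reported metrics to the third decimal.

```python
def n_pairs(n):
    """Number of unordered pairs among n isolates."""
    return n * (n - 1) // 2


def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, precision and Cohen's kappa.

    tp: related by both methods          fp: related by barcode-IBD only
    fn: related by genome-IBD only       tn: unrelated by both methods
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


print(n_pairs(425), n_pairs(199))

# Counts quoted in the legend: 364 barcode-related pairs (314 + 50) and
# 18626 barcode-unrelated pairs, of which 58 are related by genome-IBD.
metrics = classification_metrics(tp=314, fp=50, fn=58, tn=18626 - 58)
for name, value in metrics.items():
    print(name, round(value, 3))
```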

Agreement between pairs of haplotypes obtained both by molecular genotyping (‘barcodes’) and from Whole Genome Sequence calling (‘genomes’).
Six drug resistance markers were screened: AAT1 S528L, CRT K76T, DHFR S108N, DHPS A437G, Kelch13 C580Y and MDR1 N86Y. Drug resistance haplotypes were obtained in 428 barcodes (1261 haplotypes) and 198 genomes (1000 haplotypes) for a merged total of 438 isolates (1716 haplotypes). The vast majority (89 %) of the 545 pairs of haplotypes obtained with the two methods are identical, which indicates that both methods are accurate. While there are multiple partial matches (one haplotype is mixed according to one method and either sensitive or resistant according to the other) with mixed haplotypes called from genomes, no partial match with mixed haplotypes called from barcodes was found, showing that whole genome sequencing is more sensitive than molecular genotyping.

Durations of infection by the same parasite strain, grouped by host age and coloured by gender.
The numbers of short (less than 3 months) and long (more than 3 months) infections were compared between genders (female versus male) and age groups (‘5-15 years old’ versus ‘older than 15 years’). No significant difference in the duration of infection was found between genders, which had similar percentages of short infections, with 38 % and 48 % of infections for females and males respectively (χ² = 0.30, p-value > 0.5). Although long infections were more common in the ‘5-15 years old’ age group (71 % of all infections) than in the ‘older than 15 years’ age group (45 % of all infections), this difference was not statistically significant (χ² = 2.34, p-value < 0.13), potentially due to the relatively small sample size.
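For a 2 x 2 table such as short versus long infections by gender, the chi-square statistic has 1 degree of freedom and its p-value can be obtained from the complementary error function. The sketch below shows this under stated assumptions: the function names and the counts are hypothetical, and the legend does not say whether a continuity correction was applied, so it is left optional.

```python
import math


def p_from_chi2_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def chi_square_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With yates=True, Yates' continuity
    correction is applied.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    return stat, p_from_chi2_1df(stat)


# Hypothetical counts: rows are two groups, columns are short/long infections.
stat, p = chi_square_2x2(6, 10, 8, 8)
print(round(stat, 2), round(p, 2))

# p-values implied by the two statistics quoted in the legend.
print(round(p_from_chi2_1df(0.30), 2), round(p_from_chi2_1df(2.34), 2))
```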