Common virulence gene expression in adult first-time infected malaria patients and severe cases

  1. J Stephan Wichers
  2. Gerry Tonkin-Hill
  3. Thorsten Thye
  4. Ralf Krumkamp
  5. Benno Kreuels
  6. Jan Strauss
  7. Heidrun von Thien
  8. Judith AM Scholz
  9. Helle Smedegaard Hansson
  10. Rasmus Weisel Jensen
  11. Louise Turner
  12. Freia-Raphaella Lorenz
  13. Anna Schöllhorn
  14. Iris Bruchhaus
  15. Egbert Tannich
  16. Rolf Fendel
  17. Thomas D Otto
  18. Thomas Lavstsen
  19. Tim W Gilberger
  20. Michael F Duffy
  21. Anna Bachmann  Is a corresponding author
  1. Molecular Biology and Immunology, Bernhard Nocht Institute for Tropical Medicine, Germany
  2. Centre for Structural Systems Biology, Germany
  3. Biology Department, University of Hamburg, Germany
  4. Wellcome Sanger Institute, United Kingdom
  5. Epidemiology and Diagnostics, Bernhard Nocht Institute for Tropical Medicine, Germany
  6. German Center for Infection Research (DZIF), Partner Site Hamburg-Lübeck-Borstel-Riems, Germany
  7. Department of Tropical Medicine, Bernhard Nocht Institute for Tropical Medicine, Germany, Germany
  8. Department of Medicine, College of Medicine, Malawi
  9. Department of Medicine, University Medical Center Hamburg-Eppendorf, Germany
  10. CMP, University of Copenhagen, Denmark
  11. Institute of Tropical Medicine, University of Tübingen, Germany
  12. German Center for Infection Research (DZIF), Partner Site Tübingen, Germany
  13. Institute of Infection, Immunity and Inflammation, University of Glasgow, United Kingdom
  14. Department of Microbiology and Immunology, University of Melbourne, Australia
8 figures, 1 table and 13 additional files

Figures

Figure 1 with 1 supplement
Subgrouping of patients into first-time infected and pre-exposed individuals based on antibody levels against P. falciparum.

In order to further characterize the patient cohort, plasma samples (n = 32) were subjected to Luminex analysis with the Plasmodium falciparum (P. falciparum) antigens AMA1, MSP1, and CSP, known to induce a strong antibody response in humans. With exception of patient #21, unsupervised clustering of the PCA-reduced data clearly discriminates between first-time infected (naive) and pre-exposed patients with higher antibody levels against tested P. falciparum antigens and also assigns plasma samples from patients with an unknown immune status into naive and pre-exposed clusters (A). Classification of patient #21 into the naive subgroup was confirmed using different serological assays assessing antibody levels against P. falciparum on different levels: a merozoite-directed antibody-dependent respiratory burst (mADRB) assay (Kapelski et al., 2014) (B), a parasitophorous vacuolar membrane-enclosed merozoite structure (PEMS)-specific ELISA (C), and a 262-feature protein microarray covering 228 well-known P. falciparum antigens detecting reactivity with individual antigens, and the antibody breadth of IgG (upper panel) and IgM (lower panel) (D). The boxes represent medians with IQR; the whiskers depict minimum and maximum values (range), with outliers located outside the whiskers. Serological assays revealed significant differences between patient groups (Mann-Whitney U test). Reactivity of patient plasma IgG and IgM with individual antigens in the protein microarray is presented as the volcano plot, highlighting the significant hits in red. Box plots represent antibody breadths by summarizing the number of recognized antigens out of 262 features tested. Data from all assays were used for an unsupervised random forest approach (E). The variable importance plot of the random forest model shows the decrease in prediction accuracy if values of a variable are permuted randomly. The decrease in accuracy was determined for each serological assay, indicating that the mADRB, ELISA, and Luminex assays are most relevant in the prediction of patient clusters (F). Venn chart showing the patient subgroups used for differential expression analysis (G). Patients with a known immune status based on medical reports were marked in all plots with filled circles in blue (naive) and grey (pre-exposed), and samples from patients with an unknown immune status are shown as open circles. Patient #21 is shown as a filled circle in grey with a cross, and patient #26 is represented by an open circle with a cross. ELISA: enzyme-linked immunosorbent assay; IQR: interquartile range; PCA: principal component analysis.

Figure 1—figure supplement 1
Early immune response in mild and severe malaria within the naive patient cluster.

Antibody reactivity against individual antigens within the three subgroups ‘naive with mild symptoms,’ ‘naive with severe symptoms,' and ‘pre-exposed with mild symptoms.' Sera from all volunteers were assessed on protein microarrays and data normalized to control spots containing no antigen (no DNA control spots). Median reactivities of the mild infected malaria-naive, severely infected malaria-naive as well as the mild infected with pre-exposure to malaria are represented as bar charts. IgG data is given for all 262 Plasmodium falciparum (P. falciparum) proteins spotted on the microarray representing 228 unique antigens (A). To estimate differences in immune response in mild and severe malaria within the malaria-naive population, normalized IgG (B) and IgM (C) antibody responses were compared in the two subpopulations. Differentially recognized antigens (p<0.05 and fold change >2) are depicted in red.

Figure 2 with 4 supplements
Overview of the methodology and differential core gene expression analysis.

Summary diagram of the approaches taken to analyze the differential expression of core and var genes. In principle, all samples were analyzed by sequencing of the RNA using next generation sequencing (NGS) and by sequencing of expressed sequence tags (EST) from the DBLα domain (A). Gene set enrichment analysis (GSEA) of gene ontology (GO) terms and KEGG pathways indicate gene sets deregulated in first-time infected malaria patients. GO terms related to antigenic variation and host cell remodeling are significantly overrepresented in the down-regulated gene set; only the KEGG pathway 03410 ‘base excision repair’ shows a significant up-regulation in malaria-naive patients (B). Log fold changes (logFC) for the 15 Plasmodium falciparum (P. falciparum) genes assigned to the KEGG pathway 03410 ‘base excision repair’ are plotted with the six significant hits marked with * p<0.05 and **p<0.01 (C).

Figure 2—figure supplement 1
Estimated stage proportions for each sample.

Patient samples consist of a combination of different parasite stages. To estimate the proportion of different life cycle stages in each sample, a constrained linear model was fit using data from López-Barragán et al., 2011. The proportions of rings (8 hours post infection (hpi)), early trophozoites (19 hpi), late trophozoites (30 hpi), schizonts (42 hpi), and gametocyte stages shown in the columns of the bar plots must add to one for each sample. Shown are the comparisons between first-time infected (naive; blue) and pre-exposed samples (grey) (A) and severe (red) and non-severe cases (grey) (B). A bias towards the early trophozoite appears in the non-severe malaria sample group, which was confirmed by calculating the age in hours post infection (hpi) for each parasite sample. The boxes represent medians with IQR; the whiskers depict minimum and maximum values (range), with outliers located outside the whiskers (C, D). IQR, interquartile range.

Figure 2—figure supplement 2
Summary diagram of the approaches taken to analyze the RNA-seq data.

Diagram created in Lucidchart (http://www.lucidchart.com/).

Figure 2—figure supplement 3
The base excision repair (KEGG:03410) in P. falciparum.

Orthologues present in Plasmodium falciparum (P. falciparum) are indicated by gene IDs, and log fold changes (logFC) are indicated by colour codes (red: up-regulated; blue: down-regulated) (A). Summary of logFC in gene expression in first-time infected relative to pre-exposed patients and p-values for the logFC (B).

Figure 2—figure supplement 4
RNA quality.

The Bioanalyzer automated RNA electrophoresis system was used to characterize the total RNA quality prior to library synthesis. The calculated RNA integrity number (RIN) value is provided, although this measurement is questionable for samples from mixed species. From the four rRNA peaks visible in all samples, the inner peaks represent Plasmodium falciparum (P. falciparum) 18S and 28S rRNA, and the outer peaks are of human origin.

Figure 3 with 1 supplement
Summary of PfEMP1 transcripts, domains, and homology blocks that were found more or less frequently in malaria-naive and severely ill patients.

A schematic presentation of all var gene groups with their associated binding phenotypes and typical PfEMP1 domain compositions. The N-terminal head structure confers mutually exclusive receptor-binding phenotypes: EPCR (beige: CIDRα1.1/4–8), CD36 (turquoise: CIDRα2–6), CSA (yellow: VAR2CSA), and yet unknown phenotypes (brown: CIDRβ/γ/δ; dark red: CIDRα1.2/3 from VAR1, VAR3). Group A includes the conserved subfamilies VAR1 and VAR3, EPCR-binding variants, and those with unknown binding phenotypes conferred by CIDRβ/γ/δ domains. Group B PfEMP1 can have EPCR-binding capacities, but most variants share a four-domain structure, with group C-type variants capable of CD36 binding. Dual binders can be found within groups A and B, with a DBLβ domain after the first CIDR domain responsible for ICAM-1 (DBLβ1/3/5) or gC1qr binding (DBLβ12) (A). Transcripts, domains, and homology blocks according to Rask et al., 2010 as well as domain predictions from the DBLα-tag approach were found to be significantly differently expressed (p-value<0.05) between patient groups of both comparisons: first-time infected (blue) versus pre-exposed (black) cases and severe (red) versus non-severe (black) cases (B). ATS: acidic terminal sequence; CIDR: cysteine-rich interdomain region; CSA: chondroitin sulphate A; DBL: Duffy binding-like; DC: domain cassette; EPCR: endothelial protein C receptor; gC1qr: receptor for complement component C1q; ICAM-1: intercellular adhesion molecule 1; NTS: N-terminal segment; PAM: pregnancy-associated malaria; TM: transmembrane domain.

Figure 3—figure supplement 1
Differential expression of the var1 variants 3D7 and IT and var2csa between patient groups.

RNA-seq reads from each patient were normalized against the number of mappable reads to the 3D7 genome and aligned to the var1-3D7 and var1-IT variants as well as var2csa. The resulting bigwig files were displayed in Artemis (Carver et al., 2012). Individual samples are coloured according to the following patient groups: first-time infected in blue (A), severe in red (B), and the respective pre-exposed or non-severe samples in grey.

Expression differences between parasites from first-time infected and pre-exposed patients at the level of var gene transcripts, domains, and homology blocks determined by NGS.

RNA-seq reads of each patient sample were matched to de novo assembled var contigs with varying lengths, domains, and homology block compositions. Shown are significantly differently expressed var gene contigs (A, B) as well as PfEMP1 domain subfamilies (C–F) and homology blocks (G, H) from Rask et al., 2010, with an adjusted p-value of <0.05. Data are displayed as heat maps showing expression levels either in log-transformed normalized Salmon read counts (A) or in log transcripts per million (TPM) (C, G) for each individual sample. Box plots show median log-transformed normalized Salmon read counts (B) or TPM (D, F, H) and interquartile range (IQR) for each group of samples. Individual domains from inter-strain conserved tandem arrangements of domains, so-called domain cassettes (DCs), found significantly higher expressed in samples from first-time infected (blue arrow) and pre-exposed patients (grey arrow), are indicated in bold (E). The N-terminal head structure (NTS-DBLα-CIDRα/β/γ/δ) confers a mutually exclusive binding phenotype either to EPCR-, CD36-, CSA-, or an unknown receptor. Expression values of the N-terminal domains were summarized for each patient, and differences in the distribution among patient groups were tested using the Mann-Whitney U test (F). Normalized Salmon read counts for all assembled transcripts and TPM for PfEMP1 domains and homology blocks are available in Supplementary file 8.

Expression differences between parasites from severe and non-severe cases at the level of var gene transcripts, domains, and homology blocks determined by NGS.

RNA-seq reads of each patient sample were matched to de novo assembled var contigs with varying lengths, domains, and homology block compositions. Shown are significantly differently expressed var gene contigs (A, B) as well as PfEMP1 domain subfamilies (C–F) and homology blocks from Rask et al., 2010, with an adjusted p-value of <0.05 in severe (red) and non-severe patient samples (grey) (A, B). Data are displayed as heat maps showing expression levels either in log-transformed normalized Salmon read counts (A) or in log transcripts per million (TPM) (C, G) for each individual sample. Box plots show median log-transformed normalized Salmon read counts (B) or TPM (D, F, H) and interquartile range (IQR) for each group of samples. Individual domains from inter-strain conserved tandem arrangements of domains, so-called domain cassettes (DCs), found significantly higher expressed in severe (red arrow) and non-severe cases (grey arrow), are indicated in bold (E). The N-terminal head structure (NTS-DBLα-CIDRα/β/γ/δ) confers a mutually exclusive binding phenotype either to EPCR-, CD36-, CSA-, or an unknown receptor. Expression values of the N-terminal domains were summarized for each patient, and differences in the distribution among patient groups were tested using the Mann-Whitney U test (F). Normalized Salmon read counts for all assembled transcripts and TPM for PfEMP1 domains and homology blocks are available in Supplementary file 8.

Figure 6 with 2 supplements
Verification of RNA-seq results using DBLα-tag sequencing.

Amplified DBLα-tag sequences were blasted against the ~2400 genomes on varDB (Otto, 2019) to obtain subclassification into DBLα0/1/2 and prediction of adjacent head structure N-terminal segment (NTS) and cysteine-rich interdomain region (CIDR) domains and their related binding phenotype. Proportion of each NTS and DBLα subclass as well as CIDR domains grouped according to the binding phenotype (CIDRα1.1/4–8: EPCR-binding, CIDRα2–6: CD36-binding, CIDRβ/γ/δ: unknown binding phenotype/rosetting) was calculated and shown separately on the left, and the number of total reads and individual sequence cluster with n ≥ 10 sequences are shown on the right. Differences in the distribution among first-time infected (blue) and pre-exposed individuals (grey) (A) as well as severe (red) and non-severe cases (grey) (B) were tested using the Mann-Whitney U test. The boxes represent medians with IQR; the whiskers depict minimum and maximum values (range) with outliers located outside the whiskers.

Figure 6—figure supplement 1
Comparison of DBLα-tag sequencing with RNA-seq analysis.

DBLα-tag sequencing and RNA-seq data compared in Bland-Altman plots for all patients summarized (A) and for each individual patient (B), where the mean log expression of each gene is indicated on the x-axis and the log ratio between normalized DBLα-tag (% of reads) and RNA-seq values (% of RPKM from all contigs containing both DBLα-tag primer-binding sites) on the y-axis. The mean (equal to bias) of all ratios (line) and the confidence interval (CI) of 95% (dotted lines) are indicated. Data points with negative values for one of the approaches are displayed in dependence of their mean log expression on top (DBLα-tag sequence clusters not detected by RNA-seq) or bottom (RNA-seq contigs not found within DBLα-tag sequence cluster) of the graph.

Figure 6—figure supplement 2
Quantile regression analysis of Varia outputs.

Quantile regression was applied to look for differences between patient groups on the level of domain main classes (left) and subdomains (right). Shown are median differences with 95% confidence intervals of domains with values that unequal 0. Domains with positive values tend to be higher expressed in naive (A) and severe patients (B).

Correlation of var gene expression with antibody levels against head structure CIDR domains.

Patient plasma samples (n = 32) were subjected to Luminex analysis with 35 PfEMP1 head structure cysteine-rich interdomain region (CIDR) domains. The panel includes endothelial protein C receptor (EPCR)-binding CIDRα1 domains (n = 19), CD36-binding CIDRα2–6 domains (n = 12), and CIDR domains with unknown binding phenotypes (CIDRγ3: n = 1, CIDRδ1: n = 3) as well as the minimal binding region of VAR2CSA (VAR2). Box plots showing mean fluorescence intensities (MFI) extending from the 25th to the 75th percentile with a line at the median indicate the higher reactivity of the pre-exposed (A) and non-severe cases (B) with all PfEMP1 domains tested. Significant differences were observed for recognition of CIDRα2–6, CIDRδ1, and CIDRγ3; VAR2CSA recognition differed only between severe and non-severe cases (Mann-Whitney U test). Furthermore, the breadth of IgG recognition (%) of CIDR domains for the different patient groups was calculated and shown as a heat map (C).

Author response image 1

Tables

Table 1
Patient data.
First-time infected (naive)
(n = 15)
Pre-exposed
(n = 17)
Severe malaria
(n = 8)
Non-severe malaria
(n = 24)
Female sex
[n (%)]
6 (40%)3 (18%)3 (38%)6 (25%)
Patient age in years [median (IQR)]34 (26–53)38 (31–45)47 (27–59)35 (31–46)
Hb g/dl
[median (IQR)]*
13.1 (12.1–14.6)12.2 (11.8–13.1)12.1 (11.6–13.0)13.2 (12.0–14.3)
Parasitaemia %
[median (IQR)]
7.0 (4.0–23.5)2.0 (1.0–3.0)23.5 (10.0–36.3)2.5 (1.0–3.9)
Number of MSP1 genotypes
[n (%)]
1: 10 (66%)
2: 1 (7%)
3: 3 (20%)
4: 1 (7%)
1: 12 (71%)
2: 3 (18%)
3: 2 (12%)
4: 0 (0%)
1: 5 (63%)
2: 1 (13%)
3: 1 (13%)
4: 1 (13%)
1: 17 (71%)
2: 3 (13%)
3: 4 (17%)
4: 0 (0%)
Total reads
[median (IQR)]
41,341,958
(37,804,417–43,659,324)
41,259,082
(36,921,362–43,904,892)
42,458,431
(38,520,154–49,561,881)
41,050,568
(36,920,201–44,030,863)
P. falciparum reads
[median (IQR)]
35,940,843
(34,099,395–39,090,313)
37,065,150
(28,707,096–38,070,441)
37,980,501
(35,195,959–45,563,701)
35,559,157
(29,711,534–37,774,576)
Number of assembled var contigs (>500 bp)
[median (IQR)]
220.5
(169.3–320.8)
165.5
(121.3–251.5)
292
(210–404)
174
(121–259)
Parasite age [median (IQR)]9.4
(8.0–10.3)
9.8
(8.0–10.6)
8.2
(8.0–9.8)
9.8
(8.2–11.4)
  1. Geographic origin of the parasite isolates: Ghana (n = 10), Nigeria (n = 6), other Sub-Saharan African countries (n = 15), unknown (n = 1) 

    *n = 21.

Additional files

Source data 1

Sequences from all assembled var contigs with a length >500 bp.

https://cdn.elifesciences.org/articles/69040/elife-69040-data1-v2.txt
Supplementary file 1

Data from Luminex, mADRB, ELISA, and protein microarray.

Seroprevalence of head structure CIDR domains determined by applying a cut off from Danish controls (mean +2 STD) to the Luminex data.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp1-v2.xlsx
Supplementary file 2

Characteristics and classification of adult malaria patients.

Parasitaemia, signs of organ failure, and subgrouping of each individual patient.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp2-v2.docx
Supplementary file 3

Raw read counts by sample for H. sapiens, P. falciparum, var exon one and percentage of reads that mapped either to P. falciparum or var exon one as well as the number of assembled var contigs > 500 bp in length.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp3-v2.xlsx
Supplementary file 4

Differentially expressed genes excluding var genes (all gene analyses) between first-time infected and pre-exposed patient samples.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp4-v2.xlsx
Supplementary file 5

Differentially expressed genes excluding var genes (all gene analyses) between severe and non-severe patient samples.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp5-v2.xlsx
Supplementary file 6

Features of the assembled var fragments annotated in accordance with Rask et al., 2010 and Tonkin-Hill et al., 2018.

The reading frame used for translation is given after the contig ID, and the position of each annotation is provided by starting and ending amino acid followed by the p-value from the blast search against the respective database. For annotations in accordance with Tonkin-Hill et al., 2018, either the short ID or ‘NA’ (not applicable) is listed at the end. Short IDs are only available for significant differently expressed domains and blocks between severe and non-severe cases (Tonkin-Hill et al., 2018).

https://cdn.elifesciences.org/articles/69040/elife-69040-supp6-v2.xlsx
Supplementary file 7

Summary of var gene fragments assembled for each patient isolate showing length, raw read counts, RPKM, blast hits, domains, and block annotations in accordance with Rask et al., 2010.

The RPKM for the contigs was calculated as the number of mapped reads and normalized by the number of mapped reads against all transcripts in each isolate, respectively. Therefore, RPKM expression values are only valid for comparison within a single sample since RNA-seq reads were mapped only to the contigs of the respective patient isolate using BWA-MEM (Li, 2013). Further, the amount of blast hits with 500 bp or 80% of overlap against the ~2400 samples from varDB (Otto, 2019) with an identity cut off of 98%. Further hits of 1 kb (>98% identity) against the var genes from the 15 reference genomes (Otto et al., 2018a) are listed. The last two columns show the annotations from Rask et al., 2010 associated to each contig.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp7-v2.xlsx
Supplementary file 8

Log-transformed normalized Salmon read counts for assembled var transcripts, TPM for collapsed domains, and homology blocks from each patient isolate.

Normalized counts and transcripts per million (TPM) values calculated for transcripts, domains, and blocks with expression in at least three patient isolates with more than five read counts.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp8-v2.xlsx
Supplementary file 9

Differently expressed var transcripts, domains, and homology blocks between first-time infected and pre-exposed patient samples.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp9-v2.xlsx
Supplementary file 10

Differently expressed var transcripts, domains, and homology blocks between severe and non-severe patient samples.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp10-v2.xlsx
Supplementary file 11

Data from DBLα-tag sequencing.

https://cdn.elifesciences.org/articles/69040/elife-69040-supp11-v2.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/69040/elife-69040-transrepform-v2.docx

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. J Stephan Wichers
  2. Gerry Tonkin-Hill
  3. Thorsten Thye
  4. Ralf Krumkamp
  5. Benno Kreuels
  6. Jan Strauss
  7. Heidrun von Thien
  8. Judith AM Scholz
  9. Helle Smedegaard Hansson
  10. Rasmus Weisel Jensen
  11. Louise Turner
  12. Freia-Raphaella Lorenz
  13. Anna Schöllhorn
  14. Iris Bruchhaus
  15. Egbert Tannich
  16. Rolf Fendel
  17. Thomas D Otto
  18. Thomas Lavstsen
  19. Tim W Gilberger
  20. Michael F Duffy
  21. Anna Bachmann
(2021)
Common virulence gene expression in adult first-time infected malaria patients and severe cases
eLife 10:e69040.
https://doi.org/10.7554/eLife.69040