Zooanthroponotic transmission of SARS-CoV-2 and host-specific viral mutations revealed by genome-wide phylogenetic analysis

  1. Sana Naderi
  2. Peter E Chen
  3. Carmen Lia Murall
  4. Raphael Poujol
  5. Susanne Kraemer
  6. Bradley S Pickering
  7. Selena M Sagan (corresponding author)
  8. B Jesse Shapiro (corresponding author)
  1. Department of Microbiology & Immunology, McGill University, Canada
  2. Département de sciences biologiques, Université de Montréal, Canada
  3. Public Health Agency of Canada, Canada
  4. Research Centre, Montreal Heart Institute, Canada
  5. National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Canada
  6. Department of Veterinary Microbiology and Preventative Medicine, College of Veterinary Medicine, Iowa State University, United States
  7. Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Canada
  8. Department of Biochemistry, McGill University, Canada
  9. McGill Genome Centre, Canada
  10. McGill Centre for Microbiome Research, Canada
4 figures, 3 tables and 9 additional files

Figures

Overview of available Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) genome sequences from different animal species.

(A) Barplot of the number of genome sequences available in GISAID (on February 28, 2022) sampled from each animal species. Only species with 50 or more sequences were included in the current study: cat, dog, mink, and deer. (B) Heatmap of animal-associated mutations identified in previous publications. Darker colors indicate mutations found in a greater number of studies. Each row corresponds to one of the species included in this study, and columns correspond to mutations along the SARS-CoV-2 reference genome. Mutations identified as family-wise significant (p<0.05) in our genome-wide association studies are indicated with an orange asterisk. A detailed list of publications reporting these mutations is in Supplementary file 8. Only single nucleotide variants, not insertions or deletions, are included. The heatmap illustrates the results of several key prior studies but does not represent a comprehensive meta-analysis.

Transmission events inferred from non-human animals to humans.

Panels A–D each show a representative tree for one species, with animal-to-human transmissions marked on the tree. More detailed versions of these trees are in the Figure 2 source data (source data 1–4). Trees are rooted with the Wuhan reference genome (from one of the first sampled human COVID-19 patients).

Figure 2—source data 1

Detailed representative phylogeny of cat- and human-derived Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) sequences.

In order to make the tree topology clear, branch lengths are not to scale.

https://cdn.elifesciences.org/articles/83685/elife-83685-fig2-data1-v1.zip
Figure 2—source data 2

Detailed representative phylogeny of dog- and human-derived Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) sequences.

In order to make the tree topology clear, branch lengths are not to scale.

https://cdn.elifesciences.org/articles/83685/elife-83685-fig2-data2-v1.zip
Figure 2—source data 3

Detailed representative phylogeny of mink- and human-derived Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) sequences.

In order to make the tree topology clear, branch lengths are not to scale. The colored boxes to the right of the tree show the allelic state of the three mink-associated genome-wide association studies (GWAS) hits in each terminal branch of the phylogeny, with dark red indicating the animal-associated alternate allele.

https://cdn.elifesciences.org/articles/83685/elife-83685-fig2-data3-v1.zip
Figure 2—source data 4

Detailed representative phylogeny of deer- and human-derived Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) sequences.

In order to make the tree topology clear, branch lengths are not to scale. The colored boxes to the right of the tree show the allelic state of the seven deer-associated genome-wide association studies (GWAS) hits that appeared in all ten replicate GWAS runs, with dark red indicating the animal-associated alternate allele.

https://cdn.elifesciences.org/articles/83685/elife-83685-fig2-data4-v1.zip
Transmission events from animals to humans are rarely detected, except from mink.

The distributions of inferred transmission counts (across 10 replicate trees) in each animal species, in both bootstrap-filtered and unfiltered trees, are shown for (A) the animal-to-human direction and (B) the human-to-animal direction. Points are plotted with jitter to avoid overlap.

Manhattan plots summarizing genome-wide association studies (GWAS) hits in each animal species.

In every panel, the x-axis represents the nucleotide position in the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) reference genome and the y-axis represents the -log10 of the pointwise p-values averaged over replicates. ORFs are shown as alternating shaded bars along the x-axis. Statistically significant hits, with family-wise corrected p-values lower than 0.05, are shown in red (non-synonymous) or blue (synonymous), while non-significant sites are shown in black.
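The y-axis value described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `manhattan_points`, the input dictionaries, and all p-values below are hypothetical, and the sketch assumes that p-values are averaged first and then log-transformed, with a single family-wise p-value per site. The caption does not specify either detail.

```python
import math
from statistics import mean


def manhattan_points(pointwise_by_site, familywise_by_site, alpha=0.05):
    """Return (position, -log10(mean pointwise p), significant) per site.

    pointwise_by_site:  genome position -> list of pointwise p-values,
                        one per replicate (hypothetical input layout).
    familywise_by_site: genome position -> one family-wise corrected
                        p-value (assumption: already combined per site).
    """
    points = []
    for pos in sorted(pointwise_by_site):
        mean_p = mean(pointwise_by_site[pos])        # average over replicates
        height = -math.log10(mean_p)                 # y-axis value
        significant = familywise_by_site[pos] < alpha
        points.append((pos, height, significant))
    return points


# Made-up p-values for two sites, purely for illustration.
pointwise = {1200: [0.01, 0.01, 0.01], 8400: [0.5, 0.3, 0.4]}
familywise = {1200: 0.03, 8400: 0.9}

for pos, height, significant in manhattan_points(pointwise, familywise):
    label = "significant" if significant else "not significant"
    print(pos, round(height, 2), label)
```

Averaging the log-transformed values instead would give a different height whenever replicates disagree, so the order of operations is worth stating explicitly when reproducing such a plot.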

Tables

Table 1
Average inferred transmission events between humans and animals.
| Average inferred number of transitions (filtered – unfiltered) | Mink (n=1038) | Deer (n=134) | Cat (n=78) | Dog (n=39) |
|---|---|---|---|---|
| Animal-to-human | 38.0–112.3 | 0.7–1.4 | 4.4–4.4 | 1.4–1.7 |
| Human-to-animal | 42.3–65.2 | 38.3–55.2 | 58.5–68.3 | 31.5–35.7 |
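Each cell of Table 1 pairs an average from bootstrap-filtered trees with an average from unfiltered trees. A minimal sketch of how such a cell could be assembled from per-replicate counts follows. The helper name `table_cell` and the counts are invented for illustration and are not taken from the study.

```python
from statistics import mean


def table_cell(filtered_counts, unfiltered_counts):
    """Format 'filtered–unfiltered' means, each to one decimal place.

    Each argument is a list of inferred transmission counts, one entry
    per replicate tree (hypothetical input layout).
    """
    return f"{mean(filtered_counts):.1f}–{mean(unfiltered_counts):.1f}"


# Made-up counts for 10 replicate trees of one species and one direction.
filtered = [1, 2, 1, 1, 1, 2, 1, 1, 1, 1]      # sum 12 -> mean 1.2
unfiltered = [3, 2, 3, 2, 2, 3, 2, 3, 3, 2]    # sum 25 -> mean 2.5

print(table_cell(filtered, unfiltered))
```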
Table 2
Single-nucleotide variants associated with mink by genome-wide association studies (GWAS).

“Pos” refers to the nucleotide position in the reference genome. Homoplasy counts in focal animals (cases), humans (controls), and p-values are averaged across replicates in which the site’s family-wise p-values were <0.05. Where applicable, amino acid positions refer to the polyprotein with mature protein positions in parentheses. The ‘local transmission odds ratio’ is the result of a Fisher’s exact test of the likelihood that the alternate base (animal-associated minor allele) was enriched in the local human population where the mink sequences bearing the alternate base were sampled (Methods). n.s., not significant. Odds ratio p-value: *<0.05, **<0.01, ***<0.001.

| Pos. | Ref. base | Alt. base | Amino acid change | Gene | Homoplasy count in focal animal | Homoplasy count in humans | p-value (pointwise) | p-value (familywise) | Significant in N replicates | Local transmission odds ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| 26047 | U | G | L219V | ORF3a | 6 | 0 | 0.0014 | 0.0365 | 10 | 3.93 *** |
| 12795 | G | A | G4177E (nsp9 G37E) | ORF1ab/pp1ab/nsp9/replicase | 6 | 0 | 0.0015 | 0.0368 | 6 | 7.53 *** |
| 23064 | A | C | N501T | Spike/S1/RBD/binds ACE2 | 6.4 | 0 | 0.0010 | 0.0258 | 5 | 0.48 *** |
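The 'local transmission odds ratio' column comes from a Fisher's exact test on human-derived sequences inside versus outside the regions where animal sequences with the alternate base were sampled. The sketch below is one way to compute such a test using only the Python standard library. It assumes a two-sided test and the simple sample odds ratio. The section does not state which variant or software the authors used, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (illustrative only):
      a = human sequences with the alternate base, inside the region
      b = human sequences without it, inside the region
      c = human sequences with the alternate base, outside the region
      d = human sequences without it, outside the region
    Returns (sample odds ratio, p-value).
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Made-up counts: 3 of 4 local sequences carry the alternate base,
# versus 1 of 4 sequences elsewhere.
odds_ratio, p_value = fisher_exact_2x2(3, 1, 1, 3)
print(odds_ratio, round(p_value, 3))  # 9.0 0.486
```

An odds ratio above 1 with a small p-value indicates that the alternate base is enriched among human sequences from the same region. That is the pattern marked with asterisks in the table.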
Table 3
Single-nucleotide variants associated with deer by genome-wide association studies (GWAS).

“Pos” refers to the nucleotide position in the reference genome. Homoplasy counts in focal animals (cases), humans (controls), and p-values are averaged across replicates in which the site’s family-wise p-values were <0.05. Where applicable, amino acid positions refer to the polyprotein with mature protein positions in parentheses. IG, Intergenic. The ‘local transmission odds ratio’ is the result of a Fisher’s exact test of the likelihood that the alternate base (animal-associated minor allele) was enriched in the local human population where the deer sequences bearing the alternate base were sampled (Methods). n.s., not significant. Odds ratio p-value: *<0.05, **<0.01, ***<0.001.

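| Pos. | Ref. base | Alt. base | Amino acid change | Gene | Homoplasy count in focal animal | Homoplasy count in humans | p-value (pointwise) | p-value (familywise) | Significant in N replicates | Local transmission odds ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| 7303 | C | U | I2346I (nsp3 I1524I) | ORF1a/pp1ab/pp1a/nsp3 | 17.8 | 1.2 | 9.99E-06 | 9.99E-06 | 10 | 2.51*** |
| 9430 | C | U | I3055I (nsp4 I292I) | ORF1a/pp1ab/pp1a/nsp4 | 15.2 | 6.2 | 9.99E-06 | 9.99E-06 | 10 | 2.20*** |
| 14960 | A | U | N4899I (nsp12 N507I) | ORF1ab/pp1ab/nsp12/RdRp | 7.8 | 0.1 | 9.99E-06 | 1.09E-05 | 10 | 0** |
| 20259 | C | U | F6665F (nsp15 F213F) | ORF1ab/pp1ab/nsp15 | 4.8 | 0.1 | 3.39E-05 | 0.0013 | 10 | n.s. |
| 28016 | C | U | F41F | ORF8 | 4 | 0 | 7.59E-05 | 0.0061 | 10 | 6.09*** |
| 12073 | C | U | D3936D (nsp7 D67D) | ORF1a/pp1ab/pp1a/nsp7 | 5.2 | 1.1 | 4.29E-05 | 0.0025 | 10 | n.s. |
| 29679 | C | U | IG | 3’UTR | 5 | 1.8 | 8.59E-05 | 0.0055 | 10 | 3.17*** |
| 5184 | C | U | P1640L (nsp3 P822L) | ORF1a/pp1ab/pp1a/nsp3 | 4.6 | 1.6 | 0.0002 | 0.0115 | 8 | 2.61*** |
| 29750 | C | U | IG | 3’UTR/S2M | 5 | 2.6 | 0.0002 | 0.0103 | 7 | 3.12*** |
| 7318 | C | U | F2351F (nsp3 F1533F) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 0.3 | 0.0001 | 0.0114 | 6 | 3.80*** |
| 16466 | C | U | P5401L (nsp13 P77L) | ORF1ab/pp1ab/nsp13/Hel | 5 | 1 | 4.99E-05 | 0.0019 | 5 | 4.09*** |
| 7267 | C | U | F2334F (nsp3 F1516F) | ORF1a/pp1ab/pp1a/nsp3 | 4.4 | 0.8 | 9.39E-05 | 0.0079 | 5 | 2.79*** |
| 210 | G | U | IG | 5’UTR/SL5a | 4 | 0.5 | 0.0001 | 0.0136 | 4 | 3.98*** |
| 6730 | C | U | N2155N (nsp3 N1337N) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 0.75 | 0.0002 | 0.0168 | 4 | 1.81** |
| 27752 | C | U | T120I | ORF7a | 4 | 0.75 | 0.0002 | 0.0169 | 4 | 4.03*** |
| 11152 | C | U | V3629V (nsp6 V60V) | ORF1a/pp1ab/pp1a/nsp6 | 4 | 0.7 | 0.0002 | 0.0153 | 3 | 0.80** |
| 5822 | C | U | L1853F (nsp3 L1035F) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 0.5 | 0.0001 | 0.0118 | 2 | n.s. |
| 9711 | C | U | S3149F (nsp4 S386F) | ORF1a/pp1ab/pp1a/nsp4 | 4 | 0.5 | 8.49E-05 | 0.0118 | 2 | 0.56** |
| 9679 | C | U | F3138F (nsp4 F375F) | ORF1a/pp1ab/pp1a/nsp4 | 4 | 0 | 9.49E-05 | 0.0067 | 2 | 2.32*** |
| 7029 | C | U | S2255F (nsp3 S1437F) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 0.5 | 0.0002 | 0.0149 | 2 | 0.22*** |
| 29738 | C | A | IG | 3’UTR/S2M | 4 | 0 | 3.99E-05 | 0.0059 | 1 | n.s. |
| 26767 | U | C | I82T | ORF5/M | 4 | 0 | 8.99E-05 | 0.0057 | 1 | 4.09*** |
| 203 | C | U | IG | 5’UTR/SL5a | 4 | 1 | 0.0003 | 0.0191 | 1 | 5.94*** |
| 12820 | A | G | L4185L (nsp9 L45L) | ORF1a/pp1ab/pp1a/nsp9 | 5 | 1 | 3.99E-05 | 0.0009 | 1 | 4.52*** |
| 4540 | C | U | Y1425Y (nsp3 Y607Y) | ORF1a/pp1ab/pp1a/nsp3 | 4 | 1 | 0.0002 | 0.0239 | 1 | 2.80*** |
| 29666 | C | U | L37F | ORF10 | 4 | 1 | 0.0002 | 0.0219 | 1 | 1.54*** |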

Additional files

Supplementary file 1

GISAID accession numbers of all sequences used in this study.

https://cdn.elifesciences.org/articles/83685/elife-83685-supp1-v1.zip
Supplementary file 2

Number of viral sequences passing quality filters.

The counts show the initial number of sequences downloaded from GISAID from each animal species, and the remaining number after each consecutive quality filter. The ‘quality control’ count shows the number of sequences after removing those with incomplete sampling dates and/or >500 ambiguous bases (Ns). The ‘post-alignment pruning’ shows the count after removing sequences shorter than 29,000 bases and/or with an insertion absent in all other sequences (introducing a gap in the alignment). The ‘divergent tree branches’ shows the count after removing sequences that introduce long branches into the phylogeny (Methods). Ranges of counts indicate variation across tree replicates.

https://cdn.elifesciences.org/articles/83685/elife-83685-supp2-v1.docx
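The first two filters described for Supplementary file 2 can be expressed as simple predicates. The sketch below is an illustration, not the authors' pipeline: the function names are hypothetical, a complete date is assumed to follow the `YYYY-MM-DD` pattern, and alignment gaps are assumed to be written as `-`. The insertion check and the long-branch filter are not covered.

```python
import re

# Assumption: a complete sampling date has year, month and day.
FULL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def passes_quality_control(sequence, sampling_date, max_ambiguous=500):
    """Keep sequences with a complete date and at most 500 ambiguous bases (N)."""
    if not FULL_DATE.match(sampling_date):
        return False
    return sequence.upper().count("N") <= max_ambiguous


def passes_length_filter(aligned_sequence, min_length=29_000):
    """Keep sequences with at least 29,000 bases, ignoring gap characters."""
    return len(aligned_sequence.replace("-", "")) >= min_length


# Made-up records: (identifier, sampling date, sequence).
records = [
    ("seq_complete", "2021-03-04", "A" * 29_500),
    ("seq_no_day", "2021-03", "A" * 29_500),
    ("seq_many_n", "2021-03-04", "A" * 29_000 + "N" * 501),
    ("seq_short", "2021-03-04", "A" * 28_000 + "-" * 1_500),
]

kept = [
    name for name, date, seq in records
    if passes_quality_control(seq, date) and passes_length_filter(seq)
]
print(kept)  # ['seq_complete']
```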
Supplementary file 3

Table of transmission counts for all candidate species, in both the animal-to-human and human-to-animal directions, for both bootstrap-filtered and unfiltered trees.

https://cdn.elifesciences.org/articles/83685/elife-83685-supp3-v1.docx
Supplementary file 4

Human-derived sequence counts bearing each of the significant GWAS hits identified in deer inside and outside regions where deer sequences containing each mutation are found.

Odds ratios and p-values are reported from a Fisher’s exact test. GWAS hits with OR <1 or not significantly different from 1 are highlighted in green.

https://cdn.elifesciences.org/articles/83685/elife-83685-supp4-v1.docx
Supplementary file 5

Human-derived sequence counts bearing each of the significant GWAS hits identified in mink inside and outside regions where mink sequences containing each mutation are found.

Odds ratios and p-values are reported from a Fisher’s exact test. GWAS hits with OR <1 or not significantly different from 1 are highlighted in green.

https://cdn.elifesciences.org/articles/83685/elife-83685-supp5-v1.docx
Supplementary file 6

Number of times mink GWAS hits appear along human-to-mink transmission branches.

The counts are summed across all branches and all 10 tree replicates.

https://cdn.elifesciences.org/articles/83685/elife-83685-supp6-v1.docx
Supplementary file 7

Number of times deer GWAS hits appear along human-to-deer transmission branches.

The counts are summed across all branches and all 10 tree replicates. Yellow-colored rows are mutations that never appear on a human-to-animal transmission branch over all deer replicates. The orange-colored row corresponds to a nucleotide position mutated along human-to-animal transmission branches, but the substitution was never identical to the animal-associated allele identified by GWAS at that position.

https://cdn.elifesciences.org/articles/83685/elife-83685-supp7-v1.docx
Supplementary file 8

SARS-CoV-2 mutations previously associated with non-human animal species in the literature.

Insertions and deletions are not considered. For each study, mutations reported in the main text or main tables are included. For the Pickering et al. (2022) study, substitutions that were found in deer in that study and that also appear at least once in the database of previously reported deer sequences (listed in the paper’s Supplementary file 2) are included.

https://cdn.elifesciences.org/articles/83685/elife-83685-supp8-v1.docx
MDAR checklist
https://cdn.elifesciences.org/articles/83685/elife-83685-mdarchecklist1-v1.docx

Cite this article

Sana Naderi, Peter E Chen, Carmen Lia Murall, Raphael Poujol, Susanne Kraemer, Bradley S Pickering, Selena M Sagan, B Jesse Shapiro (2023) Zooanthroponotic transmission of SARS-CoV-2 and host-specific viral mutations revealed by genome-wide phylogenetic analysis. eLife 12:e83685. https://doi.org/10.7554/eLife.83685