Association of Toll-like receptor 7 variants with life-threatening COVID-19 disease in males: findings from a nested case-control study
Abstract
Background:
Recently, loss-of-function variants in TLR7 were identified in two families in which COVID-19 segregates like an X-linked recessive disorder environmentally conditioned by SARS-CoV-2. We investigated whether the two families represent the tip of the iceberg of a subset of COVID-19 male patients.
Methods:
This is a nested case-control study in which we compared male participants with extreme phenotype selected from the Italian GEN-COVID cohort of SARS-CoV-2-infected participants (<60 y, 79 severe cases versus 77 control cases). We applied the LASSO Logistic Regression analysis, considering only rare variants on young male subsets with extreme phenotype, picking up TLR7 as the most important susceptibility gene.
Results:
Overall, we found TLR7 deleterious variants in 2.1% of severely affected males and in none of the asymptomatic participants. The functional gene expression profile analysis demonstrated a reduction in TLR7-related gene expression in patients compared with controls demonstrating an impairment in type I and II IFN responses.
Conclusions:
Young males with TLR7 loss-of-function variants and severe COVID-19 represent a subset of male patients contributing to disease susceptibility in up to 2% of severe COVID-19.
Funding:
Funded by private donors for the Host Genetics Research Project, the Intesa San Paolo for 2020 charity fund, and the Host Genetics Initiative.
Clinical trial number:
Introduction
Coronavirus disease 2019 (COVID-19), a potentially severe systemic disease caused by coronavirus SARS-CoV-2, is characterized by a highly heterogeneous phenotypic presentation, with the large majority of infected individuals experiencing only mild or no symptoms. However, severe cases can rapidly evolve toward a critical respiratory distress syndrome and multiple organ failure (Wu and McGoogan, 2020). COVID-19 still represents an enormous challenge for the world's healthcare systems almost 1 year after the first appearance in December 2019 in Wuhan, Huanan, Hubei Province of China. Although older age and the presence of cardiovascular or metabolic comorbidities have been identified as risk factors predisposing to severe disease (Hägg et al., 2020), these factors alone do not fully explain differences in severity (Stokes et al., 2020). Stokes EK et al. reported that male patients show more severe clinical manifestations than females with a statistically significant (p<0.00001) higher prevalence of hospitalizations (16% versus 12%), ICU admissions (3% versus 2%), and deaths (6% versus 5%) (Stokes et al., 2020). These results are in line with other reports indicating that gender may influence disease outcome (Garg et al., 2020; Goodman et al., 2020).
These findings suggest a role of host predisposing genetic factors in the pathogenesis of the disease, which may be responsible for different clinical outcomes as a result of different antiviral defense mechanisms as well as specific receptor permissiveness to virus and immunogenicity.
Recent evidence suggests a fundamental role of interferon genes in modulating immunity to SARS-CoV-2; in particular, rare variants have recently been identified in the interferon type I pathway that are responsible for inborn errors of immunity in a small proportion of patients and auto-antibodies against type I interferon genes in up to 10% of severe COVID-19 cases (Zhang et al., 2020; Bastard et al., 2020).
Toll-like receptors (TLRs) are crucial components in the initiation of innate immune responses to a variety of pathogens, causing the production of pro-inflammatory cytokines (TNF-α, IL-1, and IL-6) and type I and II Interferons (IFNs), that are responsible for innate antiviral responses. In particular, the innate immunity is very sensitive in detecting potential pathogens, activating downstream signaling to induce transcription factors in the nucleus, promoting synthesis and release of type I and type II IFNs in addition to a number of other proinflammatory cytokines, and leading to a severe cytokine release syndrome which may be associated with a fatal outcome. Interestingly, among the different TLRs, TLR7 recognizes several single-stranded RNA viruses including SARS-CoV-2 (Poulas et al., 2020). We previously showed that another RNA virus, hepatitis C virus (HCV), is able to inhibit CD4 T cell function via Toll-like receptor 7 (TLR7) (Mele et al., 2017). Recently, van der Made et al., 2020 have reported two independent families in which COVID-19 segregates like an X-linked recessive monogenic disorder conditioned by SARS-CoV-2 as an environmental factor.
Here, we performed a nested case-control study within our prospectively recruited GEN-COVID cohort with the aim to determine whether the two families described by van der Made et al. represent an ultra-rare situation or the tip of the iceberg of a larger subset of young male patients.
Materials and methods
Patients and samples
Request a detailed protocolA subset of 156 young (<60 years) male COVID-19 patients was selected from the Italian GEN-COVID cohort of 1,178 SARS-CoV-2-infected participants (https://sites.google.com/dbm.unisi.it/gen-covid) (Daga et al., 2021). The study (GEN-COVID) was consistent with Institutional guidelines and approved by the University Hospital (Azienda Ospedaliero-Universitaria Senese) Ethical Review Board, Siena, Italy (Prot n. 16929, dated March 16, 2020). We performed a nested case-control study (STREGA reporting guideline was used to support reporting of this study). Cases were selected according to the following inclusion criteria: i. male gender; ii. young age (<60 years); iii endotracheal intubation or CPAP/biPAP ventilation (79 participants). As controls, 77 participants were selected using the sole criterion of being oligo-asymptomatic not requiring hospitalization. Cases and controls represented the extreme phenotypic presentations of the GEN-COVID cohort. Exclusion criteria for both cases and controls were: i. SARS-CoV-2 infection not confirmed by PCR; ii. non-white ethnicity. Materials and methods details are listed in the Online Repository. A similar cohort from the second wave, composed of 83 young male COVID-19 patients, was used to expand the cohort.
Statistical methods
Request a detailed protocolWe adopted the LASSO logistic regression, one of the most common Machine Learning algorithms for classification, that provides a feature selection method within the classification task able to enforce both the sparsity and the interpretability of the results (Tibshirani, 1996). In fact, the coefficients of the logistic regression model are directly related to the importance of the corresponding features, and LASSO regularization shrinks close to zero the coefficients of features that are not relevant in predicting the response, reducing overfitting and giving immediate interpretability of the model predictions in terms of few feature importance.
The principal components analysis (PCA) was applied prior to the LASSO logistic regression in order to remove samples that were clear outliers with respect to the first three principal components from the following analyses (deviating more than five standard deviations from the average).
A 10-fold cross-validation method was applied in order to test the performances. It provides the partition of the dataset into 10 batches, then nine batches are exploited for the training of the LASSO logistic regression and the remaining batch as a test, by repeating this procedure 10 times. The performance metrics are averaged on the 10 testing sets in order to avoid overfitting. The confusion matrix is built by summing up the predictions of the 10 testing folds. During the fitting procedure, the class unbalancing is tackled by penalizing the misclassification of the minority class with a multiplicative factor inversely proportional to the class frequencies.
In order to evaluate the significance of the association between TLR7 variants and COVID severity, the Fisher’s Exact Test was used.
For the quantitative PCR assay, the fold changes in mRNA expression level per gene were compared between the individual patients and controls using an unpaired t test on the log-transformed fold changes. p Values < 0.05 were considered statistically significant.
In vitro peripheral blood mononuclear cell (PBMC) experiments
Request a detailed protocolPeripheral blood mononuclear cells (PBMC) were isolated by Ficoll‐Hypaque (GE Healthcare Bio-Sciences AB) density gradient centrifugation as previously described (Mantovani et al., 2019). 5 × 105 PBMC from COVID-19 patients 6 months after recovery and six unaffected male and female controls were stimulated for 4 hr with the TLR7 agonist imiquimod at 5 μg/mL or cell culture medium. Total RNA extraction was performed with RNeasy Plus Mini kit and gDNA eliminator mini spin columns (QIAGEN, Hilden, Germany), following the manufacturer's instructions. First-strand cDNA was synthesized from total RNA using High-Capacity cDNA Reverse Transcription Kit following the manufacturer's instructions (Thermo Fisher Scientific, Waltham, Massachusetts, United States). The Advanced Universal SYBR Green Supermix (BioRad, Redmond, WA, United States) was used. All reactions were performed in triplicates using the CFX96 Real-Time machine detection system (BioRad, Redmond, WA, United States) and each sample was amplified in duplicate. The following primers were used:
TLR7 | Fw Primer | 5’-CATCAAGAGGCTGCAGATTAAA-3’ |
Rv Primer | 5’-GAAAAGATGTTGTTGGCCTCA-3’ | |
IFN-γ | Fw Primer | 5’-TGACCAGAGCATCCAAAAGA-3’ |
Rv Primer | 5’-CTCTTCGACCTCGAAACAGC-3’ | |
IRF7 | Fw Primer | 5’-CCATCTTCGACTTCAGAGTCTTC-3’ |
Rv Primer | 5’-TCTAGGTGCACTCGGCACAG-3’ | |
ISG15 | Fw Primer | 5’-GACAAATGCGACGAACCTCT-3’ |
Rv Primer | 5’-GAACAGGTCGTCCTGCACAC-3’ | |
IFN-a | Fw Primer | 5’-GACTCCATCTTGGCTGTGA-3’ |
Rv Primer | 5’-TGATTTCTGCTCTGACAACCT-3’ | |
HRPT1 | Fw Primer | 5’-TGACACTGGCAAAACAATGCA-3’ |
Rv Primer | 5’-GGTCCTTTTCACCAGCAAGCT-3’ |
A total of 2.5 × 105 PBMC from COVID-19 patients and healthy controls were maintained in RPMI-1640 supplemented with 10% of FCS, 1% antibiotic antimycotic solution, 1% L-glutamine and 1% Sodium Pyruvate (Sigma-Aldrich, St. Louis, MO, USA) and stimulated in vitro for 4 hr with Lipopolysaccharide (LPS) at 1 μg/ml or cell culture medium and the Protein Transport Inhibitor GolgiStop (BD Biosciences, San Diego, CA, USA). After washing, PBMC were stained for surface cell marker using mouse anti-CD14PerCP-Cy5.5 (BD Biosciences) and anti-CD3BV605 (BD Biosciences) monoclonal antibody (mAb). Cells were fixed with BD Cytofix/Cytoperm and permeabilized with the BD Perm/Wash buffer (BD Biosciences) according to the manufacturer's instructions, in the presence of anti-IL6BV421 (BD Biosciences) mAb. Ex-vivo TLR7 intracellular expression was evaluated in PBMC from patients and controls by flow cytometry. 2,5 × 105 PBMC were stained for surface markers using anti-CD19BV605, anti-CD14PerCP-Cy5.5 and anti-CD3BV421 (BD Biosciences) mAbs. Cells were fixed and permeabilized in the presence of anti-TLR7 Alexa Fluor 488 (R and D System, Minneapolis, MN, USA) mAb or isotype control as described above. After staining cells were washed, immediately fixed in CellFix solution (BD Biosciences) and analysed. Cell acquisition was performed on a 12-color FACSCelesta (BD Biosciences, San Diego, CA, USA) instrument. Data analysis was performed with the Kaluza 2.1 software (Beckman Coulter).
Protein stability prediction
Request a detailed protocolThe protein structure of Human Toll Like Receptor, UniProtKB ID Q9NYK1 [https://www.uniprot.org/uniprot/Q9NYK1], was obtained by homology modeling using Swiss Model tool (Waterhouse et al., 2018). The selected template protein with 97% of sequence identity was the Crystal structure of monkey TLR7 with PDB ID 5GMF [https://www.rcsb.org/structure/5GMF]. The two Val to Asp missense mutations were analysed by using different protein stability predictors like Polyphen-2 (Adzhubei et al., 2010), SIFT (Ng and Henikoff, 2003), and DynaMut (Rodrigues et al., 2018).
Transfection experiments of TLR7 variants
Request a detailed protocolPCR based site-directed mutagenesis was performed in pUNO-hTLR7 plasmid (Invivogen), kindly provided by Ugo D’Oro (GSK Vaccines, Siena, Italy) (Iavarone et al., 2011), to generate specific plasmids for each TLR7 variant, including those considered neutral (mutagenic primers available on request).
All point mutations except for p.Arg920Lys were confirmed by Sanger sequencing. HEK293 cells were maintained in DMEM supplemented with 10% FBS, 1% L-Glutamine and 1% penicillin/streptomycin at 37°C with 5% CO2. Transient transfections were performed using Lipofectamine 2000 (Invitrogen) according to manufacturer’s instructions: 3 × 105 cell/well were seeded the day before, and then transfected with 2 μg of DNA. After 24 hr, the cells were stimulated with Imiquimod at 1 μg/ml for 4 hr and then total RNA was extracted with RNeasy Mini Kit (QIAGEN, Hilden, Germany). For each sample, cDNA was synthesized from 1 μg of total RNA using QantiTect Reverse Transcription kit (QIAGEN, Hilden, Germany) according to manufacturer’s instructions. The expression of IFN-a in stimulated and unstimulated cells was evaluated by qRT-PCR using the same procedure as described for PBMCs.
Results and discussion
We applied LASSO logistic regression analysis, after correcting for Principal Components, to a synthetic boolean representation of the entire set of genes of the X chromosome on the extreme phenotypic ends of the male subset of the Italian GEN-COVID cohort (https://sites.google.com/dbm.unisi.it/gen-covid) (Daga et al., 2021). The GEN-COVID study was consistent with Institutional guidelines and approved by the University Hospital (Azienda Ospedaliero-Universitaria Senese) Ethical Review Board, Siena, Italy (Prot n. 16929, dated March 16, 2020). Only rare variants (≤1% in European Non-Finnish population) were considered in the boolean representation: the gene was set to one if it included at least a missense, splicing, or loss-of-function rare variant, and 0 otherwise. Fisher Exact test was then used for the specific data validation.
Toll-like receptor 7 (TLR7) was picked up as one of the most important susceptibility genes by LASSO Logistic Regression analysis (Figure 1). We then queried the COVID-19 section of the Network of Italian Genome (NIG) database (http://www.nig.cineca.it/, specifically, http://nigdb.cineca.it) that houses the entire GEN-COVID cohort represented by more than 1000 WES data of COVID-19 patients and SARS-CoV-2 infected asymptomatic participants (Bastard et al., 2020). By selecting for young (<60 year-old) males, we obtained rare (MAF ≤ 1%) TLR7 missense variants predicted to impact on protein function (CADD > 12.28) in 5 out of 79 male patients (6.3%) with life-threatening COVID-19 (hospitalized intubated and hospitalized CPAP/BiPAP) and in none of the 77 SARS-CoV2 infected oligo-asymptomatic male participants.
We then investigated a similar cohort coming from the Italian second wave composed of male patients under 60 years of age without comorbidities (56 cases and 27 controls) was used to expand the cohort. All participants were white European. We found a TLR7 variant in one of 56 cases (1.7%) and in none of 27 controls. Overall, the association between the presence of TLR7 rare variants and severe COVID-19 was significant (p=0.037 by Fisher Exact test, Table 1).
We then investigated the presence of TLR7 rare variants in the entire male cohort of 561 COVID-19 patients (261 cases and 300 controls) regardless of age. We found TLR7 rare missense variants in three additional patients over 60 years of age, including two cases (who shared the p.Ala1032Thr variant) and one control (C1), bearing the p.Val222Asp variant, predicted to have a low impact on protein function (CADD of 5.36) (Table 2).
In order to functionally link the presence of the identified TLR7 missense variants and the effect on the downstream type I IFN-signaling, we performed a gene expression profile analysis in peripheral blood mononuclear cells (PBMCs) isolated from patients following recovery, after stimulation with the TLR7 agonist imiquimod, as reported by van der Made et al., 2020. To explore all TLR7 variants identified, we examined PBMCs from the control and all cases except P4 and P6 because them were not available. However, P4 and P5 shared the same variant. This analysis showed a statistically significant decrease of all TLR7-related genes for two variants (Ser301Pro and Ala1032Thr) identified in cases P3, P7, and P8 compared with healthy controls (Ctl) demonstrating a complete impairment of TLR7 signaling pathways in response to TLR7 stimulation (Figure 2, panel A and Table 2). The variant Val219Ile (P1) showed a hypomorphic effect determining a statistically significant decrease in mRNA levels only for IRF7 (directly activated by TLR7) and IFN-γ (Figure 2, panel A). Two Ala to Val variants identified in severely affected patients, Ala288Val and Ala448Val, were functionally neutral, that is not predicted to impair the TLR7 signaling pathways. This was confirmed by biochemical and structural analysis on the crystal structure of TLR7 protein (https://www.uniprot.org/uniprot/Q9NYK1). The prediction performed with different computational approaches showed both variants as benign with no effects on structural stabilization. Interestingly, the p.Val222Asp variant (C1) proved to be functionally neutral, in keeping with it being identified in the control and not in cases (Figure 2, panel A).
TLR7 expression was evaluated in monocytes and B cells from patients and healthy controls by flow cytometry. Patients and controls expressed the TLR7 protein at the intracellular level. The functional capacity of PBMCs was evaluated after stimulation with the TLR4 agonist lipopolysaccharide (LPS). Of note, LPS-induced production of IL6 by monocytes was similar in patients and controls (data not shown).
In order to validate the functional effect of TLR7 variants, we have performed transfection experiments in HEK293 cells, cloning a dedicated TLR7 plasmid for each of them. Transfection experiments were performed in HEK293 cells that do not express endogenous TLR7 (Chehadeh and Alkhabbaz, 2013) and expression of TLR7 protein was examined by flow cytometry 24 hr after transfection, showing expression of TLR7 protein at the intracellular level in all cases (Figure 2, panel B). We then evaluated the expression of IFN-a in imiquimod stimulated and unstimulated cells by qRT-PCR employing the same assay described for PBMCs, confirming the results obtained in PBMCs for the screened variants (Figure 2, panel C).
Segregation analysis was available for two cases, P3 and P8 (Figure 3). In the two pedigrees, the disease nicely segregated as an X-linked disorder conditioned by environmental factors, that is SARS-CoV-2 (Figure 3, panel B). This was also supported by functional analysis on all TLR7-related genes (Figure 3, panel A). For example, expression profile analysis for IRF7 gene in male mutated patient P8 confirmed a statistically significant reduction compared to the wild-type brother (Figure 3, panel A). Of note, only the infected mutated male had severe COVID-19, whereas the infected not mutated brother (II-2 of P8) was asymptomatic (Figure 3, panel C).
Our results showed that the two families reported by van der Made et al., 2020. with loss-of-function variants in males with severe COVID-19 with a mean age of 26 years represent a subset of COVID-19 male patients. Specifically, missense deleterious variants in the X-linked recessive TLR7 gene may represent the cause of disease susceptibility to COVID-19 in up to 2% of severely affected young male cases (3/135, 2.2%). The same result was obtained for the entire male cohort, irrespective of age, with TLR7 deleterious variants in 5/261 cases (1.9%). Since not all identified variants were functionally effective, the true percentage could be slightly lower in young males. Overall, males with rare missense variants shown here developed COVID-19 at a mean age of 56.5 years, considerably later than 26 years, in agreement with a predicted smaller impact on the protein than the loss of function deleterious variants reported by van der Made et al., 2020. Similarly, the identified rare missense TLR7 variants impaired the mRNA expression of TLR7 as well as the downstream pathway. The observation reported here may lead to consider TLR7 screening in severely affected male patients in order to start personalized interferon treatment for those with this specific genetic disorder.
Data availability
Sequencing data have been deposited in CINECA through http://www.nig.cineca.it/, specifically, http://nigdb.cineca.it., in the COVID-19 section through http://nigdb.cineca.it./registration/login.php. There are no restrictions on data access. Only registration is needed.
References
-
Employing a systematic approach to biobanking and analyzing clinical and genetic data for advancing COVID-19 researchEuropean Journal of Human Genetics 1:1–15.https://doi.org/10.1038/s41431-020-00793-7
-
Hospitalization rates and characteristics of patients hospitalized with Laboratory-Confirmed coronavirus disease 2019 - COVID-NET, 14 states, march 1-30, 2020MMWR. Morbidity and Mortality Weekly Report 69:458–464.https://doi.org/10.15585/mmwr.mm6915e3
-
Age, frailty, and comorbidity as prognostic factors for Short-Term outcomes in patients with coronavirus disease 2019 in geriatric careJournal of the American Medical Directors Association 21:1555–1559.https://doi.org/10.1016/j.jamda.2020.08.014
-
A point mutation in the amino terminus of TLR7 abolishes signaling without affecting ligand bindingThe Journal of Immunology 186:4213–4222.https://doi.org/10.4049/jimmunol.1003585
-
SIFT: predicting amino acid changes that affect protein functionNucleic Acids Research 31:3812–3814.https://doi.org/10.1093/nar/gkg509
-
DynaMut: predicting the impact of mutations on protein conformation, flexibility and stabilityNucleic Acids Research 46:W350–W355.https://doi.org/10.1093/nar/gky300
-
Coronavirus disease 2019 case surveillance - United states, January 22-May 30, 2020MMWR. Morbidity and Mortality Weekly Report 69:759–765.https://doi.org/10.15585/mmwr.mm6924e2
-
Regression shrinkage and selection via the lassoJournal of the Royal Statistical Society: Series B 58:267–288.https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
-
SWISS-MODEL: homology modelling of protein structures and complexesNucleic Acids Research 46:W296–W303.https://doi.org/10.1093/nar/gky427
Article and author information
Author details
Funding
Private Donors for Host Genetics Research Project (D.L. n 18 of March 17)
- Alessandra Renieri
Intesa San Paolo for 2020 charity fund (N.B.2020/0119)
- Alessandra Renieri
Ministero dell’Istruzione, dell’Università e della Ricerca (Dipartimenti di Eccellenza 2018-2020)
- Alessandra Renieri
Regione Toscana (Bando Ricerca COVID-19 Toscana)
- Alessandra Renieri
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This study is part of the GEN-COVID Multicenter Study, https://sites.google.com/dbm.unisi.it/gen-covid, the Italian multicenter study aimed at identifying the COVID-19 host genetic bases. Specimens were provided by the COVID-19 Biobank of Siena, which is part of the Genetic Biobank of Siena, member of BBMRI-IT, of Telethon Network of Genetic Biobanks (project no. GTB18001), of EuroBioBank, and of RD-Connect. We thank the CINECA consortium for providing computational resources and the Network for Italian Genomes (NIG) http://www.nig.cineca.it for its support. We thank private donors for the support provided to AR (Department of Medical Biotechnologies, University of Siena) for the COVID-19 host genetics research project (D.L n.18 of March 17, 2020). We also thank the COVID-19 Host Genetics Initiative (https://www.covid19hg.org/), MIUR project ‘Dipartimenti di Eccellenza 2018–2020’ to the Department of Medical Biotechnologies University of Siena, Italy, and ‘Bando Ricerca COVID-19 Toscana’ project to Azienda Ospedaliero-Universitaria Senese. We also thank Intesa San Paolo for the 2020 charity fund dedicated to the project N B/2020/0119 ‘Identificazione delle basi genetiche determinanti la variabilità clinica della risposta a COVID-19 nella popolazione italiana’.
Ethics
Clinical trial registration NCT04549831.
Human subjects: The GEN-COVID study was consistent with Institutional guidelines and approved by the University Hospital (Azienda Ospedaliero-Universitaria Senese) Ethical Review Board, Siena, Italy (Prot n. 16929, dated March 16, 2020).
Copyright
© 2021, Fallerini et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 5,757
- views
-
- 741
- downloads
-
- 149
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Epidemiology and Global Health
- Medicine
- Microbiology and Infectious Disease
eLife has published the following articles on SARS-CoV-2 and COVID-19.
-
- Genetics and Genomics
An unprecedented amount of SARS-CoV-2 data has been accumulated compared with previous infectious diseases, enabling insights into its evolutionary process and more thorough analyses. This study investigates SARS-CoV-2 features as it evolved to evaluate its infectivity. We examined viral sequences and identified the polarity of amino acids in the receptor binding motif (RBM) region. We detected an increased frequency of amino acid substitutions to lysine (K) and arginine (R) in variants of concern (VOCs). As the virus evolved to Omicron, commonly occurring mutations became fixed components of the new viral sequence. Furthermore, at specific positions of VOCs, only one type of amino acid substitution and a notable absence of mutations at D467 were detected. We found that the binding affinity of SARS-CoV-2 lineages to the ACE2 receptor was impacted by amino acid substitutions. Based on our discoveries, we developed APESS, an evaluation model evaluating infectivity from biochemical and mutational properties. In silico evaluation using real-world sequences and in vitro viral entry assays validated the accuracy of APESS and our discoveries. Using Machine Learning, we predicted mutations that had the potential to become more prominent. We created AIVE, a web-based system, accessible at https://ai-ve.org to provide infectivity measurements of mutations entered by users. Ultimately, we established a clear link between specific viral properties and increased infectivity, enhancing our understanding of SARS-CoV-2 and enabling more accurate predictions of the virus.