Abstract
Living with COVID-19 requires continued vigilance against the spread and emergence of variants of concern (VOCs). Rapid and accurate saliva diagnostic testing, alongside basic public health responses, is a viable option contributing to effective transmission control. Nevertheless, our knowledge regarding the dynamics of SARS-CoV-2 infection in saliva is not as advanced as our understanding of the respiratory tract. Here we analyzed longitudinal viral load data of SARS-CoV-2 in saliva samples from 144 patients with mild COVID-19 (a combination of our collected data and published data). Using a mathematical model, we successfully stratified infection dynamics into three distinct groups with clear patterns of viral shedding: viral shedding durations in the three groups were 11.5 days (95% CI: 10.6 to 12.4), 17.4 days (16.6 to 18.2), and 30.0 days (28.1 to 31.8), respectively. Surprisingly, this stratified grouping remained unexplained despite our analysis of 47 types of clinical data, including basic demographic information, clinical symptoms, results of blood tests, and vital signs. Additionally, we quantified the expression levels of 92 micro-RNAs in a subset of saliva samples, but these also failed to explain the observed stratification, although the mir-1846 level may have been weakly correlated with peak viral load. Our study provides insights into SARS-CoV-2 infection dynamics in saliva, highlighting the challenges in predicting the duration of viral shedding without indicators that directly reflect an individual’s immune response, such as antibody induction. Given the significant individual heterogeneity in the kinetics of saliva viral shedding, identifying biomarker(s) for viral shedding patterns will be crucial for improving public health interventions in the era of living with COVID-19.
Main Text
Coronavirus disease 2019 (COVID-19) vaccinations, which are effective in preventing infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and severe COVID-19 illness, have enabled the gradual and safe removal of COVID-19 restrictions on everyday life over the past year. However, we still face the emergence of variants of concern (VOCs), which is a major worry in the era of “living with COVID-19”. To prevent major outbreaks, basic public health responses such as testing, isolation, and quarantine therefore remain important. In particular, rapid and accurate diagnostic tests are essential for controlling ongoing transmission. Salivary diagnostic testing is a convenient tool for early and efficient diagnosis of COVID-19 because it is easy for health care professionals and patients to administer1,2.
The oral cavity is an important target for SARS-CoV-2 (ref. 3), and viral particles in the lower and upper respiratory tract can reach the oral cavity through liquid droplets1. Saliva droplets are thus a potential route of SARS-CoV-2 transmission1,3. Although SARS-CoV-2 infection dynamics within the respiratory tract are well characterized thanks to data from oropharyngeal and nasopharyngeal swabs4–9, infection dynamics in saliva are poorly understood. Recent reports3,5 have suggested that saliva and other tissues are highly compartmentalized and that SARS-CoV-2 infection dynamics are qualitatively different in different tissues. However, although individual-level heterogeneity in virus dynamics (especially individual infectiousness) has been evaluated, it remains unknown how individual-level viral shedding patterns in saliva are stratified, and which factor(s) determine the patterns5,10. Thus, to shape a country’s early response to future VOCs, such as through isolation and screening guidelines based on salivary diagnostic testing, it is critical to address these points using a large number of high-quality saliva specimens annotated with basic clinical patient data.
So far, our ability to understand and characterize whole SARS-CoV-2 infection dynamics has been hindered by several limitations of the clinical data that made it impossible to capture either the early or the late phase of infection or to annotate individual-level clinical information. To overcome these limitations, here we quantified and stratified longitudinal virus dynamics in saliva samples from 144 mildly symptomatic participants of two different but complementary cohorts5,11,12. We successfully identified three groups with significantly different durations of viral shedding (i.e., the mean durations were 11.5 days, 17.4 days, and 30.0 days, respectively), implying a large inter-participant heterogeneity in virus infection dynamics. However, when we analyzed a total of 47 variables, including basic demographic information, daily clinical symptoms, results of blood tests, and vital signs, none explained the stratified grouping.
We also retrospectively explored whether salivary micro-RNAs were associated with the stratification by using stored residual saliva specimens. Micro-RNAs are non-coding RNAs that regulate numerous cellular processes by modulating protein levels through direct binding to mRNA (coding RNA), thereby influencing translation efficiency or mRNA abundance. Various micro-RNAs are known to affect viral replication and the immune response in infections with viruses such as EBV, HCV, and HIV13. Many studies are underway to investigate micro-RNAs as potential targets for virus diagnosis and treatment. However, the relationship between micro-RNAs and the patterns of SARS-CoV-2 shedding in the body remains unknown. We quantified the expression levels of 92 micro-RNAs and found that no micro-RNA significantly explained the stratified groups, although the mir-1846 level may have been weakly correlated with the peak viral load. Our findings provide important insights into the complexities of viral shedding patterns in saliva and suggest that predicting the heterogeneity of viral dynamics using basic clinical and micro-RNA data may be a challenging task. These insights are critical for developing accurate diagnostic tools, effective treatments, and prevention strategies for COVID-19.
Results
Description of cohort and study design
We used longitudinal saliva viral load data obtained from cohorts of the nelfinavir (NFV) clinical trial (jRCT2071200023; refs. 11,12) and the University of Illinois at Urbana-Champaign5. All infections were either mild or asymptomatic. All participants in these cohorts confirmed that they had never been previously infected with SARS-CoV-2, and none were vaccinated against SARS-CoV-2 at the time of enrolment. Of 182 participants from these two studies, 144 symptomatic participants, excluding 22 asymptomatic participants and 16 participants with incomplete observational data, were considered in our analysis (Fig 1A). In addition, we annotated the saliva viral load results of 90 participants from the NFV clinical trial with their sex, age, daily symptoms, blood test results, and vital signs, as summarized in Fig 1B, Table 1, and Supplementary Table 1. The data from the two studies complemented one another: the longitudinal data from the University of Illinois covered the early phase of infection (before symptom onset), while the data from the NFV clinical trial covered the late phase of viral RNA load (more than 14 days after symptom onset) (Fig 1C). The time-series viral loads in the saliva samples for all individuals used in the analysis are plotted in Fig 1D. We also used 60 stored saliva specimens from 30 participants of the NFV clinical trial for micro-RNA analysis.
Quantifying and stratifying SARS-CoV-2 infection dynamics in saliva
We employed a previously developed mathematical model describing SARS-CoV-2 infection dynamics [i.e., Eqs.(1-2)] to evaluate inter-participant heterogeneity (details are provided in Methods), and reconstructed the best-fit virus dynamics in saliva of 144 symptomatic participants (Extended Data Fig 1 and Supplementary Table 2). We also applied the more realistic mathematical model used in Ke et al.5 [i.e., Eqs.(3-6)], but the two models fitted our dataset equally well (Supplementary Fig 1 and Supplementary Table 3). This is because the longitudinal data from the NFV clinical trial included the late phase of the viral RNA load, which was not included in the data from the University of Illinois; therefore, Eqs.(1-2) are sufficient for reconstructing the viral load. To avoid the complexities and parameter assumptions of Eqs.(3-6) described in Ke et al.5, we used our simple model of Eqs.(1-2) for further analysis.
Next, to stratify the time-course pattern of viral shedding, we first applied unsupervised random forest clustering to the individual “reconstructed” virus dynamics of 144 participants (e.g., Extended Data Fig 1). However, this analysis failed to divide the time-course pattern into different clusters (data not shown). To overcome this problem, we quantified the peak, duration, up-slope (i.e., growth rate), and down-slope (i.e., decay rate) of the reconstructed dynamics as “features” of the virus dynamics (Supplementary Table 4). Interestingly, the unsupervised random forest clustering based on these features identified 3 groups (i.e., G1: N=46, G2: N=61, and G3: N=37) in which the time-course patterns were clearly discriminated. This finding suggested that there is heterogeneity in virus infection dynamics in saliva (see Methods). Fig 2A shows a two-dimensional Uniform Manifold Approximation and Projection (UMAP) embedding of the three stratified groups. Using a different color for each group (gray for G1, magenta for G2, and blue for G3), we also plotted the estimated individual viral load (Fig 2B) and the time-course pattern of each group highlighted by Partial Least-Squares Discriminant Analysis (PLS-DA) (Fig 2C).
The distributions of the four features are described in Fig 2D; a statistically significant between-group difference was found in the duration of viral shedding. The mean durations were 11.5 days (95% CI: 10.6 to 12.4), 17.4 days (16.6 to 18.2), and 30.0 days (28.1 to 31.8) for G1, G2, and G3, respectively. Consistent with this, our previous report10 found at least 3 groups showing different durations of viral shedding in upper respiratory specimens.
Because previous work concluded that there are no significant differences in virus infection dynamics between NFV-treated and untreated participants12, we analyzed all data together regardless of treatment (see Methods). To further confirm whether NFV affects the stratification of the time-course pattern of viral shedding, we compared the number of individuals belonging to each group (i.e., G1, G2 and G3) between NFV-treated and untreated participants (including the members of the University of Illinois cohort) and found no trend for the stratification (p=0.784 by Fisher’s exact test: Fig 2E).
Another possible explanation for the different durations of viral shedding observed here is a difference in VOC genotypes. To test this, we used data from 55 participants in our NFV clinical trial who had been characterized according to which VOCs [i.e., B.1.1.7 (Alpha), B.1.617.2 or AY.29 (Delta), and other variants] they had been infected with (see Methods), in addition to data from 54 participants of the University of Illinois cohort. However, we observed no trend in the number of individuals belonging to each group among the VOCs (p=0.728 by Fisher’s exact test: Fig 2F), which is consistent with the conclusion in Ke et al.5.
Basic clinical data may not explain heterogeneity in individual viral shedding
Using data from the NFV clinical trial, we annotated the saliva viral loads of 90 participants with basic demographic information, daily symptoms, blood test results, and vital signs (Fig 1A, Table 1 and Supplementary Table 1). We also annotated the saliva viral loads of 52 participants from the University of Illinois with daily symptoms (Supplementary Table 1).
To identify factors that were significantly correlated with the viral shedding patterns in saliva specimens obtained from the NFV clinical trial, we first examined the 39 variables summarized in Table 1. Each factor was compared between the three groups by ANOVA and the p-values were corrected by the False Discovery Rate (FDR). However, we found no clinical data that differed significantly (i.e., corrected p-value of ANOVA of less than 0.05) among the stratified groups (Fig 3A). We also trained a random forest classifier (see Methods), a tree-based machine learning algorithm that is suitable for tabular data14 and limits overfitting through bootstrap aggregating (bagging), to predict the group from the clinical data of 90 individuals in the NFV clinical trial cohort. We obtained ROC-AUCs of 60%, 49%, and 36% for predicting G1, G2, and G3, respectively (Fig 3B). Thus, we were not able to achieve a high ROC-AUC for predicting the shedding patterns based on the basic clinical data. We also attempted to make predictions based only on the clinical data that exhibited relatively low p-values in the ANOVA. However, we were unable to achieve a high prediction accuracy with this approach (data not shown).
Next, we asked whether the stratification of the study population is associated with clinical symptoms of COVID-19 that could be caused by active replication of SARS-CoV-2. In general, the clinical symptoms of COVID-19 are cough, fever, shortness of breath, muscle pain, sore throat, confusion, chest pain, headache, rhinorrhea, diarrhea, and nausea and vomiting. In our study, individual-level symptom data were available as 8 categories in the cohorts from both the NFV clinical trial and the University of Illinois (Supplementary Table 1). We used a random forest classifier to investigate whether symptom data could predict the stratified groups and obtained ROC-AUCs of 57%, 54%, and 48% for predicting G1, G2, and G3, respectively (Fig 3C). Indeed, a SARS-CoV-2 human challenge study showed no quantitative correlation between the individuals’ time-series pattern of viral load and symptoms15.
Additionally, we investigated the relationship between each feature of viral dynamics (i.e., duration of viral shedding, peak viral load, up-slope, and down-slope) and the clinical data by using the Pearson’s correlation coefficient (Extended Data Table 1). Overall, correlation coefficients were low (0.06 on average) with high p-values, which suggests that no feature was likely to be explained by these clinical data.
Relationship between salivary microRNAs and viral shedding patterns in COVID-19 patients
Various proteins in saliva have antiviral effects. It is also expected that some micro-RNAs in saliva may impair SARS-CoV-2 replication1 and could thus act as a biomarker to predict COVID-19 disease progression16. We here used the stored residual saliva specimens from the NFV clinical trial to identify micro-RNA(s) associated with the stratified groups (i.e., G1, G2 and G3). We note that, because all residual saliva specimens are annotated with the individual participant and we know which participants belong to which stratified group, we can select and compare saliva specimens from G1, G2, and G3 without bias. This implies that we can impartially select participants in equal numbers from each group, unaffected by other factors. Specifically, we collected 60 stored saliva specimens from the NFV clinical trial to perform micro-RNA analysis for 30 participants. We picked two samples for each participant to evaluate the role of micro-RNAs during both the peak and the late phase (i.e., 30 samples for each phase): the nearest sample from the estimated peak and the most distant sample above the detection limit in the late phase (Fig 4A). We normalized micro-RNA expression among the samples by using the DESeq2 Bioconductor package. We summarize the information on the micro-RNAs we obtained from saliva specimens in Supplementary Table 5.
Similar to the analysis using clinical data, we compared the expression levels of 92 micro-RNAs between the three stratified groups. Because the micro-RNA data were not normally distributed, we used the non-parametric Kruskal-Wallis ANOVA for analysis and corrected the p-values by FDR. However, we failed to find micro-RNA levels that differentiated the stratified groups (i.e., with a corrected p-value of Kruskal-Wallis ANOVA of less than 0.05) in any of the three analyses, which used the data from the peak phase, the late phase, and both phases (e.g., Fig 3B for the total 60 samples). We also trained a random forest classifier to predict each group from the micro-RNA levels for the 60 total samples, and obtained ROC-AUCs of 48%, 57%, and 42%, respectively. Again, the ROC-AUCs were too low to predict the stratified groups from the collected micro-RNA data.
Furthermore, we investigated the relationship between the 4 features of viral dynamics and micro-RNA levels. Here we used Spearman’s correlation coefficient (Extended Data Table 2). Overall, we did not find strong correlations between micro-RNA levels and features (average Spearman’s correlation coefficients of 0.002, 0.024, −0.001, and −0.001 for duration of viral shedding, peak viral load, up-slope, and down-slope, respectively, for the 60 total samples). Only the mir-1846 level exhibited a weak negative correlation with the peak viral load (Spearman’s correlation coefficient of −0.53, p-value of 0.01; Extended Data Fig 2). We confirmed similar trends even when we analyzed the micro-RNA level for the peak and late phases separately.
Discussion
Being able to quickly and efficiently diagnose COVID-19 is essential in monitoring the pandemic. Because the sampling process for saliva is noninvasive, and because it is inexpensive and minimizes the risk of transmission to health care workers1, saliva sampling has excellent potential and advantages over sampling of other biological specimens, such as those from the lower and upper respiratory tract2,17. Given the significant individual heterogeneity in saliva viral shedding5,18, identifying biomarker(s) for viral shedding patterns will be crucial for improving public health interventions in the era of living with COVID-19. To improve our understanding of SARS-CoV-2 infection dynamics in saliva and thereby enable the application of saliva testing in the fight against COVID-19, we quantified and stratified longitudinal virus dynamics in saliva samples from 144 mildly symptomatic individuals from the cohorts of the NFV clinical trial11 and the University of Illinois at Urbana-Champaign5. In addition to the large heterogeneity in virus infection dynamics, we identified three groups (i.e., G1, G2 and G3) with different viral shedding patterns (Fig 2D).
Immunocompromised patients have been reported to have a prolonged duration of viral shedding, lasting over three months, underscoring the critical role of host immune responses in controlling viral infections19–22. Although oral immune responses remain poorly understood, Huang et al. recently confirmed by using single-cell RNA sequencing of the human minor salivary glands and gingiva that SARS-CoV-2 infection can trigger sustained, localized immune responses in saliva3. In this study, we observed significant differences in the down-slopes of viral shedding in saliva among participants in different groups, with a more rapid decline in G1. This decline is likely attributed to a stronger immune response to SARS-CoV-2 in G1 participants than in participants in G2 and G3, as reflected in the death rate of infected cells due to the immune response (Fig. 2D). Lower levels of viral replication have also been observed among infected participants with high baseline levels of mucosal IgA (but not IgG), as reported elsewhere23. Recently, we demonstrated that rapid anti-spike secretory IgA antibody responses can contribute to reducing viral shedding durations and amounts in nasopharyngeal mucosa24. These findings highlight the importance of biomarkers that directly reflect an individual’s immune response, such as antibody induction, in predicting viral shedding patterns. Therefore, quantifying the time-series pattern of mucosal IgA and its correlation with saliva viral load may provide crucial insights into the stratification of SARS-CoV-2 infection dynamics.
For the purpose of predicting viral shedding patterns during the early stage of infection, we first explored the association of 39 basic clinical variables, 8 daily symptoms, and the levels of 92 micro-RNAs with the stratified groups. However, none of the factors were significant (Table 1, Fig 3A, Fig 4B, Supplementary Table 1, Supplementary Table 2 and Supplementary Fig 2). In contrast, we showed that the level of mir-1846, which is an exogenous micro-RNA that is specifically classified as an Oryza sativa micro-RNA (osa-microRNA)25, may exhibit a weak negative correlation with the peak viral load. Exogenous micro-RNAs enter the human body primarily through food and can affect human metabolism by interacting and binding with human genes. mir-1846 is reported to interact with two human genes25 that are known to be associated with the progression of melanoma, various cancers, and leukemia. This suggests that mir-1846 levels may be linked to human immunity. Few studies have investigated the role of mir-1846 in humans, but our findings suggest the need for further investigations into the impact of this micro-RNA level on human immunity. Our research sheds light on the intricate patterns of viral shedding in saliva.
Our approach has several limitations that must be considered in our next study. First, our analysis was limited to participants with symptomatic infection and excluded those with asymptomatic infection (22 asymptomatic individuals out of a total of 182 individuals, i.e., 12% of participants) because we integrated datasets with different time scales from different cohorts. Although our data do not include participants infected with Omicron variants, others have reported that the Omicron variant may cause a higher proportion of asymptomatic infection26. Thus, evaluating the effect of asymptomatic infection will be important to update our stratification, especially for recent (or future emerging) VOCs. Second, micro-RNAs participate in the post-transcriptional regulation of gene expression; however, they do not provide direct insights into immune cell dynamics. Given the reported association between the duration of viral shedding and mucosal immunity as discussed above, it appears imperative to analyze modalities that are directly linked to the immune response in the future. Another potential limitation of this study is the timing of saliva specimen sampling, although we took great care to select and compare specimens from G1, G2, and G3 without bias. As a result of our clinical trial design (jRCT2071200023; refs. 11,12), participants were enrolled after the onset of symptoms, thereby restricting saliva specimen collection exclusively to the post-symptom phase. Unfortunately, we lack samples from the pre-infection, pre-symptomatic, and early infection phases. Consequently, the absence of individual-level baseline values for micro-RNA means that inter-participant heterogeneity in micro-RNA levels may obscure signals related to distinct viral infection dynamics in saliva.
In conclusion, our study revealed that the dynamics of SARS-CoV-2 infection in saliva can be classified into three groups based mainly on the duration of viral shedding. However, accurately predicting the variability in viral dynamics remains a challenging task, because it requires a more comprehensive understanding of the complex shedding patterns in saliva, as well as detailed clinical and molecular data. The identification of a sensitive, simple, and rapid biomarker for saliva viral shedding will be imperative for future COVID-19 outbreak control.
Methods
Ethics statement
The NFV clinical trial was approved by the institutional review board of Nagasaki University Hospital (approval number: I20-001) and is registered with the Japan Registry of Clinical Trials (jRCT2071200023). All participants provided written, informed consent for secondary use of clinical information and samples. The present study was approved by the ethics committee of Nagoya University (approval number: hc 22-01).
Saliva viral load data
Longitudinal saliva viral load data of participants with symptomatic and asymptomatic COVID-19 (122 cases) were obtained from the NFV clinical trial11. Briefly, the NFV clinical trial was a prospective, randomized, open-label, blinded-endpoint, parallel-group trial conducted between July 2020 and October 2021 at 11 university and teaching hospitals in Japan. This study consisted of a 14-day treatment period and a 14-day follow-up period, with no significant differences in the time to viral clearance between patients who received standard-of-care plus NFV administration and those who had the standard-of-care alone11,12. Therefore, the participants with COVID-19 were analyzed together here. In addition, we obtained similar saliva viral load data (60 cases) from the cohort of the University of Illinois at Urbana-Champaign5. This cohort contained all faculty, staff, and students at the University of Illinois at Urbana-Champaign, who underwent quantitative RT-PCR testing at least twice weekly during fall of 2020 and spring of 2021. Among those 182 cases, we focused only on symptomatic participants. We also excluded participants who had fewer than three viral load measurements above the limit of detection (i.e., 90 cases from the NFV clinical trial and 54 cases from the University of Illinois were used in this study). This decision was based on the understanding that reasonable estimates cannot be derived when the number of data points is less than the number of parameters in the mathematical model. The limit of detection for viral load data from the University of Illinois is 1.08 copies/ml. However, the limit of detection for viral load data from the NFV clinical trial was unclear. Considering this, we assumed 1.08 copies/ml as the limit of detection for all viral load data.
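The exclusion rule described above (at least three measurements above the limit of detection) can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name `has_enough_detections`, the record layout, and the example values are hypothetical, and undetectable measurements are assumed to be recorded at the limit of detection.

```python
LOD_COPIES_PER_ML = 1.08  # limit of detection assumed for all data in the text
MIN_DETECTED = 3          # participants with fewer detected points are excluded


def has_enough_detections(viral_loads, lod=LOD_COPIES_PER_ML,
                          min_detected=MIN_DETECTED):
    """Return True if at least `min_detected` measurements exceed the limit of detection."""
    detected = [v for v in viral_loads if v > lod]
    return len(detected) >= min_detected


# Hypothetical participants: id -> viral loads (copies/ml) over time.
# Values equal to the limit of detection stand for undetectable samples (assumption).
records = {
    "P01": [1.08, 2.5e4, 8.1e5, 3.0e3, 1.08],
    "P02": [1.08, 4.2e2, 1.08, 1.08],
}
included = [pid for pid, loads in records.items() if has_enough_detections(loads)]
```

With these illustrative records, only the participant with three detected measurements is retained for model fitting.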
Viral genome sequencing
cDNA was synthesized from the RNA of SARS-CoV-2-positive saliva samples. Reverse transcription, multiplex PCR reaction, and Illumina library prep were conducted using a protocol published previously27. The pooled library was first purified by AMPureXP at 0.8× concentration and then again at 1.2× concentration. The purified library was sequenced for 151 cycles in paired-end mode on an Illumina iSeq 100. Sequence analysis was performed using the nf-core/viralrecon pipeline (10.5281/zenodo.3901628).
Quantifying biomarkers in saliva
Total RNA from saliva was extracted with MagMAX CORE Nucleic Acid Purification kits (Applied Biosystems, Foster City, CA). Micro-RNAs were detected using Illumina HiSeq X Ten (Illumina, Inc, San Diego, CA) with data processing by ribodepletion (Genewiz-Azenta, South Plainfield, NJ). To remove technical sequences, the pass filter data in the fastq format were processed by Trimmomatic (v0.30) to obtain high-quality clean data. Following quality trimming, micro-RNAs were identified and checked using miRDeep228. Normalization of micro-RNA expression among samples and differential expression analysis were carried out using the DESeq2 Bioconductor package.
Basic clinical data
Basic clinical data including basic demographic characteristics of the study participants, symptoms, and findings of physical examinations and laboratory tests were obtained according to the study protocol11. We here used information on age, daily symptoms, results of blood tests, and vital signs of the symptomatic participants in the NFV clinical trial (summarized in Table 1 and Supplementary Table 1).
Mathematical modeling
To describe SARS-CoV-2 infection dynamics in saliva specimens, we here mainly used the following mathematical model developed in our recent studies4,6,10:
The variables f(t) and V(t) are the fraction of uninfected target cells and the total amount of virus, respectively, and the parameters β, γ, and δ are the rate constant for virus infection, the maximum rate constant for viral replication, and the death rate of infected cells, respectively.
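Eqs.(1-2) are not reproduced in this text. A form consistent with the variable and parameter definitions above, and with the target-cell-limited structure named below, would be the following; it is a reconstruction under that assumption and should be checked against the cited studies4,6,10.

```latex
\begin{align}
\frac{df(t)}{dt} &= -\beta f(t) V(t), \tag{1}\\
\frac{dV(t)}{dt} &= \gamma f(t) V(t) - \delta V(t). \tag{2}
\end{align}
```

In this form the viral load grows while $\gamma f(t) > \delta$ and declines once the fraction of uninfected target cells falls below $\delta/\gamma$, so the peak occurs at $f = \delta/\gamma$.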
For comparison with the simple model [i.e., a target-cell-limited model; Eqs.(1-2)], we also used the following “immune effector cell model” developed in Ke et al.5 for the saliva viral load (see Supplementary Fig 1):
The variables T(t), E(t), and I(t) are the total number of uninfected target cells, cells in the eclipse phase of infection, and productively infected cells, respectively. The parameters 1/k, π, and c are the average duration of the eclipse phase, the virus production rate, and the clearance rate of viruses, respectively. The death rate of infected cells is assumed to be time-dependent to mimic the killing of infected cells by immune effector cells: δ(t) = δ1 for t < t1 and δ(t) = δ1 + δ2 for t1 < t. For a detailed explanation of the immune effector cell model, the reader is referred to Ke et al.5.
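Eqs.(3-6) are likewise not reproduced in this text. Assuming the standard target cell, eclipse phase, and productively infected cell structure implied by the definitions above (with productively infected cells written as I(t) and β again denoting the infection rate constant), the model would take the following form. This is a sketch under those assumptions, and the exact equations should be taken from Ke et al.5.

```latex
\begin{align}
\frac{dT(t)}{dt} &= -\beta T(t) V(t), \tag{3}\\
\frac{dE(t)}{dt} &= \beta T(t) V(t) - k E(t), \tag{4}\\
\frac{dI(t)}{dt} &= k E(t) - \delta(t) I(t), \tag{5}\\
\frac{dV(t)}{dt} &= \pi I(t) - c V(t). \tag{6}
\end{align}
```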
Parameter estimation
A nonlinear mixed-effects modelling approach incorporates fixed effects as well as random effects that describe the inter-participant variability in parameters. Including random effects amounts to a partial pooling of the data of all participants to improve estimates of the parameters applicable across the cases. In our study, the saliva specimens of cases from the cohort of the University of Illinois and the NFV clinical trial were obtained mainly during the “early” (i.e., from the date of infection to no more than 14 days post symptom onset) and “late” (i.e., no more than 28 days from around symptom onset) phases of SARS-CoV-2 infection, respectively. Therefore, cohorts from these two studies can reveal the whole time-course pattern of SARS-CoV-2 infection dynamics in a complementary manner through the random effects of nonlinear mixed-effects modelling.
In our analyses, the variable V(t) in Eq.(2) corresponds to the viral load in saliva for SARS-CoV-2. To fit the participants’ viral load data, we used MONOLIX 2021R2 (www.lixoft.com), which implements maximum likelihood estimation of parameters in nonlinear mixed-effects models. The nonlinear mixed-effects model allows a fixed effect as well as inter-participant variability. This method estimates each parameter θi = θ × exp(ηi) for each individual, where θ is a fixed effect and ηi is a random effect that obeys a Gaussian distribution with mean 0 and standard deviation Ω. Here we used lognormal distributions as prior distributions of parameters to guarantee positivity (i.e., negative values do not make biological sense). In parameter estimation, as time 0 in the original dataset represents the time of symptom onset, we also estimated the time from infection to symptom onset (corresponding to τ in Supplementary Tables 2 and 3) along with other parameters. The fixed effect parameters and random effect parameters were estimated by use of the stochastic approximation Expectation/Maximization (SAEM) algorithm and empirical Bayes method, respectively. A right-truncated normal distribution was used in the likelihood function to account for the left censoring of the viral load data (i.e., when the viral load is not detectable)29. We changed the initial values multiple times to avoid a local minimum of AIC and confirmed the robustness of parameter estimation.
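The two ingredients described above, lognormally distributed individual parameters and a likelihood that accounts for left censoring, can be illustrated with a short sketch. This is not the MONOLIX implementation. It uses a standard censored-normal likelihood on the log10 scale, in which an undetectable measurement contributes the probability of falling at or below the limit of detection; the constant residual standard deviation `sd` and all function names are assumptions made for illustration.

```python
import math


def individual_parameter(theta_pop, eta_i):
    """theta_i = theta * exp(eta_i): positive whenever the fixed effect theta is positive."""
    return theta_pop * math.exp(eta_i)


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(x, mean, sd):
    """Log of P(X <= x) for a normal distribution (may underflow far in the tail)."""
    return math.log(0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0))))


def censored_loglik(observed_log10, predicted_log10, sd, lod_log10):
    """Log-likelihood of log10 viral loads with left censoring at the detection limit."""
    total = 0.0
    for y, mu in zip(observed_log10, predicted_log10):
        if y <= lod_log10:
            # Undetectable: only know the true value is at or below the limit
            total += normal_logcdf(lod_log10, mu, sd)
        else:
            # Detected: ordinary normal residual on the log10 scale
            total += normal_logpdf(y, mu, sd)
    return total
```

Treating undetectable samples this way, rather than dropping them or fixing them at the limit of detection, lets the late-phase data inform the decay rate without biasing it.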
Unsupervised clustering and stratification of SARS-CoV-2 infection dynamics
Unsupervised random forest clustering was performed on the selected features of the virus infection dynamics, that is, the peak viral load, duration of viral shedding, up-slope, and down-slope (rfUtilities package in R). The use of random forest allows us to avoid overfitting by bootstrap aggregating (bagging) and to achieve better generalization performance30. After a random forest dissimilarity (i.e., the distance matrix between all pairs of samples) was obtained, it was visualized with Uniform Manifold Approximation and Projection (UMAP) in a two-dimensional plane and was stratified with spectral clustering (Python scikit-learn). The optimal number of clusters was determined by the eigengap heuristic method.
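To make the four features concrete, the sketch below simulates a viral load curve and extracts the peak, duration, up-slope, and down-slope. Several things here are assumptions rather than the study's definitions: the model is taken to have the target-cell-limited form df/dt = −βfV, dV/dt = γfV − δV; the slopes are defined as the change in log10 viral load per day between the detection limit and the peak; and the parameter values and function names are hypothetical.

```python
import math


def simulate(beta, gamma, delta, v0, t_end=40.0, dt=0.01):
    """Integrate df/dt = -beta*f*V, dV/dt = gamma*f*V - delta*V with fixed-step RK4."""
    def rhs(f, v):
        return -beta * f * v, gamma * f * v - delta * v

    f, v = 1.0, v0
    times, loads = [0.0], [v0]
    for step in range(1, int(round(t_end / dt)) + 1):
        k1f, k1v = rhs(f, v)
        k2f, k2v = rhs(f + 0.5 * dt * k1f, v + 0.5 * dt * k1v)
        k3f, k3v = rhs(f + 0.5 * dt * k2f, v + 0.5 * dt * k2v)
        k4f, k4v = rhs(f + dt * k3f, v + dt * k3v)
        f += dt * (k1f + 2.0 * k2f + 2.0 * k3f + k4f) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        times.append(step * dt)
        loads.append(v)
    return times, loads


def shedding_features(times, loads, lod=1.08):
    """Peak, duration, up-slope and down-slope of a viral load curve (illustrative definitions)."""
    above = [i for i, v in enumerate(loads) if v > lod]
    if not above:
        return None
    i_start, i_end = above[0], above[-1]
    i_peak = max(range(len(loads)), key=loads.__getitem__)
    if not (i_start < i_peak < i_end):
        return None  # slopes are undefined if the peak sits on the boundary
    peak_log10 = math.log10(loads[i_peak])
    lod_log10 = math.log10(lod)
    return {
        "peak_log10": peak_log10,
        "duration": times[i_end] - times[i_start],
        "up_slope": (peak_log10 - lod_log10) / (times[i_peak] - times[i_start]),
        "down_slope": (peak_log10 - lod_log10) / (times[i_end] - times[i_peak]),
    }


# Hypothetical parameter values, chosen only to give a plausible-looking curve
times, loads = simulate(beta=5e-6, gamma=5.0, delta=1.0, v0=0.01)
features = shedding_features(times, loads)
```

Each participant's fitted parameters would yield one such feature vector, and these vectors are what the unsupervised clustering operates on.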
Random forest classifiers for characterizing stratified groups
Random forest classifiers were trained to predict each of the three stratified groups (G1-G3) using the randomForest package in R. The receiver operating characteristic (ROC) curve for each classifier was drawn from out-of-bag (OOB) samples using the pROC package in R. For example, the ROC for G1 is for the random forest classifier predicting “G1” versus “G2 or G3”. We list the variables for the supervised random forest in Table 1, Supplementary Table 1, and Supplementary Table 2.
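The one-versus-rest evaluation can be illustrated with a short Python sketch that recodes the group labels and computes the area under the ROC curve from predicted scores. It stands in for, and is not, the pROC computation: the scores below are made-up placeholders for OOB vote fractions, and the function names are hypothetical. The area is obtained from the standard rank interpretation, that is, the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def one_vs_rest_labels(groups, positive):
    # 1 for the group of interest (e.g. "G1"), 0 for all other groups
    return [1 if g == positive else 0 for g in groups]

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: four patients and made-up OOB scores for "G1"
groups = ["G1", "G2", "G3", "G1"]
oob_scores_g1 = [0.9, 0.2, 0.4, 0.3]
labels_g1 = one_vs_rest_labels(groups, "G1")
auc_g1 = roc_auc(labels_g1, oob_scores_g1)
```

Repeating the same recoding with "G2" and "G3" as the positive class gives one ROC curve per group.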
Statistical analysis
When necessary, the variables were compared among different groups using Fisher’s exact test (for categorical variables), analysis of variance (ANOVA, for numerical variables from clinical data with more than two groups), and the Kruskal-Wallis test (for variables from micro-RNA data with more than two groups). P-values were corrected for multiple testing by controlling the false discovery rate (FDR). In addition, the relationship between each variable and the features of viral load was investigated using Pearson’s correlation coefficient (for variables from clinical data) and Spearman’s correlation coefficient (for variables from micro-RNA data). All statistical analyses were performed using R (version 4.1.3).
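For readers unfamiliar with FDR correction, the sketch below implements the Benjamini-Hochberg step-up adjustment in Python. The text does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four variables
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
```

Each adjusted value is the smallest FDR level at which the corresponding test would be declared significant, so it can be compared directly with the chosen threshold.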
Acknowledgements
This study was supported in part by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2022R1C1C2003637) (to K.S.K.); Scientific Research (KAKENHI) B 23H03497 (to S.I.); Grant-in-Aid for Transformative Research Areas 22H05215 (to S.I.); Grant-in-Aid for Challenging Research (Exploratory) 22K19829 (to S.I.); AMED CREST 19gm1310002 (to S.I.); AMED Research Program on Emerging and Re-emerging Infectious Diseases 22fk0108509 (to S.I.), 23fk0108684 (to S.I.), 23fk0108685 (to S.I.); AMED Research Program on HIV/AIDS 22fk0410052 (to S.I.); AMED Program for Basic and Clinical Research on Hepatitis 22fk0210094 (to S.I.); AMED Program on the Innovative Development and the Application of New Drugs for Hepatitis B 22fk0310504h0501 (to S.I.); AMED Strategic Research Program for Brain Sciences 22wm0425011s0302; AMED JP22dm0307009 (to K.A.); JST MIRAI JPMJMI22G1 (to S.I.); Moonshot R&D JPMJMS2021 (to K.A. and S.I.) and JPMJMS2025 (to S.I.); Institute of AI and Beyond at the University of Tokyo (to K.A.); Shin-Nihon of Advanced Medical Research (to S.I.); SECOM Science and Technology Foundation (to S.I.); The Japan Prize Foundation (to S.I.).
Competing financial interests
The authors declare no conflicts of interest associated with this manuscript.
Institutional review board statement
This study was approved by the ethics committees of Nagoya University (hc22-01).
Extended data figures
Supplementary Information
References
- 1. Oral saliva and COVID-19. Oral Oncol 108. https://doi.org/10.1016/j.oraloncology.2020.104821
- 2. Saliva or Nasopharyngeal Swab Specimens for Detection of SARS-CoV-2. N Engl J Med 383:1283–1286. https://doi.org/10.1056/NEJMc2016359
- 3. SARS-CoV-2 infection of the oral cavity and saliva. Nat Med 27:892–903. https://doi.org/10.103
- 4. A quantitative model used to compare within-host SARS-CoV-2, MERS-CoV, and SARS-CoV dynamics provides insights into the pathogenesis and treatment of SARS-CoV-2. PLoS Biol 19. https://doi.org/10.1371/journal.pbio.3001128
- 5. Daily longitudinal sampling of SARS-CoV-2 infection reveals substantial heterogeneity in infectiousness. Nat Microbiol 7:640–652. https://doi.org/10.1038/s41564-022-01105-z
- 6. Designing isolation guidelines for COVID-19 patients with rapid antigen tests. Nat Commun 13. https://doi.org/10.1038/s41467-022-32663-9
- 7. Potency and timing of antiviral therapy as determinants of duration of SARS-CoV-2 shedding and intensity of inflammatory response. Sci Adv 6. https://doi.org/10.1126/sciadv.abc7112
- 8. Quantifying the relationship between SARS-CoV-2 viral load and infectiousness. Elife 10. https://doi.org/10.7554/eLife.69302
- 9. Modeling SARS-CoV-2 viral kinetics and association with mortality in hospitalized patients from the French COVID cohort. Proc Natl Acad Sci U S A 118. https://doi.org/10.1073/pnas.2017962118
- 10. Detection of significant antiviral drug effects on COVID-19 with reasonable sample sizes in randomized controlled trials: A modeling study. PLoS Med 18. https://doi.org/10.1371/journal.pmed.1003660
- 11. Efficacy and safety of nelfinavir in asymptomatic and mild COVID-19 patients: a structured summary of a study protocol for a multicenter, randomized controlled trial. Trials 22. https://doi.org/10.1186/s13063-021-05282-w
- 12. A Multicenter Randomized Controlled Trial To Evaluate the Efficacy and Safety of Nelfinavir in Patients with Mild COVID-19. Microbiol Spectr 11. https://doi.org/10.1128/spectrum.04311-22
- 13. MicroRNA Regulation of RNA Virus Replication and Pathogenesis. Trends Mol Med 23:80–93. https://doi.org/10.1016/j.molmed.2016.11.003
- 14. Why do tree-based models still outperform deep learning on tabular data? arXiv preprint arXiv:2207.08815
- 15. Safety, tolerability and viral kinetics during SARS-CoV-2 human challenge in young adults. Nat Med 28:1031–1041. https://doi.org/10.1038/s41591-022-01780-9
- 16. MiR-200c-3p expression may be associated with worsening of the clinical course of patients with COVID-19. Mol Biol Res Commun 10:141–147. https://doi.org/10.22099/mbrc.2021.40555.1631
- 17. Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: an observational cohort study. Lancet Infect Dis 20:565–574. https://doi.org/10.1016/s1473-3099(20)30196-1
- 18. Quantifying the impact of immune history and variant on SARS-CoV-2 viral kinetics and infection rebound: a retrospective cohort study. medRxiv. https://doi.org/10.1101/2022.01.13.22269257
- 19. COVID-19 in an immunocompromised host: persistent shedding of viable SARS-CoV-2 and emergence of multiple mutations: a case report. Int J Infect Dis 114:178–182. https://doi.org/10.1016/j.ijid.2021.10.045
- 20. Prolonged viral shedding of SARS-CoV-2 in two immunocompromised patients, a case report. BMC Infect Dis 21. https://doi.org/10.1186/s12879-021-06429-5
- 21. Prolonged viral shedding of SARS-CoV-2 in an immunocompromised patient. J Infect Chemother 27:387–389. https://doi.org/10.1016/j.jiac.2020.12.001
- 22. Prolonged shedding of SARS-CoV-2 in an elderly liver transplant patient infected by COVID-19: a case report. Ann Palliat Med 10:7003–7007. https://doi.org/10.21037/apm-20-996
- 23. Anti-Spike Mucosal IgA Protection against SARS-CoV-2 Omicron Infection. N Engl J Med 387:1333–1336. https://doi.org/10.1056/NEJMc2209651
- 24. Infectious virus shedding duration reflects secretory IgA antibody response latency after SARS-CoV-2 infection. Proc Natl Acad Sci U S A 120. https://doi.org/10.1073/pnas.2314808120
- 25. In silico prediction of human genes as potential targets for rice miRNAs. Comput Biol Chem 87. https://doi.org/10.1016/j.compbiolchem.2020.107305
- 26. High Rate of Asymptomatic Carriage Associated with Variant Strain Omicron. medRxiv. https://doi.org/10.1101/2021.12.20.21268130
- 27. nCoV-2019 sequencing protocol for illumina. protocols.io. https://doi.org/10.17504/protocols.io.bnn7mdhn
- 28. miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades. Nucleic Acids Res 40:37–52. https://doi.org/10.1093/nar/gkr688
- 29. Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model. Computational Statistics & Data Analysis 51:1562–1574
- 30. The elements of statistical learning: data mining, inference, and prediction. Springer
Copyright
© 2024, Park et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Views, downloads and citations are aggregated across all versions of this paper published by eLife.