Using intrahost single nucleotide variant data to predict SARS-CoV-2 detection cycle threshold values

Lea Duesterwald, Marcus Nguyen, Paul Christensen, S. Wesley Long, Randall J. Olsen, James M. Musser, James J. Davis

Research output: Contribution to journal › Article › peer-review

Abstract

Over the last four years, each successive wave of the COVID-19 pandemic has been caused by variants with mutations that improve the transmissibility of the virus. Despite this, we still lack tools for predicting clinically important features of the virus. In this study, we show that it is possible to predict the PCR cycle threshold (Ct) values from clinical detection assays using sequence data. Ct values often correspond with patient viral load and the epidemiological trajectory of the pandemic. Using a collection of 36,335 high-quality genomes, we built models from SARS-CoV-2 intrahost single nucleotide variant (iSNV) data, computing XGBoost models from the frequencies of A, T, G, C, insertions, and deletions at each position relative to the Wuhan-Hu-1 reference genome. Our best model had an R² of 0.604 [0.593–0.616, 95% confidence interval] and a Root Mean Square Error (RMSE) of 5.247 [5.156–5.337], demonstrating modest predictive power. Overall, we show that the results are stable relative to an external holdout set of genomes selected from SRA and are robust to patient status and the detection instruments that were used. This study highlights the importance of developing modeling strategies that can be applied to publicly available genome sequence data for use in disease prevention and control.
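The feature construction and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input format (a list of per-position read-count dictionaries), the function names, and the handling of zero-coverage positions are all assumptions made for illustration. The XGBoost fitting step and the confidence-interval calculation are omitted, because the abstract does not describe how they were configured.

```python
import math

# The six per-position features named in the abstract.
ALLELES = ("A", "T", "G", "C", "ins", "del")


def position_frequencies(counts):
    """Turn read counts at one reference position into allele frequencies.

    `counts` maps a label from ALLELES to a read count (hypothetical input
    format). A position with no coverage is encoded as all zeros, which is
    an assumption of this sketch.
    """
    total = sum(counts.get(allele, 0) for allele in ALLELES)
    if total == 0:
        return [0.0] * len(ALLELES)
    return [counts.get(allele, 0) / total for allele in ALLELES]


def genome_features(per_position_counts):
    """Concatenate per-position frequencies into one feature vector."""
    features = []
    for counts in per_position_counts:
        features.extend(position_frequencies(counts))
    return features


def r_squared(y_true, y_pred):
    """Coefficient of determination between observed and predicted Ct."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted Ct."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Toy example: two positions of a made-up genome, and made-up Ct values.
toy_genome = [{"A": 98, "G": 2}, {"T": 50, "del": 50}]
print(genome_features(toy_genome))
print(r_squared([20.0, 25.0, 30.0], [21.0, 25.0, 29.0]))
print(rmse([20.0, 25.0, 30.0], [21.0, 25.0, 29.0]))
```

In a full workflow, one such feature vector per genome would form a row of the training matrix, with the measured Ct value as the regression target.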

Original language: English (US)
Article number: e0312686
Pages (from-to): e0312686
Journal: PLoS ONE
Volume: 19
Issue number: 10
State: Published - Oct 2024

Keywords

  • COVID-19/virology
  • Genome, Viral
  • SARS-CoV-2/genetics
  • Pandemics
  • Humans
  • Polymorphism, Single Nucleotide
  • Viral Load

ASJC Scopus subject areas

  • General
