TY - JOUR
T1 - The phylogenetic relationship within SARS-CoV-2s
T2 - An expanding basal clade
AU - Shen, Steve
AU - Zhang, Zhao
AU - He, Funan
N1 - Funding Information:
We thank the scientific communities all over the world for their selfless efforts in this pandemic. Special gratitude goes to GISAID and NCBI for the SARS-CoV-2 data they provide. All authors consented to publication.
Publisher Copyright:
© 2020 Elsevier Inc.
PY - 2021/4
Y1 - 2021/4
N2 - The COVID-19 pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), whose origin is still shrouded in mystery. In this study, we developed a method to search for the basal SARS-CoV-2 clade among collected SARS-CoV-2 genome sequences. We first identified the mutation sites in the SARS-CoV-2 whole-genome sequence alignment. Then, by pairwise comparison of the numbers of mutation sites among all SARS-CoV-2 sequences, we identified the least mutated clade, which is the basal clade under the parsimony principle. In our first analysis, we used 168 SARS-CoV-2 sequences (GISAID dataset through 2020/03/04) to identify the basal clade, which contains 33 identical viral sequences from seven countries. To our surprise, in our second analysis with 367 SARS-CoV-2 sequences (GISAID dataset through 2020/03/17), the basal clade contains 51 viral sequences, 18 more than in the first analysis. The much larger NCBI dataset shows that this clade had expanded, with 85 unique sequences, by 2020/04/04. The expanding basal clade reveals a chilling fact: the least mutated SARS-CoV-2 sequence had been replicating and spreading for at least four months. Coronaviruses are known to have an RNA proofreading capability that ensures the fidelity of their genome replication. Interestingly, we found that SARS-CoV-2 without its nonstructural proteins 13 to 16 (Nsp13-Nsp16) exhibits an unusually high mutation rate. Our results suggest that SARS-CoV-2 has an unprecedented RNA proofreading capability that can preserve its genome intact even after a long period of transmission. Our selection analyses also indicate that the positive selection event enabling SARS-CoV-2 to cross species and adapt to human hosts might have occurred before the outbreak.
KW - Basal clade
KW - COVID-19
KW - Parsimony principle
KW - Phylogenetic relationship
KW - RNA proofreading capability
KW - SARS-CoV-2
UR - http://www.scopus.com/inward/record.url?scp=85099624576&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099624576&partnerID=8YFLogxK
U2 - 10.1016/j.ympev.2020.107017
DO - 10.1016/j.ympev.2020.107017
M3 - Article
C2 - 33242581
AN - SCOPUS:85099624576
SN - 1055-7903
VL - 157
JO - Molecular Phylogenetics and Evolution
JF - Molecular Phylogenetics and Evolution
M1 - 107017
ER -