Abstract
A novel coronavirus, SARS-CoV-2, has caused over 85 million cases and over 1.8 million deaths worldwide since it emerged twelve months ago in Wuhan, China. Here we conceptualized the time-series evolutionary and expansion dynamics of SARS-CoV-2 by taking a series of cross-sectional views of viral genomes, from the early outbreak in Wuhan in January, to the early phase of global ignition in early April, and finally to the subsequent global expansion by late December 2020. By scrutinizing cases from the early outbreak, we found that a viral genotype from the Seafood Market in Wuhan, characterized by two concurrent mutations, has become the overwhelmingly dominant genotype (95.3%) of the pandemic. By analyzing 4,013 full-length SARS-CoV-2 genomes collected from different continents by early April, we were able to visualize the genomic diversity over the 14-week timespan since the outbreak in Wuhan. We identified 2,954 unique nucleotide substitutions, of which 952 (32.2%) recurred in more than one genome, while 31 of the 4,013 genomes remained of the ancestral type. We also identified 11 major viral genotypes with unique geographic distributions. As the pandemic has been unfolding for more than one year, we used the same approach to analyze 261,323 full-length SARS-CoV-2 genomes from around the world (i.e., all the viral genomes available in the GISAID database as of 25 December 2020) in order to recapitulate our findings in a real-time fashion and to present a full catalogue of SARS-CoV-2 mutations. We demonstrate that the viral genotypic dynamics from different geographic locations over a one-year timespan reveal transmission routes and indicate subsequent expansion. This study is, to our knowledge, the largest and most comprehensive genomic study of SARS-CoV-2 to date.
Our results indicate that viral genotypes can be used as molecular barcodes, in combination with epidemiologic data, to monitor the spreading routes of the pandemic and to evaluate the effectiveness of control measures. Moreover, the dynamics of the viral mutational spectrum described in this study may help in the early identification of new strains in patients to reduce further spread of infection, guide the development of molecular diagnostics and vaccines against COVID-19, and, last but not least, help assess their accuracy and efficacy.
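
To make the mutation counts reported above concrete, the following is a minimal sketch of how unique substitutions, recurrent substitutions, and ancestral-type genomes could be tallied from aligned sequences. It is not the pipeline used in this study. It assumes every genome is already aligned to a reference of the same length, considers only A/C/G/T substitutions (indels and ambiguous bases are ignored), and uses hypothetical function names (`substitutions`, `summarize`) and toy sequences invented purely for illustration.

```python
from collections import Counter


def substitutions(reference, genome):
    """Return the set of (1-based position, ref base, alt base) mismatches.

    Assumption: `genome` is pre-aligned to `reference` (equal length).
    Positions where either base is not A/C/G/T are skipped.
    """
    if len(reference) != len(genome):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return {
        (pos + 1, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference.upper(), genome.upper()))
        if ref != alt and ref in valid and alt in valid
    }


def summarize(reference, genomes):
    """Tally substitutions across a dict of {genome name: aligned sequence}."""
    per_genome = {name: substitutions(reference, seq) for name, seq in genomes.items()}
    # How many genomes carry each distinct substitution.
    carriers = Counter(m for muts in per_genome.values() for m in muts)
    unique = len(carriers)
    recurrent = sum(1 for n in carriers.values() if n > 1)
    # Genomes with no substitutions relative to the reference.
    ancestral = sum(1 for muts in per_genome.values() if not muts)
    # Group genomes by their full substitution set, a crude stand-in for a genotype.
    genotypes = Counter(frozenset(muts) for muts in per_genome.values())
    return {
        "unique": unique,
        "recurrent": recurrent,
        "recurrent_fraction": recurrent / unique if unique else 0.0,
        "ancestral_genomes": ancestral,
        "genotype_sizes": genotypes,
    }


# Toy data, invented for illustration only.
reference = "ACGTACGTAC"
genomes = {
    "g1": "ACGTACGTAC",  # identical to the reference
    "g2": "ATGTACGTAC",  # C2T
    "g3": "ATGTACGAAC",  # C2T and T8A
    "g4": "ACGTACGTAN",  # ambiguous base only, treated as ancestral
}
summary = summarize(reference, genomes)
```

On the toy data, one of the two distinct substitutions (C2T) appears in more than one genome, which is the same kind of ratio as the 952 of 2,954 (32.2%) reported in the abstract. Grouping genomes by their full substitution set is only a rough proxy for genotype definition, which in practice would rely on a chosen set of marker mutations.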