Genomic Epidemiology of SARS-CoV-2 From Mainland China With Newly Obtained Genomes From Henan Province

Although the COVID-19 epidemic in China was brought under control within a few months, it remains important to infer the time of origin and the genetic diversity of its causative agent, SARS-CoV-2, from whole-genome sequences. However, the whole-genome sequences from China in current public databases are very unevenly distributed with respect to time and place of collection. In particular, only one sequence had been obtained from Henan Province, which borders Hubei Province, the worst-affected province in China. Here, we used high-throughput sequencing to obtain 19 whole-genome sequences of SARS-CoV-2 from 18 patients with severe COVID-19 admitted to the First Affiliated Hospital of Zhengzhou University, a provincially designated hospital for the treatment of severe COVID-19 cases in Henan Province. The demographic, baseline, and clinical characteristics of these patients are described. To investigate the molecular epidemiology of SARS-CoV-2 during the COVID-19 outbreak in China, 729 genome sequences sampled from Mainland China (including the 19 sequences from this study) were analyzed using likelihood mapping, split-network, maximum-likelihood (ML) phylogenetic, and Bayesian time-scaled phylogenetic analyses. We estimated the evolutionary rate of SARS-CoV-2 from Mainland China at 9.25 × 10⁻⁴ substitutions per site per year (95% BCI: 6.75 × 10⁻⁴ to 1.28 × 10⁻³) and the time to the most recent common ancestor (TMRCA) at October 1, 2019 (95% BCI: August 22, 2019 to November 6, 2019). Our results contribute to the study of the molecular epidemiology and genetic diversity of SARS-CoV-2 over time in Mainland China.
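
To give the reported per-site rate a more tangible scale, it can be converted to substitutions per genome per year. The sketch below assumes a genome length of 29,903 nucleotides (the Wuhan-Hu-1 reference), a value that is not stated in the abstract; the function and variable names are illustrative only.

```python
# Assumption (not stated in the abstract): genome length of the
# Wuhan-Hu-1 reference sequence, 29,903 nucleotides.
GENOME_LENGTH_NT = 29_903


def substitutions_per_genome_per_year(rate_per_site_per_year,
                                      genome_length_nt=GENOME_LENGTH_NT):
    """Scale a per-site substitution rate to a whole-genome rate."""
    return rate_per_site_per_year * genome_length_nt


# Values quoted in the abstract (substitutions per site per year).
reported_rates = {
    "mean": 9.25e-4,
    "95% BCI lower": 6.75e-4,
    "95% BCI upper": 1.28e-3,
}

per_genome = {
    label: substitutions_per_genome_per_year(rate)
    for label, rate in reported_rates.items()
}

for label, value in per_genome.items():
    print(f"{label}: {value:.1f} substitutions per genome per year")
```

Under this assumed genome length, the mean rate corresponds to roughly 28 substitutions per genome per year, with the credible interval spanning about 20 to 38.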

Evolution and Genetic Diversity of SARS-CoV-2 in Africa Using Whole Genome Sequences

doi:10.1101/2020.07.27.222901, 2020

Author(s): Babatunde Olarenwaju Motayo, Olukunle Oluwapamilerin Oluwasemowo, Paul Akiniyi Akinduti, Babatunde Adebiyi Olusola, Olumide T Aerege, ...

Keyword(s): Genetic Diversity, Amino Acid, Spike Protein, Recent Common Ancestor, Whole Genome, Genome Sequences, Protein Amino Acid, Protein Variant, Health Crisis, Most Recent Common Ancestor

Abstract: SARS-CoV-2, the cause of the ongoing pandemic, was introduced into Africa on 14 February 2020 and has spread rapidly across the continent, causing a severe public health crisis and mortality. We investigated the genetic diversity and evolution of this virus during the early outbreak months using whole genome sequences. We performed recombination analysis against closely related CoVs, constructed a Bayesian time-scaled phylogeny, and investigated spike protein amino acid mutations. Our analysis showed recombination signals between the AfrSARSCoV-2 sequences and reference sequences within the N and S genes. The evolutionary rate of the AfrSARSCoV-2 sequences was 4.133 × 10⁻⁴ substitutions/site/year (highest posterior density [HPD]: 4.132 × 10⁻⁴ to 4.134 × 10⁻⁴). The time to the most recent common ancestor (TMRCA) of the African strains was 7 December 2019. The AfrSARSCoV-2 sequences diversified into two lineages, A and B, with B being more diverse and containing multiple sub-lineages, as confirmed by both the maximum clade credibility (MCC) tree and the PANGOLIN software. There was a high prevalence of the D614G spike protein amino acid mutation (82.61%) among the African strains. Our study has revealed a rapidly diversifying viral population dominated by the G614 spike protein variant. We advocate scaling up NGS platforms across Africa to enhance surveillance and aid control efforts against SARS-CoV-2 in Africa.
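
A prevalence figure such as the one reported for the spike position 614 variant can be obtained by counting residues at a single column of a protein alignment. The following is a minimal sketch of that counting step only; the function name is hypothetical and the toy alignment is made up for illustration (it is not real spike data and does not reproduce the study's result).

```python
def residue_prevalence(aligned_proteins, position, residue):
    """Fraction of sequences carrying `residue` at 1-based `position`.

    Sequences with a gap ('-') or unknown residue ('X') at that position,
    or that are too short to cover it, are left out of the denominator.
    """
    observed = [seq[position - 1] for seq in aligned_proteins
                if len(seq) >= position]
    informative = [aa for aa in observed if aa not in ("-", "X")]
    if not informative:
        raise ValueError("no informative residues at this position")
    return sum(aa == residue for aa in informative) / len(informative)


# Hypothetical toy alignment: five made-up 5-residue fragments in which
# column 2 stands in for the site of interest.
toy_alignment = ["QDVNC", "QGVNC", "QGVNC", "Q-VNC", "QGVNC"]

share = residue_prevalence(toy_alignment, position=2, residue="G")
print(f"G at column 2: {share:.2%}")
```

For a real spike protein alignment the call would use `position=614`, provided the alignment columns follow the reference numbering.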

Phylogenetic and genetic characterization of Treponema pallidum strains from syphilis patients in Japan by whole-genome sequence analysis from global perspectives

Scientific Reports, doi:10.1038/s41598-021-82337-7, 2021, Vol 11 (1)

Author(s): Shingo Nishiki, Kenichi Lee, Mizue Kanai, Shu-ichi Nakayama, Makoto Ohnishi

Keyword(s): Genetic Difference, Phylogenetic Analyses, Treponema Pallidum, Whole Genome Sequence, Epidemiological Surveillance, Recent Common Ancestor, Whole Genome, Global Perspectives, Most Recent Common Ancestor, Sex With Men

Abstract: Japan has had a substantial increase in syphilis cases since 2013. However, research on the genomic features of the Treponema pallidum subspecies pallidum (TPA) strains from these cases has been limited. Here, we elucidated the genetic variations and relationships between TPA strains in Japan (detected between 2014 and 2018) and other countries by whole-genome sequencing and phylogenetic analyses, incorporating syphilis epidemiological surveillance data and information on patient sexual orientation. Seventeen of the 20 strains from Japan belonged to the SS14 lineage and the remaining 3 to the Nichols lineage. Sixteen of the 17 SS14-lineage strains were classified into the previously reported Sub-lineage 1B. Sub-lineage 1B strains in Japan formed distinct sub-clusters of strains from heterosexuals and strains from men who have sex with men. These strains were closely related to reported TPA strains in China, forming an East-Asian cluster. However, the strains in the two countries evolved independently after diverging from their most recent common ancestor and expanded their genetic diversity during the syphilis outbreak in each country. The genetic difference between the TPA strains in these countries was characterized by single-nucleotide-polymorphism analyses of their penicillin-binding protein genes. Taken together, our results elucidate the detailed phylogenetic features and transmission networks of syphilis.

Molecular epidemiological characteristics of echovirus 6 in mainland China: extensive circulation of genotype F from 2007 to 2018

Archives of Virology, doi:10.1007/s00705-020-04934-7, 2021

Author(s): Wenjun Cheng, Tianjiao Ji, Shuaifeng Zhou, Yong Shi, Lili Jiang, ...

Keyword(s): Common Ancestor, Mainland China, Recent Common Ancestor, Genetic Characteristics, Most Recent Common Ancestor, Significant Difference, Echovirus 6, Epidemiological Characteristics, Highest Posterior Density, Genotype F

Abstract: Echovirus 6 (E6) is associated with various clinical diseases and is frequently detected in environmental sewage. Despite its high prevalence in humans and the environment, little is known about its molecular phylogeography in mainland China. In this study, 114 of 21,539 (0.53%) clinical specimens from hand, foot, and mouth disease (HFMD) cases collected between 2007 and 2018 were positive for E6. The complete VP1 sequences of 87 representative E6 strains, including 24 strains from this study, were used to investigate the evolutionary genetic characteristics and geographical spread of E6 strains. Phylogenetic analysis based on VP1 nucleotide sequence divergence showed that, globally, E6 strains can be grouped into six genotypes, designated A to F. Chinese E6 strains collected between 1988 and 2018 were found to belong to genotypes C, E, and F, with genotype F being predominant from 2007 to 2018. There was no significant difference in the geographical distribution of each genotype. The evolutionary rate of E6 was estimated to be 3.631 × 10⁻³ substitutions site⁻¹ year⁻¹ (95% highest posterior density [HPD]: 3.2406 × 10⁻³ to 4.031 × 10⁻³ substitutions site⁻¹ year⁻¹) by Bayesian MCMC analysis. The most recent common ancestor of the E6 genotypes was traced back to 1863, whereas their common ancestor in China was traced back to around 1962. A small genetic shift was detected in the Chinese E6 population size in 2009 according to Bayesian skyline analysis, which indicated that there might have been an epidemic around that year.

Molecular epidemiology of coxsackievirus A16 circulating in children in Beijing, China from 2010 to 2019

World Journal of Pediatrics, doi:10.1007/s12519-021-00451-y, 2021

Author(s): Ya-Fang Hu, Li-Ping Jia, Fang-Yuan Yu, Li-Ying Liu, Qin-Wei Song, ...

Keyword(s): Molecular Epidemiology, Evolutionary Rate, Foot And Mouth Disease, Recent Common Ancestor, Rt Pcr, Coxsackievirus A16, Most Recent Common Ancestor, Etiological Agents, High Level, And Control

Abstract. Background: Coxsackievirus A16 (CVA16) is one of the major etiological agents of hand, foot and mouth disease (HFMD). This study aimed to investigate the molecular epidemiology and evolutionary characteristics of CVA16. Methods: Throat swabs were collected from children with HFMD and suspected HFMD during 2010–2019. Enteroviruses (EVs) were detected and typed by real-time reverse transcription-polymerase chain reaction (RT-PCR) and RT-PCR. The genotype, evolutionary rate, most recent common ancestor, population dynamics and selection pressure of CVA16 were analyzed based on the viral protein 1 (VP1) gene using bioinformatics software. Results: A total of 4709 throat swabs were screened. EVs were detected in 3180 samples, and 814 were CVA16 positive. More than 81% of CVA16-positive children were under 5 years old. The prevalence of CVA16 showed obvious periodic fluctuations, with a high level during 2010–2012 followed by an apparent decline during 2013–2017. However, CVA16 activity increased gradually during 2018–2019. All the Beijing CVA16 strains belonged to sub-genotype B1, and B1b was the dominant strain. One B1c strain was detected in Beijing for the first time in 2016. The estimated mean evolutionary rate of the VP1 gene was 4.49 × 10⁻³ substitutions/site/year. Methionine has gradually become fixed at site 23 of VP1 since 2012. Two sites were detected under episodic positive selection, one of which (site 223) is located in the neutralizing linear epitope PEP71. Conclusions: The dominant strains of CVA16 belonged to clade B1b and evolved at a fast rate during 2010–2019 in Beijing. To provide more useful data for HFMD prevention and control, it is necessary to continue monitoring the molecular epidemiological and evolutionary characteristics of CVA16.
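
The screening counts in the Results can be turned into detection rates with simple arithmetic. The sketch below does only that; the helper name `percent` is illustrative, and the three counts are the ones quoted in the abstract.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts quoted in the abstract.
swabs_screened = 4709
ev_positive = 3180
cva16_positive = 814

ev_rate = percent(ev_positive, swabs_screened)            # EV-positive among all swabs
cva16_rate = percent(cva16_positive, swabs_screened)      # CVA16-positive among all swabs
cva16_share_of_ev = percent(cva16_positive, ev_positive)  # CVA16 among EV-positive samples

print(f"EV detection rate:      {ev_rate:.2f}%")
print(f"CVA16 detection rate:   {cva16_rate:.2f}%")
print(f"CVA16 share of EV hits: {cva16_share_of_ev:.2f}%")
```

These work out to roughly 67.5% of swabs being EV positive, 17.3% being CVA16 positive, and CVA16 accounting for about 25.6% of EV-positive samples.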

Whole-Genome Sequences of Influenza A(H1N1)pdm09 Virus Isolates from Kerala, India

Genome Announcements, doi:10.1128/genomea.00598-17, 2017, Vol 5 (28), Cited by ~1

Author(s): Sara Jones, Raji Prasad, Anjana S. Nair, Sanjai Dharmaseelan, Remya Usha, ...

Keyword(s): Amino Acid, Amino Acid Analysis, Influenza A, Whole Genome Sequence, Whole Genome, H1n1 Pandemic, Genome Sequences, Influenza A H1n1, Virus Isolates, New Mutations

Abstract: We report here the whole-genome sequences of six clinical isolates of influenza A(H1N1)pdm09 from Kerala, India. Amino acid analysis of all gene segments from the A(H1N1)pdm09 isolates obtained in 2014 and 2015 identified several new mutations compared to the 2009 A(H1N1) pandemic strain.

Whole genome characterization of strains belonging to the Ralstonia solanacearum species complex and in silico analysis of TaqMan assays for detection in this heterogenous species complex

European Journal of Plant Pathology, doi:10.1007/s10658-020-02190-8, 2021

Author(s): Viola Kurm, Ilse Houwers, Claudia E. Coipan, Peter Bonants, Cees Waalwijk, ...

Keyword(s): Ralstonia Solanacearum, In Silico, Species Complex, Sequence Data, In Silico Analysis, Whole Genome Sequence, Whole Genome, Genome Sequences, Pcr Assays

Abstract: Identification and classification of members of the Ralstonia solanacearum species complex (RSSC) is challenging due to the heterogeneity of this complex. Whole genome sequence data of 225 strains were used to classify strains based on average nucleotide identity (ANI) and multilocus sequence analysis (MLSA). Based on the ANI score (>95%), 191 out of 192 (99.5%) RSSC strains could be grouped into the three species R. solanacearum, R. pseudosolanacearum, and R. syzygii, and into the four phylotypes within the RSSC (I, II, III, and IV). R. solanacearum phylotype II could be split into two groups (IIA and IIB), of which IIB clustered into three subgroups (IIBa, IIBb and IIBc). This division by ANI was in accordance with MLSA. The IIB subgroups found by ANI and MLSA also differed in the number of SNPs in the primer and probe sites of various assays. An in silico analysis of eight TaqMan and 11 conventional PCR assays was performed using the whole genome sequences. Based on this analysis, several cases of potential false positives or false negatives can be expected when these assays are used for their intended target organisms. Two TaqMan assays and two PCR assays targeting the 16S rDNA sequence should be able to detect all phylotypes of the RSSC. We conclude that the increasing availability of whole genome sequences is not only useful for classification of strains, but also shows potential for selection and evaluation of clade-specific nucleic acid-based amplification methods within the RSSC.
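
One simple way to picture the in silico assay check described above is to slide a primer along a genome and count mismatches at the best-matching site. The sketch below is a deliberately simplified illustration, not the authors' pipeline: it scans the forward strand only, ignores degenerate bases and probe chemistry, and uses made-up toy sequences and a hypothetical function name.

```python
def min_mismatches(primer, target):
    """Slide `primer` along `target` (forward strand only).

    Returns (fewest mismatches, 0-based offset of the first best site).
    """
    if len(primer) > len(target):
        raise ValueError("primer is longer than target")
    best_mismatches, best_offset = len(primer) + 1, -1
    for offset in range(len(target) - len(primer) + 1):
        window = target[offset:offset + len(primer)]
        mismatches = sum(p != t for p, t in zip(primer, window))
        if mismatches < best_mismatches:
            best_mismatches, best_offset = mismatches, offset
    return best_mismatches, best_offset


# Hypothetical toy data: one primer and two made-up genome fragments.
primer = "ACGTTGCA"
genome_a = "TTTACGTTGCATTT"  # contains an exact primer site
genome_b = "TTTACGATGCATTT"  # same site with a single SNP

for name, genome in [("genome_a", genome_a), ("genome_b", genome_b)]:
    mismatches, offset = min_mismatches(primer, genome)
    print(f"{name}: {mismatches} mismatch(es) at offset {offset}")
```

A site with several mismatches, especially near the 3' end of a primer, would be a candidate false negative; a near-perfect site in a non-target genome would be a candidate false positive.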

Whole-genome analysis-based phylogeographic investigation of Streptococcus pneumoniae serotype 19A sequence type 320 isolates in Japan.

Antimicrobial Agents and Chemotherapy, doi:10.1128/aac.01395-21, 2021

Author(s): Satoshi Nakano, Takao Fujisawa, Bin Chang, Yutaka Ito, Hideki Akeda, ...

Keyword(s): Common Ancestor, Sampling Bias, Health Concern, Recent Common Ancestor, Whole Genome, Whole Genome Analysis, Identification Rate, Most Recent Common Ancestor, The Common, The U.S

After the introduction of the seven-valent pneumococcal conjugate vaccine, the global spread of multidrug-resistant serotype 19A-ST320 strains became a public health concern. In Japan, the main genotype of serotype 19A was ST3111, and the identification rate of ST320 was low. Although ST320 isolates were sporadically detected in both adults and children, their origin remains unknown. We therefore combined pneumococcal isolates collected in three nationwide pneumococcal surveillance studies conducted in Japan between 2008 and 2020 and analyzed 56 serotype 19A-ST320 isolates along with 931 global isolates, using whole-genome sequencing to uncover the transmission route of this globally distributed clone in Japan. The clone was frequently detected in Okinawa Prefecture, which the U.S. returned to Japan in 1972. Phylogenetic analysis demonstrated that the isolates from Japan were genetically related to those from the U.S.; therefore, the common ancestor may have originated in the U.S. In addition, Bayesian analysis suggested that the time to the most recent common ancestor of the isolates from Japan and the U.S. was approximately the 1990s to 2000, raising the possibility that the common ancestor had already spread in the U.S. before the Taiwan 19F-14 isolate was first identified in a Taiwanese hospital in 1997. The phylogeographical analysis supported the transmission of the clone from the U.S. to Japan, but the analysis could be influenced by sampling bias. These results suggest that the serotype 19A-ST320 clone may have already spread in the U.S. before being imported into Japan.

Development of a Rapid, Efficient, Intelligent and Cost-Saving Tool to Diagnose Pasteurella Multocida by Using Whole Genome Sequence and Genotypes of Pasteurella Multocida From Different Hosts

doi:10.21203/rs.3.rs-104569/v1, 2020

Author(s): Zhong Peng, Junyang Liu, Wan Liang, Fei Wang, Li Wang, ...

Keyword(s): Cost Saving, Host Species, Pasteurella Multocida, Epidemiological Studies, Whole Genome Sequence, Clinical Settings, Whole Genome, Genome Sequences, Host Tropism, Tropism Prediction

Abstract. Background: Different typing systems, including capsular genotyping, lipopolysaccharide (LPS) genotyping, multilocus sequence typing (MLST), and virulence genotyping based on the detection of different virulence factor-encoding gene (VFG) profiles, have been applied to characterize Pasteurella multocida strains from different host species. However, these methods require much time and effort in the laboratory. In particular, it is difficult to address the biology of P. multocida from different host species by relying on only one of these methods. Recently, we found that assigning P. multocida strains according to the combination of their capsular, LPS, and MLST genotypes (written as capsular genotype: LPS genotype: MLST genotype) could help address the biological characteristics of P. multocida circulating in multiple hosts. However, a rapid, efficient, intelligent and cost-saving tool to diagnose P. multocida according to this system is still lacking. Results: We have developed PmGT, an intelligent genotyping and host tropism prediction tool for P. multocida strains based on their whole genome sequences, using machine learning and web 2.0 technologies. Using this tool, the capsular genotypes, LPS genotypes, and MLST genotypes, as well as the main VFGs, of P. multocida isolates from different host species were determined from whole genome sequences. The results revealed a closer association between the genotypes and pasteurellosis than between the genotypes and host species. Finally, we also used PmGT to predict the host species of P. multocida strains with the same capsular: LPS: MLST genotypes. Conclusions: With the advent of high-quality, inexpensive DNA sequencing, this platform represents a more efficient and cost-saving tool for P. multocida diagnosis in both epidemiological studies and clinical settings.

Misclassification of a whole genome sequence reference defined by the Human Microbiome Project: a detrimental carryover effect to microbiome studies

doi:10.1101/19000489, 2019

Author(s): DJ Darwin R. Bandoy, B Carol Huang, Bart C. Weimer

Keyword(s): Human Microbiome, Human Microbiome Project, Outbreak Detection, Whole Genome Sequence, Reference Database, Whole Genome, Reference Species, Genome Sequences, Genome Homology, Microbiome Data

Abstract: Taxonomic classification is an essential step in the analysis of microbiome data and depends on a reference database of whole genome sequences. Taxonomic classifiers are built on established reference species, such as those in the Human Microbiome Project database, which is growing rapidly. While constructing a population-wide pangenome of the bacterial genus Hungatella, we discovered that the Human Microbiome Project reference species Hungatella hathewayi (WAL 18680) was significantly different from other members of this genus. Specifically, the reference lacked the core genome shared by the other members. Further analysis, using average nucleotide identity (ANI) and 16S rRNA comparisons, indicated that WAL 18680 was misclassified as Hungatella. The classification error is being amplified in taxonomic classifiers and will have a compounding effect as microbiome analyses are done, resulting in inaccurate assignment of community members and leading to fallacious conclusions and possibly treatment. As automated genome homology assessment expands for microbiome analysis and outbreak detection, and as public health reliance on whole genomes increases, this issue will likely occur at an increasing rate. These observations highlight the need to develop reference-free methods for epidemiological investigation using whole genome sequences and the critical importance of accurate reference databases.

Whole-proteome tree of life suggests a deep burst of organism diversity

Proceedings of the National Academy of Sciences, doi:10.1073/pnas.1915766117, 2020, Vol 117 (7), pp. 3678-3686, Cited by ~5

Author(s): JaeJin Choi, Sung-Hou Kim

Keyword(s): Information Theory, Genome Sequence, Tree Of Life, Whole Genome Sequence, Whole Genome, Genome Sequences, Alignment Free, Whole Transcriptome, Evolutionary Progression, Feature Frequency

An organism tree of life (organism ToL) is a conceptual and metaphorical tree to capture a simplified narrative of the evolutionary course and kinship among the extant organisms. Such a tree cannot be experimentally validated but may be reconstructed based on characteristics associated with the organisms. Since the whole-genome sequence of an organism is, at present, the most comprehensive descriptor of the organism, a whole-genome sequence-based ToL can be an empirically derivable surrogate for the organism ToL. However, experimentally determining the whole-genome sequences of many diverse organisms was practically impossible until recently. We have constructed three types of ToLs for diversely sampled organisms using the sequences of the whole genome, the whole transcriptome, and the whole proteome. Of the three, the whole-proteome sequence-based ToL (whole-proteome ToL), constructed by applying an information theory-based feature frequency profile method, an “alignment-free” method, gave the most topologically stable ToL. Here, we describe the main features of a whole-proteome ToL for 4,023 species with known complete or almost complete genome sequences, focusing on grouping and kinship among the groups at deep evolutionary levels. The ToL reveals that 1) all extant organisms of this study can be grouped into 2 “Supergroups,” 6 “Major Groups,” or 35+ “Groups”; 2) the order of emergence of the “founders” of all of the groups may be assigned on an evolutionary progression scale; and 3) all of the founders of the groups emerged in a “deep burst” at the very beginning, near the root of the ToL: an explosive birth of life’s diversity.
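
The alignment-free idea behind a feature frequency profile can be sketched in a few lines: each sequence is summarized by the relative frequencies of its overlapping k-mers ("features"), and two profiles are compared with an information-theoretic distance. The sketch below assumes Jensen-Shannon divergence as that distance and a feature length of k = 2, neither of which is specified in the abstract; the function names and toy sequences are hypothetical.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(sequence, k):
    """Relative frequencies of all overlapping k-mers in `sequence`."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two profiles.

    0 means identical profiles; 1 means no shared features.
    """
    divergence = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = 0.5 * (pi + qi)  # mixture of the two profiles
        if pi > 0:
            divergence += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * log2(qi / mi)
    return divergence


# Hypothetical toy "proteomes" (made-up strings, not real data).
toy = {
    "org_a": "MKVLAAGIVMKVLA",
    "org_b": "MKVLSAGIVMKTLA",
    "org_c": "WWPPWWPPWWPP",
}
profiles = {name: feature_frequency_profile(seq, k=2) for name, seq in toy.items()}

for a, b in [("org_a", "org_b"), ("org_a", "org_c")]:
    d = jensen_shannon_divergence(profiles[a], profiles[b])
    print(f"JSD({a}, {b}) = {d:.3f}")
```

A matrix of such pairwise divergences could then be passed to a distance-based tree-building method; that step, and the choice of an appropriate feature length for whole proteomes, is outside this sketch.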
