Synonymous mutations and the molecular evolution of SARS-CoV-2 origins

Author(s):  
Hongru Wang ◽  
Lenore Pipes ◽  
Rasmus Nielsen

Abstract Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin-isolated strain, GD410721, in the receptor binding domain (RBD) of the spike protein, a pattern that can be caused either by recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than that between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of non-synonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, we estimate that the time to the most recent common ancestor of SARS-CoV-2 and RaTG13 or RmYN02, based on synonymous divergence, is 51.71 years (95% C.I., 28.11-75.31) and 37.02 years (95% C.I., 18.19-55.85), respectively.
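The two divergence-time estimates quoted in the abstract can be sanity-checked with simple arithmetic. The sketch below is illustrative only and is not the authors' method: it assumes the reported intervals are symmetric normal-approximation intervals (z = 1.96), and the helper `tmrca_years` assumes a strict molecular clock in which synonymous divergence accumulates along both lineages. The function names and the clock formula are assumptions introduced here; the abstract does not state the substitution rate used.

```python
def ci_summary(lower, upper, z=1.96):
    """Return (midpoint, half-width, implied standard error) of an interval.

    Assumes a symmetric normal-approximation interval; this is an
    assumption of the sketch, not something stated in the abstract.
    """
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    return midpoint, half_width, half_width / z


def tmrca_years(d_s, rate_per_site_per_year):
    """Strict-clock TMRCA from pairwise synonymous divergence d_s.

    Divergence accumulates on both lineages since the common ancestor,
    hence the factor of 2. Both arguments are hypothetical inputs.
    """
    return d_s / (2 * rate_per_site_per_year)


# Intervals as reported in the abstract (years).
for name, lo, hi in [("RaTG13", 28.11, 75.31), ("RmYN02", 18.19, 55.85)]:
    mid, half, se = ci_summary(lo, hi)
    print(f"{name}: midpoint={mid:.2f}, half-width={half:.2f}, implied SE={se:.2f}")
```

The midpoints of both intervals coincide with the reported point estimates (51.71 and 37.02 years), which is what a symmetric interval would produce.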

2020 ◽  
Author(s):  
Hongru Wang ◽  
Lenore Pipes ◽  
Rasmus Nielsen

Abstract Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin-isolated strain, GD410721, in the receptor-binding domain (RBD) of the spike protein, a pattern that can be caused either by recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than that between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of nonsynonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, we estimate that the time to the most recent common ancestor of SARS-CoV-2 and RaTG13 or RmYN02 based on synonymous divergence is 51.71 years (95% CI, 28.11–75.31) and 37.02 years (95% CI, 18.19–55.85), respectively.


2020 ◽  
Author(s):  
Chul Lee ◽  
Seoae Cho ◽  
Kyu-Won Kim ◽  
DongAhn Yoo ◽  
Jae Yong Han ◽  
...  

Abstract Single amino acid variants (SAVs) may provide clues to understanding the evolution of traits. A complex trait that has evolved convergently among species is vocal learning, the rare ability to imitate sounds heard and an important component of spoken language. Here we assessed whether convergent vocal learning bird species have convergent SAVs (CSAVs) that could be associated with their specialized trait. We analyzed avian genomes and identified CSAVs in vocal learners, but also in most species combinations tested. The number of CSAVs among species was proportional to the product of the most recent common ancestor (MRCA; origin) branch lengths of the species in question, and vocal learning birds did not exceed the overall proportion in most tests. However, genes with identical CSAVs (iCSAVs) in vocal learning species were uniquely enriched in ‘learning’ functions, and a subset of iCSAV genes were under positive selection and had enriched specialized regulation in vocal learning and their adjacent brain subdivisions. Several top candidate genes converge on the cAMP signaling pathway, including DRD1B and PRKAR2B. Our findings suggest a complex mechanism of amino acid convergences and specialized gene regulation upon which selection acts for specialized convergent traits.


Author(s):  
Francisco Díez-Fuertes ◽  
María Iglesias-Caballero ◽  
Javier García Pérez ◽  
Sara Monzón ◽  
Pilar Jiménez ◽  
...  

SARS-CoV-2 whole-genome analysis has identified five large clades worldwide, which emerged in 2019 (19A and 19B) and in 2020 (20A, 20B and 20C). This study aims to analyze the diffusion of SARS-CoV-2 in Spain using maximum likelihood phylogenetic and Bayesian phylodynamic analyses. The most recent common ancestor (MRCA) of the SARS-CoV-2 pandemic was estimated in Wuhan, China, around November 24, 2019. Phylogenetic analyses of the first 12,511 SARS-CoV-2 whole genome sequences obtained worldwide, including 290 from 11 different regions of Spain, revealed 62 independent introductions of the virus in the country. Most sequences from Spain were distributed in clades characterized by the D614G substitution in the S gene (20A, 20B and 20C) and the L84S substitution in ORF8 (19B), with 163 and 118 sequences, respectively, with the remaining sequences branching in 19A. A total of 110 (38%) sequences from Spain grouped in four different monophyletic clusters of the 20A clade (20A-Sp1 and 20A-Sp2) and the 19B clade (19B-Sp1 and 19B-Sp2) along with sequences from 29 countries worldwide. The MRCAs of the 19A-Sp1, 20A-Sp1, 19A-Sp2 and 20A-Sp2 clusters were estimated in Spain around January 21 and 29, and February 6 and 17, 2020, respectively. The prevalence of the 19B clade in Spain (40%) was far higher than in any other European country during the first weeks of the epidemic, probably owing to a founder effect. However, this variant was replaced by G614-bearing viruses in April. In vitro assays showed an enhanced infectivity of pseudotyped virions displaying the G614 substitution compared with D614, suggesting a fitness advantage of D614G.
IMPORTANCE Multiple SARS-CoV-2 introductions have been detected in Spain, and at least four resulted in the emergence of locally transmitted clusters that originated no later than mid-February, with further dissemination to many other countries around the world and a few weeks before the explosion of COVID-19 cases detected in Spain during the first week of March. The majority of the earliest variants detected in Spain branched in the 19B clade (D614 viruses), which was the most prevalent clade during the first weeks of March, pointing to a founder effect. However, from mid-March to June 2020, G614-bearing viruses (20A, 20B and 20C clades) overcame D614 variants in Spain, probably as a consequence of an evolutionary advantage of this substitution in the spike protein. A higher infectivity of G614-bearing viruses compared to D614 variants was detected, suggesting that this substitution in the SARS-CoV-2 spike protein could be behind the variant shift observed in Spain.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Monica Colombo ◽  
Simona Masiero ◽  
Stefano Rosa ◽  
Elisabetta Caporali ◽  
Silvia Laura Toffolatti ◽  
...  

Abstract Grapevine (Vitis vinifera L.) is a crop of major economic importance. However, grapevine yield is guaranteed by the massive use of pesticides to counteract pathogen infections. Under temperate-humid climate conditions, downy mildew is a primary threat for viticulture. Downy mildew is caused by the biotrophic oomycete Plasmopara viticola Berl. & de Toni, which can attack grapevine green tissues. In the absence of treatments and with favourable weather conditions, downy mildew can devastate up to 75% of grape cultivation in one season and weaken newly born shoots, causing serious economic losses. Nevertheless, the repeated and massive use of some fungicides can lead to environmental pollution, negative impact on non-targeted organisms, development of resistance and residual toxicity, and can foster human health concerns. In this manuscript, we provide an innovative approach to obtain specific pathogen protection for plants. By using the yeast two-hybrid approach and the P. viticola cellulose synthase 2 (PvCesA2) as target enzyme, we screened a combinatorial 8 amino acid peptide library with the aim of identifying interacting peptides potentially able to inhibit PvCesA2. Here, we demonstrate that the NoPv1 peptide aptamer prevents P. viticola germ tube formation and grapevine leaf infection without affecting the growth of non-target organisms and without being toxic for human cells. Furthermore, NoPv1 is also able to counteract the growth of Phytophthora infestans, the causal agent of late blight in potato and tomato, possibly as a consequence of the high amino acid sequence similarity between P. viticola and P. infestans cellulose synthase enzymes.


2021 ◽  
Author(s):  
Claudio Casola ◽  
Jingjia Li

Abstract
Background The recurrent evolution of the C4 photosynthetic pathway in angiosperms represents one of the most extraordinary examples of convergent evolution of a complex trait. Comparative genomic analyses have unveiled some of the molecular changes associated with the C4 pathway. For instance, several key enzymes involved in the transition from C3 to C4 photosynthesis have been found to share convergent amino acid replacements along C4 lineages. However, the extent of convergent replacements potentially associated with the emergence of C4 plants remains to be fully assessed. Here, we introduced a robust empirical approach to test molecular convergence along a phylogeny including multiple C3 and C4 taxa. By analyzing proteins encoded by chloroplast genes, we tested if convergent replacements occurred more frequently than expected in C4 lineages compared to C3 lineages. Furthermore, we sought to determine if convergent evolution occurred in multiple chloroplast proteins besides the well-known case of the large RuBisCO subunit encoded by the chloroplast gene rbcL.
Methods Our study was based on the comparative analysis of 43 C4 and 21 C3 grass species belonging to the PACMAD clade, a focal taxonomic group in many investigations of C4 evolution. We first used protein sequences of 67 orthologous chloroplast genes to build an accurate phylogeny of these species. Then, we inferred amino acid replacements along 13 C4 lineages and 9 C3 lineages using reconstructed protein sequences of their ancestral branches, corresponding to the most recent common ancestor of each lineage. Pairwise comparisons between ancestral branches allowed us to identify both convergent and divergent amino acid replacements between C4-C4, C3-C3 and C3-C4 lineages.
Results The reconstructed phylogenetic tree of 64 PACMAD grasses was characterized by strong support at all nodes used for analyses of convergence. We identified 217 convergent replacements and 201 divergent replacements in 45/67 chloroplast proteins in both C4 and C3 ancestral branches. Pairs of C4-C4 ancestral branches showed higher levels of convergent replacements than C3-C3 and C3-C4 pairs. Furthermore, we found that more proteins shared unique convergent replacements in C4 lineages, with both RbcL and RpoC1 (the RNA polymerase beta’ subunit 1) showing a significantly higher convergent/divergent replacements ratio in C4 branches. Notably, significantly more C4-C4 pairs of ancestral branches showed higher numbers of convergent vs. divergent replacements than C3-C3 and C3-C4 pairs. Our results demonstrated that, in the PACMAD clade, C4 grasses experienced higher levels of molecular convergence than C3 species across multiple chloroplast genes. These findings have important implications for both our understanding of the evolution of photosynthesis and the goal of engineering improved crop varieties that integrate components of the C4 pathway.
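The pairwise comparison of ancestral branches described in the Methods can be pictured with a small sketch. This is a simplified illustration, not the authors' pipeline: it assumes each branch is given as an aligned (ancestor, descendant) pair of protein sequences, calls a site convergent when both branches change to the same residue, and calls it divergent when both change to different residues. The function name, the gap handling and the toy sequences are all hypothetical.

```python
def classify_replacements(anc_a, desc_a, anc_b, desc_b, gap="-"):
    """Classify sites replaced on both branches as convergent or divergent.

    Each branch is an (ancestor, descendant) pair of aligned sequences.
    Returns two lists of 0-based alignment positions.
    Simplifying assumption: only the derived states are compared.
    """
    if not (len(anc_a) == len(desc_a) == len(anc_b) == len(desc_b)):
        raise ValueError("all sequences must be aligned to the same length")
    convergent, divergent = [], []
    for i, (a0, a1, b0, b1) in enumerate(zip(anc_a, desc_a, anc_b, desc_b)):
        if gap in (a0, a1, b0, b1):
            continue  # skip sites with missing data on either branch
        if a0 != a1 and b0 != b1:  # a replacement occurred on both branches
            (convergent if a1 == b1 else divergent).append(i)
    return convergent, divergent


# Toy, made-up sequences: site 1 changes K->R on both branches (convergent);
# site 3 changes A->S on one branch and G->C on the other (divergent).
conv, div = classify_replacements("MKTAY", "MRTSY", "MKTGY", "MRTCY")
print(conv, div)

# Overall ratio from the counts reported in the abstract.
print(round(217 / 201, 2))
```

Summing such counts over all branch pairs and proteins would give totals comparable in kind to the 217 convergent and 201 divergent replacements reported above, although the published analysis may define and filter sites differently.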


Author(s):  
Jianguo Li ◽  
Zhen Li ◽  
Xiaogang Cui ◽  
Changxin Wu

Abstract Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread widely from China to the world. Although the viral genome has been well characterized, the evolutionary origin and global transmission dynamics of SARS-CoV-2 remain poorly investigated. To address this, we retrieved 313 SARS-CoV-2 genomes from the GISAID database (https://www.gisaid.org), from which 99 genomes generated from original clinical specimens with exact collection dates from 16 countries were selected and enrolled for Bayesian phylodynamic analysis. Here we show that the time to the Most Recent Common Ancestor (tMRCA) of SARS-CoV-2 is Dec 11, 2019 (95% HPD, Nov 21 - Dec 24). Two clades of globally circulating strains of SARS-CoV-2 were suggested by the Bayesian Maximum Clade Credibility (MCC) tree. The strains circulating in the USA seemed to be from both clades, the strains circulating in the UK and Australia were from Clade 1, and the strains circulating in Singapore, Japan, Germany, France, and Italy were from Clade 2. Although we did not find any obvious bottleneck effect in the Bayesian Skyline Plot reconstruction of the viral population dynamics, a sharp reduction of the lower 95% HPD of the relative genetic diversity was observed from Feb 5, 2020, suggesting the possible onset of a bottleneck effect. Thirteen (6 synonymous and 7 non-synonymous) mutations in the viral genome were observed, including two clade-specific mutations (C8782T and T1844C in Clade 1 rather than Clade 2) and eleven sub-clade specific mutations. All of the observed mutations occurred in the USA circulating strains, except one mutation, T18488C, which occurred only in the UK circulating strains. A non-synonymous mutation in the 3’-UTR was also observed, suggesting an altered RNA replication capacity of SARS-CoV-2. We thus came to the conclusion that continuous evolution occurred in almost all regions of the SARS-CoV-2 genome and potentially in a country-specific manner. Further efforts on monitoring the genomic mutations of SARS-CoV-2 from different countries are recommended.


2020 ◽  
Author(s):  
Babatunde Olarenwaju Motayo ◽  
Olukunle Oluwapamilerin Oluwasemowo ◽  
Paul Akiniyi Akinduti ◽  
Babatunde Adebiyi Olusola ◽  
Olumide T Aerege ◽  
...  

ABSTRACT The ongoing SARS-CoV-2 pandemic was introduced into Africa on 14th February 2020 and has rapidly spread across the continent, causing a severe public health crisis and mortality. We investigated the genetic diversity and evolution of this virus during the early outbreak months using whole genome sequences. We performed recombination analysis against closely related CoV and Bayesian time-scaled phylogeny, and investigated spike protein amino acid mutations. Results from our analysis showed recombination signals between the AfrSARSCoV-2 sequences and reference sequences within the N and S genes. The evolutionary rate of the AfrSARSCoV-2 was 4.133 × 10⁻⁴ substitutions/site/year (highest posterior density [HPD], 4.132 × 10⁻⁴ to 4.134 × 10⁻⁴). The time to most recent common ancestor (TMRCA) of the African strains was December 7th 2019. The AfrSARSCoV-2 sequences diversified into two lineages, A and B, with B being more diverse, with multiple sub-lineages confirmed by both the maximum clade credibility (MCC) tree and PANGOLIN software. There was a high prevalence of the D614G spike protein amino acid mutation (82.61%) among the African strains. Our study has revealed a rapidly diversifying viral population with the G614 spike protein variant dominating. We advocate for upscaling NGS sequencing platforms across Africa to enhance surveillance and aid control efforts of SARS-CoV-2 in Africa.
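To give the per-site rate a more tangible scale, it can be converted into an expected number of substitutions per genome per year. The sketch below is only a back-of-the-envelope conversion: the genome length of 29,903 nt is that of the Wuhan-Hu-1 reference genome and is an assumption brought in here, not a figure from the abstract, and the function name is hypothetical.

```python
def substitutions_per_genome_per_year(rate_per_site_per_year, genome_length):
    """Expected substitutions per genome per year under a uniform rate."""
    return rate_per_site_per_year * genome_length


RATE = 4.133e-4        # substitutions/site/year, as reported in the abstract
GENOME_LENGTH = 29903  # nt, Wuhan-Hu-1 reference (assumption of this sketch)

per_year = substitutions_per_genome_per_year(RATE, GENOME_LENGTH)
print(f"{per_year:.1f} substitutions per genome per year")
print(f"{per_year / 12:.1f} substitutions per genome per month")
```

Under these assumptions the reported rate corresponds to roughly one substitution per genome per month.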


2012 ◽  
Vol 2012 ◽  
pp. 1-6 ◽  
Author(s):  
Austin L. Hughes

Phylogenetic analysis of heme peroxidases (HPXs) of Culicidae and other insects revealed six highly conserved ancient HPX lineages, each of which originated by gene duplication prior to the most recent common ancestor (MRCA) of Hemimetabola and Holometabola. In addition, culicid HPX7 and HPX12 arose by gene duplication after the MRCA of Culicidae and Drosophilidae, while HPX2 orthologs were not found in any other order analyzed except Diptera. Within Diptera, HPX2, HPX7, and HPX12 were relatively poorly conserved at the amino acid level in comparison to the six ancient lineages. The genome of Anopheles gambiae included genes encoding five proteins (HPX10, HPX11, HPX13, HPX14, and HPX15) without orthologs in other genomes analyzed. Overall, gene expression patterns did not seem to reflect phylogenetic relationships, but genes that evolved rapidly at the amino acid sequence level tended to have divergent expression patterns as well. The uniquely high level of duplication of HPXs in A. gambiae may have played a role in coevolution with malaria parasites.


2015 ◽  
Vol 282 (1812) ◽  
pp. 20151105 ◽  
Author(s):  
Austin L. Hughes

Avian genomes typically encode three distinct vitellogenin (VTG) egg yolk proteins (VTG1, VTG2 and VTG3), which arose by gene duplication prior to the most recent common ancestor of birds. Analysis of VTG sequences from 34 avian species in a phylogenetic framework supported the hypothesis that VTG amino acid composition has co-evolved with embryo incubation time. Embryo incubation time was positively correlated with the proportions of dietary essential amino acids (EAAs) in VTG1 and VTG2, and with the proportion of sulfur-containing amino acids in VTG3. These patterns were seen even when only semi-altricial and/or altricial species were considered, suggesting that the duration of embryo incubation is a major selective factor on the amino acid composition of VTGs, rather than developmental mode alone. The results are consistent with the hypothesis that the level of EAAs provided to the egg represents an adaptation to the loss of amino acids through breakdown over the course of incubation and imply that life-history phenotypes and VTG amino acid composition have co-evolved throughout the evolutionary history of birds.


2009 ◽  
Vol 99 (8) ◽  
pp. 943-950 ◽  
Author(s):  
Satyanarayana Tatineni ◽  
Amy D. Ziems ◽  
Stephen N. Wegulo ◽  
Roy French

The complete genome sequence of Triticum mosaic virus (TriMV), a member in the family Potyviridae, has been determined to be 10,266 nucleotides (nt) excluding the 3′ polyadenylated tail. The genome encodes a large polyprotein of 3,112 amino acids with the “hall-mark proteins” of potyviruses, including a small overlapping gene, PIPO, in the P3 cistron. The genome of TriMV has an unusually long 5′ nontranslated region of 739 nt with 12 translation initiation codons and three small open reading frames, which resemble those of the internal ribosome entry site containing 5′ leader sequences of the members of Picornaviridae. Pairwise comparison of 10 putative mature proteins of TriMV with those of representative members of genera in the family Potyviridae revealed 33 to 44% amino acid identity within the highly conserved NIb protein sequence and 15 to 29% amino acid identity within the least conserved P1 protein, suggesting that TriMV is a distinct member in the family Potyviridae. In contrast, TriMV displayed 47 to 65% amino acid sequence identity with available sequences of mature proteins of Sugarcane streak mosaic virus (SCSMV), an unassigned member of the Potyviridae. Phylogenetic analyses of the complete polyprotein, NIa-Pro, NIb, and coat protein sequences of representative species of six genera and unassigned members of the family Potyviridae suggested that TriMV and SCSMV are sister taxa and share a most recent common ancestor with tritimoviruses or ipomoviruses. These results suggest that TriMV and SCSMV should be classified in a new genus, and we propose the genus Poacevirus in the family Potyviridae, with TriMV as the type member.
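The amino acid identity percentages quoted above come from pairwise comparisons of aligned protein sequences. As a rough illustration of what such a figure measures, the sketch below computes percent identity over the ungapped columns of an existing pairwise alignment. It is an assumption-laden simplification: the abstract does not say which alignment program or identity definition was used, other conventions divide by alignment length or by the shorter sequence, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Inputs must already be aligned to the same length. The denominator
    choice (ungapped columns) is one of several common conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up alignment: 5 ungapped columns, 4 identical residues.
print(percent_identity("MKT-AY", "MRTQAY"))
```

Because the result depends on the denominator convention and on the alignment itself, values from a sketch like this should not be expected to reproduce the published percentages exactly.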

