Rules of amino acid convergence: Not how many, but who in avian vocal learning clades

Author(s):  
Chul Lee ◽  
Seoae Cho ◽  
Kyu-Won Kim ◽  
DongAhn Yoo ◽  
Jae Yong Han ◽  
...  

Abstract Single amino acid variants (SAVs) may provide clues to understanding evolution of traits. A complex trait that has evolved convergently among species is vocal learning, the rare ability to imitate sounds heard and an important component of spoken language. Here we assessed whether convergent vocal learning bird species have convergent SAVs (CSAVs) that could be associated with their specialized trait. We analyzed avian genomes and identified CSAVs in vocal learners, but also in most species combinations tested. The number of CSAVs among species was proportional to the product of the most recent common ancestor (MRCA; origin) branch lengths of the species in question, and vocal learning birds did not exceed the overall proportion in most tests. However, genes with identical CSAVs (iCSAVs) in vocal learning species were uniquely enriched in ‘learning’ functions, and a subset of iCSAV genes were under positive selection and had enriched specialized regulation in vocal learning and their adjacent brain subdivisions. Several top candidate genes converge on the cAMP signaling pathway, including DRD1B and PRKAR2B. Our findings suggest a complex mechanism of amino acid convergences and specialized gene regulation upon which selection acts for specialized convergent traits.

2021 ◽  
Author(s):  
Claudio Casola ◽  
Jingjia Li

Abstract Background: The recurrent evolution of the C4 photosynthetic pathway in angiosperms represents one of the most extraordinary examples of convergent evolution of a complex trait. Comparative genomic analyses have unveiled some of the molecular changes associated with the C4 pathway. For instance, several key enzymes involved in the transition from C3 to C4 photosynthesis have been found to share convergent amino acid replacements along C4 lineages. However, the extent of convergent replacements potentially associated with the emergence of C4 plants remains to be fully assessed. Here, we introduced a robust empirical approach to test molecular convergence along a phylogeny including multiple C3 and C4 taxa. By analyzing proteins encoded by chloroplast genes, we tested if convergent replacements occurred more frequently than expected in C4 lineages compared to C3 lineages. Furthermore, we sought to determine if convergent evolution occurred in multiple chloroplast proteins besides the well-known case of the large RuBisCO subunit encoded by the chloroplast gene rbcL. Methods: Our study was based on the comparative analysis of 43 C4 and 21 C3 grass species belonging to the PACMAD clade, a focal taxonomic group in many investigations of C4 evolution. We first used protein sequences of 67 orthologous chloroplast genes to build an accurate phylogeny of these species. Then, we inferred amino acid replacements along 13 C4 lineages and 9 C3 lineages using reconstructed protein sequences of their ancestral branches, corresponding to the most recent common ancestor of each lineage. Pairwise comparisons between ancestral branches allowed us to identify both convergent and divergent amino acid replacements between C4-C4, C3-C3 and C3-C4 lineages. Results: The reconstructed phylogenetic tree of 64 PACMAD grasses was characterized by strong support at all nodes used for analyses of convergence.
We identified 217 convergent replacements and 201 divergent replacements in 45/67 chloroplast proteins in both C4 and C3 ancestral branches. Pairs of C4-C4 ancestral branches showed higher levels of convergent replacements than C3-C3 and C3-C4 pairs. Furthermore, we found that more proteins shared unique convergent replacements in C4 lineages, with both RbcL and RpoC1 (the RNA polymerase beta’ subunit 1) showing a significantly higher convergent/divergent replacements ratio in C4 branches. Notably, significantly more C4-C4 pairs of ancestral branches showed higher numbers of convergent vs. divergent replacements than C3-C3 and C3-C4 pairs. Our results demonstrated that, in the PACMAD clade, C4 grasses experienced higher levels of molecular convergence than C3 species across multiple chloroplast genes. These findings have important implications for both our understanding of the evolution of photosynthesis and the goal of engineering improved crop varieties that integrate components of the C4 pathway.
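The pairwise comparison of ancestral branches described above can be sketched as follows. The sketch assumes one common working definition that the abstract does not spell out: a site counts as convergent when both branches changed and ended in the same residue, and as divergent when both changed but ended in different residues. The type alias, the function name, and the toy states are hypothetical.

```python
from typing import List, Tuple

# Per-site (ancestral state, derived state) along one branch.
Branch = List[Tuple[str, str]]


def count_convergent_divergent(branch_x: Branch, branch_y: Branch) -> Tuple[int, int]:
    """Count sites replaced on both branches, split by whether the
    derived residues match (convergent) or differ (divergent)."""
    if len(branch_x) != len(branch_y):
        raise ValueError("branches must cover the same aligned sites")
    convergent = divergent = 0
    for (anc_x, der_x), (anc_y, der_y) in zip(branch_x, branch_y):
        if anc_x == der_x or anc_y == der_y:
            continue  # at least one branch has no replacement here
        if der_x == der_y:
            convergent += 1
        else:
            divergent += 1
    return convergent, divergent


# Hypothetical reconstructed states for two ancestral branches.
toy_x = [("A", "S"), ("L", "L"), ("G", "D"), ("K", "R")]
toy_y = [("A", "S"), ("L", "M"), ("G", "E"), ("T", "R")]
counts = count_convergent_divergent(toy_x, toy_y)
```

Here `counts` holds the convergent and divergent totals for one branch pair. Repeating the call over all C4-C4, C3-C3 and C3-C4 pairs would give the kind of ratios the abstract compares.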


2020 ◽  
Author(s):  
Babatunde Olarenwaju Motayo ◽  
Olukunle Oluwapamilerin Oluwasemowo ◽  
Paul Akiniyi Akinduti ◽  
Babatunde Adebiyi Olusola ◽  
Olumide T Aerege ◽  
...  

ABSTRACT The ongoing SARS-CoV-2 pandemic was introduced into Africa on 14th February 2020 and has rapidly spread across the continent, causing a severe public health crisis and mortality. We investigated the genetic diversity and evolution of this virus during the early outbreak months using whole genome sequences. We performed recombination analysis against closely related CoV, built a Bayesian time-scaled phylogeny, and investigated spike protein amino acid mutations. Results from our analysis showed recombination signals between the AfrSARSCoV-2 sequences and reference sequences within the N and S genes. The evolutionary rate of the AfrSARSCoV-2 was 4.133 × 10⁻⁴ substitutions/site/year (high posterior density, HPD: 4.132 × 10⁻⁴ to 4.134 × 10⁻⁴). The time to most recent common ancestor (TMRCA) of the African strains was December 7th 2019. The AfrSARSCoV-2 sequences diversified into two lineages, A and B, with B being more diverse and having multiple sub-lineages, as confirmed by both the maximum clade credibility (MCC) tree and PANGOLIN software. There was a high prevalence of the D614G spike protein amino acid mutation (82.61%) among the African strains. Our study has revealed a rapidly diversifying viral population with the G614 spike protein variant dominating. We advocate upscaling NGS sequencing platforms across Africa to enhance surveillance and aid control efforts against SARS-CoV-2 in Africa.


2012 ◽  
Vol 2012 ◽  
pp. 1-6 ◽  
Author(s):  
Austin L. Hughes

Phylogenetic analysis of heme peroxidases (HPXs) of Culicidae and other insects revealed six highly conserved ancient HPX lineages, each of which originated by gene duplication prior to the most recent common ancestor (MRCA) of Hemimetabola and Holometabola. In addition, culicid HPX7 and HPX12 arose by gene duplication after the MRCA of Culicidae and Drosophilidae, while HPX2 orthologs were not found in any other order analyzed except Diptera. Within Diptera, HPX2, HPX7, and HPX12 were relatively poorly conserved at the amino acid level in comparison to the six ancient lineages. The genome of Anopheles gambiae included genes encoding five proteins (HPX10, HPX11, HPX13, HPX14, and HPX15) without orthologs in other genomes analyzed. Overall, gene expression patterns did not seem to reflect phylogenetic relationships, but genes that evolved rapidly at the amino acid sequence level tended to have divergent expression patterns as well. The uniquely high level of duplication of HPXs in A. gambiae may have played a role in coevolution with malaria parasites.


Author(s):  
Hongru Wang ◽  
Lenore Pipes ◽  
Rasmus Nielsen

Abstract Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin-isolated strain, GD410721, in the receptor binding domain (RBD) of the spike protein, a pattern that can be caused by either recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of non-synonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, we estimate that the time to the most recent common ancestor of SARS-CoV-2 and RaTG13 or RmYN02 based on synonymous divergence, is 51.71 years (95% C.I., 28.11-75.31) and 37.02 years (95% C.I., 18.19-55.85), respectively.
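To show how a synonymous divergence translates into a divergence time, the sketch below applies the textbook strict-clock relation d = 2μT, where both lineages accumulate substitutions at rate μ since their common ancestor. Whether the authors used exactly this relation is an assumption, and the function name and input values are hypothetical; they are not the estimates from the study.

```python
def tmrca_years(synonymous_divergence: float, rate_per_site_per_year: float) -> float:
    """Time to the most recent common ancestor under a strict clock.

    Both lineages accumulate substitutions, so d = 2 * rate * T and
    therefore T = d / (2 * rate).
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    if synonymous_divergence < 0:
        raise ValueError("divergence cannot be negative")
    return synonymous_divergence / (2.0 * rate_per_site_per_year)


# Hypothetical inputs: 0.1 synonymous substitutions per synonymous site
# and a rate of 1e-3 per site per year give roughly 50 years.
example_years = tmrca_years(0.1, 1e-3)
```

The factor of two is the step most easily forgotten: dropping it doubles the estimated age. Uncertainty in the rate propagates directly into the interval around the estimate.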


2016 ◽  
Author(s):  
Tin Yau Pang

ABSTRACT A frequent event in the evolution of prokaryotic genomes is homologous recombination, where a foreign DNA stretch replaces a genomic region similar in sequence. Recombination can affect the relative position of two genomes in a phylogenetic reconstruction in two different ways: (i) one genome can recombine with a DNA stretch that is similar to the other genome, thereby reducing their pairwise sequence divergence; (ii) one genome can recombine with a DNA stretch from an outgroup genome, increasing the pairwise divergence. While several recombination-aware phylogenetic algorithms exist, many of these cannot account for both types of recombination; some algorithms can, but do so inefficiently. Moreover, many existing algorithms require that a substantial portion of each genome has not been affected by recombination, a sometimes unrealistic assumption. Here, we propose a novel coarse-graining approach for phylogenetic reconstruction (CGP), which is recombination-aware, applicable even if all genomic regions have experienced substantial amounts of recombination, and can be used on both nucleotide and amino acid sequences. CGP considers the local density of substitutions along pairwise genome alignments, fitting a model to the empirical distribution of substitution density to infer the pairwise coalescent time. Given all pairwise coalescent times, CGP reconstructs an ultrametric tree representing vertical inheritance. Based on simulations, we show that the proposed approach can reconstruct ultrametric trees with accurate topology, branch lengths, and root positioning. Applied to a set of E. coli strains, the reconstructed trees are most consistent with gene distributions when inferred from amino acid sequences, a data type that cannot be utilized by many alternative approaches.
AUTHOR SUMMARY In homologous recombination, segments of foreign DNA overwrite similar segments of a prokaryotic genome.
A single recombination event can simultaneously introduce many DNA substitutions. This disturbs phylogenetic signals, making it difficult to reconstruct prokaryotic family trees. While a handful of recombination-aware phylogenetic algorithms have been proposed, most do not take all effects of recombination into account; others rely on the frequently unrealistic assumption that a substantial part of a genome has not been affected by recombination at all. Here, we introduce a novel approach to phylogenetic reconstruction, which estimates the age of the most recent common ancestor of two strains from the density distribution of DNA or amino acid substitutions between their genomes. The proposed phylogenetic tree is the tree most compatible with these age estimates. Based on nucleotide or amino acid sequences, our approach accurately predicts the topology, branch lengths, and root positioning of prokaryotic family trees.
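The step from pairwise age estimates to an ultrametric tree can be illustrated with average-linkage (UPGMA-style) clustering. This is a swapped-in standard technique chosen for illustration: the abstract does not say how CGP builds its tree, so nothing here should be read as the CGP algorithm. The function name, the strain labels, and the toy times are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]


def average_linkage_tree(
    names: List[str], times: Dict[FrozenSet[str], float]
) -> List[Tuple[Cluster, Cluster, float]]:
    """Merge the closest clusters repeatedly; each merge is recorded as
    (left members, right members, node height = mean pairwise time)."""
    clusters: List[Cluster] = [frozenset([name]) for name in names]

    def mean_time(c1: Cluster, c2: Cluster) -> float:
        total = sum(times[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=lambda pair: mean_time(*pair))
        merges.append((left, right, mean_time(left, right)))
        clusters = [c for c in clusters if c not in (left, right)] + [left | right]
    return merges


# Hypothetical pairwise coalescent times for three strains.
toy_times = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 3.0,
    frozenset(("B", "C")): 3.0,
}
merges = average_linkage_tree(["A", "B", "C"], toy_times)
```

With the toy input, strains A and B join first at height 1.0 and then join C at height 3.0. Because every leaf sits at height zero and node heights come straight from the times, the result is ultrametric by construction.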


2015 ◽  
Vol 282 (1812) ◽  
pp. 20151105 ◽  
Author(s):  
Austin L. Hughes

Avian genomes typically encode three distinct vitellogenin (VTG) egg yolk proteins (VTG1, VTG2 and VTG3), which arose by gene duplication prior to the most recent common ancestor of birds. Analysis of VTG sequences from 34 avian species in a phylogenetic framework supported the hypothesis that VTG amino acid composition has co-evolved with embryo incubation time. Embryo incubation time was positively correlated with the proportions of dietary essential amino acids (EAAs) in VTG1 and VTG2, and with the proportion of sulfur-containing amino acids in VTG3. These patterns were seen even when only semi-altricial and/or altricial species were considered, suggesting that the duration of embryo incubation is a major selective factor on the amino acid composition of VTGs, rather than developmental mode alone. The results are consistent with the hypothesis that the level of EAAs provided to the egg represents an adaptation to the loss of amino acids through breakdown over the course of incubation and imply that life-history phenotypes and VTG amino acid composition have co-evolved throughout the evolutionary history of birds.


2009 ◽  
Vol 99 (8) ◽  
pp. 943-950 ◽  
Author(s):  
Satyanarayana Tatineni ◽  
Amy D. Ziems ◽  
Stephen N. Wegulo ◽  
Roy French

The complete genome sequence of Triticum mosaic virus (TriMV), a member in the family Potyviridae, has been determined to be 10,266 nucleotides (nt) excluding the 3′ polyadenylated tail. The genome encodes a large polyprotein of 3,112 amino acids with the “hall-mark proteins” of potyviruses, including a small overlapping gene, PIPO, in the P3 cistron. The genome of TriMV has an unusually long 5′ nontranslated region of 739 nt with 12 translation initiation codons and three small open reading frames, which resemble those of the internal ribosome entry site containing 5′ leader sequences of the members of Picornaviridae. Pairwise comparison of 10 putative mature proteins of TriMV with those of representative members of genera in the family Potyviridae revealed 33 to 44% amino acid identity within the highly conserved NIb protein sequence and 15 to 29% amino acid identity within the least conserved P1 protein, suggesting that TriMV is a distinct member in the family Potyviridae. In contrast, TriMV displayed 47 to 65% amino acid sequence identity with available sequences of mature proteins of Sugarcane streak mosaic virus (SCSMV), an unassigned member of the Potyviridae. Phylogenetic analyses of the complete polyprotein, NIa-Pro, NIb, and coat protein sequences of representative species of six genera and unassigned members of the family Potyviridae suggested that TriMV and SCSMV are sister taxa and share a most recent common ancestor with tritimoviruses or ipomoviruses. These results suggest that TriMV and SCSMV should be classified in a new genus, and we propose the genus Poacevirus in the family Potyviridae, with TriMV as the type member.


Viruses ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1525
Author(s):  
Safia Zeghbib ◽  
Balázs A. Somogyi ◽  
Brigitta Zana ◽  
Gábor Kemenesi ◽  
Róbert Herczeg ◽  
...  

To explore the SARS-CoV-2 pandemic in Algeria, a dataset comprising ninety-five genomes originating from SARS-CoV-2 sampled from Algeria and other countries worldwide, from 24 December 2019 through 4 March 2021, was thoroughly examined. In a multi-component analysis of the Algerian outbreak, we applied a toolkit of phylogenetic, phylogeographic, haplotype, and genomic analyses. We estimated the Time to the Most Recent Common Ancestor (TMRCA) in reference to the Algerian pandemic and highlighted the multiple introductions of the disease and the missing data depicted in the transmission loop. In addition, we emphasized the significant role played by local and international travel in disease dissemination. Most importantly, we unveiled mutational patterns, the effect of unique mutations on corresponding proteins, and the relatedness of the Algerian sequences to other sequences worldwide. Our results revealed individual amino-acid replacements such as the deleterious replacement A23T in the orf3a gene in Algeria_EPI_ISL_418241. Additionally, a connection between Algeria_EPI_ISL_420037 and sequences originating from the USA was observed through a USA characteristic amino-acid replacement T1004I in the nsp3 gene, found in the aforementioned Algerian sequence. Similarly, successful tracing could be established, such as Algeria/G37318-8849/2020|EPI_ISL_766863, which was imported from Saudi Arabia during the pilgrimage. Lastly, we assessed the Algerian mitigation measures regarding disease containment using statistical analyses.


2016 ◽  
Vol 113 (29) ◽  
pp. 8002-8009 ◽  
Author(s):  
Rohan S. Mehta ◽  
David Bryant ◽  
Noah A. Rosenberg

Monophyletic groups—groups that consist of all of the descendants of a most recent common ancestor—arise naturally as a consequence of descent processes that result in meaningful distinctions between organisms. Aspects of monophyly are therefore central to fields that examine and use genealogical descent. In particular, studies in conservation genetics, phylogeography, population genetics, species delimitation, and systematics can all make use of mathematical predictions under evolutionary models about features of monophyly. One important calculation, the probability that a set of gene lineages is monophyletic under a two-species neutral coalescent model, has been used in many studies. Here, we extend this calculation for a species tree model that contains arbitrarily many species. We study the effects of species tree topology and branch lengths on the monophyly probability. These analyses reveal new behavior, including the maintenance of nontrivial monophyly probabilities for gene lineage samples that span multiple species and even for lineages that do not derive from a monophyletic species group. We illustrate the mathematical results using an example application to data from maize and teosinte.


2020 ◽  
Author(s):  
Hongru Wang ◽  
Lenore Pipes ◽  
Rasmus Nielsen

Abstract Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin-isolated strain, GD410721, in the receptor-binding domain (RBD) of the spike protein, a pattern that can be caused by either recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of nonsynonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, we estimate that the time to the most recent common ancestor of SARS-CoV-2 and RaTG13 or RmYN02 based on synonymous divergence is 51.71 years (95% CI, 28.11–75.31) and 37.02 years (95% CI, 18.19–55.85), respectively.

