ancestral sequences

2022 ◽  
Author(s):  
Yongtao Ye ◽  
Marcus Shum ◽  
Joseph Tsui ◽  
Guangchuang Yu ◽  
David Smith ◽  
...  

Massive sequencing of SARS-CoV-2 genomes has created a strong demand for adding new samples to a reference phylogeny instead of rebuilding the tree from scratch. To address this challenge, we propose an algorithm, TIPars, that integrates parsimony analysis with pre-computed ancestral sequences. Compared with four state-of-the-art methods on four benchmark datasets (SARS-CoV-2, influenza virus, Newcastle disease virus and 16S rRNA genes), TIPars achieved the best performance in most tests. It took only 21 seconds to insert 100 SARS-CoV-2 genomes into a 100k-taxa reference tree, using nearly 1.4 gigabytes of memory. Its efficient and accurate phylogenetic placement and incrementation for phylogenies with both highly similar and divergent sequences suggest that it will be useful in a wide range of studies, including pathogen molecular epidemiology, microbiome diversity and systematics.
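The general idea of parsimony-based placement against pre-computed ancestral sequences can be illustrated with a minimal sketch. This is not the TIPars implementation: the function names, the toy `branches` dictionary and the scoring rule are assumptions made for illustration. The sketch assumes aligned, equal-length sequences and charges one substitution at every site where the query matches neither end of a branch.

```python
def insertion_cost(parent_seq, child_seq, query_seq):
    """Extra substitutions needed to attach query_seq on the branch parent -> child.

    A site is free if the query matches either end of the branch, because the
    new internal node can take that state; otherwise it costs one substitution.
    """
    return sum(
        1
        for p, c, q in zip(parent_seq, child_seq, query_seq)
        if q != p and q != c
    )


def place_query(branches, query_seq):
    """Return (branch_name, cost) for the branch with the lowest insertion cost."""
    best_name, best_cost = None, None
    for name, (parent_seq, child_seq) in branches.items():
        cost = insertion_cost(parent_seq, child_seq, query_seq)
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


# Hypothetical reference tree: each branch stores the sequences at its two ends
# (the parent end would be a pre-computed ancestral sequence).
branches = {
    "root->A": ("ACGTACGT", "ACGTACGA"),
    "root->B": ("ACGTACGT", "TCGTTCGT"),
}
print(place_query(branches, "TCGTACGT"))
```

Because the ancestral sequences are computed once for the reference tree, each new query only needs a linear scan over branches, which is one plausible reason such an approach scales to large trees.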


2021 ◽  
Author(s):  
Chris Papadopoulos ◽  
Isabelle Callebaut ◽  
Jean-Christophe Gelly ◽  
Isabelle Hatin ◽  
Olivier Namy ◽  
...  

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences’ properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic open reading frames (ORFs) of S. cerevisiae with the aim of (1) exploring whether the structural states’ diversity of proteomes is already present in noncoding sequences, and (2) estimating the potential of the noncoding genome to produce novel protein bricks that could either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural state diversity of canonical proteins, with the majority predicted as foldable. Then, we investigated the early stages of de novo gene birth by reconstructing the ancestral sequences of 70 yeast de novo genes and characterized the sequence and structural properties of intergenic ORFs with a strong translation signal. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and one of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.


2021 ◽  
Vol 18 (184) ◽  
Author(s):  
Patrick C. F. Buchholz ◽  
Bert van Loo ◽  
Bernard D. G. Eenink ◽  
Erich Bornberg-Bauer ◽  
Jürgen Pleiss

Evolutionary relationships of protein families can be characterized either by networks or by trees. Whereas trees allow for hierarchical grouping and reconstruction of the most likely ancestral sequences, networks lack a time axis but allow for thresholds of pairwise sequence identity to be chosen and, therefore, the clustering of family members with presumably more similar functions. Here, we use the large family of arylsulfatases and phosphonate monoester hydrolases to investigate similarities, strengths and weaknesses in tree and network representations. For varying thresholds of pairwise sequence identity, values of betweenness centrality and clustering coefficients were derived for nodes of the reconstructed ancestors to measure the propensity to act as a bridge in a network. Based on these properties, ancestral protein sequences emerge as bridges in protein sequence networks. Interestingly, many ancestral protein sequences appear close to extant sequences. Therefore, reconstructed ancestor sequences might also be interpreted as yet-to-be-identified homologues. The concept of ancestor reconstruction is compared to consensus sequences, too. It was found that hub sequences in a network, e.g. reconstructed ancestral sequences that are connected to many neighbouring sequences, share closer similarity with derived consensus sequences. Therefore, some reconstructed ancestor sequences can also be interpreted as consensus sequences.
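A small sketch can make the network construction concrete. It assumes aligned, equal-length sequences and uses hypothetical names (`identity`, `build_network`, `clustering_coefficient`) and toy data; it computes only the clustering coefficient, since betweenness centrality needs a shortest-path algorithm that is omitted here for brevity.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs, threshold):
    """Connect two sequences when their pairwise identity is at least threshold."""
    graph = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if identity(seqs[u], seqs[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph, node):
    """Share of a node's neighbour pairs that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


# Toy data: "anc" stands in for a reconstructed ancestor, x1 and x2 for extant sequences.
seqs = {
    "anc": "AAAAAAAAAA",
    "x1": "AAAAAAAACC",
    "x2": "CCAAAAAAAA",
}
graph = build_network(seqs, threshold=0.7)
print(graph["anc"], clustering_coefficient(graph, "anc"))
```

In this toy case the ancestor is linked to both extant sequences at a 0.7 threshold while they are not linked to each other, so its clustering coefficient is zero, the kind of pattern one would expect from a bridging node.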


2021 ◽  
Vol 17 (10) ◽  
pp. e1009535
Author(s):  
Antonina Kalkus ◽  
Joy Barrett ◽  
Theyjasvi Ashok ◽  
Brian R. Morton

The codon usage of the Angiosperm psbA gene is atypical for flowering plant chloroplast genes but similar to the codon usage observed in highly expressed plastid genes from some other Plantae, particularly Chlorobionta, lineages. The pattern of codon bias in these genes is suggestive of selection for a set of translationally optimal codons, but the degree of bias towards these optimal codons is much weaker in the flowering plant psbA gene than in high expression plastid genes from lineages such as certain green algal groups. Two scenarios have been proposed to explain these observations. One is that the flowering plant psbA gene is currently under weak selective constraints for translation efficiency; the other is that there are no current selective constraints and we are observing the remnants of an ancestral codon adaptation that is decaying under mutational pressure. We test these two models using simulation studies that incorporate the context-dependent mutational properties of plant chloroplast DNA. We first reconstruct ancestral sequences and then simulate their evolution in the absence of selection on codon usage by using mutation dynamics estimated from intergenic regions. The results show that psbA has a significantly higher level of codon adaptation than expected while other chloroplast genes are within the range predicted by the simulations. These results suggest that there have been selective constraints on the codon usage of the flowering plant psbA gene during Angiosperm evolution.
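The logic of testing an observed level of codon adaptation against a no-selection null can be sketched as a simple Monte Carlo procedure. This is a strongly simplified illustration, not the authors' simulation: it assumes a uniform per-site mutation rate rather than context-dependent dynamics, it ignores constraints on the encoded amino acids, and the names `fraction_optimal`, `mutate`, `empirical_p` and the set of "optimal" codons are hypothetical.

```python
import random


def fraction_optimal(seq, optimal):
    """Fraction of codons in seq that belong to the set of optimal codons."""
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return sum(c in optimal for c in codons) / len(codons)


def mutate(seq, rate, rng):
    """Mutate each site independently with probability rate (uniform model)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def empirical_p(observed, ancestor, optimal, rate, n, seed=0):
    """Share of neutral simulations whose codon adaptation is >= the observed value."""
    rng = random.Random(seed)
    sims = [fraction_optimal(mutate(ancestor, rate, rng), optimal) for _ in range(n)]
    return sum(s >= observed for s in sims) / n


# Hypothetical inputs: a reconstructed ancestral sequence and an assumed optimal codon.
ancestor = "GCTGCTGCC"
optimal = {"GCT"}
observed = fraction_optimal("GCTGCTGCT", optimal)
print(empirical_p(observed, ancestor, optimal, rate=0.05, n=1000))
```

A small value indicates that the extant gene retains more optimal codons than mutation alone would be expected to leave, which is the pattern the abstract reports for psbA.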


2021 ◽  
Author(s):  
Kristen J. Wade ◽  
Samantha Tisa ◽  
Chloe Barrington ◽  
Kristy R. Crooks ◽  
Chris R. Gignoux ◽  
...  

Since the initial reported discovery of SARS-CoV-2 in late 2019, genomic surveillance has been an important tool to understand its transmission and evolution. Here, we describe a case study of genomic sequencing of Colorado SARS-CoV-2 samples collected August through November 2020 at the University of Colorado Anschutz Medical campus in Aurora and the United States Air Force Academy in Colorado Springs. We obtained nearly complete sequences for 44 genomes, inferred ancestral sequences shared among these local samples, and used NextStrain variant and clade frequency monitoring in North America to place the Colorado sequences into their continental context. Furthermore, we describe genomic monitoring of a lineage that likely originated in the local Colorado Springs community and expanded rapidly over the course of two months in an outbreak within the well-controlled environment of the United States Air Force Academy. This variant contained a number of amino acid-altering mutations that may have contributed to its spread, but it appears to have been controlled using extensive contact tracing and strict quarantine protocols. The genome sequencing allowed validation of the transmission pathways inferred by the United States Air Force Academy and provides a window into the evolutionary process and transmission dynamics of a potentially dangerous but ultimately contained variant.

Significance: SARS-CoV-2 spreads and mutates, negatively impacting containment. In this study, we use long-read sequencing to generate 44 SARS-CoV-2 genomes from COVID-19 patients associated with a rapid-spreading event on the USAFA campus, as well as a neighboring community for reference. We reconstruct the genomic and evolutionary signatures of the rapid-spreading event, and pinpoint novel, protein-altering mutations that may have impacted viral fitness. These insights into viral evolutionary dynamics, in the context of contact tracing and a rigorous containment program, help to inform response efforts in the future.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Andreas Lange ◽  
Prajal H. Patel ◽  
Brennen Heames ◽  
Adam M. Damry ◽  
Thorsten Saenger ◽  
...  

Comparative genomic studies have repeatedly shown that new protein-coding genes can emerge de novo from noncoding DNA. Still unknown is how and when the structures of encoded de novo proteins emerge and evolve. Combining biochemical, genetic and evolutionary analyses, we elucidate the function and structure of goddard, a gene which appears to have evolved de novo at least 50 million years ago within the Drosophila genus. Previous studies found that goddard is required for male fertility. Here, we show that Goddard protein localizes to elongating sperm axonemes and that in its absence, elongated spermatids fail to undergo individualization. Combining modelling, NMR and circular dichroism (CD) data, we show that Goddard protein contains a large central α-helix, but is otherwise partially disordered. We find similar results for Goddard’s orthologs from divergent fly species and their reconstructed ancestral sequences. Accordingly, Goddard’s structure appears to have been maintained with only minor changes over millions of years.


2021 ◽  
Author(s):  
Deivid Almeida de Jesus ◽  
Darlisson Mesquista Batista ◽  
Shayla Salzman ◽  
Lucas Miguel Carvalho ◽  
Kaue Santana ◽  
...  

Regulation of flowering is a crucial event in the evolutionary history of angiosperms. The production of flowers is regulated through the integration of different environmental and endogenous stimuli, many of which involve the activation of different genes in a hierarchical and complex signaling network. The FLOWERING LOCUS T/TERMINAL FLOWER 1 (FT/TFL1) gene family is known to regulate important aspects of flowering in plants. To better understand the pivotal events that changed FT and TFL1 functions during the evolution of angiosperms, we reconstructed the ancestral sequences of FT/TFL1-like genes and predicted protein structures to identify determinant sites that evolved in both proteins and allowed the adaptive diversification in flowering phenology and developmental processes. Residues from the P-loop domain of the analyzed FT structures showed predominantly highly destabilizing mutations, which is consistent with the constant selective pressure found for this region. In addition, we demonstrate that residues at the phosphatidylcholine binding sites of the FT structure, where destabilizing mutations occur, experience positive selection, and that some residues of the 4th exon are under negative selection; this is compensated by stabilizing mutations in key regions and the P-loop that maintain overall protein stability. Our results shed light on the evolutionary history of key genes involved in the diversification of angiosperms.


2021 ◽  
Author(s):  
Andreas Lange ◽  
Prajal H. Patel ◽  
Brennen Heames ◽  
Adam M. Damry ◽  
Thorsten Saenger ◽  
...  

Comparative genomic studies have repeatedly shown that new protein-coding genes can emerge de novo from non-coding DNA. Still unknown is how and when the structures of encoded de novo proteins emerge and evolve. Combining biochemical, genetic and evolutionary analyses, we elucidate the function and structure of goddard, a gene which appears to have evolved de novo at least 50 million years ago within the Drosophila genus. Previous studies found that goddard is required for male fertility. Here, we show that Goddard protein localizes to elongating sperm axonemes and that in its absence, elongated spermatids fail to undergo individualization. Combining modelling, NMR and CD data, we show that Goddard protein contains a large central α-helix, but is otherwise partially disordered. We find similar results for Goddard’s orthologs from divergent fly species and their reconstructed ancestral sequences. Accordingly, Goddard’s structure appears to have been maintained with only minor changes over millions of years.


2021 ◽  
Author(s):  
Lenore Pipes ◽  
Rasmus Nielsen

Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on optimizing high speed clustering of highly similar sequences. We develop a phylogenetic clustering method which infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.
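The greedy step can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: each initial cluster is represented by a single sequence standing in for its inferred ancestor, plain Hamming distance on aligned sequences replaces any phylogenetic distance, and the names `hamming`, `greedy_cluster` and `max_distance` are hypothetical.

```python
def hamming(a, b):
    """Number of differing positions between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(representatives, sequences, max_distance):
    """Assign each sequence to its closest representative, or open a new cluster.

    representatives maps cluster name -> representative (ancestral) sequence.
    A sequence farther than max_distance from every representative becomes
    the representative of a new cluster.
    """
    reps = dict(representatives)
    clusters = {name: [] for name in reps}
    for name, seq in sequences.items():
        best = min(reps, key=lambda r: hamming(reps[r], seq), default=None)
        if best is not None and hamming(reps[best], seq) <= max_distance:
            clusters[best].append(name)
        else:
            reps[name] = seq
            clusters[name] = [name]
    return clusters


# Toy data: two initial clusters with assumed ancestral sequences.
representatives = {"anc1": "AAAAAA", "anc2": "CCCCCC"}
sequences = {"s1": "AAAAAT", "s2": "CCCCCA", "s3": "GGGGGG"}
print(greedy_cluster(representatives, sequences, max_distance=2))
```

Comparing each sequence to a handful of cluster representatives rather than to every other sequence keeps the work proportional to the number of clusters, which is the usual appeal of greedy clustering on large databases.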


Author(s):  
Gonzalo Fernández Balaguer ◽  
Carmen del Águila ◽  
Carolina Hurtado Marcos ◽  
Rubén Agudo Torres

β-Lactamases are bacterial proteins that hydrolyze β-lactam antibiotics, conferring microbial resistance to them. They form a heterogeneous protein family of considerable health relevance, because their high capacity for evolution allows them to readily acquire resistance to new drugs. In vitro evolution of these proteins has served not only to characterize and better understand them, but also as a line of research for predictively identifying residues involved in the acquisition of antibiotic resistance. At the same time, ancestral protein reconstruction has emerged as a novel and useful tool for understanding the evolution of β-lactamases and some of their characteristics, such as their promiscuity. In this work, ancestral β-lactamases reconstructed from the phylogeny of extant class A β-lactamases were studied. Of the four ancestral proteins studied, one was functional, and its hydrolytic activity was compared with that of four of its extant counterparts against eight β-lactam drugs. This ancestral protein showed a more generalist antibiotic activity than any of the extant proteins studied. In addition, the active ancestral protein showed more resistance to one of the drugs used than the extant β-lactamases did. Finally, these results are discussed, and on their basis it is argued why reconstructed ancestral sequences can be a very attractive starting point for the directed evolution of proteins of biotechnological interest.

