A coarse-graining, ultrametric approach to resolve the phylogeny of prokaryotic strains with frequent homologous recombination

2016 ◽  
Author(s):  
Tin Yau Pang

ABSTRACT A frequent event in the evolution of prokaryotic genomes is homologous recombination, where a foreign DNA stretch replaces a genomic region similar in sequence. Recombination can affect the relative position of two genomes in a phylogenetic reconstruction in two different ways: (i) one genome can recombine with a DNA stretch that is similar to the other genome, thereby reducing their pairwise sequence divergence; (ii) one genome can recombine with a DNA stretch from an outgroup genome, increasing the pairwise divergence. While several recombination-aware phylogenetic algorithms exist, many of these cannot account for both types of recombination; some algorithms can, but do so inefficiently. Moreover, many existing algorithms require that a substantial portion of each genome has not been affected by recombination, a sometimes unrealistic assumption. Here, we propose a novel coarse-graining approach for phylogenetic reconstruction (CGP), which is recombination-aware, applicable even if all genomic regions have experienced substantial amounts of recombination, and can be used on both nucleotide and amino acid sequences. CGP considers the local density of substitutions along pairwise genome alignments, fitting a model to the empirical distribution of substitution density to infer the pairwise coalescent time. Given all pairwise coalescent times, CGP reconstructs an ultrametric tree representing vertical inheritance. Based on simulations, we show that the proposed approach can reconstruct ultrametric trees with accurate topology, branch lengths, and root positioning. Applied to a set of E. coli strains, the reconstructed trees are most consistent with gene distributions when inferred from amino acid sequences, a data type that cannot be utilized by many alternative approaches.

AUTHOR SUMMARY In homologous recombination, segments of foreign DNA overwrite similar segments of a prokaryotic genome. A single recombination event can simultaneously introduce many DNA substitutions. This disturbs phylogenetic signals, making it difficult to reconstruct prokaryotic family trees. While a handful of recombination-aware phylogenetic algorithms have been proposed, most do not take all effects of recombination into account; others rely on the frequently unrealistic assumption that a substantial part of a genome has not been affected by recombination at all. Here, we introduce a novel approach to phylogenetic reconstruction, which estimates the age of the most recent common ancestor of two strains from the density distribution of DNA or amino acid substitutions between their genomes. The proposed phylogenetic tree is the tree most compatible with these age estimates. Based on nucleotide or amino acid sequences, our approach accurately predicts the topology, branch lengths, and root positioning of prokaryotic family trees.
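
The step from pairwise coalescent times to an ultrametric tree can be illustrated with a small sketch. The abstract does not say how CGP builds its tree, so the code below swaps in plain average-linkage clustering (UPGMA) as a stand-in; the function name `upgma`, the input layout, and the toy values are all assumptions made for illustration. Each internal node is placed at the height given by the (averaged) coalescent time of the clusters it joins, so every leaf ends up at the same distance from the root.

```python
from itertools import combinations


def upgma(names, times):
    """Average-linkage clustering of pairwise coalescent times.

    names: list of strain labels (hypothetical input format)
    times: dict mapping frozenset({a, b}) -> estimated age of the
           most recent common ancestor of strains a and b
    Returns (newick_string, root_height).
    """
    # cluster id -> (newick text, node height, number of leaves)
    clusters = {i: (name, 0.0, 1) for i, name in enumerate(names)}
    dist = {
        frozenset((i, j)): times[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two clusters with the youngest common ancestor.
        pair = min(dist, key=dist.get)
        i, j = sorted(pair)
        height = dist[pair]
        (ni, hi, si), (nj, hj, sj) = clusters[i], clusters[j]
        newick = f"({ni}:{height - hi:.3f},{nj}:{height - hj:.3f})"
        # Size-weighted average of the times to every remaining cluster.
        for k in clusters:
            if k in (i, j):
                continue
            dik = dist.pop(frozenset((i, k)))
            djk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (si * dik + sj * djk) / (si + sj)
        del dist[pair]
        del clusters[i], clusters[j]
        clusters[next_id] = (newick, height, si + sj)
        next_id += 1
    ((newick, height, _),) = clusters.values()
    return newick + ";", height


if __name__ == "__main__":
    # Toy coalescent times, invented for this example.
    toy = {
        frozenset(("A", "B")): 1.0,
        frozenset(("A", "C")): 3.0,
        frozenset(("B", "C")): 3.0,
    }
    print(upgma(["A", "B", "C"], toy))
```

Because the inputs are already ages of common ancestors rather than sequence distances, node heights are taken directly from the times instead of being halved as in textbook UPGMA.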

2020 ◽  
Author(s):  
Tin Yau Pang

Abstract Background A frequent event in the evolution of prokaryotic genomes is homologous recombination, where a foreign DNA stretch replaces a genomic region similar in sequence. Recombination can affect the relative position of two genomes in a phylogenetic reconstruction in two different ways: (i) one genome can recombine with a DNA stretch that is similar to the other genome, thereby reducing their pairwise sequence divergence; (ii) one genome can recombine with a DNA stretch from an outgroup genome, increasing the pairwise divergence. While several recombination-aware phylogenetic algorithms exist, many of these cannot account for both types of recombination; some algorithms can, but do so inefficiently. Moreover, many of them reconstruct the ancestral recombination graph (ARG) to help infer the genome tree, and require that a substantial portion of each genome has not been affected by recombination, a sometimes unrealistic assumption. Results Here, we propose a coarse-graining approach for phylogenetic reconstruction (CGP), which is recombination-aware but forgoes ARG reconstruction. It accounts for the tendency of a higher effective recombination rate between genomes with a lower phylogenetic distance. It is applicable even if all genomic regions have experienced substantial amounts of recombination, and can be used on both nucleotide and amino acid sequences. CGP considers the local density of substitutions along pairwise genome alignments, fitting a model to the empirical distribution of substitution density to infer the pairwise coalescent time. Given all pairwise coalescent times, CGP reconstructs an ultrametric tree representing vertical inheritance. Based on simulations, we show that the proposed approach can reconstruct ultrametric trees with accurate topology, branch lengths, and root positioning. Applied to a set of E. coli strains, the reconstructed trees are most consistent with gene distributions when inferred from amino acid sequences, a data type that cannot be utilized by many alternative approaches. Conclusions The CGP algorithm is more accurate than alternative recombination-aware methods for ultrametric phylogenetic reconstructions.
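
The "local density of substitutions" that CGP works from can be pictured as substitution counts in windows along a pairwise alignment. The sketch below only computes that empirical distribution; the model that CGP fits to it is not described in the abstract and is not attempted here. The function names, the non-overlapping windows, and the window size are assumptions chosen for illustration.

```python
from collections import Counter


def substitution_density(seq_a, seq_b, window=100):
    """Count substitutions in consecutive non-overlapping windows.

    seq_a, seq_b: two aligned sequences of equal length
    window: window size in alignment columns (an assumed default)
    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = []
    for start in range(0, len(seq_a) - window + 1, window):
        wa = seq_a[start:start + window]
        wb = seq_b[start:start + window]
        counts.append(
            sum(x != y for x, y in zip(wa, wb) if x != "-" and y != "-")
        )
    return counts


def empirical_distribution(counts):
    """Fraction of windows observed at each substitution count."""
    n = len(counts)
    return {k: v / n for k, v in sorted(Counter(counts).items())}


if __name__ == "__main__":
    # Toy alignment, invented for this example.
    a = "AAAAAAAAAA"
    b = "AAAACAAAAA"
    print(empirical_distribution(substitution_density(a, b, window=5)))
```

A region recently overwritten by recombination would be expected to show up as windows whose counts depart from those of the rest of the alignment, which is the kind of signal a fitted model could separate.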


2020 ◽  
Author(s):  
Tin Yau Pang

Abstract Background A frequent event in the evolution of prokaryotic genomes is homologous recombination, where a foreign DNA stretch replaces a genomic region similar in sequence. Recombination can affect the relative position of two genomes in a phylogenetic reconstruction in two different ways: (i) one genome can recombine with a DNA stretch that is similar to the other genome, thereby reducing their pairwise sequence divergence; (ii) one genome can recombine with a DNA stretch from an outgroup genome, increasing the pairwise divergence. While several recombination-aware phylogenetic algorithms exist, many of these cannot account for both types of recombination; some algorithms can, but do so inefficiently. Moreover, many of them reconstruct the ancestral recombination graph (ARG) to help infer the genome tree, and require that a substantial portion of each genome has not been affected by recombination, a sometimes unrealistic assumption. Methods Here, we propose a coarse-graining approach for phylogenetic reconstruction (CGP), which is recombination-aware but forgoes ARG reconstruction. It accounts for the tendency of a higher effective recombination rate between genomes with a lower phylogenetic distance. It is applicable even if all genomic regions have experienced substantial amounts of recombination, and can be used on both nucleotide and amino acid sequences. CGP considers the local density of substitutions along pairwise genome alignments, fitting a model to the empirical distribution of substitution density to infer the pairwise coalescent time. Given all pairwise coalescent times, CGP reconstructs an ultrametric tree representing vertical inheritance. Results Based on simulations, we show that the proposed approach can reconstruct ultrametric trees with accurate topology, branch lengths, and root positioning. Applied to a set of E. coli strains, the reconstructed trees are most consistent with gene distributions when inferred from amino acid sequences, a data type that cannot be utilized by many alternative approaches. Conclusions The CGP algorithm is more accurate than alternative recombination-aware methods for ultrametric phylogenetic reconstructions.


2019 ◽  
Author(s):  
Tin Yau Pang

Abstract Background: A frequent event in the evolution of prokaryotic genomes is homologous recombination, where a foreign DNA stretch replaces a genomic region similar in sequence. Recombination can affect the relative position of two genomes in a phylogenetic reconstruction in two different ways: (i) one genome can recombine with a DNA stretch that is similar to the other genome, thereby reducing their pairwise sequence divergence; (ii) one genome can recombine with a DNA stretch from an outgroup genome, increasing the pairwise divergence. While several recombination-aware phylogenetic algorithms exist, many of these cannot account for both types of recombination; some algorithms can, but do so inefficiently. Moreover, many of them reconstruct the ancestral recombination graph (ARG) to help infer the genome tree, and require that a substantial portion of each genome has not been affected by recombination, a sometimes unrealistic assumption. Results: Here, we propose a coarse-graining approach for phylogenetic reconstruction (CGP), which is recombination-aware but forgoes ARG reconstruction, is applicable even if all genomic regions have experienced substantial amounts of recombination, and can be used on both nucleotide and amino acid sequences. CGP considers the local density of substitutions along pairwise genome alignments, fitting a model to the empirical distribution of substitution density to infer the pairwise coalescent time. Given all pairwise coalescent times, CGP reconstructs an ultrametric tree representing vertical inheritance. Based on simulations, we show that the proposed approach can reconstruct ultrametric trees with accurate topology, branch lengths, and root positioning. Applied to a set of E. coli strains, the reconstructed trees are most consistent with gene distributions when inferred from amino acid sequences, a data type that cannot be utilized by many alternative approaches. Conclusions: The CGP algorithm is more accurate than alternative recombination-aware methods for ultrametric phylogenetic reconstructions.


2018 ◽  
Vol 44 (1) ◽  
pp. 20
Author(s):  
Eloiza Teles Caldart ◽  
Helena Mata ◽  
Cláudio Wageck Canal ◽  
Ana Paula Ravazzolo

Background: Phylogenetic analyses are an essential part of the exploratory assessment of nucleic acid and amino acid sequences. Particularly in virology, they are able to delineate the evolution and epidemiology of disease etiologic agents and/or the evolutionary path of their hosts. The objective of this review is to help researchers who want to use phylogenetic analyses as a tool in virology and molecular epidemiology studies, presenting the most commonly used methodologies, describing the importance of the different techniques, their peculiar vocabulary and some examples of their use in virology. Review: This article starts by presenting basic concepts of molecular epidemiology and molecular evolution, emphasizing their relevance in the context of viral infectious diseases. It presents a section on the vocabulary relevant to the subject, bringing readers to the minimum level of knowledge needed throughout this literature review. Within its main subject, the text explains what a molecular phylogenetic analysis is, starting from a multiple alignment of nucleotide or amino acid sequences. The different software used to perform multiple alignments may apply different algorithms. To build a phylogeny based on amino acid or nucleotide sequences it is necessary to produce a data matrix based on a model for nucleotide or amino acid replacement, also called an evolutionary model. There are a number of evolutionary models available, varying in complexity according to the number of parameters (transition, transversion, GC content, nucleotide position in the codon, among others). Some papers presented herein provide techniques that can be used to choose evolutionary models. After the model is chosen, the next step is to opt for a phylogenetic reconstruction method that best fits the available data and the selected model. Here we present the most common reconstruction methods currently used, describing their principles, advantages and disadvantages. Distance methods, for example, are simpler and faster; however, they do not provide reliable estimations when the sequences are highly divergent. The accuracy of the analysis with probabilistic models (neighbour joining, maximum likelihood and Bayesian inference) strongly depends on the adherence of the actual data to the chosen evolutionary model. Finally, we also explore topology confidence tests, especially the most used one, the bootstrap. To assist the reader, this review presents figures to explain specific situations discussed in the text and numerous examples of previously published scientific articles in virology that demonstrate the importance of the techniques discussed herein, as well as their judicious use. Conclusion: The DNA sequence is not only a record of phylogeny and divergence times, but also keeps signs of how the evolutionary process has shaped its history and also the elapsed time in the evolutionary process of the population. Analyses of genomic sequences by molecular phylogeny have demonstrated a broad spectrum of applications. It is important to note that for the different available data and different purposes of phylogenies, reconstruction methods and evolutionary models should be wisely chosen. This review provides a theoretical basis for the choice of evolutionary models and phylogenetic reconstruction methods best suited to each situation. In addition, it presents examples of diverse applications of molecular phylogeny in virology.
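
Two of the ideas in this abstract lend themselves to a short illustration: a model-corrected distance, and bootstrap resampling of alignment columns. The sketch below uses the Jukes-Cantor (JC69) correction, chosen here only because it is the simplest substitution model; the review itself does not single out a model. The function names are hypothetical. The correction also shows why distance methods lose reliability at high divergence: as the observed proportion of differing sites approaches 0.75, the corrected distance grows without bound.

```python
import math
import random


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p.

    d = -(3/4) * ln(1 - 4p/3); the estimate is undefined for p >= 0.75,
    which is reported here as infinity (saturation).
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def bootstrap_columns(alignment, rng):
    """One bootstrap replicate: resample columns with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


if __name__ == "__main__":
    # Toy alignment, invented for this example.
    aln = ["ACGTACGT", "ACGAACGT", "TCGAACGA"]
    print(jc69_distance(p_distance(aln[0], aln[1])))
    print(bootstrap_columns(aln, random.Random(0)))
```

In a full bootstrap analysis, a tree would be rebuilt from each replicate and the support for a clade reported as the fraction of replicates that recover it.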


2020 ◽  
Author(s):  
Chul Lee ◽  
Seoae Cho ◽  
Kyu-Won Kim ◽  
DongAhn Yoo ◽  
Jae Yong Han ◽  
...  

Abstract Single amino acid variants (SAVs) may provide clues to understanding evolution of traits. A complex trait that has evolved convergently among species is vocal learning, the rare ability to imitate sounds heard and an important component of spoken language. Here we assessed whether convergent vocal learning bird species have convergent SAVs (CSAVs) that could be associated with their specialized trait. We analyzed avian genomes and identified CSAVs in vocal learners, but also in most species combinations tested. The number of CSAVs among species was proportional to the product of the most recent common ancestor (MRCA; origin) branch lengths of the species in question, and vocal learning birds did not exceed the overall proportion in most tests. However, genes with identical CSAVs (iCSAVs) in vocal learning species were uniquely enriched in ‘learning’ functions, and a subset of iCSAV genes were under positive selection and had enriched specialized regulation in vocal learning and their adjacent brain subdivisions. Several top candidate genes converge on the cAMP signaling pathway, including DRD1B and PRKAR2B. Our findings suggest a complex mechanism of amino acid convergences and specialized gene regulation upon which selection acts for specialized convergent traits.


Genetics ◽  
1995 ◽  
Vol 141 (4) ◽  
pp. 1641-1650 ◽  
Author(s):  
Z Yang ◽  
S Kumar ◽  
M Nei

Abstract A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.
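
The core quantity in this kind of reconstruction, the posterior probability of each character state at an interior node, can be shown on the smallest possible case. The sketch below assumes a single ancestor with two descendant leaves, the Jukes-Cantor (JC69) model, and known branch lengths; these are simplifying assumptions for illustration and not the models or data used in the paper. The function names are hypothetical.

```python
import math

STATES = "ACGT"


def jc69_prob(x, y, t):
    """JC69 probability of state x changing to y along a branch.

    t is the branch length in expected substitutions per site.
    """
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def ancestral_posterior(leaf_a, leaf_b, t_a, t_b):
    """Posterior over the ancestral state given two observed leaves.

    P(x | data) is proportional to
    prior(x) * P(x -> leaf_a; t_a) * P(x -> leaf_b; t_b),
    with a uniform prior of 1/4 under JC69.
    """
    joint = {
        x: 0.25 * jc69_prob(x, leaf_a, t_a) * jc69_prob(x, leaf_b, t_b)
        for x in STATES
    }
    total = sum(joint.values())
    return {x: v / total for x, v in joint.items()}


if __name__ == "__main__":
    post = ancestral_posterior("A", "A", 0.1, 0.1)
    best = max(post, key=post.get)
    print(best, post[best])
```

Choosing the state with the highest posterior at each site corresponds to the "best reconstruction" described in the abstract; on a real tree the same product is accumulated over all branches rather than just two.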


1993 ◽  
Vol 69 (04) ◽  
pp. 351-360 ◽  
Author(s):  
Masahiro Murakawa ◽  
Takashi Okamura ◽  
Takumi Kamura ◽  
Tsunefumi Shibuya ◽  
Mine Harada ◽  
...  

Summary The partial amino acid sequences of fibrinogen Aα-chains from five mammalian species have been inferred by means of the polymerase chain reaction (PCR). From the genomic DNA of the rhesus monkey, pig, dog, mouse and Syrian hamster, the DNA fragments coding for α-C domains in the Aα-chains were amplified and sequenced. In all species examined, four cysteine residues were always conserved at the homologous positions. The carboxy- and amino-terminal portions of the α-C domains showed considerable homology among the species. However, the sizes of the middle portions, which corresponded to the internal repeat structures, showed an apparent variability because of several insertions and/or deletions. In the rhesus monkey, pig, mouse and Syrian hamster, 13 amino acid tandem repeats fundamentally similar to those in humans and the rat were identified. In the dog, however, tandem repeats were found to consist of 18 amino acids, suggesting an independent multiplication of the canine repeats. The sites of the α-chain cross-linking acceptor and α2-plasmin inhibitor cross-linking donor were not always evolutionarily conserved. The arginyl-glycyl-aspartic acid (RGD) sequence was not found in the amplified region of either the rhesus monkey or the pig. In the canine α-C domain, two RGD sequences were identified at the positions homologous to both rat and human RGDS. In the Syrian hamster, a single RGD sequence was found at the same position as that of the rat. Triplication of the RGD sequences was seen in the murine fibrinogen α-C domain around the site homologous to the rat RGDS sequence. These findings are of some interest from the point of view of structure-function and evolutionary relationships in the mammalian fibrinogen Aα-chains.


1979 ◽  
Author(s):  
Takashi Morita ◽  
Craig Jackson

Bovine Factor X is eluted in two forms (X1 and X2) from anion exchange chromatographic columns. These two forms have indistinguishable amino acid compositions, molecular weights and specific activities. The amino acid sequences containing the γ-carboxyglutamic acid residues have been shown to be identical in X1 and X2 (H. Morris, personal communication). An activation peptide is released from the N-terminal region of the heavy chain of Factor X by an activator from Russell’s viper venom. This peptide can be isolated after activation by gel filtration on Sephadex G-100 under nondenaturing conditions. The activation peptides from a mixture of Factors X1 and X2 were separated into two forms by anion-exchange chromatography. The activation peptide (AP1) which eluted first was shown to be derived from Factor X1, while the activation peptide (AP2) which eluted second was shown to be derived from X2, on the basis of chromatographic separations carried out on Factors X1 and X2 separately. Factor Xa was eluted as a symmetrical single peak. On the basis of these and other data characterizing these products, we conclude that the differences between X1 and X2 are properties of the structures of the activation peptides. (Supported by a grant HL 12820 from the National Heart, Lung and Blood Institute. C.M.J. is an Established Investigator of the American Heart Association).


2020 ◽  
Vol 44 (3) ◽  
pp. 177-189
Author(s):  
Momir Dunjic ◽  
Stefano Turini ◽  
Dejan Krstic ◽  
Katarina Dunjic ◽  
Marija Dunjic ◽  
...  

Radiofrequency therapy is an unconventional method, already applied for some time, with numerous results in numerous clinical pictures. Our group has developed software, later called SONGENPROT-SOLARIS, capable of directly converting nucleotide sequences (DNA and/or RNA) and amino acid sequences (polypeptides and proteins) into musical sequences, based on mathematical matrices designed by the French physicist and musician Joel Sternheimer, which allow a musical note to be associated with a nucleotide or an amino acid. The innovation in our software is that the algorithm that defines it directly implements a variant that allows the reproduction of sounds phase-shifted by 30 Hz between one ear and the other, reproducing the phenomenon of Binaural Tones, capable of inducing a specific brain activity and also the release of particles called solitons. Thanks to this software we have developed a technique called MMT (Molecular Music Therapy), and we are currently in the phase of applying the technique to a cohort of 91 patients with a wide spectrum of clinical pictures, examining them with the Bi-Digital-ORing-Test (BDORT) technique before and after treatment with MMT. The aim of the project is to stimulate the expression of a specific gene (the same genetic sequence that the patient listens to, translated into music) only through the use of sound sequences. We have concentrated our attention on three main molecules: Sirtuin-1, telomeres and TP-53. The results obtained with BDORT, after treatment with MMT, showed a significant increase in the values of the three molecules in all the examined patients, demonstrating the operative efficacy of the technique and its applicability to numerous diseases. In order to confirm the data obtained by BDORT, we propose, with the help of an accredited laboratory, to perform epigenetic tests on the three parameters listed above, paving the way to understanding how frequencies can influence gene expression.

