Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root

Biology ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 64 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals, and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acids, trees estimated using the exposed and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
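
As an illustration of the partitioning step described in this abstract, the following minimal Python sketch splits the columns of a protein alignment into exposed and buried subsets. It assumes that a relative solvent accessibility (RSA) value is already available for every alignment column. The function name `split_by_rsa`, the dictionary-based input layout, and the 0.25 cutoff are hypothetical choices made for illustration; they are not the authors' pipeline or threshold.

```python
def split_by_rsa(alignment, rsa, threshold=0.25):
    """Split alignment columns into exposed and buried subsets.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rsa: list with one relative solvent accessibility value per column
         (assumed to be precomputed; how it is obtained is out of scope).
    threshold: hypothetical cutoff; columns with RSA >= threshold are
               treated as exposed, the rest as buried.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if lengths != {len(rsa)}:
        raise ValueError("every sequence must have one RSA value per column")

    exposed_cols = [i for i, value in enumerate(rsa) if value >= threshold]
    buried_cols = [i for i, value in enumerate(rsa) if value < threshold]

    def pick(columns):
        # Build a sub-alignment containing only the requested columns.
        return {name: "".join(seq[i] for i in columns)
                for name, seq in alignment.items()}

    return pick(exposed_cols), pick(buried_cols)


if __name__ == "__main__":
    toy_alignment = {"taxon_a": "MKTL", "taxon_b": "MRSL"}
    toy_rsa = [0.10, 0.60, 0.40, 0.05]
    exposed, buried = split_by_rsa(toy_alignment, toy_rsa)
    print(exposed)
    print(buried)
```

Each sub-alignment could then be analyzed separately, so that the trees inferred from exposed and buried sites can be compared.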

Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. The most striking difference in phylogenetic signal reflected relative solvent accessibility; analyses of exposed sites (on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge-ctenophore clade. These differences in phylogenetic signal were not ameliorated when we repeated our analyses using the CAT model, a mixture model that is often used for analyses of protein datasets. In fact, the heterogeneous CAT model resulted in several rearrangements that are unlikely to represent evolutionary history. However, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acids, both trees supported placement of ctenophores sister to all other animals. These results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
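
Amino acid recoding, as mentioned in this abstract, replaces the 20 amino acids with a smaller alphabet of residue groups so that composition shifts within a group no longer affect the analysis. The abstract does not say which scheme was used, so the sketch below assumes the widely used Dayhoff 6-state grouping purely as an example. The names `DAYHOFF6_GROUPS` and `recode_sequence`, and the choice to map gaps and ambiguous characters to `-`, are illustrative assumptions.

```python
# Dayhoff 6-state grouping (an assumed scheme, chosen only for illustration).
DAYHOFF6_GROUPS = ("AGPST", "DENQ", "HKR", "ILMV", "FWY", "C")

# Map each amino acid to the index of its group, stored as a one-character state.
RECODE = {aa: str(state)
          for state, group in enumerate(DAYHOFF6_GROUPS)
          for aa in group}


def recode_sequence(seq):
    """Recode an aligned protein sequence into the six-state alphabet.

    Characters outside the 20 standard amino acids (gaps, X, and so on)
    are written as '-' so that alignment length is preserved.
    """
    return "".join(RECODE.get(residue, "-") for residue in seq.upper())


if __name__ == "__main__":
    print(recode_sequence("ACDEF-X"))
```

Because every character maps to exactly one output character, recoded sequences stay aligned column by column with the original alignment.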


2018 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Abstract Phylogenomics has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life. This could reflect the poor fit of the models used to analyze heterogeneous datasets; that heterogeneity is likely to have many explanations. However, it seems reasonable to hypothesize that the different patterns of selection on proteins based on their structures might represent a source of heterogeneity. To test that hypothesis, we developed an efficient pipeline to divide phylogenomic datasets that comprise proteins into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had different signals for the deepest branches in the metazoan tree of life. Sites located in different structural environments did support distinct tree topologies. The most striking difference in phylogenetic signal reflected relative solvent accessibility; analyses of sites on the surface of proteins yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge-ctenophore clade. These differences in phylogenetic signal were not ameliorated when we repeated our analyses using the site-heterogeneous CAT model, a mixture model that is often used for analyses of protein datasets. In fact, analyses using the CAT model resulted in rearrangements that are unlikely to represent evolutionary history. These results provide striking evidence that it will be necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.


2019 ◽  
Author(s):  
Ana Filipa Moutinho ◽  
Fernanda Fontes Trancoso ◽  
Julien Yann Dutheil

Abstract Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intra-molecular level are poorly understood. To address this, we analysed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function and protein-protein interactions. We found that the relative solvent accessibility is a major driver of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signalling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by inter-molecular interactions, with host-pathogen coevolution likely playing a major role.


2019 ◽  
Vol 36 (9) ◽  
pp. 2013-2028 ◽  
Author(s):  
Ana Filipa Moutinho ◽  
Fernanda Fontes Trancoso ◽  
Julien Yann Dutheil

Abstract Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intramolecular level are poorly understood. To address this, we analyzed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene, and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function, and protein–protein interactions. We found that the relative solvent accessibility is a major determinant of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signaling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by intermolecular interactions, with host–pathogen coevolution likely playing a major role.


Genetics ◽  
1998 ◽  
Vol 149 (1) ◽  
pp. 445-458 ◽  
Author(s):  
Nick Goldman ◽  
Jeffrey L Thorne ◽  
David T Jones

Abstract Empirically derived models of amino acid replacement are employed to study the association between various physical features of proteins and evolution. The strengths of these associations are statistically evaluated by applying the models of protein evolution to 11 diverse sets of protein sequences. Parametric bootstrap tests indicate that the solvent accessibility status of a site has a particularly strong association with the process of amino acid replacement that it experiences. Significant association between secondary structure environment and the amino acid replacement process is also observed. Careful description of the length distribution of secondary structure elements and of the organization of secondary structure and solvent accessibility along a protein did not always significantly improve the fit of the evolutionary models to the data sets that were analyzed. As indicated by the strength of the association of both solvent accessibility and secondary structure with amino acid replacement, the process of protein evolution—both above and below the species level—will not be well understood until the physical constraints that affect protein evolution are identified and characterized.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Pablo Mier ◽  
Miguel A. Andrade-Navarro

Abstract According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular context, specific short amino acid motifs are missing due to unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
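
The idea of a motif that is absent from a dataset can be made concrete with a short Python sketch: enumerate every possible k-mer over the 20 standard amino acids and subtract the k-mers observed in the sequences. This is only a minimal illustration under stated assumptions. The function name `avoided_motifs` and the choice to skip windows containing non-standard characters are hypothetical, and the sketch does not reproduce the statistical or contextual filtering that the study may have applied.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def avoided_motifs(sequences, k=3):
    """Return the set of length-k amino acid motifs absent from all sequences.

    sequences: iterable of protein sequences (strings).
    k: motif length; there are 20**k candidate motifs, so keep k small.
    """
    standard = set(AMINO_ACIDS)
    observed = set()
    for seq in sequences:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            # Skip windows with ambiguous or non-standard characters
            # (an illustrative choice, not taken from the abstract).
            if set(window) <= standard:
                observed.add(window)

    candidates = {"".join(combo) for combo in product(AMINO_ACIDS, repeat=k)}
    return candidates - observed


if __name__ == "__main__":
    missing = avoided_motifs(["ACDA"], k=2)
    print(len(missing))
```

On a real proteome, a motif returned by this function is only a candidate; whether its absence is meaningful depends on how often it would be expected to occur by chance given the amino acid composition.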


2020 ◽  
Vol 36 (11) ◽  
pp. 3372-3378
Author(s):  
Alexander Gress ◽  
Olga V Kalinina

Abstract Motivation: In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations and their pathogenicity, and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved. Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates. Availability and implementation: https://github.com/kalininalab/spherecon. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


2000 ◽  
Vol 66 (4) ◽  
pp. 1354-1359 ◽  
Author(s):  
Liesbeth Rijnen ◽  
Pascal Courtin ◽  
Jean-Claude Gripon ◽  
Mireille Yvon

ABSTRACT The first step of amino acid degradation in lactococci is a transamination, which requires an α-keto acid as the amino group acceptor. We have previously shown that the level of available α-keto acid in semihard cheese is the first limiting factor for conversion of amino acids to aroma compounds, since aroma formation is greatly enhanced by adding α-ketoglutarate to cheese curd. In this study we introduced a heterologous catabolic glutamate dehydrogenase (GDH) gene into Lactococcus lactis so that this organism could produce α-ketoglutarate from glutamate, which is present at high levels in cheese. Then we evaluated the impact of GDH activity on amino acid conversion in in vitro tests and in a cheese model by using radiolabeled amino acids as tracers. The GDH-producing lactococcal strain degraded amino acids without added α-ketoglutarate to the same extent that the wild-type strain degraded amino acids with added α-ketoglutarate. Interestingly, the GDH-producing lactococcal strain produced a higher proportion of carboxylic acids, which are major aroma compounds. Our results demonstrated that a GDH-producing lactococcal strain could be used instead of adding α-ketoglutarate to improve aroma development in cheese.


2006 ◽  
Vol 72 (2) ◽  
pp. 1239-1247 ◽  
Author(s):  
Takashi Yoshida ◽  
Yukari Takashima ◽  
Yuji Tomaru ◽  
Yoko Shirai ◽  
Yoshitake Takao ◽  
...  

ABSTRACT We isolated a cyanophage (Ma-LMM01) that specifically infects a toxic strain of the bloom-forming cyanobacterium Microcystis aeruginosa. Transmission electron microscopy showed that the virion is composed of an isometric head and a tail complex consisting of a central tube and a contractile sheath with helical symmetry. The morphological features and the host specificity suggest that Ma-LMM01 is a member of the cyanomyovirus group. Using semi-one-step growth experiments, the latent period and burst size were estimated to be 6 to 12 h and 50 to 120 infectious units per cell, respectively. The size of the phage genome was estimated to be ca. 160 kbp using pulsed-field gel electrophoresis; the nucleic acid was sensitive to DNase I, Bal31, and all 14 restriction enzymes tested, suggesting that it is a linear double-stranded DNA having a low level of methylation. Phylogenetic analyses based on the deduced amino acid sequences of two open reading frames coding for ribonucleotide reductase alpha- and beta-subunits showed that Ma-LMM01 forms a sister group with marine and freshwater cyanobacteria and is apparently distinct from T4-like phages. Phylogenetic analysis of the deduced amino acid sequence of the putative sheath protein showed that Ma-LMM01 does not form a monophyletic group with either the T4-like phages or prophages, suggesting that Ma-LMM01 is distinct from other T4-like phages that have been described despite morphological similarity. The host-phage system which we studied is expected to contribute to our understanding of the ecology of Microcystis blooms and the genetics of cyanophages, and our results suggest the phages could be used to control toxic cyanobacterial blooms.

