Phylogenetic Analyses of Sites in Different Protein Structural Environments Result in Distinct Placements of the Metazoan Root

Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. The most striking difference in phylogenetic signal reflected relative solvent accessibility; analyses of exposed sites (on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge-ctenophore clade. These differences in phylogenetic signal were not ameliorated when we repeated our analyses using the CAT model, a mixture model that is often used for analyses of protein datasets. In fact, the heterogeneous CAT model resulted in several rearrangements that are unlikely to represent evolutionary history. However, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acids, both trees supported placement of ctenophores sister to all other animals. These results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
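
The partitioning step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `split_by_rsa`, the per-site RSA input, and the 0.25 cutoff are all assumptions chosen for illustration, since the abstract does not state how exposed and buried sites were thresholded.

```python
from typing import Dict, List, Tuple


def split_by_rsa(
    alignment: Dict[str, str],
    rsa: List[float],
    threshold: float = 0.25,  # hypothetical cutoff, not taken from the paper
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split alignment columns into exposed and buried subsets.

    alignment maps taxon name -> aligned sequence; rsa holds one relative
    solvent accessibility value per alignment column.
    """
    n_sites = len(rsa)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence must have one residue per RSA value")

    # Columns at or above the cutoff are treated as exposed, the rest as buried.
    exposed_cols = [i for i, value in enumerate(rsa) if value >= threshold]
    buried_cols = [i for i, value in enumerate(rsa) if value < threshold]

    def take(cols: List[int]) -> Dict[str, str]:
        # Keep only the selected columns for every taxon.
        return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

    return take(exposed_cols), take(buried_cols)
```

Each returned sub-alignment could then be passed to a tree-estimation program separately, which is the comparison the abstract reports.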

Biology ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 64 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acids, trees estimated using the exposed and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
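
Amino acid recoding, as mentioned above, collapses the 20 amino acids into a smaller number of state classes so that compositional shifts within a class no longer affect the analysis. The sketch below uses the widely used six-state Dayhoff grouping as an assumption; the abstract does not say which recoding scheme the authors applied, and the names `DAYHOFF6_GROUPS` and `recode_sequence` are illustrative.

```python
# Six-state Dayhoff grouping (an assumed scheme, not confirmed by the abstract).
DAYHOFF6_GROUPS = {
    "AGPST": "0",
    "DENQ": "1",
    "HKR": "2",
    "ILMV": "3",
    "FWY": "4",
    "C": "5",
}

# Flatten the groups into a per-residue lookup table.
RECODE_TABLE = {aa: code for group, code in DAYHOFF6_GROUPS.items() for aa in group}


def recode_sequence(seq: str) -> str:
    """Recode an aligned protein sequence into six state classes.

    Gaps ('-') are kept as gaps; any other unrecognized symbol becomes '?'.
    """
    out = []
    for ch in seq.upper():
        if ch == "-":
            out.append("-")
        else:
            out.append(RECODE_TABLE.get(ch, "?"))
    return "".join(out)
```

Applying `recode_sequence` to every sequence in the exposed and buried sub-alignments would give recoded datasets of the kind the abstract compares.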


2018 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Abstract Phylogenomics has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life. This could reflect the poor fit of the models used to analyze heterogeneous datasets; that heterogeneity is likely to have many explanations. However, it seems reasonable to hypothesize that the different patterns of selection on proteins based on their structures might represent a source of heterogeneity. To test that hypothesis, we developed an efficient pipeline to divide phylogenomic datasets that comprise proteins into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had different signals for the deepest branches in the metazoan tree of life. Sites located in different structural environments did support distinct tree topologies. The most striking difference in phylogenetic signal reflected relative solvent accessibility; analyses of sites on the surface of proteins yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge-ctenophore clade. These differences in phylogenetic signal were not ameliorated when we repeated our analyses using the site-heterogeneous CAT model, a mixture model that is often used for analyses of protein datasets. In fact, analyses using the CAT model actually resulted in rearrangements that are unlikely to represent evolutionary history. These results provide striking evidence that it will be necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.


2020 ◽  
Vol 36 (11) ◽  
pp. 3372-3378
Author(s):  
Alexander Gress ◽  
Olga V Kalinina

Abstract Motivation: In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations, their pathogenicity and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved. Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates. Availability and implementation: https://github.com/kalininalab/spherecon. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
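
For context, the "directly computed RSA" that SphereCon is compared against is conventionally the ratio of a residue's accessible surface area (ASA) to a reference maximum ASA for that residue type. The sketch below shows only that conventional ratio, not the SphereCon measure itself; the function name and the idea of passing the reference maximum as an argument are assumptions for illustration, and no reference table is supplied here.

```python
def relative_solvent_accessibility(asa: float, max_asa: float) -> float:
    """Return RSA as the ratio of observed ASA to a reference maximum ASA.

    Both values must be in the same units (for example, square angstroms).
    The caller supplies max_asa from whichever reference scale is in use.
    """
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    if asa < 0:
        raise ValueError("asa must be non-negative")
    return asa / max_asa
```

The result is deliberately not clamped to 1, because an observed ASA can exceed the reference maximum on some scales.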


2019 ◽  
Author(s):  
Ana Filipa Moutinho ◽  
Fernanda Fontes Trancoso ◽  
Julien Yann Dutheil

Abstract Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intra-molecular level are poorly understood. To address this, we analysed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function and protein-protein interactions. We found that the relative solvent accessibility is a major driver of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signalling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by inter-molecular interactions, with host-pathogen coevolution likely playing a major role.


Author(s):  
Emily L. Gordon ◽  
Rebecca T. Kimball ◽  
Edward L. Braun

Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimates of trees for avian orders differ if protein coding or non-coding data are analyzed. The bird tree is a good study system because the historical signal for relationships among orders is very weak, which should permit subtle non-historical signals to be identified, while monophyly of orders is strongly corroborated, allowing identification of strong non-historical signals. Hydrophobic amino acids in mitochondrially-encoded proteins, which are expected to be found in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated using a substitution dataset for transmembrane helices from a varied set of sampled nuclear- and mitochondrially-encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially-encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.


2019 ◽  
Vol 36 (9) ◽  
pp. 2013-2028 ◽  
Author(s):  
Ana Filipa Moutinho ◽  
Fernanda Fontes Trancoso ◽  
Julien Yann Dutheil

Abstract Adaptive mutations play an important role in molecular evolution. However, the frequency and nature of these mutations at the intramolecular level are poorly understood. To address this, we analyzed the impact of protein architecture on the rate of adaptive substitutions, aiming to understand how protein biophysics influences fitness and adaptation. Using Drosophila melanogaster and Arabidopsis thaliana population genomics data, we fitted models of distribution of fitness effects and estimated the rate of adaptive amino-acid substitutions both at the protein and amino-acid residue level. We performed a comprehensive analysis covering genome, gene, and protein structure, by exploring a multitude of factors with a plausible impact on the rate of adaptive evolution, such as intron number, protein length, secondary structure, relative solvent accessibility, intrinsic protein disorder, chaperone affinity, gene expression, protein function, and protein–protein interactions. We found that the relative solvent accessibility is a major determinant of adaptive evolution, with most adaptive mutations occurring at the surface of proteins. Moreover, we observe that the rate of adaptive substitutions differs between protein functional classes, with genes encoding for protein biosynthesis and degradation signaling exhibiting the fastest rates of protein adaptation. Overall, our results suggest that adaptive evolution in proteins is mainly driven by intermolecular interactions, with host–pathogen coevolution likely playing a major role.


1997 ◽  
Vol 161 ◽  
pp. 179-187
Author(s):  
Clifford N. Matthews ◽  
Rose A. Pesce-Rodriguez ◽  
Shirley A. Liebman

Abstract Hydrogen cyanide polymers – heterogeneous solids ranging in color from yellow to orange to brown to black – may be among the organic macromolecules most readily formed within the Solar System. The non-volatile black crust of comet Halley, for example, as well as the extensive orange-brown streaks in the atmosphere of Jupiter, might consist largely of such polymers synthesized from HCN formed by photolysis of methane and ammonia, the color observed depending on the concentration of HCN involved. Laboratory studies of these ubiquitous compounds point to the presence of polyamidine structures synthesized directly from hydrogen cyanide. These would be converted by water to polypeptides which can be further hydrolyzed to α-amino acids. Black polymers and multimers with conjugated ladder structures derived from HCN could also be formed and might well be the source of the many nitrogen heterocycles, adenine included, observed after pyrolysis. The dark brown color arising from the impacts of comet P/Shoemaker-Levy 9 on Jupiter might therefore be mainly caused by the presence of HCN polymers, whether originally present, deposited by the impactor or synthesized directly from HCN. Spectroscopic detection of these predicted macromolecules and their hydrolytic and pyrolytic by-products would strengthen significantly the hypothesis that cyanide polymerization is a preferred pathway for prebiotic and extraterrestrial chemistry.


1993 ◽  
Vol 28 (1) ◽  
pp. 83-110 ◽  
Author(s):  
Richard E. Farrell ◽  
Jae E. Yang ◽  
P. Ming Huang ◽  
Wen K. Liaw

Abstract Porewater samples from the upper Qu’Appelle River basin in Saskatchewan, Canada, were analyzed to obtain metal, inorganic ligand and amino acid profiles. These data were used to compute the aqueous speciation of the metals in each porewater using the computer program GEOCHEM-PC. The porewaters were classified as slightly to moderately saline. Metal concentrations reflected both the geology of the drainage basin and the impact of anthropogenic activities. Whereas K and Na were present almost entirely as the free aquo ions, carbonate equilibria dominated the speciation of Ca, Mg and Mn (the predominant metal-ligand species were of the type MCO3(s), MCO3⁰ and MHCO3+). Trace metal concentrations were generally within the ranges reported for non-polluted freshwater systems. Whereas the speciation of the trace metals Cr(III) and Co(II) was dominated by carbonate equilibria, Hg(II)-, Zn(II)- and Fe(II)-speciation was dominated by hydroxy-metal complexes of the type M(OH)+ and M(OH)2⁰. The speciation of Fe(III) was dominated by Fe(OH)3(s). In porewaters with high chloride concentrations (> 2 mM), however, significant amounts of Hg(II) were bound as HgCl2⁰ and HgClOH⁰. The aqueous speciation of Al was dominated by Al(OH)4− and Al2Si2O4(OH)6(s). Total concentrations of dissolved free amino acids varied from 15.21 to 25.17 µmol L−1. The most important metal scavenging amino acids were histidine (due to high stability constants for the metal-histidine complexes) and tryptophan (due to its relatively high concentration in the porewaters, i.e., 5.96 to 7.73 µmol L−1). Secondary concentrations of various trace metal-amino acid complexes were computed for all the porewaters, but metal-amino acid complexes dominated the speciation of Cu(II) in all the porewaters and Ni(II) in two of the porewaters.

