Protein Structure, Models of Sequence Evolution, and Data Type Effects in Phylogenetic Analyses of Mitochondrial Data: A Case Study in Birds

Author(s):  
Emily L. Gordon ◽  
Rebecca T. Kimball ◽  
Edward L. Braun

Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimates of trees for avian orders differ if protein-coding or non-coding data are analyzed. The bird tree is a good study system because the historical signal for relationships among orders is very weak, which should permit subtle non-historical signals to be identified, while monophyly of orders is strongly corroborated, allowing identification of strong non-historical signals. Hydrophobic amino acids in mitochondrially-encoded proteins, which are expected to be found in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated using a substitution matrix for transmembrane helices estimated using a variety of nuclear- and mitochondrially-encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially-encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.

Diversity ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 555
Author(s):  
Emily L. Gordon ◽  
Rebecca T. Kimball ◽  
Edward L. Braun

Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimates of trees for avian orders differ if protein-coding or non-coding data are analyzed. The bird tree is a good study system because the historical signal for relationships among orders is very weak, which should permit subtle non-historical signals to be identified, while monophyly of orders is strongly corroborated, allowing identification of strong non-historical signals. Hydrophobic amino acids in mitochondrially-encoded proteins, which are expected to be found in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated using a substitution matrix for transmembrane helices estimated using a variety of nuclear- and mitochondrially-encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially-encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.


2018 ◽  
Author(s):  
Jeffrey I. Boucher ◽  
Troy W. Whitfield ◽  
Ann Dauphin ◽  
Gily Nachum ◽  
Carl Hollins ◽  
...  

Abstract The evolution of HIV-1 protein sequences should be governed by a combination of factors including nucleotide mutational probabilities, the genetic code, and fitness. The impact of these factors on protein sequence evolution is interdependent, making it challenging to infer the individual contribution of each factor from phylogenetic analyses alone. We investigated the protein sequence evolution of HIV-1 by determining an experimental fitness landscape of all individual amino acid changes in protease. We compared our experimental results to the frequency of protease variants in a publicly available dataset of 32,163 sequenced isolates from drug-naïve individuals. The most common amino acids in sequenced isolates supported robust experimental fitness, indicating that the experimental fitness landscape captured key features of selection acting on protease during viral infections of hosts. Amino acid changes requiring multiple mutations from the likely ancestor were slightly less likely to support robust experimental fitness than single mutations, consistent with the genetic code favoring chemically conservative amino acid changes. Amino acids that were common in sequenced isolates were predominantly accessible by single mutations from the likely protease ancestor. Multiple mutations commonly observed in isolates were accessible by mutational walks with highly fit single mutation intermediates. Our results indicate that the prevalence of multiple-base mutations in HIV-1 protease is strongly influenced by mutational sampling.


Biology ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 64 ◽  
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding, the amino acid trees estimated using the exposed and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.


Author(s):  
Yong-Chan Kim ◽  
Byung-Hoon Jeong

Abstract Interferon-induced transmembrane protein 3 (IFITM3) plays a pivotal role in antiviral capacity in several species. However, to date, investigations of the IFITM3 protein in cattle have been rare. According to recent studies, interspecific differences in the IFITM3 protein result in several unique features of the IFITM3 protein relative to primates and birds. Thus, in the present study, we investigated the bovine IFITM3 protein based on nucleotide and amino acid sequences to find its distinct features. We found that the bovine IFITM3 gene showed a significantly different length and homology relative to other species, including primates, rodents and birds. Phylogenetic analyses indicated that the bovine IFITM3 gene and IFITM3 protein were evolutionarily closer to those of primates than to those of rodents. However, cattle formed an independent clade among primates, rodents and birds. Multiple sequence alignment of the IFITM3 protein indicated that the bovine IFITM3 protein contains 36 bovine-specific amino acids. Notably, TMpred predicted that the bovine IFITM3 protein prefers an inside-to-outside topology for both intramembrane domain 1 (IMD1) and transmembrane domain 2, and the SOSUI system predicted three membrane-embedding domains.


2018 ◽  
Author(s):  
Zackery A. Ely ◽  
Jiyun M. Moon ◽  
Gregory R. Sliwoski ◽  
Amandeep K. Sangha ◽  
Xing-Xing Shen ◽  
...  

Abstract Immunity genes have repeatedly experienced natural selection during mammalian evolution. Galectins are carbohydrate-binding proteins that regulate diverse immune responses, including maternal-fetal immune tolerance in placental pregnancy. Seven human galectins, four conserved across vertebrates and three specific to primates, are involved in placental development. To comprehensively study the molecular evolution of these galectins both across mammals and within humans, we conducted a series of between- and within-species evolutionary analyses. By examining patterns of sequence evolution between species, we found that primate-specific galectins showed uniformly high substitution rates, whereas two of the four other galectins experienced accelerated evolution in primates. By examining human population genomic variation, we found that galectin genes and variants, including variants previously linked to immune diseases, showed signatures of recent positive selection in specific human populations. By examining one nonsynonymous variant in Galectin-8 previously associated with autoimmune diseases, we further discovered that it is tightly linked to three other nonsynonymous variants; surprisingly, the global frequency of this four-variant haplotype is ∼50%. To begin understanding the impact of this major haplotype on Galectin-8 protein structure, we modeled its 3D protein structure and found that it differed substantially from the reference protein structure. These results suggest that placentally expressed galectins experienced both ancient and more recent selection in a lineage- and population-specific manner. Furthermore, our discovery that the major Galectin-8 haplotype is structurally distinct from and more commonly found than the reference haplotype illustrates the significance of understanding the evolutionary processes that sculpted variants associated with human genetic disease.


2019 ◽  
Vol 11 (9) ◽  
pp. 2574-2592 ◽  
Author(s):  
Zackery A Ely ◽  
Jiyun M Moon ◽  
Gregory R Sliwoski ◽  
Amandeep K Sangha ◽  
Xing-Xing Shen ◽  
...  

Abstract Immunity genes have repeatedly experienced natural selection during mammalian evolution. Galectins are carbohydrate-binding proteins that regulate diverse immune responses, including maternal–fetal immune tolerance in placental pregnancy. Seven human galectins, four conserved across vertebrates and three specific to primates, are involved in placental development. To comprehensively study the molecular evolution of these galectins, both across mammals and within humans, we conducted a series of between- and within-species evolutionary analyses. By examining patterns of sequence evolution between species, we found that primate-specific galectins showed uniformly high substitution rates, whereas two of the four other galectins experienced accelerated evolution in primates. By examining human population genomic variation, we found that galectin genes and variants, including variants previously linked to immune diseases, showed signatures of recent positive selection in specific human populations. By examining one nonsynonymous variant in Galectin-8 previously associated with autoimmune diseases, we further discovered that it is tightly linked to three other nonsynonymous variants; surprisingly, the global frequency of this four-variant haplotype is ∼50%. To begin understanding the impact of this major haplotype on Galectin-8 protein structure, we modeled its 3D protein structure and found that it differed substantially from the reference protein structure. These results suggest that placentally expressed galectins experienced both ancient and more recent selection in a lineage- and population-specific manner. Furthermore, our discovery that the major Galectin-8 haplotype is structurally distinct from and more commonly found than the reference haplotype illustrates the significance of understanding the evolutionary processes that sculpted variants associated with human genetic disease.


2019 ◽  
Vol 36 (4) ◽  
pp. 798-810 ◽  
Author(s):  
Jeffrey I Boucher ◽  
Troy W Whitfield ◽  
Ann Dauphin ◽  
Gily Nachum ◽  
Carl Hollins ◽  
...  

Abstract The evolution of HIV-1 protein sequences should be governed by a combination of factors including nucleotide mutational probabilities, the genetic code, and fitness. The impact of these factors on protein sequence evolution is interdependent, making it challenging to infer the individual contribution of each factor from phylogenetic analyses alone. We investigated the protein sequence evolution of HIV-1 by determining an experimental fitness landscape of all individual amino acid changes in protease. We compared our experimental results to the frequency of protease variants in a publicly available data set of 32,163 sequenced isolates from drug-naïve individuals. The most common amino acids in sequenced isolates supported robust experimental fitness, indicating that the experimental fitness landscape captured key features of selection acting on protease during viral infections of hosts. Amino acid changes requiring multiple mutations from the likely ancestor were slightly less likely to support robust experimental fitness than single mutations, consistent with the genetic code favoring chemically conservative amino acid changes. Amino acids that were common in sequenced isolates were predominantly accessible by single mutations from the likely protease ancestor. Multiple mutations commonly observed in isolates were accessible by mutational walks with highly fit single mutation intermediates. Our results indicate that the prevalence of multiple-base mutations in HIV-1 protease is strongly influenced by mutational sampling.


2021 ◽  
Author(s):  
Ning Wang ◽  
Edward L Braun ◽  
Bin Liang ◽  
Joel Cracraft ◽  
Stephen A. Smith

Phylogenetic analyses of large-scale datasets sometimes fail to yield a satisfactory resolution of the relationships among taxa for a number of nodes in the tree of life. This has even been true for genome-scale datasets, where the failure to resolve relationships is unlikely to reflect limitations in the amount of data. Gene tree conflicts are particularly notable in studies focused on these contentious nodes in the tree of life, and taxon sampling, different analytical methods, and/or data-type effects are thought to further confound analyses. Observed conflicts among gene trees arise from both biological processes and artefactual sources of noise in analyses. Although many efforts have been made to incorporate biological conflicts, few studies have curated individual genes for their efficiency in phylogenomic studies. Here, we conduct an edge-based analysis of Neoavian evolution, examining the phylogenetic efficacy of two recent phylogenomic bird datasets and three data types (ultraconserved elements [UCEs], introns, and coding regions). We assess the potential causes for biases in signal-resolution for three difficult nodes: the earliest divergence of Neoaves, the position of the enigmatic Hoatzin (Opisthocomus hoazin), and the position of owls (Strigidae). We observed extensive conflict among genes for all data types and datasets even after we removed potentially problematic loci. Edge-based analyses increased congruence and examined the impact of data type, GC content variation (GCCV), and outlier genes on analyses. These factors had different impacts on each of the nodes we examined. First, outlier gene signals appeared to drive different patterns of support for the relationships among the earliest diverging Neoaves. Second, the position of Hoatzin was highly variable, but we found that data type was correlated with the signals that support different placements of the Hoatzin. However, the resolution with the most support in our analyses was Hoatzin + shorebirds.
Finally, GCCV, rather than data type (i.e., coding vs non-coding) per se, was correlated with an owl + Accipitriformes signal. Eliminating high GCCV loci increased the signal for an owl + mousebird relationship. Difficult edges (i.e., characterized by deep coalescence and high gene-tree estimation error) are hard to recover with all methods (including concatenation, multispecies coalescent, and edge-based analyses), whereas "easy" edges (e.g., flamingos + grebes) can be recovered without ambiguity. Thus, the nature of the edges, rather than the methods, is the limiting factor. Categorical edge-based analyses can reveal the nature of each edge and provide a way to highlight especially problematic branches that warrant further examination in future phylogenomic studies. We suggest that edge-based analyses provide a tool that can increase our understanding about the parts of the avian tree that remain unclear, even with large-scale data. In fact, our results emphasize that the conflicts associated with edges that remain contentious in the bird tree may be even greater than appreciated based on previous studies.


Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. The most striking difference in phylogenetic signal reflected relative solvent accessibility; analyses of exposed sites (on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge-ctenophore clade. These differences in phylogenetic signal were not ameliorated when we repeated our analyses using the CAT model, a mixture model that is often used for analyses of protein datasets. In fact, the heterogeneous CAT model resulted in several rearrangements that are unlikely to represent evolutionary history. However, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding amino acids, both trees supported placement of ctenophores sister to all other animals. These results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.


1997 ◽  
Vol 161 ◽  
pp. 179-187
Author(s):  
Clifford N. Matthews ◽  
Rose A. Pesce-Rodriguez ◽  
Shirley A. Liebman

Abstract Hydrogen cyanide polymers – heterogeneous solids ranging in color from yellow to orange to brown to black – may be among the organic macromolecules most readily formed within the Solar System. The non-volatile black crust of comet Halley, for example, as well as the extensive orange-brown streaks in the atmosphere of Jupiter, might consist largely of such polymers synthesized from HCN formed by photolysis of methane and ammonia, the color observed depending on the concentration of HCN involved. Laboratory studies of these ubiquitous compounds point to the presence of polyamidine structures synthesized directly from hydrogen cyanide. These would be converted by water to polypeptides which can be further hydrolyzed to α-amino acids. Black polymers and multimers with conjugated ladder structures derived from HCN could also be formed and might well be the source of the many nitrogen heterocycles, adenine included, observed after pyrolysis. The dark brown color arising from the impacts of comet P/Shoemaker-Levy 9 on Jupiter might therefore be mainly caused by the presence of HCN polymers, whether originally present, deposited by the impactor or synthesized directly from HCN. Spectroscopic detection of these predicted macromolecules and their hydrolytic and pyrolytic by-products would strengthen significantly the hypothesis that cyanide polymerization is a preferred pathway for prebiotic and extraterrestrial chemistry.

