The Roles of Protein Structure, Taxon Sampling, and Model Complexity in Phylogenomics: A Case Study Focused on Early Animal Divergences

Biophysica ◽  
2021 ◽  
Vol 1 (2) ◽  
pp. 87-105
Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Despite the long history of using protein sequences to infer the tree of life, the potential for different parts of protein structures to retain historical signal remains unclear. We propose that it might be possible to improve analyses of phylogenomic datasets by incorporating information about protein structure. We test this idea using the position of the root of Metazoa (animals) as a model system. We examined the distribution of “strongly decisive” sites (alignment positions that support a specific tree topology) in a dataset comprising >1500 proteins and almost 100 taxa. The proportion of each class of strongly decisive sites in different structural environments was very sensitive to the model used to analyze the data when a limited number of taxa were used, but it was stable when taxa were added. As long as enough taxa were analyzed, sites in all structural environments supported the same topology regardless of whether standard tree searches or decisive sites were used to select the optimal tree. However, the use of decisive sites revealed a difference between the support for minority topologies for sites in different structural environments: buried sites and sites in sheet and coil environments exhibited equal support for the minority topologies, whereas solvent-exposed and helix sites had unequal numbers of sites supporting the minority topologies. This suggests that the relatively slowly evolving buried, sheet, and coil sites are giving an accurate picture of the true species tree and the amount of conflict among gene trees. Taken as a whole, this study indicates that phylogenetic analyses using sites in different structural environments can yield different topologies for the deepest branches in the animal tree of life and that analyzing larger numbers of taxa eliminates this conflict. More broadly, our results highlight the desirability of incorporating information about protein structure into phylogenomic analyses.

Author(s):  
Akanksha Pandey ◽  
Edward L. Braun

Despite the long history of using protein sequences to infer the tree of life, the potential for different parts of protein structures to retain historical signal remains unclear. We propose that it might be possible to improve analyses of phylogenomic datasets by incorporating information about protein structure; we test this idea using the position of the root of Metazoa (animals) as a model system. We examined the distribution of “strongly decisive” sites (alignment positions that support a specific tree topology) in a dataset comprising >1,500 proteins and almost 100 taxa. The proportion of each class of strongly decisive sites in different structural environments was very sensitive to the model used to analyze the data when a limited number of taxa were used, but it was stable when taxa were added. As long as enough taxa were analyzed, sites in all structural environments supported the same topology (ctenophores sister to other animals) regardless of whether standard tree searches or decisive sites were used to select the optimal tree. However, the use of decisive sites revealed a difference between the support for minority topologies for sites in different structural environments: buried sites and sites in sheet and coil environments exhibited equal support for the minority topologies, whereas solvent-exposed and helix sites had unequal numbers of sites supporting the minority topologies. Given the plausible trees, equal support for minority topologies is consistent with discordance among gene trees, making it possible that the relatively slowly evolving buried (and sheet and coil) sites are giving an accurate picture of the true species tree as well as the amount of conflict among gene trees. Alternatively, the apparent support could reflect currently uncharacterized processes of molecular evolution. Regardless, it is clear that analyses of the deepest branches in the animal tree of life using sites in different structural environments are associated with a subtle data type effect that results in distinct phylogenetic signals.


2019 ◽  
Author(s):  
Matthew H. Van Dam ◽  
James B. Henderson ◽  
Lauren Esposito ◽  
Michelle Trautwein

Ultraconserved genomic elements (UCEs) are generally treated as independent loci in phylogenetic analyses. The identification pipeline for UCE probes is agnostic to genetic identity, only selecting loci that are highly conserved, single copy, without repeats, and of a particular length. Here we characterized UCEs from 12 phylogenomic studies across the animal tree of life, from birds to marine invertebrates. We found that within vertebrate lineages, UCEs are mostly intronic and intergenic, while in invertebrates, the majority are in exons. We then curated 4 different sets of UCE markers by genomic category from 5 different studies, including birds, mammals, fish, Hymenoptera (ants, wasps and bees) and Coleoptera (beetles). Of genes captured by UCEs, we find that many are represented by 2 or more UCEs, corresponding to non-overlapping segments of a single gene. We considered these UCEs to be non-independent, merged all UCEs that belonged to a particular gene, constructed gene and species trees, and then evaluated the subsequent effect of merging co-genic UCEs on gene and species tree reconstruction. Average bootstrap support for merged UCE gene trees was significantly improved across all datasets. Increased locus length appears to drive this increase in bootstrap support. Additionally, we found that gene trees generated from merged UCEs were more accurate than those generated by unmerged and randomly merged UCEs, based on our simulation study. This modest degree of UCE characterization and curation impacts downstream analyses and demonstrates the advantages of incorporating basic genomic characterizations into phylogenomic analyses.


1997 ◽  
Vol 61 (4) ◽  
pp. 456-502
Author(s):  
J R Brown ◽  
W F Doolittle

Since the late 1970s, determining the phylogenetic relationships among the contemporary domains of life, the Archaea (archaebacteria), Bacteria (eubacteria), and Eucarya (eukaryotes), has been central to the study of early cellular evolution. The two salient issues surrounding the universal tree of life are whether all three domains are monophyletic (i.e., all equivalent in taxonomic rank) and where the root of the universal tree lies. Evaluation of the status of the Archaea has become key to answering these questions. This review considers our cumulative knowledge about the Archaea in relationship to the Bacteria and Eucarya. Particular attention is paid to the recent use of molecular phylogenetic approaches to reconstructing the tree of life. In this regard, the phylogenetic analyses of more than 60 proteins are reviewed and presented in the context of their participation in major biochemical pathways. Although many gene trees are incongruent, the majority do suggest a sisterhood between Archaea and Eucarya. Altering this general pattern of gene evolution are two kinds of potential interdomain gene transferrals. One horizontal gene exchange might have involved the gram-positive Bacteria and the Archaea, while the other might have occurred between proteobacteria and eukaryotes and might have been mediated by endosymbiosis.


Horticulturae ◽  
2021 ◽  
Vol 7 (9) ◽  
pp. 301
Author(s):  
Guanglong Hu ◽  
Yiheng Wang ◽  
Yan Wang ◽  
Shuqi Zheng ◽  
Wenxuan Dong ◽  
...  

Hawthorns (Crataegus L.) are one of the most important processing and table fruits in China, due to their medicinal properties and health benefits. However, the interspecific relationships and evolutionary history of cultivated Crataegus in China remain unclear. Our previously published data showed C. bretschneideri may be derived from the hybridization of C. pinnatifida with C. maximowiczii, and that introgression occurs between C. hupehensis, C. pinnatifida, and C. pinnatifida var. major. In the present study, chloroplast sequences were used to further elucidate the phylogenetic relationships of cultivated Crataegus native to China. The chloroplast genomes of three cultivated species and one related species of Crataegus were sequenced for comparative and phylogenetic analyses. The four chloroplast genomes of Crataegus exhibited typical quadripartite structures and ranged from 159,607 bp (C. bretschneideri) to 159,875 bp (C. maximowiczii) in length. The plastomes of the four species contained 113 genes consisting of 79 protein-coding genes, 30 tRNA genes, and 4 rRNA genes. Six hypervariable regions (ndhC-trnV(UAC)-trnM(CAU), ndhA, atpH-atpI, ndhF, trnR(UCU)-atpA, and ndhF-rpl32), 196 repeats, and a total of 386 simple sequence repeats were detected as potential variability markers for species identification and population genetic studies. In the phylogenomic analyses, we also compared the entire chloroplast genomes of three published Crataegus species: C. hupehensis (MW201730.1), C. pinnatifida (MN102356.1), and C. marshallii (MK920293.1). Our phylogenetic analyses grouped the seven Crataegus taxa into two main clusters. One cluster included C. bretschneideri, C. maximowiczii, and C. marshallii, whereas the other included C. hupehensis, C. pinnatifida, and C. pinnatifida var. major. Taken together, our findings indicate that C. maximowiczii is the maternal origin of C. bretschneideri. This work provides further evidence of introgression between C. hupehensis, C. pinnatifida, and C. pinnatifida var. major, and suggests that C. pinnatifida var. major might have been artificially selected and domesticated from hybrid populations, rather than evolved from C. pinnatifida.


2006 ◽  
Vol 17 (3) ◽  
Author(s):  
Andreas Düring ◽  
Martina Brückner ◽  
Dietrich Mossakowski

Phylogenetic analyses of Chrysocarabus taxa using different markers result in different phylogenetic trees. In particular, the mitochondrial gene tree contradicts the results of morphological and inbreeding studies. Two very different haplotypes of Carabus splendens Olivier, 1790 do not form a clade within this phylogenetic tree. We have earlier proposed that contradictory results are due to introgression. To verify our hypothesis, we analysed the internal transcribed spacer 2. No substitutions were observed in these nuclear sequences between the individuals of Carabus splendens, which contain the different mitochondrial haplotypes in question. The differences in the gene trees based on mitochondrial and nuclear sequences can be explained with at least two introgression events.


2017 ◽  
Author(s):  
Matthew W. Brown ◽  
Aaron Heiss ◽  
Ryoma Kamikawa ◽  
Yuji Inagaki ◽  
Akinori Yabuki ◽  
...  

Recent phylogenetic analyses position certain ‘orphan’ protist lineages deep in the tree of eukaryotic life, but their exact placements are poorly resolved. We conducted phylogenomic analyses that incorporate deeply sequenced transcriptomes from representatives of collodictyonids (diphylleids), rigifilids, Mantamonas and ancyromonads (planomonads). Analyses of 351 genes, using site-heterogeneous mixture models, strongly support a novel supergroup-level clade that includes collodictyonids, rigifilids and Mantamonas, which we name ‘CRuMs’. Further, they robustly place CRuMs as the closest branch to Amorphea (including animals and fungi). Ancyromonads are strongly inferred to be more distantly related to Amorphea than are CRuMs. They emerge either as sister to malawimonads, or as a separate deeper branch. CRuMs and ancyromonads represent two distinct major groups that branch deeply on the lineage that includes animals, near the most commonly inferred root of the eukaryote tree. This makes both groups crucial in examinations of the deepest-level history of extant eukaryotes.


2020 ◽  
Author(s):  
Paul D. Blischak ◽  
Coleen E. Thompson ◽  
Emiko M. Waight ◽  
Laura S. Kubatko ◽  
Andrea D. Wolfe

Reticulate evolutionary events are hallmarks of plant phylogeny, and are increasingly recognized as common occurrences in other branches of the Tree of Life. However, inferring the evolutionary history of admixed lineages presents a difficult challenge for systematists due to genealogical discordance caused by both incomplete lineage sorting (ILS) and hybridization. Methods that accommodate both of these processes are continuing to be developed, but they often do not scale well to larger numbers of species. An additional complicating factor for many plant species is the occurrence of whole genome duplication (WGD), which can have various outcomes on the genealogical history of haplotypes sampled from the genome. In this study, we sought to investigate patterns of hybridization and WGD in two subsections from the genus Penstemon (Plantaginaceae; subsect. Humiles and Proceri), a speciose group of angiosperms that has rapidly radiated across North America. Species in subsect. Humiles and Proceri occur primarily in the Pacific Northwest of the United States, occupying habitats such as mesic, subalpine meadows, as well as more well-drained substrates at varying elevations. Ploidy levels in the subsections range from diploid to hexaploid, and it is hypothesized that most of the polyploids are hybrids (i.e., allopolyploids). To estimate phylogeny in these groups, we first developed a method for estimating quartet concordance factors (QCFs) from multiple sequences sampled per lineage, allowing us to model all haplotypes from a polyploid. QCFs represent the proportion of gene trees that support a particular species quartet relationship, and are used for species network estimation in the program SNaQ (Solís-Lemus & Ané. 2016. PLoS Genet. 12:e1005896). Using phased haplotypes for nuclear amplicons, we inferred species trees and networks for 38 taxa from P. subsect. Humiles and Proceri. Our phylogenetic analyses recovered two clades comprising a mix of taxa from both subsections, indicating that the current taxonomy for these groups is inconsistent with our estimates of phylogeny. In addition, there was little support for hypotheses regarding the formation of putative allopolyploid lineages. Overall, we found evidence for the effects of both ILS and admixture on the evolutionary history of these species, but were able to evaluate our taxonomic hypotheses despite high levels of gene tree discordance. Our method for estimating QCFs from multiple haplotypes also allowed us to include species of varying ploidy levels in our analyses, which we anticipate will help to facilitate estimation of species networks in other plant groups as well.
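The definition of a quartet concordance factor given in this abstract (the proportion of gene trees supporting a particular quartet relationship) can be illustrated with a small sketch. This is not the authors' method, which handles multiple haplotypes per lineage; it is a simplified single-sequence-per-taxon illustration, and the function names and the toy input are hypothetical.

```python
from collections import Counter


def canonical_split(pair_a, pair_b):
    """Return an order-independent label for a quartet split, e.g. 'A,B|C,D'."""
    left = ",".join(sorted(pair_a))
    right = ",".join(sorted(pair_b))
    return "|".join(sorted([left, right]))


def quartet_concordance_factors(gene_tree_splits):
    """Proportion of gene trees supporting each resolution of one quartet.

    gene_tree_splits: one (pair, pair) tuple per gene tree, giving the split
    that the gene tree induces on the four taxa (hypothetical input format).
    """
    counts = Counter(canonical_split(a, b) for a, b in gene_tree_splits)
    total = sum(counts.values())
    return {split: n / total for split, n in counts.items()}


# Toy data: four gene trees restricted to taxa A, B, C, D (illustrative only).
observed = [
    (("A", "B"), ("C", "D")),
    (("B", "A"), ("D", "C")),  # same split, written in a different order
    (("A", "C"), ("B", "D")),
    (("A", "B"), ("C", "D")),
]
qcfs = quartet_concordance_factors(observed)
```

With this toy input, three of the four gene trees support the split A,B|C,D, so its concordance factor is 0.75 and that of A,C|B,D is 0.25.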


2021 ◽  
Vol 8 ◽  
Author(s):  
Nicola Bordin ◽  
Ian Sillitoe ◽  
Jonathan G. Lees ◽  
Christine Orengo

This article is dedicated to the memory of Cyrus Chothia, who was a leading light in the world of protein structure evolution. His elegant analyses of protein families and their mechanisms of structural and functional evolution provided important evolutionary and biological insights and firmly established the value of structural perspectives. He was a mentor and supervisor to many other leading scientists who continued his quest to characterise structure and function space. He was also a generous and supportive colleague to those applying different approaches. In this article we review some of his accomplishments and the history of protein structure classifications, particularly SCOP and CATH. We also highlight some of the evolutionary insights these two classifications have brought. Finally, we discuss how the expansion and integration of protein sequence data into these structural families helps reveal the dark matter of function space and can inform the emergence of novel functions in Metazoa. Since we cover 25 years of structural classification, it has not been feasible to review all structure based evolutionary studies and hence we focus mainly on those undertaken by the SCOP and CATH groups and their collaborators.


2009 ◽  
Vol 19 (2) ◽  
pp. 217-226
Author(s):  
S. M. Minhaz Ud-Dean ◽  
Mahdi Muhammad Moosa

Protein structure prediction and evaluation is one of the major fields of computational biology. Estimation of dihedral angle can provide information about the acceptability of both theoretically predicted and experimentally determined structures. Here we report on the sequence specific dihedral angle distribution of high resolution protein structures available in PDB and have developed Sasichandran, a tool for sequence specific dihedral angle prediction and structure evaluation. This tool will allow evaluation of a protein structure in pdb format from the sequence specific distribution of Ramachandran angles. Additionally, it will allow retrieval of the most probable Ramachandran angles for a given sequence along with the sequence specific data.
Key words: Torsion angle, φ-ψ distribution, sequence specific ramachandran plot, Ramasekharan, protein structure appraisal
D.O.I. 10.3329/ptcb.v19i2.5439 Plant Tissue Cult. & Biotech. 19(2): 217-226, 2009 (December)


2020 ◽  
Vol 15 (7) ◽  
pp. 732-740
Author(s):  
Neetu Kumari ◽  
Anshul Verma

Background: Proteins are the basic building blocks of the body; they are complex systems whose structure plays a key role in activation, catalysis, messaging, and disease states. Therefore, careful investigation of protein structure is necessary for diagnosing diseases and for drug design. Protein structures are described at different levels of complexity: primary (chain), secondary (helical), tertiary (3D), and quaternary structure. Analyzing the complex 3D structure of a protein is difficult, but the structure can be analyzed as a network of interconnections between its components, where amino acids are considered nodes and the interconnections between them are edges. Objective: Many studies have shown that the small-world network concept provides new opportunities to investigate networks of biological systems. The objective of this paper is to analyze protein structure using the small-world concept. Methods: Protein is analyzed using the small-world network concept, specifically where the extreme condition is a degree distribution that follows a power law. To verify the proposed approach, a dataset of the Oncogene protein structure is analyzed using Python programming. Results: The protein structure is plotted as a network of amino acids (Residue Interaction Graph (RIG)) using the distance matrix of nodes with a given threshold; various centrality measures (i.e., degree distribution, Degree-Betweenness correlation, and Betweenness-Closeness correlation) are then calculated for 1323 nodes, and graphs are plotted. Conclusion: Ultimately, it is concluded that hubs with higher centrality degree exist but are few in number, and they are expected to be robust toward harmful effects of mutations with new functions.
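The Results step described above (building a residue interaction graph from a distance matrix with a threshold, then computing a degree distribution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the coordinates are made-up points standing in for one representative atom per residue, the 7.0 cutoff is an arbitrary illustrative value, and both function names are hypothetical.

```python
import math
from collections import Counter


def residue_interaction_graph(coords, threshold):
    """Connect residues i and j when their distance is within the threshold.

    coords: list of (x, y, z) tuples, one per residue (assumed input format).
    Returns an adjacency mapping: residue index -> set of neighbour indices.
    """
    n = len(coords)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def degree_distribution(adjacency):
    """Count how many residues have each degree (number of neighbours)."""
    return Counter(len(neighbours) for neighbours in adjacency.values())


# Toy coordinates (illustrative only): three nearby residues and one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
rig = residue_interaction_graph(toy_coords, threshold=7.0)
degrees = degree_distribution(rig)
```

In this toy case the first three residues are mutually connected (degree 2 each) and the fourth is isolated (degree 0). Betweenness and closeness would need shortest-path computations, for which a graph library is the more practical choice.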

