Relative model selection of evolutionary substitution models can be sensitive to multiple sequence alignment uncertainty

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Stephanie J. Spielman ◽  
Molly L. Miraglia

Abstract Background Multiple sequence alignments (MSAs) represent the fundamental unit of data input to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored. Results We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances, relative model selection identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data than for amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA. Conclusions We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines to a greater extent than has previously been recognized, including relative model selection.

2021 ◽  
Author(s):  
Stephanie J Spielman ◽  
Molly Miraglia

Multiple sequence alignments (MSAs) represent the fundamental unit of data input to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored. In this study, we assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances, relative model selection identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data than for amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA. We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines to a greater extent than has previously been recognized, including relative model selection.


2009 ◽  
Vol 58 (1) ◽  
pp. 130-145 ◽  
Author(s):  
Alan R. Lemmon ◽  
Jeremy M. Brown ◽  
Kathrin Stanger-Hall ◽  
Emily Moriarty Lemmon

Abstract Although an increasing number of phylogenetic data sets are incomplete, the effect of ambiguous data on phylogenetic accuracy is not well understood. We use 4-taxon simulations to study the effects of ambiguous data (i.e., missing characters or gaps) in maximum likelihood (ML) and Bayesian frameworks. By introducing ambiguous data in a way that removes confounding factors, we provide the first clear understanding of 1 mechanism by which ambiguous data can mislead phylogenetic analyses. We find that in both ML and Bayesian frameworks, among-site rate variation can interact with ambiguous data to produce misleading estimates of topology and branch lengths. Furthermore, within a Bayesian framework, priors on branch lengths and rate heterogeneity parameters can exacerbate the effects of ambiguous data, resulting in strongly misleading bipartition posterior probabilities. The magnitude and direction of the ambiguous data bias are a function of the number and taxonomic distribution of ambiguous characters, the strength of topological support, and whether or not the model is correctly specified. The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis.


2019 ◽  
Vol 187 (2) ◽  
pp. 378-412 ◽  
Author(s):  
Fabiana Criste Massariol ◽  
Daniela Maeda Takiya ◽  
Frederico Falcão Salles

Abstract Oligoneuriidae is a Pantropical family of Ephemeroptera, with 68 species described in 12 genera. Three subfamilies are recognized: Chromarcyinae, with a single species from East Asia; Colocrurinae, with two fossil species from Brazil; and Oligoneuriinae, with the remaining species distributed in the Neotropical, Nearctic, Afrotropical and Palaearctic regions. Phylogenetic and biogeographical analyses were performed for the family based on 2762 characters [73 morphological and 2689 molecular (COI, 16S, 18S and 28S)]. Four major groups were recovered in all analyses (parsimony, maximum likelihood and Bayesian inference), and they were assigned to tribal level, namely Oligoneuriini, Homoeoneuriini trib. nov., Oligoneuriellini trib. nov. and Elassoneuriini trib. nov. In addition, Yawari and Madeconeuria were elevated to genus level. According to Statistical Dispersal-Vicariance (S-DIVA), Dispersal Extinction Cladogenesis (DEC) and divergence time estimation analyses, Oligoneuriidae originated ~150 Mya in the Gondwanan supercontinent, but was probably restricted to the currently delimited Neotropical region. The initial divergence of Oligoneuriidae involved a range expansion to Oriental and Afrotropical areas, sometime between 150 and 118 Mya. At ~118 Mya, the family started its diversification, reaching the Nearctic through dispersal from the Neotropical region and the Palaearctic and Madagascar from the Afrotropical region.


2022 ◽  
Vol 9 ◽  
Author(s):  
Jordan R Brock ◽  
Terezie Mandáková ◽  
Michael McKain ◽  
Martin A Lysak ◽  
Kenneth M Olsen

Abstract The genus Camelina (Brassicaceae) comprises 7–8 diploid, tetraploid, and hexaploid species. Of particular agricultural interest is the biofuel crop, C. sativa (gold-of-pleasure or false flax), an allohexaploid domesticated from the widespread weed, C. microcarpa. Recent cytogenetics and genomics work has uncovered the identity of the parental diploid species involved in ancient polyploidization events in Camelina. However, little is known about the maternal subgenome ancestry of contemporary polyploid species. To determine the diploid maternal contributors of polyploid Camelina lineages, we sequenced and assembled 84 Camelina chloroplast genomes for phylogenetic analysis. Divergence time estimation was used to infer the timing of polyploidization events. Chromosome counts were also determined for 82 individuals to assess ploidy and cytotypic variation. Chloroplast genomes showed minimal divergence across the genus, with no observed gene-loss or structural variation. Phylogenetic analyses revealed C. hispida as a maternal diploid parent to the allotetraploid Camelina rumelica, and C. neglecta as the closest extant diploid contributor to the allohexaploids C. microcarpa and C. sativa. The tetraploid C. rumelica appears to have evolved through multiple independent hybridization events. Divergence times for polyploid lineages closely related to C. sativa were all inferred to be very recent, at only ~65 thousand years ago. Chromosome counts confirm that there are two distinct cytotypes within C. microcarpa (2n = 38 and 2n = 40). Based on these findings and other recent research, we propose a model of Camelina subgenome relationships representing our current understanding of the hybridization and polyploidization history of this recently-diverged genus.


2020 ◽  
Vol 41 (1) ◽  
pp. 87-103 ◽  
Author(s):  
Ivan Prates ◽  
Paulo Roberto Melo-Sampaio ◽  
Kevin de Queiroz ◽  
Ana Carolina Carnaval ◽  
Miguel Trefaut Rodrigues ◽  
...  

Abstract Recent biological discoveries have changed our understanding of the distribution and evolution of neotropical biotas. In the Brazilian Atlantic Forest, the discovery of closely related species isolated on distant mountains has led to the hypothesis that the ancestors of montane species occupied and dispersed through lowland regions during colder periods. This process may explain the distribution of an undescribed Anolis lizard species that we recently discovered at a montane site in the Serra dos Órgãos National Park, a popular tourist destination close to the city of Rio de Janeiro. To investigate whether this species is closely related to other Atlantic Forest montane anoles, we implement phylogenetic analyses and divergence time estimation based on molecular data. We infer the new species nested within the Dactyloa clade of Anolis, forming a clade with A. nasofrontalis and A. pseudotigrinus, two species restricted to montane sites about 400 km northeast of Serra dos Órgãos. The new species diverged from its sister A. nasofrontalis around 5.24 mya, suggesting a cold-adapted lowland ancestor during the early Pliocene. Based on the phylogenetic results, we emend the definitions of the series taxa within Dactyloa, recognizing a clade containing the new species and several of its relatives as the nasofrontalis series. Lastly, we provide morphological data supporting the recognition of the new species and give it a formal scientific name. Future studies are necessary to assess how park visitors, pollutants, and shrinking montane habitats due to climate change will affect this previously overlooked anole species.


2018 ◽  
Vol 19 (1) ◽  
pp. 303-310 ◽  
Author(s):  
FITRA ARYA DWI NUGRAHA ◽  
FATCHIYAH FATCHIYAH ◽  
NIA KURNIAWAN ◽  
ERIC NELSON SMITH

Nugraha FAD, Fatchiyah F, Smith EN, Kurniawan N. 2018. Phylogenetic analysis of colubrid snakes based on 12S rDNA reveals distinct lineages of Dendrelaphis pictus (Gmelin, 1789) populations in Sumatra and Java. Biodiversitas 19: 303-310. The phylogenetic relationship among the major colubrid snakes, particularly those of the subfamily Colubrinae, has been the subject of much debate. In addition, data on the molecular relationships of Sundaland colubrid snakes are limited. This study aimed to examine the relationships among colubrid snakes from Sumatra and Java based on fragments of the 12S rDNA gene. We sequenced 17 specimens of colubrid snakes representing 5 genera and 2 subfamilies: Colubrinae and Ahaetullinae. We used maximum likelihood, maximum parsimony and Bayesian inference methods for inferring phylogenetic relationships. The results of our phylogenetic analyses are in line with previous findings on the separation between Colubrinae and Ahaetullinae. Interestingly, we found two distinct clades of Dendrelaphis pictus with high genetic divergence between them: D. pictus from Sumatra and West Java is separated from the Central and East Java clade. Our divergence time estimation showed that the differentiation between these clades of D. pictus occurred in the late Miocene epoch (8.9 Ma) when Sumatra and Java separated after being inundated in the early Miocene epoch.


Zootaxa ◽  
2009 ◽  
Vol 2216 (1) ◽  
pp. 22-36 ◽  
Author(s):  
JESSICA L. WARE ◽  
JOHN P. SIMAIKA ◽  
MICHAEL J. SAMWAYS

Syncordulia (Odonata: Anisoptera: Libelluloidea) inhabits mostly cool mountainous streams in the Cape Floristic Region of South Africa. It is found at low densities in geographically restricted areas. Syncordulia is endemic to South Africa and, until recently, only two species were known, S. venator (Barnard, 1933) and S. gracilis (Burmeister 1839), both considered Vulnerable by the World Conservation Union (IUCN). Two new species, S. serendipator Dijkstra, Samways & Simaika 2007 and S. legator Dijkstra, Samways & Simaika 2007, were described from previously unrecognized museum specimens and new field collections. Here we corroborate the validity of these two new species using multiple genes and propose intergeneric relationships within Syncordulia. Molecular data from two independent gene fragments (nuclear 28S and ribosomal and cytochrome oxidase subunit I mitochondrial data) were sequenced and/or downloaded from GenBank for 7 libelluloid families, including 12 Syncordulia specimens (2 Syncordulia gracilis, 4 S. serendipator, 2 S. legator and 4 S. venator). The lower libelluloid group GSI (sensu Ware et al. 2007), a diverse group of non-corduliine taxa, is strongly supported as monophyletic. Syncordulia is well supported by both methods of phylogenetic analyses as a monophyletic group deeply nested within the GSI clade. A DIVA biogeographical analysis suggests that the ancestor to the genus Syncordulia may have arisen consequent to the break-up of Gondwana (>120 Mya). Divergence time estimates suggest that Syncordulia diverged well after the breakup of Gondwana, approximately 60 million years ago (Mya), which coincides with the divergence of several Cape fynbos taxa, between 86 and 60 Mya. DIVA analyses suggest that the present distributions of Syncordulia may be the result of dispersal events. We relate these phylogenetic data to the historical biogeography of the genus and to the importance of conservation action.


2004 ◽  
Vol 359 (1450) ◽  
pp. 1477-1483 ◽  
Author(s):  
Thomas J. Near ◽  
Michael J. Sanderson

Estimates of species divergence times using DNA sequence data are playing an increasingly important role in studies of evolution, ecology and biogeography. Most work has centred on obtaining appropriate kinds of data and developing optimal estimation procedures, whereas somewhat less attention has focused on the calibration of divergences using fossils. Case studies with multiple fossil calibration points provide important opportunities to examine the divergence time estimation problem in new ways. We discuss two cross-validation procedures that address different aspects of inference in divergence time estimation. ‘Fossil cross-validation’ is a procedure used to identify the impact of different individual calibrations on overall estimation. This can identify fossils that have an exceptionally large error effect and may warrant further scrutiny. ‘Fossil-based model cross-validation’ is an entirely different procedure that uses fossils to identify the optimal model of molecular evolution in the context of rate smoothing or other inference methods. Both procedures were applied to two recent studies: an analysis of monocot angiosperms with eight fossil calibrations and an analysis of placental mammals with nine fossil calibrations. In each case, fossil calibrations could be ranked from most to least influential, and in one of the two studies, the fossils provided decisive evidence about the optimal molecular evolutionary model.


2021 ◽  
Author(s):  
Xiurong Ke ◽  
Diego F Morales-Briones ◽  
Hongxin Wang ◽  
Qinghui Sun ◽  
Landis B Jacob ◽  
...  

Disjunct distribution patterns and drivers of the Sino-Japanese flora in East Asia have attracted much attention in past decades, and the region also served as an important glacial refuge during the Quaternary glacial period. However, few studies have focused on phylogeography, diversification and the evolution of morphological characters at the genus level with both nuclear and plastid data. Diabelia (Caprifoliaceae) is an East Asian genus with a disjunct distribution across China, Japan and Korea, making it an ideal group in which to explore the mechanisms of speciation and diversification in the East Asian flora. However, the phylogenetic relationships within Diabelia remain elusive, and species delimitation within the genus (three species or four species) is still controversial. In this study, we reconstructed the phylogenetic relationships within Diabelia based on nuclear and cpDNA by using target enrichment and genome skimming approaches, respectively. We found that the main clades within Diabelia were discordant between nuclear and plastid genome trees. Both nuclear and plastid phylogenetic analyses supported five main clades: D. serrata, D. tetrasepala, D. spathulata var. sanguinea, D. spathulata var. stenophylla and D. spathulata var. spathulata. Diabelia tetrasepala was inferred to be the result of a hybridization event from the species network analyses. The results of divergence time estimation and ancestral area reconstruction showed that Diabelia originated in Japan during the early Miocene, with subsequent gene flow between China, Japan and Korea. Overall, our results support the division of Diabelia into five main clades, and this research provides new insights into the speciation process and taxonomy within Diabelia.


2021 ◽  
Author(s):  
Li Luo ◽  
Pan Huang ◽  
Bin Chen ◽  
Tingjing Li

Abstract Background The social wasps Polistes, Ropalidia, and Parapolybia, belonging to the subfamily Polistinae, have clearly different distribution patterns, yet the factors leading to this difference remain unknown. Results In this study, mitochondrial genomes (mitogenomes) of 21 species of these three wasp genera, including 17 newly sequenced ones, were used for phylogenetic analyses. The analyses reveal that both the evolutionary selection pressure on protein-coding genes (PCGs) and gene rearrangement events are related to the corresponding distribution patterns. In addition, our fossil-calibrated divergence time estimation suggests that the diversification of Polistes occurred in the Late Cretaceous (~69 million years ago, Ma), and that of Ropalidia and Parapolybia occurred in the Tertiary (~61 Ma). In view of the divergence time and the history of continental drift, we speculate that Polistes may have spread from Africa to South America via the Atlantic Ocean rather than from Asia to South America. On the other hand, combining divergence time with both past and present-day climate changes, we infer that the Quaternary Ice Ages and temperature could be limiting factors in their present distribution patterns. Conclusions There are obvious differences in the mitochondrial composition of Polistes, Ropalidia, and Parapolybia, which have different distribution ranges. According to the reconstructed time-calibrated framework, we found that climate and continental drift are diffusion-limiting factors for the three genera.

