Effectiveness of phylogenomic data and coalescent species-tree methods for resolving difficult nodes in the phylogeny of advanced snakes (Serpentes: Caenophidia)

Impact of Ghost Introgression on Coalescent-based Species Tree Inference and Estimation of Divergence Time

10.1101/2022.01.11.475787 ◽

2022 ◽

Author(s):

XiaoXu Pang ◽

Da-Yong Zhang

Keyword(s):

Incomplete Lineage Sorting ◽

Divergence Time ◽

Species Tree ◽

Gene Trees ◽

Species Trees ◽

Lineage Sorting ◽

Multispecies Coalescent ◽

Tree Inference ◽

Tree Methods ◽

The Impact

The species studied in any evolutionary investigation generally constitute a very small proportion of all the species currently existing or that have gone extinct. It is therefore likely that introgression, which is widespread across the tree of life, involves "ghosts," i.e., unsampled, unknown, or extinct lineages. However, the impact of ghost introgression on estimations of species trees has been rarely studied and is thus poorly understood. In this study, we use mathematical analysis and simulations to examine the robustness of species tree methods based on a multispecies coalescent model under gene flow sourcing from an extant or ghost lineage. We found that very low levels of extant or ghost introgression can result in anomalous gene trees (AGTs) on three-taxon rooted trees if accompanied by strong incomplete lineage sorting (ILS). In contrast, even massive introgression, with more than half of the recipient genome descending from the donor lineage, may not necessarily lead to AGTs. In cases involving an ingroup lineage (defined as one that diverged no earlier than the most basal species under investigation) acting as the donor of introgression, the time of root divergence among the investigated species was either underestimated or remained unaffected, but for the cases of outgroup ghost lineages acting as donors, the divergence time was generally overestimated. Under many conditions of ingroup introgression, the stronger the ILS was, the higher was the accuracy of estimating the time of root divergence, although the topology of the species tree is more prone to be biased by the effect of introgression.
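
The interplay between ILS and introgression described in this abstract can be illustrated with a minimal sketch. It is not the authors' model. It assumes a three-taxon species tree ((A,B),C), one pulse of introgression from the extant lineage C into B affecting a fraction `gamma` of loci, one sampled sequence per species, and equal constant population sizes. `T` is the internal branch length in coalescent units, and `T_i` (hypothetical parameter name) is the time between the introgression event and the root, also in coalescent units, so `T_i >= T`.

```python
import math


def three_taxon_gene_tree_probs(T, gamma=0.0, T_i=None):
    """Gene-tree topology probabilities for species tree ((A,B),C).

    Simplified mixture (an assumption, not the paper's full model):
    a fraction (1 - gamma) of loci follow the species tree with internal
    branch T; a fraction gamma follow the introgression history, in which
    the B and C lineages share a population for T_i coalescent units.
    """
    if T < 0 or not 0.0 <= gamma <= 1.0:
        raise ValueError("need T >= 0 and 0 <= gamma <= 1")
    if gamma > 0 and (T_i is None or T_i < T):
        raise ValueError("T_i must be given and satisfy T_i >= T")
    if T_i is None:
        T_i = T  # irrelevant when gamma == 0

    # Standard three-taxon coalescent result: each discordant topology has
    # probability exp(-t)/3, where t is the length of the shared branch.
    disc = math.exp(-T) / 3.0
    conc = 1.0 - 2.0 * disc
    disc_i = math.exp(-T_i) / 3.0
    conc_i = 1.0 - 2.0 * disc_i

    return {
        "((A,B),C)": (1 - gamma) * conc + gamma * disc_i,
        "((B,C),A)": (1 - gamma) * disc + gamma * conc_i,
        "((A,C),B)": (1 - gamma) * disc + gamma * disc_i,
    }


def is_anomalous(probs, species_tree="((A,B),C)"):
    """True if the most probable gene tree differs from the species tree."""
    return max(probs, key=probs.get) != species_tree


# Strong ILS (short internal branch) with only 5% introgression
print(is_anomalous(three_taxon_gene_tree_probs(T=0.01, gamma=0.05, T_i=1.0)))
# Weak ILS (long internal branch) with the same 5% introgression
print(is_anomalous(three_taxon_gene_tree_probs(T=2.0, gamma=0.05, T_i=3.0)))
```

Under these assumptions the most probable gene tree departs from the species tree only when the internal branch is short, which matches the qualitative statement about low introgression combined with strong ILS. Ghost donors and unequal population sizes would require a richer model than this sketch.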


Do Alignment and Trimming Methods Matter for Phylogenomic (UCE) Analyses?

Systematic Biology ◽

10.1093/sysbio/syaa064 ◽

2020 ◽

Cited By ~ 1

Author(s):

Daniel M Portik ◽

John J Wiens

Keyword(s):

Missing Data ◽

Molecular Phylogenetics ◽

Species Tree ◽

Sequence Length ◽

Data Sets ◽

Full Data ◽

Tree Methods ◽

Phylogenomic Analyses ◽

Alignment Errors ◽

The Impact

Abstract Alignment is a crucial issue in molecular phylogenetics because different alignment methods can potentially yield very different topologies for individual genes. But it is unclear if the choice of alignment methods remains important in phylogenomic analyses, which incorporate data from hundreds or thousands of genes. For example, problematic biases in alignment might be multiplied across many loci, whereas alignment errors in individual genes might become irrelevant. The issue of alignment trimming (i.e., removing poorly aligned regions or missing data from individual genes) is also poorly explored. Here, we test the impact of 12 different combinations of alignment and trimming methods on phylogenomic analyses. We compare these methods using published phylogenomic data from ultraconserved elements (UCEs) from squamate reptiles (lizards and snakes), birds, and tetrapods. We compare the properties of alignments generated by different alignment and trimming methods (e.g., length, informative sites, missing data). We also test whether these data sets can recover well-established clades when analyzed with concatenated (RAxML) and species-tree methods (ASTRAL-III), using the full data ($\sim$5000 loci) and subsampled data sets (10% and 1% of loci). We show that different alignment and trimming methods can significantly impact various aspects of phylogenomic data sets (e.g., length, informative sites). However, these different methods generally had little impact on the recovery and support values for well-established clades, even across very different numbers of loci. Nevertheless, our results suggest several “best practices” for alignment and trimming. Intriguingly, the choice of phylogenetic methods impacted the phylogenetic results most strongly, with concatenated analyses recovering significantly more well-established clades (with stronger support) than the species-tree analyses.
[Alignment; concatenated analysis; phylogenomics; sequence length heterogeneity; species-tree analysis; trimming]
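
The subsampled data sets mentioned in this abstract (10% and 1% of loci) could be drawn as in the sketch below. This is an illustration only: the abstract does not say how loci were selected, so uniform random sampling without replacement, the fixed seed, and the `uce-<n>` locus names are all assumptions.

```python
import random


def subsample_loci(locus_ids, fraction, seed=0):
    """Return a reproducible random subset of locus identifiers.

    Assumes uniform sampling without replacement; the published study
    may have used a different scheme.
    """
    locus_ids = list(locus_ids)
    if not locus_ids:
        raise ValueError("no loci to sample from")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(len(locus_ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    return sorted(rng.sample(locus_ids, n_keep))


# Hypothetical locus names standing in for roughly 5000 UCE alignments
all_loci = [f"uce-{i}" for i in range(5000)]
ten_percent = subsample_loci(all_loci, 0.10)
one_percent = subsample_loci(all_loci, 0.01)
print(len(ten_percent), len(one_percent))
```

Fixing the seed makes each subset repeatable, so the same loci can be passed to both the concatenated and the species-tree analysis.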


Molecular Systematics of the Deep-Sea Hydrothermal Vent Endemic Brachyuran Family Bythograeidae: A Comparison of Three Bayesian Species Tree Methods

PLoS ONE ◽

10.1371/journal.pone.0032066 ◽

2012 ◽

Vol 7 (3) ◽

pp. e32066 ◽

Cited By ~ 19

Author(s):

Mariana Mateos ◽

Luis A. Hurtado ◽

Carlos A. Santamaria ◽

Vincent Leignel ◽

Danièle Guinot

Keyword(s):

Molecular Systematics ◽

Deep Sea ◽

Hydrothermal Vent ◽

Species Tree ◽

Tree Methods


Phylogenetic conflicts, combinability, and deep phylogenomics in plants

10.1101/371930 ◽

2018 ◽

Cited By ~ 1

Author(s):

Stephen A. Smith ◽

Nathanael Walker-Hale ◽

Joseph F. Walker ◽

Joseph W. Brown

Keyword(s):

Phylogenetic Relationships ◽

Phylogenetic Signal ◽

Gene Tree ◽

Species Tree ◽

Gene Trees ◽

Data Filtering ◽

Tree Inference ◽

Tree Methods ◽

Inference Methods ◽

Species Tree Inference

Abstract Studies have demonstrated that pervasive gene tree conflict underlies several important phylogenetic relationships where different species tree methods produce conflicting results. Here, we present a means of dissecting the phylogenetic signal for alternative resolutions within a dataset in order to resolve recalcitrant relationships and, importantly, identify what the dataset is unable to resolve. These procedures extend upon methods for isolating conflict and concordance involving specific candidate relationships and can be used to identify systematic error and disambiguate sources of conflict among species tree inference methods. We demonstrate these on a large phylogenomic plant dataset. Our results support the placement of Amborella as sister to the remaining extant angiosperms, Gnetales as sister to pines, and the monophyly of extant gymnosperms. Several other contentious relationships, including the resolution of relationships within the bryophytes and the eudicots, remain uncertain given the low number of supporting gene trees. To address whether concatenation of filtered genes amplified phylogenetic signal for relationships, we implemented a combinatorial heuristic to test combinability of genes. We found that nested conflicts limited the ability of data filtering methods to fully ameliorate conflicting signal amongst gene trees. These analyses confirmed that the underlying conflicting signal does not support broad concatenation of genes. Our approach provides a means of dissecting a specific dataset to address deep phylogenetic relationships while also identifying the inferential boundaries of the dataset.


Partitioned coalescence support reveals biases in species-tree methods and detects gene trees that determine phylogenomic conflicts

Molecular Phylogenetics and Evolution ◽

10.1016/j.ympev.2019.106539 ◽

2019 ◽

Vol 139 ◽

pp. 106539 ◽

Cited By ~ 10

Author(s):

John Gatesy ◽

Daniel B. Sloan ◽

Jessica M. Warren ◽

Richard H. Baker ◽

Mark P. Simmons ◽

...

Keyword(s):

Species Tree ◽

Gene Trees ◽

Tree Methods


A time-calibrated, multi-locus phylogeny of piranhas and pacus (Characiformes: Serrasalmidae) and a comparison of species tree methods

Molecular Phylogenetics and Evolution ◽

10.1016/j.ympev.2014.06.018 ◽

2014 ◽

Vol 81 ◽

pp. 242-257 ◽

Cited By ~ 38

Author(s):

Andrew W. Thompson ◽

Ricardo Betancur-R. ◽

Hernán López-Fernández ◽

Guillermo Ortí

Keyword(s):

Species Tree ◽

Tree Methods


Complexity of the simplest species tree problem

Molecular Biology and Evolution ◽

10.1093/molbev/msab009 ◽

2021 ◽

Author(s):

Tianqi Zhu ◽

Ziheng Yang

Keyword(s):

Maximum Likelihood ◽

Estimation Error ◽

Gene Tree ◽

Species Tree ◽

Ancestral Population ◽

Statistical Efficiency ◽

Multispecies Coalescent ◽

Full Likelihood ◽

Tree Methods ◽

Tree Estimation

Abstract The multispecies coalescent (MSC) model provides a natural framework for species tree estimation accounting for gene-tree conflicts. While a number of species tree methods under the MSC have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood (ISML) and maximum likelihood (ML). We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes while these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.
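
The probit relationship stated in this abstract can be written as Φ⁻¹(error) ≈ a − b·√L, where L is the number of loci. The sketch below turns that into a small calculator. The `intercept` (a) and `slope` (b) values are hypothetical placeholders: in practice they would depend on the method and on the species-tree parameters, and none are taken from the paper.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def predicted_error(n_loci, intercept, slope):
    """Species-tree error probability implied by a linear probit model.

    probit(error) = intercept - slope * sqrt(n_loci), with slope > 0.
    """
    return _STD_NORMAL.cdf(intercept - slope * math.sqrt(n_loci))


def loci_needed(target_error, intercept, slope):
    """Smallest number of loci whose predicted error is <= target_error."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    z = _STD_NORMAL.inv_cdf(target_error)
    root = (intercept - z) / slope
    return math.ceil(root ** 2) if root > 0 else 0


# Hypothetical coefficients, for illustration only
a, b = -0.2, 0.1
print(predicted_error(100, a, b), predicted_error(400, a, b))
print(loci_needed(0.05, a, b))
```

Because the error depends on √L, a method with half the slope needs about four times as many loci to reach the same error once the intercept term is negligible, which is one way to read the efficiency differences the abstract reports.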
