So many genes, so little time: a practical approach to divergence-time estimation in the genomic era

2017 ◽  
Author(s):  
Stephen A. Smith ◽  
Joseph W. Brown ◽  
Joseph F. Walker

Abstract Phylogenomic datasets have been successfully used to address questions involving evolutionary relationships, patterns of genome structure, signatures of selection, and gene and genome duplications. However, despite the recent explosion in genomic and transcriptomic data, the utility of these data sources for efficient divergence-time inference remains unexamined. Phylogenomic datasets pose two distinct problems for divergence-time estimation: (i) the volume of data makes inference of the entire dataset intractable, and (ii) the extent of underlying topological and rate heterogeneity across genes makes model mis-specification a real concern. “Gene shopping”, wherein a phylogenomic dataset is winnowed to a set of genes with desirable properties, represents an alternative approach that holds promise in alleviating these issues. We implemented an approach for phylogenomic datasets (available in SortaDate) that filters genes by three criteria: (i) clock-likeness, (ii) reasonable tree length (i.e., discernible information content), and (iii) least topological conflict with a focal species tree (presumed to have already been inferred). Such a winnowing procedure ensures that errors associated with model (both clock and topology) mis-specification are minimized, therefore reducing error in divergence-time estimation. We demonstrated the efficacy of this approach through simulation and applied it to published animal (Aves, Diplopoda, and Hymenoptera) and plant (carnivorous Caryophyllales, broad Caryophyllales, and Vitales) phylogenomic datasets. By quantifying rate heterogeneity across both genes and lineages we found that every empirical dataset examined included genes with clock-like, or nearly clock-like, behavior. Moreover, many datasets had genes that were clock-like, exhibited reasonable evolutionary rates, and were mostly compatible with the species tree. We identified overlap in age estimates when analyzing these filtered genes under strict clock and uncorrelated lognormal (UCLN) models. However, this overlap was often due to imprecise estimates from the UCLN model. We find that “gene shopping” can be an efficient approach to divergence-time inference for phylogenomic datasets that may otherwise be characterized by extensive gene tree heterogeneity.
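The three filtering criteria named in the abstract can be illustrated with a minimal sketch. This is not SortaDate's implementation: the `GeneStats` fields, the tree-length bounds, and the sort order (concordance first, then root-to-tip variance, then tree length) are all assumptions made for illustration, and the per-gene statistics are assumed to have been computed beforehand from the gene trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneStats:
    """Hypothetical per-gene summary; all fields are assumed precomputed."""
    name: str
    root_to_tip_var: float  # lower variance = more clock-like
    tree_length: float      # total branch length of the gene tree
    concordance: float      # fraction of species-tree bipartitions recovered (0..1)


def shop_genes(genes, min_len=0.1, max_len=10.0, top_n=3):
    """Return the top_n genes after filtering and ranking.

    Genes whose tree length falls outside [min_len, max_len] are dropped
    (assumed bounds for "reasonable" information content). The rest are
    ranked by an assumed priority: most concordant first, then most
    clock-like, then longest tree.
    """
    kept = [g for g in genes if min_len <= g.tree_length <= max_len]
    kept.sort(key=lambda g: (-g.concordance, g.root_to_tip_var, -g.tree_length))
    return kept[:top_n]
```

Changing the tuple passed as the sort key changes which criterion dominates; the ordering used here is only one plausible choice.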

2021 ◽  
Vol 12 ◽  
Author(s):  
Jeffrey P. Rose ◽  
Ricardo Kriebel ◽  
Larissa Kahan ◽  
Alexa DiNicola ◽  
Jesús G. González-Gallegos ◽  
...  

Next-generation sequencing technologies have facilitated new phylogenomic approaches to help clarify previously intractable relationships while simultaneously highlighting the pervasive nature of incongruence within and among genomes that can complicate definitive taxonomic conclusions. Salvia L., with ∼1,000 species, makes up nearly 15% of the species diversity in the mint family and has attracted great interest from biologists across subdisciplines. Despite the great progress that has been achieved in discerning the placement of Salvia within Lamiaceae and in clarifying its infrageneric relationships through plastid, nuclear ribosomal, and nuclear single-copy genes, the incomplete resolution has left open major questions regarding the phylogenetic relationships among and within the subgenera, as well as to what extent the infrageneric relationships differ across genomes. We expanded a previously published anchored hybrid enrichment dataset of 35 exemplars of Salvia to 179 terminals. We also reconstructed nearly complete plastomes for these samples from off-target reads. We used these data to examine the concordance and discordance among the nuclear loci and between the nuclear and plastid genomes in detail, elucidating both broad-scale and species-level relationships within Salvia. We found that despite the widespread gene tree discordance, nuclear phylogenies reconstructed using concatenated, coalescent, and network-based approaches recover a common backbone topology. Moreover, all subgenera, except for Audibertia, are strongly supported as monophyletic in all analyses. The plastome genealogy is largely resolved and is congruent with the nuclear backbone. However, multiple analyses suggest that incomplete lineage sorting does not fully explain the gene tree discordance. Instead, horizontal gene flow has been important in both the deep and more recent history of Salvia. Our results provide a robust species tree of Salvia across phylogenetic scales and genomes. Future comparative analyses in the genus will need to account for the impacts of hybridization/introgression and incomplete lineage sorting in topology and divergence time estimation.


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Yan Du ◽  
Shaoyuan Wu ◽  
Scott V. Edwards ◽  
Liang Liu

Abstract Background: The flood of genomic data to help build and date the tree of life requires automation at several critical junctures, most importantly during sequence assembly and alignment. It is widely appreciated that automated alignment protocols can yield inaccuracies, but the relative impact of various sources of error on phylogenomic analysis is not yet known. This study employs an updated mammal data set of 5162 coding loci sampled from 90 species to evaluate the effects of alignment uncertainty, substitution models, and fossil priors on gene tree, species tree, and divergence time estimation. Additionally, a novel coalescent likelihood ratio test is introduced for comparing competing species trees against a given set of gene trees. Results: The aligned DNA sequences of 5162 loci from 90 species were trimmed and filtered using trimAL and two filtering protocols. The final dataset contains 4 sets of alignments: before trimming, after trimming, filtered by a recently proposed pipeline, and further filtered by comparing ML gene trees for each locus with the concatenation tree. Our analyses suggest that the average discordance among the coalescent trees is significantly smaller than that among the concatenation trees estimated from the 4 sets of alignments or with different substitution models. There is no significant difference among the divergence times estimated with different substitution models. However, the divergence dates estimated from the alignments after trimming are more recent than those estimated from the alignments before trimming. Conclusions: Our results highlight that alignment uncertainty of the updated mammal data set and the choice of substitution models have little impact on tree topologies yielded by coalescent methods for species tree estimation, whereas they are more influential on the trees made by concatenation. Given the choice of calibration scheme and clock models, divergence time estimates are robust to the choice of substitution models, but removing alignments deemed problematic by trimming algorithms can lead to more recent dates. Although the fossil prior is important in divergence time estimation, Bayesian estimates of divergence times in this data set are driven primarily by the sequence data.


2017 ◽  
Author(s):  
Madlen Stange ◽  
Marcelo R. Sánchez-Villagra ◽  
Walter Salzburger ◽  
Michael Matschiner

Abstract The closure of the Isthmus of Panama has long been considered to be one of the best defined biogeographic calibration points for molecular divergence-time estimation. However, geological and biological evidence has recently cast doubt on the presumed timing of the initial isthmus closure around 3 Ma but has instead suggested the existence of temporary land bridges as early as the Middle or Late Miocene. The biological evidence supporting these earlier land bridges was based either on only a few molecular markers or on concatenation of genome-wide sequence data, an approach that is known to result in potentially misleading branch lengths and divergence times, which could compromise the reliability of this evidence. To allow divergence-time estimation with genomic data using the more appropriate multi-species coalescent model, we here develop a new method combining the SNP-based Bayesian species-tree inference of the software SNAPP with a molecular clock model that can be calibrated with fossil or biogeographic constraints. We validate our approach with simulations and use our method to reanalyze genomic data of Neotropical army ants (Dorylinae) that previously supported divergence times of Central and South American populations before the isthmus closure around 3 Ma. Our reanalysis with the multi-species coalescent model shifts all of these divergence times to ages younger than 3 Ma, suggesting that the older estimates supporting the earlier existence of temporary land bridges were artifacts resulting at least partially from the use of concatenation. We then apply our method to a new RAD-sequencing data set of Neotropical sea catfishes (Ariidae) and calibrate their species tree with extensive information from the fossil record. We identify a series of divergences between groups of Caribbean and Pacific sea catfishes around 10 Ma, indicating that processes related to the emergence of the isthmus led to vicariant speciation already in the Late Miocene, millions of years before the final isthmus closure.


2021 ◽  
Author(s):  
Sebastian Hoehna ◽  
Sarah E Lower ◽  
Pablo Duchen ◽  
Ana Catalan

Fireflies (Coleoptera: Lampyridae) consist of over 2,000 described extant species. A well-resolved phylogeny of fireflies is important for the study of their bioluminescence, evolution, and conservation. We used a recently published anchored hybrid enrichment dataset (AHE; 436 loci for 88 Lampyridae species and 10 outgroup species) and state-of-the-art statistical methods (the fossilized birth-death-range process implemented in a Bayesian framework) to estimate a time-calibrated phylogeny of Lampyridae. Unfortunately, estimating calibrated phylogenies using AHE and the latest and most robust time-calibration strategies is not possible because of computational constraints. As a solution, we subset the full dataset and applied three different strategies: using the most complete loci, the most homogeneous loci, and the loci with the highest accuracy to infer the well-established Photinus clade. The estimated topology using the three data subsets agreed on almost all major clades and only showed minor discordance with less supported nodes. The estimated divergence times overlapped for all nodes that are shared between the topologies. Thus, divergence time estimation is robust as long as the topology inference is robust and any well-selected data subset suffices. Additionally, we observed an unexpected amount of gene tree discordance among the 436 AHE loci. Our assessment of model adequacy showed that standard phylogenetic substitution models are not adequate for any of the 436 AHE loci, which is likely to bias phylogenetic inferences. We performed a simulation study to explore the impact of (a) incomplete lineage sorting, (b) uniformly distributed and systematic missing data, and (c) systematic bias in the position of highly variable and conserved sites. For our simulated data, we observed less gene tree variation and hence the empirically observed amount of gene tree discordance for the AHE dataset is unexpected.
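Of the three subsetting strategies mentioned, "most complete loci" is the simplest to illustrate. The sketch below is an assumption about how completeness might be scored, not the authors' procedure: each locus is a hypothetical mapping from taxon name to aligned sequence, the characters `-`, `?`, and `N` are treated as missing, and a taxon absent from a locus counts as entirely missing.

```python
MISSING = set("-?Nn")  # characters treated as missing data (an assumption)


def locus_completeness(alignment, all_taxa):
    """Fraction of non-missing characters across all expected taxa.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    all_taxa:  every taxon in the study; absent taxa count as fully missing.
    """
    if not alignment or not all_taxa:
        return 0.0
    length = len(next(iter(alignment.values())))
    if length == 0:
        return 0.0
    present = sum(
        1
        for taxon in all_taxa
        for ch in alignment.get(taxon, "")
        if ch not in MISSING
    )
    return present / (length * len(all_taxa))


def most_complete_loci(loci, all_taxa, k):
    """Return the names of the k most complete loci (ties broken by name)."""
    ranked = sorted(
        loci,
        key=lambda name: (-locus_completeness(loci[name], all_taxa), name),
    )
    return ranked[:k]
```

Counting absent taxa as missing data penalizes loci with poor taxon sampling as well as gappy alignments; scoring only the taxa that are present would be an equally defensible choice.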


AoB Plants ◽  
2021 ◽  
Author(s):  
Min-Jie Li ◽  
Huan-Xi Yu ◽  
Xian-Lin Guo ◽  
Xing-Jin He

Abstract The disjunct distribution (Europe-Caucasus-Asia) and species diversification across Eurasia of Allium sect. Daghestanica make it attractive to researchers aiming to understand the development and history of the modern Eurasian flora. However, no studies have addressed the evolutionary history of this section. Based on nrITS and cpDNA fragments (trnL-trnF and rpl32-trnL), we reconstructed the evolutionary history of the third evolutionary line (EL3) of the genus Allium and, against this background, further elucidated the evolutionary line of sect. Daghestanica. Our molecular phylogeny recovered two highly supported clades in sect. Daghestanica: Clade I includes the Caucasian-European species and the Asian A. maowenense, A. xinlongense and A. carolinianum collected in Qinghai; Clade II comprises the Asian species with yellowish tepals, A. chrysanthum, A. chrysocephalum, A. herderianum, A. rude and A. xichuanense. The divergence time estimation and biogeographic inference indicated that an Asian ancestor located in the QTP and the adjacent region could have migrated to the Caucasus and Europe around the Late Miocene, resulting in further divergence and speciation; the Asian ancestor underwent rapid radiation in the QTP and the adjacent region, most likely due to the heterogeneous ecology of the QTP resulting from the orogenies around 4–3 Mya. Our study provides a picture of the origin and species diversification across Eurasia of sect. Daghestanica.


Mycologia ◽  
2018 ◽  
Vol 110 (3) ◽  
pp. 526-545 ◽  
Author(s):  
Debora Cervieri Guterres ◽  
Samuel Galvão-Elias ◽  
Bruno Cézar Pereira de Souza ◽  
Danilo Batista Pinho ◽  
Maria do Desterro Mendes dos Santos ◽  
...  

PLoS ONE ◽  
2019 ◽  
Vol 14 (5) ◽  
pp. e0217959 ◽  
Author(s):  
Hussam Zaher ◽  
Robert W. Murphy ◽  
Juan Camilo Arredondo ◽  
Roberta Graboski ◽  
Paulo Roberto Machado-Filho ◽  
...  
