Disentangling incomplete lineage sorting and introgression to refine species-tree estimates for Lake Tanganyika cichlid fishes

2016 ◽  
Author(s):  
Britta S Meyer ◽  
Michael Matschiner ◽  
Walter Salzburger

Adaptive radiation is thought to be responsible for the evolution of a great portion of the past and present diversity of life. Instances of adaptive radiation, characterized by the rapid emergence of an array of species as a consequence of their adaptation to distinct ecological niches, are important study systems in evolutionary biology. However, because of the rapid lineage formation in these groups, and the occurrence of hybridization between the participating species, it is often difficult to reconstruct the phylogenetic history of species that underwent an adaptive radiation. In this study, we present a novel approach for species-tree estimation in rapidly diversifying lineages, where introgression is known to occur, and apply it to a multimarker dataset containing up to 16 specimens per species for a set of 45 species of East African cichlid fishes (522 individuals in total), with a main focus on the cichlid species flock of Lake Tanganyika. We first identified, using age distributions of most recent common ancestors in individual gene trees, those lineages in our dataset that show strong signatures of past introgression. This led us to formulate three hypotheses of introgression between different lineages of Tanganyika cichlids: the ancestor of Boulengerochromini (or of Boulengerochromini and Bathybatini) received genomic material from the derived H-lineage; the common ancestor of Cyprichromini and Perissodini experienced, in turn, introgression from Boulengerochromini and/or Bathybatini; and the Lake Tanganyika Haplochromini and closely related riverine lineages received genetic material from Cyphotilapiini. We then applied the multispecies coalescent model to estimate the species tree of Lake Tanganyika cichlids, but excluded the lineages involved in these introgression events, as the multispecies coalescent model does not incorporate introgression.
This resulted in a robust species tree, in which the Lamprologini were placed as sister lineage to the H-lineage (including the Eretmodini), and we identified a series of rapid splitting events at the base of the H-lineage. Divergence ages estimated with the multispecies coalescent model were substantially younger than age estimates based on concatenation, and agreed with the geological history of the Great Lakes of East Africa. Finally, we formally tested the three hypotheses of introgression using a likelihood framework, and found strong support for introgression between some of the cichlid tribes of Lake Tanganyika.




Author(s):  
John A Rhodes ◽  
Hector Baños ◽  
Jonathan D Mitchell ◽  
Elizabeth S Allman

Abstract. Summary: MSCquartets is an R package for species tree hypothesis testing, inference of species trees, and inference of species networks under the Multispecies Coalescent model of incomplete lineage sorting and its network analog. Input for these analyses consists of collections of metric or topological locus trees, which are then summarized by the quartets displayed on them. Results of hypothesis tests at user-supplied levels are displayed in a simplex plot by color-coded points. The package implements the QDC and WQDC algorithms for topological and metric species tree inference, and the NANUQ algorithm for level-1 topological species network inference, all of which give statistically consistent estimators under the model. Availability: MSCquartets is available through the Comprehensive R Archive Network: https://CRAN.R-project.org/package=MSCquartets. Supplementary information: Supplementary materials, including example data and analyses, are incorporated into the package.
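The idea of summarizing locus trees by the quartets displayed on them can be sketched in a few lines. The Python below is an illustration only and is not the MSCquartets API (which is an R package). It assumes rooted gene trees written as nested tuples of taxon names, and the function names `leaf_sets`, `displayed_quartet` and `quartet_counts` are hypothetical.

```python
from collections import Counter


def leaf_sets(tree):
    """Return the leaf set of every node; the last entry is the node itself.

    Leaves are strings, internal nodes are tuples of subtrees (assumed encoding).
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    sets = []
    here = frozenset()
    for child in tree:
        child_sets = leaf_sets(child)
        sets.extend(child_sets)
        here |= child_sets[-1]  # last entry is the child's full leaf set
    sets.append(here)
    return sets


def displayed_quartet(tree, four):
    """Return the quartet split of `four` displayed on `tree`, or None if unresolved."""
    four = frozenset(four)
    for cluster in leaf_sets(tree):
        pair = cluster & four
        if len(pair) == 2:  # a clade separating two of the four taxa from the rest
            other = four - pair
            sides = sorted(",".join(sorted(side)) for side in (pair, other))
            return " | ".join(sides)
    return None


def quartet_counts(trees, four):
    """Tally how often each resolved quartet topology is displayed across gene trees."""
    counts = Counter()
    for tree in trees:
        topology = displayed_quartet(tree, four)
        if topology is not None:
            counts[topology] += 1
    return counts


if __name__ == "__main__":
    gene_trees = [
        ((("A", "B"), "C"), "D"),
        (("A", "B"), ("C", "D")),
        ((("A", "C"), "B"), "D"),
    ]
    print(quartet_counts(gene_trees, ["A", "B", "C", "D"]))
```

Counts of this kind, taken over many loci, are the sort of summary on which quartet-based tests and inference operate; the statistical tests and the simplex plot themselves are not reproduced here.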



2015 ◽  
Vol 61 (5) ◽  
pp. 854-865 ◽  
Author(s):  
Ziheng Yang

Abstract This paper provides an overview and a tutorial of the BPP program, which is a Bayesian MCMC program for analyzing multi-locus genomic sequence data under the multispecies coalescent model. An example dataset of five nuclear loci from the East Asian brown frogs is used to illustrate four different analyses, including estimation of species divergence times and population size parameters under the multispecies coalescent model on a fixed species phylogeny (A00), species tree estimation when the assignment and species delimitation are fixed (A01), species delimitation using a fixed guide tree (A10), and joint species delimitation and species-tree estimation or unguided species delimitation (A11). For the joint analysis (A11), two new priors are introduced, which assign uniform probabilities for the different numbers of delimited species, which may be useful when assignment, species delimitation, and species phylogeny are all inferred in one joint analysis. The paper ends with a discussion of the assumptions, the strengths and weaknesses of the BPP analysis.
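The four analysis labels follow a simple pattern that can be read off the descriptions above: the first digit indicates whether species delimitation is inferred and the second whether the species tree is inferred. The short Python sketch below only decodes that labelling. The function name `decode_bpp_analysis` and the dictionary keys are hypothetical, and the code does not run or configure BPP.

```python
def decode_bpp_analysis(label):
    """Decode a label such as 'A10' into which components are inferred.

    Assumed convention, taken from the abstract's descriptions:
    first digit = species delimitation, second digit = species tree
    (0 = fixed, 1 = inferred).
    """
    if len(label) != 3 or label[0] != "A" or any(ch not in "01" for ch in label[1:]):
        raise ValueError(f"unrecognised analysis label: {label!r}")
    return {
        "species_delimitation_inferred": label[1] == "1",
        "species_tree_inferred": label[2] == "1",
    }


if __name__ == "__main__":
    for name in ("A00", "A01", "A10", "A11"):
        print(name, decode_bpp_analysis(name))
```

Under this reading, A00 fixes both components and estimates only parameters such as divergence times and population sizes, while A11 infers both jointly.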



2017 ◽  
Vol 80 (1) ◽  
pp. 64-103 ◽  
Author(s):  
Elizabeth S. Allman ◽  
James H. Degnan ◽  
John A. Rhodes


2015 ◽  
Vol 2 (3) ◽  
pp. 140498 ◽  
Author(s):  
Britta S. Meyer ◽  
Adrian Indermaur ◽  
Xenia Ehrensperger ◽  
Bernd Egger ◽  
Gaspard Banyankimbona ◽  
...  

The species flocks of cichlid fishes in the East African Great Lakes are the largest vertebrate adaptive radiations in the world and illustrious textbook examples of convergent evolution between independent species assemblages. Although recent studies suggest some degrees of genetic exchange between riverine taxa and the lake faunas, not a single cichlid species is known from Lakes Tanganyika, Malawi and Victoria that is derived from the radiation associated with another of these lakes. Here, we report the discovery of a haplochromine cichlid species in Lake Tanganyika, which belongs genetically to the species flock of haplochromines of the Lake Victoria region. The new species colonized Lake Tanganyika only recently, suggesting that faunal exchange across watersheds and, hence, between isolated ichthyofaunas, is more common than previously thought.



2019 ◽  
Author(s):  
Matthew Wascher ◽  
Laura Kubatko

Abstract Numerous methods for inferring species-level phylogenies under the coalescent model have been proposed within the last 20 years, and debates continue about the relative strengths and weaknesses of these methods. One desirable property of a phylogenetic estimator is that of statistical consistency, which means intuitively that as more data are collected, the probability that the estimated tree has the same topology as the true tree goes to 1. To date, consistency results for species tree inference under the multispecies coalescent have been derived only for summary statistics methods, such as ASTRAL and MP-EST. These methods have been found to be consistent given true gene trees, but may be inconsistent when gene trees are estimated from data for loci of finite length (Roch et al., 2019). Here we consider the question of statistical consistency for four taxa for SVDQuartets for general data types, as well as for the maximum likelihood (ML) method in the case in which the data are a collection of sites generated under the multispecies coalescent model such that the sites are conditionally independent given the species tree (we call these data Coalescent Independent Sites (CIS) data). We show that SVDQuartets is statistically consistent for all data types (i.e., for both CIS data and for multilocus data), and we derive its rate of convergence. We additionally show that ML is consistent for CIS data under the JC69 model, and discuss why a proof for the more general multilocus case is difficult. Finally, we compare the performance of maximum likelihood and SVDQuartets using simulation for both data types.
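The intuitive meaning of statistical consistency can be illustrated with a small simulation for four taxa. The sketch below is not SVDQuartets and not the ML method analysed in the paper. It is a simple plurality vote over true gene-tree topologies, and it uses the standard four-taxon multispecies coalescent result that, for an internal branch of length t in coalescent units, the gene-tree quartet matching the species tree has probability 1 - (2/3)e^{-t} and each of the two alternatives has probability (1/3)e^{-t}. All function names, the seed and the replicate counts are illustrative assumptions.

```python
import math
import random
from collections import Counter


def quartet_probabilities(t):
    """Gene-tree quartet probabilities for internal branch length t (coalescent units)."""
    discordant = math.exp(-t) / 3.0
    return {"concordant": 1.0 - 2.0 * discordant, "alt1": discordant, "alt2": discordant}


def plurality_estimate(n_loci, t, rng):
    """Sample n_loci true gene-tree topologies and return the most frequent one."""
    probs = quartet_probabilities(t)
    draws = rng.choices(list(probs), weights=list(probs.values()), k=n_loci)
    return Counter(draws).most_common(1)[0][0]


def recovery_rate(n_loci, t, replicates=200, seed=1):
    """Fraction of replicates in which the plurality topology matches the species tree."""
    rng = random.Random(seed)
    hits = sum(
        plurality_estimate(n_loci, t, rng) == "concordant" for _ in range(replicates)
    )
    return hits / replicates


if __name__ == "__main__":
    # The recovery rate should approach 1 as the number of loci grows.
    for n in (1, 10, 100, 1000):
        print(n, recovery_rate(n, t=0.5))
```

Because the concordant topology is always the most probable of the three, the recovery rate of this estimator tends to 1 as loci are added, which is the behaviour the definition of consistency describes. The simulation assumes gene trees are known without error, the case in which summary methods have been shown to be consistent.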



Genetics ◽  
2016 ◽  
Vol 204 (4) ◽  
pp. 1353-1368 ◽  
Author(s):  
Bo Xu ◽  
Ziheng Yang


2020 ◽  
Vol 37 (11) ◽  
pp. 3211-3224
Author(s):  
Jun Huang ◽  
Tomáš Flouri ◽  
Ziheng Yang

Abstract We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.


